public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [hsa merge 00/10] Merge of HSA branch
@ 2016-01-13 17:55 Martin Jambor
  2016-01-13 17:55 ` [hsa merge 01/10] Configury changes and new options Martin Jambor
                   ` (10 more replies)
  0 siblings, 11 replies; 77+ messages in thread
From: Martin Jambor @ 2016-01-13 17:55 UTC (permalink / raw)
  To: GCC Patches

Hi,

this is hopefully the last big re-post of the HSA patches.  We have
incorporated all the feedback and found and fixed a couple more bugs.
The complete patch-set bootstraps and tests fine on an x86_64-linux,
when you do not enable HSA, there are a few expected warnings when HSA
is enabled which I will address as a followup together with more
testsuite changes.  The patches were specifically designed so that the
impact on pople not enabling HSA should be minimal.  A last round of
complete testing on an actual HSA-capable APU is still underway and I
won't have the results until tomorrow but preliminary results were
good and I di dnot want to hold up these patches for any longer.

The libgomp, omp and configuration bits have been reviewed by Jakub, a
few other bits by Richi, but still Honza should review the IPA parts
and I suppose someone other than me should ack the hsa-* files, even
though I probably now have the authority to do it myself.

Thanks everybody for patience and feedback.  While we are of course
opened for mor more of it, let's also hope the approval process will
finish soon as it should now.

Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [hsa merge 01/10] Configury changes and new options
  2016-01-13 17:55 [hsa merge 00/10] Merge of HSA branch Martin Jambor
@ 2016-01-13 17:55 ` Martin Jambor
  2016-01-14 12:35   ` Jakub Jelinek
  2016-02-02 12:53   ` Thomas Schwinge
  2016-01-13 17:55 ` [hsa merge 04/10] Avoid extraneous remapping in copy_gimple_seq_and_replace_locals Martin Jambor
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 77+ messages in thread
From: Martin Jambor @ 2016-01-13 17:55 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: configury.diff --]
[-- Type: text/plain, Size: 15092 bytes --]

Hi,

this patch contains changes to the configuration mechanism and offload
bits, so that users can build compilers with HSA support.

It is a re-post of
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00714.html, which, has
already been approved by Jakub after a few changes
(https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01284.html).

thanks,

Martin

2016-01-13  Martin Jambor  <mjambor@suse.cz>

	* Makefile.in (OBJS): Add new source files.
	(GTFILES): Add hsa.c.
	* common.opt (disable_hsa): New variable.
	(-Whsa): New warning.
	* config.in (ENABLE_HSA): New.
	* configure.ac: Treat hsa differently from other accelerators.
	(OFFLOAD_TARGETS): Define ENABLE_OFFLOADING according to
	$enable_offloading.
	(ENABLE_HSA): Define ENABLE_HSA according to $enable_hsa.
	* doc/install.texi (Configuration): Document --with-hsa-runtime,
	--with-hsa-runtime-include, --with-hsa-runtime-lib and
	--with-hsa-kmt-lib.
	* doc/invoke.texi (-Whsa): Document.
	(hsa-gen-debug-stores): Likewise.
	* lto-wrapper.c (compile_images_for_offload_targets): Do not attempt
	to invoke offload compiler for hsa acclerator.
	* opts.c (common_handle_option): Determine whether HSA offloading
	should be performed.
	* params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter.

libgomp/plugin/
	* Makefrag.am: Add HSA plugin requirements.
	* configfrag.ac (HSA_RUNTIME_INCLUDE): New variable.
	(HSA_RUNTIME_LIB): Likewise.
	(HSA_RUNTIME_CPPFLAGS): Likewise.
	(HSA_RUNTIME_INCLUDE): New substitution.
	(HSA_RUNTIME_LIB): Likewise.
	(HSA_RUNTIME_LDFLAGS): Likewise.
	(hsa-runtime): New configure option.
	(hsa-runtime-include): Likewise.
	(hsa-runtime-lib): Likewise.
	(PLUGIN_HSA): New substitution variable.
	Fill HSA_RUNTIME_INCLUDE and HSA_RUNTIME_LIB according to the new
	configure options.
	(PLUGIN_HSA_CPPFLAGS): Likewise.
	(PLUGIN_HSA_LDFLAGS): Likewise.
	(PLUGIN_HSA_LIBS): Likewise.
	Check that we have access to HSA run-time.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 44a18eb..ab9cbbf 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1297,6 +1297,11 @@ OBJS = \
 	graphite-sese-to-poly.o \
 	gtype-desc.o \
 	haifa-sched.o \
+	hsa.o \
+	hsa-gen.o \
+	hsa-regalloc.o \
+	hsa-brig.o \
+	hsa-dump.o \
 	hw-doloop.o \
 	hwint.o \
 	ifcvt.o \
@@ -1321,6 +1326,7 @@ OBJS = \
 	ipa-icf.o \
 	ipa-icf-gimple.o \
 	ipa-reference.o \
+	ipa-hsa.o \
 	ipa-ref.o \
 	ipa-utils.o \
 	ipa.o \
@@ -2404,6 +2410,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/sancov.c \
   $(srcdir)/ipa-devirt.c \
   $(srcdir)/internal-fn.h \
+  $(srcdir)/hsa.c \
   @all_gtfiles@
 
 # Compute the list of GT header files from the corresponding C sources,
diff --git a/gcc/common.opt b/gcc/common.opt
index 49d347c..23e6ed7 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -239,6 +239,10 @@ Inserts call to __sanitizer_cov_trace_pc into every basic block.
 Variable
 bool dump_base_name_prefixed = false
 
+; Flag whether HSA generation has been explicitely disabled
+Variable
+bool flag_disable_hsa = false
+
 ###
 Driver
 
@@ -593,6 +597,10 @@ Wfree-nonheap-object
 Common Var(warn_free_nonheap_object) Init(1) Warning
 Warn when attempting to free a non-heap object.
 
+Whsa
+Common Var(warn_hsa) Init(1) Warning
+Warn when a function cannot be expanded to HSAIL.
+
 Winline
 Common Var(warn_inline) Warning
 Warn when an inlined function cannot be inlined.
diff --git a/gcc/config.in b/gcc/config.in
index c00cd0f..c3340bb0 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -144,6 +144,12 @@
 #endif
 
 
+/* Define this to enable support for generating HSAIL. */
+#ifndef USED_FOR_TARGET
+#undef ENABLE_HSA
+#endif
+
+
 /* Define if gcc should always pass --build-id to linker. */
 #ifndef USED_FOR_TARGET
 #undef ENABLE_LD_BUILDID
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 0a626e9..8d3a869 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -940,6 +940,13 @@ AC_SUBST(accel_dir_suffix)
 
 for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
   tgt=`echo $tgt | sed 's/=.*//'`
+
+  if echo "$tgt" | grep "^hsa" > /dev/null ; then
+    enable_hsa=1
+  else
+    enable_offloading=1
+  fi
+
   if test x"$offload_targets" = x; then
     offload_targets=$tgt
   else
@@ -948,7 +955,7 @@ for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
 done
 AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
   [Define to offload targets, separated by commas.])
-if test x"$offload_targets" != x; then
+if test x"$enable_offloading" != x; then
   AC_DEFINE(ENABLE_OFFLOADING, 1,
     [Define this to enable support for offloading.])
 else
@@ -956,6 +963,11 @@ else
     [Define this to enable support for offloading.])
 fi
 
+if test x"$enable_hsa" = x1 ; then
+  AC_DEFINE(ENABLE_HSA, 1,
+    [Define this to enable support for generating HSAIL.])
+fi
+
 AC_ARG_WITH(multilib-list,
 [AS_HELP_STRING([--with-multilib-list], [select multilibs (AArch64, SH and x86-64 only)])],
 :,
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 1c2b702..062f42c 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1992,6 +1992,28 @@ specifying paths @var{path1}, @dots{}, @var{pathN}.
 % @var{srcdir}/configure \
     --enable-offload-target=i686-unknown-linux-gnu=/path/to/i686/compiler,x86_64-pc-linux-gnu
 @end smallexample
+
+If @samp{hsa} is specified as one of the targets, the compiler will be
+built with support for HSA GPU accelerators.  Because the same
+compiler will emit the accelerator code, no path should be specified.
+
+@item --with-hsa-runtime=@var{pathname}
+@itemx --with-hsa-runtime-include=@var{pathname}
+@itemx --with-hsa-runtime-lib=@var{pathname}
+
+If you configure GCC with HSA offloading but do not have the HSA
+run-time library installed in a standard location then you can
+explicitly specify the directory where they are installed.  The
+@option{--with-hsa-runtime=@/@var{hsainstalldir}} option is a
+shorthand for
+@option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and
+@option{--with-hsa-runtime-include=@/@var{hsainstalldir}/include}.
+
+@item --with-hsa-kmt-lib=@var{pathname}
+
+If you configure GCC with HSA offloading but do not have the HSA
+KMT library installed in a standard location then you can
+explicitly specify the directory where it resides.
 @end table
 
 @subheading Cross-Compiler-Specific Options
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 77115a7..8ce04b1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -295,7 +295,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wunused-but-set-parameter -Wunused-but-set-variable @gol
 -Wuseless-cast -Wvariadic-macros -Wvector-operation-performance @gol
 -Wvla -Wvolatile-register-var  -Wwrite-strings @gol
--Wzero-as-null-pointer-constant}
+-Wzero-as-null-pointer-constant -Whsa}
 
 @item C and Objective-C-only Warning Options
 @gccoptlist{-Wbad-function-cast  -Wmissing-declarations @gol
@@ -5721,6 +5721,10 @@ Suppress warnings when a positional initializer is used to initialize
 a structure that has been marked with the @code{designated_init}
 attribute.
 
+@item -Whsa
+Issue a warning when HSAIL cannot be emitted for the compiled function or
+OpenMP construct.
+
 @end table
 
 @node Debugging Options
@@ -11262,6 +11266,12 @@ dynamic, guided, auto, runtime).  The default is static.
 Maximum depth of recursion when querying properties of SSA names in things
 like fold routines.  One level of recursion corresponds to following a
 use-def chain.
+
+@item hsa-gen-debug-stores
+Enable emission of special debug stores within HSA kernels which are
+then read and reported by libgomp plugin.  Generation of these stores
+is disabled by default, use @option{--param hsa-gen-debug-stores=1} to
+enable it.
 @end table
 @end table
 
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index ecb9996..16d1f45 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -736,6 +736,7 @@ compile_images_for_offload_targets (unsigned in_argc, char *in_argv[],
     return;
   unsigned num_targets = parse_env_var (target_names, &names, NULL);
 
+  int next_name_entry = 0;
   const char *compiler_path = getenv ("COMPILER_PATH");
   if (!compiler_path)
     goto out;
@@ -745,13 +746,19 @@ compile_images_for_offload_targets (unsigned in_argc, char *in_argv[],
   offload_names = XCNEWVEC (char *, num_targets + 1);
   for (unsigned i = 0; i < num_targets; i++)
     {
-      offload_names[i]
+      /* HSA does not use LTO-like streaming and a different compiler, skip
+	 it. */
+      if (strcmp (names[i], "hsa") == 0)
+	continue;
+
+      offload_names[next_name_entry]
 	= compile_offload_image (names[i], compiler_path, in_argc, in_argv,
 				 compiler_opts, compiler_opt_count,
 				 linker_opts, linker_opt_count);
-      if (!offload_names[i])
+      if (!offload_names[next_name_entry])
 	fatal_error (input_location,
 		     "problem with building target image for %s\n", names[i]);
+      next_name_entry++;
     }
 
  out:
diff --git a/gcc/opts.c b/gcc/opts.c
index 2add158..8e8410c 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1916,8 +1916,35 @@ common_handle_option (struct gcc_options *opts,
       break;
 
     case OPT_foffload_:
-      /* Deferred.  */
-      break;
+      {
+	const char *p = arg;
+	opts->x_flag_disable_hsa = true;
+	while (*p != 0)
+	  {
+	    const char *comma = strchr (p, ',');
+
+	    if ((strncmp (p, "disable", 7) == 0)
+		&& (p[7] == ',' || p[7] == '\0'))
+	      {
+		opts->x_flag_disable_hsa = true;
+		break;
+	      }
+
+	    if ((strncmp (p, "hsa", 3) == 0)
+		&& (p[3] == ',' || p[3] == '\0'))
+	      {
+#ifdef ENABLE_HSA
+		opts->x_flag_disable_hsa = false;
+#else
+		sorry ("HSA has not been enabled during configuration");
+#endif
+	      }
+	    if (!comma)
+	      break;
+	    p = comma + 1;
+	  }
+	break;
+      }
 
 #ifndef ACCEL_COMPILER
     case OPT_foffload_abi_:
diff --git a/gcc/params.def b/gcc/params.def
index 308844a..88971c7 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1183,6 +1183,11 @@ DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_INSNS,
 	  "Maximum number of insns in a basic block to consider for RTL "
 	  "if-conversion.",
 	  10, 0, 99)
+
+DEFPARAM (PARAM_HSA_GEN_DEBUG_STORES,
+	  "hsa-gen-debug-stores",
+	  "Level of hsa debug stores verbosity",
+	  0, 0, 1)
 /*
 
 Local variables:
diff --git a/libgomp/plugin/Makefrag.am b/libgomp/plugin/Makefrag.am
index 4efe963..035a663 100644
--- a/libgomp/plugin/Makefrag.am
+++ b/libgomp/plugin/Makefrag.am
@@ -38,3 +38,16 @@ libgomp_plugin_nvptx_la_LDFLAGS += $(PLUGIN_NVPTX_LDFLAGS)
 libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
 libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
 endif
+
+if PLUGIN_HSA
+# Heterogenous Systems Architecture plugin
+libgomp_plugin_hsa_version_info = -version-info $(libtool_VERSION)
+toolexeclib_LTLIBRARIES += libgomp-plugin-hsa.la
+libgomp_plugin_hsa_la_SOURCES = plugin/plugin-hsa.c
+libgomp_plugin_hsa_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_HSA_CPPFLAGS)
+libgomp_plugin_hsa_la_LDFLAGS = $(libgomp_plugin_hsa_version_info) \
+	$(lt_host_flags)
+libgomp_plugin_hsa_la_LDFLAGS += $(PLUGIN_HSA_LDFLAGS)
+libgomp_plugin_hsa_la_LIBADD = libgomp.la $(PLUGIN_HSA_LIBS)
+libgomp_plugin_hsa_la_LIBTOOLFLAGS = --tag=disable-static
+endif
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 768954a..2a9d9f9 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -81,6 +81,62 @@ AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
 AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
 AC_SUBST(PLUGIN_NVPTX_LIBS)
 
+# Look for HSA run-time, its includes and libraries
+
+HSA_RUNTIME_INCLUDE=
+HSA_RUNTIME_LIB=
+AC_SUBST(HSA_RUNTIME_INCLUDE)
+AC_SUBST(HSA_RUNTIME_LIB)
+HSA_RUNTIME_CPPFLAGS=
+HSA_RUNTIME_LDFLAGS=
+
+AC_ARG_WITH(hsa-runtime,
+	[AS_HELP_STRING([--with-hsa-runtime=PATH],
+		[specify prefix directory for installed HSA run-time package.
+		 Equivalent to --with-hsa-runtime-include=PATH/include
+		 plus --with-hsa-runtime-lib=PATH/lib])])
+AC_ARG_WITH(hsa-runtime-include,
+	[AS_HELP_STRING([--with-hsa-runtime-include=PATH],
+		[specify directory for installed HSA run-time include files])])
+AC_ARG_WITH(hsa-runtime-lib,
+	[AS_HELP_STRING([--with-hsa-runtime-lib=PATH],
+		[specify directory for the installed HSA run-time library])])
+if test "x$with_hsa_runtime" != x; then
+  HSA_RUNTIME_INCLUDE=$with_hsa_runtime/include
+  HSA_RUNTIME_LIB=$with_hsa_runtime/lib
+fi
+if test "x$with_hsa_runtime_include" != x; then
+  HSA_RUNTIME_INCLUDE=$with_hsa_runtime_include
+fi
+if test "x$with_hsa_runtime_lib" != x; then
+  HSA_RUNTIME_LIB=$with_hsa_runtime_lib
+fi
+if test "x$HSA_RUNTIME_INCLUDE" != x; then
+  HSA_RUNTIME_CPPFLAGS=-I$HSA_RUNTIME_INCLUDE
+fi
+if test "x$HSA_RUNTIME_LIB" != x; then
+  HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
+fi
+
+AC_ARG_WITH(hsa-kmt-lib,
+	[AS_HELP_STRING([--with-hsa-kmt-lib=PATH],
+		[specify directory for installed HSA KMT library.])])
+if test "x$with_hsa_kmt_lib" != x; then
+  HSA_RUNTIME_LDFLAGS="$HSA_RUNTIME_LDFLAGS -L$with_hsa_kmt_lib"
+  HSA_RUNTIME_LIB=
+fi
+
+PLUGIN_HSA=0
+PLUGIN_HSA_CPPFLAGS=
+PLUGIN_HSA_LDFLAGS=
+PLUGIN_HSA_LIBS=
+AC_SUBST(PLUGIN_HSA)
+AC_SUBST(PLUGIN_HSA_CPPFLAGS)
+AC_SUBST(PLUGIN_HSA_LDFLAGS)
+AC_SUBST(PLUGIN_HSA_LIBS)
+
+
+
 # Get offload targets and path to install tree of offloading compiler.
 offload_additional_options=
 offload_additional_lib_paths=
@@ -122,6 +178,49 @@ if test x"$enable_offload_targets" != x; then
 	    ;;
 	esac
 	;;
+      hsa*)
+	case "${target}" in
+	  x86_64-*-*)
+	    case " ${CC} ${CFLAGS} " in
+	      *" -m32 "*)
+	        PLUGIN_HSA=0
+		;;
+	      *)
+	        tgt_name=hsa
+	        PLUGIN_HSA=$tgt
+	        PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
+	        PLUGIN_HSA_LDFLAGS=$HSA_RUNTIME_LDFLAGS
+	        PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
+
+	        PLUGIN_HSA_save_CPPFLAGS=$CPPFLAGS
+	        CPPFLAGS="$PLUGIN_HSA_CPPFLAGS $CPPFLAGS"
+	        PLUGIN_HSA_save_LDFLAGS=$LDFLAGS
+	        LDFLAGS="$PLUGIN_HSA_LDFLAGS $LDFLAGS"
+	        PLUGIN_HSA_save_LIBS=$LIBS
+	        LIBS="$PLUGIN_HSA_LIBS $LIBS"
+
+	        AC_LINK_IFELSE(
+	          [AC_LANG_PROGRAM(
+	            [#include "hsa.h"],
+	              [hsa_status_t status = hsa_init ()])],
+	          [PLUGIN_HSA=1])
+	        CPPFLAGS=$PLUGIN_HSA_save_CPPFLAGS
+	        LDFLAGS=$PLUGIN_HSA_save_LDFLAGS
+	        LIBS=$PLUGIN_HSA_save_LIBS
+	        case $PLUGIN_HSA in
+	          hsa*)
+	            HSA_PLUGIN=0
+	            AC_MSG_ERROR([HSA run-time package required for HSA support])
+	            ;;
+	        esac
+		;;
+	      esac
+    	    ;;
+	  *-*-*)
+	    PLUGIN_HSA=0
+            ;;
+        esac
+        ;;
       *)
 	AC_MSG_ERROR([unknown offload target specified])
 	;;
@@ -145,3 +244,6 @@ AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
   [Define to 1 if the NVIDIA plugin is built, 0 if not.])
+AM_CONDITIONAL([PLUGIN_HSA], [test $PLUGIN_HSA = 1])
+AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
+  [Define to 1 if the HSA plugin is built, 0 if not.])

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [hsa merge 02/10] Modifications to libgomp proper
  2016-01-13 17:55 [hsa merge 00/10] Merge of HSA branch Martin Jambor
                   ` (2 preceding siblings ...)
  2016-01-13 17:55 ` [hsa merge 10/10] HSA register allocator Martin Jambor
@ 2016-01-13 17:55 ` Martin Jambor
  2016-01-20 13:37   ` Ilya Verbin
  2016-01-20 14:48   ` Ilya Verbin
  2016-01-13 17:55 ` [hsa merge 07/10] IPA-HSA pass Martin Jambor
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 77+ messages in thread
From: Martin Jambor @ 2016-01-13 17:55 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jakub Jelinek

[-- Attachment #1: libgomp.diff --]
[-- Type: text/plain, Size: 26361 bytes --]

Hi,

The patch below contains all changes to libgomp files except for the
hsa plugin (which is in the following patch).

The patch is a re-post of
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01288.html but we have
incorporated a number of requests from the feedback.  From the
subsequent communications with Jakub, I have the feeling he is fine
with the changes.  But perhaps he or someone else would like to have
one more look.

Thanks,

Martin


2016-01-13  Martin Jambor  <mjambor@suse.cz>

include/
	* gomp-constants.h (GOMP_DEVICE_HSA): New macro.
	(GOMP_VERSION_HSA): Likewise.
	(GOMP_TARGET_ARG_DEVICE_MASK): Likewise.
	(GOMP_TARGET_ARG_DEVICE_ALL): Likewise.
	(GOMP_TARGET_ARG_SUBSEQUENT_PARAM): Likewise.
	(GOMP_TARGET_ARG_ID_MASK): Likewise.
	(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
	(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.
	(GOMP_TARGET_ARG_VALUE_SHIFT): Likewise.
	(GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise.

libgomp/
	* libgomp-plugin.h (offload_target_type): New element
	OFFLOAD_TARGET_TYPE_HSA.
	* libgomp.h (gomp_target_task): New fields firstprivate_copies and
	args.
	(bool gomp_create_target_task): Updated.
	(gomp_device_descr): Extra parameter of run_func and async_run_func,
	new field can_run_func.
	* libgomp_g.h (GOMP_target_ext): Update prototype.
	* oacc-host.c (host_run): Added a new parameter args.
	* target.c (calculate_firstprivate_requirements): New function.
	(copy_firstprivate_data): Likewise.
	(gomp_target_fallback_firstprivate): Use them.
	(gomp_target_unshare_firstprivate): New function.
	(gomp_get_target_fn_addr): Allow returning NULL for shared memory
	devices.
	(GOMP_target): Do host fallback for all shared memory devices.  Do not
	pass any args to plugins.
	(GOMP_target_ext): Introduce device-specific argument parameter args.
	Allow host fallback if device shares memory.  Do not remap data if
	device has shared memory.
	(gomp_target_task_fn): Likewise.  Also treat shared memory devices
	like host fallback for mappings.
	(GOMP_target_data): Treat shared memory devices like host fallback.
	(GOMP_target_data_ext): Likewise.
	(GOMP_target_update): Likewise.
	(GOMP_target_update_ext): Likewise.  Also pass NULL as args to
	gomp_create_target_task.
	(GOMP_target_enter_exit_data): Likewise.
	(omp_target_alloc): Treat shared memory devices like host fallback.
	(omp_target_free): Likewise.
	(omp_target_is_present): Likewise.
	(omp_target_memcpy): Likewise.
	(omp_target_memcpy_rect): Likewise.
	(omp_target_associate_ptr): Likewise.
	(gomp_load_plugin_for_device): Also load can_run.
	* task.c (GOMP_PLUGIN_target_task_completion): Free
	firstprivate_copies.
	(gomp_create_target_task): Accept new argument args and store it to
	ttask.

liboffloadmic/plugin
	* libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_async_run): New unused
	parameter.
	(GOMP_OFFLOAD_run): Likewise.

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index dffd631..a8e7723 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -176,6 +176,7 @@ enum gomp_map_kind
 #define GOMP_DEVICE_NOT_HOST		4
 #define GOMP_DEVICE_NVIDIA_PTX		5
 #define GOMP_DEVICE_INTEL_MIC		6
+#define GOMP_DEVICE_HSA			7
 
 #define GOMP_DEVICE_ICV			-1
 #define GOMP_DEVICE_HOST_FALLBACK	-2
@@ -201,6 +202,7 @@ enum gomp_map_kind
 #define GOMP_VERSION	0
 #define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
+#define GOMP_VERSION_HSA 0
 
 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
 #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffff)
@@ -228,4 +230,30 @@ enum gomp_map_kind
 #define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0xffff)
 #define GOMP_LAUNCH_OP_MAX 0xffff
 
+/* Bitmask to apply in order to find out the intended device of a target
+   argument.  */
+#define GOMP_TARGET_ARG_DEVICE_MASK		((1 << 7) - 1)
+/* The target argument is significant for all devices.  */
+#define GOMP_TARGET_ARG_DEVICE_ALL		0
+
+/* Flag set when the subsequent element in the device-specific argument
+   values.  */
+#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM	(1 << 7)
+
+/* Bitmask to apply to a target argument to find out the value identifier.  */
+#define GOMP_TARGET_ARG_ID_MASK			(((1 << 8) - 1) << 8)
+/* Target argument index of NUM_TEAMS.  */
+#define GOMP_TARGET_ARG_NUM_TEAMS		(1 << 8)
+/* Target argument index of THREAD_LIMIT.  */
+#define GOMP_TARGET_ARG_THREAD_LIMIT		(2 << 8)
+
+/* If the value is directly embeded in target argument, it should be a 16-bit
+   at most and shifted by this many bits.  */
+#define GOMP_TARGET_ARG_VALUE_SHIFT		16
+
+/* HSA specific data structures.  */
+
+/* Identifiers of device-specific target arguments.  */
+#define GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES	(1 << 8)
+
 #endif
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 64035e4..53f9248 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -48,7 +48,8 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_HOST = 2,
   /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
   OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
-  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
+  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
+  OFFLOAD_TARGET_TYPE_HSA = 7
 };
 
 /* Auxiliary struct, used for transferring pairs of addresses from plugin
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 6ddde56..7108a6d 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -496,6 +496,10 @@ struct gomp_target_task
   struct target_mem_desc *tgt;
   struct gomp_task *task;
   struct gomp_team *team;
+  /* Copies of firstprivate mapped data for shared memory accelerators.  */
+  void *firstprivate_copies;
+  /* Device-specific target arguments.  */
+  void **args;
   void *hostaddrs[];
 };
 
@@ -750,7 +754,8 @@ extern void gomp_task_maybe_wait_for_dependencies (void **);
 extern bool gomp_create_target_task (struct gomp_device_descr *,
 				     void (*) (void *), size_t, void **,
 				     size_t *, unsigned short *, unsigned int,
-				     void **, enum gomp_target_task_state);
+				     void **, void **,
+				     enum gomp_target_task_state);
 
 static void inline
 gomp_finish_task (struct gomp_task *task)
@@ -937,8 +942,9 @@ struct gomp_device_descr
   void *(*dev2host_func) (int, void *, const void *, size_t);
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void *(*dev2dev_func) (int, void *, const void *, size_t);
-  void (*run_func) (int, void *, void *);
-  void (*async_run_func) (int, void *, void *, void *);
+  bool (*can_run_func) (void *);
+  void (*run_func) (int, void *, void *, void **);
+  void (*async_run_func) (int, void *, void *, void **, void *);
 
   /* Splay tree containing information about mapped memory regions.  */
   struct splay_tree_s mem_map;
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index 6229ca0..24eebb6 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -278,8 +278,7 @@ extern void GOMP_single_copy_end (void *);
 extern void GOMP_target (int, void (*) (void *), const void *,
 			 size_t, void **, size_t *, unsigned char *);
 extern void GOMP_target_ext (int, void (*) (void *), size_t, void **, size_t *,
-			     unsigned short *, unsigned int, void **,
-			     int, int);
+			     unsigned short *, unsigned int, void **, void **);
 extern void GOMP_target_data (int, const void *,
 			      size_t, void **, size_t *, unsigned char *);
 extern void GOMP_target_data_ext (int, size_t, void **, size_t *,
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 0760e44..1e760f6 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -123,7 +123,8 @@ host_host2dev (int n __attribute__ ((unused)),
 }
 
 static void
-host_run (int n __attribute__ ((unused)), void *fn_ptr, void *vars)
+host_run (int n __attribute__ ((unused)), void *fn_ptr, void *vars,
+	  void **args __attribute__((unused)))
 {
   void (*fn)(void *) = (void (*)(void *)) fn_ptr;
 
diff --git a/libgomp/target.c b/libgomp/target.c
index bea5822..f1f5849 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1329,44 +1329,90 @@ gomp_target_fallback (void (*fn) (void *), void **hostaddrs)
   *thr = old_thr;
 }
 
-/* Host fallback with firstprivate map-type handling.  */
+/* Calculate alignment and size requirements of a private copy of data shared
+   as GOMP_MAP_FIRSTPRIVATE and store them to TGT_ALIGN and TGT_SIZE.  */
 
-static void
-gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum,
-				   void **hostaddrs, size_t *sizes,
-				   unsigned short *kinds)
+static inline void
+calculate_firstprivate_requirements (size_t mapnum, size_t *sizes,
+				     unsigned short *kinds, size_t *tgt_align,
+				     size_t *tgt_size)
 {
-  size_t i, tgt_align = 0, tgt_size = 0;
-  char *tgt = NULL;
+  size_t i;
+  for (i = 0; i < mapnum; i++)
+    if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
+      {
+	size_t align = (size_t) 1 << (kinds[i] >> 8);
+	if (*tgt_align < align)
+	  *tgt_align = align;
+	*tgt_size = (*tgt_size + align - 1) & ~(align - 1);
+	*tgt_size += sizes[i];
+      }
+}
+
+/* Copy data shared as GOMP_MAP_FIRSTPRIVATE to DST.  */
+
+static inline void
+copy_firstprivate_data (char *tgt, size_t mapnum, void **hostaddrs,
+			size_t *sizes, unsigned short *kinds, size_t tgt_align,
+			size_t tgt_size)
+{
+  uintptr_t al = (uintptr_t) tgt & (tgt_align - 1);
+  if (al)
+    tgt += tgt_align - al;
+  tgt_size = 0;
+  size_t i;
   for (i = 0; i < mapnum; i++)
     if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
       {
 	size_t align = (size_t) 1 << (kinds[i] >> 8);
-	if (tgt_align < align)
-	  tgt_align = align;
 	tgt_size = (tgt_size + align - 1) & ~(align - 1);
-	tgt_size += sizes[i];
+	memcpy (tgt + tgt_size, hostaddrs[i], sizes[i]);
+	hostaddrs[i] = tgt + tgt_size;
+	tgt_size = tgt_size + sizes[i];
       }
+}
+
+/* Host fallback with firstprivate map-type handling.  */
+
+static void
+gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum,
+				   void **hostaddrs, size_t *sizes,
+				   unsigned short *kinds)
+{
+  size_t tgt_align = 0, tgt_size = 0;
+  calculate_firstprivate_requirements (mapnum, sizes, kinds, &tgt_align,
+				       &tgt_size);
   if (tgt_align)
     {
-      tgt = gomp_alloca (tgt_size + tgt_align - 1);
-      uintptr_t al = (uintptr_t) tgt & (tgt_align - 1);
-      if (al)
-	tgt += tgt_align - al;
-      tgt_size = 0;
-      for (i = 0; i < mapnum; i++)
-	if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
-	  {
-	    size_t align = (size_t) 1 << (kinds[i] >> 8);
-	    tgt_size = (tgt_size + align - 1) & ~(align - 1);
-	    memcpy (tgt + tgt_size, hostaddrs[i], sizes[i]);
-	    hostaddrs[i] = tgt + tgt_size;
-	    tgt_size = tgt_size + sizes[i];
-	  }
+      char *tgt = gomp_alloca (tgt_size + tgt_align - 1);
+      copy_firstprivate_data (tgt, mapnum, hostaddrs, sizes, kinds, tgt_align,
+			      tgt_size);
     }
   gomp_target_fallback (fn, hostaddrs);
 }
 
+/* Handle firstprivate map-type for shared memory devices and the host
+   fallback.  Return the pointer of firstprivate copies which has to be freed
+   after use.  */
+
+static void *
+gomp_target_unshare_firstprivate (size_t mapnum, void **hostaddrs,
+				  size_t *sizes, unsigned short *kinds)
+{
+  size_t tgt_align = 0, tgt_size = 0;
+  char *tgt = NULL;
+
+  calculate_firstprivate_requirements (mapnum, sizes, kinds, &tgt_align,
+				       &tgt_size);
+  if (tgt_align)
+    {
+      tgt = gomp_malloc (tgt_size + tgt_align - 1);
+      copy_firstprivate_data (tgt, mapnum, hostaddrs, sizes, kinds, tgt_align,
+			      tgt_size);
+    }
+  return tgt;
+}
+
 /* Helper function of GOMP_target{,_ext} routines.  */
 
 static void *
@@ -1390,7 +1436,12 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
       splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
       gomp_mutex_unlock (&devicep->lock);
       if (tgt_fn == NULL)
-	gomp_fatal ("Target function wasn't mapped");
+	{
+	  if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+	    return NULL;
+	  else
+	    gomp_fatal ("Target function wasn't mapped");
+	}
 
       return (void *) tgt_fn->tgt_offset;
     }
@@ -1416,13 +1467,16 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
   void *fn_addr;
   if (devicep == NULL
       || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      /* All shared memory devices should use the GOMP_target_ext function.  */
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM
       || !(fn_addr = gomp_get_target_fn_addr (devicep, fn)))
     return gomp_target_fallback (fn, hostaddrs);
 
   struct target_mem_desc *tgt_vars
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
 		     GOMP_MAP_VARS_TARGET);
-  devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);
+  devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start,
+		     NULL);
   gomp_unmap_vars (tgt_vars, true);
 }
 
@@ -1430,6 +1484,15 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
    and several arguments have been added:
    FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
    DEPEND is array of dependencies, see GOMP_task for details.
+
+   ARGS is a pointer to an array consisting of a variable number of both
+   device-independent and device-specific arguments, which can take one two
+   elements where the first specifies for which device it is intended, the type
+   and optionally also the value.  If the value is not present in the first
+   one, the whole second element the actual value.  The last element of the
+   array is a single NULL.  Among the device independent can be for example
+   NUM_TEAMS and THREAD_LIMIT.
+
    NUM_TEAMS is positive if GOMP_teams will be called in the body with
    that value, or 1 if teams construct is not present, or 0, if
    teams construct does not have num_teams clause and so the choice is
@@ -1443,14 +1506,10 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
 void
 GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 		 void **hostaddrs, size_t *sizes, unsigned short *kinds,
-		 unsigned int flags, void **depend, int num_teams,
-		 int thread_limit)
+		 unsigned int flags, void **depend, void **args)
 {
   struct gomp_device_descr *devicep = resolve_device (device);
 
-  (void) num_teams;
-  (void) thread_limit;
-
   if (flags & GOMP_TARGET_FLAG_NOWAIT)
     {
       struct gomp_thread *thr = gomp_thread ();
@@ -1487,7 +1546,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 	  && !thr->task->final_task)
 	{
 	  gomp_create_target_task (devicep, fn, mapnum, hostaddrs,
-				   sizes, kinds, flags, depend,
+				   sizes, kinds, flags, depend, args,
 				   GOMP_TARGET_TASK_BEFORE_MAP);
 	  return;
 	}
@@ -1507,17 +1566,30 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
   void *fn_addr;
   if (devicep == NULL
       || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
-      || !(fn_addr = gomp_get_target_fn_addr (devicep, fn)))
+      || !(fn_addr = gomp_get_target_fn_addr (devicep, fn))
+      || (devicep->can_run_func && !devicep->can_run_func (fn_addr)))
     {
       gomp_target_fallback_firstprivate (fn, mapnum, hostaddrs, sizes, kinds);
       return;
     }
 
-  struct target_mem_desc *tgt_vars
-    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
-		     GOMP_MAP_VARS_TARGET);
-  devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);
-  gomp_unmap_vars (tgt_vars, true);
+  struct target_mem_desc *tgt_vars;
+  void *fpc = NULL;
+  if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    {
+      fpc = gomp_target_unshare_firstprivate (mapnum, hostaddrs, sizes, kinds);
+      tgt_vars = NULL;
+    }
+  else
+    tgt_vars = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds,
+			      true, GOMP_MAP_VARS_TARGET);
+  devicep->run_func (devicep->target_id, fn_addr,
+		     tgt_vars ? (void *) tgt_vars->tgt_start : hostaddrs,
+		     args);
+  if (tgt_vars)
+    gomp_unmap_vars (tgt_vars, true);
+  else
+    free (fpc);
 }
 
 /* Host fallback for GOMP_target_data{,_ext} routines.  */
@@ -1547,7 +1619,8 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
   struct gomp_device_descr *devicep = resolve_device (device);
 
   if (devicep == NULL
-      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM))
     return gomp_target_data_fallback ();
 
   struct target_mem_desc *tgt
@@ -1565,7 +1638,8 @@ GOMP_target_data_ext (int device, size_t mapnum, void **hostaddrs,
   struct gomp_device_descr *devicep = resolve_device (device);
 
   if (devicep == NULL
-      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
     return gomp_target_data_fallback ();
 
   struct target_mem_desc *tgt
@@ -1595,7 +1669,8 @@ GOMP_target_update (int device, const void *unused, size_t mapnum,
   struct gomp_device_descr *devicep = resolve_device (device);
 
   if (devicep == NULL
-      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
     return;
 
   gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
@@ -1626,7 +1701,7 @@ GOMP_target_update_ext (int device, size_t mapnum, void **hostaddrs,
 	      if (gomp_create_target_task (devicep, (void (*) (void *)) NULL,
 					   mapnum, hostaddrs, sizes, kinds,
 					   flags | GOMP_TARGET_FLAG_UPDATE,
-					   depend, GOMP_TARGET_TASK_DATA))
+					   depend, NULL, GOMP_TARGET_TASK_DATA))
 		return;
 	    }
 	  else
@@ -1646,7 +1721,8 @@ GOMP_target_update_ext (int device, size_t mapnum, void **hostaddrs,
     }
 
   if (devicep == NULL
-      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
     return;
 
   struct gomp_thread *thr = gomp_thread ();
@@ -1756,7 +1832,7 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
 	    {
 	      if (gomp_create_target_task (devicep, (void (*) (void *)) NULL,
 					   mapnum, hostaddrs, sizes, kinds,
-					   flags, depend,
+					   flags, depend, NULL,
 					   GOMP_TARGET_TASK_DATA))
 		return;
 	    }
@@ -1777,7 +1853,8 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
     }
 
   if (devicep == NULL
-      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
     return;
 
   struct gomp_thread *thr = gomp_thread ();
@@ -1815,7 +1892,8 @@ gomp_target_task_fn (void *data)
       void *fn_addr;
       if (devicep == NULL
 	  || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
-	  || !(fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn)))
+	  || !(fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn))
+	  || (devicep->can_run_func && !devicep->can_run_func (fn_addr)))
 	{
 	  ttask->state = GOMP_TARGET_TASK_FALLBACK;
 	  gomp_target_fallback_firstprivate (ttask->fn, ttask->mapnum,
@@ -1826,22 +1904,36 @@ gomp_target_task_fn (void *data)
 
       if (ttask->state == GOMP_TARGET_TASK_FINISHED)
 	{
-	  gomp_unmap_vars (ttask->tgt, true);
+	  if (ttask->tgt)
+	    gomp_unmap_vars (ttask->tgt, true);
 	  return false;
 	}
 
-      ttask->tgt
-	= gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL,
-			 ttask->sizes, ttask->kinds, true,
-			 GOMP_MAP_VARS_TARGET);
+      void *actual_arguments;
+      if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+	{
+	  ttask->tgt = NULL;
+	  ttask->firstprivate_copies
+	    = gomp_target_unshare_firstprivate (ttask->mapnum, ttask->hostaddrs,
+						ttask->sizes, ttask->kinds);
+	  actual_arguments = ttask->hostaddrs;
+	}
+      else
+	{
+	  ttask->tgt = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs,
+				      NULL, ttask->sizes, ttask->kinds, true,
+				      GOMP_MAP_VARS_TARGET);
+	  actual_arguments = (void *) ttask->tgt->tgt_start;
+	}
       ttask->state = GOMP_TARGET_TASK_READY_TO_RUN;
 
-      devicep->async_run_func (devicep->target_id, fn_addr,
-			       (void *) ttask->tgt->tgt_start, (void *) ttask);
+      devicep->async_run_func (devicep->target_id, fn_addr, actual_arguments,
+			       ttask->args, (void *) ttask);
       return true;
     }
   else if (devicep == NULL
-	   || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+	   || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+	   || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
     return false;
 
   size_t i;
@@ -1891,7 +1983,8 @@ omp_target_alloc (size_t size, int device_num)
   if (devicep == NULL)
     return NULL;
 
-  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
     return malloc (size);
 
   gomp_mutex_lock (&devicep->lock);
@@ -1919,7 +2012,8 @@ omp_target_free (void *device_ptr, int device_num)
   if (devicep == NULL)
     return;
 
-  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
     {
       free (device_ptr);
       return;
@@ -1946,7 +2040,8 @@ omp_target_is_present (void *ptr, int device_num)
   if (devicep == NULL)
     return 0;
 
-  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
     return 1;
 
   gomp_mutex_lock (&devicep->lock);
@@ -1976,7 +2071,8 @@ omp_target_memcpy (void *dst, void *src, size_t length, size_t dst_offset,
       if (dst_devicep == NULL)
 	return EINVAL;
 
-      if (!(dst_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      if (!(dst_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+	  || dst_devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
 	dst_devicep = NULL;
     }
   if (src_device_num != GOMP_DEVICE_HOST_FALLBACK)
@@ -1988,7 +2084,8 @@ omp_target_memcpy (void *dst, void *src, size_t length, size_t dst_offset,
       if (src_devicep == NULL)
 	return EINVAL;
 
-      if (!(src_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      if (!(src_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+	  || src_devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
 	src_devicep = NULL;
     }
   if (src_devicep == NULL && dst_devicep == NULL)
@@ -2118,7 +2215,8 @@ omp_target_memcpy_rect (void *dst, void *src, size_t element_size,
       if (dst_devicep == NULL)
 	return EINVAL;
 
-      if (!(dst_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      if (!(dst_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+	  || dst_devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
 	dst_devicep = NULL;
     }
   if (src_device_num != GOMP_DEVICE_HOST_FALLBACK)
@@ -2130,7 +2228,8 @@ omp_target_memcpy_rect (void *dst, void *src, size_t element_size,
       if (src_devicep == NULL)
 	return EINVAL;
 
-      if (!(src_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      if (!(src_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+	  || src_devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
 	src_devicep = NULL;
     }
 
@@ -2166,7 +2265,8 @@ omp_target_associate_ptr (void *host_ptr, void *device_ptr, size_t size,
   if (devicep == NULL)
     return EINVAL;
 
-  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
     return EINVAL;
 
   gomp_mutex_lock (&devicep->lock);
@@ -2309,6 +2409,7 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
     {
       DLSYM (run);
       DLSYM (async_run);
+      DLSYM_OPT (can_run, can_run);
       DLSYM (dev2dev);
     }
   if (device->capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
diff --git a/libgomp/task.c b/libgomp/task.c
index de9936e..90bdb1c 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -581,6 +581,7 @@ GOMP_PLUGIN_target_task_completion (void *data)
       gomp_mutex_unlock (&team->task_lock);
     }
   ttask->state = GOMP_TARGET_TASK_FINISHED;
+  free (ttask->firstprivate_copies);
   gomp_target_task_completion (team, task);
   gomp_mutex_unlock (&team->task_lock);
 }
@@ -593,7 +594,7 @@ bool
 gomp_create_target_task (struct gomp_device_descr *devicep,
 			 void (*fn) (void *), size_t mapnum, void **hostaddrs,
 			 size_t *sizes, unsigned short *kinds,
-			 unsigned int flags, void **depend,
+			 unsigned int flags, void **depend, void **args,
 			 enum gomp_target_task_state state)
 {
   struct gomp_thread *thr = gomp_thread ();
@@ -653,6 +654,7 @@ gomp_create_target_task (struct gomp_device_descr *devicep,
   ttask->devicep = devicep;
   ttask->fn = fn;
   ttask->mapnum = mapnum;
+  ttask->args = args;
   memcpy (ttask->hostaddrs, hostaddrs, mapnum * sizeof (void *));
   ttask->sizes = (size_t *) &ttask->hostaddrs[mapnum];
   memcpy (ttask->sizes, sizes, mapnum * sizeof (size_t));
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 68f7b2c..58ef595 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -528,7 +528,7 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst_ptr, const void *src_ptr,
 
 extern "C" void
 GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
-			void *async_data)
+			void **, void *async_data)
 {
   TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p, async_data = %p)", device,
 	 tgt_fn, tgt_vars, async_data);
@@ -544,7 +544,7 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
 }
 
 extern "C" void
-GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars)
+GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars, void **)
 {
   TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p)", device, tgt_fn, tgt_vars);
 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [hsa merge 07/10] IPA-HSA pass
  2016-01-13 17:55 [hsa merge 00/10] Merge of HSA branch Martin Jambor
                   ` (3 preceding siblings ...)
  2016-01-13 17:55 ` [hsa merge 02/10] Modifications to libgomp proper Martin Jambor
@ 2016-01-13 17:55 ` Martin Jambor
  2016-01-14 12:59   ` Jakub Jelinek
  2016-01-15  9:52   ` Jan Hubicka
  2016-01-13 17:55 ` [hsa merge 03/10] HSA libgomp plugin Martin Jambor
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 77+ messages in thread
From: Martin Jambor @ 2016-01-13 17:55 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jan Hubicka, Martin Liska

[-- Attachment #1: ipa-hsa.diff --]
[-- Type: text/plain, Size: 13936 bytes --]

Hi,

this patch contains IPA-related changes that we need to bring about
for HSA.

The patch is a re-post of
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00720.html but so far we
have not received any feedback.  Let me quote the original
accompanying email here for reference:

When a target construct is gridified, the HSA GPU function is
associated with the CPU function throughout the compilation, so that
they can be registered as a pair in libgomp.

Ungridified target constructs and, more importantly, "pragma omp
declare target" marked functions emerge out of OMP expansion as one
gimple function for both the host and the accelerator. However, at
some point we need to create a special HSA function representation so
that we can modify behavior of a (very) few optimization passes for
them.

Both is done by the following new IPA pass, which creates new HSA
clones in these cases.  Moreover, it redirects the appropriate call
graph edges to be in between HSA implementations, marks HSA clones
with the flatten attribute to minimize any call overhead (which is
much more significant on GPUs) and makes sure both the CPU and GPU
functions are coupled together and remain in the same LTO partition so
that they can b registered together to libgomp.

Thanks,

Martin


2016-01-13  Martin Liska  <mliska@suse.cz>
	    Martin Jambor  <mjambor@suse.cz>

	* ipa-hsa.c: New file.
	* lto-section-in.c (lto_section_name): Add hsa section name.
	* lto-streamer.h (lto_section_type): Add hsa section.
	* lto-partition.c: Include "hsa.h"
	(add_symbol_to_partition_1): Put hsa implementations into the
	same partition as host implementations.
	* timevar.def (TV_IPA_HSA): New.

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
new file mode 100644
index 0000000..dd47995
--- /dev/null
+++ b/gcc/ipa-hsa.c
@@ -0,0 +1,329 @@
+/* Callgraph based analysis of static variables.
+   Copyright (C) 2015-2016 Free Software Foundation, Inc.
+   Contributed by Martin Liska <mliska@suse.cz>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Interprocedural HSA pass is responsible for creation of HSA clones.
+   For all these HSA clones, we emit HSAIL instructions and pass processing
+   is terminated.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "hash-set.h"
+#include "vec.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "dumpfile.h"
+#include "gimple-pretty-print.h"
+#include "tree-streamer.h"
+#include "stringpool.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+namespace {
+
+/* If NODE is not versionable, warn about not emiting HSAIL and return false.
+   Otherwise return true.  */
+
+static bool
+check_warn_node_versionable (cgraph_node *node)
+{
+  if (!node->local.versionable)
+    {
+      warning_at (EXPR_LOCATION (node->decl), OPT_Whsa,
+		  "could not emit HSAIL for function %s: function cannot be "
+		  "cloned", node->name ());
+      return false;
+    }
+  return true;
+}
+
+/* The function creates HSA clones for all functions that were either
+   marked as HSA kernels or are callable HSA functions.  Apart from that,
+   we redirect all edges that come from an HSA clone and end in another
+   HSA clone to connect these two functions.  */
+
+static unsigned int
+process_hsa_functions (void)
+{
+  struct cgraph_node *node;
+
+  if (hsa_summaries == NULL)
+    hsa_summaries = new hsa_summary_t (symtab);
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      /* A linked function is skipped.  */
+      if (s->m_binded_function != NULL)
+	continue;
+
+      if (s->m_kind != HSA_NONE)
+	{
+	  if (!check_warn_node_versionable (node))
+	    continue;
+	  cgraph_node *clone = node->create_virtual_clone
+	    (vec <cgraph_edge *> (), NULL, NULL, "hsa");
+	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+
+	  clone->force_output = true;
+	  hsa_summaries->link_functions (clone, node, s->m_kind, false);
+
+	  if (dump_file)
+	    fprintf (dump_file, "Created a new HSA clone: %s, type: %s\n",
+		     clone->name (),
+		     s->m_kind == HSA_KERNEL ? "kernel" : "function");
+	}
+      else if (hsa_callable_function_p (node->decl))
+	{
+	  if (!check_warn_node_versionable (node))
+	    continue;
+	  cgraph_node *clone = node->create_virtual_clone
+	    (vec <cgraph_edge *> (), NULL, NULL, "hsa");
+	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+
+	  if (!cgraph_local_p (node))
+	    clone->force_output = true;
+	  hsa_summaries->link_functions (clone, node, HSA_FUNCTION, false);
+
+	  if (dump_file)
+	    fprintf (dump_file, "Created a new HSA function clone: %s\n",
+		     clone->name ());
+	}
+    }
+
+  /* Redirect all edges that are between HSA clones.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      cgraph_edge *e = node->callees;
+
+      while (e)
+	{
+	  hsa_function_summary *src = hsa_summaries->get (node);
+	  if (src->m_kind != HSA_NONE && src->m_gpu_implementation_p)
+	    {
+	      hsa_function_summary *dst = hsa_summaries->get (e->callee);
+	      if (dst->m_kind != HSA_NONE && !dst->m_gpu_implementation_p)
+		{
+		  e->redirect_callee (dst->m_binded_function);
+		  if (dump_file)
+		    fprintf (dump_file,
+			     "Redirecting edge to HSA function: %s->%s\n",
+			     xstrdup_for_dump (e->caller->name ()),
+			     xstrdup_for_dump (e->callee->name ()));
+		}
+	    }
+
+	  e = e->next_callee;
+	}
+    }
+
+  return 0;
+}
+
+/* Iterate all HSA functions and stream out HSA function summary.  */
+
+static void
+ipa_hsa_write_summary (void)
+{
+  struct bitpack_d bp;
+  struct cgraph_node *node;
+  struct output_block *ob;
+  unsigned int count = 0;
+  lto_symtab_encoder_iterator lsei;
+  lto_symtab_encoder_t encoder;
+
+  if (!hsa_summaries)
+    return;
+
+  ob = create_output_block (LTO_section_ipa_hsa);
+  encoder = ob->decl_state->symtab_node_encoder;
+  ob->symbol = NULL;
+  for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
+       lsei_next_function_in_partition (&lsei))
+    {
+      node = lsei_cgraph_node (lsei);
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      if (s->m_kind != HSA_NONE)
+	count++;
+    }
+
+  streamer_write_uhwi (ob, count);
+
+  /* Process all of the functions.  */
+  for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
+       lsei_next_function_in_partition (&lsei))
+    {
+      node = lsei_cgraph_node (lsei);
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      if (s->m_kind != HSA_NONE)
+	{
+	  encoder = ob->decl_state->symtab_node_encoder;
+	  int node_ref = lto_symtab_encoder_encode (encoder, node);
+	  streamer_write_uhwi (ob, node_ref);
+
+	  bp = bitpack_create (ob->main_stream);
+	  bp_pack_value (&bp, s->m_kind, 2);
+	  bp_pack_value (&bp, s->m_gpu_implementation_p, 1);
+	  bp_pack_value (&bp, s->m_binded_function != NULL, 1);
+	  streamer_write_bitpack (&bp);
+	  if (s->m_binded_function)
+	    stream_write_tree (ob, s->m_binded_function->decl, true);
+	}
+    }
+
+  streamer_write_char_stream (ob->main_stream, 0);
+  produce_asm (ob, NULL);
+  destroy_output_block (ob);
+}
+
+/* Read section in file FILE_DATA of length LEN with data DATA.  */
+
+static void
+ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
+		       size_t len)
+{
+  const struct lto_function_header *header =
+    (const struct lto_function_header *) data;
+  const int cfg_offset = sizeof (struct lto_function_header);
+  const int main_offset = cfg_offset + header->cfg_size;
+  const int string_offset = main_offset + header->main_size;
+  struct data_in *data_in;
+  unsigned int i;
+  unsigned int count;
+
+  lto_input_block ib_main ((const char *) data + main_offset,
+			   header->main_size, file_data->mode_table);
+
+  data_in =
+    lto_data_in_create (file_data, (const char *) data + string_offset,
+			header->string_size, vNULL);
+  count = streamer_read_uhwi (&ib_main);
+
+  for (i = 0; i < count; i++)
+    {
+      unsigned int index;
+      struct cgraph_node *node;
+      lto_symtab_encoder_t encoder;
+
+      index = streamer_read_uhwi (&ib_main);
+      encoder = file_data->symtab_node_encoder;
+      node = dyn_cast<cgraph_node *> (lto_symtab_encoder_deref (encoder,
+								index));
+      gcc_assert (node->definition);
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      struct bitpack_d bp = streamer_read_bitpack (&ib_main);
+      s->m_kind = (hsa_function_kind) bp_unpack_value (&bp, 2);
+      s->m_gpu_implementation_p = bp_unpack_value (&bp, 1);
+      bool has_tree = bp_unpack_value (&bp, 1);
+
+      if (has_tree)
+	{
+	  tree decl = stream_read_tree (&ib_main, data_in);
+	  s->m_binded_function = cgraph_node::get_create (decl);
+	}
+    }
+  lto_free_section_data (file_data, LTO_section_ipa_hsa, NULL, data,
+			 len);
+  lto_data_in_delete (data_in);
+}
+
+/* Load streamed HSA functions summary and assign the summary to a function.  */
+
+static void
+ipa_hsa_read_summary (void)
+{
+  struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data ();
+  struct lto_file_decl_data *file_data;
+  unsigned int j = 0;
+
+  if (hsa_summaries == NULL)
+    hsa_summaries = new hsa_summary_t (symtab);
+
+  while ((file_data = file_data_vec[j++]))
+    {
+      size_t len;
+      const char *data = lto_get_section_data (file_data, LTO_section_ipa_hsa,
+					       NULL, &len);
+
+      if (data)
+	ipa_hsa_read_section (file_data, data, len);
+    }
+}
+
+const pass_data pass_data_ipa_hsa =
+{
+  IPA_PASS, /* type */
+  "hsa", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_IPA_HSA, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_dump_symtab, /* todo_flags_finish */
+};
+
+class pass_ipa_hsa : public ipa_opt_pass_d
+{
+public:
+  pass_ipa_hsa (gcc::context *ctxt)
+    : ipa_opt_pass_d (pass_data_ipa_hsa, ctxt,
+		      NULL, /* generate_summary */
+		      ipa_hsa_write_summary, /* write_summary */
+		      ipa_hsa_read_summary, /* read_summary */
+		      ipa_hsa_write_summary, /* write_optimization_summary */
+		      ipa_hsa_read_summary, /* read_optimization_summary */
+		      NULL, /* stmt_fixup */
+		      0, /* function_transform_todo_flags_start */
+		      NULL, /* function_transform */
+		      NULL) /* variable_transform */
+    {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+
+  virtual unsigned int execute (function *) { return process_hsa_functions (); }
+
+}; // class pass_ipa_reference
+
+bool
+pass_ipa_hsa::gate (function *)
+{
+  return hsa_gen_requested_p () || in_lto_p;
+}
+
+} // anon namespace
+
+ipa_opt_pass_d *
+make_pass_ipa_hsa (gcc::context *ctxt)
+{
+  return new pass_ipa_hsa (ctxt);
+}
diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index 972f062..93b82be 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -51,7 +51,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "ipcp_trans",
   "icf",
   "offload_table",
-  "mode_table"
+  "mode_table",
+  "hsa"
 };
 
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 42654f5..0cb200e 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -244,6 +244,7 @@ enum lto_section_type
   LTO_section_ipa_icf,
   LTO_section_offload_table,
   LTO_section_mode_table,
+  LTO_section_ipa_hsa,
   LTO_N_SECTION_TYPES		/* Must be last.  */
 };
 
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 81a63a5..0a56170 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "ipa-inline.h"
 #include "lto-partition.h"
+#include "hsa.h"
 
 vec<ltrans_partition> ltrans_partitions;
 
@@ -170,6 +171,24 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
 	 Therefore put it into the same partition.  */
       if (cnode->instrumented_version)
 	add_symbol_to_partition_1 (part, cnode->instrumented_version);
+
+      /* Add an HSA associated with the symbol.  */
+      if (hsa_summaries != NULL)
+	{
+	  hsa_function_summary *s = hsa_summaries->get (cnode);
+	  if (s->m_kind == HSA_KERNEL)
+	    {
+	      /* Add binded function.  */
+	      bool added = add_symbol_to_partition_1 (part,
+						      s->m_binded_function);
+	      gcc_assert (added);
+	      if (symtab->dump_file)
+		fprintf (symtab->dump_file,
+			 "adding an HSA function (host/gpu) to the "
+			 "partition: %s\n",
+			 s->m_binded_function->name ());
+	    }
+	}
     }
 
   add_references_to_partition (part, node);
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 2765179..d9a5066 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -97,6 +97,7 @@ DEFTIMEVAR (TV_WHOPR_WPA_IO          , "whopr wpa I/O")
 DEFTIMEVAR (TV_WHOPR_PARTITIONING    , "whopr partitioning")
 DEFTIMEVAR (TV_WHOPR_LTRANS          , "whopr ltrans")
 DEFTIMEVAR (TV_IPA_REFERENCE         , "ipa reference")
+DEFTIMEVAR (TV_IPA_HSA		     , "ipa HSA")
 DEFTIMEVAR (TV_IPA_PROFILE           , "ipa profile")
 DEFTIMEVAR (TV_IPA_AUTOFDO           , "auto profile")
 DEFTIMEVAR (TV_IPA_PURE_CONST        , "ipa pure const")

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [hsa merge 06/10] Pass manager changes
  2016-01-13 17:55 [hsa merge 00/10] Merge of HSA branch Martin Jambor
                   ` (5 preceding siblings ...)
  2016-01-13 17:55 ` [hsa merge 03/10] HSA libgomp plugin Martin Jambor
@ 2016-01-13 17:55 ` Martin Jambor
  2016-01-14 12:52   ` Jakub Jelinek
  2016-01-13 18:11 ` [hsa merge 09/10] Majority of the HSA back-end Martin Jambor
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-13 17:55 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: pass-manager.diff --]
[-- Type: text/plain, Size: 2314 bytes --]

Hi,

the pass manager changes required for HSA have already been committed
to trunk so all that remains are these additions to the pass pipeline.

This bit has already been approved by Richi in
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00996.html

Thanks,

Martin


2016-01-13  Martin Jambor  <mjambor@suse.cz>
	    Martin Liska  <mliska@suse.cz>

	    * passes.def: Schedule pass_ipa_hsa and pass_gen_hsail.
	    * tree-pass.h (make_pass_gen_hsail): Declare.
	    (make_pass_ipa_hsa): Likewise.

diff --git a/gcc/passes.def b/gcc/passes.def
index c593851..a6a1719 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -151,6 +151,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_cdtor_merge);
   NEXT_PASS (pass_target_clone);
+  NEXT_PASS (pass_ipa_hsa);
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
   NEXT_PASS (pass_ipa_reference);
@@ -388,6 +389,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_nrv);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
+  NEXT_PASS (pass_gen_hsail);
 
   NEXT_PASS (pass_expand);
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index e8e8e48..b942a01 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -471,6 +471,7 @@ extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_oacc (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_oacc_kernels (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_gen_hsail (gcc::context *ctxt);
 
 /* IPA Passes */
 extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt);
@@ -495,6 +496,7 @@ extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_icf (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
+extern ipa_opt_pass_d *make_pass_ipa_hsa (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [hsa merge 03/10] HSA libgomp plugin
  2016-01-13 17:55 [hsa merge 00/10] Merge of HSA branch Martin Jambor
                   ` (4 preceding siblings ...)
  2016-01-13 17:55 ` [hsa merge 07/10] IPA-HSA pass Martin Jambor
@ 2016-01-13 17:55 ` Martin Jambor
  2016-01-14 12:37   ` Jakub Jelinek
  2016-01-13 17:55 ` [hsa merge 06/10] Pass manager changes Martin Jambor
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-13 17:55 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jakub Jelinek, Martin Liska

[-- Attachment #1: libgomp-plugin.diff --]
[-- Type: text/plain, Size: 49328 bytes --]

Hi,

the patch below adds the HSA-specific plugin for libgomp.  The plugin
implements the interface mandated by libgomp and takes care of finding
any available HSA devices, finalizing HSAIL code and running it on
HSA-capable GPUs.

This patch is a re-post of
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00716.html with a number
of modifications requested by Jakub.

Thanks,

Martin


2016-01-13  Martin Jambor  <mjambor@suse.cz>
	    Martin Liska  <mliska@suse.cz>

	* plugin/plugin-hsa.c: New file.

diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
new file mode 100644
index 0000000..d888493
--- /dev/null
+++ b/libgomp/plugin/plugin-hsa.c
@@ -0,0 +1,1493 @@
+/* Plugin for HSAIL execution.
+
+   Copyright (C) 2013-2016 Free Software Foundation, Inc.
+
+   Contributed by Martin Jambor <mjambor@suse.cz> and
+   Martin Liska <mliska@suse.cz>.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+#include <hsa.h>
+#include <hsa_ext_finalize.h>
+#include <dlfcn.h>
+#include "libgomp-plugin.h"
+#include "gomp-constants.h"
+
+/* Keep the following GOMP prefixed structures in sync with respective parts of
+   the compiler.  */
+
+/* Structure describing the run-time and grid properties of an HSA kernel
+   lauch.  */
+
+struct GOMP_kernel_launch_attributes
+{
+  /* Number of dimensions the workload has.  Maximum number is 3.  */
+  uint32_t ndim;
+  /* Size of the grid in the three respective dimensions.  */
+  uint32_t gdims[3];
+  /* Size of work-groups in the respective dimensions.  */
+  uint32_t wdims[3];
+};
+
+/* Collection of information needed for a dispatch of a kernel from a
+   kernel.  */
+
+struct GOMP_hsa_kernel_dispatch
+{
+  /* Pointer to a command queue associated with a kernel dispatch agent.  */
+  void *queue;
+  /* Pointer to reserved memory for OMP data struct copying.  */
+  void *omp_data_memory;
+  /* Pointer to a memory space used for kernel arguments passing.  */
+  void *kernarg_address;
+  /* Kernel object.  */
+  uint64_t object;
+  /* Synchronization signal used for dispatch synchronization.  */
+  uint64_t signal;
+  /* Private segment size.  */
+  uint32_t private_segment_size;
+  /* Group segment size.  */
+  uint32_t group_segment_size;
+  /* Number of children kernel dispatches.  */
+  uint64_t kernel_dispatch_count;
+  /* Debug purpose argument.  */
+  uint64_t debug;
+  /* Levels-var ICV.  */
+  uint64_t omp_level;
+  /* Kernel dispatch structures created for children kernel dispatches.  */
+  struct GOMP_hsa_kernel_dispatch **children_dispatches;
+  /* Number of threads.  */
+  uint32_t omp_num_threads;
+};
+
+/* Part of the libgomp plugin interface.  Return the name of the accelerator,
+   which is "hsa".  */
+
+const char *
+GOMP_OFFLOAD_get_name (void)
+{
+  return "hsa";
+}
+
+/* Part of the libgomp plugin interface.  Return the specific capabilities the
+   HSA accelerator have.  */
+
+unsigned int
+GOMP_OFFLOAD_get_caps (void)
+{
+  return GOMP_OFFLOAD_CAP_SHARED_MEM | GOMP_OFFLOAD_CAP_OPENMP_400;
+}
+
+/* Part of the libgomp plugin interface.  Identify as HSA accelerator.  */
+
+int
+GOMP_OFFLOAD_get_type (void)
+{
+  return OFFLOAD_TARGET_TYPE_HSA;
+}
+
+/* Return the libgomp version number we're compatible with.  There is
+   no requirement for cross-version compatibility.  */
+
+unsigned
+GOMP_OFFLOAD_version (void)
+{
+  return GOMP_VERSION;
+}
+
+/* Flag to decide whether print to stderr information about what is going on.
+   Set in init_debug depending on environment variables.  */
+
+static bool debug;
+
+/* Flag to decide if the runtime should suppress a possible fallback to host
+   execution.  */
+
+static bool suppress_host_fallback;
+
+/* Initialize debug and suppress_host_fallback according to the environment.  */
+
+static void
+init_enviroment_variables (void)
+{
+  if (getenv ("HSA_DEBUG"))
+    debug = true;
+  else
+    debug = false;
+
+  if (getenv ("HSA_SUPPRESS_HOST_FALLBACK"))
+    suppress_host_fallback = true;
+  else
+    suppress_host_fallback = false;
+}
+
+/* Print a logging message with PREFIX to stderr if HSA_DEBUG value
+   is set to true.  */
+
+#define HSA_LOG(prefix, ...) \
+  do \
+  { \
+    if (debug) \
+      { \
+	fprintf (stderr, prefix); \
+	fprintf (stderr, __VA_ARGS__); \
+      } \
+  } \
+  while (false);
+
+/* Print a debugging message to stderr.  */
+
+#define HSA_DEBUG(...) HSA_LOG ("HSA debug: ", __VA_ARGS__)
+
+/* Print a warning message to stderr.  */
+
+#define HSA_WARNING(...) HSA_LOG ("HSA warning: ", __VA_ARGS__)
+
+/* Print HSA warning STR with an HSA STATUS code.  */
+
+static void
+hsa_warn (const char *str, hsa_status_t status)
+{
+  if (!debug)
+    return;
+
+  const char *hsa_error;
+  hsa_status_string (status, &hsa_error);
+
+  fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, hsa_error);
+}
+
+/* Report a fatal error STR together with the HSA error corresponding to STATUS
+   and terminate execution of the current process.  */
+
+static void
+hsa_fatal (const char *str, hsa_status_t status)
+{
+  const char *hsa_error;
+  hsa_status_string (status, &hsa_error);
+  GOMP_PLUGIN_fatal ("HSA fatal error: %s\nRuntime message: %s", str,
+		     hsa_error);
+}
+
+struct hsa_kernel_description
+{
+  const char *name;
+  unsigned omp_data_size;
+  bool gridified_kernel_p;
+  unsigned kernel_dependencies_count;
+  const char **kernel_dependencies;
+};
+
+struct global_var_info
+{
+  const char *name;
+  void *address;
+};
+
+/* Data passed by the static initializer of a compilation unit containing BRIG
+   to GOMP_offload_register.  */
+
+struct brig_image_desc
+{
+  hsa_ext_module_t brig_module;
+  const unsigned kernel_count;
+  struct hsa_kernel_description *kernel_infos;
+  const unsigned global_variable_count;
+  struct global_var_info *global_variables;
+};
+
+struct agent_info;
+
+/* Information required to identify, finalize and run any given kernel.  */
+
+struct kernel_info
+{
+  /* Name of the kernel, required to locate it within the brig module.  */
+  const char *name;
+  /* Size of memory space for OMP data.  */
+  unsigned omp_data_size;
+  /* The specific agent the kernel has been or will be finalized for and run
+     on.  */
+  struct agent_info *agent;
+  /* The specific module where the kernel takes place.  */
+  struct module_info *module;
+  /* Mutex enforcing that at most once thread ever initializes a kernel for
+     use.  A thread should have locked agent->modules_rwlock for reading before
+     acquiring it.  */
+  pthread_mutex_t init_mutex;
+  /* Flag indicating whether the kernel has been initialized and all fields
+     below it contain valid data.  */
+  bool initialized;
+  /* Flag indicating that the kernel has a problem that blocks an execution.  */
+  bool initialization_failed;
+  /* The object to be put into the dispatch queue.  */
+  uint64_t object;
+  /* Required size of kernel arguments.  */
+  uint32_t kernarg_segment_size;
+  /* Required size of group segment.  */
+  uint32_t group_segment_size;
+  /* Required size of private segment.  */
+  uint32_t private_segment_size;
+  /* List of all kernel dependencies.  */
+  const char **dependencies;
+  /* Number of dependencies.  */
+  unsigned dependencies_count;
+  /* Maximum OMP data size necessary for kernel from kernel dispatches.  */
+  unsigned max_omp_data_size;
+  /* True if the kernel is gridified.  */
+  bool gridified_kernel_p;
+};
+
+/* Information about a particular brig module, its image and kernels.  */
+
+struct module_info
+{
+  /* The next and previous module in the linked list of modules of an agent.  */
+  struct module_info *next, *prev;
+  /* The description with which the program has registered the image.  */
+  struct brig_image_desc *image_desc;
+
+  /* Number of kernels in this module.  */
+  int kernel_count;
+  /* An array of kernel_info structures describing each kernel in this
+     module.  */
+  struct kernel_info kernels[];
+};
+
+/* Information about shared brig library.  */
+
+struct brig_library_info
+{
+  char *file_name;
+  hsa_ext_module_t image;
+};
+
+/* Description of an HSA GPU agent and the program associated with it.  */
+
+struct agent_info
+{
+  /* The HSA ID of the agent.  Assigned when hsa_context is initialized.  */
+  hsa_agent_t id;
+  /* Whether the agent has been initialized.  The fields below are usable only
+     if it has been.  */
+  bool initialized;
+  /* The HSA ISA of this agent.  */
+  hsa_isa_t isa;
+  /* Command queue of the agent.  */
+  hsa_queue_t *command_q;
+  /* Kernel from kernel dispatch command queue.  */
+  hsa_queue_t *kernel_dispatch_command_q;
+  /* The HSA memory region from which to allocate kernel arguments.  */
+  hsa_region_t kernarg_region;
+
+  /* Read-write lock that protects kernels which are running or about to be run
+     from interference with loading and unloading of images.  Needs to be
+     locked for reading while a kernel is being run, and for writing if the
+     list of modules is manipulated (and thus the HSA program invalidated).  */
+  pthread_rwlock_t modules_rwlock;
+  /* The first module in a linked list of modules associated with this
+     kernel.  */
+  struct module_info *first_module;
+
+  /* Mutex enforcing that only one thread will finalize the HSA program.  A
+     thread should have locked agent->modules_rwlock for reading before
+     acquiring it.  */
+  pthread_mutex_t prog_mutex;
+  /* Flag whether the HSA program that consists of all the modules has been
+     finalized.  */
+  bool prog_finalized;
+  /* Flag whether the program was finalized but with a failure.  */
+  bool prog_finalized_error;
+  /* HSA executable - the finalized program that is used to locate kernels.  */
+  hsa_executable_t executable;
+  /* List of BRIG libraries.  */
+  struct brig_library_info **brig_libraries;
+  /* Number of loaded shared BRIG libraries.  */
+  unsigned brig_libraries_count;
+};
+
+/* Information about the whole HSA environment and all of its agents.  */
+
+struct hsa_context_info
+{
+  /* Whether the structure has been initialized.  */
+  bool initialized;
+  /* Number of usable GPU HSA agents in the system.  */
+  int agent_count;
+  /* Array of agent_info structures describing the individual HSA agents.  */
+  struct agent_info *agents;
+};
+
+/* Information about the whole HSA environment and all of its agents.  */
+
+static struct hsa_context_info hsa_context;
+
+/* Find kernel for an AGENT by name provided in KERNEL_NAME.  */
+
+static struct kernel_info *
+get_kernel_for_agent (struct agent_info *agent, const char *kernel_name)
+{
+  struct module_info *module = agent->first_module;
+
+  while (module)
+    {
+      for (unsigned i = 0; i < module->kernel_count; i++)
+	if (strcmp (module->kernels[i].name, kernel_name) == 0)
+	  return &module->kernels[i];
+
+      module = module->next;
+    }
+
+  return NULL;
+}
+
+/* Return true if the agent is a GPU and acceptable of concurrent submissions
+   from different threads.  */
+
+static bool
+suitable_hsa_agent_p (hsa_agent_t agent)
+{
+  hsa_device_type_t device_type;
+  hsa_status_t status
+    = hsa_agent_get_info (agent, HSA_AGENT_INFO_DEVICE, &device_type);
+  if (status != HSA_STATUS_SUCCESS || device_type != HSA_DEVICE_TYPE_GPU)
+    return false;
+
+  uint32_t features = 0;
+  status = hsa_agent_get_info (agent, HSA_AGENT_INFO_FEATURE, &features);
+  if (status != HSA_STATUS_SUCCESS
+      || !(features & HSA_AGENT_FEATURE_KERNEL_DISPATCH))
+    return false;
+  hsa_queue_type_t queue_type;
+  status = hsa_agent_get_info (agent, HSA_AGENT_INFO_QUEUE_TYPE, &queue_type);
+  if (status != HSA_STATUS_SUCCESS
+      || (queue_type != HSA_QUEUE_TYPE_MULTI))
+    return false;
+
+  return true;
+}
+
+/* Callback of hsa_iterate_agents, if AGENT is a GPU device, increment
+   agent_count in hsa_context.  */
+
+static hsa_status_t
+count_gpu_agents (hsa_agent_t agent, void *data __attribute__ ((unused)))
+{
+  if (suitable_hsa_agent_p (agent))
+    hsa_context.agent_count++;
+  return HSA_STATUS_SUCCESS;
+}
+
+/* Callback of hsa_iterate_agents, if AGENT is a GPU device, assign the agent
+   id to the describing structure in the hsa context.  The index of the
+   structure is pointed to by DATA, increment it afterwards.  */
+
+static hsa_status_t
+assign_agent_ids (hsa_agent_t agent, void *data)
+{
+  if (suitable_hsa_agent_p (agent))
+    {
+      int *agent_index = (int *) data;
+      hsa_context.agents[*agent_index].id = agent;
+      ++*agent_index;
+    }
+  return HSA_STATUS_SUCCESS;
+}
+
+/* Initialize hsa_context if it has not already been done.  */
+
+static void
+init_hsa_context (void)
+{
+  hsa_status_t status;
+  int agent_index = 0;
+
+  if (hsa_context.initialized)
+    return;
+  init_enviroment_variables ();
+  status = hsa_init ();
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Run-time could not be initialized", status);
+  HSA_DEBUG ("HSA run-time initialized\n");
+  status = hsa_iterate_agents (count_gpu_agents, NULL);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("HSA GPU devices could not be enumerated", status);
+  HSA_DEBUG ("There are %i HSA GPU devices.\n", hsa_context.agent_count);
+
+  hsa_context.agents
+    = GOMP_PLUGIN_malloc_cleared (hsa_context.agent_count
+				  * sizeof (struct agent_info));
+  status = hsa_iterate_agents (assign_agent_ids, &agent_index);
+  if (agent_index != hsa_context.agent_count)
+    GOMP_PLUGIN_fatal ("Failed to assign IDs to all HSA agents");
+  hsa_context.initialized = true;
+}
+
+/* Callback of dispatch queues to report errors.  */
+
+static void
+queue_callback (hsa_status_t status,
+		hsa_queue_t *queue __attribute__ ((unused)),
+		void *data __attribute__ ((unused)))
+{
+  hsa_fatal ("Asynchronous queue error", status);
+}
+
+/* Callback of hsa_agent_iterate_regions.  Determine if a memory REGION can be
+   used for kernarg allocations and if so write it to the memory pointed to by
+   DATA and break the query.  */
+
+static hsa_status_t
+get_kernarg_memory_region (hsa_region_t region, void *data)
+{
+  hsa_status_t status;
+  hsa_region_segment_t segment;
+
+  status = hsa_region_get_info (region, HSA_REGION_INFO_SEGMENT, &segment);
+  if (status != HSA_STATUS_SUCCESS)
+    return status;
+  if (segment != HSA_REGION_SEGMENT_GLOBAL)
+    return HSA_STATUS_SUCCESS;
+
+  uint32_t flags;
+  status = hsa_region_get_info (region, HSA_REGION_INFO_GLOBAL_FLAGS, &flags);
+  if (status != HSA_STATUS_SUCCESS)
+    return status;
+  if (flags & HSA_REGION_GLOBAL_FLAG_KERNARG)
+    {
+      hsa_region_t *ret = (hsa_region_t *) data;
+      *ret = region;
+      return HSA_STATUS_INFO_BREAK;
+    }
+  return HSA_STATUS_SUCCESS;
+}
+
+/* Part of the libgomp plugin interface.  Return the number of HSA devices on
+   the system.  */
+
+int
+GOMP_OFFLOAD_get_num_devices (void)
+{
+  init_hsa_context ();
+  return hsa_context.agent_count;
+}
+
+/* Part of the libgomp plugin interface.  Initialize agent number N so that it
+   can be used for computation.  */
+
+void
+GOMP_OFFLOAD_init_device (int n)
+{
+  init_hsa_context ();
+  if (n >= hsa_context.agent_count)
+    GOMP_PLUGIN_fatal ("Request to initialize non-existing HSA device %i", n);
+  struct agent_info *agent = &hsa_context.agents[n];
+
+  if (agent->initialized)
+    return;
+
+  if (pthread_rwlock_init (&agent->modules_rwlock, NULL))
+    GOMP_PLUGIN_fatal ("Failed to initialize an HSA agent rwlock");
+  if (pthread_mutex_init (&agent->prog_mutex, NULL))
+    GOMP_PLUGIN_fatal ("Failed to initialize an HSA agent program mutex");
+
+  uint32_t queue_size;
+  hsa_status_t status;
+  status = hsa_agent_get_info (agent->id, HSA_AGENT_INFO_QUEUE_MAX_SIZE,
+			       &queue_size);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Error requesting maximum queue size of the HSA agent", status);
+  status = hsa_agent_get_info (agent->id, HSA_AGENT_INFO_ISA, &agent->isa);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Error querying the ISA of the agent", status);
+  status = hsa_queue_create (agent->id, queue_size, HSA_QUEUE_TYPE_MULTI,
+			     queue_callback, NULL, UINT32_MAX, UINT32_MAX,
+			     &agent->command_q);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Error creating command queue", status);
+
+  status = hsa_queue_create (agent->id, queue_size, HSA_QUEUE_TYPE_MULTI,
+			     queue_callback, NULL, UINT32_MAX, UINT32_MAX,
+			     &agent->kernel_dispatch_command_q);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Error creating kernel dispatch command queue", status);
+
+  agent->kernarg_region.handle = (uint64_t) -1;
+  status = hsa_agent_iterate_regions (agent->id, get_kernarg_memory_region,
+				      &agent->kernarg_region);
+  if (agent->kernarg_region.handle == (uint64_t) -1)
+    GOMP_PLUGIN_fatal ("Could not find suitable memory region for kernel "
+		       "arguments");
+  HSA_DEBUG ("HSA agent initialized, queue has id %llu\n",
+	     (long long unsigned) agent->command_q->id);
+  HSA_DEBUG ("HSA agent initialized, kernel dispatch queue has id %llu\n",
+	     (long long unsigned) agent->kernel_dispatch_command_q->id);
+  agent->initialized = true;
+}
+
+/* Verify that hsa_context has already been initialized and return the
+   agent_info structure describing device number N.  */
+
+static struct agent_info *
+get_agent_info (int n)
+{
+  if (!hsa_context.initialized)
+    GOMP_PLUGIN_fatal ("Attempt to use uninitialized HSA context.");
+  if (n >= hsa_context.agent_count)
+    GOMP_PLUGIN_fatal ("Request to operate on anon-existing HSA device %i", n);
+  if (!hsa_context.agents[n].initialized)
+    GOMP_PLUGIN_fatal ("Attempt to use an uninitialized HSA agent.");
+  return &hsa_context.agents[n];
+}
+
+/* Insert MODULE to the linked list of modules of AGENT.  */
+
+static void
+add_module_to_agent (struct agent_info *agent, struct module_info *module)
+{
+  if (agent->first_module)
+    agent->first_module->prev = module;
+  module->next = agent->first_module;
+  module->prev = NULL;
+  agent->first_module = module;
+}
+
+/* Remove MODULE from the linked list of modules of AGENT.  */
+
+static void
+remove_module_from_agent (struct agent_info *agent, struct module_info *module)
+{
+  if (agent->first_module == module)
+    agent->first_module = module->next;
+  if (module->prev)
+    module->prev->next = module->next;
+  if (module->next)
+    module->next->prev = module->prev;
+}
+
+/* Free the HSA program in agent and everything associated with it and set
+   agent->prog_finalized and the initialized flags of all kernels to false.  */
+
+static void
+destroy_hsa_program (struct agent_info *agent)
+{
+  if (!agent->prog_finalized || agent->prog_finalized_error)
+    return;
+
+  hsa_status_t status;
+
+  HSA_DEBUG ("Destroying the current HSA program.\n");
+
+  status = hsa_executable_destroy (agent->executable);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Could not destroy HSA executable", status);
+
+  struct module_info *module;
+  for (module = agent->first_module; module; module = module->next)
+    {
+      int i;
+      for (i = 0; i < module->kernel_count; i++)
+	module->kernels[i].initialized = false;
+    }
+  agent->prog_finalized = false;
+}
+
+/* Part of the libgomp plugin interface.  Load BRIG module described by struct
+   brig_image_desc in TARGET_DATA and return references to kernel descriptors
+   in TARGET_TABLE.  */
+
+int
+GOMP_OFFLOAD_load_image (int ord, unsigned version, void *target_data,
+			 struct addr_pair **target_table)
+{
+  if (GOMP_VERSION_DEV (version) > GOMP_VERSION_HSA)
+    GOMP_PLUGIN_fatal ("Offload data incompatible with HSA plugin"
+		       " (expected %u, received %u)",
+		       GOMP_VERSION_HSA, GOMP_VERSION_DEV (version));
+
+  struct brig_image_desc *image_desc = (struct brig_image_desc *) target_data;
+  struct agent_info *agent;
+  struct addr_pair *pair;
+  struct module_info *module;
+  struct kernel_info *kernel;
+  int kernel_count = image_desc->kernel_count;
+
+  agent = get_agent_info (ord);
+  if (pthread_rwlock_wrlock (&agent->modules_rwlock))
+    GOMP_PLUGIN_fatal ("Unable to write-lock an HSA agent rwlock");
+  if (agent->prog_finalized)
+    destroy_hsa_program (agent);
+
+  HSA_DEBUG ("Encountered %d kernels in an image\n", kernel_count);
+  pair = GOMP_PLUGIN_malloc (kernel_count * sizeof (struct addr_pair));
+  *target_table = pair;
+  module = (struct module_info *)
+    GOMP_PLUGIN_malloc_cleared (sizeof (struct module_info)
+				+ kernel_count * sizeof (struct kernel_info));
+  module->image_desc = image_desc;
+  module->kernel_count = kernel_count;
+
+  kernel = &module->kernels[0];
+
+  /* Allocate memory for kernel dependencies.  */
+  for (unsigned i = 0; i < kernel_count; i++)
+    {
+      pair->start = (uintptr_t) kernel;
+      pair->end = (uintptr_t) (kernel + 1);
+
+      struct hsa_kernel_description *d = &image_desc->kernel_infos[i];
+      kernel->agent = agent;
+      kernel->module = module;
+      kernel->name = d->name;
+      kernel->omp_data_size = d->omp_data_size;
+      kernel->gridified_kernel_p = d->gridified_kernel_p;
+      kernel->dependencies_count = d->kernel_dependencies_count;
+      kernel->dependencies = d->kernel_dependencies;
+      if (pthread_mutex_init (&kernel->init_mutex, NULL))
+	GOMP_PLUGIN_fatal ("Failed to initialize an HSA kernel mutex");
+
+      kernel++;
+      pair++;
+    }
+
+  add_module_to_agent (agent, module);
+  if (pthread_rwlock_unlock (&agent->modules_rwlock))
+    GOMP_PLUGIN_fatal ("Unable to unlock an HSA agent rwlock");
+  return kernel_count;
+}
+
+/* Add a shared BRIG library from a FILE_NAME to an AGENT.  */
+
+static struct brig_library_info *
+add_shared_library (const char *file_name, struct agent_info *agent)
+{
+  struct brig_library_info *library = NULL;
+
+  void *f = dlopen (file_name, RTLD_NOW);
+  void *start = dlsym (f, "__brig_start");
+  void *end = dlsym (f, "__brig_end");
+
+  if (start == NULL || end == NULL)
+    return NULL;
+
+  unsigned size = end - start;
+  char *buf = (char *) GOMP_PLUGIN_malloc (size);
+  memcpy (buf, start, size);
+
+  library = GOMP_PLUGIN_malloc (sizeof (struct agent_info));
+  library->file_name = (char *) GOMP_PLUGIN_malloc
+    ((strlen (file_name) + 1));
+  strcpy (library->file_name, file_name);
+  library->image = (hsa_ext_module_t) buf;
+
+  return library;
+}
+
+/* Release memory used for BRIG shared libraries that correspond
+   to an AGENT.  */
+
+static void
+release_agent_shared_libraries (struct agent_info *agent)
+{
+  for (unsigned i = 0; i < agent->brig_libraries_count; i++)
+    if (agent->brig_libraries[i])
+      {
+	free (agent->brig_libraries[i]->file_name);
+	free (agent->brig_libraries[i]->image);
+	free (agent->brig_libraries[i]);
+      }
+
+  free (agent->brig_libraries);
+}
+
+/* Create and finalize the program consisting of all loaded modules.  */
+
+static void
+create_and_finalize_hsa_program (struct agent_info *agent)
+{
+  hsa_status_t status;
+  hsa_ext_program_t prog_handle;
+  int mi = 0;
+
+  if (pthread_mutex_lock (&agent->prog_mutex))
+    GOMP_PLUGIN_fatal ("Could not lock an HSA agent program mutex");
+  if (agent->prog_finalized)
+    goto final;
+
+  status = hsa_ext_program_create (HSA_MACHINE_MODEL_LARGE, HSA_PROFILE_FULL,
+				   HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT,
+				   NULL, &prog_handle);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Could not create an HSA program", status);
+
+  HSA_DEBUG ("Created a finalized program\n");
+
+  struct module_info *module = agent->first_module;
+  while (module)
+    {
+      status = hsa_ext_program_add_module (prog_handle,
+					   module->image_desc->brig_module);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_fatal ("Could not add a module to the HSA program", status);
+      module = module->next;
+      mi++;
+    }
+
+  /* Load all shared libraries.  */
+  const char *libraries[] = { "libhsamath.so", "libhsastd.so" };
+  const unsigned libraries_count = sizeof (libraries) / sizeof (const char *);
+
+  agent->brig_libraries_count = libraries_count;
+  agent->brig_libraries = GOMP_PLUGIN_malloc_cleared
+    (sizeof (struct brig_library_info) * libraries_count);
+
+  for (unsigned i = 0; i < libraries_count; i++)
+    {
+      struct brig_library_info *library = add_shared_library (libraries[i],
+							      agent);
+      if (library == NULL)
+	{
+	  HSA_WARNING ("Could not open a shared BRIG library: %s\n",
+		       libraries[i]);
+	  continue;
+	}
+
+      status = hsa_ext_program_add_module (prog_handle, library->image);
+      if (status != HSA_STATUS_SUCCESS)
+	hsa_warn ("Could not add a shared BRIG library the HSA program",
+		  status);
+      else
+	HSA_DEBUG ("a shared BRIG library has been added to a program: %s\n",
+		   libraries[i]);
+    }
+
+  hsa_ext_control_directives_t control_directives;
+  memset (&control_directives, 0, sizeof (control_directives));
+  hsa_code_object_t code_object;
+  status = hsa_ext_program_finalize (prog_handle, agent->isa,
+				     HSA_EXT_FINALIZER_CALL_CONVENTION_AUTO,
+				     control_directives, "",
+				     HSA_CODE_OBJECT_TYPE_PROGRAM,
+				     &code_object);
+  if (status != HSA_STATUS_SUCCESS)
+    {
+      hsa_warn ("Finalization of the HSA program failed", status);
+      goto failure;
+    }
+
+  HSA_DEBUG ("Finalization done\n");
+  hsa_ext_program_destroy (prog_handle);
+
+  status
+    = hsa_executable_create (HSA_PROFILE_FULL, HSA_EXECUTABLE_STATE_UNFROZEN,
+			     "", &agent->executable);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Could not create HSA executable", status);
+
+  module = agent->first_module;
+  while (module)
+    {
+      /* Initialize all global variables declared in the module.  */
+      for (unsigned i = 0; i < module->image_desc->global_variable_count; i++)
+	{
+	  struct global_var_info *var;
+	  var = &module->image_desc->global_variables[i];
+	  status
+	    = hsa_executable_global_variable_define (agent->executable,
+						     var->name, var->address);
+
+	  HSA_DEBUG ("Defining global variable: %s, address: %p\n", var->name,
+		     var->address);
+
+	  if (status != HSA_STATUS_SUCCESS)
+	    hsa_fatal ("Could not define a global variable in the HSA program",
+		       status);
+	}
+
+      module = module->next;
+    }
+
+  status = hsa_executable_load_code_object (agent->executable, agent->id,
+					    code_object, "");
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Could not add a code object to the HSA executable", status);
+  status = hsa_executable_freeze (agent->executable, "");
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Could not freeze the HSA executable", status);
+
+  HSA_DEBUG ("Froze HSA executable with the finalized code object\n");
+
+  /* If all goes good, jump to final.  */
+  goto final;
+
+failure:
+  agent->prog_finalized_error = true;
+
+final:
+  agent->prog_finalized = true;
+
+  if (pthread_mutex_unlock (&agent->prog_mutex))
+    GOMP_PLUGIN_fatal ("Could not unlock an HSA agent program mutex");
+}
+
+/* Create kernel dispatch data structure for given KERNEL.  */
+
+static struct GOMP_hsa_kernel_dispatch *
+create_single_kernel_dispatch (struct kernel_info *kernel,
+			       unsigned omp_data_size)
+{
+  struct agent_info *agent = kernel->agent;
+  struct GOMP_hsa_kernel_dispatch *shadow
+    = GOMP_PLUGIN_malloc_cleared (sizeof (struct GOMP_hsa_kernel_dispatch));
+
+  shadow->queue = agent->command_q;
+  shadow->omp_data_memory
+    = omp_data_size > 0 ? GOMP_PLUGIN_malloc (omp_data_size) : NULL;
+  unsigned dispatch_count = kernel->dependencies_count;
+  shadow->kernel_dispatch_count = dispatch_count;
+
+  shadow->children_dispatches
+    = GOMP_PLUGIN_malloc (dispatch_count * sizeof (shadow));
+
+  shadow->object = kernel->object;
+
+  hsa_signal_t sync_signal;
+  hsa_status_t status = hsa_signal_create (1, 0, NULL, &sync_signal);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Error creating the HSA sync signal", status);
+
+  shadow->signal = sync_signal.handle;
+  shadow->private_segment_size = kernel->private_segment_size;
+  shadow->group_segment_size = kernel->group_segment_size;
+
+  status
+    = hsa_memory_allocate (agent->kernarg_region, kernel->kernarg_segment_size,
+			   &shadow->kernarg_address);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Could not allocate memory for HSA kernel arguments", status);
+
+  return shadow;
+}
+
+/* Release data structure created for a kernel dispatch in SHADOW argument.  */
+
+static void
+release_kernel_dispatch (struct GOMP_hsa_kernel_dispatch *shadow)
+{
+  HSA_DEBUG ("Released kernel dispatch: %p has value: %lu (%p)\n", shadow,
+	     shadow->debug, (void *) shadow->debug);
+
+  hsa_memory_free (shadow->kernarg_address);
+
+  hsa_signal_t s;
+  s.handle = shadow->signal;
+  hsa_signal_destroy (s);
+
+  free (shadow->omp_data_memory);
+
+  for (unsigned i = 0; i < shadow->kernel_dispatch_count; i++)
+    release_kernel_dispatch (shadow->children_dispatches[i]);
+
+  free (shadow->children_dispatches);
+  free (shadow);
+}
+
+/* Initialize a KERNEL without its dependencies.  MAX_OMP_DATA_SIZE is used
+   to calculate maximum necessary memory for OMP data allocation.  */
+
+static void
+init_single_kernel (struct kernel_info *kernel, unsigned *max_omp_data_size)
+{
+  hsa_status_t status;
+  struct agent_info *agent = kernel->agent;
+  hsa_executable_symbol_t kernel_symbol;
+  status = hsa_executable_get_symbol (agent->executable, NULL, kernel->name,
+				      agent->id, 0, &kernel_symbol);
+  if (status != HSA_STATUS_SUCCESS)
+    {
+      hsa_warn ("Could not find symbol for kernel in the code object", status);
+      goto failure;
+    }
+  HSA_DEBUG ("Located kernel %s\n", kernel->name);
+  status
+    = hsa_executable_symbol_get_info (kernel_symbol,
+				      HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT,
+				      &kernel->object);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Could not extract a kernel object from its symbol", status);
+  status = hsa_executable_symbol_get_info
+    (kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE,
+     &kernel->kernarg_segment_size);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Could not get info about kernel argument size", status);
+  status = hsa_executable_symbol_get_info
+    (kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE,
+     &kernel->group_segment_size);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Could not get info about kernel group segment size", status);
+  status = hsa_executable_symbol_get_info
+    (kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE,
+     &kernel->private_segment_size);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Could not get info about kernel private segment size",
+	       status);
+
+  HSA_DEBUG ("Kernel structure for %s fully initialized with "
+	     "following segment sizes: \n", kernel->name);
+  HSA_DEBUG ("  group_segment_size: %u\n",
+	     (unsigned) kernel->group_segment_size);
+  HSA_DEBUG ("  private_segment_size: %u\n",
+	     (unsigned) kernel->private_segment_size);
+  HSA_DEBUG ("  kernarg_segment_size: %u\n",
+	     (unsigned) kernel->kernarg_segment_size);
+  HSA_DEBUG ("  omp_data_size: %u\n", kernel->omp_data_size);
+  HSA_DEBUG ("  gridified_kernel_p: %u\n", kernel->gridified_kernel_p);
+
+  if (kernel->omp_data_size > *max_omp_data_size)
+    *max_omp_data_size = kernel->omp_data_size;
+
+  for (unsigned i = 0; i < kernel->dependencies_count; i++)
+    {
+      struct kernel_info *dependency
+	= get_kernel_for_agent (agent, kernel->dependencies[i]);
+
+      if (dependency == NULL)
+	{
+	  HSA_DEBUG ("Could not find a dependency for a kernel: %s, "
+		     "dependency name: %s\n", kernel->name,
+		     kernel->dependencies[i]);
+	  goto failure;
+	}
+
+      if (dependency->dependencies_count > 0)
+	{
+	  HSA_DEBUG ("HSA does not allow kernel dispatching code with "
+		     "a depth bigger than one\n")
+	  goto failure;
+	}
+
+      init_single_kernel (dependency, max_omp_data_size);
+    }
+
+  return;
+
+failure:
+  kernel->initialization_failed = true;
+}
+
+/* Indent stream F by INDENT spaces.  */
+
+static void
+indent_stream (FILE *f, unsigned indent)
+{
+  fprintf (f, "%*s", indent, "");
+}
+
+/* Dump kernel DISPATCH data structure and indent it by INDENT spaces.  */
+
+static void
+print_kernel_dispatch (struct GOMP_hsa_kernel_dispatch *dispatch, unsigned indent)
+{
+  indent_stream (stderr, indent);
+  fprintf (stderr, "this: %p\n", dispatch);
+  indent_stream (stderr, indent);
+  fprintf (stderr, "queue: %p\n", dispatch->queue);
+  indent_stream (stderr, indent);
+  fprintf (stderr, "omp_data_memory: %p\n", dispatch->omp_data_memory);
+  indent_stream (stderr, indent);
+  fprintf (stderr, "kernarg_address: %p\n", dispatch->kernarg_address);
+  indent_stream (stderr, indent);
+  fprintf (stderr, "object: %lu\n", dispatch->object);
+  indent_stream (stderr, indent);
+  fprintf (stderr, "signal: %lu\n", dispatch->signal);
+  indent_stream (stderr, indent);
+  fprintf (stderr, "private_segment_size: %u\n",
+	   dispatch->private_segment_size);
+  indent_stream (stderr, indent);
+  fprintf (stderr, "group_segment_size: %u\n",
+	   dispatch->group_segment_size);
+  indent_stream (stderr, indent);
+  fprintf (stderr, "children dispatches: %lu\n",
+	   dispatch->kernel_dispatch_count);
+  indent_stream (stderr, indent);
+  fprintf (stderr, "omp_num_threads: %u\n",
+	   dispatch->omp_num_threads);
+  fprintf (stderr, "\n");
+
+  for (unsigned i = 0; i < dispatch->kernel_dispatch_count; i++)
+    print_kernel_dispatch (dispatch->children_dispatches[i], indent + 2);
+}
+
+/* Create kernel dispatch data structure for a KERNEL and all its
+   dependencies.  */
+
+static struct GOMP_hsa_kernel_dispatch *
+create_kernel_dispatch (struct kernel_info *kernel, unsigned omp_data_size)
+{
+  struct GOMP_hsa_kernel_dispatch *shadow
+    = create_single_kernel_dispatch (kernel, omp_data_size);
+  shadow->omp_num_threads = 64;
+  shadow->debug = 0;
+  shadow->omp_level = kernel->gridified_kernel_p ? 1 : 0;
+
+  /* Create kernel dispatch data structures.  We do not allow to have
+     a kernel dispatch with depth bigger than one.  */
+  for (unsigned i = 0; i < kernel->dependencies_count; i++)
+    {
+      struct kernel_info *dependency
+	= get_kernel_for_agent (kernel->agent, kernel->dependencies[i]);
+      shadow->children_dispatches[i]
+	= create_single_kernel_dispatch (dependency, omp_data_size);
+      shadow->children_dispatches[i]->queue
+	= kernel->agent->kernel_dispatch_command_q;
+      shadow->children_dispatches[i]->omp_level = 1;
+    }
+
+  return shadow;
+}
+
+/* Do all the work that is necessary before running KERNEL for the first time.
+   The function assumes the program has been created, finalized and frozen by
+   create_and_finalize_hsa_program.  */
+
+static void
+init_kernel (struct kernel_info *kernel)
+{
+  if (pthread_mutex_lock (&kernel->init_mutex))
+    GOMP_PLUGIN_fatal ("Could not lock an HSA kernel initialization mutex");
+  if (kernel->initialized)
+    {
+      if (pthread_mutex_unlock (&kernel->init_mutex))
+	GOMP_PLUGIN_fatal ("Could not unlock an HSA kernel initialization "
+			   "mutex");
+
+      return;
+    }
+
+  /* Precomputed maximum size of OMP data necessary for a kernel from kernel
+     dispatch operation.  */
+  init_single_kernel (kernel, &kernel->max_omp_data_size);
+
+  if (!kernel->initialization_failed)
+    HSA_DEBUG ("\n");
+
+  kernel->initialized = true;
+  if (pthread_mutex_unlock (&kernel->init_mutex))
+    GOMP_PLUGIN_fatal ("Could not unlock an HSA kernel initialization "
+		       "mutex");
+}
+
+/* Parse the target attributes INPUT provided by the compiler and return true
+   if we should run anything all.  If INPUT is NULL, fill DEF with default
+   values, then store INPUT or DEF into *RESULT.  */
+
+static bool
+parse_target_attributes (void **input,
+			 struct GOMP_kernel_launch_attributes *def,
+			 struct GOMP_kernel_launch_attributes **result)
+{
+  if (!input)
+    GOMP_PLUGIN_fatal ("No target arguments provided");
+
+  bool attrs_found = false;
+  while (*input)
+    {
+      uintptr_t id = (uintptr_t) *input;
+      if ((id & GOMP_TARGET_ARG_DEVICE_MASK) == GOMP_DEVICE_HSA
+	  && ((id & GOMP_TARGET_ARG_ID_MASK)
+	      == GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES))
+	{
+	  input++;
+	  attrs_found = true;
+	  break;
+	}
+
+      if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+	input++;
+      input++;
+    }
+
+  if (!attrs_found)
+    {
+      def->ndim = 1;
+      def->gdims[0] = 1;
+      def->gdims[1] = 1;
+      def->gdims[2] = 1;
+      def->wdims[0] = 1;
+      def->wdims[1] = 1;
+      def->wdims[2] = 1;
+      *result = def;
+      HSA_DEBUG ("GOMP_OFFLOAD_run called with no launch attributes\n");
+      return true;
+    }
+
+  struct GOMP_kernel_launch_attributes *kla;
+  kla = (struct GOMP_kernel_launch_attributes *) *input;
+  *result = kla;
+  if (kla->ndim != 1)
+    GOMP_PLUGIN_fatal ("HSA does not yet support number of dimensions "
+		       "different from one.");
+  if (kla->gdims[0] == 0)
+    return false;
+
+  HSA_DEBUG ("GOMP_OFFLOAD_run called with grid size %u and group size %u\n",
+	     kla->gdims[0], kla->wdims[0]);
+
+  return true;
+}
+
+/* Return true if the HSA runtime can run function FN_PTR.  */
+
+bool
+GOMP_OFFLOAD_can_run (void *fn_ptr)
+{
+  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
+  struct agent_info *agent = kernel->agent;
+  create_and_finalize_hsa_program (agent);
+
+  if (agent->prog_finalized_error)
+    goto failure;
+
+  init_kernel (kernel);
+  if (kernel->initialization_failed)
+    goto failure;
+
+  return true;
+
+failure:
+  if (suppress_host_fallback)
+    GOMP_PLUGIN_fatal ("HSA host fallback has been suppressed");
+  HSA_DEBUG ("HSA target cannot be launched, doing a host fallback\n");
+  return false;
+}
+
+/* Part of the libgomp plugin interface.  Run a kernel on device N and pass it
+   an array of pointers in VARS as a parameter.  The kernel is identified by
+   FN_PTR which must point to a kernel_info structure.  */
+
+void
+GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
+{
+  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
+  struct agent_info *agent = kernel->agent;
+  struct GOMP_kernel_launch_attributes def;
+  struct GOMP_kernel_launch_attributes *kla;
+  if (!parse_target_attributes (args, &def, &kla))
+    {
+      HSA_DEBUG ("Will not run HSA kernel because the grid size is zero\n");
+      return;
+    }
+  if (pthread_rwlock_rdlock (&agent->modules_rwlock))
+    GOMP_PLUGIN_fatal ("Unable to read-lock an HSA agent rwlock");
+
+  if (!agent->initialized)
+    GOMP_PLUGIN_fatal ("Agent must be initialized");
+
+  if (!kernel->initialized)
+    GOMP_PLUGIN_fatal ("Called kernel must be initialized");
+
+  struct GOMP_hsa_kernel_dispatch *shadow
+    = create_kernel_dispatch (kernel, kernel->max_omp_data_size);
+
+  if (debug)
+    {
+      fprintf (stderr, "\nKernel has following dependencies:\n");
+      print_kernel_dispatch (shadow, 2);
+    }
+
+  uint64_t index = hsa_queue_add_write_index_release (agent->command_q, 1);
+  HSA_DEBUG ("Got AQL index %llu\n", (long long int) index);
+
+  /* Wait until the queue is not full before writing the packet.   */
+  while (index - hsa_queue_load_read_index_acquire (agent->command_q)
+	 >= agent->command_q->size)
+    ;
+
+  hsa_kernel_dispatch_packet_t *packet;
+  packet = ((hsa_kernel_dispatch_packet_t *) agent->command_q->base_address)
+	   + index % agent->command_q->size;
+
+  memset (((uint8_t *) packet) + 4, 0, sizeof (*packet) - 4);
+  packet->setup |= (uint16_t) 1 << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS;
+  packet->grid_size_x = kla->gdims[0];
+  uint32_t wgs = kla->wdims[0];
+  if (wgs == 0)
+    /* TODO: Provide a default via environment.  */
+    wgs = 64;
+  else if (wgs > kla->gdims[0])
+    wgs = kla->gdims[0];
+  packet->workgroup_size_x = wgs;
+  packet->grid_size_y = 1;
+  packet->workgroup_size_y = 1;
+  packet->grid_size_z = 1;
+  packet->workgroup_size_z = 1;
+  packet->private_segment_size = kernel->private_segment_size;
+  packet->group_segment_size = kernel->group_segment_size;
+  packet->kernel_object = kernel->object;
+  packet->kernarg_address = shadow->kernarg_address;
+  hsa_signal_t s;
+  s.handle = shadow->signal;
+  packet->completion_signal = s;
+  hsa_signal_store_relaxed (s, 1);
+  memcpy (shadow->kernarg_address, &vars, sizeof (vars));
+
+  memcpy (shadow->kernarg_address + sizeof (vars), &shadow,
+	  sizeof (struct hsa_kernel_runtime *));
+
+  HSA_DEBUG ("Copying kernel runtime pointer to kernarg_address\n");
+
+  uint16_t header;
+  header = HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE;
+  header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE;
+  header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE;
+
+  HSA_DEBUG ("Going to dispatch kernel %s\n", kernel->name);
+
+  __atomic_store_n ((uint16_t *) (&packet->header), header, __ATOMIC_RELEASE);
+  hsa_signal_store_release (agent->command_q->doorbell_signal, index);
+
+  /* TODO: GPU agents in Carrizo APUs cannot properly update L2 cache for
+     signal wait and signal load operations on their own and we need to
+     periodically call the hsa_signal_load_acquire on completion signals of
+     children kernels in the CPU to make that happen.  As soon the
+     limitation will be resolved, this workaround can be removed.  */
+
+  HSA_DEBUG ("Kernel dispatched, waiting for completion\n");
+
+  /* Root signal waits with 1ms timeout.  */
+  while (hsa_signal_wait_acquire (s, HSA_SIGNAL_CONDITION_LT, 1, 1000 * 1000,
+				  HSA_WAIT_STATE_BLOCKED) != 0)
+    for (unsigned i = 0; i < shadow->kernel_dispatch_count; i++)
+      {
+	hsa_signal_t child_s;
+	child_s.handle = shadow->children_dispatches[i]->signal;
+
+	HSA_DEBUG ("Waiting for children completion signal: %lu\n",
+		   shadow->children_dispatches[i]->signal);
+	hsa_signal_load_acquire (child_s);
+      }
+
+  release_kernel_dispatch (shadow);
+
+  if (pthread_rwlock_unlock (&agent->modules_rwlock))
+    GOMP_PLUGIN_fatal ("Unable to unlock an HSA agent rwlock");
+}
+
+/* Information to be passed to a thread running a kernel asycnronously.  */
+
+struct async_run_info
+{
+  int device;
+  void *tgt_fn;
+  void *tgt_vars;
+  void **args;
+  void *async_data;
+};
+
+/* Thread routine to run a kernel asynchronously.  */
+
+static void *
+run_kernel_asynchronously (void *thread_arg)
+{
+  struct async_run_info *info = (struct async_run_info *) thread_arg;
+  int device = info->device;
+  void *tgt_fn = info->tgt_fn;
+  void *tgt_vars = info->tgt_vars;
+  void **args = info->args;
+  void *async_data = info->async_data;
+
+  free (info);
+  GOMP_OFFLOAD_run (device, tgt_fn, tgt_vars, args);
+  GOMP_PLUGIN_target_task_completion (async_data);
+  return NULL;
+}
+
+/* Part of the libgomp plugin interface.  Run a kernel like GOMP_OFFLOAD_run
+   does, but asynchronously and call GOMP_PLUGIN_target_task_completion when it
+   has finished.  */
+
+void
+GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
+			void **args, void *async_data)
+{
+  pthread_t pt;
+  struct async_run_info *info;
+  HSA_DEBUG ("GOMP_OFFLOAD_async_run invoked\n")
+  info = GOMP_PLUGIN_malloc (sizeof (struct async_run_info));
+
+  info->device = device;
+  info->tgt_fn = tgt_fn;
+  info->tgt_vars = tgt_vars;
+  info->args = args;
+  info->async_data = async_data;
+
+  int err = pthread_create (&pt, NULL, &run_kernel_asynchronously, info);
+  if (err != 0)
+    GOMP_PLUGIN_fatal ("HSA asynchronous thread creation failed: %s",
+		       strerror (err));
+  err = pthread_detach (pt);
+  if (err != 0)
+    GOMP_PLUGIN_fatal ("Failed to detach a thread to run HSA kernel "
+		       "asynchronously: %s", strerror (err));
+}
+
+/* Deinitialize all information associated with MODULE and kernels within
+   it.  */
+
+void
+destroy_module (struct module_info *module)
+{
+  int i;
+  for (i = 0; i < module->kernel_count; i++)
+    if (pthread_mutex_destroy (&module->kernels[i].init_mutex))
+      GOMP_PLUGIN_fatal ("Failed to destroy an HSA kernel initialization "
+			 "mutex");
+}
+
+/* Part of the libgomp plugin interface.  Unload BRIG module described by
+   struct brig_image_desc in TARGET_DATA from agent number N.  */
+
+void
+GOMP_OFFLOAD_unload_image (int n, unsigned version, void *target_data)
+{
+  if (GOMP_VERSION_DEV (version) > GOMP_VERSION_HSA)
+    GOMP_PLUGIN_fatal ("Offload data incompatible with HSA plugin"
+		       " (expected %u, received %u)",
+		       GOMP_VERSION_HSA, GOMP_VERSION_DEV (version));
+
+  struct agent_info *agent;
+  agent = get_agent_info (n);
+  if (pthread_rwlock_wrlock (&agent->modules_rwlock))
+    GOMP_PLUGIN_fatal ("Unable to write-lock an HSA agent rwlock");
+
+  struct module_info *module = agent->first_module;
+  while (module)
+    {
+      if (module->image_desc == target_data)
+	break;
+      module = module->next;
+    }
+  if (!module)
+    GOMP_PLUGIN_fatal ("Attempt to unload an image that has never been "
+		       "loaded before");
+
+  remove_module_from_agent (agent, module);
+  destroy_module (module);
+  free (module);
+  destroy_hsa_program (agent);
+  if (pthread_rwlock_unlock (&agent->modules_rwlock))
+    GOMP_PLUGIN_fatal ("Unable to unlock an HSA agent rwlock");
+}
+
+/* Part of the libgomp plugin interface.  Deinitialize all information and
+   status associated with agent number N.  We do not attempt any
+   synchronization, assuming the user and libgomp will not attempt
+   deinitialization of a device that is in any way being used at the same
+   time.  */
+
+void
+GOMP_OFFLOAD_fini_device (int n)
+{
+  struct agent_info *agent = get_agent_info (n);
+  if (!agent->initialized)
+    return;
+
+  struct module_info *next_module = agent->first_module;
+  while (next_module)
+    {
+      struct module_info *module = next_module;
+      next_module = module->next;
+      destroy_module (module);
+      free (module);
+    }
+  agent->first_module = NULL;
+  destroy_hsa_program (agent);
+
+  release_agent_shared_libraries (agent);
+
+  hsa_status_t status = hsa_queue_destroy (agent->command_q);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Error destroying command queue", status);
+  status = hsa_queue_destroy (agent->kernel_dispatch_command_q);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Error destroying kernel dispatch command queue", status);
+  if (pthread_mutex_destroy (&agent->prog_mutex))
+    GOMP_PLUGIN_fatal ("Failed to destroy an HSA agent program mutex");
+  if (pthread_rwlock_destroy (&agent->modules_rwlock))
+    GOMP_PLUGIN_fatal ("Failed to destroy an HSA agent rwlock");
+  agent->initialized = false;
+}
+
+/* Part of the libgomp plugin interface.  Not implemented as it is not required
+   for HSA.  */
+
+void *
+GOMP_OFFLOAD_alloc (int ord, size_t size)
+{
+  GOMP_PLUGIN_fatal ("HSA GOMP_OFFLOAD_alloc is not implemented because "
+		     "it should never be called");
+}
+
+/* Part of the libgomp plugin interface.  Not implemented as it is not required
+   for HSA.  */
+
+void
+GOMP_OFFLOAD_free (int ord, void *ptr)
+{
+  GOMP_PLUGIN_fatal ("HSA GOMP_OFFLOAD_free is not implemented because "
+		     "it should never be called");
+}
+
+/* Part of the libgomp plugin interface.  Not implemented as it is not required
+   for HSA.  */
+
+void *
+GOMP_OFFLOAD_dev2host (int ord, void *dst, const void *src, size_t n)
+{
+  GOMP_PLUGIN_fatal ("HSA GOMP_OFFLOAD_dev2host is not implemented because "
+		     "it should never be called");
+}
+
+/* Part of the libgomp plugin interface.  Not implemented as it is not required
+   for HSA.  */
+
+void *
+GOMP_OFFLOAD_host2dev (int ord, void *dst, const void *src, size_t n)
+{
+  GOMP_PLUGIN_fatal ("HSA GOMP_OFFLOAD_host2dev is not implemented because "
+		     "it should never be called");
+}
+
+/* Part of the libgomp plugin interface.  Not implemented as it is not required
+   for HSA.  */
+
+void *
+GOMP_OFFLOAD_dev2dev (int ord, void *dst, const void *src, size_t n)
+{
+  GOMP_PLUGIN_fatal ("HSA GOMP_OFFLOAD_dev2dev is not implemented because "
+		     "it should never be called");
+}

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [hsa merge 10/10] HSA register allocator
  2016-01-13 17:55 [hsa merge 00/10] Merge of HSA branch Martin Jambor
  2016-01-13 17:55 ` [hsa merge 01/10] Configury changes and new options Martin Jambor
  2016-01-13 17:55 ` [hsa merge 04/10] Avoid extraneous remapping in copy_gimple_seq_and_replace_locals Martin Jambor
@ 2016-01-13 17:55 ` Martin Jambor
  2016-01-14 14:41   ` Jakub Jelinek
  2016-01-13 17:55 ` [hsa merge 02/10] Modifications to libgomp proper Martin Jambor
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-13 17:55 UTC (permalink / raw)
  To: GCC Patches; +Cc: Michael Matz

[-- Attachment #1: hsa-regalloc.diff --]
[-- Type: text/plain, Size: 20490 bytes --]

Hi,

because HSA backend is not based on RTL,we need our own, and it is in
this patch.  The allocator has been written by Michael Matz and I have
put it into a separate email so that I can add him to CC, because he
is much better suited to answer any questions or review comments.

Thanks,

Martin


2016-01-13  Michael Matz <matz@suse.de>
	    Martin Jambor  <mjambor@suse.cz>

	* hsa-regalloc.c: New file.

diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c
new file mode 100644
index 0000000..5a42beb
--- /dev/null
+++ b/gcc/hsa-regalloc.c
@@ -0,0 +1,719 @@
+/* HSAIL IL Register allocation and out-of-SSA.
+   Copyright (C) 2013-2016 Free Software Foundation, Inc.
+   Contributed by Michael Matz <matz@suse.de>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "vec.h"
+#include "tree.h"
+#include "dominance.h"
+#include "cfg.h"
+#include "cfganal.h"
+#include "function.h"
+#include "bitmap.h"
+#include "dumpfile.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "cfghooks.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+
+/* Process a PHI node PHI of basic block BB as a part of naive out-f-ssa.  */
+
+static void
+naive_process_phi (hsa_insn_phi *phi)
+{
+  unsigned count = phi->operand_count ();
+  for (unsigned i = 0; i < count; i++)
+    {
+      gcc_checking_assert (phi->get_op (i));
+      hsa_op_base *op = phi->get_op (i);
+      hsa_bb *hbb;
+      edge e;
+
+      if (!op)
+	break;
+
+      e = EDGE_PRED (phi->m_bb, i);
+      if (single_succ_p (e->src))
+	hbb = hsa_bb_for_bb (e->src);
+      else
+	{
+	  basic_block old_dest = e->dest;
+	  hbb = hsa_init_new_bb (split_edge (e));
+
+	  /* If switch insn used this edge, fix jump table.  */
+	  hsa_bb *source = hsa_bb_for_bb (e->src);
+	  hsa_insn_sbr *sbr;
+	  if (source->m_last_insn
+	      && (sbr = dyn_cast <hsa_insn_sbr *> (source->m_last_insn)))
+	    sbr->replace_all_labels (old_dest, hbb->m_bb);
+	}
+
+      hsa_build_append_simple_mov (phi->m_dest, op, hbb);
+    }
+}
+
+/* Naive out-of SSA.  */
+
+static void
+naive_outof_ssa (void)
+{
+  basic_block bb;
+
+  hsa_cfun->m_in_ssa = false;
+
+  FOR_ALL_BB_FN (bb, cfun)
+  {
+    hsa_bb *hbb = hsa_bb_for_bb (bb);
+    hsa_insn_phi *phi;
+
+    for (phi = hbb->m_first_phi;
+	 phi;
+	 phi = phi->m_next ? as_a <hsa_insn_phi *> (phi->m_next): NULL)
+      naive_process_phi (phi);
+
+    /* Zap PHI nodes, they will be deallocated when everything else will.  */
+    hbb->m_first_phi = NULL;
+    hbb->m_last_phi = NULL;
+  }
+}
+
+/* Return register class number for the given HSA TYPE.  0 means the 'c' one
+   bit register class, 1 means 's' 32 bit class, 2 stands for 'd' 64 bit class
+   and 3 for 'q' 128 bit class.  */
+
+static int
+m_reg_class_for_type (BrigType16_t type)
+{
+  switch (type)
+    {
+    case BRIG_TYPE_B1:
+      return 0;
+
+    case BRIG_TYPE_U8:
+    case BRIG_TYPE_U16:
+    case BRIG_TYPE_U32:
+    case BRIG_TYPE_S8:
+    case BRIG_TYPE_S16:
+    case BRIG_TYPE_S32:
+    case BRIG_TYPE_F16:
+    case BRIG_TYPE_F32:
+    case BRIG_TYPE_B8:
+    case BRIG_TYPE_B16:
+    case BRIG_TYPE_B32:
+    case BRIG_TYPE_U8X4:
+    case BRIG_TYPE_S8X4:
+    case BRIG_TYPE_U16X2:
+    case BRIG_TYPE_S16X2:
+    case BRIG_TYPE_F16X2:
+      return 1;
+
+    case BRIG_TYPE_U64:
+    case BRIG_TYPE_S64:
+    case BRIG_TYPE_F64:
+    case BRIG_TYPE_B64:
+    case BRIG_TYPE_U8X8:
+    case BRIG_TYPE_S8X8:
+    case BRIG_TYPE_U16X4:
+    case BRIG_TYPE_S16X4:
+    case BRIG_TYPE_F16X4:
+    case BRIG_TYPE_U32X2:
+    case BRIG_TYPE_S32X2:
+    case BRIG_TYPE_F32X2:
+      return 2;
+
+    case BRIG_TYPE_B128:
+    case BRIG_TYPE_U8X16:
+    case BRIG_TYPE_S8X16:
+    case BRIG_TYPE_U16X8:
+    case BRIG_TYPE_S16X8:
+    case BRIG_TYPE_F16X8:
+    case BRIG_TYPE_U32X4:
+    case BRIG_TYPE_U64X2:
+    case BRIG_TYPE_S32X4:
+    case BRIG_TYPE_S64X2:
+    case BRIG_TYPE_F32X4:
+    case BRIG_TYPE_F64X2:
+      return 3;
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* If the Ith operands of INSN is or contains a register (in an address),
+   return the address of that register operand.  If not return NULL.  */
+
+static hsa_op_reg **
+insn_reg_addr (hsa_insn_basic *insn, int i)
+{
+  hsa_op_base *op = insn->get_op (i);
+  if (!op)
+    return NULL;
+  hsa_op_reg *reg = dyn_cast <hsa_op_reg *> (op);
+  if (reg)
+    return (hsa_op_reg **) insn->get_op_addr (i);
+  hsa_op_address *addr = dyn_cast <hsa_op_address *> (op);
+  if (addr && addr->m_reg)
+    return &addr->m_reg;
+  return NULL;
+}
+
+struct m_reg_class_desc
+{
+  unsigned next_avail, max_num;
+  unsigned used_num, max_used;
+  uint64_t used[2];
+  char cl_char;
+};
+
+/* Rewrite the instructions in BB to observe spilled live ranges.
+   CLASSES is the global register class state.  */
+
+static void
+rewrite_code_bb (basic_block bb, struct m_reg_class_desc *classes)
+{
+  hsa_bb *hbb = hsa_bb_for_bb (bb);
+  hsa_insn_basic *insn, *next_insn;
+
+  for (insn = hbb->m_first_insn; insn; insn = next_insn)
+    {
+      next_insn = insn->m_next;
+      unsigned count = insn->operand_count ();
+      for (unsigned i = 0; i < count; i++)
+	{
+	  gcc_checking_assert (insn->get_op (i));
+	  hsa_op_reg **regaddr = insn_reg_addr (insn, i);
+
+	  if (regaddr)
+	    {
+	      hsa_op_reg *reg = *regaddr;
+	      if (reg->m_reg_class)
+		continue;
+	      gcc_assert (reg->m_spill_sym);
+
+	      int cl = m_reg_class_for_type (reg->m_type);
+	      hsa_op_reg *tmp, *tmp2;
+	      if (insn->op_output_p (i))
+		tmp = hsa_spill_out (insn, reg, &tmp2);
+	      else
+		tmp = hsa_spill_in (insn, reg, &tmp2);
+
+	      *regaddr = tmp;
+
+	      tmp->m_reg_class = classes[cl].cl_char;
+	      tmp->m_hard_num = (char) (classes[cl].max_num + i);
+	      if (tmp2)
+		{
+		  gcc_assert (cl == 0);
+		  tmp2->m_reg_class = classes[1].cl_char;
+		  tmp2->m_hard_num = (char) (classes[1].max_num + i);
+		}
+	    }
+	}
+    }
+}
+
+/* Dump current function to dump file F, with info specific
+   to register allocation.  */
+
+void
+dump_hsa_cfun_regalloc (FILE *f)
+{
+  basic_block bb;
+
+  fprintf (f, "\nHSAIL IL for %s\n", hsa_cfun->m_name);
+
+  FOR_ALL_BB_FN (bb, cfun)
+  {
+    hsa_bb *hbb = (struct hsa_bb *) bb->aux;
+    bitmap_print (dump_file, hbb->m_livein, "m_livein  ", "\n");
+    dump_hsa_bb (f, hbb);
+    bitmap_print (dump_file, hbb->m_liveout, "m_liveout ", "\n");
+  }
+}
+
+/* Given the global register allocation state CLASSES and a
+   register REG, try to give it a hardware register.  If successful,
+   store that hardreg in REG and return it, otherwise return -1.
+   Also changes CLASSES to accommodate for the allocated register.  */
+
+static int
+try_alloc_reg (struct m_reg_class_desc *classes, hsa_op_reg *reg)
+{
+  int cl = m_reg_class_for_type (reg->m_type);
+  int ret = -1;
+  if (classes[1].used_num + classes[2].used_num * 2 + classes[3].used_num * 4
+      >= 128 - 5)
+    return -1;
+  if (classes[cl].used_num < classes[cl].max_num)
+    {
+      unsigned int i;
+      classes[cl].used_num++;
+      if (classes[cl].used_num > classes[cl].max_used)
+	classes[cl].max_used = classes[cl].used_num;
+      for (i = 0; i < classes[cl].used_num; i++)
+	if (! (classes[cl].used[i / 64] & (((uint64_t)1) << (i & 63))))
+	  break;
+      ret = i;
+      classes[cl].used[i / 64] |= (((uint64_t)1) << (i & 63));
+      reg->m_reg_class = classes[cl].cl_char;
+      reg->m_hard_num = i;
+    }
+  return ret;
+}
+
+/* Free up hardregs used by REG, into allocation state CLASSES.  */
+
+static void
+free_reg (struct m_reg_class_desc *classes, hsa_op_reg *reg)
+{
+  int cl = m_reg_class_for_type (reg->m_type);
+  int ret = reg->m_hard_num;
+  gcc_assert (reg->m_reg_class == classes[cl].cl_char);
+  classes[cl].used_num--;
+  classes[cl].used[ret / 64] &= ~(((uint64_t)1) << (ret & 63));
+}
+
+/* Note that the live range for REG ends at least at END.  */
+
+static void
+note_lr_end (hsa_op_reg *reg, int end)
+{
+  if (reg->m_lr_end < end)
+    reg->m_lr_end = end;
+}
+
+/* Note that the live range for REG starts at least at BEGIN.  */
+
+static void
+note_lr_begin (hsa_op_reg *reg, int begin)
+{
+  if (reg->m_lr_begin > begin)
+    reg->m_lr_begin = begin;
+}
+
+/* Given two registers A and B, return -1, 0 or 1 if A's live range
+   starts before, at or after B's live range.  */
+
+static int
+cmp_begin (const void *a, const void *b)
+{
+  const hsa_op_reg * const *rega = (const hsa_op_reg * const *)a;
+  const hsa_op_reg * const *regb = (const hsa_op_reg * const *)b;
+  int ret;
+  if (rega == regb)
+    return 0;
+  ret = (*rega)->m_lr_begin - (*regb)->m_lr_begin;
+  if (ret)
+    return ret;
+  return ((*rega)->m_order - (*regb)->m_order);
+}
+
+/* Given two registers REGA and REGB, return true if REGA's
+   live range ends after REGB's.  This results in a sorting order
+   with earlier end points at the end.  */
+
+static bool
+cmp_end (hsa_op_reg * const &rega, hsa_op_reg * const &regb)
+{
+  int ret;
+  if (rega == regb)
+    return false;
+  ret = (regb)->m_lr_end - (rega)->m_lr_end;
+  if (ret)
+    return ret < 0;
+  return (((regb)->m_order - (rega)->m_order)) < 0;
+}
+
+/* Expire all old intervals in ACTIVE (a per-regclass vector),
+   that is, those that end before the interval REG starts.  Give
+   back resources freed so into the state CLASSES.  */
+
+static void
+expire_old_intervals (hsa_op_reg *reg, vec<hsa_op_reg*> *active,
+		      struct m_reg_class_desc *classes)
+{
+  for (int i = 0; i < 4; i++)
+    while (!active[i].is_empty ())
+      {
+	hsa_op_reg *a = active[i].pop ();
+	if (a->m_lr_end > reg->m_lr_begin)
+	  {
+	    active[i].quick_push (a);
+	    break;
+	  }
+	free_reg (classes, a);
+      }
+}
+
+/* The interval REG didn't get a hardreg.  Spill it or one of those
+   from ACTIVE (if the latter, then REG will become allocated to the
+   hardreg that formerly was used by it).  */
+
+static void
+spill_at_interval (hsa_op_reg *reg, vec<hsa_op_reg*> *active)
+{
+  int cl = m_reg_class_for_type (reg->m_type);
+  gcc_assert (!active[cl].is_empty ());
+  hsa_op_reg *cand = active[cl][0];
+  if (cand->m_lr_end > reg->m_lr_end)
+    {
+      reg->m_reg_class = cand->m_reg_class;
+      reg->m_hard_num = cand->m_hard_num;
+      active[cl].ordered_remove (0);
+      unsigned place = active[cl].lower_bound (reg, cmp_end);
+      active[cl].quick_insert (place, reg);
+    }
+  else
+    cand = reg;
+
+  gcc_assert (!cand->m_spill_sym);
+  BrigType16_t type = cand->m_type;
+  if (type == BRIG_TYPE_B1)
+    type = BRIG_TYPE_U8;
+  cand->m_reg_class = 0;
+  cand->m_spill_sym = hsa_get_spill_symbol (type);
+  cand->m_spill_sym->m_name_number = cand->m_order;
+}
+
+/* Given the global register state CLASSES allocate all HSA virtual
+   registers either to hardregs or to a spill symbol.  */
+
+static void
+linear_scan_regalloc (struct m_reg_class_desc *classes)
+{
+  /* Compute liveness.  */
+  bool changed;
+  int i, n;
+  int insn_order;
+  int *bbs = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+  bitmap work = BITMAP_ALLOC (NULL);
+  vec<hsa_op_reg*> ind2reg = vNULL;
+  vec<hsa_op_reg*> active[4] = {vNULL, vNULL, vNULL, vNULL};
+  hsa_insn_basic *m_last_insn;
+
+  /* We will need the reverse post order for linearization,
+     and the post order for liveness analysis, which is the same
+     backward.  */
+  n = pre_and_rev_post_order_compute (NULL, bbs, true);
+  ind2reg.safe_grow_cleared (hsa_cfun->m_reg_count);
+
+  /* Give all instructions a linearized number, at the same time
+     build a mapping from register index to register.  */
+  insn_order = 1;
+  for (i = 0; i < n; i++)
+    {
+      basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bbs[i]);
+      hsa_bb *hbb = hsa_bb_for_bb (bb);
+      hsa_insn_basic *insn;
+      for (insn = hbb->m_first_insn; insn; insn = insn->m_next)
+	{
+	  unsigned opi;
+	  insn->m_number = insn_order++;
+	  for (opi = 0; opi < insn->operand_count (); opi++)
+	    {
+	      gcc_checking_assert (insn->get_op (opi));
+	      hsa_op_reg **regaddr = insn_reg_addr (insn, opi);
+	      if (regaddr)
+		ind2reg[(*regaddr)->m_order] = *regaddr;
+	    }
+	}
+    }
+
+  /* Initialize all live ranges to [after-end, 0).  */
+  for (i = 0; i < hsa_cfun->m_reg_count; i++)
+    if (ind2reg[i])
+      ind2reg[i]->m_lr_begin = insn_order, ind2reg[i]->m_lr_end = 0;
+
+  /* Classic liveness analysis, as long as something changes:
+       m_liveout is union (m_livein of successors)
+       m_livein is m_liveout minus defs plus uses.  */
+  do
+    {
+      changed = false;
+      for (i = n - 1; i >= 0; i--)
+	{
+	  edge e;
+	  edge_iterator ei;
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bbs[i]);
+	  hsa_bb *hbb = hsa_bb_for_bb (bb);
+
+	  /* Union of successors m_livein (or empty if none).  */
+	  bool first = true;
+	  FOR_EACH_EDGE (e, ei, bb->succs)
+	    if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
+	      {
+		hsa_bb *succ = hsa_bb_for_bb (e->dest);
+		if (first)
+		  {
+		    bitmap_copy (work, succ->m_livein);
+		    first = false;
+		  }
+		else
+		  bitmap_ior_into (work, succ->m_livein);
+	      }
+	  if (first)
+	    bitmap_clear (work);
+
+	  bitmap_copy (hbb->m_liveout, work);
+
+	  /* Remove defs, include uses in a backward insn walk.  */
+	  hsa_insn_basic *insn;
+	  for (insn = hbb->m_last_insn; insn; insn = insn->m_prev)
+	    {
+	      unsigned opi;
+	      unsigned ndefs = insn->input_count ();
+	      for (opi = 0; opi < ndefs && insn->get_op (opi); opi++)
+		{
+		  gcc_checking_assert (insn->get_op (opi));
+		  hsa_op_reg **regaddr = insn_reg_addr (insn, opi);
+		  if (regaddr)
+		    bitmap_clear_bit (work, (*regaddr)->m_order);
+		}
+	      for (; opi < insn->operand_count (); opi++)
+		{
+		  gcc_checking_assert (insn->get_op (opi));
+		  hsa_op_reg **regaddr = insn_reg_addr (insn, opi);
+		  if (regaddr)
+		    bitmap_set_bit (work, (*regaddr)->m_order);
+		}
+	    }
+
+	  /* Note if that changed something.  */
+	  if (bitmap_ior_into (hbb->m_livein, work))
+	    changed = true;
+	}
+    }
+  while (changed);
+
+  /* Make one pass through all instructions in linear order,
+     noting and merging possible live range start and end points.  */
+  m_last_insn = NULL;
+  for (i = n - 1; i >= 0; i--)
+    {
+      basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bbs[i]);
+      hsa_bb *hbb = hsa_bb_for_bb (bb);
+      hsa_insn_basic *insn;
+      int after_end_number;
+      unsigned bit;
+      bitmap_iterator bi;
+
+      if (m_last_insn)
+	after_end_number = m_last_insn->m_number;
+      else
+	after_end_number = insn_order;
+      /* Everything live-out in this BB has at least an end point
+         after us. */
+      EXECUTE_IF_SET_IN_BITMAP (hbb->m_liveout, 0, bit, bi)
+	note_lr_end (ind2reg[bit], after_end_number);
+
+      for (insn = hbb->m_last_insn; insn; insn = insn->m_prev)
+	{
+	  unsigned opi;
+	  unsigned ndefs = insn->input_count ();
+	  for (opi = 0; opi < insn->operand_count (); opi++)
+	    {
+	      gcc_checking_assert (insn->get_op (opi));
+	      hsa_op_reg **regaddr = insn_reg_addr (insn, opi);
+	      if (regaddr)
+		{
+		  hsa_op_reg *reg = *regaddr;
+		  if (opi < ndefs)
+		    note_lr_begin (reg, insn->m_number);
+		  else
+		    note_lr_end (reg, insn->m_number);
+		}
+	    }
+	}
+
+      /* Everything live-in in this BB has a start point before
+         our first insn.  */
+      int before_start_number;
+      if (hbb->m_first_insn)
+	before_start_number = hbb->m_first_insn->m_number;
+      else
+	before_start_number = after_end_number;
+      before_start_number--;
+      EXECUTE_IF_SET_IN_BITMAP (hbb->m_livein, 0, bit, bi)
+	note_lr_begin (ind2reg[bit], before_start_number);
+
+      if (hbb->m_first_insn)
+	m_last_insn = hbb->m_first_insn;
+    }
+
+  for (i = 0; i < hsa_cfun->m_reg_count; i++)
+    if (ind2reg[i])
+      {
+	/* All regs that have still their start at after all code actually
+	   are defined at the start of the routine (prologue).  */
+	if (ind2reg[i]->m_lr_begin == insn_order)
+	  ind2reg[i]->m_lr_begin = 0;
+	/* All regs that have no use but a def will have lr_end == 0, 
+	   they are actually live from def until after the insn they are
+	   defined in.  */
+	if (ind2reg[i]->m_lr_end == 0)
+	  ind2reg[i]->m_lr_end = ind2reg[i]->m_lr_begin + 1;
+      }
+
+  /* Sort all intervals by increasing start point.  */
+  gcc_assert (ind2reg.length () == (size_t) hsa_cfun->m_reg_count);
+
+#ifdef ENABLE_CHECKING
+  for (unsigned i = 0; i < ind2reg.length (); i++)
+    gcc_assert (ind2reg[i]);
+#endif
+
+  ind2reg.qsort (cmp_begin);
+  for (i = 0; i < 4; i++)
+    active[i].reserve_exact (hsa_cfun->m_reg_count);
+
+  /* Now comes the linear scan allocation.  */
+  for (i = 0; i < hsa_cfun->m_reg_count; i++)
+    {
+      hsa_op_reg *reg = ind2reg[i];
+      if (!reg)
+	continue;
+      expire_old_intervals (reg, active, classes);
+      int cl = m_reg_class_for_type (reg->m_type);
+      if (try_alloc_reg (classes, reg) >= 0)
+	{
+	  unsigned place = active[cl].lower_bound (reg, cmp_end);
+	  active[cl].quick_insert (place, reg);
+	}
+      else
+	spill_at_interval (reg, active);
+
+      /* Some interesting dumping as we go.  */
+      if (dump_file)
+	{
+	  fprintf (dump_file, "  reg%d: [%5d, %5d)->",
+		   reg->m_order, reg->m_lr_begin, reg->m_lr_end);
+	  if (reg->m_reg_class)
+	    fprintf (dump_file, "$%c%i", reg->m_reg_class, reg->m_hard_num);
+	  else
+	    fprintf (dump_file, "[%%__%s_%i]",
+		     hsa_seg_name (reg->m_spill_sym->m_segment),
+		     reg->m_spill_sym->m_name_number);
+	  for (int cl = 0; cl < 4; cl++)
+	    {
+	      bool first = true;
+	      hsa_op_reg *r;
+	      fprintf (dump_file, " {");
+	      for (int j = 0; active[cl].iterate (j, &r); j++)
+		if (first)
+		  {
+		    fprintf (dump_file, "%d", r->m_order);
+		    first = false;
+		  }
+		else
+		  fprintf (dump_file, ", %d", r->m_order);
+	      fprintf (dump_file, "}");
+	    }
+	  fprintf (dump_file, "\n");
+	}
+    }
+
+  BITMAP_FREE (work);
+  free (bbs);
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "------- After liveness: -------\n");
+      dump_hsa_cfun_regalloc (dump_file);
+      fprintf (dump_file, "  ----- Intervals:\n");
+      for (i = 0; i < hsa_cfun->m_reg_count; i++)
+	{
+	  hsa_op_reg *reg = ind2reg[i];
+	  if (!reg)
+	    continue;
+	  fprintf (dump_file, "  reg%d: [%5d, %5d)->", reg->m_order,
+		   reg->m_lr_begin, reg->m_lr_end);
+	  if (reg->m_reg_class)
+	    fprintf (dump_file, "$%c%i\n", reg->m_reg_class, reg->m_hard_num);
+	  else
+	    fprintf (dump_file, "[%%__%s_%i]\n",
+		     hsa_seg_name (reg->m_spill_sym->m_segment),
+		     reg->m_spill_sym->m_name_number);
+	}
+    }
+
+  for (i = 0; i < 4; i++)
+    active[i].release ();
+  ind2reg.release ();
+}
+
+/* Entry point for register allocation.  */
+
+static void
+regalloc (void)
+{
+  basic_block bb;
+  m_reg_class_desc classes[4];
+
+  /* If there are no registers used in the function, exit right away. */
+  if (hsa_cfun->m_reg_count == 0)
+    return;
+
+  memset (classes, 0, sizeof (classes));
+  classes[0].next_avail = 0;
+  classes[0].max_num = 7;
+  classes[0].cl_char = 'c';
+  classes[1].cl_char = 's';
+  classes[2].cl_char = 'd';
+  classes[3].cl_char = 'q';
+
+  for (int i = 1; i < 4; i++)
+    {
+      classes[i].next_avail = 0;
+      classes[i].max_num = 20;
+    }
+
+  linear_scan_regalloc (classes);
+
+  FOR_ALL_BB_FN (bb, cfun)
+    rewrite_code_bb (bb, classes);
+}
+
+/* Out of SSA and register allocation on HSAIL IL.  */
+
+void
+hsa_regalloc (void)
+{
+  naive_outof_ssa ();
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "------- After out-of-SSA: -------\n");
+      dump_hsa_cfun (dump_file);
+    }
+
+  regalloc ();
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "------- After register allocation: -------\n");
+      dump_hsa_cfun (dump_file);
+    }
+}

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [hsa merge 04/10] Avoid extraneous remapping in copy_gimple_seq_and_replace_locals
  2016-01-13 17:55 [hsa merge 00/10] Merge of HSA branch Martin Jambor
  2016-01-13 17:55 ` [hsa merge 01/10] Configury changes and new options Martin Jambor
@ 2016-01-13 17:55 ` Martin Jambor
  2016-01-13 18:02   ` Jakub Jelinek
  2016-01-13 17:55 ` [hsa merge 10/10] HSA register allocator Martin Jambor
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-13 17:55 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jakub Jelinek

[-- Attachment #1: tree-inline.diff --]
[-- Type: text/plain, Size: 3697 bytes --]

Hi,

this patch is new, it addresses a problem I outlined in
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00424.html and it is an
implementation of Jakub's suggestion in
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00614.html

I have refrained from bigger changes in struct copy_body_data in
tree-inline.h as I think that such a cleanup should be done
separately, but the structure could probably use some field-re
ordering to remove padding.

I hope I have grasped it correctly and that the patch is OK for trunk.

Thanks,

Martin


2016-01-13  Martin Jambor  <mjambor@suse.cz>

	* tree-inline.c (remap_decl): Use existing dclarations if
	remapping a type and prevent_decl_creation_for_types.
	(replace_locals_stmt): Do an initial remapping of non-VLA typed
	decls first.  Do real remapping with
	prevent_decl_creation_for_types set.
	* tree-inline.h (copy_body_data): New field
	prevent_decl_creation_for_types, moved remap_var_for_cilk to avoid
	padding.

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 6bf2467..7b34288 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -340,8 +340,22 @@ remap_decl (tree decl, copy_body_data *id)
       return decl;
     }
 
-  /* If we didn't already have an equivalent for this declaration,
-     create one now.  */
+  /* When remapping a type within copy_gimple_seq_and_replace_locals, all
+     necessary DECLs have already been remapped and we do not want to duplicate
+     a decl coming from outside of the sequence we are copying.  */
+  if (!n
+      && id->prevent_decl_creation_for_types
+      && id->remapping_type_depth > 0
+      && (VAR_P (decl) || TREE_CODE (decl) == PARM_DECL))
+    {
+      if (id->do_not_unshare)
+	return decl;
+      else
+	return unshare_expr (decl);
+    }
+
+  /* If we didn't already have an equivalent for this declaration, create one
+     now.  */
   if (!n)
     {
       /* Make a copy of the variable or label.  */
@@ -5225,8 +5239,19 @@ replace_locals_stmt (gimple_stmt_iterator *gsip,
       /* This will remap a lot of the same decls again, but this should be
 	 harmless.  */
       if (gimple_bind_vars (stmt))
-	gimple_bind_set_vars (stmt, remap_decls (gimple_bind_vars (stmt),
-						 NULL, id));
+	{
+	  tree old_var, decls = gimple_bind_vars (stmt);
+
+	  for (old_var = decls; old_var; old_var = DECL_CHAIN (old_var))
+	    if (!can_be_nonlocal (old_var, id)
+		&& ! variably_modified_type_p (TREE_TYPE (old_var), id->src_fn))
+	      remap_decl (old_var, id);
+
+	  gcc_checking_assert (!id->prevent_decl_creation_for_types);
+	  id->prevent_decl_creation_for_types = true;
+	  gimple_bind_set_vars (stmt, remap_decls (decls, NULL, id));
+	  id->prevent_decl_creation_for_types = false;
+	}
     }
 
   /* Keep iterating.  */
diff --git a/gcc/tree-inline.h b/gcc/tree-inline.h
index d3e5229..4cc1f19 100644
--- a/gcc/tree-inline.h
+++ b/gcc/tree-inline.h
@@ -140,14 +140,17 @@ struct copy_body_data
      the originals have been mapped to a value rather than to a
      variable.  */
   hash_map<tree, tree> *debug_map;
- 
-  /* Cilk keywords currently need to replace some variables that
-     ordinary nested functions do not.  */ 
-  bool remap_var_for_cilk;
 
   /* A map from the inlined functions dependence info cliques to
      equivalents in the function into which it is being inlined.  */
   hash_map<dependence_hash, unsigned short> *dependence_map;
+
+  /* Cilk keywords currently need to replace some variables that
+     ordinary nested functions do not.  */
+  bool remap_var_for_cilk;
+
+  /* Do not create new declarations when within type remapping.  */
+  bool prevent_decl_creation_for_types;
 };
 
 /* Weights of constructions for estimate_num_insns.  */

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 04/10] Avoid extraneous remapping in copy_gimple_seq_and_replace_locals
  2016-01-13 17:55 ` [hsa merge 04/10] Avoid extraneous remapping in copy_gimple_seq_and_replace_locals Martin Jambor
@ 2016-01-13 18:02   ` Jakub Jelinek
  0 siblings, 0 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-13 18:02 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches

On Wed, Jan 13, 2016 at 06:39:29PM +0100, Martin Jambor wrote:
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -340,8 +340,22 @@ remap_decl (tree decl, copy_body_data *id)
>        return decl;
>      }
>  
> -  /* If we didn't already have an equivalent for this declaration,
> -     create one now.  */
> +  /* When remapping a type within copy_gimple_seq_and_replace_locals, all
> +     necessary DECLs have already been remapped and we do not want to duplicate
> +     a decl coming from outside of the sequence we are copying.  */
> +  if (!n
> +      && id->prevent_decl_creation_for_types
> +      && id->remapping_type_depth > 0
> +      && (VAR_P (decl) || TREE_CODE (decl) == PARM_DECL))
> +    {
> +      if (id->do_not_unshare)
> +	return decl;
> +      else
> +	return unshare_expr (decl);
> +    }

Just return decl; instead of the { if () return decl; else unshare_expr (decl); }
- both VAR_DECL and PARM_DECL are tcc_declaration for which unshare_expr
doesn't do anything.

Otherwise LGTM.  I'll get to your other patches tomorrow.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [hsa merge 09/10] Majority of the HSA back-end
  2016-01-13 17:55 [hsa merge 00/10] Merge of HSA branch Martin Jambor
                   ` (6 preceding siblings ...)
  2016-01-13 17:55 ` [hsa merge 06/10] Pass manager changes Martin Jambor
@ 2016-01-13 18:11 ` Martin Jambor
  2016-01-14 14:05   ` Jakub Jelinek
  2016-01-15 23:49   ` Martin Jambor
  2016-01-13 18:12 ` [hsa merge 08/10] HSAIL BRIG description header file Martin Jambor
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 77+ messages in thread
From: Martin Jambor @ 2016-01-13 18:11 UTC (permalink / raw)
  To: GCC Patches; +Cc: Martin Liska

[-- Attachment #1: hsa-backend.diff --]
[-- Type: text/plain, Size: 358430 bytes --]

Hi,

the following patch comprises the parts of the HSA back-end that have
been developed by myself and Martin Liska (with some help of our
friends at AMD) - as opposed to the register allocator in the next
patch.

Full description of the back-end would result in an email so long
nobody would read it.  So let me just briefly describe the individual
files.

- hsa.h is the header file for GCC-specific HSA data structures and
  functions shared among a number of compilation units.

- hsa.c contains common HSA-related functionality that was too big to
  be in a header file.

- hsa-gen.c contains the HSA generating pass class and functionality
  required to translate GIMPLE into our own internal representation of
  HSAIL.

- hsa-dump.c contains functions capable of dumping HSA stuff.

- hsa-brig.c is where creation of the BRIG format is implemented.

The hunk in toplev just calls a function in hsa-brig.c that emits out
the created BRIG module at the end of the compilation.

I am not sure who should approve this but I will of course eagerly
help anyone who is willing to do it.

Thanks,

Martin


2016-01-13  Martin Jambor  <mjambor@suse.cz>
            Martin Liska  <mliska@suse.cz>

	* hsa-brig.c: New file.
	* hsa-dump.c: Likewise.
	* hsa-gen.c: Likewise.
	* hsa.c: Likewise.
	* hsa.h: Likewise.
	* toplev.c (compile_file): Call hsa_output_brig.

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
new file mode 100644
index 0000000..17e3398
--- /dev/null
+++ b/gcc/hsa-brig.c
@@ -0,0 +1,2468 @@
+/* Producing binary form of HSA BRIG from our internal representation.
+   Copyright (C) 2013-2016 Free Software Foundation, Inc.
+   Contributed by Martin Jambor <mjambor@suse.cz> and
+   Martin Liska <mliska@suse.cz>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "vec.h"
+#include "hash-table.h"
+#include "hash-map.h"
+#include "tree.h"
+#include "tree-iterator.h"
+#include "stor-layout.h"
+#include "output.h"
+#include "cfg.h"
+#include "function.h"
+#include "fold-const.h"
+#include "stringpool.h"
+#include "gimple-pretty-print.h"
+#include "diagnostic-core.h"
+#include "cgraph.h"
+#include "dumpfile.h"
+#include "print-tree.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+#include "gomp-constants.h"
+
+#define BRIG_ELF_SECTION_NAME ".brig"
+#define BRIG_LABEL_STRING "hsa_brig"
+#define BRIG_SECTION_DATA_NAME    "hsa_data"
+#define BRIG_SECTION_CODE_NAME    "hsa_code"
+#define BRIG_SECTION_OPERAND_NAME "hsa_operand"
+
+#define BRIG_CHUNK_MAX_SIZE (64 * 1024)
+
+/* Chunks of BRIG binary data.  */
+
+struct hsa_brig_data_chunk
+{
+  /* Size of the data already stored into a chunk.  */
+  unsigned size;
+
+  /* Pointer to the data.  */
+  char *data;
+};
+
+/* Structure representing a BRIG section, holding and writing its data.  */
+
+class hsa_brig_section
+{
+public:
+  /* Section name that will be output to the BRIG.  */
+  const char *section_name;
+  /* Size in bytes of all data stored in the section.  */
+  unsigned total_size;
+  /* The size of the header of the section including padding. */
+  unsigned header_byte_count;
+  /* The size of the header of the section without any padding.  */
+  unsigned header_byte_delta;
+
+  /* Buffers of binary data, each containing BRIG_CHUNK_MAX_SIZE bytes.  */
+  vec <struct hsa_brig_data_chunk> chunks;
+
+  /* More convenient access to the last chunk from the vector above. */
+  struct hsa_brig_data_chunk *cur_chunk;
+
+  void allocate_new_chunk ();
+  void init (const char *name);
+  void release ();
+  void output ();
+  unsigned add (const void *data, unsigned len);
+  void round_size_up (int factor);
+  void *get_ptr_by_offset (unsigned int offset);
+};
+
+static struct hsa_brig_section brig_data, brig_code, brig_operand;
+static uint32_t brig_insn_count;
+static bool brig_initialized = false;
+
+/* Mapping between emitted HSA functions and their offset in code segment.  */
+static hash_map<tree, BrigCodeOffset32_t> *function_offsets;
+
+/* Hash map of emitted function declarations.  */
+static hash_map <tree, BrigDirectiveExecutable *> *emitted_declarations;
+
+/* Hash table of emitted internal function declaration offsets.  */
+hash_table <hsa_internal_fn_hasher> *hsa_emitted_internal_decls;
+
+/* List of sbr instructions.  */
+static vec <hsa_insn_sbr *> *switch_instructions;
+
+struct function_linkage_pair
+{
+  function_linkage_pair (tree decl, unsigned int off):
+    function_decl (decl), offset (off) {}
+
+  /* Declaration of called function.  */
+  tree function_decl;
+
+  /* Offset in operand section.  */
+  unsigned int offset;
+};
+
+/* Vector of function calls where we need to resolve function offsets.  */
+static auto_vec <function_linkage_pair> function_call_linkage;
+
+/* Add a new chunk, allocate data for it and initialize it.  */
+
+void
+hsa_brig_section::allocate_new_chunk ()
+{
+  struct hsa_brig_data_chunk new_chunk;
+
+  new_chunk.data = XCNEWVEC (char, BRIG_CHUNK_MAX_SIZE);
+  new_chunk.size = 0;
+  cur_chunk = chunks.safe_push (new_chunk);
+}
+
+/* Initialize the brig section.  */
+
+void
+hsa_brig_section::init (const char *name)
+{
+  section_name = name;
+  /* While the following computation is basically wrong, because the intent
+     certainly wasn't to have the first character of name and padding, which
+     are a part of sizeof (BrigSectionHeader), included in the first addend,
+     this is what the disassembler expects.  */
+  total_size = sizeof (BrigSectionHeader) + strlen(section_name);
+  chunks.create (1);
+  allocate_new_chunk ();
+  header_byte_delta = total_size;
+  round_size_up (4);
+  header_byte_count = total_size;
+}
+
+/* Free all data in the section.  */
+
+void
+hsa_brig_section::release ()
+{
+  for (unsigned i = 0; i < chunks.length (); i++)
+    free (chunks[i].data);
+  chunks.release ();
+  cur_chunk = NULL;
+}
+
+/* Write the section to the output file to a section with the name given at
+   initialization.  Switches the output section and does not restore it.  */
+
+void
+hsa_brig_section::output ()
+{
+  struct BrigSectionHeader section_header;
+  char padding[8];
+
+  section_header.byteCount = htole64 (total_size);
+  section_header.headerByteCount = htole32 (header_byte_count);
+  section_header.nameLength = htole32 (strlen(section_name));
+  assemble_string ((const char*) &section_header, 16);
+  assemble_string (section_name, (section_header.nameLength));
+  memset (&padding, 0, sizeof (padding));
+  /* This is also a consequence of the wrong header size computation described
+     in a comment in hsa_brig_section::init.  */
+  assemble_string (padding, 8);
+  for (unsigned i = 0; i < chunks.length (); i++)
+    assemble_string (chunks[i].data, chunks[i].size);
+}
+
+/* Add to the stream LEN bytes of opaque binary DATA.  Return the offset at
+   which it was stored.  */
+
+unsigned
+hsa_brig_section::add (const void *data, unsigned len)
+{
+  unsigned offset = total_size;
+
+  gcc_assert (len <= BRIG_CHUNK_MAX_SIZE);
+  if (cur_chunk->size > (BRIG_CHUNK_MAX_SIZE - len))
+    allocate_new_chunk ();
+
+  memcpy (cur_chunk->data + cur_chunk->size, data, len);
+  cur_chunk->size += len;
+  total_size += len;
+
+  return offset;
+}
+
+/* Add padding to section so that its size is divisible by FACTOR.  */
+
+void
+hsa_brig_section::round_size_up (int factor)
+{
+  unsigned padding, res = total_size % factor;
+
+  if (res == 0)
+    return;
+
+  padding = factor - res;
+  total_size += padding;
+  if (cur_chunk->size > (BRIG_CHUNK_MAX_SIZE - padding))
+    {
+      padding -= BRIG_CHUNK_MAX_SIZE - cur_chunk->size;
+      cur_chunk->size = BRIG_CHUNK_MAX_SIZE;
+      allocate_new_chunk ();
+    }
+
+  cur_chunk->size += padding;
+}
+
+/* Return pointer to data by global OFFSET in the section.  */
+
+void*
+hsa_brig_section::get_ptr_by_offset (unsigned int offset)
+{
+  gcc_assert (offset < total_size);
+
+  offset -= header_byte_delta;
+  unsigned int i;
+
+  for (i = 0; offset >= chunks[i].size; i++)
+    offset -= chunks[i].size;
+
+  return chunks[i].data + offset;
+}
+
+/* BRIG string data hashing.  */
+
+struct brig_string_slot
+{
+  const char *s;
+  char prefix;
+  int len;
+  uint32_t offset;
+};
+
+/* Hash table helpers.  */
+
+struct brig_string_slot_hasher : pointer_hash <brig_string_slot>
+{
+  static inline hashval_t hash (const value_type);
+  static inline bool equal (const value_type, const compare_type);
+  static inline void remove (value_type);
+};
+
+/* Returns a hash code for DS.  Adapted from libiberty's htab_hash_string
+   to support strings that may not end in '\0'.  */
+
+inline hashval_t
+brig_string_slot_hasher::hash (const value_type ds)
+{
+  hashval_t r = ds->len;
+  int i;
+
+  for (i = 0; i < ds->len; i++)
+     r = r * 67 + (unsigned)ds->s[i] - 113;
+  r = r * 67 + (unsigned)ds->prefix - 113;
+  return r;
+}
+
+/* Returns nonzero if DS1 and DS2 are equal.  */
+
+inline bool
+brig_string_slot_hasher::equal (const value_type ds1, const compare_type ds2)
+{
+  if (ds1->len == ds2->len)
+    return ds1->prefix == ds2->prefix && memcmp (ds1->s, ds2->s, ds1->len) == 0;
+
+  return 0;
+}
+
+/* Deallocate memory for DS upon its removal.  */
+
+inline void
+brig_string_slot_hasher::remove (value_type ds)
+{
+  free (const_cast<char*> (ds->s));
+  free (ds);
+}
+
+/* Hash for strings we output in order not to duplicate them needlessly.  */
+
+static hash_table<brig_string_slot_hasher> *brig_string_htab;
+
+/* Emit a null terminated string STR to the data section and return its
+   offset in it.  If PREFIX is non-zero, output it just before STR too.
+   Sanitize the string if SANITIZE option is set to true.  */
+
+static unsigned
+brig_emit_string (const char *str, char prefix = 0, bool sanitize = true)
+{
+  unsigned slen = strlen (str);
+  unsigned offset, len = slen + (prefix ? 1 : 0);
+  uint32_t hdr_len = htole32 (len);
+  brig_string_slot s_slot;
+  brig_string_slot **slot;
+  char *str2;
+
+  /* XXX Sanitize the names without all the strdup.  */
+  str2 = xstrdup (str);
+
+  if (sanitize)
+    hsa_sanitize_name (str2);
+  s_slot.s = str2;
+  s_slot.len = slen;
+  s_slot.prefix = prefix;
+  s_slot.offset = 0;
+
+  slot = brig_string_htab->find_slot (&s_slot, INSERT);
+  if (*slot == NULL)
+    {
+      brig_string_slot *new_slot = XCNEW (brig_string_slot);
+
+      /* In theory we should fill in BrigData but that would mean copying
+         the string to a buffer for no reason, so we just emulate it. */
+      offset = brig_data.add (&hdr_len, sizeof (hdr_len));
+      if (prefix)
+        brig_data.add (&prefix, 1);
+
+      brig_data.add (str2, slen);
+      brig_data.round_size_up (4);
+
+      /* XXX could use the string we just copied into brig_string->cur_chunk */
+      new_slot->s = str2;
+      new_slot->len = slen;
+      new_slot->prefix = prefix;
+      new_slot->offset = offset;
+      *slot = new_slot;
+    }
+  else
+    {
+      offset = (*slot)->offset;
+      free (str2);
+    }
+
+  return offset;
+}
+
+/* Linked list of queued operands.  */
+
+static struct operand_queue
+{
+  /* First from the chain of queued operands.  */
+  hsa_op_base *first_op, *last_op;
+
+  /* The offset at which the next operand will be enqueued.  */
+  unsigned projected_size;
+
+} op_queue;
+
+/* Unless already initialized, initialize infrastructure to produce BRIG.  */
+
+static void
+brig_init (void)
+{
+  brig_insn_count = 0;
+
+  if (brig_initialized)
+    return;
+
+  brig_string_htab = new hash_table<brig_string_slot_hasher> (37);
+  brig_data.init (BRIG_SECTION_DATA_NAME);
+  brig_code.init (BRIG_SECTION_CODE_NAME);
+  brig_operand.init (BRIG_SECTION_OPERAND_NAME);
+  brig_initialized = true;
+
+  struct BrigDirectiveModule moddir;
+  memset (&moddir, 0, sizeof (moddir));
+  moddir.base.byteCount = htole16 (sizeof (moddir));
+
+  char *modname;
+  if (main_input_filename && *main_input_filename != '\0')
+    {
+      const char *part = strrchr (main_input_filename, '/');
+      if (!part)
+	part = main_input_filename;
+      else
+	part++;
+      asprintf (&modname, "&__hsa_module_%s", part);
+      char* extension = strchr (modname, '.');
+      if (extension)
+	*extension = '\0';
+
+      /* As in LTO mode, we have to emit a different module names.  */
+      if (flag_ltrans)
+	{
+	  part = strrchr (asm_file_name, '/');
+	  if (!part)
+	    part = asm_file_name;
+	  else
+	    part++;
+	  char *modname2;
+	  asprintf (&modname2, "%s_%s", modname, part);
+	  free (modname);
+	  modname = modname2;
+	}
+
+      hsa_sanitize_name (modname);
+      moddir.name = brig_emit_string (modname);
+      free (modname);
+    }
+  else
+    moddir.name = brig_emit_string ("__hsa_module_unnamed", '&');
+  moddir.base.kind = htole16 (BRIG_KIND_DIRECTIVE_MODULE);
+  moddir.hsailMajor = htole32 (BRIG_VERSION_HSAIL_MAJOR) ;
+  moddir.hsailMinor = htole32 (BRIG_VERSION_HSAIL_MINOR);
+  moddir.profile = hsa_full_profile_p () ? BRIG_PROFILE_FULL: BRIG_PROFILE_BASE;
+  if (hsa_machine_large_p ())
+    moddir.machineModel = BRIG_MACHINE_LARGE;
+  else
+    moddir.machineModel = BRIG_MACHINE_SMALL;
+  moddir.defaultFloatRound = BRIG_ROUND_FLOAT_DEFAULT;
+  brig_code.add (&moddir, sizeof (moddir));
+}
+
+/* Free all BRIG data.  */
+
+static void
+brig_release_data (void)
+{
+  delete brig_string_htab;
+  brig_data.release ();
+  brig_code.release ();
+  brig_operand.release ();
+
+  brig_initialized = 0;
+}
+
+/* Enqueue operation OP.  Return the offset at which it will be stored.  */
+
+static unsigned int
+enqueue_op (hsa_op_base *op)
+{
+  unsigned ret;
+
+  if (op->m_brig_op_offset)
+    return op->m_brig_op_offset;
+
+  ret = op_queue.projected_size;
+  op->m_brig_op_offset = op_queue.projected_size;
+
+  if (!op_queue.first_op)
+    op_queue.first_op = op;
+  else
+    op_queue.last_op->m_next = op;
+  op_queue.last_op = op;
+
+  if (is_a <hsa_op_immed *> (op))
+    op_queue.projected_size += sizeof (struct BrigOperandConstantBytes);
+  else if (is_a <hsa_op_reg *> (op))
+    op_queue.projected_size += sizeof (struct BrigOperandRegister);
+  else if (is_a <hsa_op_address *> (op))
+    op_queue.projected_size += sizeof (struct BrigOperandAddress);
+  else if (is_a <hsa_op_code_ref *> (op))
+    op_queue.projected_size += sizeof (struct BrigOperandCodeRef);
+  else if (is_a <hsa_op_code_list *> (op))
+    op_queue.projected_size += sizeof (struct BrigOperandCodeList);
+  else if (is_a <hsa_op_operand_list *> (op))
+    op_queue.projected_size += sizeof (struct BrigOperandOperandList);
+  else
+    gcc_unreachable ();
+  return ret;
+}
+
+
+/* Emit directive describing a symbol if it has not been emitted already.
+   Return the offset of the directive.  */
+
+static unsigned
+emit_directive_variable (struct hsa_symbol *symbol)
+{
+  struct BrigDirectiveVariable dirvar;
+  unsigned name_offset;
+  static unsigned res_name_offset;
+
+  if (symbol->m_directive_offset)
+    return symbol->m_directive_offset;
+
+  memset (&dirvar, 0, sizeof (dirvar));
+  dirvar.base.byteCount = htole16 (sizeof (dirvar));
+  dirvar.base.kind = htole16 (BRIG_KIND_DIRECTIVE_VARIABLE);
+  dirvar.allocation = symbol->m_allocation;
+
+  char prefix = symbol->m_global_scope_p ? '&' : '%';
+
+  if (symbol->m_decl && TREE_CODE (symbol->m_decl) == RESULT_DECL)
+    {
+      if (res_name_offset == 0)
+	res_name_offset = brig_emit_string (symbol->m_name, '%');
+      name_offset = res_name_offset;
+    }
+  else if (symbol->m_name)
+    name_offset = brig_emit_string (symbol->m_name, prefix);
+  else
+    {
+      char buf[64];
+      sprintf (buf, "__%s_%i", hsa_seg_name (symbol->m_segment),
+	       symbol->m_name_number);
+      name_offset = brig_emit_string (buf, prefix);
+    }
+
+  dirvar.name = htole32 (name_offset);
+  dirvar.init = 0;
+  dirvar.type = htole16 (symbol->m_type);
+  dirvar.segment = symbol->m_segment;
+  /* TODO: Once we are able to access global variables, we must copy their
+     alignment.  */
+  dirvar.align = MAX (hsa_natural_alignment (dirvar.type),
+		      (BrigAlignment8_t) BRIG_ALIGNMENT_4);
+  dirvar.linkage = symbol->m_linkage;
+  dirvar.dim.lo = (uint32_t) symbol->m_dim;
+  dirvar.dim.hi = (uint32_t) ((unsigned long long) symbol->m_dim >> 32);
+
+  /* Global variables are just declared and linked via HSA runtime.  */
+  if (symbol->m_linkage != BRIG_ALLOCATION_PROGRAM)
+    dirvar.modifier.allBits |= BRIG_VARIABLE_DEFINITION;
+  dirvar.reserved = 0;
+
+  if (symbol->m_cst_value)
+    {
+      dirvar.modifier.allBits |= BRIG_VARIABLE_CONST;
+      dirvar.init = htole32 (enqueue_op (symbol->m_cst_value));
+    }
+
+  symbol->m_directive_offset = brig_code.add (&dirvar, sizeof (dirvar));
+  return symbol->m_directive_offset;
+}
+
+/* Emit directives describing either a function declaration or
+   definition F.  */
+
+static BrigDirectiveExecutable *
+emit_function_directives (hsa_function_representation *f, bool is_declaration)
+{
+  struct BrigDirectiveExecutable fndir;
+  unsigned name_offset, inarg_off, scoped_off, next_toplev_off;
+  int count = 0;
+  BrigDirectiveExecutable *ptr_to_fndir;
+  hsa_symbol *sym;
+
+  if (!f->m_declaration_p)
+    for (int i = 0; f->m_global_symbols.iterate (i, &sym); i++)
+      {
+	emit_directive_variable (sym);
+	brig_insn_count++;
+      }
+
+  name_offset = brig_emit_string (f->m_name, '&');
+  inarg_off = brig_code.total_size + sizeof(fndir)
+    + (f->m_output_arg ? sizeof (struct BrigDirectiveVariable) : 0);
+  scoped_off = inarg_off
+    + f->m_input_args.length () * sizeof (struct BrigDirectiveVariable);
+
+  if (!f->m_declaration_p)
+    {
+      count += f->m_spill_symbols.length ();
+      count += f->m_private_variables.length ();
+    }
+
+  next_toplev_off = scoped_off + count * sizeof (struct BrigDirectiveVariable);
+
+  memset (&fndir, 0, sizeof (fndir));
+  fndir.base.byteCount = htole16 (sizeof (fndir));
+  fndir.base.kind = htole16 (f->m_kern_p ? BRIG_KIND_DIRECTIVE_KERNEL
+			     : BRIG_KIND_DIRECTIVE_FUNCTION);
+  fndir.name = htole32 (name_offset);
+  fndir.inArgCount = htole16 (f->m_input_args.length ());
+  fndir.outArgCount = htole16 (f->m_output_arg ? 1 : 0);
+  fndir.firstInArg = htole32 (inarg_off);
+  fndir.firstCodeBlockEntry = htole32 (scoped_off);
+  fndir.nextModuleEntry = htole32 (next_toplev_off);
+  fndir.linkage = f->get_linkage ();
+  if (!f->m_declaration_p)
+    fndir.modifier.allBits |= BRIG_EXECUTABLE_DEFINITION;
+  memset (&fndir.reserved, 0, sizeof (fndir.reserved));
+
+  /* Once we put a definition of function_offsets, we should not overwrite
+     it with a declaration of the function.  */
+  if (f->m_internal_fn == NULL)
+    {
+      if (!function_offsets->get (f->m_decl) || !is_declaration)
+	function_offsets->put (f->m_decl, brig_code.total_size);
+    }
+  else
+    {
+      /* Internal function.  */
+      hsa_internal_fn **slot = hsa_emitted_internal_decls->find_slot
+	(f->m_internal_fn, INSERT);
+      hsa_internal_fn *int_fn = new hsa_internal_fn (f->m_internal_fn);
+      int_fn->m_offset = brig_code.total_size;
+      *slot = int_fn;
+    }
+
+  brig_code.add (&fndir, sizeof (fndir));
+  /* XXX terrible hack: we need to set instCount after we emit all
+     insns, but we need to emit directive in order, and we emit directives
+     during insn emitting.  So we need to emit the FUNCTION directive
+     early, then the insns, and then we need to set instCount, so remember
+     a pointer to it, in some horrible way.  cur_chunk.data+size points
+     directly to after fndir here.  */
+  ptr_to_fndir
+      = (BrigDirectiveExecutable *)(brig_code.cur_chunk->data
+                                    + brig_code.cur_chunk->size
+                                    - sizeof (fndir));
+
+  if (f->m_output_arg)
+    emit_directive_variable (f->m_output_arg);
+  for (unsigned i = 0; i < f->m_input_args.length (); i++)
+    emit_directive_variable (f->m_input_args[i]);
+
+  if (!f->m_declaration_p)
+    {
+      for (int i = 0; f->m_spill_symbols.iterate (i, &sym); i++)
+	{
+	  emit_directive_variable (sym);
+	  brig_insn_count++;
+	}
+      for (unsigned i = 0; i < f->m_private_variables.length (); i++)
+	{
+	  emit_directive_variable (f->m_private_variables[i]);
+	  brig_insn_count++;
+	}
+    }
+
+  return ptr_to_fndir;
+}
+
+/* Emit a label directive for the given HBB.  We assume it is about to start on
+   the current offset in the code section.  */
+
+static void
+emit_bb_label_directive (hsa_bb *hbb)
+{
+  struct BrigDirectiveLabel lbldir;
+  char buf[32];
+
+  lbldir.base.byteCount = htole16 (sizeof (lbldir));
+  lbldir.base.kind = htole16 (BRIG_KIND_DIRECTIVE_LABEL);
+  sprintf (buf, "BB_%u_%i", DECL_UID (current_function_decl), hbb->m_index);
+  lbldir.name = htole32 (brig_emit_string (buf, '@'));
+
+  hbb->m_label_ref.m_directive_offset = brig_code.add (&lbldir, sizeof (lbldir));
+  brig_insn_count++;
+}
+
+/* Map a normal HSAIL type to the type of the equivalent BRIG operand
+   holding such, for constants and registers.  */
+
+static BrigType16_t
+regtype_for_type (BrigType16_t t)
+{
+  switch (t)
+    {
+    case BRIG_TYPE_B1:
+      return BRIG_TYPE_B1;
+
+    case BRIG_TYPE_U8:
+    case BRIG_TYPE_U16:
+    case BRIG_TYPE_U32:
+    case BRIG_TYPE_S8:
+    case BRIG_TYPE_S16:
+    case BRIG_TYPE_S32:
+    case BRIG_TYPE_B8:
+    case BRIG_TYPE_B16:
+    case BRIG_TYPE_B32:
+    case BRIG_TYPE_F16:
+    case BRIG_TYPE_F32:
+    case BRIG_TYPE_U8X4:
+    case BRIG_TYPE_U16X2:
+    case BRIG_TYPE_S8X4:
+    case BRIG_TYPE_S16X2:
+    case BRIG_TYPE_F16X2:
+      return BRIG_TYPE_B32;
+
+    case BRIG_TYPE_U64:
+    case BRIG_TYPE_S64:
+    case BRIG_TYPE_F64:
+    case BRIG_TYPE_B64:
+    case BRIG_TYPE_U8X8:
+    case BRIG_TYPE_U16X4:
+    case BRIG_TYPE_U32X2:
+    case BRIG_TYPE_S8X8:
+    case BRIG_TYPE_S16X4:
+    case BRIG_TYPE_S32X2:
+    case BRIG_TYPE_F16X4:
+    case BRIG_TYPE_F32X2:
+      return BRIG_TYPE_B64;
+
+    case BRIG_TYPE_B128:
+    case BRIG_TYPE_U8X16:
+    case BRIG_TYPE_U16X8:
+    case BRIG_TYPE_U32X4:
+    case BRIG_TYPE_U64X2:
+    case BRIG_TYPE_S8X16:
+    case BRIG_TYPE_S16X8:
+    case BRIG_TYPE_S32X4:
+    case BRIG_TYPE_S64X2:
+    case BRIG_TYPE_F16X8:
+    case BRIG_TYPE_F32X4:
+    case BRIG_TYPE_F64X2:
+      return BRIG_TYPE_B128;
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return the length of the BRIG type TYPE that is going to be streamed out as
+   an immediate constant (so it must not be B1).  */
+
+unsigned
+hsa_get_imm_brig_type_len (BrigType16_t type)
+{
+  BrigType16_t base_type = type & BRIG_TYPE_BASE_MASK;
+  BrigType16_t pack_type = type & BRIG_TYPE_PACK_MASK;
+
+  switch (pack_type)
+    {
+    case BRIG_TYPE_PACK_NONE:
+      break;
+    case BRIG_TYPE_PACK_32:
+      return 4;
+    case BRIG_TYPE_PACK_64:
+      return 8;
+    case BRIG_TYPE_PACK_128:
+      return 16;
+    default:
+      gcc_unreachable ();
+    }
+
+  switch (base_type)
+    {
+    case BRIG_TYPE_U8:
+    case BRIG_TYPE_S8:
+    case BRIG_TYPE_B8:
+      return 1;
+    case BRIG_TYPE_U16:
+    case BRIG_TYPE_S16:
+    case BRIG_TYPE_F16:
+    case BRIG_TYPE_B16:
+      return 2;
+    case BRIG_TYPE_U32:
+    case BRIG_TYPE_S32:
+    case BRIG_TYPE_F32:
+    case BRIG_TYPE_B32:
+      return 4;
+    case BRIG_TYPE_U64:
+    case BRIG_TYPE_S64:
+    case BRIG_TYPE_F64:
+    case BRIG_TYPE_B64:
+      return 8;
+    case BRIG_TYPE_B128:
+      return 16;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Emit one scalar VALUE to the buffer DATA intended for BRIG emission.
+   If NEED_LEN is not equal to zero, shrink or extend the value
+   to NEED_LEN bytes.  Return how many bytes were written.  */
+
+static int
+emit_immediate_scalar_to_buffer (tree value, char *data, unsigned need_len)
+{
+  union hsa_bytes bytes;
+
+  memset (&bytes, 0, sizeof (bytes));
+  tree type = TREE_TYPE (value);
+  gcc_checking_assert (TREE_CODE (type) != VECTOR_TYPE);
+
+  unsigned data_len = tree_to_uhwi (TYPE_SIZE (type))/BITS_PER_UNIT;
+  if (INTEGRAL_TYPE_P (type)
+      || (POINTER_TYPE_P (type) && TREE_CODE (value) == INTEGER_CST))
+    switch (data_len)
+      {
+      case 1:
+	bytes.b8 = (uint8_t) TREE_INT_CST_LOW (value);
+	break;
+      case 2:
+	bytes.b16 = (uint16_t) TREE_INT_CST_LOW (value);
+	break;
+      case 4:
+	bytes.b32 = (uint32_t) TREE_INT_CST_LOW (value);
+	break;
+      case 8:
+	bytes.b64 = (uint64_t) int_cst_value (value);
+	break;
+      default:
+	gcc_unreachable ();
+      }
+  else if (SCALAR_FLOAT_TYPE_P (type))
+    {
+      if (data_len == 2)
+	{
+	  sorry ("Support for HSA does not implement immediate 16 bit FPU "
+		 "operands");
+	  return 2;
+	}
+      unsigned int_len = GET_MODE_SIZE (TYPE_MODE (type));
+      /* There are always 32 bits in each long, no matter the size of
+	 the hosts long.  */
+      long tmp[6];
+
+      real_to_target (tmp, TREE_REAL_CST_PTR (value), TYPE_MODE (type));
+
+      if (int_len == 4)
+	bytes.b32 = (uint32_t) tmp[0];
+      else
+	{
+	  bytes.b64 = (uint64_t)(uint32_t) tmp[1];
+	  bytes.b64 <<= 32;
+	  bytes.b64 |= (uint32_t) tmp[0];
+	}
+    }
+  else
+    gcc_unreachable ();
+
+  int len;
+  if (need_len == 0)
+    len = data_len;
+  else
+    len = need_len;
+
+  memcpy (data, &bytes, len);
+  return len;
+}
+
+void
+hsa_op_immed::emit_to_buffer (tree value)
+{
+  unsigned total_len = m_brig_repr_size;
+
+  /* As we can have a constructor with fewer elements, fill the memory
+     with zeros.  */
+  m_brig_repr = XCNEWVEC (char, total_len);
+  char *p = m_brig_repr;
+
+  if (TREE_CODE (value) == VECTOR_CST)
+    {
+      int i, num = VECTOR_CST_NELTS (value);
+      for (i = 0; i < num; i++)
+	{
+	  unsigned actual;
+	  actual = emit_immediate_scalar_to_buffer
+	    (VECTOR_CST_ELT (value, i), p, 0);
+	  total_len -= actual;
+	  p += actual;
+	}
+      /* Vectors should have the exact size.  */
+      gcc_assert (total_len == 0);
+    }
+  else if (TREE_CODE (value) == STRING_CST)
+    memcpy (m_brig_repr, TREE_STRING_POINTER (value),
+	    TREE_STRING_LENGTH (value));
+  else if (TREE_CODE (value) == COMPLEX_CST)
+    {
+      gcc_assert (total_len % 2 == 0);
+      unsigned actual;
+      actual = emit_immediate_scalar_to_buffer
+	(TREE_REALPART (value), p, total_len / 2);
+
+      gcc_assert (actual == total_len / 2);
+      p += actual;
+
+      actual = emit_immediate_scalar_to_buffer
+	(TREE_IMAGPART (value), p, total_len / 2);
+      gcc_assert (actual == total_len / 2);
+    }
+  else if (TREE_CODE (value) == CONSTRUCTOR)
+    {
+      unsigned len = vec_safe_length (CONSTRUCTOR_ELTS (value));
+      for (unsigned i = 0; i < len; i++)
+	{
+	  unsigned actual = emit_immediate_scalar_to_buffer
+	    (CONSTRUCTOR_ELT (value, i)->value, p, 0);
+	  total_len -= actual;
+	  p += actual;
+	}
+    }
+  else
+    emit_immediate_scalar_to_buffer (value, p, total_len);
+}
+
+/* Emit an immediate BRIG operand IMM.  The BRIG type of the immediate might
+   have been massaged to comply with various HSA/BRIG type requirements, so the
+   only important aspect of that is the length (because HSAIL might expect
+   smaller constants or become bit-data).  The data should be represented
+   according to what is in the tree representation.  */
+
+static void
+emit_immediate_operand (hsa_op_immed *imm)
+{
+  struct BrigOperandConstantBytes out;
+
+  memset (&out, 0, sizeof (out));
+  out.base.byteCount = htole16 (sizeof (out));
+  out.base.kind = htole16 (BRIG_KIND_OPERAND_CONSTANT_BYTES);
+  uint32_t byteCount = htole32 (imm->m_brig_repr_size);
+  out.type = htole16 (imm->m_type);
+  out.bytes = htole32 (brig_data.add (&byteCount, sizeof (byteCount)));
+  brig_operand.add (&out, sizeof(out));
+  brig_data.add (imm->m_brig_repr, imm->m_brig_repr_size);
+  brig_data.round_size_up (4);
+}
+
+/* Emit a register BRIG operand REG.  */
+
+static void
+emit_register_operand (hsa_op_reg *reg)
+{
+  struct BrigOperandRegister out;
+
+  out.base.byteCount = htole16 (sizeof (out));
+  out.base.kind = htole16 (BRIG_KIND_OPERAND_REGISTER);
+  out.regNum = htole32 (reg->m_hard_num);
+
+  switch (regtype_for_type (reg->m_type))
+    {
+    case BRIG_TYPE_B32:
+      out.regKind = BRIG_REGISTER_KIND_SINGLE;
+      break;
+    case BRIG_TYPE_B64:
+      out.regKind = BRIG_REGISTER_KIND_DOUBLE;
+      break;
+    case BRIG_TYPE_B128:
+      out.regKind = BRIG_REGISTER_KIND_QUAD;
+      break;
+    case BRIG_TYPE_B1:
+      out.regKind = BRIG_REGISTER_KIND_CONTROL;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  brig_operand.add (&out, sizeof (out));
+}
+
+/* Emit an address BRIG operand ADDR.  */
+
+static void
+emit_address_operand (hsa_op_address *addr)
+{
+  struct BrigOperandAddress out;
+
+  out.base.byteCount = htole16 (sizeof (out));
+  out.base.kind = htole16 (BRIG_KIND_OPERAND_ADDRESS);
+  out.symbol = addr->m_symbol
+    ? htole32 (emit_directive_variable (addr->m_symbol)) : 0;
+  out.reg = addr->m_reg ? htole32 (enqueue_op (addr->m_reg)) : 0;
+
+  if (sizeof (addr->m_imm_offset) == 8)
+    {
+      out.offset.lo = htole32 ((uint32_t)addr->m_imm_offset);
+      out.offset.hi = htole32 (((long long) addr->m_imm_offset) >> 32);
+    }
+  else
+    {
+      gcc_assert (sizeof (addr->m_imm_offset) == 4);
+      out.offset.lo = htole32 (addr->m_imm_offset);
+      out.offset.hi = 0;
+    }
+
+  brig_operand.add (&out, sizeof (out));
+}
+
+/* Emit a code reference operand REF.  */
+
+static void
+emit_code_ref_operand (hsa_op_code_ref *ref)
+{
+  struct BrigOperandCodeRef out;
+
+  out.base.byteCount = htole16 (sizeof (out));
+  out.base.kind = htole16 (BRIG_KIND_OPERAND_CODE_REF);
+  out.ref = htole32 (ref->m_directive_offset);
+  brig_operand.add (&out, sizeof (out));
+}
+
+/* Emit a code list operand CODE_LIST.  */
+
+static void
+emit_code_list_operand (hsa_op_code_list *code_list)
+{
+  struct BrigOperandCodeList out;
+  unsigned args = code_list->m_offsets.length ();
+
+  for (unsigned i = 0; i < args; i++)
+    gcc_assert (code_list->m_offsets[i]);
+
+  out.base.byteCount = htole16 (sizeof (out));
+  out.base.kind = htole16 (BRIG_KIND_OPERAND_CODE_LIST);
+
+  uint32_t byteCount = htole32 (4 * args);
+
+  out.elements = htole32 (brig_data.add (&byteCount, sizeof (byteCount)));
+  brig_data.add (code_list->m_offsets.address (), args * sizeof (uint32_t));
+  brig_data.round_size_up (4);
+  brig_operand.add (&out, sizeof (out));
+}
+
+/* Emit an operand list operand OPERAND_LIST.  */
+
+static void
+emit_operand_list_operand (hsa_op_operand_list *operand_list)
+{
+  struct BrigOperandOperandList out;
+  unsigned args = operand_list->m_offsets.length ();
+
+  for (unsigned i = 0; i < args; i++)
+    gcc_assert (operand_list->m_offsets[i]);
+
+  out.base.byteCount = htole16 (sizeof (out));
+  out.base.kind = htole16 (BRIG_KIND_OPERAND_OPERAND_LIST);
+
+  uint32_t byteCount = htole32 (4 * args);
+
+  out.elements = htole32 (brig_data.add (&byteCount, sizeof (byteCount)));
+  brig_data.add (operand_list->m_offsets.address (), args * sizeof (uint32_t));
+  brig_data.round_size_up (4);
+  brig_operand.add (&out, sizeof (out));
+}
+
+/* Emit all operands queued for writing.  */
+
+static void
+emit_queued_operands (void)
+{
+  for (hsa_op_base *op = op_queue.first_op; op; op = op->m_next)
+    {
+      gcc_assert (op->m_brig_op_offset == brig_operand.total_size);
+      if (hsa_op_immed *imm = dyn_cast <hsa_op_immed *> (op))
+	emit_immediate_operand (imm);
+      else if (hsa_op_reg *reg = dyn_cast <hsa_op_reg *> (op))
+	emit_register_operand (reg);
+      else if (hsa_op_address *addr = dyn_cast <hsa_op_address *> (op))
+	emit_address_operand (addr);
+      else if (hsa_op_code_ref *ref = dyn_cast <hsa_op_code_ref *> (op))
+	emit_code_ref_operand (ref);
+      else if (hsa_op_code_list *code_list = dyn_cast <hsa_op_code_list *> (op))
+	emit_code_list_operand (code_list);
+      else if (hsa_op_operand_list *l = dyn_cast <hsa_op_operand_list *> (op))
+	emit_operand_list_operand (l);
+      else
+	gcc_unreachable ();
+    }
+}
+
+/* Emit directives describing the function that is used for
+   a function declaration.  */
+
+static BrigDirectiveExecutable *
+emit_function_declaration (tree decl)
+{
+  hsa_function_representation *f = hsa_generate_function_declaration (decl);
+
+  BrigDirectiveExecutable *e = emit_function_directives (f, true);
+  emit_queued_operands ();
+
+  delete f;
+
+  return e;
+}
+
+/* Emit directives describing the function that is used for
+   an internal function declaration.  */
+
+static BrigDirectiveExecutable *
+emit_internal_fn_decl (hsa_internal_fn *fn)
+{
+  hsa_function_representation *f = hsa_generate_internal_fn_decl (fn);
+
+  BrigDirectiveExecutable *e = emit_function_directives (f, true);
+  emit_queued_operands ();
+
+  delete f;
+
+  return e;
+}
+
+
+/* Enqueue all operands of INSN and return offset to BRIG data section
+   to list of operand offsets.  */
+
+static unsigned
+emit_insn_operands (hsa_insn_basic *insn)
+{
+  auto_vec<BrigOperandOffset32_t, HSA_BRIG_INT_STORAGE_OPERANDS>
+    operand_offsets;
+
+  unsigned l = insn->operand_count ();
+  operand_offsets.safe_grow (l);
+
+  for (unsigned i = 0; i < l; i++)
+    operand_offsets[i] = htole32 (enqueue_op (insn->get_op (i)));
+
+  /* We have N operands so use 4 * N for the byte_count.  */
+  uint32_t byte_count = htole32 (4 * l);
+
+  unsigned offset = brig_data.add (&byte_count, sizeof (byte_count));
+  brig_data.add (operand_offsets.address (),
+		 l * sizeof (BrigOperandOffset32_t));
+
+  brig_data.round_size_up (4);
+
+  return offset;
+}
+
+/* Enqueue operand OP0, OP1, OP2 (if different from NULL) and return offset
+   to BRIG data section to list of operand offsets.  */
+
+static unsigned
+emit_operands (hsa_op_base *op0, hsa_op_base *op1 = NULL,
+	       hsa_op_base *op2 = NULL)
+{
+  auto_vec<BrigOperandOffset32_t, HSA_BRIG_INT_STORAGE_OPERANDS>
+    operand_offsets;
+
+  gcc_checking_assert (op0 != NULL);
+  operand_offsets.safe_push (enqueue_op (op0));
+
+  if (op1 != NULL)
+    {
+      operand_offsets.safe_push (enqueue_op (op1));
+      if (op2 != NULL)
+	operand_offsets.safe_push (enqueue_op (op2));
+    }
+
+  unsigned l = operand_offsets.length ();
+
+  /* We have N operands so use 4 * N for the byte_count.  */
+  uint32_t byte_count = htole32 (4 * l);
+
+  unsigned offset = brig_data.add (&byte_count, sizeof (byte_count));
+  brig_data.add (operand_offsets.address (),
+		 l * sizeof (BrigOperandOffset32_t));
+
+  brig_data.round_size_up (4);
+
+  return offset;
+}
+
+/* Emit an HSA memory instruction and all necessary directives, schedule
+   necessary operands for writing .  */
+
+static void
+emit_memory_insn (hsa_insn_mem *mem)
+{
+  struct BrigInstMem repr;
+  gcc_checking_assert (mem->operand_count () == 2);
+
+  hsa_op_address *addr = as_a <hsa_op_address *> (mem->get_op (1));
+
+  /* This is necessary because of the erroneous typedef of
+     BrigMemoryModifier8_t which introduces padding which may then contain
+     random stuff (which we do not want so that we can test things don't
+     change).  */
+  memset (&repr, 0, sizeof (repr));
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_MEM);
+  repr.base.opcode = htole16 (mem->m_opcode);
+  repr.base.type = htole16 (mem->m_type);
+  repr.base.operands = htole32 (emit_insn_operands (mem));
+
+  if (addr->m_symbol)
+    repr.segment = addr->m_symbol->m_segment;
+  else
+    repr.segment = BRIG_SEGMENT_FLAT;
+  repr.modifier.allBits = 0 ;
+  repr.equivClass = mem->m_equiv_class;
+  repr.align = mem->m_align;
+  if (mem->m_opcode == BRIG_OPCODE_LD)
+    repr.width = BRIG_WIDTH_1;
+  else
+    repr.width = BRIG_WIDTH_NONE;
+  memset (&repr.reserved, 0, sizeof (repr.reserved));
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit an HSA signal memory instruction and all necessary directives, schedule
+   necessary operands for writing.  */
+
+static void
+emit_signal_insn (hsa_insn_signal *mem)
+{
+  struct BrigInstSignal repr;
+
+  /* This is necessary because of the erroneous typedef of
+     BrigMemoryModifier8_t which introduces padding which may then contain
+     random stuff (which we do not want so that we can test things don't
+     change).  */
+  memset (&repr, 0, sizeof (repr));
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_SIGNAL);
+  repr.base.opcode = htole16 (mem->m_opcode);
+  repr.base.type = htole16 (mem->m_type);
+  repr.base.operands = htole32 (emit_insn_operands (mem));
+
+  repr.memoryOrder = mem->m_memoryorder;
+  repr.signalOperation = mem->m_atomicop;
+  repr.signalType = BRIG_TYPE_SIG64;
+
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit an HSA atomic memory instruction and all necessary directives, schedule
+   necessary operands for writing .  */
+
+static void
+emit_atomic_insn (hsa_insn_atomic *mem)
+{
+  struct BrigInstAtomic repr;
+
+  /* Either operand[0] or operand[1] must be an address operand.  */
+  hsa_op_address *addr = NULL;
+  if (is_a <hsa_op_address *> (mem->get_op (0)))
+    addr = as_a <hsa_op_address *> (mem->get_op (0));
+  else
+    addr = as_a <hsa_op_address *> (mem->get_op (1));
+
+  /* This is necessary because of the erroneous typedef of
+     BrigMemoryModifier8_t which introduces padding which may then contain
+     random stuff (which we do not want so that we can test things don't
+     change).  */
+  memset (&repr, 0, sizeof (repr));
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_ATOMIC);
+  repr.base.opcode = htole16 (mem->m_opcode);
+  repr.base.type = htole16 (mem->m_type);
+  repr.base.operands = htole32 (emit_insn_operands (mem));
+
+  if (addr->m_symbol)
+    repr.segment = addr->m_symbol->m_segment;
+  else
+    repr.segment = BRIG_SEGMENT_FLAT;
+  repr.memoryOrder = mem->m_memoryorder;
+  repr.memoryScope = mem->m_memoryscope;
+  repr.atomicOperation = mem->m_atomicop;
+
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit an HSA LDA instruction and all necessary directives, schedule
+   necessary operands for writing .  */
+
+static void
+emit_addr_insn (hsa_insn_basic *insn)
+{
+  struct BrigInstAddr repr;
+
+  hsa_op_address *addr = as_a <hsa_op_address *> (insn->get_op (1));
+
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_ADDR);
+  repr.base.opcode = htole16 (insn->m_opcode);
+  repr.base.type = htole16 (insn->m_type);
+  repr.base.operands = htole32 (emit_insn_operands (insn));
+
+  if (addr->m_symbol)
+    repr.segment = addr->m_symbol->m_segment;
+  else
+    repr.segment = BRIG_SEGMENT_FLAT;
+  memset (&repr.reserved, 0, sizeof (repr.reserved));
+
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit an HSA segment conversion instruction and all necessary directives,
+   schedule necessary operands for writing .  */
+
+static void
+emit_segment_insn (hsa_insn_seg *seg)
+{
+  struct BrigInstSegCvt repr;
+
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_SEG_CVT);
+  repr.base.opcode = htole16 (seg->m_opcode);
+  repr.base.type = htole16 (seg->m_type);
+  repr.base.operands = htole32 (emit_insn_operands (seg));
+  repr.sourceType = htole16 (as_a <hsa_op_reg *> (seg->get_op (1))->m_type);
+  repr.segment = seg->m_segment;
+  repr.modifier.allBits = 0;
+
+  brig_code.add (&repr, sizeof (repr));
+
+  brig_insn_count++;
+}
+
+/* Emit an HSA alloca instruction and all necessary directives,
+   schedule necessary operands for writing .  */
+
+static void
+emit_alloca_insn (hsa_insn_alloca *alloca)
+{
+  struct BrigInstMem repr;
+  gcc_checking_assert (alloca->operand_count () == 2);
+
+  /* This is necessary because of the erroneous typedef of
+     BrigMemoryModifier8_t which introduces padding which may then contain
+     random stuff (which we do not want so that we can test things don't
+     change).  */
+  memset (&repr, 0, sizeof (repr));
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_MEM);
+  repr.base.opcode = htole16 (alloca->m_opcode);
+  repr.base.type = htole16 (alloca->m_type);
+  repr.base.operands = htole32 (emit_insn_operands (alloca));
+  repr.segment = BRIG_SEGMENT_PRIVATE;
+  repr.modifier.allBits = 0 ;
+  repr.equivClass = 0;
+  repr.align = alloca->m_align;
+  repr.width = BRIG_WIDTH_NONE;
+  memset (&repr.reserved, 0, sizeof (repr.reserved));
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit an HSA comparison instruction and all necessary directives,
+   schedule necessary operands for writing .  */
+
+static void
+emit_cmp_insn (hsa_insn_cmp *cmp)
+{
+  struct BrigInstCmp repr;
+
+  memset (&repr, 0, sizeof (repr));
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_CMP);
+  repr.base.opcode = htole16 (cmp->m_opcode);
+  repr.base.type = htole16 (cmp->m_type);
+  repr.base.operands = htole32 (emit_insn_operands (cmp));
+
+  if (is_a <hsa_op_reg *> (cmp->get_op (1)))
+    repr.sourceType = htole16 (as_a <hsa_op_reg *> (cmp->get_op (1))->m_type);
+  else
+    repr.sourceType = htole16 (as_a <hsa_op_immed *> (cmp->get_op (1))->m_type);
+  repr.modifier.allBits = 0;
+  repr.compare = cmp->m_compare;
+  repr.pack = 0;
+
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit an HSA branching instruction and all necessary directives, schedule
+   necessary operands for writing .  */
+
+static void
+emit_branch_insn (hsa_insn_br *br)
+{
+  struct BrigInstBr repr;
+
+  basic_block target = NULL;
+  edge_iterator ei;
+  edge e;
+
+  /* At the moment we only handle direct conditional jumps.  */
+  gcc_assert (br->m_opcode == BRIG_OPCODE_CBR);
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_BR);
+  repr.base.opcode = htole16 (br->m_opcode);
+  repr.width = BRIG_WIDTH_1;
+  /* For Conditional jumps the type is always B1.  */
+  repr.base.type = htole16 (BRIG_TYPE_B1);
+
+  FOR_EACH_EDGE (e, ei, br->m_bb->succs)
+    if (e->flags & EDGE_TRUE_VALUE)
+      {
+	target = e->dest;
+	break;
+      }
+  gcc_assert (target);
+
+  repr.base.operands = htole32
+    (emit_operands (br->get_op (0), &hsa_bb_for_bb (target)->m_label_ref));
+  memset (&repr.reserved, 0, sizeof (repr.reserved));
+
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit an HSA unconditional jump branching instruction that points to
+   a label REFERENCE.  */
+
+static void
+emit_unconditional_jump (hsa_op_code_ref *reference)
+{
+  struct BrigInstBr repr;
+
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_BR);
+  repr.base.opcode = htole16 (BRIG_OPCODE_BR);
+  repr.base.type = htole16 (BRIG_TYPE_NONE);
+  /* Direct branches to labels must be width(all).  */
+  repr.width = BRIG_WIDTH_ALL;
+
+  repr.base.operands = htole32 (emit_operands (reference));
+  memset (&repr.reserved, 0, sizeof (repr.reserved));
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit an HSA switch jump instruction that uses a jump table to
+   jump to a destination label.  */
+
+static void
+emit_switch_insn (hsa_insn_sbr *sbr)
+{
+  struct BrigInstBr repr;
+
+  gcc_assert (sbr->m_opcode == BRIG_OPCODE_SBR);
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_BR);
+  repr.base.opcode = htole16 (sbr->m_opcode);
+  repr.width = BRIG_WIDTH_1;
+  /* For Conditional jumps the type is always B1.  */
+  hsa_op_reg *index = as_a <hsa_op_reg *> (sbr->get_op (0));
+  repr.base.type = htole16 (index->m_type);
+  repr.base.operands = htole32
+    (emit_operands (sbr->get_op (0), sbr->m_label_code_list));
+  memset (&repr.reserved, 0, sizeof (repr.reserved));
+
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+
+  /* Emit jump to default label.  */
+  hsa_bb *hbb = hsa_bb_for_bb (sbr->m_default_bb);
+  emit_unconditional_jump (&hbb->m_label_ref);
+}
+
+/* Emit a HSA convert instruction and all necessary directives, schedule
+   necessary operands for writing.  */
+
+static void
+emit_cvt_insn (hsa_insn_cvt *insn)
+{
+  struct BrigInstCvt repr;
+  BrigType16_t srctype;
+
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_CVT);
+  repr.base.opcode = htole16 (insn->m_opcode);
+  repr.base.type = htole16 (insn->m_type);
+  repr.base.operands = htole32 (emit_insn_operands (insn));
+
+  if (is_a <hsa_op_reg *> (insn->get_op (1)))
+    srctype = as_a <hsa_op_reg *> (insn->get_op (1))->m_type;
+  else
+    srctype = as_a <hsa_op_immed *> (insn->get_op (1))->m_type;
+  repr.sourceType = htole16 (srctype);
+  repr.modifier.allBits = 0;
+  /* float to smaller float requires a rounding setting (we default
+     to 'near'.  */
+  if (hsa_type_float_p (insn->m_type)
+      && (!hsa_type_float_p (srctype)
+	  || ((insn->m_type & BRIG_TYPE_BASE_MASK)
+	      < (srctype & BRIG_TYPE_BASE_MASK))))
+    repr.round = BRIG_ROUND_FLOAT_NEAR_EVEN;
+  else if (hsa_type_integer_p (insn->m_type) &&
+	   hsa_type_float_p (srctype))
+    repr.round = BRIG_ROUND_INTEGER_ZERO;
+  else
+    repr.round = BRIG_ROUND_NONE;
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit call instruction INSN, where this instruction must be closed
+   within a call block instruction.  */
+
+static void
+emit_call_insn (hsa_insn_call *call)
+{
+  struct BrigInstBr repr;
+
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_BR);
+  repr.base.opcode = htole16 (BRIG_OPCODE_CALL);
+  repr.base.type = htole16 (BRIG_TYPE_NONE);
+
+  repr.base.operands = htole32
+    (emit_operands (call->m_result_code_list, &call->m_func,
+		    call->m_args_code_list));
+
+  /* Internal functions have not set m_called_function.  */
+  if (call->m_called_function)
+    function_call_linkage.safe_push
+      (function_linkage_pair (call->m_called_function,
+			      call->m_func.m_brig_op_offset));
+  else
+    {
+      hsa_internal_fn *slot = hsa_emitted_internal_decls->find
+	(call->m_called_internal_fn);
+      gcc_assert (slot);
+      gcc_assert (slot->m_offset > 0);
+      call->m_func.m_directive_offset = slot->m_offset;
+    }
+
+  repr.width = BRIG_WIDTH_ALL;
+  memset (&repr.reserved, 0, sizeof (repr.reserved));
+
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit argument block directive.  */
+
+static void
+emit_arg_block_insn (hsa_insn_arg_block *insn)
+{
+  switch (insn->m_kind)
+    {
+    case BRIG_KIND_DIRECTIVE_ARG_BLOCK_START:
+      {
+	struct BrigDirectiveArgBlockStart repr;
+	repr.base.byteCount = htole16 (sizeof (repr));
+	repr.base.kind = htole16 (insn->m_kind);
+	brig_code.add (&repr, sizeof (repr));
+
+	for (unsigned i = 0; i < insn->m_call_insn->m_input_args.length (); i++)
+	  {
+	    insn->m_call_insn->m_args_code_list->m_offsets[i] = htole32
+	      (emit_directive_variable (insn->m_call_insn->m_input_args[i]));
+	    brig_insn_count++;
+	  }
+
+	if (insn->m_call_insn->m_output_arg)
+	  {
+	    insn->m_call_insn->m_result_code_list->m_offsets[0] = htole32
+	      (emit_directive_variable (insn->m_call_insn->m_output_arg));
+	    brig_insn_count++;
+	  }
+
+	break;
+      }
+    case BRIG_KIND_DIRECTIVE_ARG_BLOCK_END:
+      {
+	struct BrigDirectiveArgBlockEnd repr;
+	repr.base.byteCount = htole16 (sizeof (repr));
+	repr.base.kind = htole16 (insn->m_kind);
+	brig_code.add (&repr, sizeof (repr));
+	break;
+      }
+    default:
+      gcc_unreachable ();
+    }
+
+  brig_insn_count++;
+}
+
+/* Emit comment directive.  */
+
+static void
+emit_comment_insn (hsa_insn_comment *insn)
+{
+  struct BrigDirectiveComment repr;
+  memset (&repr, 0, sizeof (repr));
+
+  repr.base.byteCount = htole16 (sizeof (repr));
+  repr.base.kind = htole16 (insn->m_opcode);
+  repr.name = brig_emit_string (insn->m_comment, '\0', false);
+  brig_code.add (&repr, sizeof (repr));
+}
+
+/* Emit queue instruction INSN.  */
+
+static void
+emit_queue_insn (hsa_insn_queue *insn)
+{
+  BrigInstQueue repr;
+  memset (&repr, 0, sizeof (repr));
+
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_QUEUE);
+  repr.base.opcode = htole16 (insn->m_opcode);
+  repr.base.type = htole16 (insn->m_type);
+  repr.segment = BRIG_SEGMENT_GLOBAL;
+  repr.memoryOrder = BRIG_MEMORY_ORDER_SC_RELEASE;
+  repr.base.operands = htole32 (emit_insn_operands (insn));
+  brig_data.round_size_up (4);
+  brig_code.add (&repr, sizeof (repr));
+
+  brig_insn_count++;
+}
+
+/* Emit source type instruction INSN.  */
+
+static void
+emit_srctype_insn (hsa_insn_srctype *insn)
+{
+  /* We assume that BrigInstMod has a BrigInstBasic prefix.  */
+  struct BrigInstSourceType repr;
+  unsigned operand_count = insn->operand_count ();
+  gcc_checking_assert (operand_count >= 2);
+
+  memset (&repr, 0, sizeof (repr));
+  repr.sourceType = htole16 (insn->m_source_type);
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_SOURCE_TYPE);
+  repr.base.opcode = htole16 (insn->m_opcode);
+  repr.base.type = htole16 (insn->m_type);
+
+  repr.base.operands = htole32 (emit_insn_operands (insn));
+  brig_code.add (&repr, sizeof (struct BrigInstSourceType));
+  brig_insn_count++;
+}
+
+/* Emit packed instruction INSN.  */
+
+static void
+emit_packed_insn (hsa_insn_packed *insn)
+{
+  /* We assume that BrigInstMod has a BrigInstBasic prefix.  */
+  struct BrigInstSourceType repr;
+  unsigned operand_count = insn->operand_count ();
+  gcc_checking_assert (operand_count >= 2);
+
+  memset (&repr, 0, sizeof (repr));
+  repr.sourceType = htole16 (insn->m_source_type);
+  repr.base.base.byteCount = htole16 (sizeof (repr));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_SOURCE_TYPE);
+  repr.base.opcode = htole16 (insn->m_opcode);
+  repr.base.type = htole16 (insn->m_type);
+
+  if (insn->m_opcode == BRIG_OPCODE_COMBINE)
+    {
+      /* Create operand list for packed type.  */
+      for (unsigned i = 1; i < operand_count; i++)
+	{
+	  gcc_checking_assert (insn->get_op (i));
+	  insn->m_operand_list->m_offsets[i - 1] = htole32
+	    (enqueue_op (insn->get_op (i)));
+	}
+
+      repr.base.operands = htole32 (emit_operands (insn->get_op (0),
+						   insn->m_operand_list));
+    }
+  else if (insn->m_opcode == BRIG_OPCODE_EXPAND)
+    {
+      /* Create operand list for packed type.  */
+      for (unsigned i = 0; i < operand_count - 1; i++)
+	{
+	  gcc_checking_assert (insn->get_op (i));
+	  insn->m_operand_list->m_offsets[i] = htole32
+	    (enqueue_op (insn->get_op (i)));
+	}
+
+      repr.base.operands = htole32
+	(emit_operands (insn->m_operand_list,
+			insn->get_op (insn->operand_count () - 1)));
+    }
+
+
+  brig_code.add (&repr, sizeof (struct BrigInstSourceType));
+  brig_insn_count++;
+}
+
+/* Emit a basic HSA instruction and all necessary directives, schedule
+   necessary operands for writing.  */
+
+static void
+emit_basic_insn (hsa_insn_basic *insn)
+{
+  /* We assume that BrigInstMod has a BrigInstBasic prefix.  */
+  struct BrigInstMod repr;
+  BrigType16_t type;
+
+  memset (&repr, 0, sizeof (repr));
+  repr.base.base.byteCount = htole16 (sizeof (BrigInstBasic));
+  repr.base.base.kind = htole16 (BRIG_KIND_INST_BASIC);
+  repr.base.opcode = htole16 (insn->m_opcode);
+  switch (insn->m_opcode)
+    {
+      /* And the bit-logical operations need bit types and whine about
+         arithmetic types :-/  */
+      case BRIG_OPCODE_AND:
+      case BRIG_OPCODE_OR:
+      case BRIG_OPCODE_XOR:
+      case BRIG_OPCODE_NOT:
+	type = regtype_for_type (insn->m_type);
+	break;
+      default:
+	type = insn->m_type;
+	break;
+    }
+  repr.base.type = htole16 (type);
+  repr.base.operands = htole32 (emit_insn_operands (insn));
+
+  if ((type & BRIG_TYPE_PACK_MASK) != BRIG_TYPE_PACK_NONE)
+    {
+      if (hsa_type_float_p (type)
+	  && !hsa_opcode_floating_bit_insn_p (insn->m_opcode))
+	repr.round = BRIG_ROUND_FLOAT_NEAR_EVEN;
+      else
+	repr.round = 0;
+      /* We assume that destination and sources agree in packing
+         layout.  */
+      if (insn->num_used_ops () >= 2)
+	repr.pack = BRIG_PACK_PP;
+      else
+	repr.pack = BRIG_PACK_P;
+      repr.reserved = 0;
+      repr.base.base.byteCount = htole16 (sizeof (BrigInstMod));
+      repr.base.base.kind = htole16 (BRIG_KIND_INST_MOD);
+      brig_code.add (&repr, sizeof (struct BrigInstMod));
+    }
+  else
+    brig_code.add (&repr, sizeof (struct BrigInstBasic));
+  brig_insn_count++;
+}
+
+/* Emit an HSA instruction and all necessary directives, schedule necessary
+   operands for writing .  */
+
+static void
+emit_insn (hsa_insn_basic *insn)
+{
+  gcc_assert (!is_a <hsa_insn_phi *> (insn));
+
+  insn->m_brig_offset = brig_code.total_size;
+
+  if (hsa_insn_signal *signal = dyn_cast <hsa_insn_signal *> (insn))
+    emit_signal_insn (signal);
+  else if (hsa_insn_atomic *atom = dyn_cast <hsa_insn_atomic *> (insn))
+    emit_atomic_insn (atom);
+  else if (hsa_insn_mem *mem = dyn_cast <hsa_insn_mem *> (insn))
+    emit_memory_insn (mem);
+  else if (insn->m_opcode == BRIG_OPCODE_LDA)
+    emit_addr_insn (insn);
+  else if (hsa_insn_seg *seg = dyn_cast <hsa_insn_seg *> (insn))
+    emit_segment_insn (seg);
+  else if (hsa_insn_cmp *cmp = dyn_cast <hsa_insn_cmp *> (insn))
+    emit_cmp_insn (cmp);
+  else if (hsa_insn_br *br = dyn_cast <hsa_insn_br *> (insn))
+    emit_branch_insn (br);
+  else if (hsa_insn_sbr *sbr = dyn_cast <hsa_insn_sbr *> (insn))
+    {
+      if (switch_instructions == NULL)
+	switch_instructions = new vec <hsa_insn_sbr *> ();
+
+      switch_instructions->safe_push (sbr);
+      emit_switch_insn (sbr);
+    }
+  else if (hsa_insn_arg_block *block = dyn_cast <hsa_insn_arg_block *> (insn))
+    emit_arg_block_insn (block);
+  else if (hsa_insn_call *call = dyn_cast <hsa_insn_call *> (insn))
+    emit_call_insn (call);
+  else if (hsa_insn_comment *comment = dyn_cast <hsa_insn_comment *> (insn))
+    emit_comment_insn (comment);
+  else if (hsa_insn_queue *queue = dyn_cast <hsa_insn_queue *> (insn))
+    emit_queue_insn (queue);
+  else if (hsa_insn_srctype *srctype = dyn_cast <hsa_insn_srctype *> (insn))
+    emit_srctype_insn (srctype);
+  else if (hsa_insn_packed *packed = dyn_cast <hsa_insn_packed *> (insn))
+    emit_packed_insn (packed);
+  else if (hsa_insn_cvt *cvt = dyn_cast <hsa_insn_cvt *> (insn))
+    emit_cvt_insn (cvt);
+  else if (hsa_insn_alloca *alloca = dyn_cast <hsa_insn_alloca *> (insn))
+    emit_alloca_insn (alloca);
+  else
+    emit_basic_insn (insn);
+}
+
+/* We have just finished emitting BB and are about to emit NEXT_BB if non-NULL,
+   or we are about to finish emitting code, if it is NULL.  If the fall through
+   edge from BB does not lead to NEXT_BB, emit an unconditional jump.  */
+
+static void
+perhaps_emit_branch (basic_block bb, basic_block next_bb)
+{
+  basic_block t_bb = NULL, ff = NULL;
+
+  edge_iterator ei;
+  edge e;
+
+  /* If the last instruction of BB is a switch, ignore emission of all
+     edges.  */
+  if (hsa_bb_for_bb (bb)->m_last_insn
+      && is_a <hsa_insn_sbr *> (hsa_bb_for_bb (bb)->m_last_insn))
+    return;
+
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    if (e->flags & EDGE_TRUE_VALUE)
+      {
+	gcc_assert (!t_bb);
+	t_bb = e->dest;
+      }
+    else
+      {
+	gcc_assert (!ff);
+	ff = e->dest;
+      }
+
+  if (!ff || ff == next_bb || ff == EXIT_BLOCK_PTR_FOR_FN (cfun))
+    return;
+
+  emit_unconditional_jump (&hsa_bb_for_bb (ff)->m_label_ref);
+}
+
+/* Emit the a function with name NAME to the various brig sections.  */
+
+void
+hsa_brig_emit_function (void)
+{
+  basic_block bb, prev_bb;
+  hsa_insn_basic *insn;
+  BrigDirectiveExecutable *ptr_to_fndir;
+
+  brig_init ();
+
+  brig_insn_count = 0;
+  memset (&op_queue, 0, sizeof (op_queue));
+  op_queue.projected_size = brig_operand.total_size;
+
+  if (!function_offsets)
+    function_offsets = new hash_map<tree, BrigCodeOffset32_t> ();
+
+  if (!emitted_declarations)
+    emitted_declarations = new hash_map <tree, BrigDirectiveExecutable *> ();
+
+  for (unsigned i = 0; i < hsa_cfun->m_called_functions.length (); i++)
+    {
+      tree called = hsa_cfun->m_called_functions[i];
+
+      /* If the function has no definition, emit a declaration.  */
+      if (!emitted_declarations->get (called))
+	{
+	  BrigDirectiveExecutable *e = emit_function_declaration (called);
+	  emitted_declarations->put (called, e);
+	}
+    }
+
+  for (unsigned i = 0; i < hsa_cfun->m_called_internal_fns.length (); i++)
+    {
+      hsa_internal_fn *called = hsa_cfun->m_called_internal_fns[i];
+      emit_internal_fn_decl (called);
+    }
+
+  ptr_to_fndir = emit_function_directives (hsa_cfun, false);
+  for (insn = hsa_bb_for_bb (ENTRY_BLOCK_PTR_FOR_FN (cfun))->m_first_insn;
+       insn;
+       insn = insn->m_next)
+    emit_insn (insn);
+  prev_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      perhaps_emit_branch (prev_bb, bb);
+      emit_bb_label_directive (hsa_bb_for_bb (bb));
+      for (insn = hsa_bb_for_bb (bb)->m_first_insn; insn; insn = insn->m_next)
+	emit_insn (insn);
+      prev_bb = bb;
+    }
+  perhaps_emit_branch (prev_bb, NULL);
+  ptr_to_fndir->nextModuleEntry = brig_code.total_size;
+
+  /* Fill up label references for all sbr instructions.  */
+  if (switch_instructions)
+    {
+      for (unsigned i = 0; i < switch_instructions->length (); i++)
+	{
+	  hsa_insn_sbr *sbr = (*switch_instructions)[i];
+	  for (unsigned j = 0; j < sbr->m_jump_table.length (); j++)
+	    {
+	      hsa_bb *hbb = hsa_bb_for_bb (sbr->m_jump_table[j]);
+	      sbr->m_label_code_list->m_offsets[j] =
+		hbb->m_label_ref.m_directive_offset;
+	    }
+	}
+
+      switch_instructions->release ();
+      delete switch_instructions;
+      switch_instructions = NULL;
+    }
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "------- After BRIG emission: -------\n");
+      dump_hsa_cfun (dump_file);
+    }
+
+  emit_queued_operands ();
+}
+
+/* Emit all OMP symbols related to OMP.  */
+
+void
+hsa_brig_emit_omp_symbols (void)
+{
+  brig_init ();
+  emit_directive_variable (hsa_num_threads);
+}
+
+/* Unit constructor and destructor statements.  */
+
+static GTY(()) tree hsa_ctor_statements;
+static GTY(()) tree hsa_dtor_statements;
+
+/* Create and return __hsa_global_variables symbol that contains
+   all informations consumed by libgomp to link global variables
+   with their string names used by an HSA kernel.  */
+
+static tree
+hsa_output_global_variables ()
+{
+  unsigned l = hsa_global_variable_symbols->elements ();
+
+  tree variable_info_type = make_node (RECORD_TYPE);
+  tree id_f1 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("name"), ptr_type_node);
+  DECL_CHAIN (id_f1) = NULL_TREE;
+  tree id_f2 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("omp_data_size"),
+			   ptr_type_node);
+  DECL_CHAIN (id_f2) = id_f1;
+  finish_builtin_struct (variable_info_type, "__hsa_variable_info", id_f2,
+			 NULL_TREE);
+
+  tree int_num_of_global_vars;
+  int_num_of_global_vars = build_int_cst (uint32_type_node, l);
+  tree global_vars_num_index_type = build_index_type (int_num_of_global_vars);
+  tree global_vars_array_type = build_array_type (variable_info_type,
+						  global_vars_num_index_type);
+  TYPE_ARTIFICIAL (global_vars_array_type) = 1;
+
+  vec<constructor_elt, va_gc> *global_vars_vec = NULL;
+
+  for (hash_table <hsa_noop_symbol_hasher>::iterator it
+       = hsa_global_variable_symbols->begin ();
+       it != hsa_global_variable_symbols->end (); ++it)
+    {
+      unsigned len = strlen ((*it)->m_name);
+      char *copy = XNEWVEC (char, len + 2);
+      copy[0] = '&';
+      memcpy (copy + 1, (*it)->m_name, len);
+      copy[len + 1] = '\0';
+      len++;
+      hsa_sanitize_name (copy);
+
+      tree var_name = build_string (len, copy);
+      TREE_TYPE (var_name) = build_array_type
+	(char_type_node, build_index_type (size_int (len)));
+      free (copy);
+
+      vec<constructor_elt, va_gc> *variable_info_vec = NULL;
+      CONSTRUCTOR_APPEND_ELT (variable_info_vec, NULL_TREE,
+			      build1 (ADDR_EXPR,
+				      build_pointer_type (TREE_TYPE (var_name)),
+				      var_name));
+      CONSTRUCTOR_APPEND_ELT (variable_info_vec, NULL_TREE,
+			      build_fold_addr_expr ((*it)->m_decl));
+
+      tree variable_info_ctor = build_constructor (variable_info_type,
+						   variable_info_vec);
+
+      CONSTRUCTOR_APPEND_ELT (global_vars_vec, NULL_TREE,
+			      variable_info_ctor);
+    }
+
+  tree global_vars_ctor = build_constructor (global_vars_array_type,
+					     global_vars_vec);
+
+  char tmp_name[64];
+  ASM_GENERATE_INTERNAL_LABEL (tmp_name, "__hsa_global_variables", 1);
+  tree global_vars_table = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+					   get_identifier (tmp_name),
+					   global_vars_array_type);
+  TREE_STATIC (global_vars_table) = 1;
+  TREE_READONLY (global_vars_table) = 1;
+  TREE_PUBLIC (global_vars_table) = 0;
+  DECL_ARTIFICIAL (global_vars_table) = 1;
+  DECL_IGNORED_P (global_vars_table) = 1;
+  DECL_EXTERNAL (global_vars_table) = 0;
+  TREE_CONSTANT (global_vars_table) = 1;
+  DECL_INITIAL (global_vars_table) = global_vars_ctor;
+  varpool_node::finalize_decl (global_vars_table);
+
+  return global_vars_table;
+}
+
+/* Create __hsa_host_functions and __hsa_kernels that contain
+   all informations consumed by libgomp to register all kernels
+   in the BRIG binary.  */
+
+static void
+hsa_output_kernels (tree *host_func_table, tree *kernels)
+{
+  unsigned map_count = hsa_get_number_decl_kernel_mappings ();
+
+  tree int_num_of_kernels;
+  int_num_of_kernels = build_int_cst (uint32_type_node, map_count);
+  tree kernel_num_index_type = build_index_type (int_num_of_kernels);
+  tree host_functions_array_type = build_array_type (ptr_type_node,
+						     kernel_num_index_type);
+  TYPE_ARTIFICIAL (host_functions_array_type) = 1;
+
+  vec<constructor_elt, va_gc> *host_functions_vec = NULL;
+  for (unsigned i = 0; i < map_count; ++i)
+    {
+      tree decl = hsa_get_decl_kernel_mapping_decl (i);
+      CONSTRUCTOR_APPEND_ELT
+	(host_functions_vec, NULL_TREE,
+	 build_fold_addr_expr (hsa_get_host_function (decl)));
+    }
+  tree host_functions_ctor = build_constructor (host_functions_array_type,
+						host_functions_vec);
+  char tmp_name[64];
+  ASM_GENERATE_INTERNAL_LABEL (tmp_name, "__hsa_host_functions", 1);
+  tree hsa_host_func_table = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+					 get_identifier (tmp_name),
+					 host_functions_array_type);
+  TREE_STATIC (hsa_host_func_table) = 1;
+  TREE_READONLY (hsa_host_func_table) = 1;
+  TREE_PUBLIC (hsa_host_func_table) = 0;
+  DECL_ARTIFICIAL (hsa_host_func_table) = 1;
+  DECL_IGNORED_P (hsa_host_func_table) = 1;
+  DECL_EXTERNAL (hsa_host_func_table) = 0;
+  TREE_CONSTANT (hsa_host_func_table) = 1;
+  DECL_INITIAL (hsa_host_func_table) = host_functions_ctor;
+  varpool_node::finalize_decl (hsa_host_func_table);
+  *host_func_table = hsa_host_func_table;
+
+  /* Following code emits list of kernel_info structures.  */
+
+  tree kernel_info_type = make_node (RECORD_TYPE);
+  tree id_f1 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("name"), ptr_type_node);
+  DECL_CHAIN (id_f1) = NULL_TREE;
+  tree id_f2 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("omp_data_size"),
+			   unsigned_type_node);
+  DECL_CHAIN (id_f2) = id_f1;
+  tree id_f3 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("gridified_kernel_p"),
+			   boolean_type_node);
+  DECL_CHAIN (id_f3) = id_f2;
+  tree id_f4 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("kernel_dependencies_count"),
+			   unsigned_type_node);
+  DECL_CHAIN (id_f4) = id_f3;
+  tree id_f5 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("kernel_dependencies"),
+			   build_pointer_type (build_pointer_type
+					       (char_type_node)));
+  DECL_CHAIN (id_f5) = id_f4;
+  finish_builtin_struct (kernel_info_type, "__hsa_kernel_info", id_f5,
+			 NULL_TREE);
+
+  int_num_of_kernels = build_int_cstu (uint32_type_node, map_count);
+  tree kernel_info_vector_type = build_array_type
+    (kernel_info_type, build_index_type (int_num_of_kernels));
+  TYPE_ARTIFICIAL (kernel_info_vector_type) = 1;
+
+  vec<constructor_elt, va_gc> *kernel_info_vector_vec = NULL;
+  tree kernel_dependencies_vector_type = NULL;
+
+  for (unsigned i = 0; i < map_count; ++i)
+    {
+      tree kernel = hsa_get_decl_kernel_mapping_decl (i);
+      char *name = hsa_get_decl_kernel_mapping_name (i);
+      unsigned len = strlen (name);
+      char *copy = XNEWVEC (char, len + 2);
+      copy[0] = '&';
+      memcpy (copy + 1, name, len);
+      copy[len + 1] = '\0';
+      len++;
+
+      tree kern_name = build_string (len, copy);
+      TREE_TYPE (kern_name) = build_array_type
+	(char_type_node, build_index_type (size_int (len)));
+      free (copy);
+
+      unsigned omp_size = hsa_get_decl_kernel_mapping_omp_size (i);
+      tree omp_data_size = build_int_cstu (unsigned_type_node, omp_size);
+      bool gridified_kernel_p = hsa_get_decl_kernel_mapping_gridified (i);
+      tree gridified_kernel_p_tree = build_int_cstu (boolean_type_node,
+						     gridified_kernel_p);
+      unsigned count = 0;
+
+      kernel_dependencies_vector_type = build_array_type
+	(build_pointer_type (char_type_node),
+	 build_index_type (size_int (0)));
+
+      vec<constructor_elt, va_gc> *kernel_dependencies_vec = NULL;
+      if (hsa_decl_kernel_dependencies)
+	{
+	  vec<const char *> **slot;
+	  slot = hsa_decl_kernel_dependencies->get (kernel);
+	  if (slot)
+	    {
+	      vec <const char *> *dependencies = *slot;
+	      count = dependencies->length ();
+
+	      kernel_dependencies_vector_type = build_array_type
+		(build_pointer_type (char_type_node),
+		 build_index_type (size_int (count)));
+	      TYPE_ARTIFICIAL (kernel_dependencies_vector_type) = 1;
+
+	      for (unsigned j = 0; j < count; j++)
+		{
+		  const char *d = (*dependencies)[j];
+		  len = strlen (d);
+		  tree dependency_name = build_string (len, d);
+		  TREE_TYPE (dependency_name) = build_array_type
+		    (char_type_node, build_index_type (size_int (len)));
+
+		  CONSTRUCTOR_APPEND_ELT
+		    (kernel_dependencies_vec, NULL_TREE,
+		     build1 (ADDR_EXPR,
+			     build_pointer_type (TREE_TYPE (dependency_name)),
+			     dependency_name));
+		}
+	    }
+	}
+
+      tree dependencies_count = build_int_cstu (unsigned_type_node, count);
+
+      vec<constructor_elt, va_gc> *kernel_info_vec = NULL;
+      CONSTRUCTOR_APPEND_ELT (kernel_info_vec, NULL_TREE,
+			      build1 (ADDR_EXPR,
+				      build_pointer_type (TREE_TYPE
+							  (kern_name)),
+				      kern_name));
+      CONSTRUCTOR_APPEND_ELT (kernel_info_vec, NULL_TREE, omp_data_size);
+      CONSTRUCTOR_APPEND_ELT (kernel_info_vec, NULL_TREE,
+			      gridified_kernel_p_tree);
+      CONSTRUCTOR_APPEND_ELT (kernel_info_vec, NULL_TREE, dependencies_count);
+
+      if (count > 0)
+	{
+	  ASM_GENERATE_INTERNAL_LABEL (tmp_name, "__hsa_dependencies_list", i);
+	  tree dependencies_list = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+					       get_identifier (tmp_name),
+					       kernel_dependencies_vector_type);
+
+	  TREE_STATIC (dependencies_list) = 1;
+	  TREE_READONLY (dependencies_list) = 1;
+	  TREE_PUBLIC (dependencies_list) = 0;
+	  DECL_ARTIFICIAL (dependencies_list) = 1;
+	  DECL_IGNORED_P (dependencies_list) = 1;
+	  DECL_EXTERNAL (dependencies_list) = 0;
+	  TREE_CONSTANT (dependencies_list) = 1;
+	  DECL_INITIAL (dependencies_list) = build_constructor
+	    (kernel_dependencies_vector_type, kernel_dependencies_vec);
+	  varpool_node::finalize_decl (dependencies_list);
+
+	  CONSTRUCTOR_APPEND_ELT (kernel_info_vec, NULL_TREE,
+				  build1 (ADDR_EXPR,
+					  build_pointer_type
+					    (TREE_TYPE (dependencies_list)),
+					  dependencies_list));
+	}
+      else
+	CONSTRUCTOR_APPEND_ELT (kernel_info_vec, NULL_TREE, null_pointer_node);
+
+      tree kernel_info_ctor = build_constructor (kernel_info_type,
+						 kernel_info_vec);
+
+      CONSTRUCTOR_APPEND_ELT (kernel_info_vector_vec, NULL_TREE,
+			      kernel_info_ctor);
+    }
+
+  ASM_GENERATE_INTERNAL_LABEL (tmp_name, "__hsa_kernels", 1);
+  tree hsa_kernels = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+				 get_identifier (tmp_name),
+				 kernel_info_vector_type);
+
+  TREE_STATIC (hsa_kernels) = 1;
+  TREE_READONLY (hsa_kernels) = 1;
+  TREE_PUBLIC (hsa_kernels) = 0;
+  DECL_ARTIFICIAL (hsa_kernels) = 1;
+  DECL_IGNORED_P (hsa_kernels) = 1;
+  DECL_EXTERNAL (hsa_kernels) = 0;
+  TREE_CONSTANT (hsa_kernels) = 1;
+  DECL_INITIAL (hsa_kernels) = build_constructor (kernel_info_vector_type,
+						  kernel_info_vector_vec);
+  varpool_node::finalize_decl (hsa_kernels);
+  *kernels = hsa_kernels;
+}
+
+/* Create a static constructor that will register out brig stuff with
+   libgomp.  */
+
+static void
+hsa_output_libgomp_mapping (tree brig_decl)
+{
+  unsigned kernel_count = hsa_get_number_decl_kernel_mappings ();
+  unsigned global_variable_count = hsa_global_variable_symbols->elements ();
+
+  tree kernels;
+  tree host_func_table;
+
+  hsa_output_kernels (&host_func_table, &kernels);
+  tree global_vars = hsa_output_global_variables ();
+
+  tree hsa_image_desc_type = make_node (RECORD_TYPE);
+  tree id_f1 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("brig_module"), ptr_type_node);
+  DECL_CHAIN (id_f1) = NULL_TREE;
+  tree id_f2 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("kernel_count"),
+			   unsigned_type_node);
+
+  DECL_CHAIN (id_f2) = id_f1;
+  tree id_f3 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("hsa_kernel_infos"),
+			   ptr_type_node);
+  DECL_CHAIN (id_f3) = id_f2;
+  tree id_f4 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("global_variable_count"),
+			   unsigned_type_node);
+  DECL_CHAIN (id_f4) = id_f3;
+  tree id_f5 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("hsa_global_variable_infos"),
+			   ptr_type_node);
+  DECL_CHAIN (id_f5) = id_f4;
+  finish_builtin_struct (hsa_image_desc_type, "__hsa_image_desc", id_f5,
+			 NULL_TREE);
+  TYPE_ARTIFICIAL (hsa_image_desc_type) = 1;
+
+  vec<constructor_elt, va_gc> *img_desc_vec = NULL;
+  CONSTRUCTOR_APPEND_ELT (img_desc_vec, NULL_TREE,
+			  build_fold_addr_expr (brig_decl));
+  CONSTRUCTOR_APPEND_ELT (img_desc_vec, NULL_TREE,
+			  build_int_cstu (unsigned_type_node, kernel_count));
+  CONSTRUCTOR_APPEND_ELT (img_desc_vec, NULL_TREE,
+			  build1 (ADDR_EXPR,
+				  build_pointer_type (TREE_TYPE (kernels)),
+				  kernels));
+  CONSTRUCTOR_APPEND_ELT (img_desc_vec, NULL_TREE,
+			  build_int_cstu (unsigned_type_node,
+					  global_variable_count));
+  CONSTRUCTOR_APPEND_ELT (img_desc_vec, NULL_TREE,
+			  build1 (ADDR_EXPR,
+				  build_pointer_type (TREE_TYPE (global_vars)),
+				  global_vars));
+
+  tree img_desc_ctor = build_constructor (hsa_image_desc_type, img_desc_vec);
+
+  char tmp_name[64];
+  ASM_GENERATE_INTERNAL_LABEL (tmp_name, "__hsa_img_descriptor", 1);
+  tree hsa_img_descriptor = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+					get_identifier (tmp_name),
+					hsa_image_desc_type);
+  TREE_STATIC (hsa_img_descriptor) = 1;
+  TREE_READONLY (hsa_img_descriptor) = 1;
+  TREE_PUBLIC (hsa_img_descriptor) = 0;
+  DECL_ARTIFICIAL (hsa_img_descriptor) = 1;
+  DECL_IGNORED_P (hsa_img_descriptor) = 1;
+  DECL_EXTERNAL (hsa_img_descriptor) = 0;
+  TREE_CONSTANT (hsa_img_descriptor) = 1;
+  DECL_INITIAL (hsa_img_descriptor) = img_desc_ctor;
+  varpool_node::finalize_decl (hsa_img_descriptor);
+
+  /* Construct the "host_table" libgomp expects. */
+  tree libgomp_host_table_type = build_array_type (ptr_type_node,
+						   build_index_type
+						   (build_int_cst
+						    (integer_type_node, 4)));
+  TYPE_ARTIFICIAL (libgomp_host_table_type) = 1;
+  vec<constructor_elt, va_gc> *libgomp_host_table_vec = NULL;
+  tree host_func_table_addr = build_fold_addr_expr (host_func_table);
+  CONSTRUCTOR_APPEND_ELT (libgomp_host_table_vec, NULL_TREE,
+			  host_func_table_addr);
+  offset_int func_table_size = wi::to_offset (TYPE_SIZE_UNIT (ptr_type_node))
+    * kernel_count;
+  CONSTRUCTOR_APPEND_ELT (libgomp_host_table_vec, NULL_TREE,
+			  fold_build2 (POINTER_PLUS_EXPR,
+				       TREE_TYPE (host_func_table_addr),
+				       host_func_table_addr,
+				       build_int_cst (size_type_node,
+						      func_table_size.to_uhwi
+						      ())));
+  CONSTRUCTOR_APPEND_ELT (libgomp_host_table_vec, NULL_TREE, null_pointer_node);
+  CONSTRUCTOR_APPEND_ELT (libgomp_host_table_vec, NULL_TREE, null_pointer_node);
+  tree libgomp_host_table_ctor = build_constructor (libgomp_host_table_type,
+						    libgomp_host_table_vec);
+  ASM_GENERATE_INTERNAL_LABEL (tmp_name, "__hsa_libgomp_host_table", 1);
+  tree hsa_libgomp_host_table = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+					    get_identifier (tmp_name),
+					    libgomp_host_table_type);
+
+  TREE_STATIC (hsa_libgomp_host_table) = 1;
+  TREE_READONLY (hsa_libgomp_host_table) = 1;
+  TREE_PUBLIC (hsa_libgomp_host_table) = 0;
+  DECL_ARTIFICIAL (hsa_libgomp_host_table) = 1;
+  DECL_IGNORED_P (hsa_libgomp_host_table) = 1;
+  DECL_EXTERNAL (hsa_libgomp_host_table) = 0;
+  TREE_CONSTANT (hsa_libgomp_host_table) = 1;
+  DECL_INITIAL (hsa_libgomp_host_table) = libgomp_host_table_ctor;
+  varpool_node::finalize_decl (hsa_libgomp_host_table);
+
+  /* Generate an initializer with a call to the registration routine.  */
+
+  tree offload_register = builtin_decl_explicit
+    (BUILT_IN_GOMP_OFFLOAD_REGISTER);
+  gcc_checking_assert (offload_register);
+
+  append_to_statement_list
+    (build_call_expr (offload_register, 4,
+		      build_int_cstu (unsigned_type_node,
+				      GOMP_VERSION_PACK (GOMP_VERSION,
+							 GOMP_VERSION_HSA)),
+		      build_fold_addr_expr (hsa_libgomp_host_table),
+		      build_int_cst (integer_type_node, GOMP_DEVICE_HSA),
+		      build_fold_addr_expr (hsa_img_descriptor)),
+     &hsa_ctor_statements);
+
+  cgraph_build_static_cdtor ('I', hsa_ctor_statements, DEFAULT_INIT_PRIORITY);
+
+  tree offload_unregister = builtin_decl_explicit
+    (BUILT_IN_GOMP_OFFLOAD_UNREGISTER);
+  gcc_checking_assert (offload_unregister);
+
+  append_to_statement_list
+    (build_call_expr (offload_unregister, 4,
+		      build_int_cstu (unsigned_type_node,
+				      GOMP_VERSION_PACK (GOMP_VERSION,
+							 GOMP_VERSION_HSA)),
+		      build_fold_addr_expr (hsa_libgomp_host_table),
+		      build_int_cst (integer_type_node, GOMP_DEVICE_HSA),
+		      build_fold_addr_expr (hsa_img_descriptor)),
+     &hsa_dtor_statements);
+  cgraph_build_static_cdtor ('D', hsa_dtor_statements, DEFAULT_INIT_PRIORITY);
+}
+
+/* Required HSA section alignment. */
+
+#define HSA_SECTION_ALIGNMENT 16
+
+/* Emit the brig module we have compiled to a section in the final assembly and
+   also create a compile unit static constructor that will register the brig
+   module with libgomp.  */
+
+void
+hsa_output_brig (void)
+{
+  section *saved_section;
+
+  if (!brig_initialized)
+    return;
+
+  for (unsigned i = 0; i < function_call_linkage.length (); i++)
+    {
+      function_linkage_pair p = function_call_linkage[i];
+
+      BrigCodeOffset32_t *func_offset = function_offsets->get (p.function_decl);
+      gcc_assert (*func_offset);
+      BrigOperandCodeRef *code_ref = (BrigOperandCodeRef *)
+	(brig_operand.get_ptr_by_offset (p.offset));
+      gcc_assert (code_ref->base.kind == BRIG_KIND_OPERAND_CODE_REF);
+      code_ref->ref = htole32 (*func_offset);
+    }
+
+  /* Iterate all function declarations and if we meet a function that should
+     have module linkage and we are unable to emit HSAIL for the function,
+     then change the linkage to program linkage.  Doing so, we will emit
+     a valid BRIG image.  */
+  if (hsa_failed_functions != NULL && emitted_declarations != NULL)
+    for (hash_map <tree, BrigDirectiveExecutable *>::iterator it =
+	 emitted_declarations->begin (); it != emitted_declarations->end ();
+	 ++it)
+      {
+	if (hsa_failed_functions->contains ((*it).first))
+	  (*it).second->linkage = BRIG_LINKAGE_PROGRAM;
+      }
+
+  saved_section = in_section;
+
+  switch_to_section (get_section (BRIG_ELF_SECTION_NAME, SECTION_NOTYPE, NULL));
+  char tmp_name[64];
+  ASM_GENERATE_INTERNAL_LABEL (tmp_name, BRIG_LABEL_STRING, 1);
+  ASM_OUTPUT_LABEL (asm_out_file, tmp_name);
+  tree brig_id = get_identifier (tmp_name);
+  tree brig_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL, brig_id,
+			       char_type_node);
+  SET_DECL_ASSEMBLER_NAME (brig_decl, brig_id);
+  TREE_ADDRESSABLE (brig_decl) = 1;
+  TREE_READONLY (brig_decl) = 1;
+  DECL_ARTIFICIAL (brig_decl) = 1;
+  DECL_IGNORED_P (brig_decl) = 1;
+  TREE_STATIC (brig_decl) = 1;
+  TREE_PUBLIC (brig_decl) = 0;
+  TREE_USED (brig_decl) = 1;
+  DECL_INITIAL (brig_decl) = brig_decl;
+  TREE_ASM_WRITTEN (brig_decl) = 1;
+
+  BrigModuleHeader module_header;
+  memcpy (&module_header.identification, "HSA BRIG",
+	  sizeof(module_header.identification));
+  module_header.brigMajor = htole32 (BRIG_VERSION_BRIG_MAJOR);
+  module_header.brigMinor = htole32 (BRIG_VERSION_BRIG_MINOR);
+  uint64_t section_index[3];
+
+  int data_padding, code_padding, operand_padding;
+  data_padding = HSA_SECTION_ALIGNMENT
+    - brig_data.total_size % HSA_SECTION_ALIGNMENT;
+  code_padding = HSA_SECTION_ALIGNMENT
+    - brig_code.total_size % HSA_SECTION_ALIGNMENT;
+  operand_padding = HSA_SECTION_ALIGNMENT
+    - brig_operand.total_size % HSA_SECTION_ALIGNMENT;
+
+  uint64_t module_size = sizeof (module_header) + sizeof (section_index)
+    + brig_data.total_size + data_padding
+    + brig_code.total_size + code_padding
+    + brig_operand.total_size + operand_padding;
+  gcc_assert ((module_size % 16) == 0);
+  module_header.byteCount = htole64 (module_size);
+  memset (&module_header.hash, 0, sizeof (module_header.hash));
+  module_header.reserved = 0;
+  module_header.sectionCount = htole32 (3);
+  module_header.sectionIndex = htole64 (sizeof (module_header));
+  assemble_string ((const char *) &module_header, sizeof(module_header));
+  uint64_t off = sizeof (module_header) + sizeof (section_index);
+  section_index[0] = htole64 (off);
+  off += brig_data.total_size + data_padding;
+  section_index[1] = htole64 (off);
+  off += brig_code.total_size + code_padding;
+  section_index[2] = htole64 (off);
+  assemble_string ((const char *) &section_index, sizeof (section_index));
+
+  char padding[HSA_SECTION_ALIGNMENT];
+  memset (padding, 0, sizeof(padding));
+
+  brig_data.output ();
+  assemble_string (padding, data_padding);
+  brig_code.output ();
+  assemble_string (padding, code_padding);
+  brig_operand.output ();
+  assemble_string (padding, operand_padding);
+
+  if (saved_section)
+    switch_to_section (saved_section);
+
+  hsa_output_libgomp_mapping (brig_decl);
+
+  hsa_free_decl_kernel_mapping ();
+  brig_release_data ();
+  hsa_deinit_compilation_unit_data ();
+
+  delete emitted_declarations;
+  emitted_declarations = NULL;
+  delete function_offsets;
+  function_offsets = NULL;
+}
diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
new file mode 100644
index 0000000..aee525e
--- /dev/null
+++ b/gcc/hsa-dump.c
@@ -0,0 +1,1188 @@
+/* Infrastructure to dump our HSAIL IL
+   Copyright (C) 2013-2016 Free Software Foundation, Inc.
+   Contributed by Martin Jambor <mjambor@suse.cz> and
+   Martin Liska <mliska@suse.cz>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "vec.h"
+#include "tree.h"
+#include "cfg.h"
+#include "function.h"
+#include "dumpfile.h"
+#include "gimple-pretty-print.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+/* Return textual name of TYPE.  */
+
+static const char *
+hsa_type_name (BrigType16_t type)
+{
+  switch (type)
+    {
+    case BRIG_TYPE_NONE:
+      return "none";
+    case BRIG_TYPE_U8:
+      return "u8";
+    case BRIG_TYPE_U16:
+      return "u16";
+    case BRIG_TYPE_U32:
+      return "u32";
+    case BRIG_TYPE_U64:
+      return "u64";
+    case BRIG_TYPE_S8:
+      return "s8";
+    case BRIG_TYPE_S16:
+      return "s16";
+    case BRIG_TYPE_S32:
+      return "s32";
+    case BRIG_TYPE_S64:
+      return "s64";
+    case BRIG_TYPE_F16:
+      return "f16";
+    case BRIG_TYPE_F32:
+      return "f32";
+    case BRIG_TYPE_F64:
+      return "f64";
+    case BRIG_TYPE_B1:
+      return "b1";
+    case BRIG_TYPE_B8:
+      return "b8";
+    case BRIG_TYPE_B16:
+      return "b16";
+    case BRIG_TYPE_B32:
+      return "b32";
+    case BRIG_TYPE_B64:
+      return "b64";
+    case BRIG_TYPE_B128:
+      return "b128";
+    case BRIG_TYPE_SAMP:
+      return "samp";
+    case BRIG_TYPE_ROIMG:
+      return "roimg";
+    case BRIG_TYPE_WOIMG:
+      return "woimg";
+    case BRIG_TYPE_RWIMG:
+      return "rwimg";
+    case BRIG_TYPE_SIG32:
+      return "sig32";
+    case BRIG_TYPE_SIG64:
+      return "sig64";
+    case BRIG_TYPE_U8X4:
+      return "u8x4";
+    case BRIG_TYPE_U8X8:
+      return "u8x8";
+    case BRIG_TYPE_U8X16:
+      return "u8x16";
+    case BRIG_TYPE_U16X2:
+      return "u16x2";
+    case BRIG_TYPE_U16X4:
+      return "u16x4";
+    case BRIG_TYPE_U16X8:
+      return "u16x8";
+    case BRIG_TYPE_U32X2:
+      return "u32x2";
+    case BRIG_TYPE_U32X4:
+      return "u32x4";
+    case BRIG_TYPE_U64X2:
+      return "u64x2";
+    case BRIG_TYPE_S8X4:
+      return "s8x4";
+    case BRIG_TYPE_S8X8:
+      return "s8x8";
+    case BRIG_TYPE_S8X16:
+      return "s8x16";
+    case BRIG_TYPE_S16X2:
+      return "s16x2";
+    case BRIG_TYPE_S16X4:
+      return "s16x4";
+    case BRIG_TYPE_S16X8:
+      return "s16x8";
+    case BRIG_TYPE_S32X2:
+      return "s32x2";
+    case BRIG_TYPE_S32X4:
+      return "s32x4";
+    case BRIG_TYPE_S64X2:
+      return "s64x2";
+    case BRIG_TYPE_F16X2:
+      return "f16x2";
+    case BRIG_TYPE_F16X4:
+      return "f16x4";
+    case BRIG_TYPE_F16X8:
+      return "f16x8";
+    case BRIG_TYPE_F32X2:
+      return "f32x2";
+    case BRIG_TYPE_F32X4:
+      return "f32x4";
+    case BRIG_TYPE_F64X2:
+      return "f64x2";
+    default:
+      return "UNKNOWN_TYPE";
+    }
+}
+
+/* Return textual name of OPCODE.  */
+
+static const char *
+hsa_opcode_name (BrigOpcode16_t opcode)
+{
+  switch (opcode)
+    {
+    case BRIG_OPCODE_NOP:
+      return "nop";
+    case BRIG_OPCODE_ABS:
+      return "abs";
+    case BRIG_OPCODE_ADD:
+      return "add";
+    case BRIG_OPCODE_BORROW:
+      return "borrow";
+    case BRIG_OPCODE_CARRY:
+      return "carry";
+    case BRIG_OPCODE_CEIL:
+      return "ceil";
+    case BRIG_OPCODE_COPYSIGN:
+      return "copysign";
+    case BRIG_OPCODE_DIV:
+      return "div";
+    case BRIG_OPCODE_FLOOR:
+      return "floor";
+    case BRIG_OPCODE_FMA:
+      return "fma";
+    case BRIG_OPCODE_FRACT:
+      return "fract";
+    case BRIG_OPCODE_MAD:
+      return "mad";
+    case BRIG_OPCODE_MAX:
+      return "max";
+    case BRIG_OPCODE_MIN:
+      return "min";
+    case BRIG_OPCODE_MUL:
+      return "mul";
+    case BRIG_OPCODE_MULHI:
+      return "mulhi";
+    case BRIG_OPCODE_NEG:
+      return "neg";
+    case BRIG_OPCODE_REM:
+      return "rem";
+    case BRIG_OPCODE_RINT:
+      return "rint";
+    case BRIG_OPCODE_SQRT:
+      return "sqrt";
+    case BRIG_OPCODE_SUB:
+      return "sub";
+    case BRIG_OPCODE_TRUNC:
+      return "trunc";
+    case BRIG_OPCODE_MAD24:
+      return "mad24";
+    case BRIG_OPCODE_MAD24HI:
+      return "mad24hi";
+    case BRIG_OPCODE_MUL24:
+      return "mul24";
+    case BRIG_OPCODE_MUL24HI:
+      return "mul24hi";
+    case BRIG_OPCODE_SHL:
+      return "shl";
+    case BRIG_OPCODE_SHR:
+      return "shr";
+    case BRIG_OPCODE_AND:
+      return "and";
+    case BRIG_OPCODE_NOT:
+      return "not";
+    case BRIG_OPCODE_OR:
+      return "or";
+    case BRIG_OPCODE_POPCOUNT:
+      return "popcount";
+    case BRIG_OPCODE_XOR:
+      return "xor";
+    case BRIG_OPCODE_BITEXTRACT:
+      return "bitextract";
+    case BRIG_OPCODE_BITINSERT:
+      return "bitinsert";
+    case BRIG_OPCODE_BITMASK:
+      return "bitmask";
+    case BRIG_OPCODE_BITREV:
+      return "bitrev";
+    case BRIG_OPCODE_BITSELECT:
+      return "bitselect";
+    case BRIG_OPCODE_FIRSTBIT:
+      return "firstbit";
+    case BRIG_OPCODE_LASTBIT:
+      return "lastbit";
+    case BRIG_OPCODE_COMBINE:
+      return "combine";
+    case BRIG_OPCODE_EXPAND:
+      return "expand";
+    case BRIG_OPCODE_LDA:
+      return "lda";
+    case BRIG_OPCODE_MOV:
+      return "mov";
+    case BRIG_OPCODE_SHUFFLE:
+      return "shuffle";
+    case BRIG_OPCODE_UNPACKHI:
+      return "unpackhi";
+    case BRIG_OPCODE_UNPACKLO:
+      return "unpacklo";
+    case BRIG_OPCODE_PACK:
+      return "pack";
+    case BRIG_OPCODE_UNPACK:
+      return "unpack";
+    case BRIG_OPCODE_CMOV:
+      return "cmov";
+    case BRIG_OPCODE_CLASS:
+      return "class";
+    case BRIG_OPCODE_NCOS:
+      return "ncos";
+    case BRIG_OPCODE_NEXP2:
+      return "nexp2";
+    case BRIG_OPCODE_NFMA:
+      return "nfma";
+    case BRIG_OPCODE_NLOG2:
+      return "nlog2";
+    case BRIG_OPCODE_NRCP:
+      return "nrcp";
+    case BRIG_OPCODE_NRSQRT:
+      return "nrsqrt";
+    case BRIG_OPCODE_NSIN:
+      return "nsin";
+    case BRIG_OPCODE_NSQRT:
+      return "nsqrt";
+    case BRIG_OPCODE_BITALIGN:
+      return "bitalign";
+    case BRIG_OPCODE_BYTEALIGN:
+      return "bytealign";
+    case BRIG_OPCODE_PACKCVT:
+      return "packcvt";
+    case BRIG_OPCODE_UNPACKCVT:
+      return "unpackcvt";
+    case BRIG_OPCODE_LERP:
+      return "lerp";
+    case BRIG_OPCODE_SAD:
+      return "sad";
+    case BRIG_OPCODE_SADHI:
+      return "sadhi";
+    case BRIG_OPCODE_SEGMENTP:
+      return "segmentp";
+    case BRIG_OPCODE_FTOS:
+      return "ftos";
+    case BRIG_OPCODE_STOF:
+      return "stof";
+    case BRIG_OPCODE_CMP:
+      return "cmp";
+    case BRIG_OPCODE_CVT:
+      return "cvt";
+    case BRIG_OPCODE_LD:
+      return "ld";
+    case BRIG_OPCODE_ST:
+      return "st";
+    case BRIG_OPCODE_ATOMIC:
+      return "atomic";
+    case BRIG_OPCODE_ATOMICNORET:
+      return "atomicnoret";
+    case BRIG_OPCODE_SIGNAL:
+      return "signal";
+    case BRIG_OPCODE_SIGNALNORET:
+      return "signalnoret";
+    case BRIG_OPCODE_MEMFENCE:
+      return "memfence";
+    case BRIG_OPCODE_RDIMAGE:
+      return "rdimage";
+    case BRIG_OPCODE_LDIMAGE:
+      return "ldimage";
+    case BRIG_OPCODE_STIMAGE:
+      return "stimage";
+    case BRIG_OPCODE_QUERYIMAGE:
+      return "queryimage";
+    case BRIG_OPCODE_QUERYSAMPLER:
+      return "querysampler";
+    case BRIG_OPCODE_CBR:
+      return "cbr";
+    case BRIG_OPCODE_BR:
+      return "br";
+    case BRIG_OPCODE_SBR:
+      return "sbr";
+    case BRIG_OPCODE_BARRIER:
+      return "barrier";
+    case BRIG_OPCODE_WAVEBARRIER:
+      return "wavebarrier";
+    case BRIG_OPCODE_ARRIVEFBAR:
+      return "arrivefbar";
+    case BRIG_OPCODE_INITFBAR:
+      return "initfbar";
+    case BRIG_OPCODE_JOINFBAR:
+      return "joinfbar";
+    case BRIG_OPCODE_LEAVEFBAR:
+      return "leavefbar";
+    case BRIG_OPCODE_RELEASEFBAR:
+      return "releasefbar";
+    case BRIG_OPCODE_WAITFBAR:
+      return "waitfbar";
+    case BRIG_OPCODE_LDF:
+      return "ldf";
+    case BRIG_OPCODE_ACTIVELANECOUNT:
+      return "activelanecount";
+    case BRIG_OPCODE_ACTIVELANEID:
+      return "activelaneid";
+    case BRIG_OPCODE_ACTIVELANEMASK:
+      return "activelanemask";
+    case BRIG_OPCODE_CALL:
+      return "call";
+    case BRIG_OPCODE_SCALL:
+      return "scall";
+    case BRIG_OPCODE_ICALL:
+      return "icall";
+    case BRIG_OPCODE_RET:
+      return "ret";
+    case BRIG_OPCODE_ALLOCA:
+      return "alloca";
+    case BRIG_OPCODE_CURRENTWORKGROUPSIZE:
+      return "currentworkgroupsize";
+    case BRIG_OPCODE_DIM:
+      return "dim";
+    case BRIG_OPCODE_GRIDGROUPS:
+      return "gridgroups";
+    case BRIG_OPCODE_GRIDSIZE:
+      return "gridsize";
+    case BRIG_OPCODE_PACKETCOMPLETIONSIG:
+      return "packetcompletionsig";
+    case BRIG_OPCODE_PACKETID:
+      return "packetid";
+    case BRIG_OPCODE_WORKGROUPID:
+      return "workgroupid";
+    case BRIG_OPCODE_WORKGROUPSIZE:
+      return "workgroupsize";
+    case BRIG_OPCODE_WORKITEMABSID:
+      return "workitemabsid";
+    case BRIG_OPCODE_WORKITEMFLATABSID:
+      return "workitemflatabsid";
+    case BRIG_OPCODE_WORKITEMFLATID:
+      return "workitemflatid";
+    case BRIG_OPCODE_WORKITEMID:
+      return "workitemid";
+    case BRIG_OPCODE_CLEARDETECTEXCEPT:
+      return "cleardetectexcept";
+    case BRIG_OPCODE_GETDETECTEXCEPT:
+      return "getdetectexcept";
+    case BRIG_OPCODE_SETDETECTEXCEPT:
+      return "setdetectexcept";
+    case BRIG_OPCODE_ADDQUEUEWRITEINDEX:
+      return "addqueuewriteindex";
+    case BRIG_OPCODE_CASQUEUEWRITEINDEX:
+      return "casqueuewriteindex";
+    case BRIG_OPCODE_LDQUEUEREADINDEX:
+      return "ldqueuereadindex";
+    case BRIG_OPCODE_LDQUEUEWRITEINDEX:
+      return "ldqueuewriteindex";
+    case BRIG_OPCODE_STQUEUEREADINDEX:
+      return "stqueuereadindex";
+    case BRIG_OPCODE_STQUEUEWRITEINDEX:
+      return "stqueuewriteindex";
+    case BRIG_OPCODE_CLOCK:
+      return "clock";
+    case BRIG_OPCODE_CUID:
+      return "cuid";
+    case BRIG_OPCODE_DEBUGTRAP:
+      return "debugtrap";
+    case BRIG_OPCODE_GROUPBASEPTR:
+      return "groupbaseptr";
+    case BRIG_OPCODE_KERNARGBASEPTR:
+      return "kernargbaseptr";
+    case BRIG_OPCODE_LANEID:
+      return "laneid";
+    case BRIG_OPCODE_MAXCUID:
+      return "maxcuid";
+    case BRIG_OPCODE_MAXWAVEID:
+      return "maxwaveid";
+    case BRIG_OPCODE_NULLPTR:
+      return "nullptr";
+    case BRIG_OPCODE_WAVEID:
+      return "waveid";
+    default:
+      return "UNKNOWN_OPCODE";
+    }
+}
+
+/* Return textual name of SEG.  */
+
+const char *
+hsa_seg_name (BrigSegment8_t seg)
+{
+  switch (seg)
+    {
+    case BRIG_SEGMENT_NONE:
+      return "none";
+    case BRIG_SEGMENT_FLAT:
+      return "flat";
+    case BRIG_SEGMENT_GLOBAL:
+      return "global";
+    case BRIG_SEGMENT_READONLY:
+      return "readonly";
+    case BRIG_SEGMENT_KERNARG:
+      return "kernarg";
+    case BRIG_SEGMENT_GROUP:
+      return "group";
+    case BRIG_SEGMENT_PRIVATE:
+      return "private";
+    case BRIG_SEGMENT_SPILL:
+      return "spill";
+    case BRIG_SEGMENT_ARG:
+      return "arg";
+    default:
+      return "UNKNOWN_SEGMENT";
+    }
+}
+
+/* Return textual name of CMPOP.  */
+
+static const char *
+hsa_cmpop_name (BrigCompareOperation8_t cmpop)
+{
+  switch (cmpop)
+    {
+    case BRIG_COMPARE_EQ:
+      return "eq";
+    case BRIG_COMPARE_NE:
+      return "ne";
+    case BRIG_COMPARE_LT:
+      return "lt";
+    case BRIG_COMPARE_LE:
+      return "le";
+    case BRIG_COMPARE_GT:
+      return "gt";
+    case BRIG_COMPARE_GE:
+      return "ge";
+    case BRIG_COMPARE_EQU:
+      return "equ";
+    case BRIG_COMPARE_NEU:
+      return "neu";
+    case BRIG_COMPARE_LTU:
+      return "ltu";
+    case BRIG_COMPARE_LEU:
+      return "leu";
+    case BRIG_COMPARE_GTU:
+      return "gtu";
+    case BRIG_COMPARE_GEU:
+      return "geu";
+    case BRIG_COMPARE_NUM:
+      return "num";
+    case BRIG_COMPARE_NAN:
+      return "nan";
+    case BRIG_COMPARE_SEQ:
+      return "seq";
+    case BRIG_COMPARE_SNE:
+      return "sne";
+    case BRIG_COMPARE_SLT:
+      return "slt";
+    case BRIG_COMPARE_SLE:
+      return "sle";
+    case BRIG_COMPARE_SGT:
+      return "sgt";
+    case BRIG_COMPARE_SGE:
+      return "sge";
+    case BRIG_COMPARE_SGEU:
+      return "sgeu";
+    case BRIG_COMPARE_SEQU:
+      return "sequ";
+    case BRIG_COMPARE_SNEU:
+      return "sneu";
+    case BRIG_COMPARE_SLTU:
+      return "sltu";
+    case BRIG_COMPARE_SLEU:
+      return "sleu";
+    case BRIG_COMPARE_SNUM:
+      return "snum";
+    case BRIG_COMPARE_SNAN:
+      return "snan";
+    case BRIG_COMPARE_SGTU:
+      return "sgtu";
+    default:
+      return "UNKNOWN_COMPARISON";
+    }
+}
+
+/* Return textual name for memory order.  */
+
+static const char *
+hsa_memsem_name (enum BrigMemoryOrder mo)
+{
+  switch (mo)
+    {
+    case BRIG_MEMORY_ORDER_NONE:
+      return "";
+    case BRIG_MEMORY_ORDER_RELAXED:
+      return "rlx";
+    case BRIG_MEMORY_ORDER_SC_ACQUIRE:
+      return "scacq";
+    case BRIG_MEMORY_ORDER_SC_RELEASE:
+      return "screl";
+    case BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE:
+      return "scar";
+    default:
+      return "UNKNOWN_MEMORY_ORDER";
+    }
+}
+
+/* Return textual name for memory scope. */
+
+static const char *
+hsa_memscope_name (enum BrigMemoryScope scope)
+{
+  switch (scope)
+    {
+    case BRIG_MEMORY_SCOPE_NONE:
+      return "";
+    case BRIG_MEMORY_SCOPE_WORKITEM:
+      return "wi";
+    case BRIG_MEMORY_SCOPE_WAVEFRONT:
+      return "wave";
+    case BRIG_MEMORY_SCOPE_WORKGROUP:
+      return "wg";
+    case BRIG_MEMORY_SCOPE_AGENT:
+      return "agent";
+    case BRIG_MEMORY_SCOPE_SYSTEM:
+      return "sys";
+    default:
+      return "UNKNOWN_SCOPE";
+    }
+}
+
+/* Return textual name for atomic operation.  */
+
+static const char *
+hsa_m_atomicop_name (enum BrigAtomicOperation op)
+{
+  switch (op)
+    {
+    case BRIG_ATOMIC_ADD:
+      return "add";
+    case BRIG_ATOMIC_AND:
+      return "and";
+    case BRIG_ATOMIC_CAS:
+      return "cas";
+    case BRIG_ATOMIC_EXCH:
+      return "exch";
+    case BRIG_ATOMIC_LD:
+      return "ld";
+    case BRIG_ATOMIC_MAX:
+      return "max";
+    case BRIG_ATOMIC_MIN:
+      return "min";
+    case BRIG_ATOMIC_OR:
+      return "or";
+    case BRIG_ATOMIC_ST:
+      return "st";
+    case BRIG_ATOMIC_SUB:
+      return "sub";
+    case BRIG_ATOMIC_WRAPDEC:
+      return "wrapdec";
+    case BRIG_ATOMIC_WRAPINC:
+      return "wrapinc";
+    case BRIG_ATOMIC_XOR:
+      return "xor";
+    case BRIG_ATOMIC_WAIT_EQ:
+      return "wait_eq";
+    case BRIG_ATOMIC_WAIT_NE:
+      return "wait_ne";
+    case BRIG_ATOMIC_WAIT_LT:
+      return "wait_lt";
+    case BRIG_ATOMIC_WAIT_GTE:
+      return "wait_gte";
+    case BRIG_ATOMIC_WAITTIMEOUT_EQ:
+      return "waittimeout_eq";
+    case BRIG_ATOMIC_WAITTIMEOUT_NE:
+      return "waittimeout_ne";
+    case BRIG_ATOMIC_WAITTIMEOUT_LT:
+      return "waittimeout_lt";
+    case BRIG_ATOMIC_WAITTIMEOUT_GTE:
+      return "waittimeout_gte";
+    default:
+      return "UNKNOWN_ATOMIC_OP";
+    }
+}
+
+/* Return byte alignment for given BrigAlignment8_t value.  */
+
+static unsigned
+hsa_byte_alignment (BrigAlignment8_t alignment)
+{
+  gcc_assert (alignment != BRIG_ALIGNMENT_NONE);
+
+  return 1 << (alignment - 1);
+}
+
+/* Dump textual representation of HSA IL register REG to file F.  */
+
+static void
+dump_hsa_reg (FILE *f, hsa_op_reg *reg, bool dump_type = false)
+{
+  if (reg->m_reg_class)
+    fprintf (f, "$%c%i", reg->m_reg_class, reg->m_hard_num);
+  else
+    fprintf (f, "$_%i", reg->m_order);
+  if (dump_type)
+    fprintf (f, " (%s)", hsa_type_name (reg->m_type));
+}
+
+/* Dump textual representation of HSA IL immediate operand IMM to file F.  */
+
+static void
+dump_hsa_immed (FILE *f, hsa_op_immed *imm)
+{
+  bool unsigned_int_type = (BRIG_TYPE_U8 | BRIG_TYPE_U16 | BRIG_TYPE_U32
+    | BRIG_TYPE_U64) & imm->m_type;
+
+  if (imm->m_tree_value)
+    print_generic_expr (f, imm->m_tree_value, 0);
+  else
+    {
+      gcc_checking_assert (imm->m_brig_repr_size <= 8);
+
+      if (unsigned_int_type)
+	fprintf (f, HOST_WIDE_INT_PRINT_DEC, imm->m_int_value);
+      else
+	fprintf (f, HOST_WIDE_INT_PRINT_UNSIGNED,
+		 (unsigned HOST_WIDE_INT)imm->m_int_value);
+    }
+
+  fprintf (f, " (%s)", hsa_type_name (imm->m_type));
+}
+
+/* Dump textual representation of HSA IL address operand ADDR to file F.  */
+
+static void
+dump_hsa_address (FILE *f, hsa_op_address *addr)
+{
+  bool sth = false;
+
+  if (addr->m_symbol)
+    {
+      sth = true;
+      if (addr->m_symbol->m_name)
+	fprintf (f, "[%%%s]", addr->m_symbol->m_name);
+      else
+	fprintf (f, "[%%__%s_%i]", hsa_seg_name (addr->m_symbol->m_segment),
+		 addr->m_symbol->m_name_number);
+    }
+
+  if (addr->m_reg)
+    {
+      fprintf (f, "[");
+      dump_hsa_reg (f, addr->m_reg);
+      if (addr->m_imm_offset != 0)
+	fprintf (f, " + " HOST_WIDE_INT_PRINT_DEC "]", addr->m_imm_offset);
+      else
+	fprintf (f, "]");
+    }
+  else if (!sth || addr->m_imm_offset != 0)
+    fprintf (f, "[" HOST_WIDE_INT_PRINT_DEC "]", addr->m_imm_offset);
+}
+
+/* Dump textual representation of HSA IL symbol SYMBOL to file F.  */
+
+static void
+dump_hsa_symbol (FILE *f, hsa_symbol *symbol)
+{
+  const char *name;
+  if (symbol->m_name)
+    name = symbol->m_name;
+  else
+    {
+      char buf[64];
+      sprintf (buf, "__%s_%i", hsa_seg_name (symbol->m_segment),
+	       symbol->m_name_number);
+
+      name = buf;
+    }
+
+  fprintf (f, "%s_%s %s", hsa_seg_name (symbol->m_segment),
+	   hsa_type_name (symbol->m_type & ~BRIG_TYPE_ARRAY_MASK), name);
+
+  if (symbol->m_type & BRIG_TYPE_ARRAY_MASK)
+    fprintf (f, "[%lu]", symbol->m_dim);
+}
+
+/* Dump textual representation of HSA IL operand OP to file F.  */
+
+static void
+dump_hsa_operand (FILE *f, hsa_op_base *op, bool dump_reg_type = false)
+{
+  if (is_a <hsa_op_immed *> (op))
+    dump_hsa_immed (f, as_a <hsa_op_immed *> (op));
+  else if (is_a <hsa_op_reg *> (op))
+    dump_hsa_reg (f, as_a <hsa_op_reg *> (op), dump_reg_type);
+  else if (is_a <hsa_op_address *> (op))
+    dump_hsa_address (f, as_a <hsa_op_address *> (op));
+  else
+    fprintf (f, "UNKNOWN_OP_KIND");
+}
+
+/* Dump textual representation of HSA IL operands in VEC to file F.  */
+
+static void
+dump_hsa_operands (FILE *f, hsa_insn_basic *insn, int start = 0,
+		   int end = -1, bool dump_reg_type = false)
+{
+  if (end == -1)
+    end = insn->operand_count ();
+
+  for (int i = start; i < end; i++)
+    {
+      dump_hsa_operand (f, insn->get_op (i), dump_reg_type);
+      if (i != end - 1)
+	fprintf (f, ", ");
+    }
+}
+
+/* Indent F stream with INDENT spaces.  */
+
+static void indent_stream (FILE *f, int indent)
+{
+  for (int i = 0; i < indent; i++)
+    fputc (' ', f);
+}
+
+/* Dump textual representation of HSA IL instruction INSN to file F.  Prepend
+   the instruction with *INDENT spaces and adjust the indentation for call
+   instructions as appropriate.  */
+
+static void
+dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int *indent)
+{
+  gcc_checking_assert (insn);
+
+  if (insn->m_number)
+    fprintf (f, "%5d: ", insn->m_number);
+
+  indent_stream (f, *indent);
+
+  if (is_a <hsa_insn_phi *> (insn))
+    {
+      hsa_insn_phi *phi = as_a <hsa_insn_phi *> (insn);
+      bool first = true;
+      dump_hsa_reg (f, phi->m_dest, true);
+      fprintf (f, " = PHI <");
+      unsigned count = phi->operand_count ();
+      for (unsigned i = 0; i < count; i++)
+	{
+	  if (!phi->get_op (i))
+	    break;
+	  if (!first)
+	    fprintf (f, ", ");
+	  else
+	    first = false;
+	  dump_hsa_operand (f, phi->get_op (i), true);
+	}
+      fprintf (f, ">");
+    }
+  else if (is_a <hsa_insn_signal *> (insn))
+    {
+      hsa_insn_signal *mem = as_a <hsa_insn_signal *> (insn);
+
+      fprintf (f, "%s", hsa_opcode_name (mem->m_opcode));
+      fprintf (f, "_%s", hsa_m_atomicop_name (mem->m_atomicop));
+      if (mem->m_memoryorder != BRIG_MEMORY_ORDER_NONE)
+	fprintf (f, "_%s", hsa_memsem_name (mem->m_memoryorder));
+      fprintf (f, "_%s ", hsa_type_name (mem->m_type));
+
+      dump_hsa_operands (f, mem);
+    }
+
+  else if (is_a <hsa_insn_atomic *> (insn))
+    {
+      hsa_insn_atomic *mem = as_a <hsa_insn_atomic *> (insn);
+
+      /* Either operand[0] or operand[1] must be an address operand.  */
+      hsa_op_address *addr = NULL;
+      if (is_a <hsa_op_address *> (mem->get_op (0)))
+	addr = as_a <hsa_op_address *> (mem->get_op (0));
+      else
+	addr = as_a <hsa_op_address *> (mem->get_op (1));
+
+      fprintf (f, "%s", hsa_opcode_name (mem->m_opcode));
+      fprintf (f, "_%s", hsa_m_atomicop_name (mem->m_atomicop));
+      if (addr->m_symbol)
+	fprintf (f, "_%s", hsa_seg_name (addr->m_symbol->m_segment));
+      if (mem->m_memoryorder != BRIG_MEMORY_ORDER_NONE)
+	fprintf (f, "_%s", hsa_memsem_name (mem->m_memoryorder));
+      if (mem->m_memoryscope != BRIG_MEMORY_SCOPE_NONE)
+	fprintf (f, "_%s", hsa_memscope_name (mem->m_memoryscope));
+      fprintf (f, "_%s ", hsa_type_name (mem->m_type));
+
+      dump_hsa_operands (f, mem);
+    }
+  else if (is_a <hsa_insn_mem *> (insn))
+    {
+      hsa_insn_mem *mem = as_a <hsa_insn_mem *> (insn);
+      hsa_op_address *addr = as_a <hsa_op_address *> (mem->get_op (1));
+
+      fprintf (f, "%s", hsa_opcode_name (mem->m_opcode));
+      if (addr->m_symbol)
+	fprintf (f, "_%s", hsa_seg_name (addr->m_symbol->m_segment));
+      if (mem->m_align != BRIG_ALIGNMENT_NONE)
+	fprintf (f, "_align(%u)", hsa_byte_alignment (mem->m_align));
+      if (mem->m_equiv_class != 0)
+	fprintf (f, "_equiv(%i)", mem->m_equiv_class);
+      fprintf (f, "_%s ", hsa_type_name (mem->m_type));
+
+      dump_hsa_operand (f, mem->get_op (0));
+      fprintf (f, ", ");
+      dump_hsa_address (f, addr);
+    }
+  else if (insn->m_opcode == BRIG_OPCODE_LDA)
+    {
+      hsa_op_address *addr = as_a <hsa_op_address *> (insn->get_op (1));
+
+      fprintf (f, "%s", hsa_opcode_name (insn->m_opcode));
+      if (addr->m_symbol)
+	fprintf (f, "_%s", hsa_seg_name (addr->m_symbol->m_segment));
+      fprintf (f, "_%s ", hsa_type_name (insn->m_type));
+
+      dump_hsa_operand (f, insn->get_op (0));
+      fprintf (f, ", ");
+      dump_hsa_address (f, addr);
+    }
+  else if (is_a <hsa_insn_seg *> (insn))
+    {
+      hsa_insn_seg *seg = as_a <hsa_insn_seg *> (insn);
+      fprintf (f, "%s_%s_%s_%s ", hsa_opcode_name (seg->m_opcode),
+	       hsa_seg_name (seg->m_segment),
+	       hsa_type_name (seg->m_type), hsa_type_name (seg->m_src_type));
+      dump_hsa_reg (f, as_a <hsa_op_reg *> (seg->get_op (0)));
+      fprintf (f, ", ");
+      dump_hsa_operand (f, seg->get_op (1));
+    }
+  else if (is_a <hsa_insn_cmp *> (insn))
+    {
+      hsa_insn_cmp *cmp = as_a <hsa_insn_cmp *> (insn);
+      BrigType16_t src_type;
+
+      if (is_a <hsa_op_reg *> (cmp->get_op (1)))
+	src_type = as_a <hsa_op_reg *> (cmp->get_op (1))->m_type;
+      else
+	src_type = as_a <hsa_op_immed *> (cmp->get_op (1))->m_type;
+
+      fprintf (f, "%s_%s_%s_%s ", hsa_opcode_name (cmp->m_opcode),
+	       hsa_cmpop_name (cmp->m_compare),
+	       hsa_type_name (cmp->m_type), hsa_type_name (src_type));
+      dump_hsa_reg (f, as_a <hsa_op_reg *> (cmp->get_op (0)));
+      fprintf (f, ", ");
+      dump_hsa_operand (f, cmp->get_op (1));
+      fprintf (f, ", ");
+      dump_hsa_operand (f, cmp->get_op (2));
+    }
+  else if (is_a <hsa_insn_br *> (insn))
+    {
+      hsa_insn_br *br = as_a <hsa_insn_br *> (insn);
+      basic_block target = NULL;
+      edge_iterator ei;
+      edge e;
+
+      fprintf (f, "%s ", hsa_opcode_name (br->m_opcode));
+      if (br->m_opcode == BRIG_OPCODE_CBR)
+	{
+	  dump_hsa_reg (f, as_a <hsa_op_reg *> (br->get_op (0)));
+	  fprintf (f, ", ");
+	}
+
+      FOR_EACH_EDGE (e, ei, br->m_bb->succs)
+	if (e->flags & EDGE_TRUE_VALUE)
+	  {
+	    target = e->dest;
+	    break;
+	  }
+      fprintf (f, "BB %i", hsa_bb_for_bb (target)->m_index);
+    }
+  else if (is_a <hsa_insn_sbr *> (insn))
+    {
+      hsa_insn_sbr *sbr = as_a <hsa_insn_sbr *> (insn);
+
+      fprintf (f, "%s ", hsa_opcode_name (sbr->m_opcode));
+      dump_hsa_reg (f, as_a <hsa_op_reg *> (sbr->get_op (0)));
+      fprintf (f, ", [");
+
+      for (unsigned i = 0; i < sbr->m_jump_table.length (); i++)
+	{
+	  fprintf (f, "BB %i", hsa_bb_for_bb (sbr->m_jump_table[i])->m_index);
+	  if (i != sbr->m_jump_table.length () - 1)
+	    fprintf (f, ", ");
+	}
+
+      fprintf (f, "]");
+    }
+  else if (is_a <hsa_insn_arg_block *> (insn))
+    {
+      hsa_insn_arg_block *arg_block = as_a <hsa_insn_arg_block *> (insn);
+      bool start_p = arg_block->m_kind == BRIG_KIND_DIRECTIVE_ARG_BLOCK_START;
+      char c = start_p ? '{' : '}';
+
+      if (start_p)
+	{
+	  *indent += 2;
+	  indent_stream (f, 2);
+	}
+
+      if (!start_p)
+	*indent -= 2;
+
+      fprintf (f, "%c", c);
+    }
+  else if (is_a <hsa_insn_call *> (insn))
+    {
+      hsa_insn_call *call = as_a <hsa_insn_call *> (insn);
+      if (call->m_called_function)
+	{
+	  const char *name = hsa_get_declaration_name (call->m_called_function);
+	  fprintf (f, "call &%s", name);
+	}
+      else
+	{
+	  char *name = call->m_called_internal_fn->name ();
+	  fprintf (f, "call &%s", name);
+	  free (name);
+	}
+
+      if (call->m_output_arg)
+	fprintf (f, "(%%res) ");
+
+      fprintf (f, "(");
+      for (unsigned i = 0; i < call->m_input_args.length (); i++)
+        {
+	  fprintf (f, "%%__arg_%u", i);
+
+	  if (i != call->m_input_args.length () - 1)
+	    fprintf (f, ", ");
+	}
+      fprintf (f, ")");
+    }
+  else if (is_a <hsa_insn_comment *> (insn))
+    {
+      hsa_insn_comment *c = as_a <hsa_insn_comment *> (insn);
+      fprintf (f, "%s", c->m_comment);
+    }
+  else if (is_a <hsa_insn_srctype *> (insn))
+    {
+      hsa_insn_srctype *srctype = as_a <hsa_insn_srctype *> (insn);
+
+      fprintf (f, "%s_%s_%s ", hsa_opcode_name (srctype->m_opcode),
+	       hsa_type_name (srctype->m_type),
+	       hsa_type_name (srctype->m_source_type));
+
+      dump_hsa_operands (f, insn);
+    }
+  else if (is_a <hsa_insn_packed *> (insn))
+    {
+      hsa_insn_packed *packed = as_a <hsa_insn_packed *> (insn);
+
+      fprintf (f, "%s_v%u_%s_%s ", hsa_opcode_name (packed->m_opcode),
+	       packed->operand_count () - 1,
+	       hsa_type_name (packed->m_type),
+	       hsa_type_name (packed->m_source_type));
+
+      if (packed->m_opcode == BRIG_OPCODE_COMBINE)
+	{
+	  dump_hsa_operand (f, insn->get_op (0));
+	  fprintf (f, ", (");
+	  dump_hsa_operands (f, insn, 1);
+	  fprintf (f, ")");
+	}
+      else if (packed->m_opcode == BRIG_OPCODE_EXPAND)
+	{
+	  fprintf (f, "(");
+	  dump_hsa_operands (f, insn, 0, insn->operand_count () - 1);
+	  fprintf (f, "), ");
+	  dump_hsa_operand (f, insn->get_op (insn->operand_count () - 1));
+
+	}
+      else
+	gcc_unreachable ();
+    }
+  else if (is_a <hsa_insn_alloca *> (insn))
+    {
+      hsa_insn_alloca *alloca = as_a <hsa_insn_alloca *> (insn);
+
+      fprintf (f, "%s_align(%u)_%s ", hsa_opcode_name (insn->m_opcode),
+	       hsa_byte_alignment (alloca->m_align),
+	       hsa_type_name (insn->m_type));
+
+      dump_hsa_operands (f, insn);
+    }
+  else
+    {
+      fprintf (f, "%s_%s ", hsa_opcode_name (insn->m_opcode),
+	       hsa_type_name (insn->m_type));
+
+      dump_hsa_operands (f, insn);
+    }
+
+  if (insn->m_brig_offset)
+    {
+      fprintf (f, "             /* BRIG offset: %u", insn->m_brig_offset);
+
+      for (unsigned i = 0; i < insn->operand_count (); i++)
+	fprintf (f, ", op%u: %u", i, insn->get_op (i)->m_brig_op_offset);
+
+      fprintf (f, " */");
+    }
+
+  fprintf (f, "\n");
+}
+
+/* Dump textual representation of HSA IL instruction INSN to file F.  */
+
+void
+dump_hsa_insn (FILE *f, hsa_insn_basic *insn)
+{
+  int indent = 0;
+  dump_hsa_insn_1 (f, insn, &indent);
+}
+
+/* Dump textual representation of HSA IL in HBB to file F.  */
+
+void
+dump_hsa_bb (FILE *f, hsa_bb *hbb)
+{
+  hsa_insn_basic *insn;
+  edge_iterator ei;
+  edge e;
+  basic_block true_bb = NULL, other = NULL;
+
+  fprintf (f, "BB %i:\n", hbb->m_index);
+
+  int indent = 2;
+  for (insn = hbb->m_first_phi; insn; insn = insn->m_next)
+    dump_hsa_insn_1 (f, insn, &indent);
+
+  for (insn = hbb->m_first_insn; insn; insn = insn->m_next)
+    dump_hsa_insn_1 (f, insn, &indent);
+
+  if (hbb->m_last_insn && is_a <hsa_insn_sbr *> (hbb->m_last_insn))
+    goto exit;
+
+  FOR_EACH_EDGE (e, ei, hbb->m_bb->succs)
+    if (e->flags & EDGE_TRUE_VALUE)
+      {
+	gcc_assert (!true_bb);
+	true_bb = e->dest;
+      }
+    else
+      {
+	gcc_assert (!other);
+	other = e->dest;
+      }
+
+  if (true_bb)
+    {
+      if (!hbb->m_last_insn
+	  || hbb->m_last_insn->m_opcode != BRIG_OPCODE_CBR)
+	fprintf (f, "WARNING: No branch insn for a true edge. \n");
+    }
+  else if (hbb->m_last_insn
+	   && hbb->m_last_insn->m_opcode == BRIG_OPCODE_CBR)
+    fprintf (f, "WARNING: No true edge for a cbr statement\n");
+
+  if (other && other->aux)
+    fprintf (f, "  Fall-through to BB %i\n",
+	     hsa_bb_for_bb (other)->m_index);
+  else if (hbb->m_last_insn
+	   && hbb->m_last_insn->m_opcode != BRIG_OPCODE_RET)
+    fprintf (f, "  WARNING: Fall through to a BB with no aux!\n");
+
+exit:
+  fprintf (f, "\n");
+}
+
+/* Dump textual representation of HSA IL of the current function to file F.  */
+
+void
+dump_hsa_cfun (FILE *f)
+{
+  basic_block bb;
+
+  if (hsa_cfun->m_global_symbols.length () > 0)
+    fprintf (f, "\nHSAIL in global scope\n");
+
+  for (unsigned i = 0; i < hsa_cfun->m_global_symbols.length (); i++)
+    {
+      fprintf (f, "  ");
+      dump_hsa_symbol (f, hsa_cfun->m_global_symbols[i]);
+      fprintf (f, "\n");
+    }
+
+  fprintf (f, "\nHSAIL IL for %s\n", hsa_cfun->m_name);
+
+  for (unsigned i = 0; i < hsa_cfun->m_private_variables.length (); i++)
+    {
+      fprintf (f, "  ");
+      dump_hsa_symbol (f, hsa_cfun->m_private_variables[i]);
+      fprintf (f, "\n");
+    }
+
+  FOR_ALL_BB_FN (bb, cfun)
+  {
+    hsa_bb *hbb = (struct hsa_bb *) bb->aux;
+    dump_hsa_bb (f, hbb);
+  }
+}
+
+/* Dump textual representation of HSA IL instruction INSN to stderr.  */
+
+DEBUG_FUNCTION void
+debug_hsa_insn (hsa_insn_basic *insn)
+{
+  dump_hsa_insn (stderr, insn);
+}
+
+/* Dump textual representation of HSA IL in HBB to stderr.  */
+
+DEBUG_FUNCTION void
+debug_hsa_bb (hsa_bb *hbb)
+{
+  dump_hsa_bb (stderr, hbb);
+}
+
+/* Dump textual representation of HSA IL of the current function to stderr.  */
+
+DEBUG_FUNCTION void
+debug_hsa_cfun (void)
+{
+  dump_hsa_cfun (stderr);
+}
+
+/* Dump textual representation of an HSA operand to stderr.  */
+
+DEBUG_FUNCTION void
+debug_hsa_operand (hsa_op_base *opc)
+{
+  dump_hsa_operand (stderr, opc, true);
+  fprintf (stderr, "\n");
+}
+
+/* Dump textual representation of as HSA symbol.  */
+
+DEBUG_FUNCTION void
+debug_hsa_symbol (hsa_symbol *symbol)
+{
+  dump_hsa_symbol (stderr, symbol);
+  fprintf (stderr, "\n");
+}
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
new file mode 100644
index 0000000..a5f889a
--- /dev/null
+++ b/gcc/hsa-gen.c
@@ -0,0 +1,6115 @@
+/* A pass for lowering gimple to HSAIL
+   Copyright (C) 2013-2016 Free Software Foundation, Inc.
+   Contributed by Martin Jambor <mjambor@suse.cz> and
+   Martin Liska <mliska@suse.cz>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "hash-table.h"
+#include "vec.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "cfg.h"
+#include "function.h"
+#include "basic-block.h"
+#include "fold-const.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "bitmap.h"
+#include "dumpfile.h"
+#include "gimple-pretty-print.h"
+#include "diagnostic-core.h"
+#include "alloc-pool.h"
+#include "gimple-ssa.h"
+#include "tree-phinodes.h"
+#include "stringpool.h"
+#include "tree-ssanames.h"
+#include "tree-dfa.h"
+#include "ssa-iterators.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+#include "cfghooks.h"
+#include "tree-cfg.h"
+#include "cfgloop.h"
+#include "cfganal.h"
+#include "builtins.h"
+#include "params.h"
+#include "gomp-constants.h"
+#include "internal-fn.h"
+#include "builtins.h"
+#include "stor-layout.h"
+
+/* Print a warning message and set that we have seen an error.  */
+
+#define HSA_SORRY_ATV(location, message, ...) \
+  do \
+  { \
+    hsa_fail_cfun (); \
+    if (warning_at (EXPR_LOCATION (hsa_cfun->m_decl), OPT_Whsa, \
+		    HSA_SORRY_MSG)) \
+      inform (location, message, __VA_ARGS__); \
+  } \
+  while (false);
+
+/* Same as previous, but highlight a location.  */
+
+#define HSA_SORRY_AT(location, message) \
+  do \
+  { \
+    hsa_fail_cfun (); \
+    if (warning_at (EXPR_LOCATION (hsa_cfun->m_decl), OPT_Whsa, \
+		    HSA_SORRY_MSG)) \
+      inform (location, message); \
+  } \
+  while (false);
+
+/* Default number of threads used by kernel dispatch.  */
+
+#define HSA_DEFAULT_NUM_THREADS 64
+
+/* Following structures are defined in the final version
+   of HSA specification.  */
+
+/* HSA queue packet is shadow structure, originally provided by AMD.  */
+
+struct hsa_queue_packet
+{
+  uint16_t header;
+  uint16_t setup;
+  uint16_t workgroup_size_x;
+  uint16_t workgroup_size_y;
+  uint16_t workgroup_size_z;
+  uint16_t reserved0;
+  uint32_t grid_size_x;
+  uint32_t grid_size_y;
+  uint32_t grid_size_z;
+  uint32_t private_segment_size;
+  uint32_t group_segment_size;
+  uint64_t kernel_object;
+  void *kernarg_address;
+  uint64_t reserved2;
+  uint64_t completion_signal;
+};
+
+/* HSA queue is shadow structure, originally provided by AMD.  */
+
+struct hsa_queue
+{
+  int type;
+  uint32_t features;
+  void *base_address;
+  uint64_t doorbell_signal;
+  uint32_t size;
+  uint32_t reserved1;
+  uint64_t id;
+};
+
+/* Alloc pools for allocating basic hsa structures such as operands,
+   instructions and other basic entities.s */
+static object_allocator<hsa_op_address> *hsa_allocp_operand_address;
+static object_allocator<hsa_op_immed> *hsa_allocp_operand_immed;
+static object_allocator<hsa_op_reg> *hsa_allocp_operand_reg;
+static object_allocator<hsa_op_code_list> *hsa_allocp_operand_code_list;
+static object_allocator<hsa_op_operand_list> *hsa_allocp_operand_operand_list;
+static object_allocator<hsa_insn_basic> *hsa_allocp_inst_basic;
+static object_allocator<hsa_insn_phi> *hsa_allocp_inst_phi;
+static object_allocator<hsa_insn_mem> *hsa_allocp_inst_mem;
+static object_allocator<hsa_insn_atomic> *hsa_allocp_inst_atomic;
+static object_allocator<hsa_insn_signal> *hsa_allocp_inst_signal;
+static object_allocator<hsa_insn_seg> *hsa_allocp_inst_seg;
+static object_allocator<hsa_insn_cmp> *hsa_allocp_inst_cmp;
+static object_allocator<hsa_insn_br> *hsa_allocp_inst_br;
+static object_allocator<hsa_insn_sbr> *hsa_allocp_inst_sbr;
+static object_allocator<hsa_insn_call> *hsa_allocp_inst_call;
+static object_allocator<hsa_insn_arg_block> *hsa_allocp_inst_arg_block;
+static object_allocator<hsa_insn_comment> *hsa_allocp_inst_comment;
+static object_allocator<hsa_insn_queue> *hsa_allocp_inst_queue;
+static object_allocator<hsa_insn_srctype> *hsa_allocp_inst_srctype;
+static object_allocator<hsa_insn_packed> *hsa_allocp_inst_packed;
+static object_allocator<hsa_insn_cvt> *hsa_allocp_inst_cvt;
+static object_allocator<hsa_insn_alloca> *hsa_allocp_inst_alloca;
+static object_allocator<hsa_bb> *hsa_allocp_bb;
+
+/* List of pointers to all instructions that come from an object allocator.  */
+static vec <hsa_insn_basic *> hsa_instructions;
+
+/* List of pointers to all operands that come from an object allocator.  */
+static vec <hsa_op_base *> hsa_operands;
+
+hsa_symbol::hsa_symbol ()
+: m_decl (NULL_TREE), m_name (NULL), m_name_number (0),
+  m_directive_offset (0), m_type (BRIG_TYPE_NONE),
+  m_segment (BRIG_SEGMENT_NONE), m_linkage (BRIG_LINKAGE_NONE), m_dim (0),
+  m_cst_value (NULL), m_global_scope_p (false), m_seen_error (false),
+  m_allocation (BRIG_ALLOCATION_AUTOMATIC)
+{
+}
+
+
+hsa_symbol::hsa_symbol (BrigType16_t type, BrigSegment8_t segment,
+			BrigLinkage8_t linkage, bool global_scope_p,
+			BrigAllocation allocation)
+: m_decl (NULL_TREE), m_name (NULL), m_name_number (0),
+  m_directive_offset (0), m_type (type), m_segment (segment),
+  m_linkage (linkage), m_dim (0), m_cst_value (NULL),
+  m_global_scope_p (global_scope_p), m_seen_error (false),
+  m_allocation (allocation)
+{
+}
+
+unsigned HOST_WIDE_INT
+hsa_symbol::total_byte_size ()
+{
+  unsigned HOST_WIDE_INT s = hsa_type_bit_size (~BRIG_TYPE_ARRAY_MASK & m_type);
+  gcc_assert (s % BITS_PER_UNIT == 0);
+  s /= BITS_PER_UNIT;
+
+  if (m_dim)
+    s *= m_dim;
+
+  return s;
+}
+
+/* Forward declaration.  */
+
+static BrigType16_t
+hsa_type_for_tree_type (const_tree type, unsigned HOST_WIDE_INT *dim_p,
+			bool min32int);
+
+void
+hsa_symbol::fillup_for_decl (tree decl)
+{
+  m_decl = decl;
+  m_type = hsa_type_for_tree_type (TREE_TYPE (decl), &m_dim, false);
+
+  if (hsa_seen_error ())
+    m_seen_error = true;
+}
+
+/* Constructor of class representing global HSA function/kernel information and
+   state.  FNDECL is function declaration, KERNEL_P is true if the function
+   is going to become a HSA kernel.  If the function has body, SSA_NAMES_COUNT
+   should be set to number of SSA names used in the function.  */
+
+hsa_function_representation::hsa_function_representation
+  (tree fdecl, bool kernel_p, unsigned ssa_names_count): m_name (NULL),
+  m_reg_count (0), m_input_args (vNULL),
+  m_output_arg (NULL), m_spill_symbols (vNULL), m_global_symbols (vNULL),
+  m_private_variables (vNULL), m_called_functions (vNULL),
+  m_called_internal_fns (vNULL), m_hbb_count (0),
+  m_in_ssa (true), m_kern_p (kernel_p), m_declaration_p (false), m_decl (fdecl),
+  m_internal_fn (NULL), m_shadow_reg (NULL), m_kernel_dispatch_count (0),
+  m_maximum_omp_data_size (0), m_seen_error (false), m_temp_symbol_count (0),
+  m_ssa_map ()
+{
+  int sym_init_len = (vec_safe_length (cfun->local_decls) / 2) + 1;;
+  m_local_symbols = new hash_table <hsa_noop_symbol_hasher> (sym_init_len);
+  m_ssa_map.safe_grow_cleared (ssa_names_count);
+}
+
+/* Constructor of class representing HSA function information that
+   is derived for an internal function.  */
+hsa_function_representation::hsa_function_representation (hsa_internal_fn *fn):
+  m_reg_count (0), m_input_args (vNULL),
+  m_output_arg (NULL), m_local_symbols (NULL),
+  m_spill_symbols (vNULL), m_global_symbols (vNULL),
+  m_private_variables (vNULL), m_called_functions (vNULL),
+  m_called_internal_fns (vNULL), m_hbb_count (0),
+  m_in_ssa (true), m_kern_p (false), m_declaration_p (true), m_decl (NULL),
+  m_internal_fn (fn), m_shadow_reg (NULL), m_kernel_dispatch_count (0),
+  m_maximum_omp_data_size (0), m_seen_error (false), m_temp_symbol_count (0),
+  m_ssa_map () {}
+
+/* Destructor of class holding function/kernel-wide information and state.  */
+
+hsa_function_representation::~hsa_function_representation ()
+{
+  /* Kernel names are deallocated at the end of BRIG output when deallocating
+     hsa_decl_kernel_mapping.  */
+  if (!m_kern_p || m_seen_error)
+    free (m_name);
+
+  for (unsigned i = 0; i < m_input_args.length (); i++)
+    delete m_input_args[i];
+  m_input_args.release ();
+
+  delete m_output_arg;
+  delete m_local_symbols;
+
+  for (unsigned i = 0; i < m_spill_symbols.length (); i++)
+    delete m_spill_symbols[i];
+  m_spill_symbols.release ();
+
+  hsa_symbol *sym;
+  for (unsigned i = 0; i < m_global_symbols.iterate (i, &sym); i++)
+    if (sym->m_linkage != BRIG_ALLOCATION_PROGRAM)
+      delete sym;
+  m_global_symbols.release ();
+
+  for (unsigned i = 0; i < m_private_variables.length (); i++)
+    delete m_private_variables[i];
+  m_private_variables.release ();
+  m_called_functions.release ();
+  m_ssa_map.release ();
+
+  for (unsigned i = 0; i < m_called_internal_fns.length (); i++)
+    delete m_called_internal_fns[i];
+}
+
+hsa_op_reg *
+hsa_function_representation::get_shadow_reg ()
+{
+  /* If we compile a function with kernel dispatch and does not set
+     an optimization level, the function won't be inlined and
+     we return NULL.  */
+  if (!m_kern_p)
+    return NULL;
+
+  if (m_shadow_reg)
+    return m_shadow_reg;
+
+  /* Append the shadow argument.  */
+  hsa_symbol *shadow = new hsa_symbol (BRIG_TYPE_U64, BRIG_SEGMENT_KERNARG,
+				       BRIG_LINKAGE_FUNCTION);
+  m_input_args.safe_push (shadow);
+  shadow->m_name = "hsa_runtime_shadow";
+
+  hsa_op_reg *r = new hsa_op_reg (BRIG_TYPE_U64);
+  hsa_op_address *addr = new hsa_op_address (shadow);
+
+  hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_LD, BRIG_TYPE_U64, r, addr);
+  hsa_bb_for_bb (ENTRY_BLOCK_PTR_FOR_FN (cfun))->append_insn (mem);
+  m_shadow_reg = r;
+
+  return r;
+}
+
+bool hsa_function_representation::has_shadow_reg_p ()
+{
+  return m_shadow_reg != NULL;
+}
+
+void
+hsa_function_representation::init_extra_bbs ()
+{
+  hsa_init_new_bb (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+  hsa_init_new_bb (EXIT_BLOCK_PTR_FOR_FN (cfun));
+}
+
+hsa_symbol *
+hsa_function_representation::create_hsa_temporary (BrigType16_t type)
+{
+  hsa_symbol *s = new hsa_symbol (type, BRIG_SEGMENT_PRIVATE,
+				  BRIG_LINKAGE_FUNCTION);
+  s->m_name_number = m_temp_symbol_count++;
+
+  hsa_cfun->m_private_variables.safe_push (s);
+  return s;
+}
+
+BrigLinkage8_t
+hsa_function_representation::get_linkage ()
+{
+  if (m_internal_fn)
+    return BRIG_LINKAGE_PROGRAM;
+
+  return m_kern_p || TREE_PUBLIC (m_decl) ?
+    BRIG_LINKAGE_PROGRAM : BRIG_LINKAGE_MODULE;
+}
+
+/* Hash map of simple OMP builtins.  */
+static hash_map <nofree_string_hash, omp_simple_builtin> *omp_simple_builtins
+  = NULL;
+
+/* Warning messages for OMP builtins.  */
+
+#define HSA_WARN_LOCK_ROUTINE "support for HSA does not implement OpenMP lock " \
+  "routines"
+#define HSA_WARN_TIMING_ROUTINE "support for HSA does not implement OpenMP " \
+  "timing routines"
+#define HSA_WARN_MEMORY_ROUTINE "OpenMP device memory library routines have " \
+  "undefined semantics within target regions, support for HSA ignores them"
+#define HSA_WARN_AFFINITY "Support for HSA does not implement OpenMP affinity " \
+  "featerues"
+
+/* Initialize hash map with simple OMP builtins.  */
+
+static void
+hsa_init_simple_builtins ()
+{
+  if (omp_simple_builtins != NULL)
+    return;
+
+  omp_simple_builtins = new hash_map <nofree_string_hash, omp_simple_builtin>
+    ();
+
+  omp_simple_builtin omp_builtins[] =
+    {
+      omp_simple_builtin
+	("omp_get_initial_device", NULL, false,
+	 new hsa_op_immed (GOMP_DEVICE_HOST, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_is_initial_device", NULL, false,
+			  new hsa_op_immed (0, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_get_dynamic", NULL, false,
+			  new hsa_op_immed (0, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_set_dynamic", NULL, false, NULL),
+      omp_simple_builtin ("omp_init_lock", HSA_WARN_LOCK_ROUTINE, true),
+      omp_simple_builtin ("omp_init_lock_with_hint", HSA_WARN_LOCK_ROUTINE,
+			  true),
+      omp_simple_builtin ("omp_init_nest_lock_with_hint", HSA_WARN_LOCK_ROUTINE,
+			  true),
+      omp_simple_builtin ("omp_destroy_lock", HSA_WARN_LOCK_ROUTINE, true),
+      omp_simple_builtin ("omp_set_lock", HSA_WARN_LOCK_ROUTINE, true),
+      omp_simple_builtin ("omp_unset_lock", HSA_WARN_LOCK_ROUTINE, true),
+      omp_simple_builtin ("omp_test_lock", HSA_WARN_LOCK_ROUTINE, true),
+      omp_simple_builtin ("omp_get_wtime", HSA_WARN_TIMING_ROUTINE, true),
+      omp_simple_builtin ("omp_get_wtick", HSA_WARN_TIMING_ROUTINE, true),
+      omp_simple_builtin ("omp_target_alloc", HSA_WARN_MEMORY_ROUTINE, false,
+			  new hsa_op_immed (0, (BrigType16_t) BRIG_TYPE_U64)),
+      omp_simple_builtin ("omp_target_free", HSA_WARN_MEMORY_ROUTINE, false),
+      omp_simple_builtin
+	("omp_target_is_present", HSA_WARN_MEMORY_ROUTINE,
+	 false, new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_target_memcpy", HSA_WARN_MEMORY_ROUTINE, false,
+			  new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin
+	("omp_target_memcpy_rect", HSA_WARN_MEMORY_ROUTINE,
+	 false, new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin
+	("omp_target_associate_ptr", HSA_WARN_MEMORY_ROUTINE, false,
+	 new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin
+	("omp_target_disassociate_ptr", HSA_WARN_MEMORY_ROUTINE,
+	 false, new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin
+        ("omp_set_max_active_levels",
+	 "Support for HSA only allows only one active level, call to "
+	 "omp_set_max_active_levels will be ignored in the generated HSAIL",
+	 false, NULL),
+      omp_simple_builtin ("omp_get_max_active_levels", NULL, false,
+			  new hsa_op_immed (1, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_in_final", NULL, false,
+			  new hsa_op_immed (0, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_get_proc_bind", HSA_WARN_AFFINITY, false,
+			  new hsa_op_immed (0, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_get_num_places", HSA_WARN_AFFINITY, false,
+			  new hsa_op_immed (0, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_get_place_num_procs", HSA_WARN_AFFINITY, false,
+			  new hsa_op_immed (0, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_get_place_proc_ids", HSA_WARN_AFFINITY, false,
+			  NULL),
+      omp_simple_builtin ("omp_get_place_num", HSA_WARN_AFFINITY, false,
+			  new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_get_partition_num_places", HSA_WARN_AFFINITY,
+			  false,
+			  new hsa_op_immed (0, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_get_partition_place_nums", HSA_WARN_AFFINITY,
+			  false, NULL),
+      omp_simple_builtin ("omp_set_default_device",
+			  "omp_set_default_device has undefined semantics "
+			  "within target regions, support for HSA ignores it",
+			  false, NULL),
+      omp_simple_builtin ("omp_get_default_device",
+			  "omp_get_default_device has undefined semantics "
+			  "within target regions, support for HSA ignores it",
+			  false,
+			  new hsa_op_immed (0, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_get_num_devices",
+			  "omp_get_num_devices has undefined semantics "
+			  "within target regions, support for HSA ignores it",
+			  false,
+			  new hsa_op_immed (0, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_get_num_procs", NULL, true, NULL),
+      omp_simple_builtin ("omp_get_cancellation", NULL, true, NULL),
+      omp_simple_builtin ("omp_set_nested", NULL, true, NULL),
+      omp_simple_builtin ("omp_get_nested", NULL, true, NULL),
+      omp_simple_builtin ("omp_set_schedule", NULL, true, NULL),
+      omp_simple_builtin ("omp_get_schedule", NULL, true, NULL),
+      omp_simple_builtin ("omp_get_thread_limit", NULL, true, NULL),
+      omp_simple_builtin ("omp_get_team_size", NULL, true, NULL),
+      omp_simple_builtin ("omp_get_ancestor_thread_num", NULL, true, NULL),
+      omp_simple_builtin ("omp_get_max_task_priority", NULL, true, NULL)
+    };
+
+  unsigned count = sizeof (omp_builtins) / sizeof (omp_simple_builtin);
+
+  for (unsigned i = 0; i < count; i++)
+    omp_simple_builtins->put (omp_builtins[i].m_name, omp_builtins[i]);
+}
+
+/* Allocate HSA structures that we need only while generating with this.  */
+
+static void
+hsa_init_data_for_cfun ()
+{
+  hsa_init_compilation_unit_data ();
+  hsa_allocp_operand_address
+    = new object_allocator<hsa_op_address> ("HSA address operands");
+  hsa_allocp_operand_immed
+    = new object_allocator<hsa_op_immed> ("HSA immediate operands");
+  hsa_allocp_operand_reg
+    = new object_allocator<hsa_op_reg> ("HSA register operands");
+  hsa_allocp_operand_code_list
+    = new object_allocator<hsa_op_code_list> ("HSA code list operands");
+  hsa_allocp_operand_operand_list
+    = new object_allocator<hsa_op_operand_list> ("HSA operand list operands");
+  hsa_allocp_inst_basic
+    = new object_allocator<hsa_insn_basic> ("HSA basic instructions");
+  hsa_allocp_inst_phi
+    = new object_allocator<hsa_insn_phi> ("HSA phi operands");
+  hsa_allocp_inst_mem
+    = new object_allocator<hsa_insn_mem> ("HSA memory instructions");
+  hsa_allocp_inst_atomic
+    = new object_allocator<hsa_insn_atomic> ("HSA atomic instructions");
+  hsa_allocp_inst_signal
+    = new object_allocator<hsa_insn_signal> ("HSA signal instructions");
+  hsa_allocp_inst_seg
+    = new object_allocator<hsa_insn_seg> ("HSA segment conversion instructions");
+  hsa_allocp_inst_cmp
+    = new object_allocator<hsa_insn_cmp> ("HSA comparison instructions");
+  hsa_allocp_inst_br
+    = new object_allocator<hsa_insn_br> ("HSA branching instructions");
+  hsa_allocp_inst_sbr
+    = new object_allocator<hsa_insn_sbr> ("HSA switch branching instructions");
+  hsa_allocp_inst_call
+    = new object_allocator<hsa_insn_call> ("HSA call instructions");
+  hsa_allocp_inst_arg_block
+    = new object_allocator<hsa_insn_arg_block> ("HSA arg block instructions");
+  hsa_allocp_inst_comment
+    = new object_allocator<hsa_insn_comment> ("HSA comment instructions");
+  hsa_allocp_inst_queue
+    = new object_allocator<hsa_insn_queue> ("HSA queue instructions");
+  hsa_allocp_inst_srctype
+    = new object_allocator<hsa_insn_srctype> ("HSA source type instructions");
+  hsa_allocp_inst_packed
+    = new object_allocator<hsa_insn_packed> ("HSA packed instructions");
+  hsa_allocp_inst_cvt
+    = new object_allocator<hsa_insn_cvt> ("HSA convert instructions");
+  hsa_allocp_inst_alloca
+    = new object_allocator<hsa_insn_alloca> ("HSA alloca instructions");
+  hsa_allocp_bb = new object_allocator<hsa_bb> ("HSA basic blocks");
+}
+
+/* Deinitialize HSA subsystem and free all allocated memory.  */
+
+static void
+hsa_deinit_data_for_cfun (void)
+{
+  basic_block bb;
+
+  FOR_ALL_BB_FN (bb, cfun)
+    if (bb->aux)
+      {
+	hsa_bb *hbb = hsa_bb_for_bb (bb);
+	hbb->~hsa_bb ();
+	bb->aux = NULL;
+      }
+
+  for (unsigned int i = 0; i < hsa_operands.length (); i++)
+    hsa_destroy_operand (hsa_operands[i]);
+
+  hsa_operands.release ();
+
+  for (unsigned i = 0; i < hsa_instructions.length (); i++)
+    hsa_destroy_insn (hsa_instructions[i]);
+
+  hsa_instructions.release ();
+
+  if (omp_simple_builtins != NULL)
+    {
+      delete omp_simple_builtins;
+      omp_simple_builtins = NULL;
+    }
+
+  delete hsa_allocp_operand_address;
+  delete hsa_allocp_operand_immed;
+  delete hsa_allocp_operand_reg;
+  delete hsa_allocp_operand_code_list;
+  delete hsa_allocp_operand_operand_list;
+  delete hsa_allocp_inst_basic;
+  delete hsa_allocp_inst_phi;
+  delete hsa_allocp_inst_atomic;
+  delete hsa_allocp_inst_mem;
+  delete hsa_allocp_inst_signal;
+  delete hsa_allocp_inst_seg;
+  delete hsa_allocp_inst_cmp;
+  delete hsa_allocp_inst_br;
+  delete hsa_allocp_inst_sbr;
+  delete hsa_allocp_inst_call;
+  delete hsa_allocp_inst_arg_block;
+  delete hsa_allocp_inst_comment;
+  delete hsa_allocp_inst_queue;
+  delete hsa_allocp_inst_srctype;
+  delete hsa_allocp_inst_packed;
+  delete hsa_allocp_inst_cvt;
+  delete hsa_allocp_inst_alloca;
+  delete hsa_allocp_bb;
+  delete hsa_cfun;
+}
+
+/* Return the type which holds addresses in the given SEGMENT.  */
+
+static BrigType16_t
+hsa_get_segment_addr_type (BrigSegment8_t segment)
+{
+  switch (segment)
+    {
+    case BRIG_SEGMENT_NONE:
+      gcc_unreachable ();
+
+    case BRIG_SEGMENT_FLAT:
+    case BRIG_SEGMENT_GLOBAL:
+    case BRIG_SEGMENT_READONLY:
+    case BRIG_SEGMENT_KERNARG:
+      return hsa_machine_large_p () ? BRIG_TYPE_U64 : BRIG_TYPE_U32;
+
+    case BRIG_SEGMENT_GROUP:
+    case BRIG_SEGMENT_PRIVATE:
+    case BRIG_SEGMENT_SPILL:
+    case BRIG_SEGMENT_ARG:
+      return BRIG_TYPE_U32;
+    }
+  gcc_unreachable ();
+}
+
+/* Return integer brig type according to provided SIZE in bytes.  If SIGN
+   is set to true, return signed integer type.  */
+
+static BrigType16_t
+get_integer_type_by_bytes (unsigned size, bool sign)
+{
+  if (sign)
+    switch (size)
+      {
+      case 1:
+	return BRIG_TYPE_S8;
+      case 2:
+	return BRIG_TYPE_S16;
+      case 4:
+	return BRIG_TYPE_S32;
+      case 8:
+	return BRIG_TYPE_S64;
+      default:
+	break;
+      }
+  else
+    switch (size)
+      {
+      case 1:
+	return BRIG_TYPE_U8;
+      case 2:
+	return BRIG_TYPE_U16;
+      case 4:
+	return BRIG_TYPE_U32;
+      case 8:
+	return BRIG_TYPE_U64;
+      default:
+	break;
+      }
+
+  return 0;
+}
+
+/* Return HSA type for tree TYPE, which has to fit into BrigType16_t.  Pointers
+   are assumed to use flat addressing.  If min32int is true, always expand
+   integer types to one that has at least 32 bits.  */
+
+static BrigType16_t
+hsa_type_for_scalar_tree_type (const_tree type, bool min32int)
+{
+  HOST_WIDE_INT bsize;
+  const_tree base;
+  BrigType16_t res = BRIG_TYPE_NONE;
+
+  gcc_checking_assert (TYPE_P (type));
+  gcc_checking_assert (!AGGREGATE_TYPE_P (type));
+  if (POINTER_TYPE_P (type))
+    return hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT);
+
+  if (TREE_CODE (type) == VECTOR_TYPE || TREE_CODE (type) == COMPLEX_TYPE)
+    base = TREE_TYPE (type);
+  else
+    base = type;
+
+  if (!tree_fits_uhwi_p (TYPE_SIZE (base)))
+    {
+      HSA_SORRY_ATV (EXPR_LOCATION (type),
+		     "support for HSA does not implement huge or "
+		     "variable-sized type %T", type);
+      return res;
+    }
+
+  bsize = tree_to_uhwi (TYPE_SIZE (base));
+  unsigned byte_size = bsize / BITS_PER_UNIT;
+  if (INTEGRAL_TYPE_P (base))
+    res = get_integer_type_by_bytes (byte_size, !TYPE_UNSIGNED (base));
+  else if (SCALAR_FLOAT_TYPE_P (base))
+    {
+      switch (bsize)
+	{
+	case 16:
+	  res = BRIG_TYPE_F16;
+	  break;
+	case 32:
+	  res = BRIG_TYPE_F32;
+	  break;
+	case 64:
+	  res = BRIG_TYPE_F64;
+	  break;
+	default:
+	  break;
+	}
+    }
+
+  if (res == BRIG_TYPE_NONE)
+    {
+      HSA_SORRY_ATV (EXPR_LOCATION (type),
+		     "support for HSA does not implement type %T", type);
+      return res;
+    }
+
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      HOST_WIDE_INT tsize = tree_to_uhwi (TYPE_SIZE (type));
+
+      if (bsize == tsize)
+	{
+	  HSA_SORRY_ATV (EXPR_LOCATION (type),
+			 "support for HSA does not implement a vector type "
+			 "where a type and unit size are equal: %T", type);
+	  return res;
+	}
+
+      switch (tsize)
+	{
+	case 32:
+	  res |= BRIG_TYPE_PACK_32;
+	  break;
+	case 64:
+	  res |= BRIG_TYPE_PACK_64;
+	  break;
+	case 128:
+	  res |= BRIG_TYPE_PACK_128;
+	  break;
+	default:
+	  HSA_SORRY_ATV (EXPR_LOCATION (type),
+			 "support for HSA does not implement type %T", type);
+	}
+    }
+
+  if (min32int)
+    {
+      /* Registers/immediate operands can only be 32bit or more except for
+         f16.  */
+      if (res == BRIG_TYPE_U8 || res == BRIG_TYPE_U16)
+	res = BRIG_TYPE_U32;
+      else if (res == BRIG_TYPE_S8 || res == BRIG_TYPE_S16)
+	res = BRIG_TYPE_S32;
+    }
+
+  if (TREE_CODE (type) == COMPLEX_TYPE)
+    {
+      unsigned bsize = 2 * hsa_type_bit_size (res);
+      res = hsa_bittype_for_bitsize (bsize);
+    }
+
+  return res;
+}
+
+/* Returns the BRIG type we need to load/store entities of TYPE.  */
+
+static BrigType16_t
+mem_type_for_type (BrigType16_t type)
+{
+  /* HSA has non-intuitive constraints on load/store types.  If it's
+     a bit-type it _must_ be B128, if it's not a bit-type it must be
+     64bit max.  So for loading entities of 128 bits (e.g. vectors)
+     we have to to B128, while for loading the rest we have to use the
+     input type (??? or maybe also flattened to a equally sized non-vector
+     unsigned type?).  */
+  if ((type & BRIG_TYPE_PACK_MASK) == BRIG_TYPE_PACK_128)
+    return BRIG_TYPE_B128;
+  else if (hsa_btype_p (type))
+    {
+      unsigned bitsize = hsa_type_bit_size (type);
+      if (bitsize < 128)
+	return hsa_uint_for_bitsize (bitsize);
+    }
+  return type;
+}
+
+/* Return HSA type for tree TYPE.  If it cannot fit into BrigType16_t, some
+   kind of array will be generated, setting DIM appropriately.  Otherwise, it
+   will be set to zero.  */
+
+static BrigType16_t
+hsa_type_for_tree_type (const_tree type, unsigned HOST_WIDE_INT *dim_p = NULL,
+			bool min32int = false)
+{
+  gcc_checking_assert (TYPE_P (type));
+  if (!tree_fits_uhwi_p (TYPE_SIZE_UNIT (type)))
+    {
+      HSA_SORRY_ATV (EXPR_LOCATION (type), "support for HSA does not "
+		     "implement huge or variable-sized type %T", type);
+      return BRIG_TYPE_NONE;
+    }
+
+  if (RECORD_OR_UNION_TYPE_P (type))
+    {
+      if (dim_p)
+	*dim_p = tree_to_uhwi (TYPE_SIZE_UNIT (type));
+      return BRIG_TYPE_U8 | BRIG_TYPE_ARRAY;
+    }
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    {
+      /* We try to be nice and use the real base-type when this is an array of
+	 scalars and only resort to an array of bytes if the type is more
+	 complex.  */
+
+      unsigned HOST_WIDE_INT dim = 1;
+
+      while (TREE_CODE (type) == ARRAY_TYPE)
+	{
+	  tree domain = TYPE_DOMAIN (type);
+	  if (!TYPE_MIN_VALUE (domain)
+	      || !TYPE_MAX_VALUE (domain)
+	      || !tree_fits_shwi_p (TYPE_MIN_VALUE (domain))
+	      || !tree_fits_shwi_p (TYPE_MAX_VALUE (domain)))
+	    {
+	      HSA_SORRY_ATV (EXPR_LOCATION (type),
+			     "support for HSA does not implement array %T with "
+			     "unknown bounds", type);
+	      return BRIG_TYPE_NONE;
+	    }
+	  HOST_WIDE_INT min = tree_to_shwi (TYPE_MIN_VALUE (domain));
+	  HOST_WIDE_INT max = tree_to_shwi (TYPE_MAX_VALUE (domain));
+	  dim = dim * (unsigned HOST_WIDE_INT) (max - min + 1);
+	  type = TREE_TYPE (type);
+	}
+
+      BrigType16_t res;
+      if (RECORD_OR_UNION_TYPE_P (type))
+	{
+	  dim = dim * tree_to_uhwi (TYPE_SIZE_UNIT (type));
+	  res = BRIG_TYPE_U8;
+	}
+      else
+	res = hsa_type_for_scalar_tree_type (type, false);
+
+      if (dim_p)
+	*dim_p = dim;
+      return res | BRIG_TYPE_ARRAY;
+    }
+
+  /* Scalar case: */
+  if (dim_p)
+    *dim_p = 0;
+
+  return hsa_type_for_scalar_tree_type (type, min32int);
+}
+
+/* Returns true if converting from STYPE into DTYPE needs the _CVT
+   opcode.  If false a normal _MOV is enough.  */
+
+static bool
+hsa_needs_cvt (BrigType16_t dtype, BrigType16_t stype)
+{
+  if (hsa_btype_p (dtype))
+    return false;
+
+  /* float <-> int conversions are real converts.  */
+  if (hsa_type_float_p (dtype) != hsa_type_float_p (stype))
+    return true;
+  /* When both types have different size, then we need CVT as well.  */
+  if (hsa_type_bit_size (dtype) != hsa_type_bit_size (stype))
+    return true;
+  return false;
+}
+
+/* Lookup or create the associated hsa_symbol structure with a given VAR_DECL
+   or lookup the hsa_structure corresponding to a PARM_DECL.  */
+
+static hsa_symbol *
+get_symbol_for_decl (tree decl)
+{
+  hsa_symbol **slot;
+  hsa_symbol dummy (BRIG_TYPE_NONE, BRIG_SEGMENT_NONE, BRIG_LINKAGE_NONE);
+
+  gcc_assert (TREE_CODE (decl) == PARM_DECL
+	      || TREE_CODE (decl) == RESULT_DECL
+	      || TREE_CODE (decl) == VAR_DECL);
+
+  dummy.m_decl = decl;
+
+  bool is_in_global_vars = TREE_CODE (decl) == VAR_DECL && is_global_var (decl);
+
+  if (is_in_global_vars)
+    slot = hsa_global_variable_symbols->find_slot (&dummy, INSERT);
+  else
+    slot = hsa_cfun->m_local_symbols->find_slot (&dummy, INSERT);
+
+  gcc_checking_assert (slot);
+  if (*slot)
+    {
+      /* If the symbol is problematic, mark current function also as
+	 problematic.  */
+      if ((*slot)->m_seen_error)
+	hsa_fail_cfun ();
+
+      return *slot;
+    }
+  else
+    {
+      hsa_symbol *sym;
+      gcc_assert (TREE_CODE (decl) == VAR_DECL);
+
+      if (is_in_global_vars)
+	{
+	  sym = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_GLOBAL,
+				BRIG_LINKAGE_PROGRAM, true,
+				BRIG_ALLOCATION_PROGRAM);
+	  hsa_cfun->m_global_symbols.safe_push (sym);
+	}
+      else
+	{
+	  /* PARM_DECL and RESULT_DECL should be already in m_local_symbols.  */
+	  gcc_assert (TREE_CODE (decl) == VAR_DECL);
+
+	  sym = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_PRIVATE,
+				BRIG_LINKAGE_FUNCTION);
+	  hsa_cfun->m_private_variables.safe_push (sym);
+	}
+
+      sym->fillup_for_decl (decl);
+      sym->m_name = hsa_get_declaration_name (decl);
+
+      *slot = sym;
+      return sym;
+    }
+}
+
+/* For a given HSA function declaration, return a host
+   function declaration.  */
+
+tree
+hsa_get_host_function (tree decl)
+{
+  hsa_function_summary *s = hsa_summaries->get (cgraph_node::get_create (decl));
+  gcc_assert (s->m_kind != HSA_NONE);
+  gcc_assert (s->m_gpu_implementation_p);
+
+  return s->m_binded_function->decl;
+}
+
+/* Return true if function DECL has a host equivalent function.  */
+
+static char *
+get_brig_function_name (tree decl)
+{
+  tree d = decl;
+
+  hsa_function_summary *s = hsa_summaries->get (cgraph_node::get_create (d));
+  if (s->m_kind != HSA_NONE && s->m_gpu_implementation_p)
+    d = s->m_binded_function->decl;
+
+  /* IPA split can create a function that has no host equivalent.  */
+  if (d == NULL)
+    d = decl;
+
+  char *name = xstrdup (hsa_get_declaration_name (d));
+  hsa_sanitize_name (name);
+
+  return name;
+}
+
+/* Create a spill symbol of type TYPE.  */
+
+hsa_symbol *
+hsa_get_spill_symbol (BrigType16_t type)
+{
+  hsa_symbol *sym = new hsa_symbol (type, BRIG_SEGMENT_SPILL,
+				    BRIG_LINKAGE_FUNCTION);
+  hsa_cfun->m_spill_symbols.safe_push (sym);
+  return sym;
+}
+
+/* Create a symbol for a read-only string constant.  */
+hsa_symbol *
+hsa_get_string_cst_symbol (tree string_cst)
+{
+  gcc_checking_assert (TREE_CODE (string_cst) == STRING_CST);
+
+  hsa_symbol **slot = hsa_cfun->m_string_constants_map.get (string_cst);
+  if (slot)
+    return *slot;
+
+  hsa_op_immed *cst = new hsa_op_immed (string_cst);
+  hsa_symbol *sym = new hsa_symbol (cst->m_type, BRIG_SEGMENT_GLOBAL,
+				    BRIG_LINKAGE_MODULE, true,
+				    BRIG_ALLOCATION_AGENT);
+  sym->m_cst_value = cst;
+  sym->m_dim = TREE_STRING_LENGTH (string_cst);
+  sym->m_name_number = hsa_cfun->m_global_symbols.length ();
+
+  hsa_cfun->m_global_symbols.safe_push (sym);
+  hsa_cfun->m_string_constants_map.put (string_cst, sym);
+  return sym;
+}
+
+/* Constructor of the ancestor of all operands.  K is BRIG kind that identified
+   what the operator is.  */
+
+hsa_op_base::hsa_op_base (BrigKind16_t k): m_next (NULL), m_brig_op_offset (0),
+  m_kind (k)
+{
+  hsa_operands.safe_push (this);
+}
+
+/* Constructor of ancestor of all operands which have a type.  K is BRIG kind
+   that identified what the operator is.  T is the type of the operator.  */
+
+hsa_op_with_type::hsa_op_with_type (BrigKind16_t k, BrigType16_t t)
+  : hsa_op_base (k), m_type (t)
+{
+}
+
+hsa_op_with_type *
+hsa_op_with_type::get_in_type (BrigType16_t dtype, hsa_bb *hbb)
+{
+  if (m_type == dtype)
+    return this;
+
+  hsa_op_reg *dest;
+
+  if (hsa_needs_cvt (dtype, m_type))
+    {
+      dest = new hsa_op_reg (dtype);
+      hbb->append_insn (new hsa_insn_cvt (dest, this));
+    }
+  else
+    {
+      dest = new hsa_op_reg (m_type);
+      hbb->append_insn (new hsa_insn_basic (2, BRIG_OPCODE_MOV,
+					    dest->m_type, dest, this));
+
+      /* We cannot simply for instance: 'mov_u32 $_3, 48 (s32)' because
+	 type of the operand must be same as type of the instruction.  */
+      dest->m_type = dtype;
+    }
+
+  return dest;
+}
+
+/* Constructor of class representing HSA immediate values.  TREE_VAL is the
+   tree representation of the immediate value.  If min32int is true,
+   always expand integer types to one that has at least 32 bits.  */
+
+hsa_op_immed::hsa_op_immed (tree tree_val, bool min32int)
+  : hsa_op_with_type (BRIG_KIND_OPERAND_CONSTANT_BYTES,
+		      hsa_type_for_tree_type (TREE_TYPE (tree_val), NULL,
+					      min32int)),
+  m_brig_repr (NULL)
+{
+  if (hsa_seen_error ())
+    return;
+
+  gcc_checking_assert ((is_gimple_min_invariant (tree_val)
+		       && (!POINTER_TYPE_P (TREE_TYPE (tree_val))
+			   || TREE_CODE (tree_val) == INTEGER_CST))
+		       || TREE_CODE (tree_val) == CONSTRUCTOR);
+  m_tree_value = tree_val;
+  m_brig_repr_size = hsa_get_imm_brig_type_len (m_type);
+
+  if (TREE_CODE (m_tree_value) == STRING_CST)
+    m_brig_repr_size = TREE_STRING_LENGTH (m_tree_value);
+  else if (TREE_CODE (m_tree_value) == CONSTRUCTOR)
+    {
+      m_brig_repr_size = tree_to_uhwi
+	(TYPE_SIZE_UNIT (TREE_TYPE (m_tree_value)));
+
+      /* Verify that all elements of a constructor are constants.  */
+      for (unsigned i = 0;
+	   i < vec_safe_length (CONSTRUCTOR_ELTS (m_tree_value)); i++)
+	{
+	  tree v = CONSTRUCTOR_ELT (m_tree_value, i)->value;
+	  if (!CONSTANT_CLASS_P (v))
+	    {
+	      HSA_SORRY_AT (EXPR_LOCATION (tree_val),
+			    "HSA ctor should have only constants");
+	      return;
+	    }
+	}
+    }
+
+  emit_to_buffer (m_tree_value);
+}
+
+/* Constructor of class representing HSA immediate values.  INTEGER_VALUE is the
+   integer representation of the immediate value.  TYPE is BRIG type.  */
+
+hsa_op_immed::hsa_op_immed (HOST_WIDE_INT integer_value, BrigType16_t type)
+  : hsa_op_with_type (BRIG_KIND_OPERAND_CONSTANT_BYTES, type),
+  m_tree_value (NULL), m_brig_repr (NULL)
+{
+  gcc_assert (hsa_type_integer_p (type));
+  m_int_value = integer_value;
+  m_brig_repr_size = hsa_type_bit_size (type) / BITS_PER_UNIT;
+
+  hsa_bytes bytes;
+
+  switch (m_brig_repr_size)
+    {
+    case 1:
+      bytes.b8 = (uint8_t) m_int_value;
+      break;
+    case 2:
+      bytes.b16 = (uint16_t) m_int_value;
+      break;
+    case 4:
+      bytes.b32 = (uint32_t) m_int_value;
+      break;
+    case 8:
+      bytes.b64 = (uint64_t) m_int_value;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  m_brig_repr = XNEWVEC (char, m_brig_repr_size);
+  memcpy (m_brig_repr, &bytes, m_brig_repr_size);
+}
+
+hsa_op_immed::hsa_op_immed ():
+  hsa_op_with_type (BRIG_KIND_NONE, BRIG_TYPE_NONE), m_brig_repr (NULL)
+{
+}
+
+/* New operator to allocate immediate operands from pool alloc.  */
+
+void *
+hsa_op_immed::operator new (size_t)
+{
+  return hsa_allocp_operand_immed->allocate_raw ();
+}
+
+/* Destructor.  */
+
+hsa_op_immed::~hsa_op_immed ()
+{
+  free (m_brig_repr);
+}
+
+/* Change type of the immediate value to T.  */
+
+void
+hsa_op_immed::set_type (BrigType16_t t)
+{
+  m_type = t;
+}
+
+/* Constructor of class representing HSA registers and pseudo-registers.  T is
+   the BRIG type of the new register.  */
+
+hsa_op_reg::hsa_op_reg (BrigType16_t t)
+  : hsa_op_with_type (BRIG_KIND_OPERAND_REGISTER, t), m_gimple_ssa (NULL_TREE),
+  m_def_insn (NULL), m_spill_sym (NULL), m_order (hsa_cfun->m_reg_count++),
+  m_lr_begin (0), m_lr_end (0), m_reg_class (0), m_hard_num (0)
+{
+}
+
+/* New operator to allocate a register from pool alloc.  */
+
+void *
+hsa_op_reg::operator new (size_t)
+{
+  return hsa_allocp_operand_reg->allocate_raw ();
+}
+
+/* Verify register operand.  */
+
+void
+hsa_op_reg::verify_ssa ()
+{
+  /* Verify that each HSA register has a definition assigned.
+     Exceptions are VAR_DECL and PARM_DECL that are a default
+     definition.  */
+  gcc_checking_assert (m_def_insn
+		       || (m_gimple_ssa != NULL
+			   && (!SSA_NAME_VAR (m_gimple_ssa)
+			       || (TREE_CODE (SSA_NAME_VAR (m_gimple_ssa))
+				   != PARM_DECL))
+			   && SSA_NAME_IS_DEFAULT_DEF (m_gimple_ssa)));
+
+  /* Verify that every use of the register is really present
+     in an instruction.  */
+  for (unsigned i = 0; i < m_uses.length (); i++)
+    {
+      hsa_insn_basic *use = m_uses[i];
+
+      bool is_visited = false;
+      for (unsigned j = 0; j < use->operand_count (); j++)
+	{
+	  hsa_op_base *u = use->get_op (j);
+	  hsa_op_address *addr; addr = dyn_cast <hsa_op_address *> (u);
+	  if (addr && addr->m_reg)
+	    u = addr->m_reg;
+
+	  if (u == this)
+	    {
+	      bool r = !addr && use->op_output_p (j);
+
+	      if (r)
+		{
+		  error ("HSA SSA name defined by instruction that is supposed "
+			 "to be using it");
+		  debug_hsa_operand (this);
+		  debug_hsa_insn (use);
+		  internal_error ("HSA SSA verification failed");
+		}
+
+	      is_visited = true;
+	    }
+	}
+
+      if (!is_visited)
+	{
+	  error ("HSA SSA name not among operands of instruction that is "
+		 "supposed to use it");
+	  debug_hsa_operand (this);
+	  debug_hsa_insn (use);
+	  internal_error ("HSA SSA verification failed");
+	}
+    }
+}
+
+hsa_op_address::hsa_op_address (hsa_symbol *sym, hsa_op_reg *r,
+				HOST_WIDE_INT offset)
+  : hsa_op_base (BRIG_KIND_OPERAND_ADDRESS), m_symbol (sym), m_reg (r),
+  m_imm_offset (offset)
+{
+}
+
+hsa_op_address::hsa_op_address (hsa_symbol *sym, HOST_WIDE_INT offset)
+  : hsa_op_base (BRIG_KIND_OPERAND_ADDRESS), m_symbol (sym), m_reg (NULL),
+  m_imm_offset (offset)
+{
+}
+
+hsa_op_address::hsa_op_address (hsa_op_reg *r, HOST_WIDE_INT offset)
+  : hsa_op_base (BRIG_KIND_OPERAND_ADDRESS), m_symbol (NULL), m_reg (r),
+  m_imm_offset (offset)
+{
+}
+
+/* New operator to allocate address operands from pool alloc.  */
+
+void *
+hsa_op_address::operator new (size_t)
+{
+  return hsa_allocp_operand_address->allocate_raw ();
+}
+
+/* Constructor of an operand referring to HSAIL code.  */
+
+hsa_op_code_ref::hsa_op_code_ref () : hsa_op_base (BRIG_KIND_OPERAND_CODE_REF),
+  m_directive_offset (0)
+{
+}
+
+/* Constructor of an operand representing a code list.  Set it up so that it
+   can contain ELEMENTS number of elements.  */
+
+hsa_op_code_list::hsa_op_code_list (unsigned elements)
+  : hsa_op_base (BRIG_KIND_OPERAND_CODE_LIST)
+{
+  m_offsets.create (1);
+  m_offsets.safe_grow_cleared (elements);
+}
+
+/* New operator to allocate code list operands from pool alloc.  */
+
+void *
+hsa_op_code_list::operator new (size_t)
+{
+  return hsa_allocp_operand_code_list->allocate_raw ();
+}
+
+/* Constructor of an operand representing an operand list.
+   Set it up so that it can contain ELEMENTS number of elements.  */
+
+hsa_op_operand_list::hsa_op_operand_list (unsigned elements)
+  : hsa_op_base (BRIG_KIND_OPERAND_OPERAND_LIST)
+{
+  m_offsets.create (elements);
+  m_offsets.safe_grow (elements);
+}
+
+/* New operator to allocate operand list operands from pool alloc.  */
+
+void *
+hsa_op_operand_list::operator new (size_t)
+{
+  return hsa_allocp_operand_operand_list->allocate_raw ();
+}
+
+hsa_op_operand_list::~hsa_op_operand_list ()
+{
+  m_offsets.release ();
+}
+
+
+hsa_op_reg *
+hsa_function_representation::reg_for_gimple_ssa (tree ssa)
+{
+  hsa_op_reg *hreg;
+
+  gcc_checking_assert (TREE_CODE (ssa) == SSA_NAME);
+  if (m_ssa_map[SSA_NAME_VERSION (ssa)])
+    return m_ssa_map[SSA_NAME_VERSION (ssa)];
+
+  hreg = new hsa_op_reg (hsa_type_for_scalar_tree_type (TREE_TYPE (ssa),
+							 true));
+  hreg->m_gimple_ssa = ssa;
+  m_ssa_map[SSA_NAME_VERSION (ssa)] = hreg;
+
+  return hreg;
+}
+
+void
+hsa_op_reg::set_definition (hsa_insn_basic *insn)
+{
+  if (hsa_cfun->m_in_ssa)
+    {
+      gcc_checking_assert (!m_def_insn);
+      m_def_insn = insn;
+    }
+  else
+    m_def_insn = NULL;
+}
+
+/* Constructor of the class which is the bases of all instructions and directly
+   represents the most basic ones.  NOPS is the number of operands that the
+   operand vector will contain (and which will be cleared).  OP is the opcode
+   of the instruction.  This constructor does not set type.  */
+
+hsa_insn_basic::hsa_insn_basic (unsigned nops, int opc): m_prev (NULL),
+  m_next (NULL), m_bb (NULL), m_opcode (opc), m_number (0),
+  m_type (BRIG_TYPE_NONE), m_brig_offset (0)
+{
+  if (nops > 0)
+    m_operands.safe_grow_cleared (nops);
+
+  hsa_instructions.safe_push (this);
+}
+
+/* Make OP the operand number INDEX of operands of this instruction.  If OP is a
+   register or an address containing a register, then either set the definition
+   of the register to this instruction if it an output operand or add this
+   instruction to the uses if it is an input one.  */
+
+void
+hsa_insn_basic::set_op (int index, hsa_op_base *op)
+{
+  /* Each address operand is always use.  */
+  hsa_op_address *addr = dyn_cast <hsa_op_address *> (op);
+  if (addr && addr->m_reg)
+    addr->m_reg->m_uses.safe_push (this);
+  else
+    {
+      hsa_op_reg *reg = dyn_cast <hsa_op_reg *> (op);
+      if (reg)
+	{
+	  if (op_output_p (index))
+	    reg->set_definition (this);
+	  else
+	    reg->m_uses.safe_push (this);
+	}
+    }
+
+  m_operands[index] = op;
+}
+
+/* Get INDEX-th operand of the instruction.  */
+
+hsa_op_base *
+hsa_insn_basic::get_op (int index)
+{
+  return m_operands[index];
+}
+
+/* Get address of INDEX-th operand of the instruction.  */
+
+hsa_op_base **
+hsa_insn_basic::get_op_addr (int index)
+{
+  return &m_operands[index];
+}
+
+/* Get number of operands of the instruction.  */
+unsigned int
+hsa_insn_basic::operand_count ()
+{
+  return m_operands.length ();
+}
+
+/* Constructor of the class which is the bases of all instructions and directly
+   represents the most basic ones.  NOPS is the number of operands that the
+   operand vector will contain (and which will be cleared).  OPC is the opcode
+   of the instruction, T is the type of the instruction.  */
+
+hsa_insn_basic::hsa_insn_basic (unsigned nops, int opc, BrigType16_t t,
+				hsa_op_base *arg0, hsa_op_base *arg1,
+				hsa_op_base *arg2, hsa_op_base *arg3):
+  m_prev (NULL), m_next (NULL), m_bb (NULL), m_opcode (opc),m_number (0),
+  m_type (t),  m_brig_offset (0)
+{
+  if (nops > 0)
+    m_operands.safe_grow_cleared (nops);
+
+  if (arg0 != NULL)
+    {
+      gcc_checking_assert (nops >= 1);
+      set_op (0, arg0);
+    }
+
+  if (arg1 != NULL)
+    {
+      gcc_checking_assert (nops >= 2);
+      set_op (1, arg1);
+    }
+
+  if (arg2 != NULL)
+    {
+      gcc_checking_assert (nops >= 3);
+      set_op (2, arg2);
+    }
+
+  if (arg3 != NULL)
+    {
+      gcc_checking_assert (nops >= 4);
+      set_op (3, arg3);
+    }
+
+  hsa_instructions.safe_push (this);
+}
+
+/* New operator to allocate basic instruction from pool alloc.  */
+
+void *
+hsa_insn_basic::operator new (size_t)
+{
+  return hsa_allocp_inst_basic->allocate_raw ();
+}
+
+/* Verify the instruction.  */
+
+void
+hsa_insn_basic::verify ()
+{
+  hsa_op_address *addr;
+  hsa_op_reg *reg;
+
+  /* Iterate all register operands and verify that the instruction
+     is set in uses of the register.  */
+  for (unsigned i = 0; i < operand_count (); i++)
+    {
+      hsa_op_base *use = get_op (i);
+
+      if ((addr = dyn_cast <hsa_op_address *> (use)) && addr->m_reg)
+	{
+	  gcc_assert (addr->m_reg->m_def_insn != this);
+	  use = addr->m_reg;
+	}
+
+      if ((reg = dyn_cast <hsa_op_reg *> (use)) && !op_output_p (i))
+	{
+	  unsigned j;
+	  for (j = 0; j < reg->m_uses.length (); j++)
+	    {
+	      if (reg->m_uses[j] == this)
+		break;
+	    }
+
+	  if (j == reg->m_uses.length ())
+	    {
+	      error ("HSA instruction uses a register but is not among "
+		     "recorded register uses");
+	      debug_hsa_operand (reg);
+	      debug_hsa_insn (this);
+	      internal_error ("HSA instruction verification failed");
+	    }
+	}
+    }
+}
+
+/* Constructor of an instruction representing a PHI node.  NOPS is the number
+   of operands (equal to the number of predecessors).  */
+
+hsa_insn_phi::hsa_insn_phi (unsigned nops, hsa_op_reg *dst)
+  : hsa_insn_basic (nops, HSA_OPCODE_PHI), m_dest (dst)
+{
+  dst->set_definition (this);
+}
+
+/* New operator to allocate PHI instruction from pool alloc.  */
+
+void *
+hsa_insn_phi::operator new (size_t)
+{
+  return hsa_allocp_inst_phi->allocate_raw ();
+}
+
+/* Constructor of class representing instruction for conditional jump, CTRL is
+   the control register determining whether the jump will be carried out, the
+   new instruction is automatically added to its uses list.  */
+
+hsa_insn_br::hsa_insn_br (hsa_op_reg *ctrl)
+: hsa_insn_basic (1, BRIG_OPCODE_CBR, BRIG_TYPE_B1, ctrl),
+  m_width (BRIG_WIDTH_1)
+{
+}
+
+/* New operator to allocate branch instruction from pool alloc.  */
+
+void *
+hsa_insn_br::operator new (size_t)
+{
+  return hsa_allocp_inst_br->allocate_raw ();
+}
+
+/* Constructor of class representing instruction for switch jump, CTRL is
+   the index register.  */
+
+hsa_insn_sbr::hsa_insn_sbr (hsa_op_reg *index, unsigned jump_count)
+: hsa_insn_basic (1, BRIG_OPCODE_SBR, BRIG_TYPE_B1, index),
+  m_width (BRIG_WIDTH_1), m_jump_table (vNULL), m_default_bb (NULL),
+  m_label_code_list (new hsa_op_code_list (jump_count))
+{
+}
+
+/* New operator to allocate switch branch instruction from pool alloc.  */
+
+void *
+hsa_insn_sbr::operator new (size_t)
+{
+  return hsa_allocp_inst_sbr->allocate_raw ();
+}
+
+/* Replace all occurrences of OLD_BB with NEW_BB in the statements
+   jump table.  */
+
+void
+hsa_insn_sbr::replace_all_labels (basic_block old_bb, basic_block new_bb)
+{
+  for (unsigned i = 0; i < m_jump_table.length (); i++)
+    if (m_jump_table[i] == old_bb)
+      m_jump_table[i] = new_bb;
+}
+
+hsa_insn_sbr::~hsa_insn_sbr ()
+{
+  m_jump_table.release ();
+}
+
+/* Constructor of comparison instruction.  CMP is the comparison operation and T
+   is the result type.  */
+
+hsa_insn_cmp::hsa_insn_cmp (BrigCompareOperation8_t cmp, BrigType16_t t,
+			    hsa_op_base *arg0, hsa_op_base *arg1,
+			    hsa_op_base *arg2)
+  : hsa_insn_basic (3 , BRIG_OPCODE_CMP, t, arg0, arg1, arg2), m_compare (cmp)
+{
+}
+
+/* New operator to allocate compare instruction from pool alloc.  */
+
+void *
+hsa_insn_cmp::operator new (size_t)
+{
+  return hsa_allocp_inst_cmp->allocate_raw ();
+}
+
+/* Constructor of classes representing memory accesses.  OPC is the opcode (must
+   be BRIG_OPCODE_ST or BRIG_OPCODE_LD) and T is the type.  The instruction
+   operands are provided as ARG0 and ARG1.  */
+
+hsa_insn_mem::hsa_insn_mem (int opc, BrigType16_t t, hsa_op_base *arg0,
+			    hsa_op_base *arg1)
+  : hsa_insn_basic (2, opc, t, arg0, arg1),
+  m_align (hsa_natural_alignment (t)), m_equiv_class (0)
+{
+  gcc_checking_assert (opc == BRIG_OPCODE_LD || opc == BRIG_OPCODE_ST);
+}
+
+/* Constructor for descendants allowing different opcodes and number of
+   operands, it passes its arguments directly to hsa_insn_basic
+   constructor.  The instruction operands are provided as ARG[0-3].  */
+
+
+hsa_insn_mem::hsa_insn_mem (unsigned nops, int opc, BrigType16_t t,
+			    hsa_op_base *arg0, hsa_op_base *arg1,
+			    hsa_op_base *arg2, hsa_op_base *arg3)
+  : hsa_insn_basic (nops, opc, t, arg0, arg1, arg2, arg3),
+  m_align (hsa_natural_alignment (t)), m_equiv_class (0)
+{
+}
+
+/* New operator to allocate memory instruction from pool alloc.  */
+
+void *
+hsa_insn_mem::operator new (size_t)
+{
+  return hsa_allocp_inst_mem->allocate_raw ();
+}
+
+/* Constructor of class representing atomic instructions and signals. OPC is
+   the principal opcode, aop is the specific atomic operation opcode.  T is the
+   type of the instruction.  The instruction operands
+   are provided as ARG[0-3].  */
+
+hsa_insn_atomic::hsa_insn_atomic (int nops, int opc,
+				  enum BrigAtomicOperation aop,
+				  BrigType16_t t, BrigMemoryOrder memorder,
+				  hsa_op_base *arg0,
+				  hsa_op_base *arg1, hsa_op_base *arg2,
+				  hsa_op_base *arg3)
+  : hsa_insn_mem (nops, opc, t, arg0, arg1, arg2, arg3), m_atomicop (aop),
+  m_memoryorder (memorder),
+  m_memoryscope (BRIG_MEMORY_SCOPE_SYSTEM)
+{
+  gcc_checking_assert (opc == BRIG_OPCODE_ATOMICNORET ||
+		       opc == BRIG_OPCODE_ATOMIC ||
+		       opc == BRIG_OPCODE_SIGNAL ||
+		       opc == BRIG_OPCODE_SIGNALNORET);
+}
+
+/* New operator to allocate signal instruction from pool alloc.  */
+
+void *
+hsa_insn_atomic::operator new (size_t)
+{
+  return hsa_allocp_inst_atomic->allocate_raw ();
+}
+
+/* Constructor of class representing signal instructions.  OPC is the prinicpal
+   opcode, sop is the specific signal operation opcode.  T is the type of the
+   instruction.  The instruction operands are provided as ARG[0-3].  */
+
+hsa_insn_signal::hsa_insn_signal (int nops, int opc,
+				  enum BrigAtomicOperation sop,
+				  BrigType16_t t, hsa_op_base *arg0,
+				  hsa_op_base *arg1, hsa_op_base *arg2,
+				  hsa_op_base *arg3)
+  : hsa_insn_atomic (nops, opc, sop, t, BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE,
+		     arg0, arg1, arg2, arg3)
+{
+}
+
+/* New operator to allocate signal instruction from pool alloc.  */
+
+void *
+hsa_insn_signal::operator new (size_t)
+{
+  return hsa_allocp_inst_signal->allocate_raw ();
+}
+
+/* Constructor of class representing segment conversion instructions.  OPC is
+   the opcode which must be either BRIG_OPCODE_STOF or BRIG_OPCODE_FTOS.  DEST
+   and SRCT are destination and source types respectively, SEG is the segment
+   we are converting to or from.  The instruction operands are
+   provided as ARG0 and ARG1.  */
+
+hsa_insn_seg::hsa_insn_seg (int opc, BrigType16_t dest, BrigType16_t srct,
+			    BrigSegment8_t seg, hsa_op_base *arg0,
+			    hsa_op_base *arg1)
+  : hsa_insn_basic (2, opc, dest, arg0, arg1), m_src_type (srct),
+  m_segment (seg)
+{
+  gcc_checking_assert (opc == BRIG_OPCODE_STOF || opc == BRIG_OPCODE_FTOS);
+}
+
+/* New operator to allocate address conversion instruction from pool alloc.  */
+
+void *
+hsa_insn_seg::operator new (size_t)
+{
+  return hsa_allocp_inst_seg->allocate_raw ();
+}
+
+/* Constructor of class representing a call instruction.  CALLEE is the tree
+   representation of the function being called.  */
+
+hsa_insn_call::hsa_insn_call (tree callee)
+  : hsa_insn_basic (0, BRIG_OPCODE_CALL), m_called_function (callee),
+  m_output_arg (NULL), m_args_code_list (NULL), m_result_code_list (NULL)
+{
+}
+
+hsa_insn_call::hsa_insn_call (hsa_internal_fn *fn)
+  : hsa_insn_basic (0, BRIG_OPCODE_CALL), m_called_function (NULL),
+  m_called_internal_fn (fn), m_output_arg (NULL), m_args_code_list (NULL),
+  m_result_code_list (NULL)
+{
+}
+
+/* New operator to allocate call instruction from pool alloc.  */
+
+void *
+hsa_insn_call::operator new (size_t)
+{
+  return hsa_allocp_inst_call->allocate_raw ();
+}
+
+hsa_insn_call::~hsa_insn_call ()
+{
+  for (unsigned i = 0; i < m_input_args.length (); i++)
+    delete m_input_args[i];
+
+  delete m_output_arg;
+
+  m_input_args.release ();
+  m_input_arg_insns.release ();
+}
+
+/* Constructor of class representing the argument block required to invoke
+   a call in HSAIL.  */
+hsa_insn_arg_block::hsa_insn_arg_block (BrigKind brig_kind,
+					hsa_insn_call * call)
+  : hsa_insn_basic (0, HSA_OPCODE_ARG_BLOCK), m_kind (brig_kind),
+  m_call_insn (call)
+{
+}
+
+/* New operator to allocate argument block instruction from pool alloc.  */
+
+void *
+hsa_insn_arg_block::operator new (size_t)
+{
+  return hsa_allocp_inst_arg_block->allocate_raw ();
+}
+
+hsa_insn_comment::hsa_insn_comment (const char *s)
+  : hsa_insn_basic (0, BRIG_KIND_DIRECTIVE_COMMENT)
+{
+  unsigned l = strlen (s);
+
+  /* Append '// ' to the string.  */
+  char *buf = XNEWVEC (char, l + 4);
+  sprintf (buf, "// %s", s);
+  m_comment = buf;
+}
+
+/* New operator to allocate comment instruction from pool alloc.  */
+
+void *
+hsa_insn_comment::operator new (size_t)
+{
+  return hsa_allocp_inst_comment->allocate_raw ();
+}
+
+hsa_insn_comment::~hsa_insn_comment ()
+{
+  gcc_checking_assert (m_comment);
+  free (m_comment);
+  m_comment = NULL;
+}
+
+/* Constructor of class representing the queue instruction in HSAIL.  */
+hsa_insn_queue::hsa_insn_queue (int nops, BrigOpcode opcode)
+  : hsa_insn_basic (nops, opcode, BRIG_TYPE_U64)
+{
+}
+
+/* New operator to allocate source type instruction from pool alloc.  */
+
+void *
+hsa_insn_srctype::operator new (size_t)
+{
+  return hsa_allocp_inst_srctype->allocate_raw ();
+}
+
+/* Constructor of class representing the source type instruction in HSAIL.  */
+
+hsa_insn_srctype::hsa_insn_srctype (int nops, BrigOpcode opcode,
+				    BrigType16_t destt, BrigType16_t srct,
+				    hsa_op_base *arg0, hsa_op_base *arg1,
+				    hsa_op_base *arg2 = NULL)
+  : hsa_insn_basic (nops, opcode, destt, arg0, arg1, arg2),
+  m_source_type (srct)
+{}
+
+/* New operator to allocate packed instruction from pool alloc.  */
+
+void *
+hsa_insn_packed::operator new (size_t)
+{
+  return hsa_allocp_inst_packed->allocate_raw ();
+}
+
+/* Constructor of class representing the packed instruction in HSAIL.  */
+
+hsa_insn_packed::hsa_insn_packed (int nops, BrigOpcode opcode,
+				  BrigType16_t destt, BrigType16_t srct,
+				  hsa_op_base *arg0, hsa_op_base *arg1,
+				  hsa_op_base *arg2)
+  : hsa_insn_srctype (nops, opcode, destt, srct, arg0, arg1, arg2)
+{
+  m_operand_list = new hsa_op_operand_list (nops - 1);
+}
+
+/* New operator to allocate convert instruction from pool alloc.  */
+
+void *
+hsa_insn_cvt::operator new (size_t)
+{
+  return hsa_allocp_inst_cvt->allocate_raw ();
+}
+
+/* Constructor of class representing the convert instruction in HSAIL.  */
+
+hsa_insn_cvt::hsa_insn_cvt (hsa_op_with_type *dest, hsa_op_with_type *src)
+  : hsa_insn_basic (2, BRIG_OPCODE_CVT, dest->m_type, dest, src)
+{
+}
+
+/* New operator to allocate alloca from pool alloc.  */
+
+void *
+hsa_insn_alloca::operator new (size_t)
+{
+  return hsa_allocp_inst_alloca->allocate_raw ();
+}
+
+/* Constructor of class representing the alloca in HSAIL.  */
+
+hsa_insn_alloca::hsa_insn_alloca (hsa_op_with_type *dest,
+				  hsa_op_with_type *size, unsigned alignment)
+  : hsa_insn_basic (2, BRIG_OPCODE_ALLOCA, dest->m_type, dest, size),
+  m_align (BRIG_ALIGNMENT_8)
+{
+  gcc_assert (dest->m_type == BRIG_TYPE_U32);
+  if (alignment)
+    m_align = hsa_alignment_encoding (alignment);
+}
+
+/* Append an instruction INSN into the basic block.  */
+
+void
+hsa_bb::append_insn (hsa_insn_basic *insn)
+{
+  gcc_assert (insn->m_opcode != 0 || insn->operand_count () == 0);
+  gcc_assert (!insn->m_bb);
+
+  insn->m_bb = m_bb;
+  insn->m_prev = m_last_insn;
+  insn->m_next = NULL;
+  if (m_last_insn)
+    m_last_insn->m_next = insn;
+  m_last_insn = insn;
+  if (!m_first_insn)
+    m_first_insn = insn;
+}
+
+/* Insert HSA instruction NEW_INSN immediately before an existing instruction
+   OLD_INSN.  */
+
+static void
+hsa_insert_insn_before (hsa_insn_basic *new_insn, hsa_insn_basic *old_insn)
+{
+  hsa_bb *hbb = hsa_bb_for_bb (old_insn->m_bb);
+
+  if (hbb->m_first_insn == old_insn)
+    hbb->m_first_insn = new_insn;
+  new_insn->m_prev = old_insn->m_prev;
+  new_insn->m_next = old_insn;
+  if (old_insn->m_prev)
+    old_insn->m_prev->m_next = new_insn;
+  old_insn->m_prev = new_insn;
+}
+
+/* Append HSA instruction NEW_INSN immediately after an existing instruction
+   OLD_INSN.  */
+
+static void
+hsa_append_insn_after (hsa_insn_basic *new_insn, hsa_insn_basic *old_insn)
+{
+  hsa_bb *hbb = hsa_bb_for_bb (old_insn->m_bb);
+
+  if (hbb->m_last_insn == old_insn)
+    hbb->m_last_insn = new_insn;
+  new_insn->m_prev = old_insn;
+  new_insn->m_next = old_insn->m_next;
+  if (old_insn->m_next)
+    old_insn->m_next->m_prev = new_insn;
+  old_insn->m_next = new_insn;
+}
+
+/* Return a register containing the calculated value of EXP which must be an
+   expression consisting of PLUS_EXPRs, MULT_EXPRs, NOP_EXPRs, SSA_NAMEs and
+   integer constants as returned by get_inner_reference.
+   Newly generated HSA instructions will be appended to HBB.
+   Perform all calculations in ADDRTYPE.  */
+
+static hsa_op_with_type *
+gen_address_calculation (tree exp, hsa_bb *hbb, BrigType16_t addrtype)
+{
+  int opcode;
+
+  if (TREE_CODE (exp) == NOP_EXPR)
+    exp = TREE_OPERAND (exp, 0);
+
+  switch (TREE_CODE (exp))
+    {
+    case SSA_NAME:
+      return hsa_cfun->reg_for_gimple_ssa (exp)->get_in_type (addrtype, hbb);
+
+    case INTEGER_CST:
+      {
+       hsa_op_immed *imm = new hsa_op_immed (exp);
+       if (addrtype != imm->m_type)
+	 imm->m_type = addrtype;
+       return imm;
+      }
+
+    case PLUS_EXPR:
+      opcode = BRIG_OPCODE_ADD;
+      break;
+
+    case MULT_EXPR:
+      opcode = BRIG_OPCODE_MUL;
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  hsa_op_reg *res = new hsa_op_reg (addrtype);
+  hsa_insn_basic *insn = new hsa_insn_basic (3, opcode, addrtype);
+  insn->set_op (0, res);
+
+  hsa_op_with_type *op1 = gen_address_calculation (TREE_OPERAND (exp, 0), hbb,
+						   addrtype);
+  hsa_op_with_type *op2 = gen_address_calculation (TREE_OPERAND (exp, 1), hbb,
+						   addrtype);
+  insn->set_op (1, op1);
+  insn->set_op (2, op2);
+
+  hbb->append_insn (insn);
+  return res;
+}
+
+/* If R1 is NULL, just return R2, otherwise append an instruction adding them
+   to HBB and return the register holding the result.  */
+
+static hsa_op_reg *
+add_addr_regs_if_needed (hsa_op_reg *r1, hsa_op_reg *r2, hsa_bb *hbb)
+{
+  gcc_checking_assert (r2);
+  if (!r1)
+    return r2;
+
+  hsa_op_reg *res = new hsa_op_reg (r1->m_type);
+  gcc_assert (!hsa_needs_cvt (r1->m_type, r2->m_type));
+  hsa_insn_basic *insn = new hsa_insn_basic (3, BRIG_OPCODE_ADD, res->m_type);
+  insn->set_op (0, res);
+  insn->set_op (1, r1);
+  insn->set_op (2, r2);
+  hbb->append_insn (insn);
+  return res;
+}
+
+/* Helper of gen_hsa_addr.  Update *SYMBOL, *ADDRTYPE, *REG and *OFFSET to
+   reflect BASE which is the first operand of a MEM_REF or a TARGET_MEM_REF.  */
+
+static void
+process_mem_base (tree base, hsa_symbol **symbol, BrigType16_t *addrtype,
+		  hsa_op_reg **reg, offset_int *offset, hsa_bb *hbb)
+{
+  if (TREE_CODE (base) == SSA_NAME)
+    {
+      gcc_assert (!*reg);
+      hsa_op_with_type *ssa = hsa_cfun->reg_for_gimple_ssa (base)->get_in_type
+	(*addrtype, hbb);
+      *reg = dyn_cast <hsa_op_reg *> (ssa);
+    }
+  else if (TREE_CODE (base) == ADDR_EXPR)
+    {
+      tree decl = TREE_OPERAND (base, 0);
+
+      if (!DECL_P (decl) || TREE_CODE (decl) == FUNCTION_DECL)
+	{
+	  HSA_SORRY_AT (EXPR_LOCATION (base),
+			"support for HSA does not implement a memory reference "
+			"to a non-declaration type");
+	  return;
+	}
+
+      gcc_assert (!*symbol);
+
+      *symbol = get_symbol_for_decl (decl);
+      *addrtype = hsa_get_segment_addr_type ((*symbol)->m_segment);
+    }
+  else if (TREE_CODE (base) == INTEGER_CST)
+    *offset += wi::to_offset (base);
+  else
+    gcc_unreachable ();
+}
+
+/* Forward declaration of a function.  */
+
+static void
+gen_hsa_addr_insns (tree val, hsa_op_reg *dest, hsa_bb *hbb);
+
+/* Generate HSA address operand for a given tree memory reference REF.  If
+   instructions need to be created to calculate the address, they will be added
+   to the end of HBB.  If a caller provider OUTPUT_BITSIZE and OUTPUT_BITPOS,
+   the function assumes that the caller will handle possible
+   bit-field references.  Otherwise if we reference a bit-field, sorry message
+   is displayed.  */
+
+static hsa_op_address *
+gen_hsa_addr (tree ref, hsa_bb *hbb, HOST_WIDE_INT *output_bitsize = NULL,
+	      HOST_WIDE_INT *output_bitpos = NULL)
+{
+  hsa_symbol *symbol = NULL;
+  hsa_op_reg *reg = NULL;
+  offset_int offset = 0;
+  tree origref = ref;
+  tree varoffset = NULL_TREE;
+  BrigType16_t addrtype = hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT);
+  HOST_WIDE_INT bitsize = 0, bitpos = 0;
+  BrigType16_t flat_addrtype = hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT);
+
+  if (TREE_CODE (ref) == STRING_CST)
+    {
+      symbol = hsa_get_string_cst_symbol (ref);
+      goto out;
+    }
+  else if (TREE_CODE (ref) == BIT_FIELD_REF
+	   && ((tree_to_uhwi (TREE_OPERAND (ref, 1)) % BITS_PER_UNIT) != 0
+	       || (tree_to_uhwi (TREE_OPERAND (ref, 2)) % BITS_PER_UNIT) != 0))
+    {
+      HSA_SORRY_ATV (EXPR_LOCATION (origref),
+		     "support for HSA does not implement "
+		     "bit field references such as %E", ref);
+      goto out;
+    }
+
+  if (handled_component_p (ref))
+    {
+      enum machine_mode mode;
+      int unsignedp, volatilep, preversep;
+
+      ref = get_inner_reference (ref, &bitsize, &bitpos, &varoffset, &mode,
+				 &unsignedp, &preversep, &volatilep, false);
+
+      offset = bitpos;
+      offset = wi::rshift (offset, LOG2_BITS_PER_UNIT, SIGNED);
+    }
+
+  switch (TREE_CODE (ref))
+    {
+    case ADDR_EXPR:
+      {
+	addrtype = hsa_get_segment_addr_type (BRIG_SEGMENT_PRIVATE);
+	symbol = hsa_cfun->create_hsa_temporary (flat_addrtype);
+	hsa_op_reg *r = new hsa_op_reg (flat_addrtype);
+	gen_hsa_addr_insns (ref, r, hbb);
+	hbb->append_insn (new hsa_insn_mem (BRIG_OPCODE_ST, r->m_type,
+					    r, new hsa_op_address (symbol)));
+
+	break;
+      }
+    case SSA_NAME:
+      {
+	addrtype = hsa_get_segment_addr_type (BRIG_SEGMENT_PRIVATE);
+	symbol = hsa_cfun->create_hsa_temporary (flat_addrtype);
+	hsa_op_reg *r = hsa_cfun->reg_for_gimple_ssa (ref);
+
+	hbb->append_insn (new hsa_insn_mem (BRIG_OPCODE_ST, r->m_type,
+					    r, new hsa_op_address (symbol)));
+
+	break;
+      }
+    case PARM_DECL:
+    case VAR_DECL:
+    case RESULT_DECL:
+      gcc_assert (!symbol);
+      symbol = get_symbol_for_decl (ref);
+      addrtype = hsa_get_segment_addr_type (symbol->m_segment);
+      break;
+
+    case MEM_REF:
+      process_mem_base (TREE_OPERAND (ref, 0), &symbol, &addrtype, &reg,
+			&offset, hbb);
+
+      if (!integer_zerop (TREE_OPERAND (ref, 1)))
+	offset += wi::to_offset (TREE_OPERAND (ref, 1));
+      break;
+
+    case TARGET_MEM_REF:
+      process_mem_base (TMR_BASE (ref), &symbol, &addrtype, &reg, &offset, hbb);
+      if (TMR_INDEX (ref))
+	{
+	  hsa_op_reg *disp1;
+	  hsa_op_base *idx = hsa_cfun->reg_for_gimple_ssa
+	    (TMR_INDEX (ref))->get_in_type (addrtype, hbb);
+	  if (TMR_STEP (ref) && !integer_onep (TMR_STEP (ref)))
+	    {
+	      disp1 = new hsa_op_reg (addrtype);
+	      hsa_insn_basic *insn = new hsa_insn_basic (3, BRIG_OPCODE_MUL,
+							 addrtype);
+
+	      /* As step must respect addrtype, we overwrite the type
+		 of an immediate value.  */
+	      hsa_op_immed *step = new hsa_op_immed (TMR_STEP (ref));
+	      step->m_type = addrtype;
+
+	      insn->set_op (0, disp1);
+	      insn->set_op (1, idx);
+	      insn->set_op (2, step);
+	      hbb->append_insn (insn);
+	    }
+	  else
+	    disp1 = as_a <hsa_op_reg *> (idx);
+	  reg = add_addr_regs_if_needed (reg, disp1, hbb);
+	}
+      if (TMR_INDEX2 (ref))
+	{
+	  hsa_op_base *disp2 = hsa_cfun->reg_for_gimple_ssa
+	    (TMR_INDEX2 (ref))->get_in_type (addrtype, hbb);
+	  reg = add_addr_regs_if_needed (reg, as_a <hsa_op_reg *> (disp2), hbb);
+	}
+      offset += wi::to_offset (TMR_OFFSET (ref));
+      break;
+    case FUNCTION_DECL:
+      HSA_SORRY_AT (EXPR_LOCATION (origref),
+		    "support for HSA does not implement function pointers");
+      goto out;
+    default:
+      HSA_SORRY_ATV (EXPR_LOCATION (origref), "support for HSA does "
+		     "not implement memory access to %E", origref);
+      goto out;
+    }
+
+  if (varoffset)
+    {
+      if (TREE_CODE (varoffset) == INTEGER_CST)
+	offset += wi::to_offset (varoffset);
+      else
+	{
+	  hsa_op_base *off_op = gen_address_calculation (varoffset, hbb,
+							 addrtype);
+	  reg = add_addr_regs_if_needed (reg, as_a <hsa_op_reg *> (off_op),
+					 hbb);
+	}
+    }
+
+  gcc_checking_assert ((symbol
+			&& addrtype
+			== hsa_get_segment_addr_type (symbol->m_segment))
+		       || (!symbol
+			   && addrtype
+			   == hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT)));
+out:
+  HOST_WIDE_INT hwi_offset = offset.to_shwi ();
+
+  /* Calculate remaining bitsize offset (if presented).  */
+  bitpos %= BITS_PER_UNIT;
+  /* If bitsize is a power of two that is greater or equal to BITS_PER_UNIT, it
+     is not a reason to think this is a bit-field access.  */
+  if (bitpos == 0
+      && (bitsize >= BITS_PER_UNIT)
+      && !(bitsize & (bitsize - 1)))
+    bitsize = 0;
+
+  if ((bitpos || bitsize) && (output_bitpos == NULL || output_bitsize == NULL))
+    HSA_SORRY_ATV (EXPR_LOCATION (origref), "support for HSA does not "
+		   "implement unhandled bit field reference such as %E", ref);
+
+  if (output_bitsize != NULL && output_bitpos != NULL)
+    {
+      *output_bitsize = bitsize;
+      *output_bitpos = bitpos;
+    }
+
+  return new hsa_op_address (symbol, reg, hwi_offset);
+}
+
+/* Generate HSA address for a function call argument of given TYPE.
+   INDEX is used to generate corresponding name of the arguments.
+   Special value -1 represents fact that result value is created.  */
+
+static hsa_op_address *
+gen_hsa_addr_for_arg (tree tree_type, int index)
+{
+  hsa_symbol *sym = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_ARG,
+				    BRIG_LINKAGE_ARG);
+  sym->m_type = hsa_type_for_tree_type (tree_type, &sym->m_dim);
+
+  if (index == -1) /* Function result.  */
+    sym->m_name = "res";
+  else /* Function call arguments.  */
+    {
+      sym->m_name = NULL;
+      sym->m_name_number = index;
+    }
+
+  return new hsa_op_address (sym);
+}
+
+/* Generate HSA instructions that calculate address of VAL including all
+   necessary conversions to flat addressing and place the result into DEST.
+   Instructions are appended to HBB.  */
+
+static void
+gen_hsa_addr_insns (tree val, hsa_op_reg *dest, hsa_bb *hbb)
+{
+  /* Handle cases like tmp = NULL, where we just emit a move instruction
+     to a register.  */
+  if (TREE_CODE (val) == INTEGER_CST)
+    {
+      hsa_op_immed *c = new hsa_op_immed (val);
+      hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV,
+						 dest->m_type, dest, c);
+      hbb->append_insn (insn);
+      return;
+    }
+
+  hsa_op_address *addr;
+
+  gcc_assert (dest->m_type == hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT));
+  if (TREE_CODE (val) == ADDR_EXPR)
+    val = TREE_OPERAND (val, 0);
+  addr = gen_hsa_addr (val, hbb);
+  hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_LDA);
+  insn->set_op (1, addr);
+  if (addr->m_symbol && addr->m_symbol->m_segment != BRIG_SEGMENT_GLOBAL)
+    {
+      /* LDA produces segment-relative address, we need to convert
+	 it to the flat one.  */
+      hsa_op_reg *tmp;
+      tmp = new hsa_op_reg (hsa_get_segment_addr_type
+			    (addr->m_symbol->m_segment));
+      hsa_insn_seg *seg;
+      seg = new hsa_insn_seg (BRIG_OPCODE_STOF,
+			      hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT),
+			      tmp->m_type, addr->m_symbol->m_segment, dest,
+			      tmp);
+
+      insn->set_op (0, tmp);
+      insn->m_type = tmp->m_type;
+      hbb->append_insn (insn);
+      hbb->append_insn (seg);
+    }
+  else
+    {
+      insn->set_op (0, dest);
+      insn->m_type = hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT);
+      hbb->append_insn (insn);
+    }
+}
+
+/* Return an HSA register or HSA immediate value operand corresponding to
+   gimple operand OP.  */
+
+static hsa_op_with_type *
+hsa_reg_or_immed_for_gimple_op (tree op, hsa_bb *hbb)
+{
+  hsa_op_reg *tmp;
+
+  if (TREE_CODE (op) == SSA_NAME)
+    tmp = hsa_cfun->reg_for_gimple_ssa (op);
+  else if (!POINTER_TYPE_P (TREE_TYPE (op)))
+    return new hsa_op_immed (op);
+  else
+    {
+      tmp = new hsa_op_reg (hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT));
+      gen_hsa_addr_insns (op, tmp, hbb);
+    }
+  return tmp;
+}
+
+/* Create a simple movement instruction with register destination DEST and
+   register or immediate source SRC and append it to the end of HBB.  */
+
+void
+hsa_build_append_simple_mov (hsa_op_reg *dest, hsa_op_base *src, hsa_bb *hbb)
+{
+  hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type,
+					     dest, src);
+  if (hsa_op_reg *sreg = dyn_cast <hsa_op_reg *> (src))
+    gcc_assert (hsa_type_bit_size (dest->m_type)
+		== hsa_type_bit_size (sreg->m_type));
+  else
+    gcc_assert (hsa_type_bit_size (dest->m_type)
+		== hsa_type_bit_size (as_a <hsa_op_immed *> (src)->m_type));
+
+  hbb->append_insn (insn);
+}
+
+/* Generate HSAIL instructions loading a bit field into register DEST.
+   VALUE_REG is a register of a SSA name that is used in the bit field
+   reference.  To identify a bit field BITPOS is offset to the loaded memory
+   and BITSIZE is number of bits of the bit field.
+   Add instructions to HBB.  */
+
+static void
+gen_hsa_insns_for_bitfield (hsa_op_reg *dest, hsa_op_reg *value_reg,
+			    HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+			    hsa_bb *hbb)
+{
+  unsigned type_bitsize = hsa_type_bit_size (dest->m_type);
+  unsigned left_shift = type_bitsize - (bitsize + bitpos);
+  unsigned right_shift = left_shift + bitpos;
+
+  if (left_shift)
+    {
+      hsa_op_reg *value_reg_2 = new hsa_op_reg (dest->m_type);
+      hsa_op_immed *c = new hsa_op_immed (left_shift, BRIG_TYPE_U32);
+
+      hsa_insn_basic *lshift = new hsa_insn_basic
+	(3, BRIG_OPCODE_SHL, value_reg_2->m_type, value_reg_2, value_reg, c);
+
+      hbb->append_insn (lshift);
+
+      value_reg = value_reg_2;
+    }
+
+  if (right_shift)
+    {
+      hsa_op_reg *value_reg_2 = new hsa_op_reg (dest->m_type);
+      hsa_op_immed *c = new hsa_op_immed (right_shift, BRIG_TYPE_U32);
+
+      hsa_insn_basic *rshift = new hsa_insn_basic
+	(3, BRIG_OPCODE_SHR, value_reg_2->m_type, value_reg_2, value_reg, c);
+
+      hbb->append_insn (rshift);
+
+      value_reg = value_reg_2;
+    }
+
+    hsa_insn_basic *assignment = new hsa_insn_basic
+      (2, BRIG_OPCODE_MOV, dest->m_type, dest, value_reg);
+    hbb->append_insn (assignment);
+}
+
+
+/* Generate HSAIL instructions loading a bit field into register DEST.  ADDR is
+   prepared memory address which is used to load the bit field.  To identify a
+   bit field BITPOS is offset to the loaded memory and BITSIZE is number of
+   bits of the bit field.  Add instructions to HBB.  Load must be performed in
+   alignment ALIGN.  */
+
+static void
+gen_hsa_insns_for_bitfield_load (hsa_op_reg *dest, hsa_op_address *addr,
+				 HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+				 hsa_bb *hbb, BrigAlignment8_t align)
+{
+  hsa_op_reg *value_reg = new hsa_op_reg (dest->m_type);
+  hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_LD, dest->m_type, value_reg,
+					addr);
+  mem->set_align (align);
+  hbb->append_insn (mem);
+  gen_hsa_insns_for_bitfield (dest, value_reg, bitsize, bitpos, hbb);
+}
+
+/* Return the alignment of base memory accesses we issue to perform bit-field
+   memory access REF.  */
+
+static BrigAlignment8_t
+hsa_bitmemref_alignment (tree ref)
+{
+  unsigned HOST_WIDE_INT bit_offset = 0;
+
+  while (true)
+    {
+      if (TREE_CODE (ref) == BIT_FIELD_REF)
+	{
+	  if (!tree_fits_uhwi_p (TREE_OPERAND (ref, 2)))
+	    return BRIG_ALIGNMENT_1;
+	  bit_offset += tree_to_uhwi (TREE_OPERAND (ref, 2));
+	}
+      else if (TREE_CODE (ref) == COMPONENT_REF
+	       && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
+	bit_offset += int_bit_position (TREE_OPERAND (ref, 1));
+      else
+	break;
+      ref = TREE_OPERAND (ref, 0);
+    }
+
+  unsigned HOST_WIDE_INT bits = bit_offset % BITS_PER_UNIT;
+  unsigned HOST_WIDE_INT byte_bits = bit_offset - bits;
+  BrigAlignment8_t base = hsa_alignment_encoding (get_object_alignment (ref));
+  if (byte_bits == 0)
+    return base;
+  return MIN (base, hsa_alignment_encoding (byte_bits & -byte_bits));
+}
+
+/* Generate HSAIL instructions loading something into register DEST.  RHS is
+   tree representation of the loaded data, which are loaded as type TYPE.  Add
+   instructions to HBB.  */
+
+static void
+gen_hsa_insns_for_load (hsa_op_reg *dest, tree rhs, tree type, hsa_bb *hbb)
+{
+  /* The destination SSA name will give us the type.  */
+  if (TREE_CODE (rhs) == VIEW_CONVERT_EXPR)
+    rhs = TREE_OPERAND (rhs, 0);
+
+  if (TREE_CODE (rhs) == SSA_NAME)
+    {
+      hsa_op_reg *src = hsa_cfun->reg_for_gimple_ssa (rhs);
+      hsa_build_append_simple_mov (dest, src, hbb);
+    }
+  else if (is_gimple_min_invariant (rhs)
+	   || TREE_CODE (rhs) == ADDR_EXPR)
+    {
+      if (POINTER_TYPE_P (TREE_TYPE (rhs)))
+	{
+	  if (dest->m_type != hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT))
+	    {
+	      HSA_SORRY_ATV (EXPR_LOCATION (rhs),
+			     "support for HSA does not implement conversion "
+			     "of %E to the requested non-pointer type.", rhs);
+	      return;
+	    }
+
+	  gen_hsa_addr_insns (rhs, dest, hbb);
+	}
+      else if (TREE_CODE (rhs) == COMPLEX_CST)
+	{
+	  hsa_op_immed *real_part = new hsa_op_immed (TREE_REALPART (rhs));
+	  hsa_op_immed *imag_part = new hsa_op_immed (TREE_IMAGPART (rhs));
+
+	  hsa_op_reg *real_part_reg = new hsa_op_reg
+	    (hsa_type_for_scalar_tree_type (TREE_TYPE (type), true));
+	  hsa_op_reg *imag_part_reg = new hsa_op_reg
+	    (hsa_type_for_scalar_tree_type (TREE_TYPE (type), true));
+
+	  hsa_build_append_simple_mov (real_part_reg, real_part, hbb);
+	  hsa_build_append_simple_mov (imag_part_reg, imag_part, hbb);
+
+	  BrigType16_t src_type = hsa_bittype_for_type (real_part_reg->m_type);
+
+	  hsa_insn_packed *insn = new hsa_insn_packed
+	    (3, BRIG_OPCODE_COMBINE, dest->m_type, src_type, dest,
+	     real_part_reg, imag_part_reg);
+	  hbb->append_insn (insn);
+	}
+      else
+	{
+	  hsa_op_immed *imm = new hsa_op_immed (rhs);
+	  hsa_build_append_simple_mov (dest, imm, hbb);
+	}
+    }
+  else if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR)
+    {
+      tree pack_type = TREE_TYPE (TREE_OPERAND (rhs, 0));
+
+      hsa_op_reg *packed_reg = new hsa_op_reg
+	(hsa_type_for_scalar_tree_type (pack_type, true));
+
+      tree complex_rhs = TREE_OPERAND (rhs, 0);
+      gen_hsa_insns_for_load (packed_reg, complex_rhs, TREE_TYPE (complex_rhs),
+			      hbb);
+
+      hsa_op_reg *real_reg = new hsa_op_reg
+	(hsa_type_for_scalar_tree_type (type, true));
+
+      hsa_op_reg *imag_reg = new hsa_op_reg
+	(hsa_type_for_scalar_tree_type (type, true));
+
+      BrigKind16_t brig_type = packed_reg->m_type;
+      hsa_insn_packed *packed = new hsa_insn_packed
+	(3, BRIG_OPCODE_EXPAND, hsa_bittype_for_type (real_reg->m_type),
+	 brig_type, real_reg, imag_reg, packed_reg);
+
+      hbb->append_insn (packed);
+
+      hsa_op_reg *source = TREE_CODE (rhs) == REALPART_EXPR ?
+	real_reg : imag_reg;
+
+      hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV,
+						 dest->m_type, dest, source);
+
+      hbb->append_insn (insn);
+    }
+  else if (TREE_CODE (rhs) == BIT_FIELD_REF
+	   && TREE_CODE (TREE_OPERAND (rhs, 0)) == SSA_NAME)
+    {
+      tree ssa_name = TREE_OPERAND (rhs, 0);
+      HOST_WIDE_INT bitsize = tree_to_uhwi (TREE_OPERAND (rhs, 1));
+      HOST_WIDE_INT bitpos = tree_to_uhwi (TREE_OPERAND (rhs, 2));
+
+      hsa_op_reg *imm_value = hsa_cfun->reg_for_gimple_ssa (ssa_name);
+      gen_hsa_insns_for_bitfield (dest, imm_value, bitsize, bitpos, hbb);
+    }
+  else if (DECL_P (rhs) || TREE_CODE (rhs) == MEM_REF
+	   || TREE_CODE (rhs) == TARGET_MEM_REF
+	   || handled_component_p (rhs))
+    {
+      HOST_WIDE_INT bitsize, bitpos;
+
+      /* Load from memory.  */
+      hsa_op_address *addr;
+      addr = gen_hsa_addr (rhs, hbb, &bitsize, &bitpos);
+
+      /* Handle load of a bit field.  */
+      if (bitsize > 64)
+	{
+	  HSA_SORRY_AT (EXPR_LOCATION (rhs),
+			"support for HSA does not implement load from a bit "
+			"field bigger than 64 bits");
+	  return;
+	}
+
+      if (bitsize || bitpos)
+	gen_hsa_insns_for_bitfield_load (dest, addr, bitsize, bitpos, hbb,
+					 hsa_bitmemref_alignment (rhs));
+      else
+	{
+	  BrigType16_t mtype;
+	  /* Not dest->m_type, that's possibly extended.  */
+	  mtype = mem_type_for_type (hsa_type_for_scalar_tree_type (type,
+								    false));
+	  hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_LD, mtype, dest,
+						addr);
+	  mem->set_align (hsa_alignment_encoding (get_object_alignment (rhs)));
+	  hbb->append_insn (mem);
+	}
+    }
+  else
+    HSA_SORRY_ATV
+      (EXPR_LOCATION (rhs),
+       "support for HSA does not implement loading of expression %E", rhs);
+}
+
+/* Return number of bits necessary for representation of a bit field,
+   starting at BITPOS with size of BITSIZE.  */
+
+static unsigned
+get_bitfield_size (unsigned bitpos, unsigned bitsize)
+{
+  unsigned s = bitpos + bitsize;
+  unsigned sizes[] = {8, 16, 32, 64};
+
+  for (unsigned i = 0; i < 4; i++)
+    if (s <= sizes[i])
+      return sizes[i];
+
+  gcc_unreachable ();
+  return 0;
+}
+
+/* Generate HSAIL instructions storing into memory.  LHS is the destination of
+   the store, SRC is the source operand.  Add instructions to HBB.  */
+
+static void
+gen_hsa_insns_for_store (tree lhs, hsa_op_base *src, hsa_bb *hbb)
+{
+  HOST_WIDE_INT bitsize = 0, bitpos = 0;
+  BrigAlignment8_t req_align;
+  BrigType16_t mtype;
+  mtype = mem_type_for_type (hsa_type_for_scalar_tree_type (TREE_TYPE (lhs),
+							    false));
+  hsa_op_address *addr;
+  addr = gen_hsa_addr (lhs, hbb, &bitsize, &bitpos);
+
+  /* Handle store to a bit field.  */
+  if (bitsize > 64)
+    {
+      HSA_SORRY_AT (EXPR_LOCATION (lhs),
+		    "support for HSA does not implement store to a bit field "
+		    "bigger than 64 bits");
+      return;
+    }
+
+  unsigned type_bitsize = get_bitfield_size (bitpos, bitsize);
+
+  /* HSAIL does not support MOV insn with 16-bits integers.  */
+  if (type_bitsize < 32)
+    type_bitsize = 32;
+
+  if (bitpos || (bitsize && type_bitsize != bitsize))
+    {
+      unsigned HOST_WIDE_INT mask = 0;
+      BrigType16_t mem_type = get_integer_type_by_bytes
+	(type_bitsize / BITS_PER_UNIT, !TYPE_UNSIGNED (TREE_TYPE (lhs)));
+
+      for (unsigned i = 0; i < type_bitsize; i++)
+	if (i < bitpos || i >= bitpos + bitsize)
+	  mask |= ((unsigned HOST_WIDE_INT)1 << i);
+
+      hsa_op_reg *value_reg = new hsa_op_reg (mem_type);
+
+      req_align = hsa_bitmemref_alignment (lhs);
+      /* Load value from memory.  */
+      hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_LD, mem_type,
+					    value_reg, addr);
+      mem->set_align (req_align);
+      hbb->append_insn (mem);
+
+      /* AND the loaded value with prepared mask.  */
+      hsa_op_reg *cleared_reg = new hsa_op_reg (mem_type);
+
+      hsa_op_immed *c = new hsa_op_immed
+	(mask, get_integer_type_by_bytes (type_bitsize / BITS_PER_UNIT, false));
+
+      hsa_insn_basic *clearing = new hsa_insn_basic
+	(3, BRIG_OPCODE_AND, mem_type, cleared_reg, value_reg, c);
+      hbb->append_insn (clearing);
+
+      /* Shift to left a value that is going to be stored.  */
+      hsa_op_reg *new_value_reg = new hsa_op_reg (mem_type);
+
+      hsa_insn_basic *basic = new hsa_insn_basic (2, BRIG_OPCODE_MOV, mem_type,
+						  new_value_reg, src);
+      hbb->append_insn (basic);
+
+      if (bitpos)
+	{
+	  hsa_op_reg *shifted_value_reg = new hsa_op_reg (mem_type);
+	  c = new hsa_op_immed (bitpos, BRIG_TYPE_U32);
+
+	  hsa_insn_basic *basic = new hsa_insn_basic
+	    (3, BRIG_OPCODE_SHL, mem_type, shifted_value_reg, new_value_reg, c);
+	  hbb->append_insn (basic);
+
+	  new_value_reg = shifted_value_reg;
+	}
+
+      /* OR the prepared value with prepared chunk loaded from memory.  */
+      hsa_op_reg *prepared_reg= new hsa_op_reg (mem_type);
+      basic = new hsa_insn_basic (3, BRIG_OPCODE_OR, mem_type, prepared_reg,
+				  new_value_reg, cleared_reg);
+      hbb->append_insn (basic);
+
+      src = prepared_reg;
+      mtype = mem_type;
+    }
+  else
+    req_align = hsa_alignment_encoding (get_object_alignment (lhs));
+
+  hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_ST, mtype, src, addr);
+  mem->set_align (req_align);
+
+  /* XXX The HSAIL verifier has another constraint: if the source
+     is an immediate then it must match the destination type.  If
+     it's a register the low bits will be used for sub-word stores.
+     We're always allocating new operands so we can modify the above
+     in place.  */
+  if (hsa_op_immed *imm = dyn_cast <hsa_op_immed *> (src))
+    {
+      if ((imm->m_type & BRIG_TYPE_PACK_MASK) == BRIG_TYPE_PACK_NONE)
+	imm->m_type = mem->m_type;
+      else
+	{
+	  /* ...and all vector immediates apparently need to be vectors of
+	     unsigned bytes. */
+	  unsigned bs = hsa_type_bit_size (imm->m_type);
+	  gcc_assert (bs == hsa_type_bit_size (mem->m_type));
+	  switch (bs)
+	    {
+	    case 32:
+	      imm->m_type = BRIG_TYPE_U8X4;
+	      break;
+	    case 64:
+	      imm->m_type = BRIG_TYPE_U8X8;
+	      break;
+	    case 128:
+	      imm->m_type = BRIG_TYPE_U8X16;
+	      break;
+	    default:
+	      gcc_unreachable ();
+	    }
+	}
+    }
+
+  hbb->append_insn (mem);
+}
+
+/* Generate memory copy instructions that are going to be used
+   for copying a HSA symbol SRC_SYMBOL (or SRC_REG) to TARGET memory,
+   represented by pointer in a register.  */
+
+static void
+gen_hsa_memory_copy (hsa_bb *hbb, hsa_op_address *target, hsa_op_address *src,
+		     unsigned size)
+{
+  hsa_op_address *addr;
+  hsa_insn_mem *mem;
+
+  unsigned offset = 0;
+
+  while (size)
+    {
+      unsigned s;
+      if (size >= 8)
+	s = 8;
+      else if (size >= 4)
+	s = 4;
+      else if (size >= 2)
+	s = 2;
+      else
+	s = 1;
+
+      BrigType16_t t = get_integer_type_by_bytes (s, false);
+
+      hsa_op_reg *tmp = new hsa_op_reg (t);
+      addr = new hsa_op_address (src->m_symbol, src->m_reg,
+				 src->m_imm_offset + offset);
+      mem = new hsa_insn_mem (BRIG_OPCODE_LD, t, tmp, addr);
+      hbb->append_insn (mem);
+
+      addr = new hsa_op_address (target->m_symbol, target->m_reg,
+				 target->m_imm_offset + offset);
+      mem = new hsa_insn_mem (BRIG_OPCODE_ST, t, tmp, addr);
+      hbb->append_insn (mem);
+      offset += s;
+      size -= s;
+    }
+}
+
+/* Create a memset mask that is created by copying a CONSTANT byte value
+   to an integer of BYTE_SIZE bytes.  */
+
+static unsigned HOST_WIDE_INT
+build_memset_value (unsigned HOST_WIDE_INT constant, unsigned byte_size)
+{
+  if (constant == 0)
+    return 0;
+
+  HOST_WIDE_INT v = constant;
+
+  for (unsigned i = 1; i < byte_size; i++)
+    v |= constant << (8 * i);
+
+  return v;
+}
+
+/* Generate memory set instructions that are going to be used
+   for setting a CONSTANT byte value to TARGET memory of SIZE bytes.  */
+
+static void
+gen_hsa_memory_set (hsa_bb *hbb, hsa_op_address *target,
+		    unsigned HOST_WIDE_INT constant,
+		    unsigned size)
+{
+  hsa_op_address *addr;
+  hsa_insn_mem *mem;
+
+  unsigned offset = 0;
+
+  while (size)
+    {
+      unsigned s;
+      if (size >= 8)
+	s = 8;
+      else if (size >= 4)
+	s = 4;
+      else if (size >= 2)
+	s = 2;
+      else
+	s = 1;
+
+      addr = new hsa_op_address (target->m_symbol, target->m_reg,
+				 target->m_imm_offset + offset);
+
+      BrigType16_t t = get_integer_type_by_bytes (s, false);
+      HOST_WIDE_INT c = build_memset_value (constant, s);
+
+      mem = new hsa_insn_mem (BRIG_OPCODE_ST, t, new hsa_op_immed (c, t),
+			      addr);
+      hbb->append_insn (mem);
+      offset += s;
+      size -= s;
+    }
+}
+
+/* Generate HSAIL instructions for a single assignment
+   of an empty constructor to an ADDR_LHS.  Constructor is passed as a
+   tree RHS and all instructions are appended to HBB.  */
+
+void
+gen_hsa_ctor_assignment (hsa_op_address *addr_lhs, tree rhs, hsa_bb *hbb)
+{
+  if (vec_safe_length (CONSTRUCTOR_ELTS (rhs)))
+    {
+      HSA_SORRY_AT (EXPR_LOCATION (rhs),
+		    "support for HSA does not implement load from constructor");
+      return;
+    }
+
+  unsigned size = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (rhs)));
+  gen_hsa_memory_set (hbb, addr_lhs, 0, size);
+}
+
+/* Generate HSA instructions for a single assignment of RHS to LHS.
+   HBB is the basic block they will be appended to.  */
+
+static void
+gen_hsa_insns_for_single_assignment (tree lhs, tree rhs, hsa_bb *hbb)
+{
+  if (TREE_CODE (lhs) == SSA_NAME)
+    {
+      hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+      if (hsa_seen_error ())
+	return;
+
+      gen_hsa_insns_for_load (dest, rhs, TREE_TYPE (lhs), hbb);
+    }
+  else if (TREE_CODE (rhs) == SSA_NAME
+	   || (is_gimple_min_invariant (rhs) && TREE_CODE (rhs) != STRING_CST))
+    {
+      /* Store to memory.  */
+      hsa_op_base *src = hsa_reg_or_immed_for_gimple_op (rhs, hbb);
+      if (hsa_seen_error ())
+	return;
+
+      gen_hsa_insns_for_store (lhs, src, hbb);
+    }
+  else
+    {
+      hsa_op_address *addr_lhs = gen_hsa_addr (lhs, hbb);
+
+      if (TREE_CODE (rhs) == CONSTRUCTOR)
+	gen_hsa_ctor_assignment (addr_lhs, rhs, hbb);
+      else
+	{
+	  hsa_op_address *addr_rhs = gen_hsa_addr (rhs, hbb);
+
+	  unsigned size = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (rhs)));
+	  gen_hsa_memory_copy (hbb, addr_lhs, addr_rhs, size);
+	}
+    }
+}
+
+/* Prepend before INSN a load from spill symbol of SPILL_REG.  Return the
+   register into which we loaded.  If this required another register to convert
+   from a B1 type, return it in *PTMP2, otherwise store NULL into it.  We
+   assume we are out of SSA so the returned register does not have its
+   definition set.  */
+
+hsa_op_reg *
+hsa_spill_in (hsa_insn_basic *insn, hsa_op_reg *spill_reg, hsa_op_reg **ptmp2)
+{
+  hsa_symbol *spill_sym = spill_reg->m_spill_sym;
+  hsa_op_reg *reg = new hsa_op_reg (spill_sym->m_type);
+  hsa_op_address *addr = new hsa_op_address (spill_sym);
+
+  hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_LD, spill_sym->m_type,
+					reg, addr);
+  hsa_insert_insn_before (mem, insn);
+
+  *ptmp2 = NULL;
+  if (spill_reg->m_type == BRIG_TYPE_B1)
+    {
+      hsa_insn_basic *cvtinsn;
+      *ptmp2 = reg;
+      reg = new hsa_op_reg (spill_reg->m_type);
+
+      cvtinsn = new hsa_insn_cvt (reg, *ptmp2);
+      hsa_insert_insn_before (cvtinsn, insn);
+    }
+  return reg;
+}
+
+/* Append after INSN a store to spill symbol of SPILL_REG.  Return the register
+   from which we stored.  If this required another register to convert to a B1
+   type, return it in *PTMP2, otherwise store NULL into it.  We assume we are
+   out of SSA so the returned register does not have its use updated.  */
+
+hsa_op_reg *
+hsa_spill_out (hsa_insn_basic *insn, hsa_op_reg *spill_reg, hsa_op_reg **ptmp2)
+{
+  hsa_symbol *spill_sym = spill_reg->m_spill_sym;
+  hsa_op_reg *reg = new hsa_op_reg (spill_sym->m_type);
+  hsa_op_address *addr = new hsa_op_address (spill_sym);
+  hsa_op_reg *returnreg;
+
+  *ptmp2 = NULL;
+  returnreg = reg;
+  if (spill_reg->m_type == BRIG_TYPE_B1)
+    {
+      hsa_insn_basic *cvtinsn;
+      *ptmp2 = new hsa_op_reg (spill_sym->m_type);
+      reg->m_type = spill_reg->m_type;
+
+      cvtinsn = new hsa_insn_cvt (*ptmp2, returnreg);
+      hsa_append_insn_after (cvtinsn, insn);
+      insn = cvtinsn;
+      reg = *ptmp2;
+    }
+
+  hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_ST, spill_sym->m_type, reg,
+					addr);
+  hsa_append_insn_after (mem, insn);
+  return returnreg;
+}
+
+/* Generate a comparison instruction that will compare LHS and RHS with
+   comparison specified by CODE and put result into register DEST.  DEST has to
+   have its type set already but must not have its definition set yet.
+   Generated instructions will be added to HBB.  */
+
+static void
+gen_hsa_cmp_insn_from_gimple (enum tree_code code, tree lhs, tree rhs,
+			      hsa_op_reg *dest, hsa_bb *hbb)
+{
+  BrigCompareOperation8_t compare;
+
+  switch (code)
+    {
+    case LT_EXPR:
+      compare = BRIG_COMPARE_LT;
+      break;
+    case LE_EXPR:
+      compare = BRIG_COMPARE_LE;
+      break;
+    case GT_EXPR:
+      compare = BRIG_COMPARE_GT;
+      break;
+    case GE_EXPR:
+      compare = BRIG_COMPARE_GE;
+      break;
+    case EQ_EXPR:
+      compare = BRIG_COMPARE_EQ;
+      break;
+    case NE_EXPR:
+      compare = BRIG_COMPARE_NE;
+      break;
+    case UNORDERED_EXPR:
+      compare = BRIG_COMPARE_NAN;
+      break;
+    case ORDERED_EXPR:
+      compare = BRIG_COMPARE_NUM;
+      break;
+    case UNLT_EXPR:
+      compare = BRIG_COMPARE_LTU;
+      break;
+    case UNLE_EXPR:
+      compare = BRIG_COMPARE_LEU;
+      break;
+    case UNGT_EXPR:
+      compare = BRIG_COMPARE_GTU;
+      break;
+    case UNGE_EXPR:
+      compare = BRIG_COMPARE_GEU;
+      break;
+    case UNEQ_EXPR:
+      compare = BRIG_COMPARE_EQU;
+      break;
+    case LTGT_EXPR:
+      compare = BRIG_COMPARE_NEU;
+      break;
+
+    default:
+      HSA_SORRY_ATV (EXPR_LOCATION (lhs),
+		     "support for HSA does not implement comparison tree "
+		     "code %s\n", get_tree_code_name (code));
+      return;
+    }
+
+  /* CMP instruction returns e.g. 0xffffffff (for a 32-bit with integer)
+     as a result of comparison.  */
+
+  BrigType16_t dest_type = hsa_type_integer_p (dest->m_type)
+    ? (BrigType16_t) BRIG_TYPE_B1 : dest->m_type;
+
+  hsa_insn_cmp *cmp = new hsa_insn_cmp (compare, dest_type);
+  cmp->set_op (1, hsa_reg_or_immed_for_gimple_op (lhs, hbb));
+  cmp->set_op (2, hsa_reg_or_immed_for_gimple_op (rhs, hbb));
+
+  hbb->append_insn (cmp);
+  cmp->set_output_in_type (dest, 0, hbb);
+}
+
+/* Generate an unary instruction with OPCODE and append it to a basic block
+   HBB.  The instruction uses DEST as a destination and OP1
+   as a single operand.  */
+
+static void
+gen_hsa_unary_operation (BrigOpcode opcode, hsa_op_reg *dest,
+			 hsa_op_with_type *op1, hsa_bb *hbb)
+{
+  gcc_checking_assert (dest);
+  hsa_insn_basic *insn;
+
+  if (opcode == BRIG_OPCODE_MOV && hsa_needs_cvt (dest->m_type, op1->m_type))
+    insn = new hsa_insn_cvt (dest, op1);
+  else if (opcode == BRIG_OPCODE_FIRSTBIT || opcode == BRIG_OPCODE_LASTBIT)
+    insn = new hsa_insn_srctype (2, opcode, BRIG_TYPE_U32, op1->m_type, NULL,
+				 op1);
+  else
+    {
+      insn = new hsa_insn_basic (2, opcode, dest->m_type, dest, op1);
+
+      if (opcode == BRIG_OPCODE_ABS || opcode == BRIG_OPCODE_NEG)
+	{
+	  /* ABS and NEG only exist in _s form :-/  */
+	  if (insn->m_type == BRIG_TYPE_U32)
+	    insn->m_type = BRIG_TYPE_S32;
+	  else if (insn->m_type == BRIG_TYPE_U64)
+	    insn->m_type = BRIG_TYPE_S64;
+	}
+    }
+
+  hbb->append_insn (insn);
+
+  if (opcode == BRIG_OPCODE_FIRSTBIT || opcode == BRIG_OPCODE_LASTBIT)
+    insn->set_output_in_type (dest, 0, hbb);
+}
+
+/* Generate a binary instruction with OPCODE and append it to a basic block
+   HBB.  The instruction uses DEST as a destination and operands OP1
+   and OP2.  */
+
+static void
+gen_hsa_binary_operation (int opcode, hsa_op_reg *dest,
+			  hsa_op_base *op1, hsa_op_base *op2, hsa_bb *hbb)
+{
+  gcc_checking_assert (dest);
+
+  if ((opcode == BRIG_OPCODE_SHL || opcode == BRIG_OPCODE_SHR)
+      && is_a <hsa_op_immed *> (op2))
+    {
+      hsa_op_immed *i = dyn_cast <hsa_op_immed *> (op2);
+      i->set_type (BRIG_TYPE_U32);
+    }
+  if ((opcode == BRIG_OPCODE_OR
+       || opcode == BRIG_OPCODE_XOR
+       || opcode == BRIG_OPCODE_AND)
+      && is_a <hsa_op_immed *> (op2))
+    {
+      hsa_op_immed *i = dyn_cast <hsa_op_immed *> (op2);
+      i->set_type (hsa_uint_for_bitsize (hsa_type_bit_size (i->m_type)));
+    }
+
+  hsa_insn_basic *insn = new hsa_insn_basic (3, opcode, dest->m_type, dest,
+					     op1, op2);
+  hbb->append_insn (insn);
+}
+
+/* Generate HSA instructions for a single assignment.  HBB is the basic block
+   they will be appended to.  */
+
+static void
+gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb)
+{
+  tree_code code = gimple_assign_rhs_code (assign);
+  gimple_rhs_class rhs_class = get_gimple_rhs_class (gimple_expr_code (assign));
+
+  tree lhs = gimple_assign_lhs (assign);
+  tree rhs1 = gimple_assign_rhs1 (assign);
+  tree rhs2 = gimple_assign_rhs2 (assign);
+  tree rhs3 = gimple_assign_rhs3 (assign);
+
+  BrigOpcode opcode;
+
+  switch (code)
+    {
+    CASE_CONVERT:
+    case FLOAT_EXPR:
+      /* The opcode is changed to BRIG_OPCODE_CVT if BRIG types
+	 needs a conversion.  */
+      opcode = BRIG_OPCODE_MOV;
+      break;
+
+    case PLUS_EXPR:
+    case POINTER_PLUS_EXPR:
+      opcode = BRIG_OPCODE_ADD;
+      break;
+    case MINUS_EXPR:
+      opcode = BRIG_OPCODE_SUB;
+      break;
+    case MULT_EXPR:
+      opcode = BRIG_OPCODE_MUL;
+      break;
+    case MULT_HIGHPART_EXPR:
+      opcode = BRIG_OPCODE_MULHI;
+      break;
+    case RDIV_EXPR:
+    case TRUNC_DIV_EXPR:
+    case EXACT_DIV_EXPR:
+      opcode = BRIG_OPCODE_DIV;
+      break;
+    case CEIL_DIV_EXPR:
+    case FLOOR_DIV_EXPR:
+    case ROUND_DIV_EXPR:
+      HSA_SORRY_AT (gimple_location (assign),
+		    "support for HSA does not implement CEIL_DIV_EXPR, "
+		    "FLOOR_DIV_EXPR or ROUND_DIV_EXPR");
+      return;
+    case TRUNC_MOD_EXPR:
+      opcode = BRIG_OPCODE_REM;
+      break;
+    case CEIL_MOD_EXPR:
+    case FLOOR_MOD_EXPR:
+    case ROUND_MOD_EXPR:
+      HSA_SORRY_AT (gimple_location (assign),
+		    "support for HSA does not implement CEIL_MOD_EXPR, "
+		    "FLOOR_MOD_EXPR or ROUND_MOD_EXPR");
+      return;
+    case NEGATE_EXPR:
+      opcode = BRIG_OPCODE_NEG;
+      break;
+    case MIN_EXPR:
+      opcode = BRIG_OPCODE_MIN;
+      break;
+    case MAX_EXPR:
+      opcode = BRIG_OPCODE_MAX;
+      break;
+    case ABS_EXPR:
+      opcode = BRIG_OPCODE_ABS;
+      break;
+    case LSHIFT_EXPR:
+      opcode = BRIG_OPCODE_SHL;
+      break;
+    case RSHIFT_EXPR:
+      opcode = BRIG_OPCODE_SHR;
+      break;
+    case LROTATE_EXPR:
+    case RROTATE_EXPR:
+      {
+	hsa_insn_basic *insn = NULL;
+	int code1 = code == LROTATE_EXPR ? BRIG_OPCODE_SHL : BRIG_OPCODE_SHR;
+	int code2 = code != LROTATE_EXPR ? BRIG_OPCODE_SHL : BRIG_OPCODE_SHR;
+	BrigType16_t btype = hsa_type_for_scalar_tree_type (TREE_TYPE (lhs),
+							    true);
+
+	hsa_op_with_type *src = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+	hsa_op_reg *op1 = new hsa_op_reg (btype);
+	hsa_op_reg *op2 = new hsa_op_reg (btype);
+	hsa_op_with_type *shift1 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
+
+	tree type = TREE_TYPE (rhs2);
+	unsigned HOST_WIDE_INT bitsize = TREE_INT_CST_LOW (TYPE_SIZE (type));
+
+	hsa_op_with_type *shift2 = NULL;
+	if (TREE_CODE (rhs2) == INTEGER_CST)
+	  shift2 = new hsa_op_immed (bitsize - tree_to_uhwi (rhs2),
+				     BRIG_TYPE_U32);
+	else if (TREE_CODE (rhs2) == SSA_NAME)
+	  {
+	    hsa_op_reg *s = hsa_cfun->reg_for_gimple_ssa (rhs2);
+	    hsa_op_reg *d = new hsa_op_reg (s->m_type);
+	    hsa_op_immed *size_imm = new hsa_op_immed (bitsize, BRIG_TYPE_U32);
+
+	    insn = new hsa_insn_basic (3, BRIG_OPCODE_SUB, d->m_type,
+				       d, s, size_imm);
+	    hbb->append_insn (insn);
+
+	    shift2 = d;
+	  }
+	else
+	  gcc_unreachable ();
+
+	hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+	gen_hsa_binary_operation (code1, op1, src, shift1, hbb);
+	gen_hsa_binary_operation (code2, op2, src, shift2, hbb);
+	gen_hsa_binary_operation (BRIG_OPCODE_OR, dest, op1, op2, hbb);
+
+	return;
+      }
+    case BIT_IOR_EXPR:
+      opcode = BRIG_OPCODE_OR;
+      break;
+    case BIT_XOR_EXPR:
+      opcode = BRIG_OPCODE_XOR;
+      break;
+    case BIT_AND_EXPR:
+      opcode = BRIG_OPCODE_AND;
+      break;
+    case BIT_NOT_EXPR:
+      opcode = BRIG_OPCODE_NOT;
+      break;
+    case FIX_TRUNC_EXPR:
+      {
+	hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+	hsa_op_with_type *v = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+
+	if (hsa_needs_cvt (dest->m_type, v->m_type))
+	  {
+	    hsa_op_reg *tmp = new hsa_op_reg (v->m_type);
+
+	    hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_TRUNC,
+						       tmp->m_type, tmp, v);
+	    hbb->append_insn (insn);
+
+	    hsa_insn_basic *cvtinsn = new hsa_insn_cvt (dest, tmp);
+	    hbb->append_insn (cvtinsn);
+	  }
+	else
+	  {
+	    hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_TRUNC,
+						       dest->m_type, dest, v);
+	    hbb->append_insn (insn);
+	  }
+
+	return;
+      }
+      opcode = BRIG_OPCODE_TRUNC;
+      break;
+
+    case LT_EXPR:
+    case LE_EXPR:
+    case GT_EXPR:
+    case GE_EXPR:
+    case EQ_EXPR:
+    case NE_EXPR:
+    case UNORDERED_EXPR:
+    case ORDERED_EXPR:
+    case UNLT_EXPR:
+    case UNLE_EXPR:
+    case UNGT_EXPR:
+    case UNGE_EXPR:
+    case UNEQ_EXPR:
+    case LTGT_EXPR:
+      {
+	hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa
+	  (gimple_assign_lhs (assign));
+
+	gen_hsa_cmp_insn_from_gimple (code, rhs1, rhs2, dest, hbb);
+	return;
+      }
+    case COND_EXPR:
+      {
+	hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa
+	  (gimple_assign_lhs (assign));
+	hsa_op_with_type *ctrl = NULL;
+	tree cond = rhs1;
+
+	if (CONSTANT_CLASS_P (cond) || TREE_CODE (cond) == SSA_NAME)
+	  ctrl = hsa_reg_or_immed_for_gimple_op (cond, hbb);
+	else
+	  {
+	    hsa_op_reg *r = new hsa_op_reg (BRIG_TYPE_B1);
+
+	    gen_hsa_cmp_insn_from_gimple (TREE_CODE (cond),
+				  TREE_OPERAND (cond, 0),
+				  TREE_OPERAND (cond, 1),
+				  r, hbb);
+
+	    ctrl = r;
+	  }
+
+	hsa_op_with_type *rhs2_reg = hsa_reg_or_immed_for_gimple_op
+	  (rhs2, hbb);
+	hsa_op_with_type *rhs3_reg = hsa_reg_or_immed_for_gimple_op
+	  (rhs3, hbb);
+
+	BrigType16_t btype = hsa_bittype_for_type (dest->m_type);
+	hsa_op_reg *tmp = new hsa_op_reg (btype);
+
+	rhs2_reg->m_type = btype;
+	rhs3_reg->m_type = btype;
+
+	hsa_insn_basic *insn = new hsa_insn_basic
+	  (4, BRIG_OPCODE_CMOV, tmp->m_type, tmp, ctrl, rhs2_reg, rhs3_reg);
+
+	hbb->append_insn (insn);
+
+	/* As operands of a CMOV insn must be Bx types, we have to emit
+	   a conversion insn.  */
+	hsa_insn_basic *mov = new hsa_insn_basic (2, BRIG_OPCODE_MOV,
+						  dest->m_type, dest, tmp);
+	hbb->append_insn (mov);
+
+	return;
+      }
+    case COMPLEX_EXPR:
+      {
+	hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa
+	  (gimple_assign_lhs (assign));
+	hsa_op_with_type *rhs1_reg = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+	hsa_op_with_type *rhs2_reg = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
+
+	if (hsa_seen_error ())
+	  return;
+
+	BrigType16_t src_type = hsa_bittype_for_type (rhs1_reg->m_type);
+	rhs1_reg = rhs1_reg->get_in_type (src_type, hbb);
+	rhs2_reg = rhs2_reg->get_in_type (src_type, hbb);
+
+	hsa_insn_packed *insn = new hsa_insn_packed
+	  (3, BRIG_OPCODE_COMBINE, dest->m_type, src_type, dest,
+	   rhs1_reg, rhs2_reg);
+	hbb->append_insn (insn);
+
+	return;
+      }
+    default:
+      /* Implement others as we come across them.  */
+      HSA_SORRY_ATV (gimple_location (assign),
+		     "support for HSA does not implement operation %s",
+		     get_tree_code_name (code));
+      return;
+    }
+
+
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign));
+
+  hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+  hsa_op_with_type *op2 = rhs2 != NULL_TREE ?
+    hsa_reg_or_immed_for_gimple_op (rhs2, hbb) : NULL;
+
+  if (hsa_seen_error ())
+    return;
+
+  switch (rhs_class)
+    {
+    case GIMPLE_TERNARY_RHS:
+      gcc_unreachable ();
+      return;
+
+      /* Fall through */
+    case GIMPLE_BINARY_RHS:
+      gen_hsa_binary_operation (opcode, dest, op1, op2, hbb);
+      break;
+      /* Fall through */
+    case GIMPLE_UNARY_RHS:
+      gen_hsa_unary_operation (opcode, dest, op1, hbb);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Generate HSA instructions for a given gimple condition statement COND.
+   Instructions will be appended to HBB, which also needs to be the
+   corresponding structure to the basic_block of COND.  */
+
+static void
+gen_hsa_insns_for_cond_stmt (gimple *cond, hsa_bb *hbb)
+{
+  hsa_op_reg *ctrl = new hsa_op_reg (BRIG_TYPE_B1);
+  hsa_insn_br *cbr;
+
+  gen_hsa_cmp_insn_from_gimple (gimple_cond_code (cond),
+				gimple_cond_lhs (cond),
+				gimple_cond_rhs (cond),
+				ctrl, hbb);
+
+  cbr = new hsa_insn_br (ctrl);
+  hbb->append_insn (cbr);
+}
+
+/* Maximum number of elements in a jump table for an HSA SBR instruction.  */
+
+#define HSA_MAXIMUM_SBR_LABELS	16
+
+/* Return lowest value of a switch S that is handled in a non-default
+   label.  */
+
+static tree
+get_switch_low (gswitch *s)
+{
+  unsigned labels = gimple_switch_num_labels (s);
+  gcc_checking_assert (labels >= 1);
+
+  return CASE_LOW (gimple_switch_label (s, 1));
+}
+
+/* Return highest value of a switch S that is handled in a non-default
+   label.  */
+
+static tree
+get_switch_high (gswitch *s)
+{
+  unsigned labels = gimple_switch_num_labels (s);
+
+  /* Compare last label to maximum number of labels.  */
+  tree label = gimple_switch_label (s, labels - 1);
+  tree low = CASE_LOW (label);
+  tree high = CASE_HIGH (label);
+
+  return high != NULL_TREE ? high : low;
+}
+
+static tree
+get_switch_size (gswitch *s)
+{
+  return int_const_binop (MINUS_EXPR, get_switch_high (s), get_switch_low (s));
+}
+
+/* Generate HSA instructions for a given gimple switch.
+   Instructions will be appended to HBB.  */
+
+static void
+gen_hsa_insns_for_switch_stmt (gswitch *s, hsa_bb *hbb)
+{
+  function *func = DECL_STRUCT_FUNCTION (current_function_decl);
+  tree index_tree = gimple_switch_index (s);
+  tree lowest = get_switch_low (s);
+
+  hsa_op_reg *index = hsa_cfun->reg_for_gimple_ssa (index_tree);
+  hsa_op_reg *sub_index = new hsa_op_reg (index->m_type);
+  hbb->append_insn (new hsa_insn_basic (3, BRIG_OPCODE_SUB, sub_index->m_type,
+					sub_index, index,
+					new hsa_op_immed (lowest)));
+
+  hsa_op_base *tmp = sub_index->get_in_type (BRIG_TYPE_U64, hbb);
+  sub_index = as_a <hsa_op_reg *> (tmp);
+  unsigned labels = gimple_switch_num_labels (s);
+  unsigned HOST_WIDE_INT size = tree_to_uhwi (get_switch_size (s));
+
+  hsa_insn_sbr *sbr = new hsa_insn_sbr (sub_index, size + 1);
+  tree default_label = gimple_switch_default_label (s);
+  basic_block default_label_bb = label_to_block_fn
+    (func, CASE_LABEL (default_label));
+
+  sbr->m_default_bb = default_label_bb;
+
+  /* Prepare array with default label destination.  */
+  for (unsigned HOST_WIDE_INT i = 0; i <= size; i++)
+    sbr->m_jump_table.safe_push (default_label_bb);
+
+  /* Iterate all labels and fill up the jump table.  */
+  for (unsigned i = 1; i < labels; i++)
+    {
+      tree label = gimple_switch_label (s, i);
+      basic_block bb = label_to_block_fn (func, CASE_LABEL (label));
+
+      unsigned HOST_WIDE_INT sub_low = tree_to_uhwi
+	(int_const_binop (MINUS_EXPR, CASE_LOW (label), lowest));
+
+      unsigned HOST_WIDE_INT sub_high = sub_low;
+      tree high = CASE_HIGH (label);
+      if (high != NULL)
+	sub_high = tree_to_uhwi (int_const_binop (MINUS_EXPR, high, lowest));
+
+      for (unsigned HOST_WIDE_INT j = sub_low; j <= sub_high; j++)
+	sbr->m_jump_table[j] = bb;
+    }
+
+  hbb->append_insn (sbr);
+}
+
+/* Verify that the function DECL can be handled by HSA.  */
+
+static void
+verify_function_arguments (tree decl)
+{
+  if (DECL_STATIC_CHAIN (decl))
+    {
+      HSA_SORRY_ATV (EXPR_LOCATION (decl),
+		     "HSA does not support nested functions: %D", decl);
+      return;
+    }
+  else if (!TYPE_ARG_TYPES (TREE_TYPE (decl)))
+    {
+      HSA_SORRY_ATV (EXPR_LOCATION (decl),
+		     "HSA does not support functions with variadic arguments "
+		     "(or unknown return type): %D", decl);
+      return;
+    }
+}
+
+/* Return BRIG type for FORMAL_ARG_TYPE.  If the formal argument type is NULL,
+   return ACTUAL_ARG_TYPE.  */
+
+static BrigType16_t
+get_format_argument_type (tree formal_arg_type, BrigType16_t actual_arg_type)
+{
+  if (formal_arg_type == NULL)
+    return actual_arg_type;
+
+  BrigType16_t decl_type = hsa_type_for_scalar_tree_type
+    (formal_arg_type, false);
+  return mem_type_for_type (decl_type);
+}
+
+/* Generate HSA instructions for a direct call instruction.
+   Instructions will be appended to HBB, which also needs to be the
+   corresponding structure to the basic_block of STMT.  */
+
+static void
+gen_hsa_insns_for_direct_call (gimple *stmt, hsa_bb *hbb)
+{
+  tree decl = gimple_call_fndecl (stmt);
+  verify_function_arguments (decl);
+  if (hsa_seen_error ())
+    return;
+
+  hsa_insn_call *call_insn = new hsa_insn_call (decl);
+  hsa_cfun->m_called_functions.safe_push (call_insn->m_called_function);
+
+  /* Argument block start.  */
+  hsa_insn_arg_block *arg_start = new hsa_insn_arg_block
+    (BRIG_KIND_DIRECTIVE_ARG_BLOCK_START, call_insn);
+  hbb->append_insn (arg_start);
+
+  tree parm_type_chain = TYPE_ARG_TYPES (gimple_call_fntype (stmt));
+
+  /* Preparation of arguments that will be passed to function.  */
+  const unsigned args = gimple_call_num_args (stmt);
+  for (unsigned i = 0; i < args; ++i)
+    {
+      tree parm = gimple_call_arg (stmt, (int)i);
+      tree parm_decl_type = parm_type_chain != NULL_TREE
+	? TREE_VALUE (parm_type_chain) : NULL_TREE;
+      hsa_op_address *addr;
+
+      if (AGGREGATE_TYPE_P (TREE_TYPE (parm)))
+	{
+	  addr = gen_hsa_addr_for_arg (TREE_TYPE (parm), i);
+	  hsa_op_address *src = gen_hsa_addr (parm, hbb);
+	  gen_hsa_memory_copy (hbb, addr, src,
+			       addr->m_symbol->total_byte_size ());
+	}
+      else
+	{
+	  hsa_op_with_type *src = hsa_reg_or_immed_for_gimple_op (parm, hbb);
+
+	  if (parm_decl_type != NULL && AGGREGATE_TYPE_P (parm_decl_type))
+	    {
+	      HSA_SORRY_AT (gimple_location (stmt),
+			    "support for HSA does not implement an aggregate "
+			    "formal argument in a function call, while actual "
+			    "argument is not an aggregate");
+	      return;
+	    }
+
+	  BrigType16_t formal_arg_type = get_format_argument_type
+	    (parm_decl_type, src->m_type);
+	  if (hsa_seen_error ())
+	    return;
+
+	  if (src->m_type != formal_arg_type)
+	    src = src->get_in_type (formal_arg_type, hbb);
+
+	  addr = gen_hsa_addr_for_arg
+	    (parm_decl_type != NULL_TREE ? parm_decl_type: TREE_TYPE (parm), i);
+	  hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_ST, formal_arg_type,
+						src, addr);
+
+	  hbb->append_insn (mem);
+	}
+
+      call_insn->m_input_args.safe_push (addr->m_symbol);
+      if (parm_type_chain)
+	parm_type_chain = TREE_CHAIN (parm_type_chain);
+    }
+
+  call_insn->m_args_code_list = new hsa_op_code_list (args);
+  hbb->append_insn (call_insn);
+
+  tree result_type = TREE_TYPE (TREE_TYPE (decl));
+
+  tree result = gimple_call_lhs (stmt);
+  hsa_insn_mem *result_insn = NULL;
+  if (!VOID_TYPE_P (result_type))
+    {
+      hsa_op_address *addr = gen_hsa_addr_for_arg (result_type, -1);
+
+      /* Even if result of a function call is unused, we have to emit
+	 declaration for the result.  */
+      if (result)
+	{
+	  tree lhs_type = TREE_TYPE (result);
+
+	  if (hsa_seen_error ())
+	    return;
+
+	  if (AGGREGATE_TYPE_P (lhs_type))
+	    {
+	      hsa_op_address *result_addr = gen_hsa_addr (result, hbb);
+	      gen_hsa_memory_copy (hbb, result_addr, addr,
+				   addr->m_symbol->total_byte_size ());
+	    }
+	  else
+	    {
+	      BrigType16_t mtype = mem_type_for_type
+		(hsa_type_for_scalar_tree_type (lhs_type, false));
+
+	      hsa_op_reg *dst = hsa_cfun->reg_for_gimple_ssa (result);
+	      result_insn = new hsa_insn_mem (BRIG_OPCODE_LD, mtype, dst, addr);
+	      hbb->append_insn (result_insn);
+	    }
+	}
+
+      call_insn->m_output_arg = addr->m_symbol;
+      call_insn->m_result_code_list = new hsa_op_code_list (1);
+    }
+  else
+    {
+      if (result)
+	{
+	  HSA_SORRY_AT (gimple_location (stmt),
+			"support for HSA does not implement an assignment of "
+			"return value from a void function");
+	  return;
+	}
+
+      call_insn->m_result_code_list = new hsa_op_code_list (0);
+    }
+
+  /* Argument block end.  */
+  hsa_insn_arg_block *arg_end = new hsa_insn_arg_block
+    (BRIG_KIND_DIRECTIVE_ARG_BLOCK_END, call_insn);
+  hbb->append_insn (arg_end);
+}
+
+/* Generate HSA instructions for a direct call of an internal fn.
+   Instructions will be appended to HBB, which also needs to be the
+   corresponding structure to the basic_block of STMT.  */
+
+static void
+gen_hsa_insns_for_call_of_internal_fn (gimple *stmt, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+    return;
+
+  tree lhs_type = TREE_TYPE (lhs);
+  tree rhs1 = gimple_call_arg (stmt, 0);
+  tree rhs1_type = TREE_TYPE (rhs1);
+  enum internal_fn fn = gimple_call_internal_fn (stmt);
+  hsa_insn_call *call_insn = new hsa_insn_call
+    (new hsa_internal_fn (fn, tree_to_uhwi (TYPE_SIZE (rhs1_type))));
+
+  gcc_checking_assert (FLOAT_TYPE_P (rhs1_type));
+
+  if (!hsa_emitted_internal_decls->find (call_insn->m_called_internal_fn))
+    hsa_cfun->m_called_internal_fns.safe_push (call_insn->m_called_internal_fn);
+
+  hsa_insn_arg_block *arg_start = new hsa_insn_arg_block
+    (BRIG_KIND_DIRECTIVE_ARG_BLOCK_START, call_insn);
+  hbb->append_insn (arg_start);
+
+  unsigned num_args = gimple_call_num_args (stmt);
+
+  /* Function arguments.  */
+  for (unsigned i = 0; i < num_args; i++)
+    {
+      tree parm = gimple_call_arg (stmt, (int)i);
+      hsa_op_with_type *src = hsa_reg_or_immed_for_gimple_op (parm, hbb);
+
+      hsa_op_address *addr = gen_hsa_addr_for_arg (TREE_TYPE (parm), i);
+      hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_ST, src->m_type,
+					    src, addr);
+
+      call_insn->m_input_args.safe_push (addr->m_symbol);
+      hbb->append_insn (mem);
+    }
+
+  call_insn->m_args_code_list = new hsa_op_code_list (num_args);
+  hbb->append_insn (call_insn);
+
+  /* Assign returned value.  */
+  hsa_op_address *addr = gen_hsa_addr_for_arg (lhs_type, -1);
+
+  call_insn->m_output_arg = addr->m_symbol;
+  call_insn->m_result_code_list = new hsa_op_code_list (1);
+
+  /* Argument block end.  */
+  hsa_insn_arg_block *arg_end = new hsa_insn_arg_block
+    (BRIG_KIND_DIRECTIVE_ARG_BLOCK_END, call_insn);
+  hbb->append_insn (arg_end);
+}
+
+/* Generate HSA instructions for a return value instruction.
+   Instructions will be appended to HBB, which also needs to be the
+   corresponding structure to the basic_block of STMT.  */
+
+static void
+gen_hsa_insns_for_return (greturn *stmt, hsa_bb *hbb)
+{
+  tree retval = gimple_return_retval (stmt);
+  if (retval)
+    {
+      hsa_op_address *addr = new hsa_op_address (hsa_cfun->m_output_arg);
+
+      if (AGGREGATE_TYPE_P (TREE_TYPE (retval)))
+	{
+	  hsa_op_address *retval_addr = gen_hsa_addr (retval, hbb);
+	  gen_hsa_memory_copy (hbb, addr, retval_addr,
+			       hsa_cfun->m_output_arg->total_byte_size ());
+	}
+      else
+	{
+	  BrigType16_t mtype = mem_type_for_type
+	    (hsa_type_for_scalar_tree_type (TREE_TYPE (retval), false));
+
+	  /* Store of return value.  */
+	  hsa_op_with_type *src = hsa_reg_or_immed_for_gimple_op (retval, hbb);
+	  src = src->get_in_type (mtype, hbb);
+	  hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_ST, mtype, src,
+						addr);
+	  hbb->append_insn (mem);
+	}
+    }
+
+  /* HSAIL return instruction emission.  */
+  hsa_insn_basic *ret = new hsa_insn_basic (0, BRIG_OPCODE_RET);
+  hbb->append_insn (ret);
+}
+
+/* Set OP_INDEX-th operand of the instruction to DEST, as the DEST
+   can have a different type, conversion instructions are possibly
+   appended to HBB.  */
+
+void
+hsa_insn_basic::set_output_in_type (hsa_op_reg *dest, unsigned op_index,
+				    hsa_bb *hbb)
+{
+  hsa_insn_basic *insn;
+  gcc_checking_assert (op_output_p (op_index));
+
+  if (dest->m_type == m_type)
+    {
+      set_op (op_index, dest);
+      return;
+    }
+
+  hsa_op_reg *tmp = new hsa_op_reg (m_type);
+  set_op (op_index, tmp);
+
+  if (hsa_needs_cvt (dest->m_type, m_type))
+    insn = new hsa_insn_cvt (dest, tmp);
+  else
+    insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type,
+			       dest, tmp->get_in_type (dest->m_type, hbb));
+
+  hbb->append_insn (insn);
+}
+
+/* Generate instruction OPCODE to query a property of HSA grid along the
+   given DIMENSION.  Store result into DEST and append the instruction to
+   HBB.  */
+
+static void
+query_hsa_grid (hsa_op_reg *dest, BrigType16_t opcode, int dimension,
+		hsa_bb *hbb)
+{
+  /* We're using just one-dimensional kernels, so hard-coded
+     dimension X.  */
+  hsa_op_immed *imm = new hsa_op_immed (dimension,
+					(BrigKind16_t) BRIG_TYPE_U32);
+  hsa_insn_basic *insn = new hsa_insn_basic (2, opcode, BRIG_TYPE_U32, NULL,
+					     imm);
+  hbb->append_insn (insn);
+  insn->set_output_in_type (dest, 0, hbb);
+}
+
+/* Generate a special HSA-related instruction for gimple STMT.
+   Instructions are appended to basic block HBB.  */
+
+static void
+query_hsa_grid (gimple *stmt, BrigOpcode16_t opcode, int dimension,
+		hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (dyn_cast <gcall *> (stmt));
+  if (lhs == NULL_TREE)
+    return;
+
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+
+  query_hsa_grid (dest, opcode, dimension, hbb);
+}
+
+/* Emit instructions that set hsa_num_threads according to provided VALUE.
+   Instructions are appended to basic block HBB.  */
+
+static void
+gen_set_num_threads (tree value, hsa_bb *hbb)
+{
+  hbb->append_insn (new hsa_insn_comment ("omp_set_num_threads"));
+  hsa_op_with_type *src = hsa_reg_or_immed_for_gimple_op (value, hbb);
+
+  src = src->get_in_type (hsa_num_threads->m_type, hbb);
+  hsa_op_address *addr = new hsa_op_address (hsa_num_threads);
+
+  hsa_insn_basic *basic = new hsa_insn_mem
+    (BRIG_OPCODE_ST, hsa_num_threads->m_type, src, addr);
+  hbb->append_insn (basic);
+}
+
+static GTY (()) tree hsa_kernel_dispatch_type = NULL;
+
+/* Return byte offset of a FIELD_NAME in GOMP_hsa_kernel_dispatch which
+   is defined in plugin-hsa.c.  */
+
+static HOST_WIDE_INT
+get_hsa_kernel_dispatch_offset (const char *field_name)
+{
+  if (hsa_kernel_dispatch_type == NULL)
+    {
+      /* Collection of information needed for a dispatch of a kernel from a
+	 kernel.  Keep in sync with libgomp's plugin-hsa.c.  */
+
+      hsa_kernel_dispatch_type = make_node (RECORD_TYPE);
+      tree id_f1 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			       get_identifier ("queue"), ptr_type_node);
+      DECL_CHAIN (id_f1) = NULL_TREE;
+      tree id_f2 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			       get_identifier ("omp_data_memory"),
+			       ptr_type_node);
+      DECL_CHAIN (id_f2) = id_f1;
+      tree id_f3 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			       get_identifier ("kernarg_address"),
+			       ptr_type_node);
+      DECL_CHAIN (id_f3) = id_f2;
+      tree id_f4 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			       get_identifier ("object"),
+			       uint64_type_node);
+      DECL_CHAIN (id_f4) = id_f3;
+      tree id_f5 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			       get_identifier ("signal"),
+			       uint64_type_node);
+      DECL_CHAIN (id_f5) = id_f4;
+      tree id_f6 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			       get_identifier ("private_segment_size"),
+			       uint32_type_node);
+      DECL_CHAIN (id_f6) = id_f5;
+      tree id_f7 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			       get_identifier ("group_segment_size"),
+			       uint32_type_node);
+      DECL_CHAIN (id_f7) = id_f6;
+      tree id_f8 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			       get_identifier ("kernel_dispatch_count"),
+			       uint64_type_node);
+      DECL_CHAIN (id_f8) = id_f7;
+      tree id_f9 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			       get_identifier ("debug"),
+			       uint64_type_node);
+      DECL_CHAIN (id_f9) = id_f8;
+      tree id_f10 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+				get_identifier ("omp_level"),
+				uint64_type_node);
+      DECL_CHAIN (id_f10) = id_f9;
+      tree id_f11 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+				get_identifier ("children_dispatches"),
+				ptr_type_node);
+      DECL_CHAIN (id_f11) = id_f10;
+      tree id_f12 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			       get_identifier ("omp_num_threads"),
+			       uint32_type_node);
+      DECL_CHAIN (id_f12) = id_f11;
+
+
+      finish_builtin_struct (hsa_kernel_dispatch_type, "__hsa_kernel_dispatch",
+			     id_f12, NULL_TREE);
+      TYPE_ARTIFICIAL (hsa_kernel_dispatch_type) = 1;
+    }
+
+  for (tree chain = TYPE_FIELDS (hsa_kernel_dispatch_type);
+       chain != NULL_TREE; chain = TREE_CHAIN (chain))
+    if (strcmp (field_name, IDENTIFIER_POINTER (DECL_NAME (chain))) == 0)
+      return int_byte_position (chain);
+
+  gcc_unreachable ();
+}
+
+/* Return an HSA register that will contain number of threads for
+   a future dispatched kernel.  Instructions are added to HBB.  */
+
+static hsa_op_reg *
+gen_num_threads_for_dispatch (hsa_bb *hbb)
+{
+  /* Step 1) Assign to number of threads:
+     MIN (HSA_DEFAULT_NUM_THREADS, hsa_num_threads).  */
+  hsa_op_reg *threads = new hsa_op_reg (hsa_num_threads->m_type);
+  hsa_op_address *addr = new hsa_op_address (hsa_num_threads);
+
+  hbb->append_insn (new hsa_insn_mem (BRIG_OPCODE_LD, threads->m_type,
+				      threads, addr));
+
+  hsa_op_immed *limit = new hsa_op_immed (HSA_DEFAULT_NUM_THREADS,
+					  BRIG_TYPE_U32);
+  hsa_op_reg *r = new hsa_op_reg (BRIG_TYPE_B1);
+  hbb->append_insn
+    (new hsa_insn_cmp (BRIG_COMPARE_LT, r->m_type, r, threads, limit));
+
+  BrigType16_t btype = hsa_bittype_for_type (threads->m_type);
+  hsa_op_reg *tmp = new hsa_op_reg (threads->m_type);
+
+  hbb->append_insn
+    (new hsa_insn_basic (4, BRIG_OPCODE_CMOV, btype, tmp, r,
+			 threads, limit));
+
+  /* Step 2) If the number is equal to zero,
+     return shadow->omp_num_threads.  */
+  hsa_op_reg *shadow_reg_ptr = hsa_cfun->get_shadow_reg ();
+
+  hsa_op_reg *shadow_thread_count = new hsa_op_reg (BRIG_TYPE_U32);
+  addr = new hsa_op_address
+   (shadow_reg_ptr, get_hsa_kernel_dispatch_offset ("omp_num_threads"));
+  hsa_insn_basic *basic = new hsa_insn_mem
+   (BRIG_OPCODE_LD, shadow_thread_count->m_type, shadow_thread_count, addr);
+  hbb->append_insn (basic);
+
+  hsa_op_reg *tmp2 = new hsa_op_reg (threads->m_type);
+  r = new hsa_op_reg (BRIG_TYPE_B1);
+  hbb->append_insn
+    (new hsa_insn_cmp (BRIG_COMPARE_EQ, r->m_type, r, tmp,
+		       new hsa_op_immed (0, shadow_thread_count->m_type)));
+  hbb->append_insn
+    (new hsa_insn_basic (4, BRIG_OPCODE_CMOV, btype, tmp2, r,
+			 shadow_thread_count, tmp));
+
+  hsa_op_base *dest = tmp2->get_in_type (BRIG_TYPE_U16, hbb);
+
+  return as_a <hsa_op_reg *> (dest);
+}
+
+
+/* Emit instructions that assign number of teams to lhs of gimple STMT.
+   Instructions are appended to basic block HBB.  */
+
+static void
+gen_get_num_teams (gimple *stmt, hsa_bb *hbb)
+{
+  if (gimple_call_lhs (stmt) == NULL_TREE)
+    return;
+
+  hbb->append_insn (new hsa_insn_comment ("omp_get_num_teams"));
+
+  tree lhs = gimple_call_lhs (stmt);
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  hsa_op_immed *one = new hsa_op_immed (1, dest->m_type);
+
+  hsa_insn_basic *basic = new hsa_insn_basic
+    (2, BRIG_OPCODE_MOV, dest->m_type, dest, one);
+
+  hbb->append_insn (basic);
+}
+
+/* Emit instructions that assign a team number to lhs of gimple STMT.
+   Instructions are appended to basic block HBB.  */
+
+static void
+gen_get_team_num (gimple *stmt, hsa_bb *hbb)
+{
+  if (gimple_call_lhs (stmt) == NULL_TREE)
+    return;
+
+  hbb->append_insn (new hsa_insn_comment ("omp_get_team_num"));
+
+  tree lhs = gimple_call_lhs (stmt);
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  hsa_op_immed *zero = new hsa_op_immed (0, dest->m_type);
+
+  hsa_insn_basic *basic = new hsa_insn_basic
+    (2, BRIG_OPCODE_MOV, dest->m_type, dest, zero);
+
+  hbb->append_insn (basic);
+}
+
+/* Emit instructions that get levels-var ICV to lhs of gimple STMT.
+   Instructions are appended to basic block HBB.  */
+
+static void
+gen_get_level (gimple *stmt, hsa_bb *hbb)
+{
+  if (gimple_call_lhs (stmt) == NULL_TREE)
+    return;
+
+  hbb->append_insn (new hsa_insn_comment ("omp_get_level"));
+
+  tree lhs = gimple_call_lhs (stmt);
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+
+  hsa_op_reg *shadow_reg_ptr = hsa_cfun->get_shadow_reg ();
+  if (shadow_reg_ptr == NULL)
+    {
+      HSA_SORRY_AT (gimple_location (stmt),
+		    "support for HSA does not implement omp_get_level called "
+		    "from a function not being inlined within a kernel");
+      return;
+    }
+
+  hsa_op_address *addr = new hsa_op_address
+    (shadow_reg_ptr, get_hsa_kernel_dispatch_offset ("omp_level"));
+
+  hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_LD, BRIG_TYPE_U64, NULL,
+					addr);
+  hbb->append_insn (mem);
+  mem->set_output_in_type (dest, 0, hbb);
+}
+
+/* Emit instruction that implement omp_get_max_threads of gimple STMT.  */
+
+static void
+gen_get_max_threads (gimple *stmt, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+    return;
+
+  hbb->append_insn (new hsa_insn_comment ("omp_get_max_threads"));
+
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  hsa_op_with_type *num_theads_reg = gen_num_threads_for_dispatch (hbb)
+    ->get_in_type (dest->m_type, hbb);
+  hsa_build_append_simple_mov (dest, num_theads_reg, hbb);
+}
+
+/* Emit instructions that implement alloca builtin gimple STMT.
+   Instructions are appended to basic block HBB.  */
+
+static void
+gen_hsa_alloca (gcall *call, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (call);
+  if (lhs == NULL_TREE)
+    return;
+
+  built_in_function fn = DECL_FUNCTION_CODE (gimple_call_fndecl (call));
+
+  gcc_checking_assert (fn == BUILT_IN_ALLOCA
+		       || fn == BUILT_IN_ALLOCA_WITH_ALIGN);
+
+  unsigned bit_alignment = 0;
+
+  if (fn == BUILT_IN_ALLOCA_WITH_ALIGN)
+    {
+      tree alignment_tree = gimple_call_arg (call, 1);
+      if (TREE_CODE (alignment_tree) != INTEGER_CST)
+	{
+	  HSA_SORRY_ATV
+	    (gimple_location (call), "support for HSA does not implement "
+	     "__builtin_alloca_with_align with a non-constant "
+	     "alignment: %E", alignment_tree);
+	}
+
+      bit_alignment = tree_to_uhwi (alignment_tree);
+    }
+
+  tree rhs1 = gimple_call_arg (call, 0);
+  hsa_op_with_type *size = hsa_reg_or_immed_for_gimple_op (rhs1, hbb)
+    ->get_in_type (BRIG_TYPE_U32, hbb);
+  hsa_op_with_type *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+
+  hsa_op_reg *tmp = new hsa_op_reg
+    (hsa_get_segment_addr_type (BRIG_SEGMENT_PRIVATE));
+  hsa_insn_alloca *a = new hsa_insn_alloca (tmp, size, bit_alignment);
+  hbb->append_insn (a);
+
+  hsa_insn_seg *seg = new hsa_insn_seg
+    (BRIG_OPCODE_STOF, hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT),
+     tmp->m_type, BRIG_SEGMENT_PRIVATE, dest, tmp);
+  hbb->append_insn (seg);
+}
+
+/* Emit instructions that implement clrsb builtin STMT:
+   Returns the number of leading redundant sign bits in x, i.e. the number
+   of bits following the most significant bit that are identical to it.
+   There are no special cases for 0 or other values.
+   Instructions are appended to basic block HBB.  */
+
+static void
+gen_hsa_clrsb (gcall *call, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (call);
+  if (lhs == NULL_TREE)
+    return;
+
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  tree rhs1 = gimple_call_arg (call, 0);
+  hsa_op_with_type *arg = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+  BrigType16_t bittype = hsa_bittype_for_type (arg->m_type);
+  unsigned bitsize = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (rhs1)));
+  gcc_checking_assert (bitsize >= 32);
+
+  /* Set true to MOST_SIG if the most significant bit is set to one.  */
+  hsa_op_immed *c = new hsa_op_immed (1ul << (bitsize - 1),
+				      hsa_uint_for_bitsize (bitsize));
+
+  hsa_op_reg *and_reg = new hsa_op_reg (bittype);
+  gen_hsa_binary_operation (BRIG_OPCODE_AND, and_reg, arg, c, hbb);
+
+  hsa_op_reg *most_sign = new hsa_op_reg (BRIG_TYPE_B1);
+  hsa_insn_cmp *cmp = new hsa_insn_cmp
+    (BRIG_COMPARE_EQ, most_sign->m_type, most_sign, and_reg, c);
+  hbb->append_insn (cmp);
+
+  /* If the most significant bit is one, negate the input.  Otherwise
+     shift the input value to left by one bit.  */
+  hsa_op_reg *arg_neg = new hsa_op_reg (arg->m_type);
+  gen_hsa_unary_operation (BRIG_OPCODE_NEG, arg_neg, arg, hbb);
+
+  hsa_op_reg *shifted_arg = new hsa_op_reg (arg->m_type);
+  gen_hsa_binary_operation (BRIG_OPCODE_SHL, shifted_arg, arg,
+			    new hsa_op_immed (1, BRIG_TYPE_U64), hbb);
+
+  /* Assign the value that can be used for FIRSTBIT instruction according
+     to the most significant bit.  */
+  hsa_op_reg *tmp = new hsa_op_reg (bittype);
+  hsa_insn_basic *cmov = new hsa_insn_basic
+    (4, BRIG_OPCODE_CMOV, bittype, tmp, most_sign, arg_neg, shifted_arg);
+  hbb->append_insn (cmov);
+
+  hsa_op_reg *leading_bits = new hsa_op_reg (BRIG_TYPE_S32);
+  gen_hsa_unary_operation (BRIG_OPCODE_FIRSTBIT, leading_bits,
+			   tmp->get_in_type (hsa_uint_for_bitsize (bitsize),
+					     hbb), hbb);
+
+  /* Set flag if the input value is equal to zero.  */
+  hsa_op_reg *is_zero = new hsa_op_reg (BRIG_TYPE_B1);
+  cmp = new hsa_insn_cmp
+    (BRIG_COMPARE_EQ, is_zero->m_type, is_zero, arg,
+     new hsa_op_immed (0, arg->m_type));
+  hbb->append_insn (cmp);
+
+  /* Return the number of leading bits, or 31 if the input value is zero.  */
+  cmov = new hsa_insn_basic
+    (4, BRIG_OPCODE_CMOV, BRIG_TYPE_B32, NULL, is_zero,
+     new hsa_op_immed (31, BRIG_TYPE_U32),
+     leading_bits->get_in_type (BRIG_TYPE_B32, hbb));
+  hbb->append_insn (cmov);
+  cmov->set_output_in_type (dest, 0, hbb);
+}
+
+/* Emit instructions that implement ffs builtin STMT:
+   Returns one plus the index of the least significant 1-bit of x,
+   or if x is zero, returns zero.
+   Instructions are appended to basic block HBB.  */
+
+static void
+gen_hsa_ffs (gcall *call, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (call);
+  if (lhs == NULL_TREE)
+    return;
+
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+
+  tree rhs1 = gimple_call_arg (call, 0);
+  hsa_op_with_type *arg = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+
+  hsa_op_reg *tmp = new hsa_op_reg (BRIG_TYPE_U32);
+  hsa_insn_srctype *insn = new hsa_insn_srctype
+    (2, BRIG_OPCODE_LASTBIT, tmp->m_type, arg->m_type, tmp, arg);
+  hbb->append_insn (insn);
+
+  hsa_insn_basic *addition = new hsa_insn_basic
+    (3, BRIG_OPCODE_ADD, tmp->m_type, NULL, tmp,
+     new hsa_op_immed (1, tmp->m_type));
+  hbb->append_insn (addition);
+  addition->set_output_in_type (dest, 0, hbb);
+}
+
+static void
+gen_hsa_popcount_to_dest (hsa_op_reg *dest, hsa_op_with_type *arg, hsa_bb *hbb)
+{
+  gcc_checking_assert (hsa_type_integer_p (arg->m_type));
+
+  if (hsa_type_bit_size (arg->m_type) < 32)
+    arg = arg->get_in_type (BRIG_TYPE_B32, hbb);
+
+  if (!hsa_btype_p (arg->m_type))
+    arg = arg->get_in_type (hsa_bittype_for_type (arg->m_type), hbb);
+
+  hsa_insn_srctype *popcount = new hsa_insn_srctype
+    (2, BRIG_OPCODE_POPCOUNT, BRIG_TYPE_U32, arg->m_type, NULL, arg);
+  hbb->append_insn (popcount);
+  popcount->set_output_in_type (dest, 0, hbb);
+}
+
+/* Emit instructions that implement parity builtin STMT:
+   Returns the parity of x, i.e. the number of 1-bits in x modulo 2.
+   Instructions are appended to basic block HBB.  */
+
+static void
+gen_hsa_parity (gcall *call, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (call);
+  if (lhs == NULL_TREE)
+    return;
+
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  tree rhs1 = gimple_call_arg (call, 0);
+  hsa_op_with_type *arg = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+
+  hsa_op_reg *popcount = new hsa_op_reg (BRIG_TYPE_U32);
+  gen_hsa_popcount_to_dest (popcount, arg, hbb);
+
+  hsa_insn_basic *insn = new hsa_insn_basic
+    (3, BRIG_OPCODE_REM, popcount->m_type, NULL, popcount,
+     new hsa_op_immed (2, popcount->m_type));
+  hbb->append_insn (insn);
+  insn->set_output_in_type (dest, 0, hbb);
+}
+
+/* Emit instructions that implement popcount builtin STMT.
+   Instructions are appended to basic block HBB.  */
+
+static void
+gen_hsa_popcount (gcall *call, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (call);
+  if (lhs == NULL_TREE)
+    return;
+
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  tree rhs1 = gimple_call_arg (call, 0);
+  hsa_op_with_type *arg = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+
+  gen_hsa_popcount_to_dest (dest, arg, hbb);
+}
+
+/* Set VALUE to a shadow kernel debug argument and append a new instruction
+   to HBB basic block.  */
+
+static void
+set_debug_value (hsa_bb *hbb, hsa_op_with_type *value)
+{
+  hsa_op_reg *shadow_reg_ptr = hsa_cfun->get_shadow_reg ();
+  if (shadow_reg_ptr == NULL)
+    return;
+
+  hsa_op_address *addr = new hsa_op_address
+    (shadow_reg_ptr, get_hsa_kernel_dispatch_offset ("debug"));
+  hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_ST, BRIG_TYPE_U64, value,
+					addr);
+  hbb->append_insn (mem);
+}
+
+void
+omp_simple_builtin::generate (gimple *stmt, hsa_bb *hbb)
+{
+  if (m_sorry)
+    {
+      if (m_warning_message)
+	HSA_SORRY_AT (gimple_location (stmt), m_warning_message)
+      else
+	HSA_SORRY_ATV (gimple_location (stmt),
+		       "Support for HSA does not implement calls to %s\n",
+		       m_name)
+    }
+  else if (m_warning_message != NULL)
+    warning_at (gimple_location (stmt), OPT_Whsa, m_warning_message);
+
+  if (m_return_value != NULL)
+    {
+      tree lhs = gimple_call_lhs (stmt);
+      if (!lhs)
+	return;
+
+      hbb->append_insn (new hsa_insn_comment (m_name));
+
+      hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+      hsa_build_append_simple_mov
+	(dest, m_return_value->get_in_type (dest->m_type, hbb), hbb);
+    }
+}
+
+/* If STMT is a call of a known library function, generate code to perform
+   it and return true.  */
+
+static bool
+gen_hsa_insns_for_known_library_call (gimple *stmt, hsa_bb *hbb)
+{
+  bool handled = false;
+  const char *name = hsa_get_declaration_name (gimple_call_fndecl (stmt));
+
+  char *copy = NULL;
+  size_t len = strlen (name);
+  if (len > 0 && name[len - 1] == '_')
+    {
+      copy = XNEWVEC (char, len + 1);
+      strcpy (copy, name);
+      copy[len - 1] = '\0';
+      name = copy;
+    }
+
+  /* Handle omp_* routines.  */
+  if (strstr (name, "omp_") == name)
+    {
+      hsa_init_simple_builtins ();
+      omp_simple_builtin *builtin = omp_simple_builtins->get (name);
+      if (builtin)
+	{
+	  builtin->generate (stmt, hbb);
+	  return true;
+	}
+
+      handled = true;
+      if (strcmp (name, "omp_set_num_threads") == 0)
+	gen_set_num_threads (gimple_call_arg (stmt, 0), hbb);
+      else if (strcmp (name, "omp_get_thread_num") == 0)
+	{
+	  hbb->append_insn (new hsa_insn_comment (name));
+	  query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, 0, hbb);
+	}
+      else if (strcmp (name, "omp_get_num_threads") == 0)
+	{
+	  hbb->append_insn (new hsa_insn_comment (name));
+	  query_hsa_grid (stmt, BRIG_OPCODE_GRIDSIZE, 0, hbb);
+	}
+      else if (strcmp (name, "omp_get_num_teams") == 0)
+	gen_get_num_teams (stmt, hbb);
+      else if (strcmp (name, "omp_get_team_num") == 0)
+	gen_get_team_num (stmt, hbb);
+      else if (strcmp (name, "omp_get_level") == 0)
+	gen_get_level (stmt, hbb);
+      else if (strcmp (name, "omp_get_active_level") == 0)
+	gen_get_level (stmt, hbb);
+      else if (strcmp (name, "omp_in_parallel") == 0)
+	gen_get_level (stmt, hbb);
+      else if (strcmp (name, "omp_get_max_threads") == 0)
+	gen_get_max_threads (stmt, hbb);
+      else
+	handled = false;
+
+      if (handled)
+	{
+	  if (copy)
+	    free (copy);
+	  return true;
+	}
+    }
+
+  if (strcmp (name, "__hsa_set_debug_value") == 0)
+    {
+      handled = true;
+      if (hsa_cfun->has_shadow_reg_p ())
+	{
+	  tree rhs1 = gimple_call_arg (stmt, 0);
+	  hsa_op_with_type *src = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+
+	  src = src->get_in_type (BRIG_TYPE_U64, hbb);
+	  set_debug_value (hbb, src);
+	}
+    }
+
+  if (copy)
+    free (copy);
+  return handled;
+}
+
+/* Helper functions to create a single unary HSA operations out of calls to
+   builtins.  OPCODE is the HSA operation to be generated.  STMT is a gimple
+   call to a builtin.  HBB is the HSA BB to which the instruction should be
+   added.  Note that nothing will be created if STMT does not have a LHS.  */
+
+static void
+gen_hsa_unaryop_for_builtin (BrigOpcode opcode, gimple *stmt, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+    return;
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  hsa_op_with_type *op = hsa_reg_or_immed_for_gimple_op
+    (gimple_call_arg (stmt, 0), hbb);
+  gen_hsa_unary_operation (opcode, dest, op, hbb);
+}
+
+/* Helper functions to create a call to standard library if LHS of the
+   STMT is used.  HBB is the HSA BB to which the instruction should be
+   added.  */
+
+static void
+gen_hsa_unaryop_builtin_call (gimple *stmt, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+    return;
+
+  if (gimple_call_internal_p (stmt))
+    gen_hsa_insns_for_call_of_internal_fn (stmt, hbb);
+  else
+    gen_hsa_insns_for_direct_call (stmt, hbb);
+}
+
+/* Helper functions to create a single unary HSA operations out of calls to
+   builtins (if unsafe math optimizations are enable). Otherwise, create
+   a call to standard library function.
+   OPCODE is the HSA operation to be generated.  STMT is a gimple
+   call to a builtin.  HBB is the HSA BB to which the instruction should be
+   added.  Note that nothing will be created if STMT does not have a LHS.  */
+
+static void
+gen_hsa_unaryop_or_call_for_builtin (BrigOpcode opcode, gimple *stmt,
+				     hsa_bb *hbb)
+{
+  if (flag_unsafe_math_optimizations)
+    gen_hsa_unaryop_for_builtin (opcode, stmt, hbb);
+  else
+    gen_hsa_unaryop_builtin_call (stmt, hbb);
+}
+
+/* Generate HSA address corresponding to a value VAL (as opposed to a memory
+   reference tree), for example an SSA_NAME or an ADDR_EXPR.  HBB is the HSA BB
+   to which the instruction should be added.  */
+
+static hsa_op_address *
+get_address_from_value (tree val, hsa_bb *hbb)
+{
+  switch (TREE_CODE (val))
+    {
+    case SSA_NAME:
+      {
+	BrigType16_t addrtype = hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT);
+	hsa_op_base *reg = hsa_cfun->reg_for_gimple_ssa (val)->get_in_type
+	  (addrtype, hbb);
+	return new hsa_op_address (NULL, as_a <hsa_op_reg *> (reg), 0);
+      }
+    case ADDR_EXPR:
+      return gen_hsa_addr (TREE_OPERAND (val, 0), hbb);
+
+    case INTEGER_CST:
+      if (tree_fits_shwi_p (val))
+	return new hsa_op_address (NULL, NULL, tree_to_shwi (val));
+      /* Otherwise fall-through */
+
+    default:
+      HSA_SORRY_ATV (EXPR_LOCATION (val),
+		     "support for HSA does not implement memory access to %E",
+		     val);
+      return new hsa_op_address (NULL, NULL, 0);
+    }
+}
+
+/* Return string for MEMMODEL.  */
+
+static const char *
+get_memory_order_name (unsigned memmodel)
+{
+  switch (memmodel)
+    {
+    case __ATOMIC_RELAXED:
+      return "__ATOMIC_RELAXED";
+    case __ATOMIC_CONSUME:
+      return "__ATOMIC_CONSUME";
+    case __ATOMIC_ACQUIRE:
+      return "__ATOMIC_ACQUIRE";
+    case __ATOMIC_RELEASE:
+      return "__ATOMIC_RELEASE";
+    case __ATOMIC_ACQ_REL:
+      return "__ATOMIC_ACQ_REL";
+    case __ATOMIC_SEQ_CST:
+      return "__ATOMIC_SEQ_CST";
+    default:
+      return NULL;
+    }
+}
+
+/* Return memory order according to predefined __atomic memory model
+   constants.  LOCATION is provided to locate the problematic statement.  */
+
+static BrigMemoryOrder
+get_memory_order (unsigned memmodel, location_t location)
+{
+  switch (memmodel)
+    {
+    case __ATOMIC_RELAXED:
+      return BRIG_MEMORY_ORDER_RELAXED;
+    case __ATOMIC_ACQUIRE:
+      return BRIG_MEMORY_ORDER_SC_ACQUIRE;
+    case __ATOMIC_RELEASE:
+      return BRIG_MEMORY_ORDER_SC_RELEASE;
+    case __ATOMIC_ACQ_REL:
+      return BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE;
+    default:
+      HSA_SORRY_ATV (location,
+		     "support for HSA does not implement memory model: %s",
+		     get_memory_order_name (memmodel));
+      return BRIG_MEMORY_ORDER_NONE;
+    }
+}
+
+/* Helper function to create an HSA atomic binary operation instruction out of
+   calls to atomic builtins.  RET_ORIG is true if the built-in is the variant
+   that return s the value before applying operation, and false if it should
+   return the value after applying the operation (if it returns value at all).
+   ACODE is the atomic operation code, STMT is a gimple call to a builtin.  HBB
+   is the HSA BB to which the instruction should be added.  */
+
+static void
+gen_hsa_ternary_atomic_for_builtin (bool ret_orig,
+ 				    enum BrigAtomicOperation acode, gimple *stmt,
+				    hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (stmt);
+
+  tree type = TREE_TYPE (gimple_call_arg (stmt, 1));
+  BrigType16_t hsa_type = hsa_type_for_scalar_tree_type (type, false);
+  BrigType16_t mtype = mem_type_for_type (hsa_type);
+  tree model = gimple_call_arg (stmt, 2);
+
+  if (!tree_fits_uhwi_p (model))
+    {
+      HSA_SORRY_ATV
+	(gimple_location (stmt),
+	 "support for HSA does not implement memory model %E", model);
+      return;
+    }
+
+  unsigned HOST_WIDE_INT mmodel = tree_to_uhwi (model);
+
+  BrigMemoryOrder memorder = get_memory_order
+    (mmodel, gimple_location (stmt));
+
+  /* Certain atomic insns must have Bx memory types.  */
+  switch (acode)
+    {
+    case BRIG_ATOMIC_LD:
+    case BRIG_ATOMIC_ST:
+    case BRIG_ATOMIC_AND:
+    case BRIG_ATOMIC_OR:
+    case BRIG_ATOMIC_XOR:
+    case BRIG_ATOMIC_EXCH:
+      mtype = hsa_bittype_for_type (mtype);
+      break;
+    default:
+      break;
+    }
+
+  hsa_op_reg *dest;
+  int nops, opcode;
+  if (lhs)
+    {
+      if (ret_orig)
+	dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+      else
+	dest = new hsa_op_reg (hsa_type);
+      opcode = BRIG_OPCODE_ATOMIC;
+      nops = 3;
+    }
+  else
+    {
+      dest = NULL;
+      opcode = BRIG_OPCODE_ATOMICNORET;
+      nops = 2;
+    }
+
+  if (acode == BRIG_ATOMIC_ST && memorder != BRIG_MEMORY_ORDER_RELAXED
+      && memorder != BRIG_MEMORY_ORDER_SC_RELEASE)
+    {
+      HSA_SORRY_ATV (gimple_location (stmt),
+		     "support for HSA does not implement memory model for "
+		     "ATOMIC_ST: %s", get_memory_order_name (mmodel));
+      return;
+    }
+
+  hsa_insn_atomic *atominsn = new hsa_insn_atomic (nops, opcode, acode, mtype,
+						   memorder);
+
+  hsa_op_address *addr;
+  addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
+  /* TODO: Warn if addr has private segment, because the finalizer will not
+     accept that (and it does not make much sense).  */
+  hsa_op_base *op = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 1),
+						    hbb);
+
+  if (lhs)
+    {
+      atominsn->set_op (0, dest);
+      atominsn->set_op (1, addr);
+      atominsn->set_op (2, op);
+    }
+  else
+    {
+      atominsn->set_op (0, addr);
+      atominsn->set_op (1, op);
+    }
+
+  hbb->append_insn (atominsn);
+
+  /* HSA does not natively support the variants that return the modified value,
+     so re-do the operation again non-atomically if that is what was
+     requested.  */
+  if (lhs && !ret_orig)
+    {
+      int arith;
+      switch (acode)
+	{
+	case BRIG_ATOMIC_ADD:
+	  arith = BRIG_OPCODE_ADD;
+	  break;
+	case BRIG_ATOMIC_AND:
+	  arith = BRIG_OPCODE_AND;
+	  break;
+	case BRIG_ATOMIC_OR:
+	  arith = BRIG_OPCODE_OR;
+	  break;
+	case BRIG_ATOMIC_SUB:
+	  arith = BRIG_OPCODE_SUB;
+	  break;
+	case BRIG_ATOMIC_XOR:
+	  arith = BRIG_OPCODE_XOR;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      hsa_op_reg *real_dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+      gen_hsa_binary_operation (arith, real_dest, dest, op, hbb);
+    }
+}
+
+/* Generate HSA instructions for an internal fn.
+   Instructions will be appended to HBB, which also needs to be the
+   corresponding structure to the basic_block of STMT.  */
+
+static void
+gen_hsa_insn_for_internal_fn_call (gcall *stmt, hsa_bb *hbb)
+{
+  gcc_checking_assert (gimple_call_internal_fn (stmt));
+  internal_fn fn = gimple_call_internal_fn (stmt);
+
+  bool is_float_type_p = false;
+  if (gimple_call_lhs (stmt) != NULL
+      && TREE_TYPE (gimple_call_lhs (stmt)) == float_type_node)
+    is_float_type_p = true;
+
+  switch (fn)
+    {
+    case IFN_CEIL:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_CEIL, stmt, hbb);
+      break;
+
+    case IFN_FLOOR:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_FLOOR, stmt, hbb);
+      break;
+
+    case IFN_RINT:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_RINT, stmt, hbb);
+      break;
+
+    case IFN_SQRT:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_SQRT, stmt, hbb);
+      break;
+
+    case IFN_TRUNC:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_TRUNC, stmt, hbb);
+      break;
+
+    case IFN_COS:
+      {
+	if (is_float_type_p)
+	  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NCOS, stmt, hbb);
+	else
+	  gen_hsa_unaryop_builtin_call (stmt, hbb);
+
+	break;
+      }
+    case IFN_EXP2:
+      {
+	if (is_float_type_p)
+	  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NEXP2, stmt, hbb);
+	else
+	  gen_hsa_unaryop_builtin_call (stmt, hbb);
+
+	break;
+      }
+
+    case IFN_LOG2:
+      {
+	if (is_float_type_p)
+	  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NLOG2, stmt, hbb);
+	else
+	  gen_hsa_unaryop_builtin_call (stmt, hbb);
+
+	break;
+      }
+
+    case IFN_SIN:
+      {
+	if (is_float_type_p)
+	  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NSIN, stmt, hbb);
+	else
+	  gen_hsa_unaryop_builtin_call (stmt, hbb);
+        break;
+      }
+
+    case IFN_CLRSB:
+      gen_hsa_clrsb (stmt, hbb);
+      break;
+
+    case IFN_CLZ:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_FIRSTBIT, stmt, hbb);
+      break;
+
+    case IFN_CTZ:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_LASTBIT, stmt, hbb);
+      break;
+
+    case IFN_FFS:
+      gen_hsa_ffs (stmt, hbb);
+      break;
+
+    case IFN_PARITY:
+      gen_hsa_parity (stmt, hbb);
+      break;
+
+    case IFN_POPCOUNT:
+      gen_hsa_popcount (stmt, hbb);
+      break;
+
+    case IFN_ACOS:
+    case IFN_ASIN:
+    case IFN_ATAN:
+    case IFN_EXP:
+    case IFN_EXP10:
+    case IFN_EXPM1:
+    case IFN_LOG:
+    case IFN_LOG10:
+    case IFN_LOG1P:
+    case IFN_LOGB:
+    case IFN_SIGNIFICAND:
+    case IFN_TAN:
+    case IFN_NEARBYINT:
+    case IFN_ROUND:
+    case IFN_ATAN2:
+    case IFN_COPYSIGN:
+    case IFN_FMOD:
+    case IFN_POW:
+    case IFN_REMAINDER:
+    case IFN_SCALB:
+    case IFN_FMIN:
+    case IFN_FMAX:
+      gen_hsa_insns_for_call_of_internal_fn (stmt, hbb);
+
+    default:
+      HSA_SORRY_ATV (gimple_location (stmt),
+		     "support for HSA does not implement internal function: %s",
+		     internal_fn_name (fn));
+      break;
+    }
+}
+
+#define HSA_MEMORY_BUILTINS_LIMIT     128
+
+/* Generate HSA instructions for the given call statement STMT.  Instructions
+   will be appended to HBB.  */
+
+static void
+gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
+{
+  gcall *call = as_a <gcall *> (stmt);
+  tree lhs = gimple_call_lhs (stmt);
+  hsa_op_reg *dest;
+
+  if (gimple_call_internal_p (stmt))
+    {
+      gen_hsa_insn_for_internal_fn_call (call, hbb);
+      return;
+    }
+
+  if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+    {
+      tree function_decl = gimple_call_fndecl (stmt);
+      if (function_decl == NULL_TREE)
+	{
+	  HSA_SORRY_AT (gimple_location (stmt),
+			"support for HSA does not implement indirect calls");
+	  return;
+	}
+
+      if (hsa_callable_function_p (function_decl))
+	gen_hsa_insns_for_direct_call (stmt, hbb);
+      else if (!gen_hsa_insns_for_known_library_call (stmt, hbb))
+	HSA_SORRY_AT (gimple_location (stmt),
+		      "HSA supports only calls of functions marked with pragma "
+		      "omp declare target");
+      return;
+    }
+
+  tree fndecl = gimple_call_fndecl (stmt);
+  enum built_in_function builtin = DECL_FUNCTION_CODE (fndecl);
+  switch (builtin)
+    {
+    case BUILT_IN_FABS:
+    case BUILT_IN_FABSF:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_ABS, stmt, hbb);
+      break;
+
+    case BUILT_IN_CEIL:
+    case BUILT_IN_CEILF:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_CEIL, stmt, hbb);
+      break;
+
+    case BUILT_IN_FLOOR:
+    case BUILT_IN_FLOORF:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_FLOOR, stmt, hbb);
+      break;
+
+    case BUILT_IN_RINT:
+    case BUILT_IN_RINTF:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_RINT, stmt, hbb);
+      break;
+
+    case BUILT_IN_SQRT:
+    case BUILT_IN_SQRTF:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_SQRT, stmt, hbb);
+      break;
+
+    case BUILT_IN_TRUNC:
+    case BUILT_IN_TRUNCF:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_TRUNC, stmt, hbb);
+      break;
+
+    case BUILT_IN_COS:
+    case BUILT_IN_SIN:
+    case BUILT_IN_EXP2:
+    case BUILT_IN_LOG2:
+      /* HSAIL does not provide an instruction for double argument type.  */
+      gen_hsa_unaryop_builtin_call (stmt, hbb);
+      break;
+
+    case BUILT_IN_COSF:
+      gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NCOS, stmt, hbb);
+      break;
+
+    case BUILT_IN_EXP2F:
+      gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NEXP2, stmt, hbb);
+      break;
+
+    case BUILT_IN_LOG2F:
+      gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NLOG2, stmt, hbb);
+      break;
+
+    case BUILT_IN_SINF:
+      gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NSIN, stmt, hbb);
+      break;
+
+    case BUILT_IN_CLRSB:
+    case BUILT_IN_CLRSBL:
+    case BUILT_IN_CLRSBLL:
+      gen_hsa_clrsb (call, hbb);
+      break;
+
+    case BUILT_IN_CLZ:
+    case BUILT_IN_CLZL:
+    case BUILT_IN_CLZLL:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_FIRSTBIT, stmt, hbb);
+      break;
+
+    case BUILT_IN_CTZ:
+    case BUILT_IN_CTZL:
+    case BUILT_IN_CTZLL:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_LASTBIT, stmt, hbb);
+      break;
+
+    case BUILT_IN_FFS:
+    case BUILT_IN_FFSL:
+    case BUILT_IN_FFSLL:
+      gen_hsa_ffs (call, hbb);
+      break;
+
+    case BUILT_IN_PARITY:
+    case BUILT_IN_PARITYL:
+    case BUILT_IN_PARITYLL:
+      gen_hsa_parity (call, hbb);
+      break;
+
+    case BUILT_IN_POPCOUNT:
+    case BUILT_IN_POPCOUNTL:
+    case BUILT_IN_POPCOUNTLL:
+      gen_hsa_popcount (call, hbb);
+      break;
+
+    case BUILT_IN_ATOMIC_LOAD_1:
+    case BUILT_IN_ATOMIC_LOAD_2:
+    case BUILT_IN_ATOMIC_LOAD_4:
+    case BUILT_IN_ATOMIC_LOAD_8:
+    case BUILT_IN_ATOMIC_LOAD_16:
+      {
+	BrigType16_t mtype;
+	hsa_op_address *addr;
+	addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
+	tree model = gimple_call_arg (stmt, 1);
+	if (!tree_fits_uhwi_p (model))
+	  {
+	    HSA_SORRY_ATV
+	      (gimple_location (stmt),
+	       "support for HSA does not implement memory model: %E", model);
+	    return;
+	  }
+
+	unsigned HOST_WIDE_INT mmodel = tree_to_uhwi (model);
+	BrigMemoryOrder memorder = get_memory_order (mmodel,
+						     gimple_location (stmt));
+
+	if (memorder != BRIG_MEMORY_ORDER_RELAXED
+	    && memorder != BRIG_MEMORY_ORDER_SC_RELEASE)
+	  {
+	    HSA_SORRY_ATV
+	      (gimple_location (stmt),
+	       "support for HSA does not implement memory model for "
+	       "ATOMIC_LD: %s", get_memory_order_name (mmodel));
+	    return;
+	  }
+
+	if (lhs)
+	  {
+	    mtype = mem_type_for_type
+	      (hsa_type_for_scalar_tree_type (TREE_TYPE (lhs), false));
+	    mtype = hsa_bittype_for_type (mtype);
+	    dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+	  }
+	else
+	  {
+	    mtype = BRIG_TYPE_B64;
+	    dest = new hsa_op_reg (mtype);
+	  }
+
+	hsa_insn_atomic *atominsn
+	  = new hsa_insn_atomic (2, BRIG_OPCODE_ATOMIC, BRIG_ATOMIC_LD, mtype,
+				 memorder, dest, addr);
+
+	hbb->append_insn (atominsn);
+	break;
+      }
+
+    case BUILT_IN_ATOMIC_EXCHANGE_1:
+    case BUILT_IN_ATOMIC_EXCHANGE_2:
+    case BUILT_IN_ATOMIC_EXCHANGE_4:
+    case BUILT_IN_ATOMIC_EXCHANGE_8:
+    case BUILT_IN_ATOMIC_EXCHANGE_16:
+      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_EXCH, stmt, hbb);
+      break;
+
+    case BUILT_IN_ATOMIC_FETCH_ADD_1:
+    case BUILT_IN_ATOMIC_FETCH_ADD_2:
+    case BUILT_IN_ATOMIC_FETCH_ADD_4:
+    case BUILT_IN_ATOMIC_FETCH_ADD_8:
+    case BUILT_IN_ATOMIC_FETCH_ADD_16:
+      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_ADD, stmt, hbb);
+      break;
+
+    case BUILT_IN_ATOMIC_FETCH_SUB_1:
+    case BUILT_IN_ATOMIC_FETCH_SUB_2:
+    case BUILT_IN_ATOMIC_FETCH_SUB_4:
+    case BUILT_IN_ATOMIC_FETCH_SUB_8:
+    case BUILT_IN_ATOMIC_FETCH_SUB_16:
+      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_SUB, stmt, hbb);
+      break;
+
+    case BUILT_IN_ATOMIC_FETCH_AND_1:
+    case BUILT_IN_ATOMIC_FETCH_AND_2:
+    case BUILT_IN_ATOMIC_FETCH_AND_4:
+    case BUILT_IN_ATOMIC_FETCH_AND_8:
+    case BUILT_IN_ATOMIC_FETCH_AND_16:
+      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_AND, stmt, hbb);
+      break;
+
+    case BUILT_IN_ATOMIC_FETCH_XOR_1:
+    case BUILT_IN_ATOMIC_FETCH_XOR_2:
+    case BUILT_IN_ATOMIC_FETCH_XOR_4:
+    case BUILT_IN_ATOMIC_FETCH_XOR_8:
+    case BUILT_IN_ATOMIC_FETCH_XOR_16:
+      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_XOR, stmt, hbb);
+      break;
+
+    case BUILT_IN_ATOMIC_FETCH_OR_1:
+    case BUILT_IN_ATOMIC_FETCH_OR_2:
+    case BUILT_IN_ATOMIC_FETCH_OR_4:
+    case BUILT_IN_ATOMIC_FETCH_OR_8:
+    case BUILT_IN_ATOMIC_FETCH_OR_16:
+      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_OR, stmt, hbb);
+      break;
+
+    case BUILT_IN_ATOMIC_STORE_1:
+    case BUILT_IN_ATOMIC_STORE_2:
+    case BUILT_IN_ATOMIC_STORE_4:
+    case BUILT_IN_ATOMIC_STORE_8:
+    case BUILT_IN_ATOMIC_STORE_16:
+      /* Since there cannot be any LHS, the first parameter is meaningless.  */
+      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_ST, stmt, hbb);
+      break;
+
+    case BUILT_IN_ATOMIC_ADD_FETCH_1:
+    case BUILT_IN_ATOMIC_ADD_FETCH_2:
+    case BUILT_IN_ATOMIC_ADD_FETCH_4:
+    case BUILT_IN_ATOMIC_ADD_FETCH_8:
+    case BUILT_IN_ATOMIC_ADD_FETCH_16:
+      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_ADD, stmt, hbb);
+      break;
+
+    case BUILT_IN_ATOMIC_SUB_FETCH_1:
+    case BUILT_IN_ATOMIC_SUB_FETCH_2:
+    case BUILT_IN_ATOMIC_SUB_FETCH_4:
+    case BUILT_IN_ATOMIC_SUB_FETCH_8:
+    case BUILT_IN_ATOMIC_SUB_FETCH_16:
+      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_SUB, stmt, hbb);
+      break;
+
+    case BUILT_IN_ATOMIC_AND_FETCH_1:
+    case BUILT_IN_ATOMIC_AND_FETCH_2:
+    case BUILT_IN_ATOMIC_AND_FETCH_4:
+    case BUILT_IN_ATOMIC_AND_FETCH_8:
+    case BUILT_IN_ATOMIC_AND_FETCH_16:
+      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_AND, stmt, hbb);
+      break;
+
+    case BUILT_IN_ATOMIC_XOR_FETCH_1:
+    case BUILT_IN_ATOMIC_XOR_FETCH_2:
+    case BUILT_IN_ATOMIC_XOR_FETCH_4:
+    case BUILT_IN_ATOMIC_XOR_FETCH_8:
+    case BUILT_IN_ATOMIC_XOR_FETCH_16:
+      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_XOR, stmt, hbb);
+      break;
+
+    case BUILT_IN_ATOMIC_OR_FETCH_1:
+    case BUILT_IN_ATOMIC_OR_FETCH_2:
+    case BUILT_IN_ATOMIC_OR_FETCH_4:
+    case BUILT_IN_ATOMIC_OR_FETCH_8:
+    case BUILT_IN_ATOMIC_OR_FETCH_16:
+      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_OR, stmt, hbb);
+      break;
+
+    case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_1:
+    case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_2:
+    case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_4:
+    case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_8:
+    case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_16:
+      {
+	/* XXX Ignore mem model for now.  */
+	tree type = TREE_TYPE (gimple_call_arg (stmt, 1));
+
+	BrigType16_t atype  = hsa_bittype_for_type
+	  (hsa_type_for_scalar_tree_type (type, false));
+
+	hsa_insn_atomic *atominsn = new hsa_insn_atomic
+	  (4, BRIG_OPCODE_ATOMIC, BRIG_ATOMIC_CAS, atype,
+	   BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE);
+	hsa_op_address *addr;
+	addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
+
+	if (lhs != NULL)
+	  dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+	else
+	  dest = new hsa_op_reg (atype);
+
+	/* Should check what the memory scope is */
+	atominsn->m_memoryscope = BRIG_MEMORY_SCOPE_WORKGROUP;
+	atominsn->set_op (0, dest);
+	atominsn->set_op (1, addr);
+	atominsn->set_op
+	  (2, hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 1), hbb));
+	atominsn->set_op
+	  (3, hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 2), hbb));
+
+	hbb->append_insn (atominsn);
+	break;
+      }
+    case BUILT_IN_GOMP_PARALLEL:
+      HSA_SORRY_AT (gimple_location (stmt),
+		    "support for HSA does not implement non-gridified "
+		    "OpenMP parallel constructs.");
+      break;
+    case BUILT_IN_OMP_GET_THREAD_NUM:
+      {
+	query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, 0, hbb);
+	break;
+      }
+
+    case BUILT_IN_OMP_GET_NUM_THREADS:
+      {
+	query_hsa_grid (stmt, BRIG_OPCODE_GRIDSIZE, 0, hbb);
+	break;
+      }
+    case BUILT_IN_GOMP_TEAMS:
+      {
+	gen_set_num_threads (gimple_call_arg (stmt, 1), hbb);
+	break;
+      }
+    case BUILT_IN_OMP_GET_NUM_TEAMS:
+      {
+	gen_get_num_teams (stmt, hbb);
+	break;
+      }
+    case BUILT_IN_OMP_GET_TEAM_NUM:
+      {
+	gen_get_team_num (stmt, hbb);
+	break;
+      }
+    case BUILT_IN_MEMCPY:
+    case BUILT_IN_MEMPCPY:
+      {
+	tree byte_size = gimple_call_arg (stmt, 2);
+
+	if (!tree_fits_uhwi_p (byte_size))
+	  {
+	    gen_hsa_insns_for_direct_call (stmt, hbb);
+	    return;
+	  }
+
+	unsigned n = tree_to_uhwi (byte_size);
+
+	if (n > HSA_MEMORY_BUILTINS_LIMIT)
+	  {
+	    gen_hsa_insns_for_direct_call (stmt, hbb);
+	    return;
+	  }
+
+	tree dst = gimple_call_arg (stmt, 0);
+	tree src = gimple_call_arg (stmt, 1);
+
+	hsa_op_address *dst_addr = get_address_from_value (dst, hbb);
+	hsa_op_address *src_addr = get_address_from_value (src, hbb);
+
+	gen_hsa_memory_copy (hbb, dst_addr, src_addr, n);
+
+	tree lhs = gimple_call_lhs (stmt);
+	if (lhs)
+	  {
+	    hsa_op_reg *lhs_reg = hsa_cfun->reg_for_gimple_ssa (lhs);
+	    hsa_op_with_type *dst_reg = hsa_reg_or_immed_for_gimple_op (dst,
+									hbb);
+	    hsa_op_with_type *tmp;
+
+	    if (builtin == BUILT_IN_MEMPCPY)
+	      {
+		tmp = new hsa_op_reg (dst_reg->m_type);
+		hsa_insn_basic *add = new hsa_insn_basic
+		  (3, BRIG_OPCODE_ADD, tmp->m_type,
+		   tmp, dst_reg, new hsa_op_immed (n, dst_reg->m_type));
+		hbb->append_insn (add);
+	      }
+	    else
+	      tmp = dst_reg;
+
+	    hsa_build_append_simple_mov (lhs_reg, tmp, hbb);
+	  }
+
+	break;
+      }
+    case BUILT_IN_MEMSET:
+      {
+	tree dst = gimple_call_arg (stmt, 0);
+	tree c = gimple_call_arg (stmt, 1);
+
+	if (TREE_CODE (c) != INTEGER_CST)
+	  {
+	    gen_hsa_insns_for_direct_call (stmt, hbb);
+	    return;
+	  }
+
+	tree byte_size = gimple_call_arg (stmt, 2);
+
+	if (!tree_fits_uhwi_p (byte_size))
+	  {
+	    gen_hsa_insns_for_direct_call (stmt, hbb);
+	    return;
+	  }
+
+	unsigned n = tree_to_uhwi (byte_size);
+
+	if (n > HSA_MEMORY_BUILTINS_LIMIT)
+	  {
+	    gen_hsa_insns_for_direct_call (stmt, hbb);
+	    return;
+	  }
+
+	hsa_op_address *dst_addr;
+	dst_addr = get_address_from_value (dst, hbb);
+	unsigned HOST_WIDE_INT constant = tree_to_uhwi
+	  (fold_convert (unsigned_char_type_node, c));
+
+	gen_hsa_memory_set (hbb, dst_addr, constant, n);
+
+	tree lhs = gimple_call_lhs (stmt);
+	if (lhs)
+	  gen_hsa_insns_for_single_assignment (lhs, dst, hbb);
+
+	break;
+      }
+    case BUILT_IN_BZERO:
+      {
+	tree dst = gimple_call_arg (stmt, 0);
+	tree byte_size = gimple_call_arg (stmt, 1);
+
+	if (!tree_fits_uhwi_p (byte_size))
+	  {
+	    gen_hsa_insns_for_direct_call (stmt, hbb);
+	    return;
+	  }
+
+	unsigned n = tree_to_uhwi (byte_size);
+
+	if (n > HSA_MEMORY_BUILTINS_LIMIT)
+	  {
+	    gen_hsa_insns_for_direct_call (stmt, hbb);
+	    return;
+	  }
+
+	hsa_op_address *dst_addr;
+	dst_addr = get_address_from_value (dst, hbb);
+
+	gen_hsa_memory_set (hbb, dst_addr, 0, n);
+
+	break;
+      }
+    case BUILT_IN_ALLOCA:
+    case BUILT_IN_ALLOCA_WITH_ALIGN:
+      {
+	gen_hsa_alloca (call, hbb);
+	break;
+      }
+    default:
+      {
+	gen_hsa_insns_for_direct_call (stmt, hbb);
+	return;
+      }
+    }
+}
+
+/* Generate HSA instructions for a given gimple statement.  Instructions will be
+   appended to HBB.  */
+
+static void
+gen_hsa_insns_for_gimple_stmt (gimple *stmt, hsa_bb *hbb)
+{
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_ASSIGN:
+      if (gimple_clobber_p (stmt))
+	break;
+
+      if (gimple_assign_single_p (stmt))
+	{
+	  tree lhs = gimple_assign_lhs (stmt);
+	  tree rhs = gimple_assign_rhs1 (stmt);
+	  gen_hsa_insns_for_single_assignment (lhs, rhs, hbb);
+	}
+      else
+	gen_hsa_insns_for_operation_assignment (stmt, hbb);
+      break;
+    case GIMPLE_RETURN:
+      gen_hsa_insns_for_return (as_a <greturn *> (stmt), hbb);
+      break;
+    case GIMPLE_COND:
+      gen_hsa_insns_for_cond_stmt (stmt, hbb);
+      break;
+    case GIMPLE_CALL:
+      gen_hsa_insns_for_call (stmt, hbb);
+      break;
+    case GIMPLE_DEBUG:
+      /* ??? HSA supports some debug facilities.  */
+      break;
+    case GIMPLE_LABEL:
+    {
+      tree label = gimple_label_label (as_a <glabel *> (stmt));
+      if (FORCED_LABEL (label))
+	HSA_SORRY_AT (gimple_location (stmt),
+		      "support for HSA does not implement gimple label with "
+		      "address taken");
+
+      break;
+    }
+    case GIMPLE_NOP:
+    {
+      hbb->append_insn (new hsa_insn_basic (0, BRIG_OPCODE_NOP));
+      break;
+    }
+    case GIMPLE_SWITCH:
+    {
+      gen_hsa_insns_for_switch_stmt (as_a <gswitch *> (stmt), hbb);
+      break;
+    }
+    default:
+      HSA_SORRY_ATV (gimple_location (stmt),
+		     "support for HSA does not implement gimple statement %s",
+		     gimple_code_name[(int) gimple_code (stmt)]);
+    }
+}
+
+/* Generate a HSA PHI from a gimple PHI.  */
+
+static void
+gen_hsa_phi_from_gimple_phi (gimple *phi_stmt, hsa_bb *hbb)
+{
+  hsa_insn_phi *hphi;
+  unsigned count = gimple_phi_num_args (phi_stmt);
+
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa
+    (gimple_phi_result (phi_stmt));
+  hphi = new hsa_insn_phi (count, dest);
+  hphi->m_bb = hbb->m_bb;
+
+  tree lhs = gimple_phi_result (phi_stmt);
+
+  for (unsigned i = 0; i < count; i++)
+    {
+      tree op = gimple_phi_arg_def (phi_stmt, i);
+
+      if (TREE_CODE (op) == SSA_NAME)
+	{
+	  hsa_op_reg *hreg = hsa_cfun->reg_for_gimple_ssa (op);
+	  hphi->set_op (i, hreg);
+	}
+      else
+	{
+	  gcc_assert (is_gimple_min_invariant (op));
+	  tree t = TREE_TYPE (op);
+	  if (!POINTER_TYPE_P (t)
+	      || (TREE_CODE (op) == STRING_CST
+		  && TREE_CODE (TREE_TYPE (t)) == INTEGER_TYPE))
+	    hphi->set_op (i, new hsa_op_immed (op));
+	  else if (POINTER_TYPE_P (TREE_TYPE (lhs))
+		   && TREE_CODE (op) == INTEGER_CST)
+	    {
+	      /* Handle assignment of NULL value to a pointer type.  */
+	      hphi->set_op (i, new hsa_op_immed (op));
+	    }
+	  else if (TREE_CODE (op) == ADDR_EXPR)
+	    {
+	      edge e = gimple_phi_arg_edge (as_a <gphi *> (phi_stmt), i);
+	      hsa_bb *hbb_src = hsa_init_new_bb (split_edge (e));
+	      hsa_op_address *addr = gen_hsa_addr (TREE_OPERAND (op, 0),
+						   hbb_src);
+
+	      hsa_op_reg *dest = new hsa_op_reg (BRIG_TYPE_U64);
+	      hsa_insn_basic *insn = new  hsa_insn_basic
+		(2, BRIG_OPCODE_LDA, BRIG_TYPE_U64, dest, addr);
+	      hbb_src->append_insn (insn);
+
+	      hphi->set_op (i, dest);
+	    }
+	  else
+	    {
+	      HSA_SORRY_AT (gimple_location (phi_stmt),
+			    "support for HSA does not handle PHI nodes with "
+			    "constant address operands");
+	      return;
+	    }
+	}
+    }
+
+  hphi->m_prev = hbb->m_last_phi;
+  hphi->m_next = NULL;
+  if (hbb->m_last_phi)
+    hbb->m_last_phi->m_next = hphi;
+  hbb->m_last_phi = hphi;
+  if (!hbb->m_first_phi)
+    hbb->m_first_phi = hphi;
+}
+
+/* Constructor of class containing HSA-specific information about a basic
+   block.  CFG_BB is the CFG BB this HSA BB is associated with.  IDX is the new
+   index of this BB (so that the constructor does not attempt to use
+   hsa_cfun during its construction).  */
+
+hsa_bb::hsa_bb (basic_block cfg_bb, int idx): m_bb (cfg_bb),
+  m_first_insn (NULL), m_last_insn (NULL), m_first_phi (NULL),
+  m_last_phi (NULL), m_index (idx), m_liveout (BITMAP_ALLOC (NULL)),
+  m_livein (BITMAP_ALLOC (NULL))
+{
+  gcc_assert (!cfg_bb->aux);
+  cfg_bb->aux = this;
+}
+
+/* Constructor of class containing HSA-specific information about a basic
+   block.  CFG_BB is the CFG BB this HSA BB is associated with.  */
+
+hsa_bb::hsa_bb (basic_block cfg_bb): m_bb (cfg_bb),
+  m_first_insn (NULL), m_last_insn (NULL), m_first_phi (NULL),
+  m_last_phi (NULL), m_index (hsa_cfun->m_hbb_count++),
+  m_liveout (BITMAP_ALLOC (NULL)), m_livein (BITMAP_ALLOC (NULL))
+{
+  gcc_assert (!cfg_bb->aux);
+  cfg_bb->aux = this;
+}
+
+/* Destructor of class representing HSA BB.  */
+
+hsa_bb::~hsa_bb ()
+{
+  BITMAP_FREE (m_livein);
+  BITMAP_FREE (m_liveout);
+}
+
+/* Create and initialize and return a new hsa_bb structure for a given CFG
+   basic block BB.  */
+
+hsa_bb *
+hsa_init_new_bb (basic_block bb)
+{
+  return new (*hsa_allocp_bb) hsa_bb (bb);
+}
+
+/* Initialize OMP in an HSA basic block PROLOGUE.  */
+
+static void
+init_prologue (void)
+{
+  if (!hsa_cfun->m_kern_p)
+    return;
+
+  hsa_bb *prologue = hsa_bb_for_bb (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+
+  /* Create a magic number that is going to be printed by libgomp.  */
+  unsigned index = hsa_get_number_decl_kernel_mappings ();
+
+  /* Emit store to debug argument.  */
+  if (PARAM_VALUE (PARAM_HSA_GEN_DEBUG_STORES) > 0)
+    set_debug_value (prologue, new hsa_op_immed (1000 + index, BRIG_TYPE_U64));
+}
+
+/* Initialize hsa_num_threads to a default value.  */
+
+static void
+init_hsa_num_threads (void)
+{
+  hsa_bb *prologue = hsa_bb_for_bb (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+
+  /* Save the default value to private variable hsa_num_threads.  */
+  hsa_insn_basic *basic = new hsa_insn_mem
+    (BRIG_OPCODE_ST, hsa_num_threads->m_type,
+     new hsa_op_immed (0, hsa_num_threads->m_type),
+     new hsa_op_address (hsa_num_threads));
+  prologue->append_insn (basic);
+}
+
+/* Go over gimple representation and generate our internal HSA one.  */
+
+static void
+gen_body_from_gimple ()
+{
+  basic_block bb;
+
+  /* Verify CFG for complex edges we are unable to handle.  */
+  edge_iterator ei;
+  edge e;
+
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	{
+	  /* Verify all unsupported flags for edges that point
+	     to the same basic block.  */
+	  if (e->flags & EDGE_EH)
+	    {
+	      HSA_SORRY_AT
+		(UNKNOWN_LOCATION,
+		 "support for HSA does not implement exception handling");
+	      return;
+	    }
+	}
+    }
+
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      gimple_stmt_iterator gsi;
+      hsa_bb *hbb = hsa_bb_for_bb (bb);
+      if (hbb)
+	continue;
+
+      hbb = hsa_init_new_bb (bb);
+
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gen_hsa_insns_for_gimple_stmt (gsi_stmt (gsi), hbb);
+	  if (hsa_seen_error ())
+	    return;
+	}
+    }
+
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      gimple_stmt_iterator gsi;
+      hsa_bb *hbb = hsa_bb_for_bb (bb);
+      gcc_assert (hbb != NULL);
+
+      for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	if (!virtual_operand_p (gimple_phi_result (gsi_stmt (gsi))))
+	  gen_hsa_phi_from_gimple_phi (gsi_stmt (gsi), hbb);
+    }
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "------- Generated SSA form -------\n");
+      dump_hsa_cfun (dump_file);
+    }
+}
+
+static void
+gen_function_decl_parameters (hsa_function_representation *f,
+			      tree decl)
+{
+  tree parm;
+  unsigned i;
+
+  for (parm = TYPE_ARG_TYPES (TREE_TYPE (decl)), i = 0;
+       parm;
+       parm = TREE_CHAIN (parm), i++)
+    {
+      /* Result type if last in the tree list.  */
+      if (TREE_CHAIN (parm) == NULL)
+	break;
+
+      tree v = TREE_VALUE (parm);
+
+      hsa_symbol *arg = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_ARG,
+					BRIG_LINKAGE_NONE);
+      arg->m_type = hsa_type_for_tree_type (v, &arg->m_dim);
+      arg->m_name_number = i;
+
+      f->m_input_args.safe_push (arg);
+    }
+
+  tree result_type = TREE_TYPE (TREE_TYPE (decl));
+  if (!VOID_TYPE_P (result_type))
+    {
+      f->m_output_arg = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_ARG,
+					BRIG_LINKAGE_NONE);
+      f->m_output_arg->m_type = hsa_type_for_tree_type
+	(result_type, &f->m_output_arg->m_dim);
+      f->m_output_arg->m_name = "res";
+    }
+}
+
+/* Generate the vector of parameters of the HSA representation of the current
+   function.  This also includes the output parameter representing the
+   result.  */
+
+static void
+gen_function_def_parameters ()
+{
+  tree parm;
+
+  hsa_bb *prologue = hsa_bb_for_bb (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+
+  for (parm = DECL_ARGUMENTS (cfun->decl); parm;
+       parm = DECL_CHAIN (parm))
+    {
+      struct hsa_symbol **slot;
+
+      hsa_symbol *arg = new hsa_symbol
+	(BRIG_TYPE_NONE,
+	 hsa_cfun->m_kern_p ? BRIG_SEGMENT_KERNARG : BRIG_SEGMENT_ARG,
+	 BRIG_LINKAGE_FUNCTION);
+      arg->fillup_for_decl (parm);
+
+      hsa_cfun->m_input_args.safe_push (arg);
+
+      if (hsa_seen_error ())
+	return;
+
+      arg->m_name = hsa_get_declaration_name (parm);
+
+      /* Copy all input arguments and create corresponding private symbols
+	 for them.  */
+      hsa_symbol *private_arg;
+      hsa_op_address *parm_addr = new hsa_op_address (arg);
+
+      if (TREE_ADDRESSABLE (parm)
+	  || (!is_gimple_reg (parm) && !TREE_READONLY (parm)))
+	{
+	  private_arg = hsa_cfun->create_hsa_temporary (arg->m_type);
+	  private_arg->fillup_for_decl (parm);
+
+	  hsa_op_address *private_arg_addr = new hsa_op_address (private_arg);
+	  gen_hsa_memory_copy (prologue, private_arg_addr, parm_addr,
+			       arg->total_byte_size ());
+	}
+      else
+	private_arg = arg;
+
+      slot = hsa_cfun->m_local_symbols->find_slot (private_arg, INSERT);
+      gcc_assert (!*slot);
+      *slot = private_arg;
+
+      if (is_gimple_reg (parm))
+	{
+	  tree ddef = ssa_default_def (cfun, parm);
+	  if (ddef && !has_zero_uses (ddef))
+	    {
+	      BrigType16_t mtype = mem_type_for_type
+		(hsa_type_for_scalar_tree_type (TREE_TYPE (ddef), false));
+	      hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (ddef);
+	      hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_LD, mtype,
+						    dest, parm_addr);
+	      gcc_assert (!parm_addr->m_reg);
+	      prologue->append_insn (mem);
+	    }
+	}
+    }
+
+  if (!VOID_TYPE_P (TREE_TYPE (TREE_TYPE (cfun->decl))))
+    {
+      struct hsa_symbol **slot;
+
+      hsa_cfun->m_output_arg = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_ARG,
+					       BRIG_LINKAGE_FUNCTION);
+      hsa_cfun->m_output_arg->fillup_for_decl (DECL_RESULT (cfun->decl));
+
+      if (hsa_seen_error ())
+	return;
+
+      hsa_cfun->m_output_arg->m_name = "res";
+      slot = hsa_cfun->m_local_symbols->find_slot (hsa_cfun->m_output_arg,
+						   INSERT);
+      gcc_assert (!*slot);
+      *slot = hsa_cfun->m_output_arg;
+    }
+}
+
+/* Generate function representation that corresponds to
+   a function declaration.  */
+
+hsa_function_representation *
+hsa_generate_function_declaration (tree decl)
+{
+  hsa_function_representation *fun = new hsa_function_representation
+    (decl, false, 0);
+
+  fun->m_declaration_p = true;
+  fun->m_name = get_brig_function_name (decl);
+  gen_function_decl_parameters (fun, decl);
+
+  return fun;
+}
+
+
+/* Generate function representation that corresponds to
+   an internal FN.  */
+
+hsa_function_representation *
+hsa_generate_internal_fn_decl (hsa_internal_fn *fn)
+{
+  hsa_function_representation *fun = new hsa_function_representation (fn);
+
+  fun->m_name = fn->name ();
+
+  for (unsigned i = 0; i < fn->get_arity (); i++)
+    {
+      hsa_symbol *arg = new hsa_symbol
+	(fn->get_argument_type (i), BRIG_SEGMENT_ARG, BRIG_LINKAGE_NONE);
+      arg->m_name_number = i;
+      fun->m_input_args.safe_push (arg);
+    }
+
+  fun->m_output_arg = new hsa_symbol
+    (fn->get_argument_type (-1), BRIG_SEGMENT_ARG, BRIG_LINKAGE_NONE);
+  fun->m_output_arg->m_name = "res";
+
+  return fun;
+}
+
+/* Return true if switch statement S can be transformed
+   to a SBR instruction in HSAIL.  */
+
+static bool
+transformable_switch_to_sbr_p (gswitch *s)
+{
+  /* Identify if a switch statement can be transformed to
+     SBR instruction, like:
+
+     sbr_u32 $s1 [@label1, @label2, @label3];
+  */
+
+  tree size = get_switch_size (s);
+  if (!tree_fits_uhwi_p (size))
+    return false;
+
+  if (tree_to_uhwi (size) > HSA_MAXIMUM_SBR_LABELS)
+    return false;
+
+  return true;
+}
+
+/* Structure hold connection between PHI nodes and immediate
+   values hold by there nodes.  */
+
+struct phi_definition
+{
+  phi_definition (unsigned phi_i, unsigned label_i, tree imm):
+    phi_index (phi_i), label_index (label_i), phi_value (imm)
+  {}
+
+  unsigned phi_index;
+  unsigned label_index;
+  tree phi_value;
+};
+
+/* Sum slice of a vector V, starting from index START and ending
+   at the index END - 1.  */
+
+template <typename T>
+static
+T sum_slice (const auto_vec <T> &v, unsigned start, unsigned end)
+{
+  T s = 0;
+
+  for (unsigned i = start; i < end; i++)
+    s += v[i];
+
+  return s;
+}
+
+/* Function transforms GIMPLE SWITCH statements to a series of IF statements.
+   Let's assume following example:
+
+L0:
+   switch (index)
+     case C1:
+L1:    hard_work_1 ();
+       break;
+     case C2..C3:
+L2:    hard_work_2 ();
+       break;
+     default:
+LD:    hard_work_3 ();
+       break;
+
+  The transformation encompasses following steps:
+    1) all immediate values used by edges coming from the switch basic block
+       are saved
+    2) all these edges are removed
+    3) the switch statement (in L0) is replaced by:
+	 if (index == C1)
+	   goto L1;
+	 else
+	   goto L1';
+
+    4) newly created basic block Lx' is used for generation of
+       a next condition
+    5) else branch of the last condition goes to LD
+    6) fix all immediate values in PHI nodes that were propagated though
+       edges that were removed in step 2
+
+  Note: if a case is made by a range C1..C2, then process
+	following transformation:
+
+  switch_cond_op1 = C1 <= index;
+  switch_cond_op2 = index <= C2;
+  switch_cond_and = switch_cond_op1 & switch_cond_op2;
+  if (switch_cond_and != 0)
+    goto Lx;
+  else
+    goto Ly;
+
+*/
+
+static void
+convert_switch_statements ()
+{
+  function *func = DECL_STRUCT_FUNCTION (current_function_decl);
+  basic_block bb;
+
+  bool need_update = false;
+
+  FOR_EACH_BB_FN (bb, func)
+  {
+    gimple_stmt_iterator gsi = gsi_last_bb (bb);
+    if (gsi_end_p (gsi))
+      continue;
+
+    gimple *stmt = gsi_stmt (gsi);
+
+    if (gimple_code (stmt) == GIMPLE_SWITCH)
+      {
+	gswitch *s = as_a <gswitch *> (stmt);
+
+	/* If the switch can utilize SBR insn, skip the statement.  */
+	if (transformable_switch_to_sbr_p (s))
+	  continue;
+
+	need_update = true;
+
+	unsigned labels = gimple_switch_num_labels (s);
+	tree index = gimple_switch_index (s);
+	tree index_type = TREE_TYPE (index);
+	tree default_label = gimple_switch_default_label (s);
+	basic_block default_label_bb = label_to_block_fn
+	  (func, CASE_LABEL (default_label));
+	basic_block cur_bb = bb;
+
+	auto_vec <edge> new_edges;
+	auto_vec <phi_definition *> phi_todo_list;
+	auto_vec <gcov_type> edge_counts;
+	auto_vec <int> edge_probabilities;
+
+	/* Investigate all labels that and PHI nodes in these edges which
+	   should be fixed after we add new collection of edges.  */
+	for (unsigned i = 0; i < labels; i++)
+	  {
+	    tree label = gimple_switch_label (s, i);
+	    basic_block label_bb = label_to_block_fn (func, CASE_LABEL (label));
+	    edge e = find_edge (bb, label_bb);
+	    edge_counts.safe_push (e->count);
+	    edge_probabilities.safe_push (e->probability);
+	    gphi_iterator phi_gsi;
+
+	    /* Save PHI definitions that will be destroyed because of an edge
+	       is going to be removed.  */
+	    unsigned phi_index = 0;
+	    for (phi_gsi = gsi_start_phis (e->dest);
+		 !gsi_end_p (phi_gsi); gsi_next (&phi_gsi))
+	      {
+		gphi *phi = phi_gsi.phi ();
+		for (unsigned j = 0; j < gimple_phi_num_args (phi); j++)
+		  {
+		    if (gimple_phi_arg_edge (phi, j) == e)
+		      {
+			tree imm = gimple_phi_arg_def (phi, j);
+			phi_todo_list.safe_push
+			  (new phi_definition (phi_index, i, imm));
+			break;
+		      }
+		  }
+		phi_index++;
+	      }
+	  }
+
+	/* Remove all edges for the current basic block.  */
+	for (int i = EDGE_COUNT (bb->succs) - 1; i >= 0; i--)
+ 	  {
+	    edge e = EDGE_SUCC (bb, i);
+	    remove_edge (e);
+	  }
+
+	/* Iterate all non-default labels.  */
+	for (unsigned i = 1; i < labels; i++)
+	  {
+	    tree label = gimple_switch_label (s, i);
+	    tree low = CASE_LOW (label);
+	    tree high = CASE_HIGH (label);
+
+	    if (!useless_type_conversion_p (TREE_TYPE (low), index_type))
+	      low = fold_convert (index_type, low);
+
+	    gimple_stmt_iterator cond_gsi = gsi_last_bb (cur_bb);
+	    gimple *c = NULL;
+	    if (high)
+	      {
+		tree tmp1 = make_temp_ssa_name (boolean_type_node, NULL,
+						"switch_cond_op1");
+
+		gimple *assign1 = gimple_build_assign (tmp1, LE_EXPR, low,
+						      index);
+
+		tree tmp2 = make_temp_ssa_name (boolean_type_node, NULL,
+						"switch_cond_op2");
+
+		if (!useless_type_conversion_p (TREE_TYPE (high), index_type))
+		  high = fold_convert (index_type, high);
+		gimple *assign2 = gimple_build_assign (tmp2, LE_EXPR, index,
+						      high);
+
+		tree tmp3 = make_temp_ssa_name (boolean_type_node, NULL,
+						"switch_cond_and");
+		gimple *assign3 = gimple_build_assign (tmp3, BIT_AND_EXPR, tmp1,
+						      tmp2);
+
+		gsi_insert_before (&cond_gsi, assign1, GSI_SAME_STMT);
+		gsi_insert_before (&cond_gsi, assign2, GSI_SAME_STMT);
+		gsi_insert_before (&cond_gsi, assign3, GSI_SAME_STMT);
+
+		c = gimple_build_cond (NE_EXPR, tmp3, constant_boolean_node
+				       (false, boolean_type_node), NULL, NULL);
+	      }
+	    else
+	      c = gimple_build_cond (EQ_EXPR, index, low, NULL, NULL);
+
+	    gimple_set_location (c, gimple_location (stmt));
+
+	    gsi_insert_before (&cond_gsi, c, GSI_SAME_STMT);
+
+	    basic_block label_bb = label_to_block_fn
+	      (func, CASE_LABEL (label));
+	    edge new_edge = make_edge (cur_bb, label_bb, EDGE_TRUE_VALUE);
+	    int prob_sum = sum_slice <int> (edge_probabilities, i, labels) +
+	       edge_probabilities[0];
+
+	    if (prob_sum)
+	      new_edge->probability = RDIV
+		(REG_BR_PROB_BASE * edge_probabilities[i], prob_sum);
+
+	    new_edge->count = edge_counts[i];
+	    new_edges.safe_push (new_edge);
+
+	    if (i < labels - 1)
+	      {
+		/* Prepare another basic block that will contain
+		   next condition.  */
+		basic_block next_bb = create_empty_bb (cur_bb);
+		if (current_loops)
+		  {
+		    add_bb_to_loop (next_bb, cur_bb->loop_father);
+		    loops_state_set (LOOPS_NEED_FIXUP);
+		  }
+
+		edge next_edge = make_edge (cur_bb, next_bb, EDGE_FALSE_VALUE);
+		next_edge->probability = inverse_probability
+		  (new_edge->probability);
+		next_edge->count = edge_counts[0]
+		  + sum_slice <gcov_type> (edge_counts, i, labels);
+		next_bb->frequency = EDGE_FREQUENCY (next_edge);
+		cur_bb = next_bb;
+	      }
+	    else /* Link last IF statement and default label
+		    of the switch.  */
+	      {
+		edge e = make_edge (cur_bb, default_label_bb, EDGE_FALSE_VALUE);
+		e->probability = inverse_probability (new_edge->probability);
+		e->count = edge_counts[0];
+		new_edges.safe_insert (0, e);
+	      }
+	  }
+
+	  /* Restore original PHI immediate value.  */
+	  for (unsigned i = 0; i < phi_todo_list.length (); i++)
+	    {
+	      phi_definition *phi_def = phi_todo_list[i];
+	      edge new_edge = new_edges[phi_def->label_index];
+
+	      gphi_iterator it = gsi_start_phis (new_edge->dest);
+	      for (unsigned i = 0; i < phi_def->phi_index; i++)
+		gsi_next (&it);
+
+	      gphi *phi = it.phi ();
+	      add_phi_arg (phi, phi_def->phi_value, new_edge, UNKNOWN_LOCATION);
+	      delete phi_def;
+	    }
+
+	/* Remove the original GIMPLE switch statement.  */
+	gsi_remove (&gsi, true);
+      }
+  }
+
+  if (dump_file)
+    dump_function_to_file (current_function_decl, dump_file, TDF_DETAILS);
+
+  if (need_update)
+    {
+      free_dominance_info (CDI_DOMINATORS);
+      calculate_dominance_info (CDI_DOMINATORS);
+    }
+}
+
+/* Expand builtins that can't be handled by HSA back-end.  */
+
+static void
+expand_builtins ()
+{
+  function *func = DECL_STRUCT_FUNCTION (current_function_decl);
+  basic_block bb;
+
+  FOR_EACH_BB_FN (bb, func)
+  {
+    for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+	 gsi_next (&gsi))
+      {
+	gimple *stmt = gsi_stmt (gsi);
+
+	if (gimple_code (stmt) != GIMPLE_CALL)
+	  continue;
+
+	gcall *call = as_a <gcall *> (stmt);
+
+	if (!gimple_call_builtin_p (call, BUILT_IN_NORMAL))
+	  continue;
+
+	tree fndecl = gimple_call_fndecl (stmt);
+	enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
+	switch (fn)
+	  {
+	  case BUILT_IN_CEXPF:
+	  case BUILT_IN_CEXPIF:
+	  case BUILT_IN_CEXPI:
+	    {
+	      /* Similar to builtins.c (expand_builtin_cexpi), the builtin
+		 can be transformed to: cexp(I * z) = ccos(z) + I * csin(z).  */
+	      tree lhs = gimple_call_lhs (stmt);
+	      tree rhs = gimple_call_arg (stmt, 0);
+	      tree rhs_type = TREE_TYPE (rhs);
+	      bool float_type_p = rhs_type == float_type_node;
+	      tree real_part = make_temp_ssa_name (rhs_type, NULL,
+						   "cexp_real_part");
+	      tree imag_part = make_temp_ssa_name (rhs_type, NULL,
+						   "cexp_imag_part");
+
+	      tree cos_fndecl = mathfn_built_in
+		(rhs_type, fn == float_type_p ? BUILT_IN_COSF : BUILT_IN_COS);
+	      gcall *cos = gimple_build_call (cos_fndecl, 1, rhs);
+	      gimple_call_set_lhs (cos, real_part);
+	      gsi_insert_before (&gsi, cos, GSI_SAME_STMT);
+
+	      tree sin_fndecl = mathfn_built_in
+		(rhs_type, fn == float_type_p ? BUILT_IN_SINF : BUILT_IN_SIN);
+	      gcall *sin = gimple_build_call (sin_fndecl, 1, rhs);
+	      gimple_call_set_lhs (sin, imag_part);
+	      gsi_insert_before (&gsi, sin, GSI_SAME_STMT);
+
+
+	      gassign *assign = gimple_build_assign (lhs, COMPLEX_EXPR,
+						     real_part, imag_part);
+	      gsi_insert_before (&gsi, assign, GSI_SAME_STMT);
+	      gsi_remove (&gsi, true);
+
+	      break;
+	    }
+	  default:
+	    break;
+	  }
+      }
+  }
+}
+
+/* Emit HSA module variables that are global for the entire module.  */
+
+static void
+emit_hsa_module_variables (void)
+{
+  hsa_num_threads = new hsa_symbol (BRIG_TYPE_U32, BRIG_SEGMENT_PRIVATE,
+				    BRIG_LINKAGE_MODULE, true);
+
+  hsa_num_threads->m_name = "hsa_num_threads";
+
+  hsa_brig_emit_omp_symbols ();
+}
+
+/* Generate HSAIL representation of the current function and write into a
+   special section of the output file.  If KERNEL is set, the function will be
+   considered an HSA kernel callable from the host, otherwise it will be
+   compiled as an HSA function callable from other HSA code.  */
+
+static void
+generate_hsa (bool kernel)
+{
+  hsa_init_data_for_cfun ();
+
+  if (hsa_num_threads == NULL)
+    emit_hsa_module_variables ();
+
+  /* Initialize hsa_cfun.  */
+  hsa_cfun = new hsa_function_representation (cfun->decl, kernel,
+					      SSANAMES (cfun)->length ());
+  hsa_cfun->init_extra_bbs ();
+
+  if (flag_tm)
+    {
+      HSA_SORRY_AT (UNKNOWN_LOCATION,
+		    "support for HSA does not implement transactional memory");
+      goto fail;
+    }
+
+  verify_function_arguments (cfun->decl);
+  if (hsa_seen_error ())
+    goto fail;
+
+  hsa_cfun->m_name = get_brig_function_name (cfun->decl);
+
+  gen_function_def_parameters ();
+  if (hsa_seen_error ())
+    goto fail;
+
+  init_prologue ();
+
+  gen_body_from_gimple ();
+  if (hsa_seen_error ())
+    goto fail;
+
+  if (hsa_cfun->m_kernel_dispatch_count)
+    init_hsa_num_threads ();
+
+  if (hsa_cfun->m_kern_p)
+    {
+      hsa_function_summary *s = hsa_summaries->get
+	(cgraph_node::get (hsa_cfun->m_decl));
+      hsa_add_kern_decl_mapping (current_function_decl, hsa_cfun->m_name,
+				 hsa_cfun->m_maximum_omp_data_size,
+				 s->m_gridified_kernel_p);
+    }
+
+#ifdef ENABLE_CHECKING
+  for (unsigned i = 0; i < hsa_cfun->m_ssa_map.length (); i++)
+    if (hsa_cfun->m_ssa_map[i])
+      hsa_cfun->m_ssa_map[i]->verify_ssa ();
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      hsa_bb *hbb = hsa_bb_for_bb (bb);
+
+      for (hsa_insn_basic *insn = hbb->m_first_insn; insn; insn = insn->m_next)
+	insn->verify ();
+    }
+
+#endif
+
+  hsa_regalloc ();
+  hsa_brig_emit_function ();
+
+ fail:
+  hsa_deinit_data_for_cfun ();
+}
+
+namespace {
+
+const pass_data pass_data_gen_hsail =
+{
+  GIMPLE_PASS,
+  "hsagen",	 			/* name */
+  OPTGROUP_NONE,                        /* optinfo_flags */
+  TV_NONE,				/* tv_id */
+  PROP_cfg | PROP_ssa,                  /* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  0					/* todo_flags_finish */
+};
+
+class pass_gen_hsail : public gimple_opt_pass
+{
+public:
+  pass_gen_hsail (gcc::context *ctxt)
+    : gimple_opt_pass(pass_data_gen_hsail, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  bool gate (function *);
+  unsigned int execute (function *);
+
+}; // class pass_gen_hsail
+
+/* Determine whether or not to run generation of HSAIL.  */
+
+bool
+pass_gen_hsail::gate (function *f)
+{
+  return hsa_gen_requested_p ()
+    && hsa_gpu_implementation_p (f->decl);
+}
+
+unsigned int
+pass_gen_hsail::execute (function *)
+{
+  hsa_function_summary *s = hsa_summaries->get
+    (cgraph_node::get_create (current_function_decl));
+
+  convert_switch_statements ();
+  expand_builtins ();
+  generate_hsa (s->m_kind == HSA_KERNEL);
+  TREE_ASM_WRITTEN (current_function_decl) = 1;
+  return TODO_discard_function;
+}
+
+} // anon namespace
+
+/* Create the instance of hsa gen pass.  */
+
+gimple_opt_pass *
+make_pass_gen_hsail (gcc::context *ctxt)
+{
+  return new pass_gen_hsail (ctxt);
+}
diff --git a/gcc/hsa.c b/gcc/hsa.c
new file mode 100644
index 0000000..0dd6184
--- /dev/null
+++ b/gcc/hsa.c
@@ -0,0 +1,942 @@
+/* Implementation of commonly needed HSAIL related functions and methods.
+   Copyright (C) 2013-2016 Free Software Foundation, Inc.
+   Contributed by Martin Jambor <mjambor@suse.cz> and
+   Martin Liska <mliska@suse.cz>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "hash-set.h"
+#include "hash-map.h"
+#include "vec.h"
+#include "tree.h"
+#include "dumpfile.h"
+#include "gimple-pretty-print.h"
+#include "diagnostic-core.h"
+#include "alloc-pool.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "stringpool.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+#include "internal-fn.h"
+#include "ctype.h"
+
+/* Structure containing intermediate HSA representation of the generated
+   function. */
+class hsa_function_representation *hsa_cfun;
+
+/* Element of the mapping vector between a host decl and an HSA kernel.  */
+
+struct GTY(()) hsa_decl_kernel_map_element
+{
+  /* The decl of the host function.  */
+  tree decl;
+  /* Name of the HSA kernel in BRIG.  */
+  char * GTY((skip)) name;
+  /* Size of OMP data, if the kernel contains a kernel dispatch.  */
+  unsigned omp_data_size;
+  /* True if the function is gridified kernel.  */
+  bool gridified_kernel_p;
+};
+
+/* Mapping between decls and corresponding HSA kernels in this compilation
+   unit.  */
+
+static GTY (()) vec<hsa_decl_kernel_map_element, va_gc> *hsa_decl_kernel_mapping;
+
+/* Mapping between decls and corresponding HSA kernels
+   called by the function.  */
+hash_map <tree, vec <const char *> *> *hsa_decl_kernel_dependencies;
+
+/* Hash function to lookup a symbol for a decl.  */
+hash_table <hsa_noop_symbol_hasher> *hsa_global_variable_symbols;
+
+/* HSA summaries.  */
+hsa_summary_t *hsa_summaries = NULL;
+
+/* HSA number of threads.  */
+hsa_symbol *hsa_num_threads = NULL;
+
+/* HSA function that cannot be expanded to HSAIL.  */
+hash_set <tree> *hsa_failed_functions = NULL;
+
+/* True if compilation unit-wide data are already allocated and initialized.  */
+static bool compilation_unit_data_initialized;
+
+/* Return true if FNDECL represents an HSA-callable function.  */
+
+bool
+hsa_callable_function_p (tree fndecl)
+{
+  return (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (fndecl))
+	  && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (fndecl)));
+}
+
+/* Allocate HSA structures that are are used when dealing with different
+   functions.  */
+
+void
+hsa_init_compilation_unit_data (void)
+{
+  if (compilation_unit_data_initialized)
+    return;
+
+  compilation_unit_data_initialized = true;
+
+  hsa_global_variable_symbols = new hash_table <hsa_noop_symbol_hasher> (8);
+  hsa_failed_functions = new hash_set <tree> ();
+  hsa_emitted_internal_decls = new hash_table <hsa_internal_fn_hasher> (2);
+}
+
+/* Free data structures that are used when dealing with different
+   functions.  */
+
+void
+hsa_deinit_compilation_unit_data (void)
+{
+  gcc_assert (compilation_unit_data_initialized);
+
+  delete hsa_failed_functions;
+  delete hsa_emitted_internal_decls;
+
+  for (hash_table <hsa_noop_symbol_hasher>::iterator it =
+       hsa_global_variable_symbols->begin ();
+       it != hsa_global_variable_symbols->end (); ++it)
+    {
+      hsa_symbol *sym = *it;
+      delete sym;
+    }
+
+  delete hsa_global_variable_symbols;
+
+  if (hsa_num_threads)
+    {
+      delete hsa_num_threads;
+      hsa_num_threads = NULL;
+    }
+
+  compilation_unit_data_initialized = false;
+}
+
+/* Return true if we are generating large HSA machine model.  */
+
+bool
+hsa_machine_large_p (void)
+{
+  /* FIXME: I suppose this is technically wrong but should work for me now.  */
+  return (GET_MODE_BITSIZE (Pmode) == 64);
+}
+
+/* Return the HSA profile we are using.  */
+
+bool
+hsa_full_profile_p (void)
+{
+  return true;
+}
+
+/* Return true if a register in operand number OPNUM of instruction
+   is an output.  False if it is an input.  */
+
+bool
+hsa_insn_basic::op_output_p (unsigned opnum)
+{
+  switch (m_opcode)
+    {
+    case HSA_OPCODE_PHI:
+    case BRIG_OPCODE_CBR:
+    case BRIG_OPCODE_SBR:
+    case BRIG_OPCODE_ST:
+    case BRIG_OPCODE_SIGNALNORET:
+      /* FIXME: There are probably missing cases here, double check.  */
+      return false;
+    case BRIG_OPCODE_EXPAND:
+      /* Example: expand_v4_b32_b128 (dest0, dest1, dest2, dest3), src0.  */
+      return opnum < operand_count () - 1;
+    default:
+     return opnum == 0;
+    }
+}
+
+/* Return true if OPCODE is an floating-point bit instruction opcode.  */
+
+bool
+hsa_opcode_floating_bit_insn_p (BrigOpcode16_t opcode)
+{
+  switch (opcode)
+    {
+    case BRIG_OPCODE_NEG:
+    case BRIG_OPCODE_ABS:
+    case BRIG_OPCODE_CLASS:
+    case BRIG_OPCODE_COPYSIGN:
+      return true;
+    default:
+      return false;
+    }
+}
+
+/* Return the number of destination operands for this INSN.  */
+
+unsigned
+hsa_insn_basic::input_count ()
+{
+  switch (m_opcode)
+    {
+      default:
+	return 1;
+
+      case BRIG_OPCODE_NOP:
+	return 0;
+
+      case BRIG_OPCODE_EXPAND:
+	return 2;
+
+      case BRIG_OPCODE_LD:
+	/* ld_v[234] not yet handled.  */
+	return 1;
+
+      case BRIG_OPCODE_ST:
+	return 0;
+
+      case BRIG_OPCODE_ATOMICNORET:
+	return 0;
+
+      case BRIG_OPCODE_SIGNAL:
+	return 1;
+
+      case BRIG_OPCODE_SIGNALNORET:
+	return 0;
+
+      case BRIG_OPCODE_MEMFENCE:
+	return 0;
+
+      case BRIG_OPCODE_RDIMAGE:
+      case BRIG_OPCODE_LDIMAGE:
+      case BRIG_OPCODE_STIMAGE:
+      case BRIG_OPCODE_QUERYIMAGE:
+      case BRIG_OPCODE_QUERYSAMPLER:
+	sorry ("HSA image ops not handled");
+	return 0;
+
+      case BRIG_OPCODE_CBR:
+      case BRIG_OPCODE_BR:
+	return 0;
+
+      case BRIG_OPCODE_SBR:
+	return 0; /* ??? */
+
+      case BRIG_OPCODE_WAVEBARRIER:
+	return 0; /* ??? */
+
+      case BRIG_OPCODE_BARRIER:
+      case BRIG_OPCODE_ARRIVEFBAR:
+      case BRIG_OPCODE_INITFBAR:
+      case BRIG_OPCODE_JOINFBAR:
+      case BRIG_OPCODE_LEAVEFBAR:
+      case BRIG_OPCODE_RELEASEFBAR:
+      case BRIG_OPCODE_WAITFBAR:
+	return 0;
+
+      case BRIG_OPCODE_LDF:
+	return 1;
+
+      case BRIG_OPCODE_ACTIVELANECOUNT:
+      case BRIG_OPCODE_ACTIVELANEID:
+      case BRIG_OPCODE_ACTIVELANEMASK:
+      case BRIG_OPCODE_ACTIVELANEPERMUTE:
+	return 1; /* ??? */
+
+      case BRIG_OPCODE_CALL:
+      case BRIG_OPCODE_SCALL:
+      case BRIG_OPCODE_ICALL:
+	return 0;
+
+      case BRIG_OPCODE_RET:
+	return 0;
+
+      case BRIG_OPCODE_ALLOCA:
+	return 1;
+
+      case BRIG_OPCODE_CLEARDETECTEXCEPT:
+	return 0;
+
+      case BRIG_OPCODE_SETDETECTEXCEPT:
+	return 0;
+
+      case BRIG_OPCODE_PACKETCOMPLETIONSIG:
+      case BRIG_OPCODE_PACKETID:
+      case BRIG_OPCODE_CASQUEUEWRITEINDEX:
+      case BRIG_OPCODE_LDQUEUEREADINDEX:
+      case BRIG_OPCODE_LDQUEUEWRITEINDEX:
+      case BRIG_OPCODE_STQUEUEREADINDEX:
+      case BRIG_OPCODE_STQUEUEWRITEINDEX:
+	return 1; /* ??? */
+
+      case BRIG_OPCODE_ADDQUEUEWRITEINDEX:
+	return 1;
+
+      case BRIG_OPCODE_DEBUGTRAP:
+	return 0;
+
+      case BRIG_OPCODE_GROUPBASEPTR:
+      case BRIG_OPCODE_KERNARGBASEPTR:
+	return 1; /* ??? */
+
+      case HSA_OPCODE_ARG_BLOCK:
+	return 0;
+
+      case BRIG_KIND_DIRECTIVE_COMMENT:
+	return 0;
+    }
+}
+
+/* Return the number of source operands for this INSN.  */
+
+unsigned
+hsa_insn_basic::num_used_ops ()
+{
+  gcc_checking_assert (input_count () <= operand_count ());
+
+  return operand_count () - input_count ();
+}
+
+/* Set alignment to VALUE.  */
+
+void
+hsa_insn_mem::set_align (BrigAlignment8_t value)
+{
+  /* TODO: Perhaps remove this dump later on:  */
+  if (dump_file && (dump_flags & TDF_DETAILS) && value < m_align)
+    {
+      fprintf (dump_file, "Decreasing alignment to %u in instruction ", value);
+      dump_hsa_insn (dump_file, this);
+    }
+  m_align = value;
+}
+
+/* Return size of HSA type T in bits.  */
+
+unsigned
+hsa_type_bit_size (BrigType16_t t)
+{
+  switch (t)
+    {
+    case BRIG_TYPE_B1:
+      return 1;
+
+    case BRIG_TYPE_U8:
+    case BRIG_TYPE_S8:
+    case BRIG_TYPE_B8:
+      return 8;
+
+    case BRIG_TYPE_U16:
+    case BRIG_TYPE_S16:
+    case BRIG_TYPE_B16:
+    case BRIG_TYPE_F16:
+      return 16;
+
+    case BRIG_TYPE_U32:
+    case BRIG_TYPE_S32:
+    case BRIG_TYPE_B32:
+    case BRIG_TYPE_F32:
+    case BRIG_TYPE_U8X4:
+    case BRIG_TYPE_U16X2:
+    case BRIG_TYPE_S8X4:
+    case BRIG_TYPE_S16X2:
+    case BRIG_TYPE_F16X2:
+      return 32;
+
+    case BRIG_TYPE_U64:
+    case BRIG_TYPE_S64:
+    case BRIG_TYPE_F64:
+    case BRIG_TYPE_B64:
+    case BRIG_TYPE_U8X8:
+    case BRIG_TYPE_U16X4:
+    case BRIG_TYPE_U32X2:
+    case BRIG_TYPE_S8X8:
+    case BRIG_TYPE_S16X4:
+    case BRIG_TYPE_S32X2:
+    case BRIG_TYPE_F16X4:
+    case BRIG_TYPE_F32X2:
+
+      return 64;
+
+    case BRIG_TYPE_B128:
+    case BRIG_TYPE_U8X16:
+    case BRIG_TYPE_U16X8:
+    case BRIG_TYPE_U32X4:
+    case BRIG_TYPE_U64X2:
+    case BRIG_TYPE_S8X16:
+    case BRIG_TYPE_S16X8:
+    case BRIG_TYPE_S32X4:
+    case BRIG_TYPE_S64X2:
+    case BRIG_TYPE_F16X8:
+    case BRIG_TYPE_F32X4:
+    case BRIG_TYPE_F64X2:
+      return 128;
+
+    default:
+      gcc_assert (hsa_seen_error ());
+      return t;
+    }
+}
+
+/* Return BRIG bit-type with BITSIZE length.  */
+
+BrigType16_t
+hsa_bittype_for_bitsize (unsigned bitsize)
+{
+  switch (bitsize)
+    {
+    case 1:
+      return BRIG_TYPE_B1;
+    case 8:
+      return BRIG_TYPE_B8;
+    case 16:
+      return BRIG_TYPE_B16;
+    case 32:
+      return BRIG_TYPE_B32;
+    case 64:
+      return BRIG_TYPE_B64;
+    case 128:
+      return BRIG_TYPE_B128;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return BRIG unsigned int type with BITSIZE length.  */
+
+BrigType16_t
+hsa_uint_for_bitsize (unsigned bitsize)
+{
+  switch (bitsize)
+    {
+    case 8:
+      return BRIG_TYPE_U8;
+    case 16:
+      return BRIG_TYPE_U16;
+    case 32:
+      return BRIG_TYPE_U32;
+    case 64:
+      return BRIG_TYPE_U64;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return BRIG float type with BITSIZE length.  */
+
+BrigType16_t
+hsa_float_for_bitsize (unsigned bitsize)
+{
+  switch (bitsize)
+    {
+    case 16:
+      return BRIG_TYPE_F16;
+    case 32:
+      return BRIG_TYPE_F32;
+    case 64:
+      return BRIG_TYPE_F64;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return HSA bit-type with the same size as the type T.  */
+
+BrigType16_t
+hsa_bittype_for_type (BrigType16_t t)
+{
+  return hsa_bittype_for_bitsize (hsa_type_bit_size (t));
+}
+
+/* Return true if and only if TYPE is a floating point number type.  */
+
+bool
+hsa_type_float_p (BrigType16_t type)
+{
+  switch (type & BRIG_TYPE_BASE_MASK)
+    {
+    case BRIG_TYPE_F16:
+    case BRIG_TYPE_F32:
+    case BRIG_TYPE_F64:
+      return true;
+    default:
+      return false;
+    }
+}
+
+/* Return true if and only if TYPE is an integer number type.  */
+
+bool
+hsa_type_integer_p (BrigType16_t type)
+{
+  switch (type & BRIG_TYPE_BASE_MASK)
+    {
+    case BRIG_TYPE_U8:
+    case BRIG_TYPE_U16:
+    case BRIG_TYPE_U32:
+    case BRIG_TYPE_U64:
+    case BRIG_TYPE_S8:
+    case BRIG_TYPE_S16:
+    case BRIG_TYPE_S32:
+    case BRIG_TYPE_S64:
+      return true;
+    default:
+      return false;
+    }
+}
+
+/* Return true if and only if TYPE is an bit-type.  */
+
+bool
+hsa_btype_p (BrigType16_t type)
+{
+  switch (type & BRIG_TYPE_BASE_MASK)
+    {
+    case BRIG_TYPE_B8:
+    case BRIG_TYPE_B16:
+    case BRIG_TYPE_B32:
+    case BRIG_TYPE_B64:
+    case BRIG_TYPE_B128:
+      return true;
+    default:
+      return false;
+    }
+}
+
+
+/* Return HSA alignment encoding alignment to N bits.  */
+
+BrigAlignment8_t
+hsa_alignment_encoding (unsigned n)
+{
+  gcc_assert (n >= 8 && !(n & (n - 1)));
+  if (n >= 256)
+    return BRIG_ALIGNMENT_32;
+
+  switch (n)
+    {
+    case 8:
+      return BRIG_ALIGNMENT_1;
+    case 16:
+      return BRIG_ALIGNMENT_2;
+    case 32:
+      return BRIG_ALIGNMENT_4;
+    case 64:
+      return BRIG_ALIGNMENT_8;
+    case 128:
+      return BRIG_ALIGNMENT_16;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return natural alignment of HSA TYPE.  */
+
+BrigAlignment8_t
+hsa_natural_alignment (BrigType16_t type)
+{
+  return hsa_alignment_encoding (hsa_type_bit_size (type & ~BRIG_TYPE_ARRAY));
+}
+
+/* Call the correct destructor of a HSA instruction.  */
+
+void
+hsa_destroy_insn (hsa_insn_basic *insn)
+{
+  if (hsa_insn_phi *phi = dyn_cast <hsa_insn_phi *> (insn))
+    phi->~hsa_insn_phi ();
+  else if (hsa_insn_br *br = dyn_cast <hsa_insn_br *> (insn))
+    br->~hsa_insn_br ();
+  else if (hsa_insn_cmp *cmp = dyn_cast <hsa_insn_cmp *> (insn))
+    cmp->~hsa_insn_cmp ();
+  else if (hsa_insn_mem *mem = dyn_cast <hsa_insn_mem *> (insn))
+    mem->~hsa_insn_mem ();
+  else if (hsa_insn_atomic *atomic = dyn_cast <hsa_insn_atomic *> (insn))
+    atomic->~hsa_insn_atomic ();
+  else if (hsa_insn_seg *seg = dyn_cast <hsa_insn_seg *> (insn))
+    seg->~hsa_insn_seg ();
+  else if (hsa_insn_call *call = dyn_cast <hsa_insn_call *> (insn))
+    call->~hsa_insn_call ();
+  else if (hsa_insn_arg_block *block = dyn_cast <hsa_insn_arg_block *> (insn))
+    block->~hsa_insn_arg_block ();
+  else if (hsa_insn_sbr *sbr = dyn_cast <hsa_insn_sbr *> (insn))
+    sbr->~hsa_insn_sbr ();
+  else if (hsa_insn_comment *comment = dyn_cast <hsa_insn_comment *> (insn))
+    comment->~hsa_insn_comment ();
+  else
+    insn->~hsa_insn_basic ();
+}
+
+/* Call the correct destructor of a HSA operand.  */
+
+void
+hsa_destroy_operand (hsa_op_base *op)
+{
+  if (hsa_op_code_list *list = dyn_cast <hsa_op_code_list *> (op))
+    list->~hsa_op_code_list ();
+  else if (hsa_op_operand_list *list = dyn_cast <hsa_op_operand_list *> (op))
+    list->~hsa_op_operand_list ();
+  else if (hsa_op_reg *reg = dyn_cast <hsa_op_reg *> (op))
+    reg->~hsa_op_reg ();
+  else if (hsa_op_immed *immed = dyn_cast <hsa_op_immed *> (op))
+    immed->~hsa_op_immed ();
+  else
+    op->~hsa_op_base ();
+}
+
+/* Create a mapping between the original function DECL and kernel name NAME.  */
+
+void
+hsa_add_kern_decl_mapping (tree decl, char *name, unsigned omp_data_size,
+			   bool gridified_kernel_p)
+{
+  hsa_decl_kernel_map_element dkm;
+  dkm.decl = decl;
+  dkm.name = name;
+  dkm.omp_data_size = omp_data_size;
+  dkm.gridified_kernel_p = gridified_kernel_p;
+  vec_safe_push (hsa_decl_kernel_mapping, dkm);
+}
+
+/* Return the number of kernel decl name mappings.  */
+
+unsigned
+hsa_get_number_decl_kernel_mappings (void)
+{
+  return vec_safe_length (hsa_decl_kernel_mapping);
+}
+
+/* Return the decl in the Ith kernel decl name mapping.  */
+
+tree
+hsa_get_decl_kernel_mapping_decl (unsigned i)
+{
+  return (*hsa_decl_kernel_mapping)[i].decl;
+}
+
+/* Return the name in the Ith kernel decl name mapping.  */
+
+char *
+hsa_get_decl_kernel_mapping_name (unsigned i)
+{
+  return (*hsa_decl_kernel_mapping)[i].name;
+}
+
+/* Return maximum OMP size for kernel decl name mapping.  */
+
+unsigned
+hsa_get_decl_kernel_mapping_omp_size (unsigned i)
+{
+  return (*hsa_decl_kernel_mapping)[i].omp_data_size;
+}
+
+/* Return if the function is gridified kernel in decl name mapping.  */
+
+bool
+hsa_get_decl_kernel_mapping_gridified (unsigned i)
+{
+  return (*hsa_decl_kernel_mapping)[i].gridified_kernel_p;
+}
+
+/* Free the mapping between original decls and kernel names.  */
+
+void
+hsa_free_decl_kernel_mapping (void)
+{
+  if (hsa_decl_kernel_mapping == NULL)
+    return;
+
+  for (unsigned i = 0; i < hsa_decl_kernel_mapping->length (); ++i)
+    free ((*hsa_decl_kernel_mapping)[i].name);
+  ggc_free (hsa_decl_kernel_mapping);
+}
+
+/* Add new kernel dependency.  */
+
+void
+hsa_add_kernel_dependency (tree caller, const char *called_function)
+{
+  if (hsa_decl_kernel_dependencies == NULL)
+    hsa_decl_kernel_dependencies = new hash_map<tree, vec<const char *> *> ();
+
+  vec <const char *> *s = NULL;
+  vec <const char *> **slot = hsa_decl_kernel_dependencies->get (caller);
+  if (slot == NULL)
+    {
+      s = new vec <const char *> ();
+      hsa_decl_kernel_dependencies->put (caller, s);
+    }
+  else
+    s = *slot;
+
+  s->safe_push (called_function);
+}
+
+/* Modify the name P in-place so that it is a valid HSA identifier.  */
+
+void
+hsa_sanitize_name (char *p)
+{
+  for (; *p; p++)
+    if (*p == '.' || *p == '-')
+      *p = '_';
+}
+
+/* Clone the name P, set trailing ampersand and sanitize the name.  */
+
+char *
+hsa_brig_function_name (const char *p)
+{
+  unsigned len = strlen (p);
+  char *buf = XNEWVEC (char, len + 2);
+
+  buf[0] = '&';
+  buf[len + 1] = '\0';
+  memcpy (buf + 1, p, len);
+
+  hsa_sanitize_name (buf);
+  return buf;
+}
+
+/* Return declaration name if exists.  */
+
+const char *
+hsa_get_declaration_name (tree decl)
+{
+  if (!DECL_NAME (decl))
+    {
+      char *b = XNEWVEC (char, 64);
+      sprintf (b, "__hsa_anonymous_%i", DECL_UID (decl));
+      const char *ggc_str = ggc_alloc_string (b, strlen (b) + 1);
+      free (b);
+      return ggc_str;
+    }
+
+  tree name_tree;
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      || (TREE_CODE (decl) == VAR_DECL && is_global_var (decl)))
+    name_tree = DECL_ASSEMBLER_NAME (decl);
+  else
+    name_tree = DECL_NAME (decl);
+
+  const char *name = IDENTIFIER_POINTER (name_tree);
+  /* User-defined assembly names have prepended asterisk symbol.  */
+  if (name[0] == '*')
+    name++;
+
+  return name;
+}
+
+void
+hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
+			       hsa_function_kind kind, bool gridified_kernel_p)
+{
+  hsa_function_summary *gpu_summary = get (gpu);
+  hsa_function_summary *host_summary = get (host);
+
+  gpu_summary->m_kind = kind;
+  host_summary->m_kind = kind;
+
+  gpu_summary->m_gpu_implementation_p = true;
+  host_summary->m_gpu_implementation_p = false;
+
+  gpu_summary->m_gridified_kernel_p = gridified_kernel_p;
+  host_summary->m_gridified_kernel_p = gridified_kernel_p;
+
+  gpu_summary->m_binded_function = host;
+  host_summary->m_binded_function = gpu;
+
+  tree gdecl = gpu->decl;
+  DECL_ATTRIBUTES (gdecl)
+    = tree_cons (get_identifier ("flatten"), NULL_TREE,
+		 DECL_ATTRIBUTES (gdecl));
+
+  tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
+  if (fn_opts == NULL_TREE)
+    fn_opts = optimization_default_node;
+  fn_opts = copy_node (fn_opts);
+  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
+  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
+  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
+}
+
+/* Add a HOST function to HSA summaries.  */
+
+void
+hsa_register_kernel (cgraph_node *host)
+{
+  if (hsa_summaries == NULL)
+    hsa_summaries = new hsa_summary_t (symtab);
+  hsa_function_summary *s = hsa_summaries->get (host);
+  s->m_kind = HSA_KERNEL;
+}
+
+/* Add a pair of functions to HSA summaries.  GPU is an HSA implementation of
+   a HOST function.  */
+
+void
+hsa_register_kernel (cgraph_node *gpu, cgraph_node *host)
+{
+  if (hsa_summaries == NULL)
+    hsa_summaries = new hsa_summary_t (symtab);
+  hsa_summaries->link_functions (gpu, host, HSA_KERNEL, true);
+}
+
+/* Return true if expansion of the current HSA function has already failed.  */
+
+bool
+hsa_seen_error (void)
+{
+  return hsa_cfun->m_seen_error;
+}
+
+/* Mark current HSA function as failed.  */
+
+void
+hsa_fail_cfun (void)
+{
+  hsa_failed_functions->add (hsa_cfun->m_decl);
+  hsa_cfun->m_seen_error = true;
+}
+
+char *
+hsa_internal_fn::name ()
+{
+  char *name = xstrdup (internal_fn_name (m_fn));
+  for (char *ptr = name; *ptr; ptr++)
+    *ptr = TOLOWER (*ptr);
+
+  const char *suffix = NULL;
+  if (m_type_bit_size == 32)
+    suffix = "f";
+
+  if (suffix)
+    {
+      char *name2 = concat (name, suffix, NULL);
+      free (name);
+      name = name2;
+    }
+
+  hsa_sanitize_name (name);
+  return name;
+}
+
+unsigned
+hsa_internal_fn::get_arity ()
+{
+  switch (m_fn)
+    {
+    case IFN_ACOS:
+    case IFN_ASIN:
+    case IFN_ATAN:
+    case IFN_COS:
+    case IFN_EXP:
+    case IFN_EXP10:
+    case IFN_EXP2:
+    case IFN_EXPM1:
+    case IFN_LOG:
+    case IFN_LOG10:
+    case IFN_LOG1P:
+    case IFN_LOG2:
+    case IFN_LOGB:
+    case IFN_SIGNIFICAND:
+    case IFN_SIN:
+    case IFN_SQRT:
+    case IFN_TAN:
+    case IFN_CEIL:
+    case IFN_FLOOR:
+    case IFN_NEARBYINT:
+    case IFN_RINT:
+    case IFN_ROUND:
+    case IFN_TRUNC:
+      return 1;
+    case IFN_ATAN2:
+    case IFN_COPYSIGN:
+    case IFN_FMOD:
+    case IFN_POW:
+    case IFN_REMAINDER:
+    case IFN_SCALB:
+    case IFN_LDEXP:
+      return 2;
+      break;
+    case IFN_CLRSB:
+    case IFN_CLZ:
+    case IFN_CTZ:
+    case IFN_FFS:
+    case IFN_PARITY:
+    case IFN_POPCOUNT:
+    default:
+      gcc_unreachable ();
+    }
+}
+
+BrigType16_t
+hsa_internal_fn::get_argument_type (int n)
+{
+  switch (m_fn)
+    {
+    case IFN_ACOS:
+    case IFN_ASIN:
+    case IFN_ATAN:
+    case IFN_COS:
+    case IFN_EXP:
+    case IFN_EXP10:
+    case IFN_EXP2:
+    case IFN_EXPM1:
+    case IFN_LOG:
+    case IFN_LOG10:
+    case IFN_LOG1P:
+    case IFN_LOG2:
+    case IFN_LOGB:
+    case IFN_SIGNIFICAND:
+    case IFN_SIN:
+    case IFN_SQRT:
+    case IFN_TAN:
+    case IFN_CEIL:
+    case IFN_FLOOR:
+    case IFN_NEARBYINT:
+    case IFN_RINT:
+    case IFN_ROUND:
+    case IFN_TRUNC:
+    case IFN_ATAN2:
+    case IFN_COPYSIGN:
+    case IFN_FMOD:
+    case IFN_POW:
+    case IFN_REMAINDER:
+    case IFN_SCALB:
+      return hsa_float_for_bitsize (m_type_bit_size);
+    case IFN_LDEXP:
+      {
+	if (n == -1 || n == 0)
+	  return hsa_float_for_bitsize (m_type_bit_size);
+	else
+	  return BRIG_TYPE_S32;
+      }
+    default:
+      gcc_unreachable ();
+    }
+}
+
+#include "gt-hsa.h"
diff --git a/gcc/hsa.h b/gcc/hsa.h
new file mode 100644
index 0000000..29b29e2
--- /dev/null
+++ b/gcc/hsa.h
@@ -0,0 +1,1402 @@
+/* HSAIL and BRIG related macros and definitions.
+   Copyright (C) 2013-2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef HSA_H
+#define HSA_H
+
+#include "hsa-brig-format.h"
+#include "is-a.h"
+#include "predict.h"
+#include "tree.h"
+#include "vec.h"
+#include "hash-table.h"
+#include "basic-block.h"
+
+
+/* Return true if the compiler should produce HSAIL.  */
+
+static inline bool
+hsa_gen_requested_p (void)
+{
+#ifndef ENABLE_HSA
+  return false;
+#endif
+  return !flag_disable_hsa;
+}
+
+/* Standard warning message if we failed to generate HSAIL for a function */
+
+#define HSA_SORRY_MSG "could not emit HSAIL for the function"
+
+class hsa_op_immed;
+class hsa_op_cst_list;
+class hsa_insn_basic;
+class hsa_op_address;
+class hsa_op_reg;
+class hsa_bb;
+typedef hsa_insn_basic *hsa_insn_basic_p;
+
+/* Class representing an input argument, output argument (result) or a
+   variable, that will eventually end up being a symbol directive.  */
+
+struct hsa_symbol
+{
+  /* Constructor.  */
+  hsa_symbol (BrigType16_t type, BrigSegment8_t segment,
+	      BrigLinkage8_t linkage, bool global_scope_p = false,
+	      BrigAllocation allocation = BRIG_ALLOCATION_AUTOMATIC);
+
+  /* Return total size of the symbol.  */
+  unsigned HOST_WIDE_INT total_byte_size ();
+
+  /* Fill in those values into the symbol according to DECL, which are
+     determined independently from whether it is parameter, result,
+     or a variable, local or global.  */
+  void fillup_for_decl (tree decl);
+
+  /* Pointer to the original tree, which is PARM_DECL for input parameters and
+     RESULT_DECL for the output parameters.  */
+  tree m_decl;
+
+  /* Name of the symbol, that will be written into output and dumps.  Can be
+     NULL, see name_number below.*/
+  const char *m_name;
+
+  /* If name is NULL, artificial name will be formed from the segment name and
+     this number.  */
+  int m_name_number;
+
+  /* Once written, this is the offset of the associated symbol directive.  Zero
+     means the symbol has not been written yet.  */
+  unsigned m_directive_offset;
+
+  /* HSA type of the parameter.  */
+  BrigType16_t m_type;
+
+  /* The HSA segment this will eventually end up in.  */
+  BrigSegment8_t m_segment;
+
+  /* The HSA kind of linkage.  */
+  BrigLinkage8_t m_linkage;
+
+  /* Array dimension, if non-zero.  */
+  unsigned HOST_WIDE_INT m_dim;
+
+  /* Constant value, used for string constants.  */
+  hsa_op_immed *m_cst_value;
+
+  /* Is in global scope.  */
+  bool m_global_scope_p;
+
+  /* True if an error has been seen for the symbol.  */
+  bool m_seen_error;
+
+  /* Symbol allocation.  */
+  BrigAllocation m_allocation;
+
+private:
+  /* Default constructor.  */
+  hsa_symbol ();
+};
+
+/* Abstract class for HSA instruction operands. */
+
+class hsa_op_base
+{
+public:
+  /* Next operand scheduled to be written when writing BRIG operand
+     section.  */
+  hsa_op_base *m_next;
+
+  /* Offset to which the associated operand structure will be written.  Zero if
+     yet not scheduled for writing.  */
+  unsigned m_brig_op_offset;
+
+  /* The type of a particular operand.  */
+  BrigKind16_t m_kind;
+
+protected:
+  hsa_op_base (BrigKind16_t k);
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_op_base () {}
+};
+
+/* Common abstract ancestor for operands which have a type.  */
+
+class hsa_op_with_type : public hsa_op_base
+{
+public:
+  /* The type.  */
+  BrigType16_t m_type;
+
+  /* Convert an operand to a destination type DTYPE and attach insns
+     to HBB if needed.  */
+  hsa_op_with_type *get_in_type (BrigType16_t dtype, hsa_bb *hbb);
+
+protected:
+  hsa_op_with_type (BrigKind16_t k, BrigType16_t t);
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_op_with_type () : hsa_op_base (BRIG_KIND_NONE) {}
+};
+
+/* An immediate HSA operand.  */
+
+class hsa_op_immed : public hsa_op_with_type
+{
+public:
+  hsa_op_immed (tree tree_val, bool min32int = true);
+  hsa_op_immed (HOST_WIDE_INT int_value, BrigType16_t type);
+  void *operator new (size_t);
+  ~hsa_op_immed ();
+  void set_type (BrigKind16_t t);
+
+  /* Value as represented by middle end.  */
+  tree m_tree_value;
+
+  /* Integer value representation.  */
+  HOST_WIDE_INT m_int_value;
+
+  /* Brig data representation.  */
+  char *m_brig_repr;
+
+  /* Brig data representation size in bytes.  */
+  unsigned m_brig_repr_size;
+
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_op_immed ();
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+  void emit_to_buffer (tree value);
+};
+
+/* Report whether or not P is a an immediate operand.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_op_immed *>::test (hsa_op_base *p)
+{
+  return p->m_kind == BRIG_KIND_OPERAND_CONSTANT_BYTES;
+}
+
+/* HSA register operand.  */
+
+class hsa_op_reg : public hsa_op_with_type
+{
+  friend class hsa_insn_basic;
+  friend class hsa_insn_phi;
+public:
+  hsa_op_reg (BrigType16_t t);
+  void *operator new (size_t);
+
+  /* Verify register operand.  */
+  void verify_ssa ();
+
+  /* If NON-NULL, gimple SSA that we come from.  NULL if none.  */
+  tree m_gimple_ssa;
+
+  /* Defining instruction while still in the SSA.  */
+  hsa_insn_basic *m_def_insn;
+
+  /* If the register allocator decides to spill the register, this is the
+     appropriate spill symbol.  */
+  hsa_symbol *m_spill_sym;
+
+  /* Number of this register structure in the order in which they were
+     allocated.  */
+  int m_order;
+  int m_lr_begin, m_lr_end;
+
+  /* Zero if the register is not yet allocated.  After, allocation, this must
+     be 'c', 's', 'd' or 'q'.  */
+  char m_reg_class;
+  /* If allocated, the number of the HW register (within its HSA register
+     class). */
+  char m_hard_num;
+
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_op_reg () : hsa_op_with_type (BRIG_KIND_NONE, BRIG_TYPE_NONE) {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+  /* Set definition where the register is defined.  */
+  void set_definition (hsa_insn_basic *insn);
+  /* Uses of the value while still in SSA.  */
+  auto_vec <hsa_insn_basic_p> m_uses;
+};
+
+typedef class hsa_op_reg *hsa_op_reg_p;
+
+/* Report whether or not P is a register operand.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_op_reg *>::test (hsa_op_base *p)
+{
+  return p->m_kind == BRIG_KIND_OPERAND_REGISTER;
+}
+
+/* Report whether or not P is a register operand.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_op_reg *>::test (hsa_op_with_type *p)
+{
+  return p->m_kind == BRIG_KIND_OPERAND_REGISTER;
+}
+
+/* An address HSA operand.  */
+
+class hsa_op_address : public hsa_op_base
+{
+public:
+  /* set up a new address operand consisting of base symbol SYM, register R and
+     immediate OFFSET.  If the machine model is not large and offset is 64 bit,
+     the upper, 32 bits have to be zero.  */
+  hsa_op_address (hsa_symbol *sym, hsa_op_reg *reg,
+		  HOST_WIDE_INT offset = 0);
+
+  void *operator new (size_t);
+
+  /* Set up a new address operand consisting of base symbol SYM and
+     immediate OFFSET.  If the machine model is not large and offset is 64 bit,
+     the upper, 32 bits have to be zero.  */
+  hsa_op_address (hsa_symbol *sym, HOST_WIDE_INT offset = 0);
+
+  /* Set up a new address operand consisting of register R and
+     immediate OFFSET.  If the machine model is not large and offset is 64 bit,
+     the upper, 32 bits have to be zero.  */
+  hsa_op_address (hsa_op_reg *reg, HOST_WIDE_INT offset = 0);
+
+  /* Symbol base of the address.  Can be NULL if there is none.  */
+  hsa_symbol *m_symbol;
+
+  /* Register offset.  Can be NULL if there is none.  */
+  hsa_op_reg *m_reg;
+
+  /* Immediate byte offset.  */
+  HOST_WIDE_INT m_imm_offset;
+
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_op_address () : hsa_op_base (BRIG_KIND_NONE) {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether or not P is an address operand.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_op_address *>::test (hsa_op_base *p)
+{
+  return p->m_kind == BRIG_KIND_OPERAND_ADDRESS;
+}
+
+/* A reference to code HSA operand. It can be either reference
+   to a start of a BB or a start of a function.  */
+
+class hsa_op_code_ref : public hsa_op_base
+{
+public:
+  hsa_op_code_ref ();
+
+  /* Offset in the code section that this refers to.  */
+  unsigned m_directive_offset;
+};
+
+/* Report whether or not P is a code reference operand.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_op_code_ref *>::test (hsa_op_base *p)
+{
+  return p->m_kind == BRIG_KIND_OPERAND_CODE_REF;
+}
+
+/* Code list HSA operand.  */
+
+class hsa_op_code_list: public hsa_op_base
+{
+public:
+  hsa_op_code_list (unsigned elements);
+  void *operator new (size_t);
+
+  /* Offset to variable-sized array in hsa_data section, where
+     are offsets to entries in the hsa_code section.  */
+  auto_vec<unsigned> m_offsets;
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_op_code_list () : hsa_op_base (BRIG_KIND_NONE) {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether or not P is a code list operand.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_op_code_list *>::test (hsa_op_base *p)
+{
+  return p->m_kind == BRIG_KIND_OPERAND_CODE_LIST;
+}
+
+/* Operand list HSA operand.  */
+
+class hsa_op_operand_list: public hsa_op_base
+{
+public:
+  hsa_op_operand_list (unsigned elements);
+  ~hsa_op_operand_list ();
+  void *operator new (size_t);
+
+  /* Offset to variable-sized array in hsa_data section, where
+     are offsets to entries in the hsa_code section.  */
+  auto_vec<unsigned> m_offsets;
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_op_operand_list () : hsa_op_base (BRIG_KIND_NONE) {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether or not P is a code list operand.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_op_operand_list *>::test (hsa_op_base *p)
+{
+  return p->m_kind == BRIG_KIND_OPERAND_OPERAND_LIST;
+}
+
+/* Opcodes of instructions that are not part of HSA but that we use to
+   represent it nevertheless.  */
+
+#define HSA_OPCODE_PHI (-1)
+#define HSA_OPCODE_ARG_BLOCK (-2)
+
+/* The number of operand pointers we can directly in an instruction.  */
+#define HSA_BRIG_INT_STORAGE_OPERANDS 5
+
+/* Class representing an HSA instruction.  Unlike typical ancestors for
+   specialized classes, this one is also directly used for all instructions
+   that are then represented as BrigInstBasic.  */
+
+class hsa_insn_basic
+{
+public:
+  hsa_insn_basic (unsigned nops, int opc);
+  hsa_insn_basic (unsigned nops, int opc, BrigType16_t t,
+		  hsa_op_base *arg0 = NULL,
+		  hsa_op_base *arg1 = NULL,
+		  hsa_op_base *arg2 = NULL,
+		  hsa_op_base *arg3 = NULL);
+
+  void *operator new (size_t);
+  void set_op (int index, hsa_op_base *op);
+  hsa_op_base *get_op (int index);
+  hsa_op_base **get_op_addr (int index);
+  unsigned int operand_count ();
+  void verify ();
+  unsigned input_count ();
+  unsigned num_used_ops ();
+  void set_output_in_type (hsa_op_reg *dest, unsigned op_index, hsa_bb *hbb);
+  bool op_output_p (unsigned opnum);
+
+  /* The previous and next instruction in the basic block.  */
+  hsa_insn_basic *m_prev, *m_next;
+
+  /* Basic block this instruction belongs to.  */
+  basic_block m_bb;
+
+  /* Operand code distinguishing different types of instructions.  Eventually
+     these should only be BRIG_INST_* values from the BrigOpcode16_t range but
+     initially we use negative values for PHI nodes and such.  */
+  int m_opcode;
+
+  /* Linearized number assigned to the instruction by HSA RA.  */
+  int m_number;
+
+  /* Type of the destination of the operations.  */
+  BrigType16_t m_type;
+
+  /* BRIG offset of the instruction in code section.  */
+  unsigned int m_brig_offset;
+
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_insn_basic () {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+  /* The individual operands.  All instructions but PHI nodes have five or
+     fewer instructions and so will fit the internal storage.  */
+  /* TODO: Vast majority of instructions have three or fewer operands, so we
+     may actually try reducing it.  */
+  auto_vec<hsa_op_base *, HSA_BRIG_INT_STORAGE_OPERANDS> m_operands;
+};
+
+/* Class representing a PHI node of the SSA form of HSA virtual
+   registers.  */
+
+class hsa_insn_phi : public hsa_insn_basic
+{
+public:
+  hsa_insn_phi (unsigned nops, hsa_op_reg *dst);
+
+  void *operator new (size_t);
+
+  /* Destination.  */
+  hsa_op_reg *m_dest;
+
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_insn_phi () : hsa_insn_basic (1, HSA_OPCODE_PHI) {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether or not P is a PHI node.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_phi *>::test (hsa_insn_basic *p)
+{
+  return p->m_opcode == HSA_OPCODE_PHI;
+}
+
+/* HSA instruction for branches.  Currently we explicitely represent only
+   conditional branches.  */
+
+class hsa_insn_br : public hsa_insn_basic
+{
+public:
+  hsa_insn_br (hsa_op_reg *ctrl);
+
+  void *operator new (size_t);
+
+  /* Width as described in HSA documentation.  */
+  BrigWidth8_t m_width;
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_insn_br () : hsa_insn_basic (1, BRIG_OPCODE_CBR) {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether P is a branching instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_br *>::test (hsa_insn_basic *p)
+{
+  return p->m_opcode == BRIG_OPCODE_BR
+    || p->m_opcode == BRIG_OPCODE_CBR;
+}
+
+/* HSA instruction for switch branches.  */
+
+class hsa_insn_sbr : public hsa_insn_basic
+{
+public:
+  hsa_insn_sbr (hsa_op_reg *index, unsigned jump_count);
+
+  /* Default destructor.  */
+  ~hsa_insn_sbr ();
+
+  void *operator new (size_t);
+
+  void replace_all_labels (basic_block old_bb, basic_block new_bb);
+
+  /* Width as described in HSA documentation.  */
+  BrigWidth8_t m_width;
+
+  /* Jump table.  */
+  vec <basic_block> m_jump_table;
+
+  /* Default label basic block.  */
+  basic_block m_default_bb;
+
+  /* Code list for label references.  */
+  hsa_op_code_list *m_label_code_list;
+
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_insn_sbr () : hsa_insn_basic (1, BRIG_OPCODE_SBR) {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether P is a switch branching instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_sbr *>::test (hsa_insn_basic *p)
+{
+  return p->m_opcode == BRIG_OPCODE_SBR;
+}
+
+/* HSA instruction for comparisons.  */
+
+class hsa_insn_cmp : public hsa_insn_basic
+{
+public:
+  hsa_insn_cmp (BrigCompareOperation8_t cmp, BrigType16_t t,
+		hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
+		hsa_op_base *arg2 = NULL);
+
+  void *operator new (size_t);
+
+  /* Source type should be derived from operand types.  */
+
+  /* The comparison operation.  */
+  BrigCompareOperation8_t m_compare;
+
+  /* TODO: Modifiers and packing control are missing but so are everywhere
+     else.  */
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_insn_cmp () : hsa_insn_basic (1, BRIG_OPCODE_CMP) {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether or not P is a comparison instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_cmp *>::test (hsa_insn_basic *p)
+{
+  return p->m_opcode == BRIG_OPCODE_CMP;
+}
+
+/* HSA instruction for memory operations.  */
+
+class hsa_insn_mem : public hsa_insn_basic
+{
+public:
+  hsa_insn_mem (int opc, BrigType16_t t, hsa_op_base *arg0, hsa_op_base *arg1);
+
+  void *operator new (size_t);
+
+  /* Set alignment to VALUE.  */
+
+  void set_align (BrigAlignment8_t value);
+
+  /* The segment is of the memory access is either the segment of the symbol in
+     the address operand or flat address is there is no symbol there.  */
+
+  /* Required alignment of the memory operation. */
+  BrigAlignment8_t m_align;
+
+  /* HSA equiv class, basically an alias set number. */
+  uint8_t m_equiv_class;
+
+  /* TODO:  Add width modifier, perhaps also other things.  */
+protected:
+  hsa_insn_mem (unsigned nops, int opc, BrigType16_t t,
+		hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
+		hsa_op_base *arg2 = NULL, hsa_op_base *arg3 = NULL);
+
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_insn_mem () : hsa_insn_basic (1, BRIG_OPCODE_LD) {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether or not P is a memory instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_mem *>::test (hsa_insn_basic *p)
+{
+  return (p->m_opcode == BRIG_OPCODE_LD
+	  || p->m_opcode == BRIG_OPCODE_ST);
+}
+
+/* HSA instruction for atomic operations.  */
+
+class hsa_insn_atomic : public hsa_insn_mem
+{
+public:
+  hsa_insn_atomic (int nops, int opc, enum BrigAtomicOperation aop,
+		   BrigType16_t t, BrigMemoryOrder memorder,
+		   hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
+		   hsa_op_base *arg2 = NULL, hsa_op_base *arg3 = NULL);
+  void *operator new (size_t);
+
+  /* The operation itself.  */
+  enum BrigAtomicOperation m_atomicop;
+
+  /* Things like acquire/release/aligned.  */
+  enum BrigMemoryOrder m_memoryorder;
+
+  /* Scope of the atomic operation. */
+  enum BrigMemoryScope m_memoryscope;
+
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_insn_atomic () : hsa_insn_mem (1, BRIG_KIND_NONE, BRIG_TYPE_NONE) {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether or not P is an atomic instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_atomic *>::test (hsa_insn_basic *p)
+{
+  return (p->m_opcode == BRIG_OPCODE_ATOMIC
+	  || p->m_opcode == BRIG_OPCODE_ATOMICNORET);
+}
+
+/* HSA instruction for signal operations.  */
+
+class hsa_insn_signal : public hsa_insn_atomic
+{
+public:
+  hsa_insn_signal (int nops, int opc, enum BrigAtomicOperation sop,
+		   BrigType16_t t, hsa_op_base *arg0 = NULL,
+		   hsa_op_base *arg1 = NULL,
+		   hsa_op_base *arg2 = NULL, hsa_op_base *arg3 = NULL);
+
+  void *operator new (size_t);
+
+private:
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether or not P is a signal instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_signal *>::test (hsa_insn_basic *p)
+{
+  return (p->m_opcode == BRIG_OPCODE_SIGNAL
+	  || p->m_opcode == BRIG_OPCODE_SIGNALNORET);
+}
+
+/* HSA instruction to convert between flat addressing and segments.  */
+
+class hsa_insn_seg : public hsa_insn_basic
+{
+public:
+  hsa_insn_seg (int opc, BrigType16_t destt, BrigType16_t srct,
+		BrigSegment8_t seg, hsa_op_base *arg0, hsa_op_base *arg1);
+
+  void *operator new (size_t);
+
+  /* Source type.  Depends on the source addressing/segment.  */
+  BrigType16_t m_src_type;
+  /* The segment we are converting from or to.  */
+  BrigSegment8_t m_segment;
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_insn_seg () : hsa_insn_basic (1, BRIG_OPCODE_STOF) {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether or not P is a segment conversion instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_seg *>::test (hsa_insn_basic *p)
+{
+  return (p->m_opcode == BRIG_OPCODE_STOF
+	  || p->m_opcode == BRIG_OPCODE_FTOS);
+}
+
+/* Class for internal functions for purpose of HSA emission.  */
+
+class hsa_internal_fn
+{
+public:
+  hsa_internal_fn (enum internal_fn fn, unsigned type_bit_size):
+    m_fn (fn), m_type_bit_size (type_bit_size), m_offset (0) {}
+
+  hsa_internal_fn (const hsa_internal_fn *f):
+    m_fn (f->m_fn), m_type_bit_size (f->m_type_bit_size),
+    m_offset (f->m_offset) {}
+
+  /* Return arity of the internal function.  */
+  unsigned get_arity ();
+
+  /* Return BRIG type of N-th argument, if -1 is passed, return value type
+     is received.  */
+  BrigType16_t get_argument_type (int n);
+
+  /* Return function name.  The memory must be released by a caller.  */
+  char *name ();
+
+  /* Internal function.  */
+  enum internal_fn m_fn;
+
+  /* Bit width of return type.  */
+  unsigned m_type_bit_size;
+
+  /* BRIG offset of declaration of the function.  */
+  BrigCodeOffset32_t m_offset;
+};
+
+/* HSA instruction for function call.  */
+
+class hsa_insn_call : public hsa_insn_basic
+{
+public:
+  hsa_insn_call (tree callee);
+  hsa_insn_call (hsa_internal_fn *fn);
+
+  /* Default destructor.  */
+  ~hsa_insn_call ();
+
+  void *operator new (size_t);
+
+  /* Called function */
+  tree m_called_function;
+
+  /* Called internal function.  */
+  hsa_internal_fn *m_called_internal_fn;
+
+  /* Input formal arguments.  */
+  auto_vec <hsa_symbol *> m_input_args;
+
+  /* Input arguments store instructions.  */
+  auto_vec <hsa_insn_mem *> m_input_arg_insns;
+
+  /* Output argument, can be NULL for void functions.  */
+  hsa_symbol *m_output_arg;
+
+  /* Called function code reference.  */
+  hsa_op_code_ref m_func;
+
+  /* Code list for arguments of the function.  */
+  hsa_op_code_list *m_args_code_list;
+
+  /* Code list for result of the function.  */
+  hsa_op_code_list *m_result_code_list;
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_insn_call () : hsa_insn_basic (0, BRIG_OPCODE_CALL) {}
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether or not P is a call instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_call *>::test (hsa_insn_basic *p)
+{
+  return (p->m_opcode == BRIG_OPCODE_CALL);
+}
+
+/* HSA call instruction block encapsulates definition of arguments,
+   result type, corresponding loads and a possible store.
+   Moreover, it contains a single call instruction.
+   Emission of the instruction will produce multiple
+   HSAIL instructions.  */
+
+class hsa_insn_arg_block : public hsa_insn_basic
+{
+public:
+  hsa_insn_arg_block (BrigKind brig_kind, hsa_insn_call * call);
+
+  void *operator new (size_t);
+
+  /* Kind of argument block.  */
+  BrigKind m_kind;
+
+  /* Call instruction.  */
+  hsa_insn_call *m_call_insn;
+private:
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Report whether or not P is a call block instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_arg_block *>::test (hsa_insn_basic *p)
+{
+  return (p->m_opcode == HSA_OPCODE_ARG_BLOCK);
+}
+
+/* HSA comment instruction.  */
+
+class hsa_insn_comment: public hsa_insn_basic
+{
+public:
+  /* Constructor of class representing the comment in HSAIL.  */
+  hsa_insn_comment (const char *s);
+
+  /* Default destructor.  */
+  ~hsa_insn_comment ();
+
+  void *operator new (size_t);
+
+  char *m_comment;
+};
+
+/* Report whether or not P is a call block instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_comment *>::test (hsa_insn_basic *p)
+{
+  return (p->m_opcode == BRIG_KIND_DIRECTIVE_COMMENT);
+}
+
+/* HSA queue instruction.  */
+
+class hsa_insn_queue: public hsa_insn_basic
+{
+public:
+  hsa_insn_queue (int nops, BrigOpcode opcode);
+
+  /* Destructor.  */
+  ~hsa_insn_queue ();
+};
+
+/* Report whether or not P is a queue instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_queue *>::test (hsa_insn_basic *p)
+{
+  return (p->m_opcode == BRIG_OPCODE_ADDQUEUEWRITEINDEX);
+}
+
+/* HSA source type instruction.  */
+
+class hsa_insn_srctype: public hsa_insn_basic
+{
+public:
+  hsa_insn_srctype (int nops, BrigOpcode opcode, BrigType16_t destt,
+		   BrigType16_t srct, hsa_op_base *arg0, hsa_op_base *arg1,
+		   hsa_op_base *arg2);
+
+  /* Pool allocator.  */
+  void *operator new (size_t);
+
+  /* Source type.  */
+  BrigType16_t m_source_type;
+
+  /* Destructor.  */
+  ~hsa_insn_srctype ();
+};
+
+/* Report whether or not P is a source type instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_srctype *>::test (hsa_insn_basic *p)
+{
+  return (p->m_opcode == BRIG_OPCODE_POPCOUNT
+	  || p->m_opcode == BRIG_OPCODE_FIRSTBIT
+	  || p->m_opcode == BRIG_OPCODE_LASTBIT);
+}
+
+/* HSA packed instruction.  */
+
+class hsa_insn_packed : public hsa_insn_srctype
+{
+public:
+  hsa_insn_packed (int nops, BrigOpcode opcode, BrigType16_t destt,
+		   BrigType16_t srct, hsa_op_base *arg0, hsa_op_base *arg1,
+		   hsa_op_base *arg2);
+
+  /* Pool allocator.  */
+  void *operator new (size_t);
+
+  /* Operand list for an operand of the instruction.  */
+  hsa_op_operand_list *m_operand_list;
+
+  /* Destructor.  */
+  ~hsa_insn_packed ();
+};
+
+/* Report whether or not P is a combine instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_packed *>::test (hsa_insn_basic *p)
+{
+  return (p->m_opcode == BRIG_OPCODE_COMBINE
+	  || p->m_opcode == BRIG_OPCODE_EXPAND);
+}
+
+/* HSA convert instruction.  */
+
+class hsa_insn_cvt: public hsa_insn_basic
+{
+public:
+  hsa_insn_cvt (hsa_op_with_type *dest, hsa_op_with_type *src);
+
+  /* Pool allocator.  */
+  void *operator new (size_t);
+};
+
+/* Report whether or not P is a convert instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_cvt *>::test (hsa_insn_basic *p)
+{
+  return (p->m_opcode == BRIG_OPCODE_CVT);
+}
+
+/* HSA alloca instruction.  */
+
+class hsa_insn_alloca: public hsa_insn_basic
+{
+public:
+  hsa_insn_alloca (hsa_op_with_type *dest, hsa_op_with_type *size,
+		   unsigned alignment = 0);
+
+  /* Required alignment of the allocation. */
+  BrigAlignment8_t m_align;
+
+  /* Pool allocator.  */
+  void *operator new (size_t);
+};
+
+/* Report whether or not P is an alloca instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_alloca *>::test (hsa_insn_basic *p)
+{
+  return (p->m_opcode == BRIG_OPCODE_ALLOCA);
+}
+
+/* Basic block of HSA instructions.  */
+
+class hsa_bb
+{
+public:
+  hsa_bb (basic_block cfg_bb);
+  hsa_bb (basic_block cfg_bb, int idx);
+  ~hsa_bb ();
+
+  /* Append an instruction INSN into the basic block.  */
+  void append_insn (hsa_insn_basic *insn);
+
+  /* The real CFG BB that this HBB belongs to.  */
+  basic_block m_bb;
+
+  /* The operand that refers to the label to this BB.  */
+  hsa_op_code_ref m_label_ref;
+
+  /* The first and last instruction.  */
+  hsa_insn_basic *m_first_insn, *m_last_insn;
+  /* The first and last phi node.  */
+  hsa_insn_phi *m_first_phi, *m_last_phi;
+
+  /* Just a number to construct names from.  */
+  int m_index;
+
+  bitmap m_liveout, m_livein;
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_bb ();
+  /* All objects are deallocated by destroying their pool, so make delete
+     inaccessible too.  */
+  void operator delete (void *) {}
+};
+
+/* Return the corresponding HSA basic block structure for the given control
+   flow basic_block BB.  */
+
+static inline hsa_bb *
+hsa_bb_for_bb (basic_block bb)
+{
+  return (struct hsa_bb *) bb->aux;
+}
+
+/* Class for hashing local hsa_symbols.  */
+
+struct hsa_noop_symbol_hasher : nofree_ptr_hash <hsa_symbol>
+{
+  static inline hashval_t hash (const value_type);
+  static inline bool equal (const value_type, const compare_type);
+};
+
+/* Hash hsa_symbol.  */
+
+inline hashval_t
+hsa_noop_symbol_hasher::hash (const value_type item)
+{
+  return DECL_UID (item->m_decl);
+}
+
+/* Return true if the DECL_UIDs of decls both symbols refer to are equal.  */
+
+inline bool
+hsa_noop_symbol_hasher::equal (const value_type a, const compare_type b)
+{
+  return (DECL_UID (a->m_decl) == DECL_UID (b->m_decl));
+}
+
+/* Structure that encapsulates intermediate representation of a HSA
+   function.  */
+
+class hsa_function_representation
+{
+public:
+  hsa_function_representation (tree fdecl, bool kernel_p,
+			       unsigned ssa_names_count);
+  hsa_function_representation (hsa_internal_fn *fn);
+  ~hsa_function_representation ();
+
+  /* Builds a shadow register that is utilized to a kernel dispatch.  */
+  hsa_op_reg *get_shadow_reg ();
+
+  /* Return true if we are in a function that has kernel dispatch
+     shadow register.  */
+  bool has_shadow_reg_p ();
+
+  /* The entry/exit blocks don't contain incoming code,
+     but the HSA generator might use them to put code into,
+     so we need hsa_bb instances of them.  */
+  void init_extra_bbs ();
+
+  /* Return linkage of the representation.  */
+  BrigLinkage8_t get_linkage ();
+
+  /* Create a private symbol of requested TYPE.  */
+  hsa_symbol *create_hsa_temporary (BrigType16_t type);
+
+  /* Lookup or create a HSA pseudo register for a given gimple SSA name.  */
+  hsa_op_reg *reg_for_gimple_ssa (tree ssa);
+
+  /* Name of the function.  */
+  char *m_name;
+
+  /* Number of allocated register structures.  */
+  int m_reg_count;
+
+  /* Input arguments.  */
+  vec <hsa_symbol *> m_input_args;
+
+  /* Output argument or NULL if there is none.  */
+  hsa_symbol *m_output_arg;
+
+  /* Hash table of local variable symbols.  */
+  hash_table <hsa_noop_symbol_hasher> *m_local_symbols;
+
+  /* Hash map for string constants.  */
+  hash_map <tree, hsa_symbol *> m_string_constants_map;
+
+  /* Vector of pointers to spill symbols.  */
+  vec <struct hsa_symbol *> m_spill_symbols;
+
+  /* Vector of pointers to global variables and transformed string constants
+     that are used by the function.  */
+  vec <struct hsa_symbol *> m_global_symbols;
+
+  /* Private function artificial variables.  */
+  vec <struct hsa_symbol *> m_private_variables;
+
+  /* Vector of called function declarations.  */
+  vec <tree> m_called_functions;
+
+  /* Vector of used internal functions.  */
+  vec <hsa_internal_fn *> m_called_internal_fns;
+
+  /* Number of HBB BBs.  */
+  int m_hbb_count;
+
+  /* Whether or not we could check and enforce SSA properties.  */
+  bool m_in_ssa;
+
+  /* True if the function is kernel function.  */
+  bool m_kern_p;
+
+  /* True if the function representation is a declaration.  */
+  bool m_declaration_p;
+
+  /* Function declaration tree.  */
+  tree m_decl;
+
+  /* Internal function info is used for declarations of internal functions.  */
+  hsa_internal_fn *m_internal_fn;
+
+  /* Runtime shadow register.  */
+  hsa_op_reg *m_shadow_reg;
+
+  /* Number of kernel dispatched which take place in the function.  */
+  unsigned m_kernel_dispatch_count;
+
+  /* If the function representation contains a kernel dispatch,
+     OMP data size is necessary memory that is used for copying before
+     a kernel dispatch.  */
+  unsigned m_maximum_omp_data_size;
+
+  /* Return true if there's an HSA-specific warning already seen.  */
+  bool m_seen_error;
+
+  /* Counter for temporary symbols created in the function representation.  */
+  unsigned m_temp_symbol_count;
+
+  /* SSA names mapping.  */
+  vec <hsa_op_reg_p> m_ssa_map;
+};
+
+enum hsa_function_kind
+{
+  HSA_NONE,
+  HSA_KERNEL,
+  HSA_FUNCTION
+};
+
+struct hsa_function_summary
+{
+  /* Default constructor.  */
+  hsa_function_summary ();
+
+  /* Kind of GPU/host function.  */
+  hsa_function_kind m_kind;
+
+  /* Pointer to a cgraph node which is a HSA implementation of the function.
+     In case of the function is a HSA function, the binded function points
+     to the host function.  */
+  cgraph_node *m_binded_function;
+
+  /* Identifies if the function is an HSA function or a host function.  */
+  bool m_gpu_implementation_p;
+
+  /* True if the function is a gridified kernel.  */
+  bool m_gridified_kernel_p;
+};
+
+inline
+hsa_function_summary::hsa_function_summary (): m_kind (HSA_NONE),
+  m_binded_function (NULL), m_gpu_implementation_p (false)
+{
+}
+
+/* Function summary for HSA functions.  */
+class hsa_summary_t: public function_summary <hsa_function_summary *>
+{
+public:
+  hsa_summary_t (symbol_table *table):
+    function_summary<hsa_function_summary *> (table) { }
+
+  /* Couple GPU and HOST as gpu-specific and host-specific implementation of
+     the same function.  KIND determines whether GPU is a host-invokable kernel
+     or gpu-callable function and GRIDIFIED_KERNEL_P is set if the function was
+     gridified in OMP.  */
+
+  void link_functions (cgraph_node *gpu, cgraph_node *host,
+		       hsa_function_kind kind, bool gridified_kernel_p);
+};
+
+/* OMP simple builtin describes behavior that should be done for
+   the routine.  */
+class omp_simple_builtin
+{
+public:
+  omp_simple_builtin (const char *name, const char *warning_message,
+	       bool sorry, hsa_op_immed *return_value = NULL):
+    m_name (name), m_warning_message (warning_message), m_sorry (sorry),
+    m_return_value (return_value)
+  {}
+
+  /* Generate HSAIL instructions for the builtin or produce warning message.  */
+  void generate (gimple *stmt, hsa_bb *hbb);
+
+  /* Name of function.  */
+  const char *m_name;
+
+  /* Warning message.  */
+  const char *m_warning_message;
+
+  /* Flag if we should sorry after the warning message is printed.  */
+  bool m_sorry;
+
+  /* Return value of the function.  */
+  hsa_op_immed *m_return_value;
+
+  /* Emission function.  */
+  void (*m_emit_func) (gimple *stmt, hsa_bb *);
+};
+
+/* Class for hashing hsa_internal_fn.  */
+
+struct hsa_internal_fn_hasher: free_ptr_hash <hsa_internal_fn>
+{
+  static inline hashval_t hash (const value_type);
+  static inline bool equal (const value_type, const compare_type);
+};
+
+/* Hash hsa_symbol.  */
+
+inline hashval_t
+hsa_internal_fn_hasher::hash (const value_type item)
+{
+  return item->m_fn;
+}
+
+/* Return true if the DECL_UIDs of decls both symbols refer to  are equal.  */
+
+inline bool
+hsa_internal_fn_hasher::equal (const value_type a, const compare_type b)
+{
+  return a->m_fn == b->m_fn && a->m_type_bit_size == b->m_type_bit_size;
+}
+
+/* in hsa.c */
+extern struct hsa_function_representation *hsa_cfun;
+extern hash_map <tree, vec <const char *> *> *hsa_decl_kernel_dependencies;
+extern hsa_summary_t *hsa_summaries;
+extern hsa_symbol *hsa_num_threads;
+extern unsigned hsa_kernel_calls_counter;
+extern hash_set <tree> *hsa_failed_functions;
+extern hash_table <hsa_noop_symbol_hasher> *hsa_global_variable_symbols;
+
+bool hsa_callable_function_p (tree fndecl);
+void hsa_init_compilation_unit_data (void);
+void hsa_deinit_compilation_unit_data (void);
+bool hsa_machine_large_p (void);
+bool hsa_full_profile_p (void);
+bool hsa_opcode_floating_bit_insn_p (BrigOpcode16_t);
+unsigned hsa_type_bit_size (BrigType16_t t);
+BrigType16_t hsa_bittype_for_bitsize (unsigned bitsize);
+BrigType16_t hsa_uint_for_bitsize (unsigned bitsize);
+BrigType16_t hsa_float_for_bitsize (unsigned bitsize);
+BrigType16_t hsa_bittype_for_type (BrigType16_t t);
+bool hsa_type_float_p (BrigType16_t type);
+bool hsa_type_integer_p (BrigType16_t type);
+bool hsa_btype_p (BrigType16_t type);
+BrigAlignment8_t hsa_alignment_encoding (unsigned n);
+BrigAlignment8_t hsa_natural_alignment (BrigType16_t type);
+void hsa_destroy_operand (hsa_op_base *op);
+void hsa_destroy_insn (hsa_insn_basic *insn);
+void hsa_add_kern_decl_mapping (tree decl, char *name, unsigned, bool);
+unsigned hsa_get_number_decl_kernel_mappings (void);
+tree hsa_get_decl_kernel_mapping_decl (unsigned i);
+char *hsa_get_decl_kernel_mapping_name (unsigned i);
+unsigned hsa_get_decl_kernel_mapping_omp_size (unsigned i);
+bool hsa_get_decl_kernel_mapping_gridified (unsigned i);
+void hsa_free_decl_kernel_mapping (void);
+void hsa_add_kernel_dependency (tree caller, const char *called_function);
+void hsa_sanitize_name (char *p);
+char *hsa_brig_function_name (const char *p);
+const char *hsa_get_declaration_name (tree decl);
+void hsa_register_kernel (cgraph_node *host);
+void hsa_register_kernel (cgraph_node *gpu, cgraph_node *host);
+bool hsa_seen_error (void);
+void hsa_fail_cfun (void);
+
+/* In hsa-gen.c.  */
+void hsa_build_append_simple_mov (hsa_op_reg *, hsa_op_base *, hsa_bb *);
+hsa_symbol *hsa_get_spill_symbol (BrigType16_t);
+hsa_symbol *hsa_get_string_cst_symbol (BrigType16_t);
+hsa_op_reg *hsa_spill_in (hsa_insn_basic *, hsa_op_reg *, hsa_op_reg **);
+hsa_op_reg *hsa_spill_out (hsa_insn_basic *, hsa_op_reg *, hsa_op_reg **);
+hsa_bb *hsa_init_new_bb (basic_block);
+hsa_function_representation *hsa_generate_function_declaration (tree decl);
+hsa_function_representation *hsa_generate_internal_fn_decl (hsa_internal_fn *);
+tree hsa_get_host_function (tree decl);
+
+/* In hsa-regalloc.c.  */
+void hsa_regalloc (void);
+
+/* In hsa-brig.c.  */
+extern hash_table <hsa_internal_fn_hasher> *hsa_emitted_internal_decls;
+void hsa_brig_emit_function (void);
+void hsa_output_brig (void);
+unsigned hsa_get_imm_brig_type_len (BrigType16_t type);
+void hsa_brig_emit_omp_symbols (void);
+
+/*  In hsa-dump.c.  */
+const char *hsa_seg_name (BrigSegment8_t);
+void dump_hsa_insn (FILE *f, hsa_insn_basic *insn);
+void dump_hsa_bb (FILE *, hsa_bb *);
+void dump_hsa_cfun (FILE *);
+DEBUG_FUNCTION void debug_hsa_operand (hsa_op_base *opc);
+DEBUG_FUNCTION void debug_hsa_insn (hsa_insn_basic *insn);
+
+union hsa_bytes
+{
+  uint8_t b8;
+  uint16_t b16;
+  uint32_t b32;
+  uint64_t b64;
+};
+
+/* Return true if a function DECL is an HSA implementation.  */
+
+static inline bool
+hsa_gpu_implementation_p (tree decl)
+{
+  if (hsa_summaries == NULL)
+    return false;
+
+  hsa_function_summary *s = hsa_summaries->get (cgraph_node::get_create (decl));
+
+  return s->m_gpu_implementation_p;
+}
+
+#endif /* HSA_H */
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 8bab3e5..b754e5b 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -75,6 +75,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gcse.h"
 #include "tree-chkp.h"
 #include "omp-low.h"
+#include "hsa.h"
 
 #if defined(DBX_DEBUGGING_INFO) || defined(XCOFF_DEBUGGING_INFO)
 #include "dbxout.h"
@@ -518,6 +519,8 @@ compile_file (void)
 
       omp_finish_file ();
 
+      hsa_output_brig ();
+
       output_shared_constant_pool ();
       output_object_blocks ();
       finish_tm_clone_pairs ();

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-13 17:55 [hsa merge 00/10] Merge of HSA branch Martin Jambor
                   ` (7 preceding siblings ...)
  2016-01-13 18:11 ` [hsa merge 09/10] Majority of the HSA back-end Martin Jambor
@ 2016-01-13 18:12 ` Martin Jambor
  2016-01-14 13:04   ` Jakub Jelinek
  2016-01-13 18:17 ` [hsa merge 05/10] OpenMP lowering/expansion changes (gridification) Martin Jambor
  2016-01-19 10:45 ` [hsa merge 00/10] Merge of HSA branch Martin Jambor
  10 siblings, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-13 18:12 UTC (permalink / raw)
  To: GCC Patches; +Cc: David Edelsohn, Jeff Law

[-- Attachment #1: hsa-brig-format.diff --]
[-- Type: text/plain, Size: 37117 bytes --]

Hi,

the following patch adds a BRIG (binary representation of HSAIL)
representation description.  It is within a single header file
describing the binary structures and constants of the format.

The file comes from the HSA Foundation (I have only added the
HSA_BRIG_FORMAT_H macro and check and removed some weird comments
which are not present in proposed future versions of the file) and is
licensed under "University of Illinois/NCSA Open Source License."

The license is "GPL-compatible" according to FSF
(http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
so I believe we can have it in GCC.  Nevertheless, it is not GPL and
there is no copyright assignment for it, but the situation is
hopefully analogous to some other libraries that have their upstream
elsewhere but we ship them as part of the GCC.

In the previous posting of this patch
(https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have
requested a permission from the steering committee to include this file
with a different upstream in GCC.  I have not received an official
reply but since I have been chosen to be the HSA maintainer, I tend to
think there were no legal objections against HSA going forward,
including this file.

Thanks,

Martin


2015-12-04  Martin Jambor  <mjambor@suse.cz>

        * hsa-brig-format.h: New file.

diff --git a/gcc/hsa-brig-format.h b/gcc/hsa-brig-format.h
new file mode 100644
index 0000000..6e2fe75
--- /dev/null
+++ b/gcc/hsa-brig-format.h
@@ -0,0 +1,1277 @@
+// University of Illinois/NCSA
+// Open Source License
+//
+// Copyright (c) 2013-2015, Advanced Micro Devices, Inc.
+// All rights reserved.
+//
+// Developed by:
+//
+//     HSA Team
+//
+//     Advanced Micro Devices, Inc
+//
+//     www.amd.com
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy of
+// this software and associated documentation files (the "Software"), to deal with
+// the Software without restriction, including without limitation the rights to
+// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+// of the Software, and to permit persons to whom the Software is furnished to do
+// so, subject to the following conditions:
+//
+//     * Redistributions of source code must retain the above copyright notice,
+//       this list of conditions and the following disclaimers.
+//
+//     * Redistributions in binary form must reproduce the above copyright notice,
+//       this list of conditions and the following disclaimers in the
+//       documentation and/or other materials provided with the distribution.
+//
+//     * Neither the names of the HSA Team, University of Illinois at
+//       Urbana-Champaign, nor the names of its contributors may be used to
+//       endorse or promote products derived from this Software without specific
+//       prior written permission.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+// FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
+// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE
+// SOFTWARE.
+
+#ifndef HSA_BRIG_FORMAT_H
+#define HSA_BRIG_FORMAT_H
+
+typedef uint32_t BrigVersion32_t;
+
+enum BrigVersion {
+
+    BRIG_VERSION_HSAIL_MAJOR = 1,
+    BRIG_VERSION_HSAIL_MINOR = 0,
+    BRIG_VERSION_BRIG_MAJOR  = 1,
+    BRIG_VERSION_BRIG_MINOR  = 0
+};
+
+typedef uint8_t BrigAlignment8_t;
+
+typedef uint8_t BrigAllocation8_t;
+
+typedef uint8_t BrigAluModifier8_t;
+
+typedef uint8_t BrigAtomicOperation8_t;
+
+typedef uint32_t BrigCodeOffset32_t;
+
+typedef uint8_t BrigCompareOperation8_t;
+
+typedef uint16_t BrigControlDirective16_t;
+
+typedef uint32_t BrigDataOffset32_t;
+
+typedef BrigDataOffset32_t BrigDataOffsetCodeList32_t;
+
+typedef BrigDataOffset32_t BrigDataOffsetOperandList32_t;
+
+typedef BrigDataOffset32_t BrigDataOffsetString32_t;
+
+typedef uint8_t BrigExecutableModifier8_t;
+
+typedef uint8_t BrigImageChannelOrder8_t;
+
+typedef uint8_t BrigImageChannelType8_t;
+
+typedef uint8_t BrigImageGeometry8_t;
+
+typedef uint8_t BrigImageQuery8_t;
+
+typedef uint16_t BrigKind16_t;
+
+typedef uint8_t BrigLinkage8_t;
+
+typedef uint8_t BrigMachineModel8_t;
+
+typedef uint8_t BrigMemoryModifier8_t;
+
+typedef uint8_t BrigMemoryOrder8_t;
+
+typedef uint8_t BrigMemoryScope8_t;
+
+typedef uint16_t BrigOpcode16_t;
+
+typedef uint32_t BrigOperandOffset32_t;
+
+typedef uint8_t BrigPack8_t;
+
+typedef uint8_t BrigProfile8_t;
+
+typedef uint16_t BrigRegisterKind16_t;
+
+typedef uint8_t BrigRound8_t;
+
+typedef uint8_t BrigSamplerAddressing8_t;
+
+typedef uint8_t BrigSamplerCoordNormalization8_t;
+
+typedef uint8_t BrigSamplerFilter8_t;
+
+typedef uint8_t BrigSamplerQuery8_t;
+
+typedef uint32_t BrigSectionIndex32_t;
+
+typedef uint8_t BrigSegCvtModifier8_t;
+
+typedef uint8_t BrigSegment8_t;
+
+typedef uint32_t BrigStringOffset32_t;
+
+typedef uint16_t BrigType16_t;
+
+typedef uint8_t BrigVariableModifier8_t;
+
+typedef uint8_t BrigWidth8_t;
+
+typedef uint32_t BrigExceptions32_t;
+
+enum BrigKind {
+
+    BRIG_KIND_NONE = 0x0000,
+
+    BRIG_KIND_DIRECTIVE_BEGIN = 0x1000,
+    BRIG_KIND_DIRECTIVE_ARG_BLOCK_END = 0x1000,
+    BRIG_KIND_DIRECTIVE_ARG_BLOCK_START = 0x1001,
+    BRIG_KIND_DIRECTIVE_COMMENT = 0x1002,
+    BRIG_KIND_DIRECTIVE_CONTROL = 0x1003,
+    BRIG_KIND_DIRECTIVE_EXTENSION = 0x1004,
+    BRIG_KIND_DIRECTIVE_FBARRIER = 0x1005,
+    BRIG_KIND_DIRECTIVE_FUNCTION = 0x1006,
+    BRIG_KIND_DIRECTIVE_INDIRECT_FUNCTION = 0x1007,
+    BRIG_KIND_DIRECTIVE_KERNEL = 0x1008,
+    BRIG_KIND_DIRECTIVE_LABEL = 0x1009,
+    BRIG_KIND_DIRECTIVE_LOC = 0x100a,
+    BRIG_KIND_DIRECTIVE_MODULE = 0x100b,
+    BRIG_KIND_DIRECTIVE_PRAGMA = 0x100c,
+    BRIG_KIND_DIRECTIVE_SIGNATURE = 0x100d,
+    BRIG_KIND_DIRECTIVE_VARIABLE = 0x100e,
+    BRIG_KIND_DIRECTIVE_END = 0x100f,
+
+    BRIG_KIND_INST_BEGIN = 0x2000,
+    BRIG_KIND_INST_ADDR = 0x2000,
+    BRIG_KIND_INST_ATOMIC = 0x2001,
+    BRIG_KIND_INST_BASIC = 0x2002,
+    BRIG_KIND_INST_BR = 0x2003,
+    BRIG_KIND_INST_CMP = 0x2004,
+    BRIG_KIND_INST_CVT = 0x2005,
+    BRIG_KIND_INST_IMAGE = 0x2006,
+    BRIG_KIND_INST_LANE = 0x2007,
+    BRIG_KIND_INST_MEM = 0x2008,
+    BRIG_KIND_INST_MEM_FENCE = 0x2009,
+    BRIG_KIND_INST_MOD = 0x200a,
+    BRIG_KIND_INST_QUERY_IMAGE = 0x200b,
+    BRIG_KIND_INST_QUERY_SAMPLER = 0x200c,
+    BRIG_KIND_INST_QUEUE = 0x200d,
+    BRIG_KIND_INST_SEG = 0x200e,
+    BRIG_KIND_INST_SEG_CVT = 0x200f,
+    BRIG_KIND_INST_SIGNAL = 0x2010,
+    BRIG_KIND_INST_SOURCE_TYPE = 0x2011,
+    BRIG_KIND_INST_END = 0x2012,
+
+    BRIG_KIND_OPERAND_BEGIN = 0x3000,
+    BRIG_KIND_OPERAND_ADDRESS = 0x3000,
+    BRIG_KIND_OPERAND_ALIGN = 0x3001,
+    BRIG_KIND_OPERAND_CODE_LIST = 0x3002,
+    BRIG_KIND_OPERAND_CODE_REF = 0x3003,
+    BRIG_KIND_OPERAND_CONSTANT_BYTES = 0x3004,
+    BRIG_KIND_OPERAND_RESERVED = 0x3005,
+    BRIG_KIND_OPERAND_CONSTANT_IMAGE = 0x3006,
+    BRIG_KIND_OPERAND_CONSTANT_OPERAND_LIST = 0x3007,
+    BRIG_KIND_OPERAND_CONSTANT_SAMPLER = 0x3008,
+    BRIG_KIND_OPERAND_OPERAND_LIST = 0x3009,
+    BRIG_KIND_OPERAND_REGISTER = 0x300a,
+    BRIG_KIND_OPERAND_STRING = 0x300b,
+    BRIG_KIND_OPERAND_WAVESIZE = 0x300c,
+    BRIG_KIND_OPERAND_END = 0x300d
+};
+
+enum BrigAlignment {
+
+    BRIG_ALIGNMENT_NONE = 0,
+    BRIG_ALIGNMENT_1 = 1,
+    BRIG_ALIGNMENT_2 = 2,
+    BRIG_ALIGNMENT_4 = 3,
+    BRIG_ALIGNMENT_8 = 4,
+    BRIG_ALIGNMENT_16 = 5,
+    BRIG_ALIGNMENT_32 = 6,
+    BRIG_ALIGNMENT_64 = 7,
+    BRIG_ALIGNMENT_128 = 8,
+    BRIG_ALIGNMENT_256 = 9,
+
+    BRIG_ALIGNMENT_LAST,
+    BRIG_ALIGNMENT_MAX = BRIG_ALIGNMENT_LAST - 1
+};
+
+enum BrigAllocation {
+
+    BRIG_ALLOCATION_NONE = 0,
+    BRIG_ALLOCATION_PROGRAM = 1,
+    BRIG_ALLOCATION_AGENT = 2,
+    BRIG_ALLOCATION_AUTOMATIC = 3
+};
+
+enum BrigAluModifierMask {
+    BRIG_ALU_FTZ = 1
+};
+
+enum BrigAtomicOperation {
+
+    BRIG_ATOMIC_ADD = 0,
+    BRIG_ATOMIC_AND = 1,
+    BRIG_ATOMIC_CAS = 2,
+    BRIG_ATOMIC_EXCH = 3,
+    BRIG_ATOMIC_LD = 4,
+    BRIG_ATOMIC_MAX = 5,
+    BRIG_ATOMIC_MIN = 6,
+    BRIG_ATOMIC_OR = 7,
+    BRIG_ATOMIC_ST = 8,
+    BRIG_ATOMIC_SUB = 9,
+    BRIG_ATOMIC_WRAPDEC = 10,
+    BRIG_ATOMIC_WRAPINC = 11,
+    BRIG_ATOMIC_XOR = 12,
+    BRIG_ATOMIC_WAIT_EQ = 13,
+    BRIG_ATOMIC_WAIT_NE = 14,
+    BRIG_ATOMIC_WAIT_LT = 15,
+    BRIG_ATOMIC_WAIT_GTE = 16,
+    BRIG_ATOMIC_WAITTIMEOUT_EQ = 17,
+    BRIG_ATOMIC_WAITTIMEOUT_NE = 18,
+    BRIG_ATOMIC_WAITTIMEOUT_LT = 19,
+    BRIG_ATOMIC_WAITTIMEOUT_GTE = 20
+};
+
+enum BrigCompareOperation {
+
+    BRIG_COMPARE_EQ = 0,
+    BRIG_COMPARE_NE = 1,
+    BRIG_COMPARE_LT = 2,
+    BRIG_COMPARE_LE = 3,
+    BRIG_COMPARE_GT = 4,
+    BRIG_COMPARE_GE = 5,
+    BRIG_COMPARE_EQU = 6,
+    BRIG_COMPARE_NEU = 7,
+    BRIG_COMPARE_LTU = 8,
+    BRIG_COMPARE_LEU = 9,
+    BRIG_COMPARE_GTU = 10,
+    BRIG_COMPARE_GEU = 11,
+    BRIG_COMPARE_NUM = 12,
+    BRIG_COMPARE_NAN = 13,
+    BRIG_COMPARE_SEQ = 14,
+    BRIG_COMPARE_SNE = 15,
+    BRIG_COMPARE_SLT = 16,
+    BRIG_COMPARE_SLE = 17,
+    BRIG_COMPARE_SGT = 18,
+    BRIG_COMPARE_SGE = 19,
+    BRIG_COMPARE_SGEU = 20,
+    BRIG_COMPARE_SEQU = 21,
+    BRIG_COMPARE_SNEU = 22,
+    BRIG_COMPARE_SLTU = 23,
+    BRIG_COMPARE_SLEU = 24,
+    BRIG_COMPARE_SNUM = 25,
+    BRIG_COMPARE_SNAN = 26,
+    BRIG_COMPARE_SGTU = 27
+};
+
+enum BrigControlDirective {
+
+    BRIG_CONTROL_NONE = 0,
+    BRIG_CONTROL_ENABLEBREAKEXCEPTIONS = 1,
+    BRIG_CONTROL_ENABLEDETECTEXCEPTIONS = 2,
+    BRIG_CONTROL_MAXDYNAMICGROUPSIZE = 3,
+    BRIG_CONTROL_MAXFLATGRIDSIZE = 4,
+    BRIG_CONTROL_MAXFLATWORKGROUPSIZE = 5,
+    BRIG_CONTROL_REQUIREDDIM = 6,
+    BRIG_CONTROL_REQUIREDGRIDSIZE = 7,
+    BRIG_CONTROL_REQUIREDWORKGROUPSIZE = 8,
+    BRIG_CONTROL_REQUIRENOPARTIALWORKGROUPS = 9
+};
+
+enum BrigExecutableModifierMask {
+
+    BRIG_EXECUTABLE_DEFINITION = 1
+};
+
+enum BrigImageChannelOrder {
+
+    BRIG_CHANNEL_ORDER_A = 0,
+    BRIG_CHANNEL_ORDER_R = 1,
+    BRIG_CHANNEL_ORDER_RX = 2,
+    BRIG_CHANNEL_ORDER_RG = 3,
+    BRIG_CHANNEL_ORDER_RGX = 4,
+    BRIG_CHANNEL_ORDER_RA = 5,
+    BRIG_CHANNEL_ORDER_RGB = 6,
+    BRIG_CHANNEL_ORDER_RGBX = 7,
+    BRIG_CHANNEL_ORDER_RGBA = 8,
+    BRIG_CHANNEL_ORDER_BGRA = 9,
+    BRIG_CHANNEL_ORDER_ARGB = 10,
+    BRIG_CHANNEL_ORDER_ABGR = 11,
+    BRIG_CHANNEL_ORDER_SRGB = 12,
+    BRIG_CHANNEL_ORDER_SRGBX = 13,
+    BRIG_CHANNEL_ORDER_SRGBA = 14,
+    BRIG_CHANNEL_ORDER_SBGRA = 15,
+    BRIG_CHANNEL_ORDER_INTENSITY = 16,
+    BRIG_CHANNEL_ORDER_LUMINANCE = 17,
+    BRIG_CHANNEL_ORDER_DEPTH = 18,
+    BRIG_CHANNEL_ORDER_DEPTH_STENCIL = 19,
+
+    BRIG_CHANNEL_ORDER_UNKNOWN,
+
+    BRIG_CHANNEL_ORDER_FIRST_USER_DEFINED = 128
+
+};
+
+enum BrigImageChannelType {
+
+    BRIG_CHANNEL_TYPE_SNORM_INT8 = 0,
+    BRIG_CHANNEL_TYPE_SNORM_INT16 = 1,
+    BRIG_CHANNEL_TYPE_UNORM_INT8 = 2,
+    BRIG_CHANNEL_TYPE_UNORM_INT16 = 3,
+    BRIG_CHANNEL_TYPE_UNORM_INT24 = 4,
+    BRIG_CHANNEL_TYPE_UNORM_SHORT_555 = 5,
+    BRIG_CHANNEL_TYPE_UNORM_SHORT_565 = 6,
+    BRIG_CHANNEL_TYPE_UNORM_INT_101010 = 7,
+    BRIG_CHANNEL_TYPE_SIGNED_INT8 = 8,
+    BRIG_CHANNEL_TYPE_SIGNED_INT16 = 9,
+    BRIG_CHANNEL_TYPE_SIGNED_INT32 = 10,
+    BRIG_CHANNEL_TYPE_UNSIGNED_INT8 = 11,
+    BRIG_CHANNEL_TYPE_UNSIGNED_INT16 = 12,
+    BRIG_CHANNEL_TYPE_UNSIGNED_INT32 = 13,
+    BRIG_CHANNEL_TYPE_HALF_FLOAT = 14,
+    BRIG_CHANNEL_TYPE_FLOAT = 15,
+
+    BRIG_CHANNEL_TYPE_UNKNOWN,
+
+    BRIG_CHANNEL_TYPE_FIRST_USER_DEFINED = 128
+};
+
+enum BrigImageGeometry {
+
+    BRIG_GEOMETRY_1D = 0,
+    BRIG_GEOMETRY_2D = 1,
+    BRIG_GEOMETRY_3D = 2,
+    BRIG_GEOMETRY_1DA = 3,
+    BRIG_GEOMETRY_2DA = 4,
+    BRIG_GEOMETRY_1DB = 5,
+    BRIG_GEOMETRY_2DDEPTH = 6,
+    BRIG_GEOMETRY_2DADEPTH = 7,
+
+    BRIG_GEOMETRY_UNKNOWN,
+
+    BRIG_GEOMETRY_FIRST_USER_DEFINED = 128
+};
+
+enum BrigImageQuery {
+
+    BRIG_IMAGE_QUERY_WIDTH = 0,
+    BRIG_IMAGE_QUERY_HEIGHT = 1,
+    BRIG_IMAGE_QUERY_DEPTH = 2,
+    BRIG_IMAGE_QUERY_ARRAY = 3,
+    BRIG_IMAGE_QUERY_CHANNELORDER = 4,
+    BRIG_IMAGE_QUERY_CHANNELTYPE = 5
+};
+
+enum BrigLinkage {
+
+    BRIG_LINKAGE_NONE = 0,
+    BRIG_LINKAGE_PROGRAM = 1,
+    BRIG_LINKAGE_MODULE = 2,
+    BRIG_LINKAGE_FUNCTION = 3,
+    BRIG_LINKAGE_ARG = 4
+};
+
+enum BrigMachineModel {
+
+    BRIG_MACHINE_SMALL = 0,
+    BRIG_MACHINE_LARGE = 1,
+
+    BRIG_MACHINE_UNDEF = 2
+};
+
+enum BrigMemoryModifierMask {
+    BRIG_MEMORY_CONST = 1
+};
+
+enum BrigMemoryOrder {
+
+    BRIG_MEMORY_ORDER_NONE = 0,
+    BRIG_MEMORY_ORDER_RELAXED = 1,
+    BRIG_MEMORY_ORDER_SC_ACQUIRE = 2,
+    BRIG_MEMORY_ORDER_SC_RELEASE = 3,
+    BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE = 4,
+
+    BRIG_MEMORY_ORDER_LAST = 5
+};
+
+enum BrigMemoryScope {
+
+    BRIG_MEMORY_SCOPE_NONE = 0,
+    BRIG_MEMORY_SCOPE_WORKITEM = 1,
+    BRIG_MEMORY_SCOPE_WAVEFRONT = 2,
+    BRIG_MEMORY_SCOPE_WORKGROUP = 3,
+    BRIG_MEMORY_SCOPE_AGENT = 4,
+    BRIG_MEMORY_SCOPE_SYSTEM = 5,
+
+    BRIG_MEMORY_SCOPE_LAST = 6
+};
+
+enum BrigOpcode {
+
+    BRIG_OPCODE_NOP = 0,
+    BRIG_OPCODE_ABS = 1,
+    BRIG_OPCODE_ADD = 2,
+    BRIG_OPCODE_BORROW = 3,
+    BRIG_OPCODE_CARRY = 4,
+    BRIG_OPCODE_CEIL = 5,
+    BRIG_OPCODE_COPYSIGN = 6,
+    BRIG_OPCODE_DIV = 7,
+    BRIG_OPCODE_FLOOR = 8,
+    BRIG_OPCODE_FMA = 9,
+    BRIG_OPCODE_FRACT = 10,
+    BRIG_OPCODE_MAD = 11,
+    BRIG_OPCODE_MAX = 12,
+    BRIG_OPCODE_MIN = 13,
+    BRIG_OPCODE_MUL = 14,
+    BRIG_OPCODE_MULHI = 15,
+    BRIG_OPCODE_NEG = 16,
+    BRIG_OPCODE_REM = 17,
+    BRIG_OPCODE_RINT = 18,
+    BRIG_OPCODE_SQRT = 19,
+    BRIG_OPCODE_SUB = 20,
+    BRIG_OPCODE_TRUNC = 21,
+    BRIG_OPCODE_MAD24 = 22,
+    BRIG_OPCODE_MAD24HI = 23,
+    BRIG_OPCODE_MUL24 = 24,
+    BRIG_OPCODE_MUL24HI = 25,
+    BRIG_OPCODE_SHL = 26,
+    BRIG_OPCODE_SHR = 27,
+    BRIG_OPCODE_AND = 28,
+    BRIG_OPCODE_NOT = 29,
+    BRIG_OPCODE_OR = 30,
+    BRIG_OPCODE_POPCOUNT = 31,
+    BRIG_OPCODE_XOR = 32,
+    BRIG_OPCODE_BITEXTRACT = 33,
+    BRIG_OPCODE_BITINSERT = 34,
+    BRIG_OPCODE_BITMASK = 35,
+    BRIG_OPCODE_BITREV = 36,
+    BRIG_OPCODE_BITSELECT = 37,
+    BRIG_OPCODE_FIRSTBIT = 38,
+    BRIG_OPCODE_LASTBIT = 39,
+    BRIG_OPCODE_COMBINE = 40,
+    BRIG_OPCODE_EXPAND = 41,
+    BRIG_OPCODE_LDA = 42,
+    BRIG_OPCODE_MOV = 43,
+    BRIG_OPCODE_SHUFFLE = 44,
+    BRIG_OPCODE_UNPACKHI = 45,
+    BRIG_OPCODE_UNPACKLO = 46,
+    BRIG_OPCODE_PACK = 47,
+    BRIG_OPCODE_UNPACK = 48,
+    BRIG_OPCODE_CMOV = 49,
+    BRIG_OPCODE_CLASS = 50,
+    BRIG_OPCODE_NCOS = 51,
+    BRIG_OPCODE_NEXP2 = 52,
+    BRIG_OPCODE_NFMA = 53,
+    BRIG_OPCODE_NLOG2 = 54,
+    BRIG_OPCODE_NRCP = 55,
+    BRIG_OPCODE_NRSQRT = 56,
+    BRIG_OPCODE_NSIN = 57,
+    BRIG_OPCODE_NSQRT = 58,
+    BRIG_OPCODE_BITALIGN = 59,
+    BRIG_OPCODE_BYTEALIGN = 60,
+    BRIG_OPCODE_PACKCVT = 61,
+    BRIG_OPCODE_UNPACKCVT = 62,
+    BRIG_OPCODE_LERP = 63,
+    BRIG_OPCODE_SAD = 64,
+    BRIG_OPCODE_SADHI = 65,
+    BRIG_OPCODE_SEGMENTP = 66,
+    BRIG_OPCODE_FTOS = 67,
+    BRIG_OPCODE_STOF = 68,
+    BRIG_OPCODE_CMP = 69,
+    BRIG_OPCODE_CVT = 70,
+    BRIG_OPCODE_LD = 71,
+    BRIG_OPCODE_ST = 72,
+    BRIG_OPCODE_ATOMIC = 73,
+    BRIG_OPCODE_ATOMICNORET = 74,
+    BRIG_OPCODE_SIGNAL = 75,
+    BRIG_OPCODE_SIGNALNORET = 76,
+    BRIG_OPCODE_MEMFENCE = 77,
+    BRIG_OPCODE_RDIMAGE = 78,
+    BRIG_OPCODE_LDIMAGE = 79,
+    BRIG_OPCODE_STIMAGE = 80,
+    BRIG_OPCODE_IMAGEFENCE = 81,
+    BRIG_OPCODE_QUERYIMAGE = 82,
+    BRIG_OPCODE_QUERYSAMPLER = 83,
+    BRIG_OPCODE_CBR = 84,
+    BRIG_OPCODE_BR = 85,
+    BRIG_OPCODE_SBR = 86,
+    BRIG_OPCODE_BARRIER = 87,
+    BRIG_OPCODE_WAVEBARRIER = 88,
+    BRIG_OPCODE_ARRIVEFBAR = 89,
+    BRIG_OPCODE_INITFBAR = 90,
+    BRIG_OPCODE_JOINFBAR = 91,
+    BRIG_OPCODE_LEAVEFBAR = 92,
+    BRIG_OPCODE_RELEASEFBAR = 93,
+    BRIG_OPCODE_WAITFBAR = 94,
+    BRIG_OPCODE_LDF = 95,
+    BRIG_OPCODE_ACTIVELANECOUNT = 96,
+    BRIG_OPCODE_ACTIVELANEID = 97,
+    BRIG_OPCODE_ACTIVELANEMASK = 98,
+    BRIG_OPCODE_ACTIVELANEPERMUTE = 99,
+    BRIG_OPCODE_CALL = 100,
+    BRIG_OPCODE_SCALL = 101,
+    BRIG_OPCODE_ICALL = 102,
+    BRIG_OPCODE_RET = 103,
+    BRIG_OPCODE_ALLOCA = 104,
+    BRIG_OPCODE_CURRENTWORKGROUPSIZE = 105,
+    BRIG_OPCODE_CURRENTWORKITEMFLATID = 106,
+    BRIG_OPCODE_DIM = 107,
+    BRIG_OPCODE_GRIDGROUPS = 108,
+    BRIG_OPCODE_GRIDSIZE = 109,
+    BRIG_OPCODE_PACKETCOMPLETIONSIG = 110,
+    BRIG_OPCODE_PACKETID = 111,
+    BRIG_OPCODE_WORKGROUPID = 112,
+    BRIG_OPCODE_WORKGROUPSIZE = 113,
+    BRIG_OPCODE_WORKITEMABSID = 114,
+    BRIG_OPCODE_WORKITEMFLATABSID = 115,
+    BRIG_OPCODE_WORKITEMFLATID = 116,
+    BRIG_OPCODE_WORKITEMID = 117,
+    BRIG_OPCODE_CLEARDETECTEXCEPT = 118,
+    BRIG_OPCODE_GETDETECTEXCEPT = 119,
+    BRIG_OPCODE_SETDETECTEXCEPT = 120,
+    BRIG_OPCODE_ADDQUEUEWRITEINDEX = 121,
+    BRIG_OPCODE_CASQUEUEWRITEINDEX = 122,
+    BRIG_OPCODE_LDQUEUEREADINDEX = 123,
+    BRIG_OPCODE_LDQUEUEWRITEINDEX = 124,
+    BRIG_OPCODE_STQUEUEREADINDEX = 125,
+    BRIG_OPCODE_STQUEUEWRITEINDEX = 126,
+    BRIG_OPCODE_CLOCK = 127,
+    BRIG_OPCODE_CUID = 128,
+    BRIG_OPCODE_DEBUGTRAP = 129,
+    BRIG_OPCODE_GROUPBASEPTR = 130,
+    BRIG_OPCODE_KERNARGBASEPTR = 131,
+    BRIG_OPCODE_LANEID = 132,
+    BRIG_OPCODE_MAXCUID = 133,
+    BRIG_OPCODE_MAXWAVEID = 134,
+    BRIG_OPCODE_NULLPTR = 135,
+    BRIG_OPCODE_WAVEID = 136,
+    BRIG_OPCODE_FIRST_USER_DEFINED = 32768,
+
+    BRIG_OPCODE_GCNMADU = (1u << 15) | 0,
+    BRIG_OPCODE_GCNMADS = (1u << 15) | 1,
+    BRIG_OPCODE_GCNMAX3 = (1u << 15) | 2,
+    BRIG_OPCODE_GCNMIN3 = (1u << 15) | 3,
+    BRIG_OPCODE_GCNMED3 = (1u << 15) | 4,
+    BRIG_OPCODE_GCNFLDEXP = (1u << 15) | 5,
+    BRIG_OPCODE_GCNFREXP_EXP = (1u << 15) | 6,
+    BRIG_OPCODE_GCNFREXP_MANT = (1u << 15) | 7,
+    BRIG_OPCODE_GCNTRIG_PREOP = (1u << 15) | 8,
+    BRIG_OPCODE_GCNBFM = (1u << 15) | 9,
+    BRIG_OPCODE_GCNLD = (1u << 15) | 10,
+    BRIG_OPCODE_GCNST = (1u << 15) | 11,
+    BRIG_OPCODE_GCNATOMIC = (1u << 15) | 12,
+    BRIG_OPCODE_GCNATOMICNORET = (1u << 15) | 13,
+    BRIG_OPCODE_GCNSLEEP = (1u << 15) | 14,
+    BRIG_OPCODE_GCNPRIORITY = (1u << 15) | 15,
+    BRIG_OPCODE_GCNREGIONALLOC = (1u << 15) | 16,
+    BRIG_OPCODE_GCNMSAD = (1u << 15) | 17,
+    BRIG_OPCODE_GCNQSAD = (1u << 15) | 18,
+    BRIG_OPCODE_GCNMQSAD = (1u << 15) | 19,
+    BRIG_OPCODE_GCNMQSAD4 = (1u << 15) | 20,
+    BRIG_OPCODE_GCNSADW = (1u << 15) | 21,
+    BRIG_OPCODE_GCNSADD = (1u << 15) | 22,
+    BRIG_OPCODE_GCNCONSUME = (1u << 15) | 23,
+    BRIG_OPCODE_GCNAPPEND = (1u << 15) | 24,
+    BRIG_OPCODE_GCNB4XCHG = (1u << 15) | 25,
+    BRIG_OPCODE_GCNB32XCHG = (1u << 15) | 26,
+    BRIG_OPCODE_GCNMAX = (1u << 15) | 27,
+    BRIG_OPCODE_GCNMIN = (1u << 15) | 28,
+    BRIG_OPCODE_GCNDIVRELAXED = (1u << 15) | 29,
+    BRIG_OPCODE_GCNDIVRELAXEDNARROW = (1u << 15) | 30
+};
+
+enum BrigPack {
+
+    BRIG_PACK_NONE = 0,
+    BRIG_PACK_PP = 1,
+    BRIG_PACK_PS = 2,
+    BRIG_PACK_SP = 3,
+    BRIG_PACK_SS = 4,
+    BRIG_PACK_S = 5,
+    BRIG_PACK_P = 6,
+    BRIG_PACK_PPSAT = 7,
+    BRIG_PACK_PSSAT = 8,
+    BRIG_PACK_SPSAT = 9,
+    BRIG_PACK_SSSAT = 10,
+    BRIG_PACK_SSAT = 11,
+    BRIG_PACK_PSAT = 12
+};
+
+enum BrigProfile {
+
+    BRIG_PROFILE_BASE = 0,
+    BRIG_PROFILE_FULL = 1,
+
+    BRIG_PROFILE_UNDEF = 2
+};
+
+enum BrigRegisterKind {
+
+    BRIG_REGISTER_KIND_CONTROL = 0,
+    BRIG_REGISTER_KIND_SINGLE = 1,
+    BRIG_REGISTER_KIND_DOUBLE = 2,
+    BRIG_REGISTER_KIND_QUAD = 3
+};
+
+enum BrigRound {
+
+    BRIG_ROUND_NONE = 0,
+    BRIG_ROUND_FLOAT_DEFAULT = 1,
+    BRIG_ROUND_FLOAT_NEAR_EVEN = 2,
+    BRIG_ROUND_FLOAT_ZERO = 3,
+    BRIG_ROUND_FLOAT_PLUS_INFINITY = 4,
+    BRIG_ROUND_FLOAT_MINUS_INFINITY = 5,
+    BRIG_ROUND_INTEGER_NEAR_EVEN = 6,
+    BRIG_ROUND_INTEGER_ZERO = 7,
+    BRIG_ROUND_INTEGER_PLUS_INFINITY = 8,
+    BRIG_ROUND_INTEGER_MINUS_INFINITY = 9,
+    BRIG_ROUND_INTEGER_NEAR_EVEN_SAT = 10,
+    BRIG_ROUND_INTEGER_ZERO_SAT = 11,
+    BRIG_ROUND_INTEGER_PLUS_INFINITY_SAT = 12,
+    BRIG_ROUND_INTEGER_MINUS_INFINITY_SAT = 13,
+    BRIG_ROUND_INTEGER_SIGNALING_NEAR_EVEN = 14,
+    BRIG_ROUND_INTEGER_SIGNALING_ZERO = 15,
+    BRIG_ROUND_INTEGER_SIGNALING_PLUS_INFINITY = 16,
+    BRIG_ROUND_INTEGER_SIGNALING_MINUS_INFINITY = 17,
+    BRIG_ROUND_INTEGER_SIGNALING_NEAR_EVEN_SAT = 18,
+    BRIG_ROUND_INTEGER_SIGNALING_ZERO_SAT = 19,
+    BRIG_ROUND_INTEGER_SIGNALING_PLUS_INFINITY_SAT = 20,
+    BRIG_ROUND_INTEGER_SIGNALING_MINUS_INFINITY_SAT = 21
+};
+
+enum BrigSamplerAddressing {
+
+    BRIG_ADDRESSING_UNDEFINED = 0,
+    BRIG_ADDRESSING_CLAMP_TO_EDGE = 1,
+    BRIG_ADDRESSING_CLAMP_TO_BORDER = 2,
+    BRIG_ADDRESSING_REPEAT = 3,
+    BRIG_ADDRESSING_MIRRORED_REPEAT = 4,
+
+    BRIG_ADDRESSING_FIRST_USER_DEFINED = 128
+};
+
+enum BrigSamplerCoordNormalization {
+
+    BRIG_COORD_UNNORMALIZED = 0,
+    BRIG_COORD_NORMALIZED = 1
+};
+
+enum BrigSamplerFilter {
+
+    BRIG_FILTER_NEAREST = 0,
+    BRIG_FILTER_LINEAR = 1,
+
+    BRIG_FILTER_FIRST_USER_DEFINED = 128
+};
+
+enum BrigSamplerQuery {
+
+    BRIG_SAMPLER_QUERY_ADDRESSING = 0,
+    BRIG_SAMPLER_QUERY_COORD = 1,
+    BRIG_SAMPLER_QUERY_FILTER = 2
+};
+
+enum BrigSectionIndex {
+
+    BRIG_SECTION_INDEX_DATA = 0,
+    BRIG_SECTION_INDEX_CODE = 1,
+    BRIG_SECTION_INDEX_OPERAND = 2,
+    BRIG_SECTION_INDEX_BEGIN_IMPLEMENTATION_DEFINED = 3,
+
+    BRIG_SECTION_INDEX_IMPLEMENTATION_DEFINED = BRIG_SECTION_INDEX_BEGIN_IMPLEMENTATION_DEFINED
+};
+
+enum BrigSegCvtModifierMask {
+    BRIG_SEG_CVT_NONULL = 1
+};
+
+enum BrigSegment {
+
+    BRIG_SEGMENT_NONE = 0,
+    BRIG_SEGMENT_FLAT = 1,
+    BRIG_SEGMENT_GLOBAL = 2,
+    BRIG_SEGMENT_READONLY = 3,
+    BRIG_SEGMENT_KERNARG = 4,
+    BRIG_SEGMENT_GROUP = 5,
+    BRIG_SEGMENT_PRIVATE = 6,
+    BRIG_SEGMENT_SPILL = 7,
+    BRIG_SEGMENT_ARG = 8,
+
+    BRIG_SEGMENT_FIRST_USER_DEFINED = 128,
+
+    BRIG_SEGMENT_AMD_GCN = 9
+};
+
+enum BrigPackedTypeBits {
+
+    BRIG_TYPE_BASE_SIZE  = 5,
+    BRIG_TYPE_PACK_SIZE  = 2,
+    BRIG_TYPE_ARRAY_SIZE = 1,
+
+    BRIG_TYPE_BASE_SHIFT  = 0,
+    BRIG_TYPE_PACK_SHIFT  = BRIG_TYPE_BASE_SHIFT + BRIG_TYPE_BASE_SIZE,
+    BRIG_TYPE_ARRAY_SHIFT = BRIG_TYPE_PACK_SHIFT + BRIG_TYPE_PACK_SIZE,
+
+    BRIG_TYPE_BASE_MASK  = ((1 << BRIG_TYPE_BASE_SIZE)  - 1) << BRIG_TYPE_BASE_SHIFT,
+    BRIG_TYPE_PACK_MASK  = ((1 << BRIG_TYPE_PACK_SIZE)  - 1) << BRIG_TYPE_PACK_SHIFT,
+    BRIG_TYPE_ARRAY_MASK = ((1 << BRIG_TYPE_ARRAY_SIZE) - 1) << BRIG_TYPE_ARRAY_SHIFT,
+
+    BRIG_TYPE_PACK_NONE = 0 << BRIG_TYPE_PACK_SHIFT,
+    BRIG_TYPE_PACK_32   = 1 << BRIG_TYPE_PACK_SHIFT,
+    BRIG_TYPE_PACK_64   = 2 << BRIG_TYPE_PACK_SHIFT,
+    BRIG_TYPE_PACK_128  = 3 << BRIG_TYPE_PACK_SHIFT,
+
+    BRIG_TYPE_ARRAY     = 1 << BRIG_TYPE_ARRAY_SHIFT
+};
+
+enum BrigType {
+
+    BRIG_TYPE_NONE  = 0,
+    BRIG_TYPE_U8    = 1,
+    BRIG_TYPE_U16   = 2,
+    BRIG_TYPE_U32   = 3,
+    BRIG_TYPE_U64   = 4,
+    BRIG_TYPE_S8    = 5,
+    BRIG_TYPE_S16   = 6,
+    BRIG_TYPE_S32   = 7,
+    BRIG_TYPE_S64   = 8,
+    BRIG_TYPE_F16   = 9,
+    BRIG_TYPE_F32   = 10,
+    BRIG_TYPE_F64   = 11,
+    BRIG_TYPE_B1    = 12,
+    BRIG_TYPE_B8    = 13,
+    BRIG_TYPE_B16   = 14,
+    BRIG_TYPE_B32   = 15,
+    BRIG_TYPE_B64   = 16,
+    BRIG_TYPE_B128  = 17,
+    BRIG_TYPE_SAMP  = 18,
+    BRIG_TYPE_ROIMG = 19,
+    BRIG_TYPE_WOIMG = 20,
+    BRIG_TYPE_RWIMG = 21,
+    BRIG_TYPE_SIG32 = 22,
+    BRIG_TYPE_SIG64 = 23,
+
+    BRIG_TYPE_U8X4  = BRIG_TYPE_U8  | BRIG_TYPE_PACK_32,
+    BRIG_TYPE_U8X8  = BRIG_TYPE_U8  | BRIG_TYPE_PACK_64,
+    BRIG_TYPE_U8X16 = BRIG_TYPE_U8  | BRIG_TYPE_PACK_128,
+    BRIG_TYPE_U16X2 = BRIG_TYPE_U16 | BRIG_TYPE_PACK_32,
+    BRIG_TYPE_U16X4 = BRIG_TYPE_U16 | BRIG_TYPE_PACK_64,
+    BRIG_TYPE_U16X8 = BRIG_TYPE_U16 | BRIG_TYPE_PACK_128,
+    BRIG_TYPE_U32X2 = BRIG_TYPE_U32 | BRIG_TYPE_PACK_64,
+    BRIG_TYPE_U32X4 = BRIG_TYPE_U32 | BRIG_TYPE_PACK_128,
+    BRIG_TYPE_U64X2 = BRIG_TYPE_U64 | BRIG_TYPE_PACK_128,
+    BRIG_TYPE_S8X4  = BRIG_TYPE_S8  | BRIG_TYPE_PACK_32,
+    BRIG_TYPE_S8X8  = BRIG_TYPE_S8  | BRIG_TYPE_PACK_64,
+    BRIG_TYPE_S8X16 = BRIG_TYPE_S8  | BRIG_TYPE_PACK_128,
+    BRIG_TYPE_S16X2 = BRIG_TYPE_S16 | BRIG_TYPE_PACK_32,
+    BRIG_TYPE_S16X4 = BRIG_TYPE_S16 | BRIG_TYPE_PACK_64,
+    BRIG_TYPE_S16X8 = BRIG_TYPE_S16 | BRIG_TYPE_PACK_128,
+    BRIG_TYPE_S32X2 = BRIG_TYPE_S32 | BRIG_TYPE_PACK_64,
+    BRIG_TYPE_S32X4 = BRIG_TYPE_S32 | BRIG_TYPE_PACK_128,
+    BRIG_TYPE_S64X2 = BRIG_TYPE_S64 | BRIG_TYPE_PACK_128,
+    BRIG_TYPE_F16X2 = BRIG_TYPE_F16 | BRIG_TYPE_PACK_32,
+    BRIG_TYPE_F16X4 = BRIG_TYPE_F16 | BRIG_TYPE_PACK_64,
+    BRIG_TYPE_F16X8 = BRIG_TYPE_F16 | BRIG_TYPE_PACK_128,
+    BRIG_TYPE_F32X2 = BRIG_TYPE_F32 | BRIG_TYPE_PACK_64,
+    BRIG_TYPE_F32X4 = BRIG_TYPE_F32 | BRIG_TYPE_PACK_128,
+    BRIG_TYPE_F64X2 = BRIG_TYPE_F64 | BRIG_TYPE_PACK_128,
+
+    BRIG_TYPE_U8_ARRAY    = BRIG_TYPE_U8    | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_U16_ARRAY   = BRIG_TYPE_U16   | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_U32_ARRAY   = BRIG_TYPE_U32   | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_U64_ARRAY   = BRIG_TYPE_U64   | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S8_ARRAY    = BRIG_TYPE_S8    | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S16_ARRAY   = BRIG_TYPE_S16   | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S32_ARRAY   = BRIG_TYPE_S32   | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S64_ARRAY   = BRIG_TYPE_S64   | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_F16_ARRAY   = BRIG_TYPE_F16   | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_F32_ARRAY   = BRIG_TYPE_F32   | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_F64_ARRAY   = BRIG_TYPE_F64   | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_B8_ARRAY    = BRIG_TYPE_B8    | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_B16_ARRAY   = BRIG_TYPE_B16   | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_B32_ARRAY   = BRIG_TYPE_B32   | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_B64_ARRAY   = BRIG_TYPE_B64   | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_B128_ARRAY  = BRIG_TYPE_B128  | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_SAMP_ARRAY  = BRIG_TYPE_SAMP  | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_ROIMG_ARRAY = BRIG_TYPE_ROIMG | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_WOIMG_ARRAY = BRIG_TYPE_WOIMG | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_RWIMG_ARRAY = BRIG_TYPE_RWIMG | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_SIG32_ARRAY = BRIG_TYPE_SIG32 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_SIG64_ARRAY = BRIG_TYPE_SIG64 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_U8X4_ARRAY  = BRIG_TYPE_U8X4  | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_U8X8_ARRAY  = BRIG_TYPE_U8X8  | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_U8X16_ARRAY = BRIG_TYPE_U8X16 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_U16X2_ARRAY = BRIG_TYPE_U16X2 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_U16X4_ARRAY = BRIG_TYPE_U16X4 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_U16X8_ARRAY = BRIG_TYPE_U16X8 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_U32X2_ARRAY = BRIG_TYPE_U32X2 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_U32X4_ARRAY = BRIG_TYPE_U32X4 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_U64X2_ARRAY = BRIG_TYPE_U64X2 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S8X4_ARRAY  = BRIG_TYPE_S8X4  | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S8X8_ARRAY  = BRIG_TYPE_S8X8  | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S8X16_ARRAY = BRIG_TYPE_S8X16 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S16X2_ARRAY = BRIG_TYPE_S16X2 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S16X4_ARRAY = BRIG_TYPE_S16X4 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S16X8_ARRAY = BRIG_TYPE_S16X8 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S32X2_ARRAY = BRIG_TYPE_S32X2 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S32X4_ARRAY = BRIG_TYPE_S32X4 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_S64X2_ARRAY = BRIG_TYPE_S64X2 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_F16X2_ARRAY = BRIG_TYPE_F16X2 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_F16X4_ARRAY = BRIG_TYPE_F16X4 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_F16X8_ARRAY = BRIG_TYPE_F16X8 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_F32X2_ARRAY = BRIG_TYPE_F32X2 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_F32X4_ARRAY = BRIG_TYPE_F32X4 | BRIG_TYPE_ARRAY,
+    BRIG_TYPE_F64X2_ARRAY = BRIG_TYPE_F64X2 | BRIG_TYPE_ARRAY,
+
+    BRIG_TYPE_INVALID = (unsigned) -1
+};
+
+enum BrigVariableModifierMask {
+
+    BRIG_VARIABLE_DEFINITION = 1,
+    BRIG_VARIABLE_CONST = 2
+};
+
+enum BrigWidth {
+
+    BRIG_WIDTH_NONE = 0,
+    BRIG_WIDTH_1 = 1,
+    BRIG_WIDTH_2 = 2,
+    BRIG_WIDTH_4 = 3,
+    BRIG_WIDTH_8 = 4,
+    BRIG_WIDTH_16 = 5,
+    BRIG_WIDTH_32 = 6,
+    BRIG_WIDTH_64 = 7,
+    BRIG_WIDTH_128 = 8,
+    BRIG_WIDTH_256 = 9,
+    BRIG_WIDTH_512 = 10,
+    BRIG_WIDTH_1024 = 11,
+    BRIG_WIDTH_2048 = 12,
+    BRIG_WIDTH_4096 = 13,
+    BRIG_WIDTH_8192 = 14,
+    BRIG_WIDTH_16384 = 15,
+    BRIG_WIDTH_32768 = 16,
+    BRIG_WIDTH_65536 = 17,
+    BRIG_WIDTH_131072 = 18,
+    BRIG_WIDTH_262144 = 19,
+    BRIG_WIDTH_524288 = 20,
+    BRIG_WIDTH_1048576 = 21,
+    BRIG_WIDTH_2097152 = 22,
+    BRIG_WIDTH_4194304 = 23,
+    BRIG_WIDTH_8388608 = 24,
+    BRIG_WIDTH_16777216 = 25,
+    BRIG_WIDTH_33554432 = 26,
+    BRIG_WIDTH_67108864 = 27,
+    BRIG_WIDTH_134217728 = 28,
+    BRIG_WIDTH_268435456 = 29,
+    BRIG_WIDTH_536870912 = 30,
+    BRIG_WIDTH_1073741824 = 31,
+    BRIG_WIDTH_2147483648 = 32,
+    BRIG_WIDTH_WAVESIZE = 33,
+    BRIG_WIDTH_ALL = 34,
+
+    BRIG_WIDTH_LAST
+};
+
+struct BrigUInt64 {
+    uint32_t lo;
+    uint32_t hi;
+
+};
+
+struct BrigAluModifier {
+    BrigAluModifier8_t allBits;
+
+};
+
+struct BrigBase {
+    uint16_t byteCount;
+    BrigKind16_t kind;
+};
+
+struct BrigData {
+
+    uint32_t byteCount;
+    uint8_t bytes[1];
+};
+
+struct BrigExecutableModifier {
+    BrigExecutableModifier8_t allBits;
+
+};
+
+struct BrigMemoryModifier {
+    BrigMemoryModifier8_t allBits;
+
+};
+
+struct BrigSegCvtModifier {
+    BrigSegCvtModifier8_t allBits;
+
+};
+
+struct BrigVariableModifier {
+    BrigVariableModifier8_t allBits;
+
+};
+
+struct BrigDirectiveArgBlockEnd {
+    BrigBase base;
+};
+
+struct BrigDirectiveArgBlockStart {
+    BrigBase base;
+};
+
+struct BrigDirectiveComment {
+    BrigBase base;
+    BrigDataOffsetString32_t name;
+};
+
+struct BrigDirectiveControl {
+    BrigBase base;
+    BrigControlDirective16_t control;
+    uint16_t reserved;
+    BrigDataOffsetOperandList32_t operands;
+};
+
+struct BrigDirectiveExecutable {
+    BrigBase base;
+    BrigDataOffsetString32_t name;
+    uint16_t outArgCount;
+    uint16_t inArgCount;
+    BrigCodeOffset32_t firstInArg;
+    BrigCodeOffset32_t firstCodeBlockEntry;
+    BrigCodeOffset32_t nextModuleEntry;
+    BrigExecutableModifier modifier;
+    BrigLinkage8_t linkage;
+    uint16_t reserved;
+};
+
+struct BrigDirectiveExtension {
+    BrigBase base;
+    BrigDataOffsetString32_t name;
+};
+
+struct BrigDirectiveFbarrier {
+    BrigBase base;
+    BrigDataOffsetString32_t name;
+    BrigVariableModifier modifier;
+    BrigLinkage8_t linkage;
+    uint16_t reserved;
+};
+
+struct BrigDirectiveLabel {
+    BrigBase base;
+    BrigDataOffsetString32_t name;
+};
+
+struct BrigDirectiveLoc {
+    BrigBase base;
+    BrigDataOffsetString32_t filename;
+    uint32_t line;
+    uint32_t column;
+};
+
+struct BrigDirectiveNone {
+    BrigBase base;
+};
+
+struct BrigDirectivePragma {
+    BrigBase base;
+    BrigDataOffsetOperandList32_t operands;
+};
+
+struct BrigDirectiveVariable {
+    BrigBase base;
+    BrigDataOffsetString32_t name;
+    BrigOperandOffset32_t init;
+    BrigType16_t type;
+
+    BrigSegment8_t segment;
+    BrigAlignment8_t align;
+    BrigUInt64 dim;
+    BrigVariableModifier modifier;
+    BrigLinkage8_t linkage;
+    BrigAllocation8_t allocation;
+    uint8_t reserved;
+};
+
+struct BrigDirectiveModule {
+    BrigBase base;
+    BrigDataOffsetString32_t name;
+    BrigVersion32_t hsailMajor;
+    BrigVersion32_t hsailMinor;
+    BrigProfile8_t profile;
+    BrigMachineModel8_t machineModel;
+    BrigRound8_t defaultFloatRound;
+    uint8_t reserved;
+};
+
+struct BrigInstBase {
+    BrigBase base;
+    BrigOpcode16_t opcode;
+    BrigType16_t type;
+    BrigDataOffsetOperandList32_t operands;
+
+};
+
+struct BrigInstAddr {
+    BrigInstBase base;
+    BrigSegment8_t segment;
+    uint8_t reserved[3];
+};
+
+struct BrigInstAtomic {
+    BrigInstBase base;
+    BrigSegment8_t segment;
+    BrigMemoryOrder8_t memoryOrder;
+    BrigMemoryScope8_t memoryScope;
+    BrigAtomicOperation8_t atomicOperation;
+    uint8_t equivClass;
+    uint8_t reserved[3];
+};
+
+struct BrigInstBasic {
+    BrigInstBase base;
+};
+
+struct BrigInstBr {
+    BrigInstBase base;
+    BrigWidth8_t width;
+    uint8_t reserved[3];
+};
+
+struct BrigInstCmp {
+    BrigInstBase base;
+    BrigType16_t sourceType;
+    BrigAluModifier modifier;
+    BrigCompareOperation8_t compare;
+    BrigPack8_t pack;
+    uint8_t reserved[3];
+};
+
+struct BrigInstCvt {
+    BrigInstBase base;
+    BrigType16_t sourceType;
+    BrigAluModifier modifier;
+    BrigRound8_t round;
+};
+
+struct BrigInstImage {
+    BrigInstBase base;
+    BrigType16_t imageType;
+    BrigType16_t coordType;
+    BrigImageGeometry8_t geometry;
+    uint8_t equivClass;
+    uint16_t reserved;
+};
+
+struct BrigInstLane {
+    BrigInstBase base;
+    BrigType16_t sourceType;
+    BrigWidth8_t width;
+    uint8_t reserved;
+};
+
+struct BrigInstMem {
+    BrigInstBase base;
+    BrigSegment8_t segment;
+    BrigAlignment8_t align;
+    uint8_t equivClass;
+    BrigWidth8_t width;
+    BrigMemoryModifier modifier;
+    uint8_t reserved[3];
+};
+
+struct BrigInstMemFence {
+    BrigInstBase base;
+    BrigMemoryOrder8_t memoryOrder;
+    BrigMemoryScope8_t globalSegmentMemoryScope;
+    BrigMemoryScope8_t groupSegmentMemoryScope;
+    BrigMemoryScope8_t imageSegmentMemoryScope;
+};
+
+struct BrigInstMod {
+    BrigInstBase base;
+    BrigAluModifier modifier;
+    BrigRound8_t round;
+    BrigPack8_t pack;
+    uint8_t reserved;
+};
+
+struct BrigInstQueryImage {
+    BrigInstBase base;
+    BrigType16_t imageType;
+    BrigImageGeometry8_t geometry;
+    BrigImageQuery8_t imageQuery;
+};
+
+struct BrigInstQuerySampler {
+    BrigInstBase base;
+    BrigSamplerQuery8_t samplerQuery;
+    uint8_t reserved[3];
+};
+
+struct BrigInstQueue {
+    BrigInstBase base;
+    BrigSegment8_t segment;
+    BrigMemoryOrder8_t memoryOrder;
+    uint16_t reserved;
+};
+
+struct BrigInstSeg {
+    BrigInstBase base;
+    BrigSegment8_t segment;
+    uint8_t reserved[3];
+};
+
+struct BrigInstSegCvt {
+    BrigInstBase base;
+    BrigType16_t sourceType;
+    BrigSegment8_t segment;
+    BrigSegCvtModifier modifier;
+};
+
+struct BrigInstSignal {
+    BrigInstBase base;
+    BrigType16_t signalType;
+    BrigMemoryOrder8_t memoryOrder;
+    BrigAtomicOperation8_t signalOperation;
+};
+
+struct BrigInstSourceType {
+    BrigInstBase base;
+    BrigType16_t sourceType;
+    uint16_t reserved;
+};
+
+struct BrigOperandAddress {
+    BrigBase base;
+    BrigCodeOffset32_t symbol;
+    BrigOperandOffset32_t reg;
+    BrigUInt64 offset;
+};
+
+struct BrigOperandAlign {
+    BrigBase base;
+    BrigAlignment8_t align;
+    uint8_t reserved[3];
+};
+
+struct BrigOperandCodeList {
+    BrigBase base;
+    BrigDataOffsetCodeList32_t elements;
+
+};
+
+struct BrigOperandCodeRef {
+    BrigBase base;
+    BrigCodeOffset32_t ref;
+};
+
+struct BrigOperandConstantBytes {
+    BrigBase base;
+    BrigType16_t type;
+    uint16_t reserved;
+    BrigDataOffsetString32_t bytes;
+};
+
+struct BrigOperandConstantOperandList {
+    BrigBase base;
+    BrigType16_t type;
+    uint16_t reserved;
+    BrigDataOffsetOperandList32_t elements;
+
+};
+
+struct BrigOperandConstantImage {
+    BrigBase base;
+    BrigType16_t type;
+    BrigImageGeometry8_t geometry;
+    BrigImageChannelOrder8_t channelOrder;
+    BrigImageChannelType8_t channelType;
+    uint8_t reserved[3];
+    BrigUInt64 width;
+    BrigUInt64 height;
+    BrigUInt64 depth;
+    BrigUInt64 array;
+};
+
+struct BrigOperandOperandList {
+    BrigBase base;
+    BrigDataOffsetOperandList32_t elements;
+
+};
+
+struct BrigOperandRegister {
+    BrigBase base;
+    BrigRegisterKind16_t regKind;
+    uint16_t regNum;
+};
+
+struct BrigOperandConstantSampler {
+    BrigBase base;
+    BrigType16_t type;
+    BrigSamplerCoordNormalization8_t coord;
+    BrigSamplerFilter8_t filter;
+    BrigSamplerAddressing8_t addressing;
+    uint8_t reserved[3];
+};
+
+struct BrigOperandString {
+    BrigBase base;
+    BrigDataOffsetString32_t string;
+};
+
+struct BrigOperandWavesize {
+    BrigBase base;
+};
+
+enum BrigExceptionsMask {
+    BRIG_EXCEPTIONS_INVALID_OPERATION = 1 << 0,
+    BRIG_EXCEPTIONS_DIVIDE_BY_ZERO = 1 << 1,
+    BRIG_EXCEPTIONS_OVERFLOW = 1 << 2,
+    BRIG_EXCEPTIONS_UNDERFLOW = 1 << 3,
+    BRIG_EXCEPTIONS_INEXACT = 1 << 4,
+
+    BRIG_EXCEPTIONS_FIRST_USER_DEFINED = 1 << 16
+};
+
+struct BrigSectionHeader {
+    uint64_t byteCount;
+    uint32_t headerByteCount;
+    uint32_t nameLength;
+    uint8_t name[1];
+};
+
+#define MODULE_IDENTIFICATION_LENGTH (8)
+
+struct BrigModuleHeader {
+    char identification[MODULE_IDENTIFICATION_LENGTH];
+    BrigVersion32_t brigMajor;
+    BrigVersion32_t brigMinor;
+    uint64_t byteCount;
+    uint8_t hash[64];
+    uint32_t reserved;
+    uint32_t sectionCount;
+    uint64_t sectionIndex;
+};
+
+typedef BrigModuleHeader* BrigModule_t;
+
+#endif /* HSA_BRIG_FORMAT_H */

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [hsa merge 05/10] OpenMP lowering/expansion changes (gridification)
  2016-01-13 17:55 [hsa merge 00/10] Merge of HSA branch Martin Jambor
                   ` (8 preceding siblings ...)
  2016-01-13 18:12 ` [hsa merge 08/10] HSAIL BRIG description header file Martin Jambor
@ 2016-01-13 18:17 ` Martin Jambor
  2016-01-14 12:52   ` Jakub Jelinek
  2016-01-14 14:22   ` Alexander Monakov
  2016-01-19 10:45 ` [hsa merge 00/10] Merge of HSA branch Martin Jambor
  10 siblings, 2 replies; 77+ messages in thread
From: Martin Jambor @ 2016-01-13 18:17 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jakub Jelinek

[-- Attachment #1: omp.diff --]
[-- Type: text/plain, Size: 81527 bytes --]

Hi,

the patch in this email contains the changes to make our OpenMP
lowering and expansion machinery produce GPU kernels for a certain
limited class of loops.

The following is a re-post of
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00718.html with a fair
amount of incorporate feedback, almost all of which has been posted in
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01884.html.

Thanks,

Martin


2016-01-13  Martin Jambor  <mjambor@suse.cz>

gcc/
	* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
	* gimple-low.c (lower_stmt): Also handle GIMPLE_OMP_GRID_BODY.
	* gimple-pretty-print.c (dump_gimple_omp_for): Also handle
	GF_OMP_FOR_KIND_GRID_LOOP.
	(dump_gimple_omp_block): Also handle GIMPLE_OMP_GRID_BODY.
	(pp_gimple_stmt_1): Likewise.
	* gimple-walk.c (walk_gimple_stmt): Likewise.
	* gimple.c (gimple_build_omp_grid_body): New function.
	(gimple_copy): Also handle GIMPLE_OMP_GRID_BODY.
	* gimple.def (GIMPLE_OMP_GRID_BODY): New.
	* gimple.h (enum gf_mask): Added GF_OMP_PARALLEL_GRID_PHONY,
	GF_OMP_FOR_KIND_GRID_LOOP, GF_OMP_FOR_GRID_PHONY and
	GF_OMP_TEAMS_GRID_PHONY.
	(gimple_statement_omp_single_layout): Updated comments.
	(gimple_build_omp_grid_body): New function.
	(gimple_has_substatements): Also handle GIMPLE_OMP_GRID_BODY.
	(gimple_omp_for_grid_phony): New function.
	(gimple_omp_for_set_grid_phony): Likewise.
	(gimple_omp_parallel_grid_phony): Likewise.
	(gimple_omp_parallel_set_grid_phony): Likewise.
	(gimple_omp_teams_grid_phony): Likewise.
	(gimple_omp_teams_set_grid_phony): Likewise.
	(gimple_return_set_retbnd): Also handle GIMPLE_OMP_GRID_BODY.
	* omp-builtins.def (BUILT_IN_GOMP_OFFLOAD_REGISTER): New.
	(BUILT_IN_GOMP_OFFLOAD_UNREGISTER): Likewise.
	(BUILT_IN_GOMP_TARGET): Updated type.
	* omp-low.c: Include symbol-summary.h, hsa.h and params.h.
	(adjust_for_condition): New function.
	(get_omp_for_step_from_incr): Likewise.
	(extract_omp_for_data): Moved parts to adjust_for_condition and
	get_omp_for_step_from_incr.
	(build_outer_var_ref): Handle GIMPLE_OMP_GRID_BODY.
	(fixup_child_record_type): Bail out if receiver_decl is NULL.
	(scan_sharing_clauses): Handle OMP_CLAUSE__GRIDDIM_.
	(scan_omp_parallel): Do not create child functions for phony
	constructs.
	(check_omp_nesting_restrictions): Handle GIMPLE_OMP_GRID_BODY.
	(scan_omp_1_op): Checking assert we are not remapping to
	ERROR_MARK.  Also also handle GIMPLE_OMP_GRID_BODY.
	(parallel_needs_hsa_kernel_p): New function.
	(expand_parallel_call): Register apprpriate parallel child
	functions as HSA kernels.
	(grid_launch_attributes_trees): New type.
	(grid_attr_trees): New variable.
	(grid_create_kernel_launch_attr_types): New function.
	(grid_insert_store_range_dim): Likewise.
	(grid_get_kernel_launch_attributes): Likewise.
	(get_target_argument_identifier_1): Likewise.
	(get_target_argument_identifier): Likewise.
	(get_target_argument_value): Likewise.
	(push_target_argument_according_to_value): Likewise.
	(get_target_arguments): Likewise.
	(expand_omp_target): Call get_target_arguments instead of looking
	up for teams and thread limit.
	(grid_expand_omp_for_loop): New function.
	(grid_arg_decl_map): New type.
	(grid_remap_kernel_arg_accesses): New function.
	(grid_expand_target_kernel_body): New function.
	(expand_omp): Call it.
	(lower_omp_for): Do not emit phony constructs.
	(lower_omp_taskreg): Do not emit phony constructs but create for them
	a temporary variable receiver_decl.
	(lower_omp_taskreg): Do not emit phony constructs.
	(lower_omp_teams): Likewise.
	(lower_omp_grid_body): New function.
	(lower_omp_1): Call it.
	(grid_reg_assignment_to_local_var_p): New function.
	(grid_seq_only_contains_local_assignments): Likewise.
	(grid_find_single_omp_among_assignments_1): Likewise.
	(grid_find_single_omp_among_assignments): Likewise.
	(grid_find_ungridifiable_statement): Likewise.
	(grid_target_follows_gridifiable_pattern): Likewise.
	(grid_remap_prebody_decls): Likewise.
	(grid_copy_leading_local_assignments): Likewise.
	(grid_process_kernel_body_copy): Likewise.
	(grid_attempt_target_gridification): Likewise.
	(grid_gridify_all_targets_stmt): Likewise.
	(grid_gridify_all_targets): Likewise.
	(execute_lower_omp): Call grid_gridify_all_targets.
	(make_gimple_omp_edges): Handle GIMPLE_OMP_GRID_BODY.
	* tree-core.h (omp_clause_code): Added OMP_CLAUSE__GRIDDIM_.
	(tree_omp_clause): Added union field dimension.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__GRIDDIM_.
	* tree.c (omp_clause_num_ops): Added number of arguments of
	OMP_CLAUSE__GRIDDIM_.
	(omp_clause_code_name): Added name of OMP_CLAUSE__GRIDDIM_.
	(walk_tree_1): Handle OMP_CLAUSE__GRIDDIM_.
	* tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): New.
	(OMP_CLAUSE_SET_GRIDDIM_DIMENSION): Likewise.
	(OMP_CLAUSE_GRIDDIM_SIZE): Likewise.
	(OMP_CLAUSE_GRIDDIM_GROUP): Likewise.

fortran/
	* types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c3b6e13..7fab9f8 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -478,6 +478,8 @@ DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
 DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_ULLPTR_ULLPTR_ULLPTR,
 		     BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_PTR_ULONGLONG,
 		     BT_PTR_ULONGLONG)
+DEF_FUNCTION_TYPE_4 (BT_FN_VOID_UINT_PTR_INT_PTR, BT_VOID, BT_INT, BT_PTR,
+		     BT_INT, BT_PTR)
 
 DEF_FUNCTION_TYPE_5 (BT_FN_INT_STRING_INT_SIZE_CONST_STRING_VALIST_ARG,
 		     BT_INT, BT_STRING, BT_INT, BT_SIZE, BT_CONST_STRING,
@@ -555,10 +557,9 @@ DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
 		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
 		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
-
-DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
-		      BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
-		      BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
+DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
+		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
+		     BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_11 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
 		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 43a1b46..8780b76 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -159,6 +159,8 @@ DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
 DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_ULLPTR_ULLPTR_ULLPTR,
 		     BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_PTR_ULONGLONG,
 		     BT_PTR_ULONGLONG)
+DEF_FUNCTION_TYPE_4 (BT_FN_VOID_UINT_PTR_INT_PTR, BT_VOID, BT_INT, BT_PTR,
+		     BT_INT, BT_PTR)
 
 DEF_FUNCTION_TYPE_5 (BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT,
@@ -220,10 +222,9 @@ DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
 		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
 		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
-
-DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
+DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
 		      BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
-		      BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
+		      BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_11 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
 		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
diff --git a/gcc/gimple-low.c b/gcc/gimple-low.c
index f9b0fba..eb90d48 100644
--- a/gcc/gimple-low.c
+++ b/gcc/gimple-low.c
@@ -358,6 +358,7 @@ lower_stmt (gimple_stmt_iterator *gsi, struct lower_data *data)
     case GIMPLE_OMP_TASK:
     case GIMPLE_OMP_TARGET:
     case GIMPLE_OMP_TEAMS:
+    case GIMPLE_OMP_GRID_BODY:
       data->cannot_fallthru = false;
       lower_omp_directive (gsi, data);
       data->cannot_fallthru = false;
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index b26d7a7..e27214f 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1187,6 +1187,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gomp_for *gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_CILKSIMD:
 	  pp_string (buffer, "#pragma simd");
 	  break;
+	case GF_OMP_FOR_KIND_GRID_LOOP:
+	  pp_string (buffer, "#pragma omp for grid_loop");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -1494,6 +1497,9 @@ dump_gimple_omp_block (pretty_printer *buffer, gimple *gs, int spc, int flags)
 	case GIMPLE_OMP_SECTION:
 	  pp_string (buffer, "#pragma omp section");
 	  break;
+	case GIMPLE_OMP_GRID_BODY:
+	  pp_string (buffer, "#pragma omp gridified body");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -2301,6 +2307,7 @@ pp_gimple_stmt_1 (pretty_printer *buffer, gimple *gs, int spc, int flags)
     case GIMPLE_OMP_MASTER:
     case GIMPLE_OMP_TASKGROUP:
     case GIMPLE_OMP_SECTION:
+    case GIMPLE_OMP_GRID_BODY:
       dump_gimple_omp_block (buffer, gs, spc, flags);
       break;
 
diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index 6c0a46d..15cd842 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -655,6 +655,7 @@ walk_gimple_stmt (gimple_stmt_iterator *gsi, walk_stmt_fn callback_stmt,
     case GIMPLE_OMP_SINGLE:
     case GIMPLE_OMP_TARGET:
     case GIMPLE_OMP_TEAMS:
+    case GIMPLE_OMP_GRID_BODY:
       ret = walk_gimple_seq_mod (gimple_omp_body_ptr (stmt), callback_stmt,
 			     callback_op, wi);
       if (ret)
diff --git a/gcc/gimple.c b/gcc/gimple.c
index f312f90..60fe91c 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -954,6 +954,19 @@ gimple_build_omp_master (gimple_seq body)
   return p;
 }
 
+/* Build a GIMPLE_OMP_GRID_BODY statement.
+
+   BODY is the sequence of statements to be executed by the kernel.  */
+
+gimple *
+gimple_build_omp_grid_body (gimple_seq body)
+{
+  gimple *p = gimple_alloc (GIMPLE_OMP_GRID_BODY, 0);
+  if (body)
+    gimple_omp_set_body (p, body);
+
+  return p;
+}
 
 /* Build a GIMPLE_OMP_TASKGROUP statement.
 
@@ -1807,6 +1820,7 @@ gimple_copy (gimple *stmt)
 	case GIMPLE_OMP_SECTION:
 	case GIMPLE_OMP_MASTER:
 	case GIMPLE_OMP_TASKGROUP:
+	case GIMPLE_OMP_GRID_BODY:
 	copy_omp_body:
 	  new_seq = gimple_seq_copy (gimple_omp_body (stmt));
 	  gimple_omp_set_body (copy, new_seq);
diff --git a/gcc/gimple.def b/gcc/gimple.def
index db23dee..2ff22b8 100644
--- a/gcc/gimple.def
+++ b/gcc/gimple.def
@@ -376,6 +376,10 @@ DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_SINGLE_LAYOUT)
    CLAUSES is an OMP_CLAUSE chain holding the associated clauses.  */
 DEFGSCODE(GIMPLE_OMP_ORDERED, "gimple_omp_ordered", GSS_OMP_SINGLE_LAYOUT)
 
+/* GIMPLE_OMP_GRID_BODY <BODY> represents a parallel loop lowered for execution
+   on a GPU.  It is an artificial statement created by omp lowering.  */
+DEFGSCODE(GIMPLE_OMP_GRID_BODY, "gimple_omp_gpukernel", GSS_OMP)
+
 /* GIMPLE_PREDICT <PREDICT, OUTCOME> specifies a hint for branch prediction.
 
    PREDICT is one of the predictors from predict.def.
diff --git a/gcc/gimple.h b/gcc/gimple.h
index b30a9b8..7eef07c 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -146,6 +146,7 @@ enum gf_mask {
     GF_CALL_CTRL_ALTERING       = 1 << 7,
     GF_CALL_WITH_BOUNDS 	= 1 << 8,
     GF_OMP_PARALLEL_COMBINED	= 1 << 0,
+    GF_OMP_PARALLEL_GRID_PHONY = 1 << 1,
     GF_OMP_TASK_TASKLOOP	= 1 << 0,
     GF_OMP_FOR_KIND_MASK	= (1 << 4) - 1,
     GF_OMP_FOR_KIND_FOR		= 0,
@@ -153,12 +154,14 @@ enum gf_mask {
     GF_OMP_FOR_KIND_TASKLOOP	= 2,
     GF_OMP_FOR_KIND_CILKFOR     = 3,
     GF_OMP_FOR_KIND_OACC_LOOP	= 4,
+    GF_OMP_FOR_KIND_GRID_LOOP = 5,
     /* Flag for SIMD variants of OMP_FOR kinds.  */
     GF_OMP_FOR_SIMD		= 1 << 3,
     GF_OMP_FOR_KIND_SIMD	= GF_OMP_FOR_SIMD | 0,
     GF_OMP_FOR_KIND_CILKSIMD	= GF_OMP_FOR_SIMD | 1,
     GF_OMP_FOR_COMBINED		= 1 << 4,
     GF_OMP_FOR_COMBINED_INTO	= 1 << 5,
+    GF_OMP_FOR_GRID_PHONY	= 1 << 6,
     GF_OMP_TARGET_KIND_MASK	= (1 << 4) - 1,
     GF_OMP_TARGET_KIND_REGION	= 0,
     GF_OMP_TARGET_KIND_DATA	= 1,
@@ -172,6 +175,7 @@ enum gf_mask {
     GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 9,
     GF_OMP_TARGET_KIND_OACC_DECLARE = 10,
     GF_OMP_TARGET_KIND_OACC_HOST_DATA = 11,
+    GF_OMP_TEAMS_GRID_PHONY	= 1 << 0,
 
     /* True on an GIMPLE_OMP_RETURN statement if the return does not require
        a thread synchronization via some sort of barrier.  The exact barrier
@@ -726,14 +730,14 @@ struct GTY((tag("GSS_OMP_CONTINUE")))
   tree control_use;
 };
 
-/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_TEAMS, GIMPLE_OMP_ORDERED */
+/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_ORDERED */
 
 struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
   gimple_statement_omp_single_layout : public gimple_statement_omp
 {
   /* [ WORD 1-7 ] : base class */
 
-  /* [ WORD 7 ]  */
+  /* [ WORD 8 ]  */
   tree clauses;
 };
 
@@ -1454,6 +1458,7 @@ gomp_task *gimple_build_omp_task (gimple_seq, tree, tree, tree, tree,
 				       tree, tree);
 gimple *gimple_build_omp_section (gimple_seq);
 gimple *gimple_build_omp_master (gimple_seq);
+gimple *gimple_build_omp_grid_body (gimple_seq);
 gimple *gimple_build_omp_taskgroup (gimple_seq);
 gomp_continue *gimple_build_omp_continue (tree, tree);
 gomp_ordered *gimple_build_omp_ordered (gimple_seq, tree);
@@ -1714,6 +1719,7 @@ gimple_has_substatements (gimple *g)
     case GIMPLE_OMP_CRITICAL:
     case GIMPLE_WITH_CLEANUP_EXPR:
     case GIMPLE_TRANSACTION:
+    case GIMPLE_OMP_GRID_BODY:
       return true;
 
     default:
@@ -5079,6 +5085,24 @@ gimple_omp_for_set_pre_body (gimple *gs, gimple_seq pre_body)
   omp_for_stmt->pre_body = pre_body;
 }
 
+/* Return the kernel_phony of OMP_FOR statement.  */
+
+static inline bool
+gimple_omp_for_grid_phony (const gomp_for *omp_for)
+{
+  return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_PHONY) != 0;
+}
+
+/* Set kernel_phony flag of OMP_FOR to VALUE.  */
+
+static inline void
+gimple_omp_for_set_grid_phony (gomp_for *omp_for, bool value)
+{
+  if (value)
+    omp_for->subcode |= GF_OMP_FOR_GRID_PHONY;
+  else
+    omp_for->subcode &= ~GF_OMP_FOR_GRID_PHONY;
+}
 
 /* Return the clauses associated with OMP_PARALLEL GS.  */
 
@@ -5165,6 +5189,24 @@ gimple_omp_parallel_set_data_arg (gomp_parallel *omp_parallel_stmt,
   omp_parallel_stmt->data_arg = data_arg;
 }
 
+/* Return the kernel_phony flag of OMP_PARALLEL_STMT.  */
+
+static inline bool
+gimple_omp_parallel_grid_phony (const gomp_parallel *stmt)
+{
+  return (gimple_omp_subcode (stmt) & GF_OMP_PARALLEL_GRID_PHONY) != 0;
+}
+
+/* Set kernel_phony flag of OMP_PARALLEL_STMT to VALUE.  */
+
+static inline void
+gimple_omp_parallel_set_grid_phony (gomp_parallel *stmt, bool value)
+{
+  if (value)
+    stmt->subcode |= GF_OMP_PARALLEL_GRID_PHONY;
+  else
+    stmt->subcode &= ~GF_OMP_PARALLEL_GRID_PHONY;
+}
 
 /* Return the clauses associated with OMP_TASK GS.  */
 
@@ -5638,6 +5680,24 @@ gimple_omp_teams_set_clauses (gomp_teams *omp_teams_stmt, tree clauses)
   omp_teams_stmt->clauses = clauses;
 }
 
+/* Return the kernel_phony flag of an OMP_TEAMS_STMT.  */
+
+static inline bool
+gimple_omp_teams_grid_phony (const gomp_teams *omp_teams_stmt)
+{
+  return (gimple_omp_subcode (omp_teams_stmt) & GF_OMP_TEAMS_GRID_PHONY) != 0;
+}
+
+/* Set kernel_phony flag of an OMP_TEAMS_STMT to VALUE.  */
+
+static inline void
+gimple_omp_teams_set_grid_phony (gomp_teams *omp_teams_stmt, bool value)
+{
+  if (value)
+    omp_teams_stmt->subcode |= GF_OMP_TEAMS_GRID_PHONY;
+  else
+    omp_teams_stmt->subcode &= ~GF_OMP_TEAMS_GRID_PHONY;
+}
 
 /* Return the clauses associated with OMP_SECTIONS GS.  */
 
@@ -6002,7 +6062,8 @@ gimple_return_set_retbnd (gimple *gs, tree retval)
     case GIMPLE_OMP_RETURN:			\
     case GIMPLE_OMP_ATOMIC_LOAD:		\
     case GIMPLE_OMP_ATOMIC_STORE:		\
-    case GIMPLE_OMP_CONTINUE
+    case GIMPLE_OMP_CONTINUE:			\
+    case GIMPLE_OMP_GRID_BODY
 
 static inline bool
 is_gimple_omp (const gimple *stmt)
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 5529547..60199b0 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -340,8 +340,13 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_START, "GOMP_single_copy_start",
 		  BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_END, "GOMP_single_copy_end",
 		  BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_OFFLOAD_REGISTER, "GOMP_offload_register_ver",
+		  BT_FN_VOID_UINT_PTR_INT_PTR, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_OFFLOAD_UNREGISTER,
+		  "GOMP_offload_unregister_ver",
+		  BT_FN_VOID_UINT_PTR_INT_PTR, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET, "GOMP_target_ext",
-		  BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
+		  BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
 		  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_DATA, "GOMP_target_data_ext",
 		  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 9dda946..8bc0e60 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -80,6 +80,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "lto-section-names.h"
 #include "gomp-constants.h"
 #include "gimple-pretty-print.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+#include "params.h"
 
 /* Lowering of OMP parallel and workshare constructs proceeds in two
    phases.  The first phase scans the function looking for OMP statements
@@ -450,6 +453,63 @@ is_combined_parallel (struct omp_region *region)
   return region->is_combined_parallel;
 }
 
+/* Adjust *COND_CODE and *N2 so that the former is either LT_EXPR or
+   GT_EXPR.  */
+
+static void
+adjust_for_condition (location_t loc, enum tree_code *cond_code, tree *n2)
+{
+  switch (*cond_code)
+    {
+    case LT_EXPR:
+    case GT_EXPR:
+    case NE_EXPR:
+      break;
+    case LE_EXPR:
+      if (POINTER_TYPE_P (TREE_TYPE (*n2)))
+	*n2 = fold_build_pointer_plus_hwi_loc (loc, *n2, 1);
+      else
+	*n2 = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (*n2), *n2,
+			       build_int_cst (TREE_TYPE (*n2), 1));
+      *cond_code = LT_EXPR;
+      break;
+    case GE_EXPR:
+      if (POINTER_TYPE_P (TREE_TYPE (*n2)))
+	*n2 = fold_build_pointer_plus_hwi_loc (loc, *n2, -1);
+      else
+	*n2 = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (*n2), *n2,
+			       build_int_cst (TREE_TYPE (*n2), 1));
+      *cond_code = GT_EXPR;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return the looping step from INCR, extracted from the step of a gimple omp
+   for statement.  */
+
+static tree
+get_omp_for_step_from_incr (location_t loc, tree incr)
+{
+  tree step;
+  switch (TREE_CODE (incr))
+    {
+    case PLUS_EXPR:
+      step = TREE_OPERAND (incr, 1);
+      break;
+    case POINTER_PLUS_EXPR:
+      step = fold_convert (ssizetype, TREE_OPERAND (incr, 1));
+      break;
+    case MINUS_EXPR:
+      step = TREE_OPERAND (incr, 1);
+      step = fold_build1_loc (loc, NEGATE_EXPR, TREE_TYPE (step), step);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  return step;
+}
 
 /* Extract the header elements of parallel loop FOR_STMT and store
    them into *FD.  */
@@ -579,58 +639,14 @@ extract_omp_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
 
       loop->cond_code = gimple_omp_for_cond (for_stmt, i);
       loop->n2 = gimple_omp_for_final (for_stmt, i);
-      switch (loop->cond_code)
-	{
-	case LT_EXPR:
-	case GT_EXPR:
-	  break;
-	case NE_EXPR:
-	  gcc_assert (gimple_omp_for_kind (for_stmt)
-		      == GF_OMP_FOR_KIND_CILKSIMD
-		      || (gimple_omp_for_kind (for_stmt)
-			  == GF_OMP_FOR_KIND_CILKFOR));
-	  break;
-	case LE_EXPR:
-	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
-	    loop->n2 = fold_build_pointer_plus_hwi_loc (loc, loop->n2, 1);
-	  else
-	    loop->n2 = fold_build2_loc (loc,
-				    PLUS_EXPR, TREE_TYPE (loop->n2), loop->n2,
-				    build_int_cst (TREE_TYPE (loop->n2), 1));
-	  loop->cond_code = LT_EXPR;
-	  break;
-	case GE_EXPR:
-	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
-	    loop->n2 = fold_build_pointer_plus_hwi_loc (loc, loop->n2, -1);
-	  else
-	    loop->n2 = fold_build2_loc (loc,
-				    MINUS_EXPR, TREE_TYPE (loop->n2), loop->n2,
-				    build_int_cst (TREE_TYPE (loop->n2), 1));
-	  loop->cond_code = GT_EXPR;
-	  break;
-	default:
-	  gcc_unreachable ();
-	}
+      gcc_assert (loop->cond_code != NE_EXPR
+		  || gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_CILKSIMD
+		  || gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_CILKFOR);
+      adjust_for_condition (loc, &loop->cond_code, &loop->n2);
 
       t = gimple_omp_for_incr (for_stmt, i);
       gcc_assert (TREE_OPERAND (t, 0) == var);
-      switch (TREE_CODE (t))
-	{
-	case PLUS_EXPR:
-	  loop->step = TREE_OPERAND (t, 1);
-	  break;
-	case POINTER_PLUS_EXPR:
-	  loop->step = fold_convert (ssizetype, TREE_OPERAND (t, 1));
-	  break;
-	case MINUS_EXPR:
-	  loop->step = TREE_OPERAND (t, 1);
-	  loop->step = fold_build1_loc (loc,
-				    NEGATE_EXPR, TREE_TYPE (loop->step),
-				    loop->step);
-	  break;
-	default:
-	  gcc_unreachable ();
-	}
+      loop->step = get_omp_for_step_from_incr (loc, t);
 
       if (simd
 	  || (fd->sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
@@ -1321,7 +1337,16 @@ build_outer_var_ref (tree var, omp_context *ctx, bool lastprivate = false)
 	}
     }
   else if (ctx->outer)
-    x = lookup_decl (var, ctx->outer);
+    {
+      omp_context *outer = ctx->outer;
+      if (gimple_code (outer->stmt) == GIMPLE_OMP_GRID_BODY)
+	{
+	  outer = outer->outer;
+	  gcc_assert (outer
+		      && gimple_code (outer->stmt) != GIMPLE_OMP_GRID_BODY);
+	}
+	x = lookup_decl (var, outer);
+    }
   else if (is_reference (var))
     /* This can happen with orphaned constructs.  If var is reference, it is
        possible it is shared and as such valid.  */
@@ -1774,6 +1799,8 @@ fixup_child_record_type (omp_context *ctx)
 {
   tree f, type = ctx->record_type;
 
+  if (!ctx->receiver_decl)
+    return;
   /* ??? It isn't sufficient to just call remap_type here, because
      variably_modified_type_p doesn't work the way we expect for
      record types.  Testing each field for whether it needs remapping
@@ -2132,6 +2159,14 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	    }
 	  break;
 
+	case OMP_CLAUSE__GRIDDIM_:
+	  if (ctx->outer)
+	    {
+	      scan_omp_op (&OMP_CLAUSE__GRIDDIM__SIZE (c), ctx->outer);
+	      scan_omp_op (&OMP_CLAUSE__GRIDDIM__GROUP (c), ctx->outer);
+	    }
+	  break;
+
 	case OMP_CLAUSE_NOWAIT:
 	case OMP_CLAUSE_ORDERED:
 	case OMP_CLAUSE_COLLAPSE:
@@ -2327,6 +2362,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	case OMP_CLAUSE_INDEPENDENT:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE__GRIDDIM_:
 	  break;
 
 	case OMP_CLAUSE_DEVICE_RESIDENT:
@@ -2648,8 +2684,11 @@ scan_omp_parallel (gimple_stmt_iterator *gsi, omp_context *outer_ctx)
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx->record_type) = name;
   TYPE_ARTIFICIAL (ctx->record_type) = 1;
-  create_omp_child_function (ctx, false);
-  gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn);
+  if (!gimple_omp_parallel_grid_phony (stmt))
+    {
+      create_omp_child_function (ctx, false);
+      gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn);
+    }
 
   scan_sharing_clauses (gimple_omp_parallel_clauses (stmt), ctx);
   scan_omp (gimple_omp_body_ptr (stmt), ctx);
@@ -3189,6 +3228,11 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 {
   tree c;
 
+  if (ctx && gimple_code (ctx->stmt) == GIMPLE_OMP_GRID_BODY)
+    /* GRID_BODY is an artificial construct, nesting rules will be checked in
+       the original copy of its contents.  */
+    return true;
+
   /* No nesting of non-OpenACC STMT (that is, an OpenMP one, or a GOMP builtin)
      inside an OpenACC CTX.  */
   if (!(is_gimple_omp (stmt)
@@ -3777,7 +3821,11 @@ scan_omp_1_op (tree *tp, int *walk_subtrees, void *data)
     case LABEL_DECL:
     case RESULT_DECL:
       if (ctx)
-	*tp = remap_decl (t, &ctx->cb);
+	{
+	  tree repl = remap_decl (t, &ctx->cb);
+	  gcc_checking_assert (TREE_CODE (repl) != ERROR_MARK);
+	  *tp = repl;
+	}
       break;
 
     default:
@@ -3911,6 +3959,7 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
     case GIMPLE_OMP_TASKGROUP:
     case GIMPLE_OMP_ORDERED:
     case GIMPLE_OMP_CRITICAL:
+    case GIMPLE_OMP_GRID_BODY:
       ctx = new_omp_context (stmt, ctx);
       scan_omp (gimple_omp_body_ptr (stmt), ctx);
       break;
@@ -6343,6 +6392,37 @@ gimple_build_cond_empty (tree cond)
   return gimple_build_cond (pred_code, lhs, rhs, NULL_TREE, NULL_TREE);
 }
 
+/* Return true if a parallel REGION is within a declare target function or
+   within a target region and is not a part of a gridified target.  */
+
+static bool
+parallel_needs_hsa_kernel_p (struct omp_region *region)
+{
+  bool indirect = false;
+  for (region = region->outer; region; region = region->outer)
+    {
+      if (region->type == GIMPLE_OMP_PARALLEL)
+	indirect = true;
+      else if (region->type == GIMPLE_OMP_TARGET)
+	{
+	  gomp_target *tgt_stmt
+	    = as_a <gomp_target *> (last_stmt (region->entry));
+
+	  if (find_omp_clause (gimple_omp_target_clauses (tgt_stmt),
+			       OMP_CLAUSE__GRIDDIM_))
+	    return indirect;
+	  else
+	    return true;
+	}
+    }
+
+  if (lookup_attribute ("omp declare target",
+			DECL_ATTRIBUTES (current_function_decl)))
+    return true;
+
+  return false;
+}
+
 static void expand_omp_build_assign (gimple_stmt_iterator *, tree, tree,
 				     bool = false);
 
@@ -6512,7 +6592,8 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
     t1 = null_pointer_node;
   else
     t1 = build_fold_addr_expr (t);
-  t2 = build_fold_addr_expr (gimple_omp_parallel_child_fn (entry_stmt));
+  tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
+  t2 = build_fold_addr_expr (child_fndecl);
 
   vec_alloc (args, 4 + vec_safe_length (ws_args));
   args->quick_push (t2);
@@ -6527,6 +6608,13 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
 
   force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
 			    false, GSI_CONTINUE_LINKING);
+
+  if (hsa_gen_requested_p ()
+      && parallel_needs_hsa_kernel_p (region))
+    {
+      cgraph_node *child_cnode = cgraph_node::get (child_fndecl);
+      hsa_register_kernel (child_cnode);
+    }
 }
 
 /* Insert a function call whose name is FUNC_NAME with the information from
@@ -12566,6 +12654,236 @@ mark_loops_in_oacc_kernels_region (basic_block region_entry,
     loop->in_oacc_kernels_region = true;
 }
 
+/* Types used to pass grid and wortkgroup sizes to kernel invocation.  */
+
+struct GTY(()) grid_launch_attributes_trees
+{
+  tree kernel_dim_array_type;
+  tree kernel_lattrs_dimnum_decl;
+  tree kernel_lattrs_grid_decl;
+  tree kernel_lattrs_group_decl;
+  tree kernel_launch_attributes_type;
+};
+
+static GTY(()) struct grid_launch_attributes_trees *grid_attr_trees;
+
+/* Create types used to pass kernel launch attributes to target.  */
+
+static void
+grid_create_kernel_launch_attr_types (void)
+{
+  if (grid_attr_trees)
+    return;
+  grid_attr_trees = ggc_alloc <grid_launch_attributes_trees> ();
+
+  tree dim_arr_index_type
+    = build_index_type (build_int_cst (integer_type_node, 2));
+  grid_attr_trees->kernel_dim_array_type
+    = build_array_type (uint32_type_node, dim_arr_index_type);
+
+  grid_attr_trees->kernel_launch_attributes_type = make_node (RECORD_TYPE);
+  grid_attr_trees->kernel_lattrs_dimnum_decl
+    = build_decl (BUILTINS_LOCATION, FIELD_DECL, get_identifier ("ndim"),
+		  uint32_type_node);
+  DECL_CHAIN (grid_attr_trees->kernel_lattrs_dimnum_decl) = NULL_TREE;
+
+  grid_attr_trees->kernel_lattrs_grid_decl
+    = build_decl (BUILTINS_LOCATION, FIELD_DECL, get_identifier ("grid_size"),
+		  grid_attr_trees->kernel_dim_array_type);
+  DECL_CHAIN (grid_attr_trees->kernel_lattrs_grid_decl)
+    = grid_attr_trees->kernel_lattrs_dimnum_decl;
+  grid_attr_trees->kernel_lattrs_group_decl
+    = build_decl (BUILTINS_LOCATION, FIELD_DECL, get_identifier ("group_size"),
+		  grid_attr_trees->kernel_dim_array_type);
+  DECL_CHAIN (grid_attr_trees->kernel_lattrs_group_decl)
+    = grid_attr_trees->kernel_lattrs_grid_decl;
+  finish_builtin_struct (grid_attr_trees->kernel_launch_attributes_type,
+			 "__gomp_kernel_launch_attributes",
+			 grid_attr_trees->kernel_lattrs_group_decl, NULL_TREE);
+}
+
+/* Insert before the current statement in GSI a store of VALUE to INDEX of
+   array (of type kernel_dim_array_type) FLD_DECL of RANGE_VAR.  VALUE must be
+   of type uint32_type_node.  */
+
+static void
+grid_insert_store_range_dim (gimple_stmt_iterator *gsi, tree range_var,
+			     tree fld_decl, int index, tree value)
+{
+  tree ref = build4 (ARRAY_REF, uint32_type_node,
+		     build3 (COMPONENT_REF,
+			     grid_attr_trees->kernel_dim_array_type,
+			     range_var, fld_decl, NULL_TREE),
+		     build_int_cst (integer_type_node, index),
+		     NULL_TREE, NULL_TREE);
+  gsi_insert_before (gsi, gimple_build_assign (ref, value), GSI_SAME_STMT);
+}
+
+/* Return a tree representation of a pointer to a structure with grid and
+   work-group size information.  Statements filling that information will be
+   inserted before GSI, TGT_STMT is the target statement which has the
+   necessary information in it.  */
+
+static tree
+grid_get_kernel_launch_attributes (gimple_stmt_iterator *gsi,
+				   gomp_target *tgt_stmt)
+{
+  grid_create_kernel_launch_attr_types ();
+  tree u32_one = build_one_cst (uint32_type_node);
+  tree lattrs = create_tmp_var (grid_attr_trees->kernel_launch_attributes_type,
+				"__kernel_launch_attrs");
+
+  unsigned max_dim = 0;
+  for (tree clause = gimple_omp_target_clauses (tgt_stmt);
+       clause;
+       clause = OMP_CLAUSE_CHAIN (clause))
+    {
+      if (OMP_CLAUSE_CODE (clause) != OMP_CLAUSE__GRIDDIM_)
+	continue;
+
+      unsigned dim = OMP_CLAUSE_GRIDDIM_DIMENSION (clause);
+      max_dim = MAX (dim, max_dim);
+
+      grid_insert_store_range_dim (gsi, lattrs,
+				   grid_attr_trees->kernel_lattrs_grid_decl,
+				   dim, OMP_CLAUSE__GRIDDIM__SIZE (clause));
+      grid_insert_store_range_dim (gsi, lattrs,
+				   grid_attr_trees->kernel_lattrs_group_decl,
+				   dim, OMP_CLAUSE__GRIDDIM__GROUP (clause));
+    }
+
+  tree dimref = build3 (COMPONENT_REF, uint32_type_node, lattrs,
+			grid_attr_trees->kernel_lattrs_dimnum_decl, NULL_TREE);
+  /* At this moment we cannot gridify a loop with a collapse clause.  */
+  /* TODO: Adjust when we support bigger collapse.  */
+  gcc_assert (max_dim == 0);
+  gsi_insert_before (gsi, gimple_build_assign (dimref, u32_one), GSI_SAME_STMT);
+  TREE_ADDRESSABLE (lattrs) = 1;
+  return build_fold_addr_expr (lattrs);
+}
+
+/* Build target argument identifier from the DEVICE identifier, value
+   identifier ID and whether the element also has a SUBSEQUENT_PARAM.  */
+
+static tree
+get_target_argument_identifier_1 (int device, bool subseqent_param, int id)
+{
+  tree t = build_int_cst (integer_type_node, device);
+  if (subseqent_param)
+    t = fold_build2 (BIT_IOR_EXPR, integer_type_node, t,
+		     build_int_cst (integer_type_node,
+				    GOMP_TARGET_ARG_SUBSEQUENT_PARAM));
+  t = fold_build2 (BIT_IOR_EXPR, integer_type_node, t,
+		   build_int_cst (integer_type_node, id));
+  return t;
+}
+
+/* Like above but return it in type that can be directly stored as an element
+   of the argument array.  */
+
+static tree
+get_target_argument_identifier (int device, bool subseqent_param, int id)
+{
+  tree t = get_target_argument_identifier_1 (device, subseqent_param, id);
+  return fold_convert (ptr_type_node, t);
+}
+
+/* Return a target argument consisiting of DEVICE identifier, value identifier
+   ID, and the actual VALUE.  */
+
+static tree
+get_target_argument_value (gimple_stmt_iterator *gsi, int device, int id,
+			   tree value)
+{
+  tree t = fold_build2 (LSHIFT_EXPR, integer_type_node,
+			fold_convert (integer_type_node, value),
+			build_int_cst (unsigned_type_node,
+				       GOMP_TARGET_ARG_VALUE_SHIFT));
+  t = fold_build2 (BIT_IOR_EXPR, integer_type_node, t,
+		   get_target_argument_identifier_1 (device, false, id));
+  t = fold_convert (ptr_type_node, t);
+  return force_gimple_operand_gsi (gsi, t, true, NULL, true, GSI_SAME_STMT);
+}
+
+/* If VALUE is an integer constant greater than -2^15 and smaller than 2^15,
+   push one argument to ARGS with bot the DEVICE, ID and VALUE embeded in it,
+   otherwise push an iedntifier (with DEVICE and ID) and the VALUE in two
+   arguments.  */
+
+static void
+push_target_argument_according_to_value (gimple_stmt_iterator *gsi, int device,
+					 int id, tree value, vec <tree> *args)
+{
+  if (tree_fits_shwi_p (value)
+      && tree_to_shwi (value) > -(1 << 15)
+      && tree_to_shwi (value) < (1 << 15))
+    args->quick_push (get_target_argument_value (gsi, device, id, value));
+  else
+    {
+      args->quick_push (get_target_argument_identifier (device, true, id));
+      value = fold_convert (ptr_type_node, value);
+      value = force_gimple_operand_gsi (gsi, value, true, NULL, true,
+					GSI_SAME_STMT);
+      args->quick_push (value);
+    }
+}
+
+/* Create an array of arguments that is then passed to GOMP_target.   */
+
+static tree
+get_target_arguments (gimple_stmt_iterator *gsi, gomp_target *tgt_stmt)
+{
+  auto_vec <tree, 6> args;
+  tree clauses = gimple_omp_target_clauses (tgt_stmt);
+  tree t, c = find_omp_clause (clauses, OMP_CLAUSE_NUM_TEAMS);
+  if (c)
+    t = OMP_CLAUSE_NUM_TEAMS_EXPR (c);
+  else
+    t = integer_minus_one_node;
+  push_target_argument_according_to_value (gsi, GOMP_TARGET_ARG_DEVICE_ALL,
+					   GOMP_TARGET_ARG_NUM_TEAMS, t, &args);
+
+  c = find_omp_clause (clauses, OMP_CLAUSE_THREAD_LIMIT);
+  if (c)
+    t = OMP_CLAUSE_THREAD_LIMIT_EXPR (c);
+  else
+    t = integer_minus_one_node;
+  push_target_argument_according_to_value (gsi, GOMP_TARGET_ARG_DEVICE_ALL,
+					   GOMP_TARGET_ARG_THREAD_LIMIT, t,
+					   &args);
+
+  /* Add HSA-specific grid sizes, if available.  */
+  if (find_omp_clause (gimple_omp_target_clauses (tgt_stmt),
+		       OMP_CLAUSE__GRIDDIM_))
+    {
+      t = get_target_argument_identifier (GOMP_DEVICE_HSA, true,
+					  GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES);
+      args.quick_push (t);
+      args.quick_push (grid_get_kernel_launch_attributes (gsi, tgt_stmt));
+    }
+
+  /* Produce more, perhaps device specific, arguments here.  */
+
+  tree argarray = create_tmp_var (build_array_type_nelts (ptr_type_node,
+							  args.length () + 1),
+				  ".omp_target_args");
+  for (unsigned i = 0; i < args.length (); i++)
+    {
+      tree ref = build4 (ARRAY_REF, ptr_type_node, argarray,
+			 build_int_cst (integer_type_node, i),
+			 NULL_TREE, NULL_TREE);
+      gsi_insert_before (gsi, gimple_build_assign (ref, args[i]),
+			 GSI_SAME_STMT);
+    }
+  tree ref = build4 (ARRAY_REF, ptr_type_node, argarray,
+		     build_int_cst (integer_type_node, args.length ()),
+		     NULL_TREE, NULL_TREE);
+  gsi_insert_before (gsi, gimple_build_assign (ref, null_pointer_node),
+		     GSI_SAME_STMT);
+  TREE_ADDRESSABLE (argarray) = 1;
+  return build_fold_addr_expr (argarray);
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
 static void
@@ -12983,30 +13301,7 @@ expand_omp_target (struct omp_region *region)
 	depend = build_int_cst (ptr_type_node, 0);
       args.quick_push (depend);
       if (start_ix == BUILT_IN_GOMP_TARGET)
-	{
-	  c = find_omp_clause (clauses, OMP_CLAUSE_NUM_TEAMS);
-	  if (c)
-	    {
-	      t = fold_convert (integer_type_node,
-				OMP_CLAUSE_NUM_TEAMS_EXPR (c));
-	      t = force_gimple_operand_gsi (&gsi, t, true, NULL,
-					    true, GSI_SAME_STMT);
-	    }
-	  else
-	    t = integer_minus_one_node;
-	  args.quick_push (t);
-	  c = find_omp_clause (clauses, OMP_CLAUSE_THREAD_LIMIT);
-	  if (c)
-	    {
-	      t = fold_convert (integer_type_node,
-				OMP_CLAUSE_THREAD_LIMIT_EXPR (c));
-	      t = force_gimple_operand_gsi (&gsi, t, true, NULL,
-					    true, GSI_SAME_STMT);
-	    }
-	  else
-	    t = integer_minus_one_node;
-	  args.quick_push (t);
-	}
+	args.quick_push (get_target_arguments (&gsi, entry_stmt));
       break;
     case BUILT_IN_GOACC_PARALLEL:
       {
@@ -13110,90 +13405,343 @@ expand_omp_target (struct omp_region *region)
     }
 }
 
-
-/* Expand the parallel region tree rooted at REGION.  Expansion
-   proceeds in depth-first order.  Innermost regions are expanded
-   first.  This way, parallel regions that require a new function to
-   be created (e.g., GIMPLE_OMP_PARALLEL) can be expanded without having any
-   internal dependencies in their body.  */
+/* Expand KFOR loop as a GPGPU kernel, i.e. as a body only with iteration
+   variable derived from the thread number.  */
 
 static void
-expand_omp (struct omp_region *region)
+grid_expand_omp_for_loop (struct omp_region *kfor)
 {
-  omp_any_child_fn_dumped = false;
-  while (region)
-    {
-      location_t saved_location;
-      gimple *inner_stmt = NULL;
+  tree t, threadid;
+  tree type, itype;
+  gimple_stmt_iterator gsi;
+  tree n1, step;
+  struct omp_for_data fd;
 
-      /* First, determine whether this is a combined parallel+workshare
-       	 region.  */
-      if (region->type == GIMPLE_OMP_PARALLEL)
-	determine_parallel_type (region);
+  gomp_for *for_stmt = as_a <gomp_for *> (last_stmt (kfor->entry));
+  gcc_checking_assert (gimple_omp_for_kind (for_stmt)
+		       == GF_OMP_FOR_KIND_GRID_LOOP);
+  basic_block body_bb = FALLTHRU_EDGE (kfor->entry)->dest;
 
-      if (region->type == GIMPLE_OMP_FOR
-	  && gimple_omp_for_combined_p (last_stmt (region->entry)))
-	inner_stmt = last_stmt (region->inner->entry);
+  gcc_assert (gimple_omp_for_collapse (for_stmt) == 1);
+  gcc_assert (kfor->cont);
+  extract_omp_for_data (for_stmt, &fd, NULL);
 
-      if (region->inner)
-	expand_omp (region->inner);
+  itype = type = TREE_TYPE (fd.loop.v);
+  if (POINTER_TYPE_P (type))
+    itype = signed_type_for (type);
 
-      saved_location = input_location;
-      if (gimple_has_location (last_stmt (region->entry)))
-	input_location = gimple_location (last_stmt (region->entry));
+  gsi = gsi_start_bb (body_bb);
 
-      switch (region->type)
-	{
-	case GIMPLE_OMP_PARALLEL:
-	case GIMPLE_OMP_TASK:
-	  expand_omp_taskreg (region);
-	  break;
+  n1 = fd.loop.n1;
+  step = fd.loop.step;
+  n1 = force_gimple_operand_gsi (&gsi, fold_convert (type, n1),
+				 true, NULL_TREE, true, GSI_SAME_STMT);
+  step = force_gimple_operand_gsi (&gsi, fold_convert (itype, step),
+				   true, NULL_TREE, true, GSI_SAME_STMT);
+  threadid = build_call_expr (builtin_decl_explicit
+			      (BUILT_IN_OMP_GET_THREAD_NUM), 0);
+  threadid = fold_convert (itype, threadid);
+  threadid = force_gimple_operand_gsi (&gsi, threadid, true, NULL_TREE,
+				       true, GSI_SAME_STMT);
 
-	case GIMPLE_OMP_FOR:
-	  expand_omp_for (region, inner_stmt);
-	  break;
+  tree startvar = fd.loop.v;
+  t = fold_build2 (MULT_EXPR, itype, threadid, step);
+  if (POINTER_TYPE_P (type))
+    t = fold_build_pointer_plus (n1, t);
+  else
+    t = fold_build2 (PLUS_EXPR, type, t, n1);
+  t = fold_convert (type, t);
+  t = force_gimple_operand_gsi (&gsi, t,
+				DECL_P (startvar)
+				&& TREE_ADDRESSABLE (startvar),
+				NULL_TREE, true, GSI_SAME_STMT);
+  gassign *assign_stmt = gimple_build_assign (startvar, t);
+  gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
 
-	case GIMPLE_OMP_SECTIONS:
-	  expand_omp_sections (region);
-	  break;
+  /* Remove the omp for statement */
+  gsi = gsi_last_bb (kfor->entry);
+  gsi_remove (&gsi, true);
 
-	case GIMPLE_OMP_SECTION:
-	  /* Individual omp sections are handled together with their
-	     parent GIMPLE_OMP_SECTIONS region.  */
-	  break;
+  /* Remove the GIMPLE_OMP_CONTINUE statement.  */
+  gsi = gsi_last_bb (kfor->cont);
+  gcc_assert (!gsi_end_p (gsi)
+	      && gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_CONTINUE);
+  gsi_remove (&gsi, true);
 
-	case GIMPLE_OMP_SINGLE:
-	  expand_omp_single (region);
-	  break;
+  /* Replace the GIMPLE_OMP_RETURN with a real return.  */
+  gsi = gsi_last_bb (kfor->exit);
+  gcc_assert (!gsi_end_p (gsi)
+	      && gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_RETURN);
+  gsi_remove (&gsi, true);
 
-	case GIMPLE_OMP_ORDERED:
-	  {
-	    gomp_ordered *ord_stmt
-	      = as_a <gomp_ordered *> (last_stmt (region->entry));
-	    if (find_omp_clause (gimple_omp_ordered_clauses (ord_stmt),
-				 OMP_CLAUSE_DEPEND))
-	      {
-		/* We'll expand these when expanding corresponding
-		   worksharing region with ordered(n) clause.  */
-		gcc_assert (region->outer
-			    && region->outer->type == GIMPLE_OMP_FOR);
-		region->ord_stmt = ord_stmt;
-		break;
-	      }
-	  }
-	  /* FALLTHRU */
-	case GIMPLE_OMP_MASTER:
-	case GIMPLE_OMP_TASKGROUP:
-	case GIMPLE_OMP_CRITICAL:
-	case GIMPLE_OMP_TEAMS:
-	  expand_omp_synch (region);
-	  break;
+  /* Fixup the much simpler CFG.  */
+  remove_edge (find_edge (kfor->cont, body_bb));
 
-	case GIMPLE_OMP_ATOMIC_LOAD:
-	  expand_omp_atomic (region);
-	  break;
+  if (kfor->cont != body_bb)
+    set_immediate_dominator (CDI_DOMINATORS, kfor->cont, body_bb);
+  set_immediate_dominator (CDI_DOMINATORS, kfor->exit, kfor->cont);
+}
 
-	case GIMPLE_OMP_TARGET:
+/* Structure passed to grid_remap_kernel_arg_accesses so that it can remap
+   argument_decls.  */
+
+struct grid_arg_decl_map
+{
+  tree old_arg;
+  tree new_arg;
+};
+
+/* Invoked through walk_gimple_op, will remap all PARM_DECLs to the ones
+   pertaining to kernel function.  */
+
+static tree
+grid_remap_kernel_arg_accesses (tree *tp, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+  struct grid_arg_decl_map *adm = (struct grid_arg_decl_map *) wi->info;
+  tree t = *tp;
+
+  if (t == adm->old_arg)
+    *tp = adm->new_arg;
+  *walk_subtrees = !TYPE_P (t) && !DECL_P (t);
+  return NULL_TREE;
+}
+
+static void expand_omp (struct omp_region *region);
+
+/* If TARGET region contains a kernel body for loop, remove its region from the
+   TARGET and expand it in GPGPU kernel fashion. */
+
+static void
+grid_expand_target_grid_body (struct omp_region *target)
+{
+  if (!hsa_gen_requested_p ())
+    return;
+
+  gomp_target *tgt_stmt = as_a <gomp_target *> (last_stmt (target->entry));
+  struct omp_region **pp;
+
+  for (pp = &target->inner; *pp; pp = &(*pp)->next)
+    if ((*pp)->type == GIMPLE_OMP_GRID_BODY)
+      break;
+
+  struct omp_region *gpukernel = *pp;
+
+  tree orig_child_fndecl = gimple_omp_target_child_fn (tgt_stmt);
+  if (!gpukernel)
+    {
+      /* HSA cannot handle OACC stuff.  */
+      if (gimple_omp_target_kind (tgt_stmt) != GF_OMP_TARGET_KIND_REGION)
+	return;
+      gcc_checking_assert (orig_child_fndecl);
+      gcc_assert (!find_omp_clause (gimple_omp_target_clauses (tgt_stmt),
+				    OMP_CLAUSE__GRIDDIM_));
+      cgraph_node *n = cgraph_node::get (orig_child_fndecl);
+
+      hsa_register_kernel (n);
+      return;
+    }
+
+  gcc_assert (find_omp_clause (gimple_omp_target_clauses (tgt_stmt),
+			       OMP_CLAUSE__GRIDDIM_));
+  tree inside_block = gimple_block (first_stmt (single_succ (gpukernel->entry)));
+  *pp = gpukernel->next;
+  for (pp = &gpukernel->inner; *pp; pp = &(*pp)->next)
+    if ((*pp)->type == GIMPLE_OMP_FOR)
+      break;
+
+  struct omp_region *kfor = *pp;
+  gcc_assert (kfor);
+  gcc_assert (gimple_omp_for_kind (last_stmt ((kfor)->entry))
+	      == GF_OMP_FOR_KIND_GRID_LOOP);
+  *pp = kfor->next;
+  if (kfor->inner)
+    expand_omp (kfor->inner);
+  if (gpukernel->inner)
+    expand_omp (gpukernel->inner);
+
+  tree kern_fndecl = copy_node (orig_child_fndecl);
+  DECL_NAME (kern_fndecl) = clone_function_name (kern_fndecl, "kernel");
+  SET_DECL_ASSEMBLER_NAME (kern_fndecl, DECL_NAME (kern_fndecl));
+  tree tgtblock = gimple_block (tgt_stmt);
+  tree fniniblock = make_node (BLOCK);
+  BLOCK_ABSTRACT_ORIGIN (fniniblock) = tgtblock;
+  BLOCK_SOURCE_LOCATION (fniniblock) = BLOCK_SOURCE_LOCATION (tgtblock);
+  BLOCK_SOURCE_END_LOCATION (fniniblock) = BLOCK_SOURCE_END_LOCATION (tgtblock);
+  DECL_INITIAL (kern_fndecl) = fniniblock;
+  push_struct_function (kern_fndecl);
+  cfun->function_end_locus = gimple_location (tgt_stmt);
+  pop_cfun ();
+
+  tree old_parm_decl = DECL_ARGUMENTS (kern_fndecl);
+  gcc_assert (!DECL_CHAIN (old_parm_decl));
+  tree new_parm_decl = copy_node (DECL_ARGUMENTS (kern_fndecl));
+  DECL_CONTEXT (new_parm_decl) = kern_fndecl;
+  DECL_ARGUMENTS (kern_fndecl) = new_parm_decl;
+  struct function *kern_cfun = DECL_STRUCT_FUNCTION (kern_fndecl);
+  kern_cfun->curr_properties = cfun->curr_properties;
+
+  remove_edge (BRANCH_EDGE (kfor->entry));
+  grid_expand_omp_for_loop (kfor);
+
+  /* Remove the omp for statement */
+  gimple_stmt_iterator gsi = gsi_last_bb (gpukernel->entry);
+  gsi_remove (&gsi, true);
+  /* Replace the GIMPLE_OMP_RETURN at the end of the kernel region with a real
+     return.  */
+  gsi = gsi_last_bb (gpukernel->exit);
+  gcc_assert (!gsi_end_p (gsi)
+	      && gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_RETURN);
+  gimple *ret_stmt = gimple_build_return (NULL);
+  gsi_insert_after (&gsi, ret_stmt, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
+
+  /* Statements in the first BB in the target construct have been produced by
+     target lowering and must be copied inside the GPUKERNEL, with the two
+     exceptions of the first OMP statement and the OMP_DATA assignment
+     statement.  */
+  gsi = gsi_start_bb (single_succ (gpukernel->entry));
+  tree data_arg = gimple_omp_target_data_arg (tgt_stmt);
+  tree sender = data_arg ? TREE_VEC_ELT (data_arg, 0) : NULL;
+  for (gimple_stmt_iterator tsi = gsi_start_bb (single_succ (target->entry));
+       !gsi_end_p (tsi); gsi_next (&tsi))
+    {
+      gimple *stmt = gsi_stmt (tsi);
+      if (is_gimple_omp (stmt))
+	break;
+      if (sender
+	  && is_gimple_assign (stmt)
+	  && TREE_CODE (gimple_assign_rhs1 (stmt)) == ADDR_EXPR
+	  && TREE_OPERAND (gimple_assign_rhs1 (stmt), 0) == sender)
+	continue;
+      gimple *copy = gimple_copy (stmt);
+      gsi_insert_before (&gsi, copy, GSI_SAME_STMT);
+      gimple_set_block (copy, fniniblock);
+    }
+
+  move_sese_region_to_fn (kern_cfun, single_succ (gpukernel->entry),
+			  gpukernel->exit, inside_block);
+
+  cgraph_node *kcn = cgraph_node::get_create (kern_fndecl);
+  kcn->mark_force_output ();
+  cgraph_node *orig_child = cgraph_node::get (orig_child_fndecl);
+
+  hsa_register_kernel (kcn, orig_child);
+
+  cgraph_node::add_new_function (kern_fndecl, true);
+  push_cfun (kern_cfun);
+  cgraph_edge::rebuild_edges ();
+
+  /* Re-map any mention of the PARM_DECL of the original function to the
+     PARM_DECL of the new one.
+
+     TODO: It would be great if lowering produced references into the GPU
+     kernel decl straight away and we did not have to do this.  */
+  struct grid_arg_decl_map adm;
+  adm.old_arg = old_parm_decl;
+  adm.new_arg = new_parm_decl;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, kern_cfun)
+    {
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  struct walk_stmt_info wi;
+	  memset (&wi, 0, sizeof (wi));
+	  wi.info = &adm;
+	  walk_gimple_op (stmt, grid_remap_kernel_arg_accesses, &wi);
+	}
+    }
+  pop_cfun ();
+
+  return;
+}
+
+/* Expand the parallel region tree rooted at REGION.  Expansion
+   proceeds in depth-first order.  Innermost regions are expanded
+   first.  This way, parallel regions that require a new function to
+   be created (e.g., GIMPLE_OMP_PARALLEL) can be expanded without having any
+   internal dependencies in their body.  */
+
+static void
+expand_omp (struct omp_region *region)
+{
+  omp_any_child_fn_dumped = false;
+  while (region)
+    {
+      location_t saved_location;
+      gimple *inner_stmt = NULL;
+
+      /* First, determine whether this is a combined parallel+workshare
+       	 region.  */
+      if (region->type == GIMPLE_OMP_PARALLEL)
+	determine_parallel_type (region);
+      else if (region->type == GIMPLE_OMP_TARGET)
+	grid_expand_target_grid_body (region);
+
+      if (region->type == GIMPLE_OMP_FOR
+	  && gimple_omp_for_combined_p (last_stmt (region->entry)))
+	inner_stmt = last_stmt (region->inner->entry);
+
+      if (region->inner)
+	expand_omp (region->inner);
+
+      saved_location = input_location;
+      if (gimple_has_location (last_stmt (region->entry)))
+	input_location = gimple_location (last_stmt (region->entry));
+
+      switch (region->type)
+	{
+	case GIMPLE_OMP_PARALLEL:
+	case GIMPLE_OMP_TASK:
+	  expand_omp_taskreg (region);
+	  break;
+
+	case GIMPLE_OMP_FOR:
+	  expand_omp_for (region, inner_stmt);
+	  break;
+
+	case GIMPLE_OMP_SECTIONS:
+	  expand_omp_sections (region);
+	  break;
+
+	case GIMPLE_OMP_SECTION:
+	  /* Individual omp sections are handled together with their
+	     parent GIMPLE_OMP_SECTIONS region.  */
+	  break;
+
+	case GIMPLE_OMP_SINGLE:
+	  expand_omp_single (region);
+	  break;
+
+	case GIMPLE_OMP_ORDERED:
+	  {
+	    gomp_ordered *ord_stmt
+	      = as_a <gomp_ordered *> (last_stmt (region->entry));
+	    if (find_omp_clause (gimple_omp_ordered_clauses (ord_stmt),
+				 OMP_CLAUSE_DEPEND))
+	      {
+		/* We'll expand these when expanding corresponding
+		   worksharing region with ordered(n) clause.  */
+		gcc_assert (region->outer
+			    && region->outer->type == GIMPLE_OMP_FOR);
+		region->ord_stmt = ord_stmt;
+		break;
+	      }
+	  }
+	  /* FALLTHRU */
+	case GIMPLE_OMP_MASTER:
+	case GIMPLE_OMP_TASKGROUP:
+	case GIMPLE_OMP_CRITICAL:
+	case GIMPLE_OMP_TEAMS:
+	  expand_omp_synch (region);
+	  break;
+
+	case GIMPLE_OMP_ATOMIC_LOAD:
+	  expand_omp_atomic (region);
+	  break;
+
+	case GIMPLE_OMP_TARGET:
 	  expand_omp_target (region);
 	  break;
 
@@ -14508,11 +15056,13 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 						ctx);
 	}
 
-  gimple_seq_add_stmt (&body, stmt);
+  if (!gimple_omp_for_grid_phony (stmt))
+    gimple_seq_add_stmt (&body, stmt);
   gimple_seq_add_seq (&body, gimple_omp_body (stmt));
 
-  gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
-							 fd.loop.v));
+  if (!gimple_omp_for_grid_phony (stmt))
+    gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
+							   fd.loop.v));
 
   /* After the loop, add exit clauses.  */
   lower_reduction_clauses (gimple_omp_for_clauses (stmt), &body, ctx);
@@ -14524,9 +15074,12 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   body = maybe_catch_exception (body);
 
-  /* Region exit marker goes at the end of the loop body.  */
-  gimple_seq_add_stmt (&body, gimple_build_omp_return (fd.have_nowait));
-  maybe_add_implicit_barrier_cancel (ctx, &body);
+  if (!gimple_omp_for_grid_phony (stmt))
+    {
+      /* Region exit marker goes at the end of the loop body.  */
+      gimple_seq_add_stmt (&body, gimple_build_omp_return (fd.have_nowait));
+      maybe_add_implicit_barrier_cancel (ctx, &body);
+    }
 
   /* Add OpenACC joining and reduction markers just after the loop.  */
   if (oacc_tail)
@@ -14969,6 +15522,14 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   par_olist = NULL;
   par_ilist = NULL;
   par_rlist = NULL;
+  bool phony_construct = gimple_code (stmt) == GIMPLE_OMP_PARALLEL
+    && gimple_omp_parallel_grid_phony (as_a <gomp_parallel *> (stmt));
+  if (phony_construct && ctx->record_type)
+    {
+      gcc_checking_assert (!ctx->receiver_decl);
+      ctx->receiver_decl = create_tmp_var
+	(build_reference_type (ctx->record_type), ".omp_rec");
+    }
   lower_rec_input_clauses (clauses, &par_ilist, &par_olist, ctx, NULL);
   lower_omp (&par_body, ctx);
   if (gimple_code (stmt) == GIMPLE_OMP_PARALLEL)
@@ -15027,13 +15588,19 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     gimple_seq_add_stmt (&new_body,
 			 gimple_build_omp_continue (integer_zero_node,
 						    integer_zero_node));
-  gimple_seq_add_stmt (&new_body, gimple_build_omp_return (false));
-  gimple_omp_set_body (stmt, new_body);
+  if (!phony_construct)
+    {
+      gimple_seq_add_stmt (&new_body, gimple_build_omp_return (false));
+      gimple_omp_set_body (stmt, new_body);
+    }
 
   bind = gimple_build_bind (NULL, NULL, gimple_bind_block (par_bind));
   gsi_replace (gsi_p, dep_bind ? dep_bind : bind, true);
   gimple_bind_add_seq (bind, ilist);
-  gimple_bind_add_stmt (bind, stmt);
+  if (!phony_construct)
+    gimple_bind_add_stmt (bind, stmt);
+  else
+    gimple_bind_add_seq (bind, new_body);
   gimple_bind_add_seq (bind, olist);
 
   pop_gimplify_context (NULL);
@@ -16166,19 +16733,22 @@ lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 			   &bind_body, &dlist, ctx, NULL);
   lower_omp (gimple_omp_body_ptr (teams_stmt), ctx);
   lower_reduction_clauses (gimple_omp_teams_clauses (teams_stmt), &olist, ctx);
-  gimple_seq_add_stmt (&bind_body, teams_stmt);
-
-  location_t loc = gimple_location (teams_stmt);
-  tree decl = builtin_decl_explicit (BUILT_IN_GOMP_TEAMS);
-  gimple *call = gimple_build_call (decl, 2, num_teams, thread_limit);
-  gimple_set_location (call, loc);
-  gimple_seq_add_stmt (&bind_body, call);
+  if (!gimple_omp_teams_grid_phony (teams_stmt))
+    {
+      gimple_seq_add_stmt (&bind_body, teams_stmt);
+      location_t loc = gimple_location (teams_stmt);
+      tree decl = builtin_decl_explicit (BUILT_IN_GOMP_TEAMS);
+      gimple *call = gimple_build_call (decl, 2, num_teams, thread_limit);
+      gimple_set_location (call, loc);
+      gimple_seq_add_stmt (&bind_body, call);
+    }
 
   gimple_seq_add_seq (&bind_body, gimple_omp_body (teams_stmt));
   gimple_omp_set_body (teams_stmt, NULL);
   gimple_seq_add_seq (&bind_body, olist);
   gimple_seq_add_seq (&bind_body, dlist);
-  gimple_seq_add_stmt (&bind_body, gimple_build_omp_return (true));
+  if (!gimple_omp_teams_grid_phony (teams_stmt))
+    gimple_seq_add_stmt (&bind_body, gimple_build_omp_return (true));
   gimple_bind_set_body (bind, bind_body);
 
   pop_gimplify_context (bind);
@@ -16189,6 +16759,17 @@ lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     TREE_USED (block) = 1;
 }
 
+/* Expand code within an artificial GPUKERNELS OMP construct.  */
+
+static void
+lower_omp_grid_body (gimple_stmt_iterator *gsi_p, omp_context *ctx)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  lower_omp (gimple_omp_body_ptr (stmt), ctx);
+  gimple_seq_add_stmt (gimple_omp_body_ptr (stmt),
+		       gimple_build_omp_return (false));
+}
+
 
 /* Callback for lower_omp_1.  Return non-NULL if *tp needs to be
    regimplified.  If DATA is non-NULL, lower_omp_1 is outside
@@ -16400,6 +16981,11 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       gcc_assert (ctx);
       lower_omp_teams (gsi_p, ctx);
       break;
+    case GIMPLE_OMP_GRID_BODY:
+      ctx = maybe_lookup_ctx (stmt);
+      gcc_assert (ctx);
+      lower_omp_grid_body (gsi_p, ctx);
+      break;
     case GIMPLE_CALL:
       tree fndecl;
       call_stmt = as_a <gcall *> (stmt);
@@ -16489,7 +17075,682 @@ lower_omp (gimple_seq *body, omp_context *ctx)
       fold_stmt (&gsi);
   input_location = saved_location;
 }
-\f
+
+/* Returen true if STMT is an assignment of a register-type into a local
+   VAR_DECL.  */
+
+static bool
+grid_reg_assignment_to_local_var_p (gimple *stmt)
+{
+  gassign *assign = dyn_cast <gassign *> (stmt);
+  if (!assign)
+    return false;
+  tree lhs = gimple_assign_lhs (assign);
+  if (TREE_CODE (lhs) != VAR_DECL
+      || !is_gimple_reg_type (TREE_TYPE (lhs))
+      || is_global_var (lhs))
+    return false;
+  return true;
+}
+
+/* Return true if all statements in SEQ are assignments to local register-type
+   variables.  */
+
+static bool
+grid_seq_only_contains_local_assignments (gimple_seq seq)
+{
+  if (!seq)
+    return true;
+
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
+    if (!grid_reg_assignment_to_local_var_p (gsi_stmt (gsi)))
+      return false;
+  return true;
+}
+
+/* Scan statements in SEQ and call itself recursively on any bind.  If during
+   whole search only assignments to register-type local variables and one
+   single OMP statement is encountered, return true, otherwise return false.
+   RET is where we store any OMP statement encountered.  TARGET_LOC and NAME
+   are used for dumping a note about a failure.  */
+
+static bool
+grid_find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
+				     const char *name, gimple **ret)
+{
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+
+      if (grid_reg_assignment_to_local_var_p (stmt))
+	continue;
+      if (gbind *bind = dyn_cast <gbind *> (stmt))
+	{
+	  if (!grid_find_single_omp_among_assignments_1 (gimple_bind_body (bind),
+							 target_loc, name, ret))
+	      return false;
+	}
+      else if (is_gimple_omp (stmt))
+	{
+	  if (*ret)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, target_loc,
+				 "Will not turn target construct into a simple "
+				 "GPGPU kernel because %s construct contains "
+				 "multiple OpenMP constructs\n", name);
+	      return false;
+	    }
+	  *ret = stmt;
+	}
+      else
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, target_loc,
+			     "Will not turn target construct into a simple "
+			     "GPGPU kernel because %s construct contains "
+			     "a complex statement\n", name);
+	  return false;
+	}
+    }
+  return true;
+}
+
+/* Scan statements in SEQ and make sure that it and any binds in it contain
+   only assignments to local register-type variables and one OMP construct.  If
+   so, return that construct, otherwise return NULL.  If dumping is enabled and
+   function fails, use TARGET_LOC and NAME to dump a note with the reason for
+   failure.  */
+
+static gimple *
+grid_find_single_omp_among_assignments (gimple_seq seq, location_t target_loc,
+					const char *name)
+{
+  if (!seq)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, target_loc,
+			 "Will not turn target construct into a simple "
+			 "GPGPU kernel because %s construct has empty "
+			 "body\n",
+			 name);
+      return NULL;
+    }
+
+  gimple *ret = NULL;
+  if (grid_find_single_omp_among_assignments_1 (seq, target_loc, name, &ret))
+    {
+      if (!ret && dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, target_loc,
+			 "Will not turn target construct into a simple "
+			 "GPGPU kernel because %s construct does not contain"
+			 "any other OpenMP construct\n", name);
+      return ret;
+    }
+  else
+    return NULL;
+}
+
+/* Walker function looking for statements there is no point gridifying (and for
+   noreturn function calls which we cannot do).  Return non-NULL if such a
+   function is found.  */
+
+static tree
+grid_find_ungridifiable_statement (gimple_stmt_iterator *gsi,
+				   bool *handled_ops_p,
+				   struct walk_stmt_info *)
+{
+  *handled_ops_p = false;
+  gimple *stmt = gsi_stmt (*gsi);
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_CALL:
+      if (gimple_call_noreturn_p (as_a <gcall *> (stmt)))
+	{
+	  *handled_ops_p = true;
+	  return error_mark_node;
+	}
+      break;
+
+    /* We may reduce the following list if we find a way to implement the
+       clauses, but now there is no point trying further.  */
+    case GIMPLE_OMP_CRITICAL:
+    case GIMPLE_OMP_TASKGROUP:
+    case GIMPLE_OMP_TASK:
+    case GIMPLE_OMP_SECTION:
+    case GIMPLE_OMP_SECTIONS:
+    case GIMPLE_OMP_SECTIONS_SWITCH:
+    case GIMPLE_OMP_TARGET:
+    case GIMPLE_OMP_ORDERED:
+      *handled_ops_p = true;
+      return error_mark_node;
+
+    default:
+      break;
+    }
+  return NULL;
+}
+
+
+/* If TARGET follows a pattern that can be turned into a gridified GPGPU
+   kernel, return true, otherwise return false.  In the case of success, also
+   fill in GROUP_SIZE_P with the requested group size or NULL if there is
+   none.  */
+
+static bool
+grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p)
+{
+  if (gimple_omp_target_kind (target) != GF_OMP_TARGET_KIND_REGION)
+    return false;
+
+  location_t tloc = gimple_location (target);
+  gimple *stmt
+    = grid_find_single_omp_among_assignments (gimple_omp_body (target),
+					      tloc, "target");
+  if (!stmt)
+    return false;
+  gomp_teams *teams = dyn_cast <gomp_teams *> (stmt);
+  tree group_size = NULL;
+  if (!teams)
+    {
+      dump_printf_loc (MSG_NOTE, tloc,
+		       "Will not turn target construct into a simple "
+		       "GPGPU kernel because it does not have a sole teams "
+		       "construct in it.\n");
+      return false;
+    }
+
+  tree clauses = gimple_omp_teams_clauses (teams);
+  while (clauses)
+    {
+      switch (OMP_CLAUSE_CODE (clauses))
+	{
+	case OMP_CLAUSE_NUM_TEAMS:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because we cannot "
+			     "handle num_teams clause of teams "
+			     "construct\n ");
+	  return false;
+
+	case OMP_CLAUSE_REDUCTION:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because a reduction "
+			     "clause is present\n ");
+	  return false;
+
+	case OMP_CLAUSE_LASTPRIVATE:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because a lastprivate "
+			     "clause is present\n ");
+	  return false;
+
+	case OMP_CLAUSE_THREAD_LIMIT:
+	  group_size = OMP_CLAUSE_OPERAND (clauses, 0);
+	  break;
+
+	default:
+	  break;
+	}
+      clauses = OMP_CLAUSE_CHAIN (clauses);
+    }
+
+  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (teams), tloc,
+						 "teams");
+  if (!stmt)
+    return false;
+  gomp_for *dist = dyn_cast <gomp_for *> (stmt);
+  if (!dist)
+    {
+      dump_printf_loc (MSG_NOTE, tloc,
+		       "Will not turn target construct into a simple "
+		       "GPGPU kernel because the teams construct  does not have "
+		       "a sole distribute construct in it.\n");
+      return false;
+    }
+
+  gcc_assert (gimple_omp_for_kind (dist) == GF_OMP_FOR_KIND_DISTRIBUTE);
+  if (!gimple_omp_for_combined_p (dist))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, tloc,
+			 "Will not turn target construct into a gridified GPGPU "
+			 "kernel because we cannot handle a standalone "
+			 "distribute construct\n ");
+      return false;
+    }
+  if (dist->collapse > 1)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, tloc,
+			 "Will not turn target construct into a gridified GPGPU "
+			 "kernel because the distribute construct contains "
+			 "collapse clause\n");
+      return false;
+    }
+  struct omp_for_data fd;
+  extract_omp_for_data (dist, &fd, NULL);
+  if (fd.chunk_size)
+    {
+      if (group_size && !operand_equal_p (group_size, fd.chunk_size, 0))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because the teams "
+			     "thread limit is different from distribute "
+			     "schedule chunk\n");
+	  return false;
+	}
+      group_size = fd.chunk_size;
+    }
+  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (dist), tloc,
+						 "distribute");
+  gomp_parallel *par;
+  if (!stmt || !(par = dyn_cast <gomp_parallel *> (stmt)))
+    return false;
+
+  clauses = gimple_omp_parallel_clauses (par);
+  while (clauses)
+    {
+      switch (OMP_CLAUSE_CODE (clauses))
+	{
+	case OMP_CLAUSE_NUM_THREADS:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a gridified"
+			     "GPGPU kernel because there is a num_threads "
+			     "clause of the parallel construct\n");
+	  return false;
+
+	case OMP_CLAUSE_REDUCTION:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because a reduction "
+			     "clause is present\n ");
+	  return false;
+
+	case OMP_CLAUSE_LASTPRIVATE:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because a lastprivate "
+			     "clause is present\n ");
+	  return false;
+
+	default:
+	  break;
+	}
+      clauses = OMP_CLAUSE_CHAIN (clauses);
+    }
+
+  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (par), tloc,
+						 "parallel");
+  gomp_for *gfor;
+  if (!stmt || !(gfor = dyn_cast <gomp_for *> (stmt)))
+    return false;
+
+  if (gimple_omp_for_kind (gfor) != GF_OMP_FOR_KIND_FOR)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, tloc,
+			 "Will not turn target construct into a gridified GPGPU "
+			 "kernel because the inner loop is not a simple for "
+			 "loop\n");
+      return false;
+    }
+  if (gfor->collapse > 1)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, tloc,
+			 "Will not turn target construct into a gridified GPGPU "
+			 "kernel because the inner loop contains collapse "
+			 "clause\n");
+      return false;
+    }
+
+  if (!grid_seq_only_contains_local_assignments (gimple_omp_for_pre_body (gfor)))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, tloc,
+			 "Will not turn target construct into a gridified GPGPU "
+			 "kernel because the inner loop pre_body contains"
+			 "a complex instruction\n");
+      return false;
+    }
+
+  clauses = gimple_omp_for_clauses (gfor);
+  while (clauses)
+    {
+      switch (OMP_CLAUSE_CODE (clauses))
+	{
+	case OMP_CLAUSE_SCHEDULE:
+	  if (OMP_CLAUSE_SCHEDULE_KIND (clauses) != OMP_CLAUSE_SCHEDULE_AUTO)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, tloc,
+				 "Will not turn target construct into a "
+				 "gridified GPGPU kernel because the inner "
+				 "loop has a non-automatic scheduling clause\n");
+	      return false;
+	    }
+	  break;
+
+	case OMP_CLAUSE_REDUCTION:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because a reduction "
+			     "clause is present\n ");
+	  return false;
+
+	case OMP_CLAUSE_LASTPRIVATE:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because a lastprivate "
+			     "clause is present\n ");
+	  return false;
+
+	default:
+	  break;
+	}
+      clauses = OMP_CLAUSE_CHAIN (clauses);
+    }
+
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  if (gimple *bad = walk_gimple_seq (gimple_omp_body (gfor),
+				     grid_find_ungridifiable_statement,
+				     NULL, &wi))
+    {
+      if (dump_enabled_p ())
+	{
+	  if (is_gimple_call (bad))
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a gridified "
+			     " GPGPU kernel because the inner loop contains "
+			     "call to a noreturn function\n");
+	  else
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a gridified "
+			     "GPGPU kernel because the inner loop contains "
+			     "statement %s which cannot be transformed\n",
+			     gimple_code_name[(int) gimple_code (bad)]);
+	}
+      return false;
+    }
+
+  *group_size_p = group_size;
+  return true;
+}
+
+/* Operand walker, used to remap pre-body declarations according to a hash map
+   provided in DATA.  */
+
+static tree
+grid_remap_prebody_decls (tree *tp, int *walk_subtrees, void *data)
+{
+  tree t = *tp;
+
+  if (DECL_P (t) || TYPE_P (t))
+    *walk_subtrees = 0;
+  else
+    *walk_subtrees = 1;
+
+  if (TREE_CODE (t) == VAR_DECL)
+    {
+      struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+      hash_map<tree, tree> *declmap = (hash_map<tree, tree> *) wi->info;
+      tree *repl = declmap->get (t);
+      if (repl)
+	*tp = *repl;
+    }
+  return NULL_TREE;
+}
+
+/* Copy leading register-type assignments to local variables in SRC to just
+   before DST, Creating temporaries, adjusting mapping of operands in WI and
+   remapping operands as necessary.  Add any new temporaries to TGT_BIND.
+   Return the first statement that does not conform to
+   grid_reg_assignment_to_local_var_p or NULL.  */
+
+static gimple *
+grid_copy_leading_local_assignments (gimple_seq src, gimple_stmt_iterator *dst,
+				gbind *tgt_bind, struct walk_stmt_info *wi)
+{
+  hash_map<tree, tree> *declmap = (hash_map<tree, tree> *) wi->info;
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start (src); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      if (gbind *bind = dyn_cast <gbind *> (stmt))
+	{
+	  gimple *r = grid_copy_leading_local_assignments (gimple_bind_body (bind),
+						      dst, tgt_bind, wi);
+	  if (r)
+	    return r;
+	  else
+	    continue;
+	}
+      if (!grid_reg_assignment_to_local_var_p (stmt))
+	return stmt;
+      tree lhs = gimple_assign_lhs (as_a <gassign *> (stmt));
+      tree repl = copy_var_decl (lhs, create_tmp_var_name (NULL),
+				 TREE_TYPE (lhs));
+      DECL_CONTEXT (repl) = current_function_decl;
+      gimple_bind_append_vars (tgt_bind, repl);
+
+      declmap->put (lhs, repl);
+      gassign *copy = as_a <gassign *> (gimple_copy (stmt));
+      walk_gimple_op (copy, grid_remap_prebody_decls, wi);
+      gsi_insert_before (dst, copy, GSI_SAME_STMT);
+    }
+  return NULL;
+}
+
+/* Given freshly copied top level kernel SEQ, identify the individual OMP
+   components, mark them as part of kernel and return the inner loop, and copy
+   assignment leading to them just before DST, remapping them using WI and
+   adding new temporaries to TGT_BIND.  */
+
+static gomp_for *
+grid_process_kernel_body_copy (gimple_seq seq, gimple_stmt_iterator *dst,
+			       gbind *tgt_bind, struct walk_stmt_info *wi)
+{
+  gimple *stmt = grid_copy_leading_local_assignments (seq, dst, tgt_bind, wi);
+  gomp_teams *teams = dyn_cast <gomp_teams *> (stmt);
+  gcc_assert (teams);
+  gimple_omp_teams_set_grid_phony (teams, true);
+  stmt = grid_copy_leading_local_assignments (gimple_omp_body (teams), dst,
+					 tgt_bind, wi);
+  gcc_checking_assert (stmt);
+  gomp_for *dist = dyn_cast <gomp_for *> (stmt);
+  gcc_assert (dist);
+  gimple_seq prebody = gimple_omp_for_pre_body (dist);
+  if (prebody)
+    grid_copy_leading_local_assignments (prebody, dst, tgt_bind, wi);
+  gimple_omp_for_set_grid_phony (dist, true);
+  stmt = grid_copy_leading_local_assignments (gimple_omp_body (dist), dst,
+					 tgt_bind, wi);
+  gcc_checking_assert (stmt);
+
+  gomp_parallel *parallel = as_a <gomp_parallel *> (stmt);
+  gimple_omp_parallel_set_grid_phony (parallel, true);
+  stmt = grid_copy_leading_local_assignments (gimple_omp_body (parallel), dst,
+					 tgt_bind, wi);
+  gomp_for *inner_loop = as_a <gomp_for *> (stmt);
+  gimple_omp_for_set_kind (inner_loop, GF_OMP_FOR_KIND_GRID_LOOP);
+  prebody = gimple_omp_for_pre_body (inner_loop);
+  if (prebody)
+    grid_copy_leading_local_assignments (prebody, dst, tgt_bind, wi);
+
+  return inner_loop;
+}
+
+/* If TARGET points to a GOMP_TARGET which follows a gridifiable pattern,
+   create a GPU kernel for it.  GSI must point to the same statement, TGT_BIND
+   is the bind into which temporaries inserted before TARGET should be
+   added.  */
+
+static void
+grid_attempt_target_gridification (gomp_target *target,
+				   gimple_stmt_iterator *gsi,
+				   gbind *tgt_bind)
+{
+  tree group_size;
+  if (!target || !grid_target_follows_gridifiable_pattern (target, &group_size))
+    return;
+
+  location_t loc = gimple_location (target);
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+		     "Target construct will be turned into a gridified GPGPU "
+		     "kernel\n");
+
+  /* Copy target body to a GPUKERNEL construct:  */
+  gimple_seq kernel_seq = copy_gimple_seq_and_replace_locals
+    (gimple_omp_body (target));
+
+  hash_map<tree, tree> *declmap = new hash_map<tree, tree>;
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (struct walk_stmt_info));
+  wi.info = declmap;
+
+  /* Copy assignments in between OMP statements before target, mark OMP
+     statements within copy appropriatly.  */
+  gomp_for *inner_loop = grid_process_kernel_body_copy (kernel_seq, gsi,
+							tgt_bind, &wi);
+
+  gbind *old_bind = as_a <gbind *> (gimple_seq_first (gimple_omp_body (target)));
+  gbind *new_bind = as_a <gbind *> (gimple_seq_first (kernel_seq));
+  tree new_block = gimple_bind_block (new_bind);
+  tree enc_block = BLOCK_SUPERCONTEXT (gimple_bind_block (old_bind));
+  BLOCK_CHAIN (new_block) = BLOCK_SUBBLOCKS (enc_block);
+  BLOCK_SUBBLOCKS (enc_block) = new_block;
+  BLOCK_SUPERCONTEXT (new_block) = enc_block;
+  gimple *gpukernel = gimple_build_omp_grid_body (kernel_seq);
+  gimple_seq_add_stmt
+    (gimple_bind_body_ptr (as_a <gbind *> (gimple_omp_body (target))),
+     gpukernel);
+
+  walk_tree (&group_size, grid_remap_prebody_decls, &wi, NULL);
+  push_gimplify_context ();
+  size_t collapse = gimple_omp_for_collapse (inner_loop);
+  for (size_t i = 0; i < collapse; i++)
+    {
+      tree itype, type = TREE_TYPE (gimple_omp_for_index (inner_loop, i));
+      if (POINTER_TYPE_P (type))
+	itype = signed_type_for (type);
+      else
+	itype = type;
+
+      enum tree_code cond_code = gimple_omp_for_cond (inner_loop, i);
+      tree n1 = unshare_expr (gimple_omp_for_initial (inner_loop, i));
+      walk_tree (&n1, grid_remap_prebody_decls, &wi, NULL);
+      tree n2 = unshare_expr (gimple_omp_for_final (inner_loop, i));
+      walk_tree (&n2, grid_remap_prebody_decls, &wi, NULL);
+      adjust_for_condition (loc, &cond_code, &n2);
+      tree step;
+      step = get_omp_for_step_from_incr (loc,
+					 gimple_omp_for_incr (inner_loop, i));
+      gimple_seq tmpseq = NULL;
+      n1 = fold_convert (itype, n1);
+      n2 = fold_convert (itype, n2);
+      tree t = build_int_cst (itype, (cond_code == LT_EXPR ? -1 : 1));
+      t = fold_build2 (PLUS_EXPR, itype, step, t);
+      t = fold_build2 (PLUS_EXPR, itype, t, n2);
+      t = fold_build2 (MINUS_EXPR, itype, t, n1);
+      if (TYPE_UNSIGNED (itype) && cond_code == GT_EXPR)
+	t = fold_build2 (TRUNC_DIV_EXPR, itype,
+			 fold_build1 (NEGATE_EXPR, itype, t),
+			 fold_build1 (NEGATE_EXPR, itype, step));
+      else
+	t = fold_build2 (TRUNC_DIV_EXPR, itype, t, step);
+      tree gs = fold_convert (uint32_type_node, t);
+      gimplify_expr (&gs, &tmpseq, NULL, is_gimple_val, fb_rvalue);
+      if (!gimple_seq_empty_p (tmpseq))
+	gsi_insert_seq_before (gsi, tmpseq, GSI_SAME_STMT);
+
+      tree ws;
+      if (i == 0 && group_size)
+	{
+	  ws = fold_convert (uint32_type_node, group_size);
+	  tmpseq = NULL;
+	  gimplify_expr (&ws, &tmpseq, NULL, is_gimple_val, fb_rvalue);
+	  if (!gimple_seq_empty_p (tmpseq))
+	    gsi_insert_seq_before (gsi, tmpseq, GSI_SAME_STMT);
+	}
+      else
+	ws = build_zero_cst (uint32_type_node);
+
+      tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE__GRIDDIM_);
+      OMP_CLAUSE_SET_GRIDDIM_DIMENSION (c, (unsigned int) i);
+      OMP_CLAUSE__GRIDDIM__SIZE (c) = gs;
+      OMP_CLAUSE__GRIDDIM__GROUP (c) = ws;
+      OMP_CLAUSE_CHAIN (c) = gimple_omp_target_clauses (target);
+      gimple_omp_target_set_clauses (target, c);
+    }
+  pop_gimplify_context (tgt_bind);
+  delete declmap;
+  return;
+}
+
+/* Walker function doing all the work for create_target_kernels. */
+
+static tree
+grid_gridify_all_targets_stmt (gimple_stmt_iterator *gsi,
+				   bool *handled_ops_p,
+				   struct walk_stmt_info *incoming)
+{
+  *handled_ops_p = false;
+
+  gimple *stmt = gsi_stmt (*gsi);
+  gomp_target *target = dyn_cast <gomp_target *> (stmt);
+  if (target)
+    {
+      gbind *tgt_bind = (gbind *) incoming->info;
+      gcc_checking_assert (tgt_bind);
+      grid_attempt_target_gridification (target, gsi, tgt_bind);
+      return NULL_TREE;
+    }
+  gbind *bind = dyn_cast <gbind *> (stmt);
+  if (bind)
+    {
+      *handled_ops_p = true;
+      struct walk_stmt_info wi;
+      memset (&wi, 0, sizeof (wi));
+      wi.info = bind;
+      walk_gimple_seq_mod (gimple_bind_body_ptr (bind),
+			   grid_gridify_all_targets_stmt, NULL, &wi);
+    }
+  return NULL_TREE;
+}
+
+/* Attempt to gridify all target constructs in BODY_P.  All such targets will
+   have their bodies duplicated, with the new copy being put into a
+   gimple_omp_grid_body statement.  All kernel-related construct within the
+   grid_body will be marked with phony flags or kernel kinds.  Moreover, some
+   re-structuring is often needed, such as copying pre-bodies before the target
+   construct so that kernel grid sizes can be computed.  */
+
+static void
+grid_gridify_all_targets (gimple_seq *body_p)
+{
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  walk_gimple_seq_mod (body_p, grid_gridify_all_targets_stmt, NULL, &wi);
+}
+
+
 /* Main entry point.  */
 
 static unsigned int
@@ -16509,6 +17770,10 @@ execute_lower_omp (void)
 				 delete_omp_context);
 
   body = gimple_body (current_function_decl);
+
+  if (hsa_gen_requested_p ())
+    grid_gridify_all_targets (&body);
+
   scan_omp (&body, NULL);
   gcc_assert (taskreg_nesting_level == 0);
   FOR_EACH_VEC_ELT (taskreg_contexts, i, ctx)
@@ -16846,6 +18111,7 @@ make_gimple_omp_edges (basic_block bb, struct omp_region **region,
     case GIMPLE_OMP_TASKGROUP:
     case GIMPLE_OMP_CRITICAL:
     case GIMPLE_OMP_SECTION:
+    case GIMPLE_OMP_GRID_BODY:
       cur_region = new_omp_region (bb, code, cur_region);
       fallthru = true;
       break;
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index f78cc7b..41c1a9b 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -458,7 +458,11 @@ enum omp_clause_code {
   OMP_CLAUSE_VECTOR_LENGTH,
 
   /* OpenACC clause: tile ( size-expr-list ).  */
-  OMP_CLAUSE_TILE
+  OMP_CLAUSE_TILE,
+
+  /* OpenMP internal-only clause to specify grid dimensions of a gridified
+     kernel.  */
+  OMP_CLAUSE__GRIDDIM_
 };
 
 #undef DEFTREESTRUCT
@@ -1375,6 +1379,9 @@ struct GTY(()) tree_omp_clause {
     enum tree_code                 reduction_code;
     enum omp_clause_linear_kind    linear_kind;
     enum tree_code                 if_modifier;
+    /* The dimension a OMP_CLAUSE__GRIDDIM_ clause of a gridified target
+       construct describes.  */
+    unsigned int		   dimension;
   } GTY ((skip)) subcode;
 
   /* The gimplification of OMP_CLAUSE_REDUCTION_{INIT,MERGE} for omp-low's
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 4488973..31cea10 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -942,6 +942,18 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags)
       pp_right_paren (pp);
       break;
 
+    case OMP_CLAUSE__GRIDDIM_:
+      pp_string (pp, "_griddim_(");
+      pp_unsigned_wide_integer (pp, OMP_CLAUSE_GRIDDIM_DIMENSION (clause));
+      pp_colon (pp);
+      dump_generic_node (pp, OMP_CLAUSE__GRIDDIM__SIZE (clause), spc, flags,
+			 false);
+      pp_comma (pp);
+      dump_generic_node (pp, OMP_CLAUSE__GRIDDIM__GROUP (clause), spc, flags,
+			 false);
+      pp_right_paren (pp);
+      break;
+
     default:
       /* Should never happen.  */
       dump_generic_node (pp, clause, spc, flags, false);
diff --git a/gcc/tree.c b/gcc/tree.c
index 20470c5..8fef0d1 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -328,6 +328,7 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_NUM_WORKERS  */
   1, /* OMP_CLAUSE_VECTOR_LENGTH  */
   1, /* OMP_CLAUSE_TILE  */
+  2, /* OMP_CLAUSE__GRIDDIM_  */
 };
 
 const char * const omp_clause_code_name[] =
@@ -398,7 +399,8 @@ const char * const omp_clause_code_name[] =
   "num_gangs",
   "num_workers",
   "vector_length",
-  "tile"
+  "tile",
+  "_griddim_"
 };
 
 
@@ -11606,6 +11608,7 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
       switch (OMP_CLAUSE_CODE (*tp))
 	{
 	case OMP_CLAUSE_GANG:
+	case OMP_CLAUSE__GRIDDIM_:
 	  WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1));
 	  /* FALLTHRU */
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 4b509cc..e885ea1 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1636,6 +1636,17 @@ extern void protected_set_expr_location (tree, location_t);
 #define OMP_CLAUSE_TILE_LIST(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
 
+#define OMP_CLAUSE_GRIDDIM_DIMENSION(NODE) \
+  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
+   ->omp_clause.subcode.dimension)
+#define OMP_CLAUSE_SET_GRIDDIM_DIMENSION(NODE, DIMENSION) \
+  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
+   ->omp_clause.subcode.dimension = (DIMENSION))
+#define OMP_CLAUSE__GRIDDIM__SIZE(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_), 0)
+#define OMP_CLAUSE__GRIDDIM__GROUP(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_), 1)
+
 /* SSA_NAME accessors.  */
 
 /* Returns the IDENTIFIER_NODE giving the SSA name a name or NULL_TREE

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 01/10] Configury changes and new options
  2016-01-13 17:55 ` [hsa merge 01/10] Configury changes and new options Martin Jambor
@ 2016-01-14 12:35   ` Jakub Jelinek
  2016-02-02 12:53   ` Thomas Schwinge
  1 sibling, 0 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-14 12:35 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches

On Wed, Jan 13, 2016 at 06:39:26PM +0100, Martin Jambor wrote:
> Hi,
> 
> this patch contains changes to the configuration mechanism and offload
> bits, so that users can build compilers with HSA support.
> 
> It is a re-post of
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00714.html, which, has
> already been approved by Jakub after a few changes
> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01284.html).

Ok.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 03/10] HSA libgomp plugin
  2016-01-13 17:55 ` [hsa merge 03/10] HSA libgomp plugin Martin Jambor
@ 2016-01-14 12:37   ` Jakub Jelinek
  0 siblings, 0 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-14 12:37 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches, Martin Liska

On Wed, Jan 13, 2016 at 06:39:28PM +0100, Martin Jambor wrote:
> the patch below adds the HSA-specific plugin for libgomp.  The plugin
> implements the interface mandated by libgomp and takes care of finding
> any available HSA devices, finalizing HSAIL code and running it on
> HSA-capable GPUs.
> 
> This patch is a re-post of
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00716.html with a number
> of modifications requested by Jakub.

Ok.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 06/10] Pass manager changes
  2016-01-13 17:55 ` [hsa merge 06/10] Pass manager changes Martin Jambor
@ 2016-01-14 12:52   ` Jakub Jelinek
  0 siblings, 0 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-14 12:52 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches

On Wed, Jan 13, 2016 at 06:39:31PM +0100, Martin Jambor wrote:
> Hi,
> 
> the pass manager changes required for HSA have already been committed
> to trunk so all that remains are these additions to the pass pipeline.
> 
> This bit has already been approved by Richi in
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00996.html

Ok.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 05/10] OpenMP lowering/expansion changes (gridification)
  2016-01-13 18:17 ` [hsa merge 05/10] OpenMP lowering/expansion changes (gridification) Martin Jambor
@ 2016-01-14 12:52   ` Jakub Jelinek
  2016-01-15 14:52     ` Martin Jambor
  2016-01-14 14:22   ` Alexander Monakov
  1 sibling, 1 reply; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-14 12:52 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches

On Wed, Jan 13, 2016 at 06:39:30PM +0100, Martin Jambor wrote:
> @@ -726,14 +730,14 @@ struct GTY((tag("GSS_OMP_CONTINUE")))
>    tree control_use;
>  };
>  
> -/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_TEAMS, GIMPLE_OMP_ORDERED */
> +/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_ORDERED */

I think in this patch GIMPLE_OMP_TEAMS uses GSS_OMP_SINGLE_LAYOUT,
so the comment change is wrong.

> @@ -16489,7 +17075,682 @@ lower_omp (gimple_seq *body, omp_context *ctx)
>        fold_stmt (&gsi);
>    input_location = saved_location;
>  }
> -\f

Can you please keep the form feed char after the set of grid_* functions?

> +static void
> +grid_gridify_all_targets (gimple_seq *body_p)
> +{
> +  struct walk_stmt_info wi;
> +  memset (&wi, 0, sizeof (wi));
> +  walk_gimple_seq_mod (body_p, grid_gridify_all_targets_stmt, NULL, &wi);
> +}
> +

Here?

> +
>  /* Main entry point.  */
>  
>  static unsigned int
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1636,6 +1636,17 @@ extern void protected_set_expr_location (tree, location_t);
>  #define OMP_CLAUSE_TILE_LIST(NODE) \
>    OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
>  
> +#define OMP_CLAUSE_GRIDDIM_DIMENSION(NODE) \
> +  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
> +   ->omp_clause.subcode.dimension)
> +#define OMP_CLAUSE_SET_GRIDDIM_DIMENSION(NODE, DIMENSION) \
> +  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
> +   ->omp_clause.subcode.dimension = (DIMENSION))

Why do you need a separate SET?  OMP_CLAUSE_GRIDDIM_DIMENSION (node) = dimension;
should work just fine.

Ok with these changes.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-13 17:55 ` [hsa merge 07/10] IPA-HSA pass Martin Jambor
@ 2016-01-14 12:59   ` Jakub Jelinek
  2016-01-15 11:48     ` Martin Liška
  2016-01-15 14:53     ` Martin Jambor
  2016-01-15  9:52   ` Jan Hubicka
  1 sibling, 2 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-14 12:59 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches, Jan Hubicka, Martin Liska

On Wed, Jan 13, 2016 at 06:39:32PM +0100, Martin Jambor wrote:

> +	  cgraph_node *clone = node->create_virtual_clone
> +	    (vec <cgraph_edge *> (), NULL, NULL, "hsa");

Nicer formatting would be
	  cgraph_node *clone
	    = node->create_virtual_clone (vec <cgraph_edge *> (),
					  NULL, NULL, "hsa");

> +	  cgraph_node *clone = node->create_virtual_clone
> +	    (vec <cgraph_edge *> (), NULL, NULL, "hsa");

Ditto.

> +  const struct lto_function_header *header =
> +    (const struct lto_function_header *) data;

= goes on the next line.

> +  const int cfg_offset = sizeof (struct lto_function_header);
> +  const int main_offset = cfg_offset + header->cfg_size;
> +  const int string_offset = main_offset + header->main_size;
> +  struct data_in *data_in;
> +  unsigned int i;
> +  unsigned int count;
> +
> +  lto_input_block ib_main ((const char *) data + main_offset,
> +			   header->main_size, file_data->mode_table);
> +
> +  data_in =

Ditto.

> +bool
> +pass_ipa_hsa::gate (function *)
> +{
> +  return hsa_gen_requested_p () || in_lto_p;

Does it really need to be enabled whenever in_lto_p?
I mean, if HSA is not configured in, I think the gate should be false too.

Otherwise LGTM.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-13 18:12 ` [hsa merge 08/10] HSAIL BRIG description header file Martin Jambor
@ 2016-01-14 13:04   ` Jakub Jelinek
  2016-01-15  1:19     ` Ian Lance Taylor
  0 siblings, 1 reply; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-14 13:04 UTC (permalink / raw)
  To: Martin Jambor, Ian Lance Taylor; +Cc: GCC Patches, David Edelsohn, Jeff Law

On Wed, Jan 13, 2016 at 06:39:33PM +0100, Martin Jambor wrote:
> the following patch adds a BRIG (binary representation of HSAIL)
> representation description.  It is within a single header file
> describing the binary structures and constants of the format.
> 
> The file comes from the HSA Foundation (I have only added the
> HSA_BRIG_FORMAT_H macro and check and removed some weird comments
> which are not present in proposed future versions of the file) and is
> licensed under "University of Illinois/NCSA Open Source License."
> 
> The license is "GPL-compatible" according to FSF
> (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
> so I believe we can have it in GCC.  Nevertheless, it is not GPL and
> there is no copyright assignment for it, but the situation is
> hopefully analogous to some other libraries that have their upstream
> elsewhere but we ship them as part of the GCC.
> 
> In the previous posting of this patch
> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have
> requested a permission from the steering committee to include this file
> with a different upstream in GCC.  I have not received an official
> reply but since I have been chosen to be the HSA maintainer, I tend to
> think there were no legal objections against HSA going forward,
> including this file.

I just wonder if this wouldn't be better located in include/ subdirectory,
it is something external to GCC.
CCing Ian on this.

> 2015-12-04  Martin Jambor  <mjambor@suse.cz>
> 
>         * hsa-brig-format.h: New file.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 09/10] Majority of the HSA back-end
  2016-01-13 18:11 ` [hsa merge 09/10] Majority of the HSA back-end Martin Jambor
@ 2016-01-14 14:05   ` Jakub Jelinek
  2016-01-15 15:08     ` Martin Jambor
  2016-01-15 23:49   ` Martin Jambor
  1 sibling, 1 reply; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-14 14:05 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches, Martin Liska

On Wed, Jan 13, 2016 at 06:39:34PM +0100, Martin Jambor wrote:
> +struct function_linkage_pair
> +{
> +  function_linkage_pair (tree decl, unsigned int off):

I would have expected a space before :, or better yet on the next line:
  function_linkage_pair (tree decl, unsigned int off)
    : function_decl (decl), offset (off) {}

> +  total_size = sizeof (BrigSectionHeader) + strlen(section_name);

Space between function name and (
(multiple times in the file).
> +void*

Space before *.

> +hsa_brig_section::get_ptr_by_offset (unsigned int offset)
> +{
> +  gcc_assert (offset < total_size);
> +
> +  offset -= header_byte_delta;
> +  unsigned int i;
> +
> +  for (i = 0; offset >= chunks[i].size; i++)
> +    offset -= chunks[i].size;

The vertical spacing looks just weird.  Either declare i first
in the function and put empty line after that, or put empty line
before the declaration and no empty line between the declaration and for,
or declare i inside of for.
> +     r = r * 67 + (unsigned)ds->s[i] - 113;
> +  r = r * 67 + (unsigned)ds->prefix - 113;

Space after ) (twice).

> +  return r;
> +}
> +
> +/* Returns nonzero if DS1 and DS2 are equal.  */
> +
> +inline bool
> +brig_string_slot_hasher::equal (const value_type ds1, const compare_type ds2)
> +{
> +  if (ds1->len == ds2->len)
> +    return ds1->prefix == ds2->prefix && memcmp (ds1->s, ds2->s, ds1->len) == 0;

Too long line.

> +  /* XXX Sanitize the names without all the strdup.  */

Do you plan to work on these XXXs?  Doesn't have to be right away.

> +	  char *modname2;
> +	  asprintf (&modname2, "%s_%s", modname, part);
> +	  free (modname);

Better
	  modname = concat (modname, "_", part, NULL);
?
> +	  modname = modname2;

> +      char buf[64];
> +      sprintf (buf, "__%s_%i", hsa_seg_name (symbol->m_segment),
> +	       symbol->m_name_number);

I'd feel better with a snprintf here.

> +      name_offset = brig_emit_string (buf, prefix);
> +    }
> +
> +  dirvar.name = htole32 (name_offset);
> +  dirvar.init = 0;
> +  dirvar.type = htole16 (symbol->m_type);
> +  dirvar.segment = symbol->m_segment;
> +  /* TODO: Once we are able to access global variables, we must copy their
> +     alignment.  */
> +  dirvar.align = MAX (hsa_natural_alignment (dirvar.type),
> +		      (BrigAlignment8_t) BRIG_ALIGNMENT_4);
> +  dirvar.linkage = symbol->m_linkage;
> +  dirvar.dim.lo = (uint32_t) symbol->m_dim;
> +  dirvar.dim.hi = (uint32_t) ((unsigned long long) symbol->m_dim >> 32);

This makes IMHO too specific host assumptions about long long.
m_dim is UHWI, why not just symbol->m_dim >> 32 without any cast
(I think we rely on unsigned HOST_WIDE_INT being at least 64-bit pretty much
everywhere), or symbol->m_dim >> 16 >> 16?
Or cast to (uint64_t) instead.

> +      /* Internal function.  */
> +      hsa_internal_fn **slot = hsa_emitted_internal_decls->find_slot
> +	(f->m_internal_fn, INSERT);

Again, IMHO if you can easily avoid splitting function decl or ptr and
opening ( to multiple lines, you IMHO should.

      hsa_internal_fn **slot
	= hsa_emitted_internal_decls->find_slot (f->m_internal_fn, INSERT);

> +  char buf[32];
> +
> +  lbldir.base.byteCount = htole16 (sizeof (lbldir));
> +  lbldir.base.kind = htole16 (BRIG_KIND_DIRECTIVE_LABEL);
> +  sprintf (buf, "BB_%u_%i", DECL_UID (current_function_decl), hbb->m_index);

Again, please use snprintf.

> +  unsigned data_len = tree_to_uhwi (TYPE_SIZE (type))/BITS_PER_UNIT;

Spaces around / .

> +  if (INTEGRAL_TYPE_P (type)
> +      || (POINTER_TYPE_P (type) && TREE_CODE (value) == INTEGER_CST))

This looks weird.  You are clearly assuming INTEGER_CST in the code,
but checking it only for POINTER_TYPE_P and not for INTEGRAL_TYPE_P.

> +    switch (data_len)
> +      {
> +      case 1:
> +	bytes.b8 = (uint8_t) TREE_INT_CST_LOW (value);
> +	break;
> +      case 2:
> +	bytes.b16 = (uint16_t) TREE_INT_CST_LOW (value);
> +	break;
> +      case 4:
> +	bytes.b32 = (uint32_t) TREE_INT_CST_LOW (value);
> +	break;
> +      case 8:
> +	bytes.b64 = (uint64_t) int_cst_value (value);

This will ICE on 64-bit unsigned constants, because int_cst_value I think
returns signed HWI, not unsigned.  Why not just TREE_INT_CST_LOW too?

> +	  unsigned actual;
> +	  actual = emit_immediate_scalar_to_buffer
> +	    (VECTOR_CST_ELT (value, i), p, 0);

Better

	  unsigned actual
	    = emit_immediate_scalar_to_buffer (VECTOR_CST_ELT (value, i),
					       p, 0);

> +      unsigned actual;
> +      actual = emit_immediate_scalar_to_buffer
> +	(TREE_REALPART (value), p, total_len / 2);

Likewise.

> +	  unsigned actual = emit_immediate_scalar_to_buffer
> +	    (CONSTRUCTOR_ELT (value, i)->value, p, 0);

Similarly.

> +      out.offset.hi = htole32 (((long long) addr->m_imm_offset) >> 32);

See above.

> +  repr.modifier.allBits = 0 ;

No space before ;

> +  repr.modifier.allBits = 0 ;

Ditto.

> +  else
> +    repr.sourceType = htole16 (as_a <hsa_op_immed *> (cmp->get_op (1))->m_type);

Too long line.

> +  repr.base.operands = htole32
> +    (emit_operands (br->get_op (0), &hsa_bb_for_bb (target)->m_label_ref));

Formatting.

> +  repr.base.operands = htole32
> +    (emit_operands (sbr->get_op (0), sbr->m_label_code_list));

Likewise.

> +  if (call->m_called_function)
> +    function_call_linkage.safe_push
> +      (function_linkage_pair (call->m_called_function,
> +			      call->m_func.m_brig_op_offset));
> +  else
> +    {
> +      hsa_internal_fn *slot = hsa_emitted_internal_decls->find
> +	(call->m_called_internal_fn);

Likewise.  Not sure if you can easily do something with the first one,
perhaps use a temporary?

> +	    insn->m_call_insn->m_result_code_list->m_offsets[0] = htole32
> +	      (emit_directive_variable (insn->m_call_insn->m_output_arg));

Formatting.

> +	  insn->m_operand_list->m_offsets[i - 1] = htole32
> +	    (enqueue_op (insn->get_op (i)));

Ditto.
> +	}
> +
> +      repr.base.operands = htole32 (emit_operands (insn->get_op (0),
> +						   insn->m_operand_list));
> +    }
> +  else if (insn->m_opcode == BRIG_OPCODE_EXPAND)
> +    {
> +      /* Create operand list for packed type.  */
> +      for (unsigned i = 0; i < operand_count - 1; i++)
> +	{
> +	  gcc_checking_assert (insn->get_op (i));
> +	  insn->m_operand_list->m_offsets[i] = htole32
> +	    (enqueue_op (insn->get_op (i)));

Ditto.
> +	}
> +
> +      repr.base.operands = htole32
> +	(emit_operands (insn->m_operand_list,
> +			insn->get_op (insn->operand_count () - 1)));

Ditto.

> +/* Unit constructor and destructor statements.  */
> +
> +static GTY(()) tree hsa_ctor_statements;
> +static GTY(()) tree hsa_dtor_statements;

Perhaps make this cdtor_statements[2] to avoid too many GTY roots?

> +      TREE_TYPE (var_name) = build_array_type
> +	(char_type_node, build_index_type (size_int (len)));

Formatting.

> +      CONSTRUCTOR_APPEND_ELT
> +	(host_functions_vec, NULL_TREE,
> +	 build_fold_addr_expr (hsa_get_host_function (decl)));

Likewise.  Perhaps with a temporary?

> +      TREE_TYPE (kern_name) = build_array_type
> +	(char_type_node, build_index_type (size_int (len)));

Formatting.

> +      kernel_dependencies_vector_type = build_array_type
> +	(build_pointer_type (char_type_node),

Formatting.

> +	      kernel_dependencies_vector_type = build_array_type
> +		(build_pointer_type (char_type_node),
> +		 build_index_type (size_int (count)));

Ditto.

> +		  TREE_TYPE (dependency_name) = build_array_type
> +		    (char_type_node, build_index_type (size_int (len)));
> +
> +		  CONSTRUCTOR_APPEND_ELT
> +		    (kernel_dependencies_vec, NULL_TREE,
> +		     build1 (ADDR_EXPR,
> +			     build_pointer_type (TREE_TYPE (dependency_name)),
> +			     dependency_name));

Ditto several times.

> +  offset_int func_table_size = wi::to_offset (TYPE_SIZE_UNIT (ptr_type_node))
> +    * kernel_count;

  offset_int func_table_size
    = wi::to_offset (TYPE_SIZE_UNIT (ptr_type_node)) * kernel_count;
instead?

> +  tree offload_register = builtin_decl_explicit
> +    (BUILT_IN_GOMP_OFFLOAD_REGISTER);

Formatting.

> +/* Required HSA section alignment. */
> +
> +#define HSA_SECTION_ALIGNMENT 16

Shouldn't this be at the top of the file?

> +      BrigOperandCodeRef *code_ref = (BrigOperandCodeRef *)
> +	(brig_operand.get_ptr_by_offset (p.offset));

      BrigOperandCodeRef *code_ref
	= (BrigOperandCodeRef *) (brig_operand.get_ptr_by_offset (p.offset));
instead?

> +    for (hash_map <tree, BrigDirectiveExecutable *>::iterator it =
> +	 emitted_declarations->begin (); it != emitted_declarations->end ();

= goes on the next line.  And I think the condition should go on yet another
line, either together with ++it, or on its own.
> +	 ++it)

> +  data_padding = HSA_SECTION_ALIGNMENT
> +    - brig_data.total_size % HSA_SECTION_ALIGNMENT;
> +  code_padding = HSA_SECTION_ALIGNMENT
> +    - brig_code.total_size % HSA_SECTION_ALIGNMENT;
> +  operand_padding = HSA_SECTION_ALIGNMENT
> +    - brig_operand.total_size % HSA_SECTION_ALIGNMENT;
> +
> +  uint64_t module_size = sizeof (module_header) + sizeof (section_index)
> +    + brig_data.total_size + data_padding
> +    + brig_code.total_size + code_padding
> +    + brig_operand.total_size + operand_padding;

Put = on the next line in each case, or use
  var = (something
         +- something else ...);

> +  memset (padding, 0, sizeof(padding));

Missing space before (.

> +  bool unsigned_int_type = (BRIG_TYPE_U8 | BRIG_TYPE_U16 | BRIG_TYPE_U32
> +    | BRIG_TYPE_U64) & imm->m_type;

See above.

> +unsigned HOST_WIDE_INT
> +hsa_symbol::total_byte_size ()
> +{
> +  unsigned HOST_WIDE_INT s = hsa_type_bit_size (~BRIG_TYPE_ARRAY_MASK & m_type);

Too long line.
> +  gcc_assert (s % BITS_PER_UNIT == 0);
> +  s /= BITS_PER_UNIT;
> +
> +  if (m_dim)
> +    s *= m_dim;
> +
> +  return s;
> +}
> +
> +/* Forward declaration.  */
> +
> +static BrigType16_t
> +hsa_type_for_tree_type (const_tree type, unsigned HOST_WIDE_INT *dim_p,
> +			bool min32int);
> +
> +void
> +hsa_symbol::fillup_for_decl (tree decl)
> +{
> +  m_decl = decl;
> +  m_type = hsa_type_for_tree_type (TREE_TYPE (decl), &m_dim, false);
> +
> +  if (hsa_seen_error ())
> +    m_seen_error = true;
> +}
> +
> +/* Constructor of class representing global HSA function/kernel information and
> +   state.  FNDECL is function declaration, KERNEL_P is true if the function
> +   is going to become a HSA kernel.  If the function has body, SSA_NAMES_COUNT
> +   should be set to number of SSA names used in the function.  */
> +
> +hsa_function_representation::hsa_function_representation
> +  (tree fdecl, bool kernel_p, unsigned ssa_names_count): m_name (NULL),

Space before : or : on the next line.

> +hsa_function_representation::hsa_function_representation (hsa_internal_fn *fn):

Ditto.

> +#define HSA_WARN_MEMORY_ROUTINE "OpenMP device memory library routines have " \
> +  "undefined semantics within target regions, support for HSA ignores them"

Well, if you don't support them in HSA target regions, you'd better punt and
not error on them.

> +#define HSA_WARN_AFFINITY "Support for HSA does not implement OpenMP affinity " \
> +  "featerues"

Typo - features

> +  omp_simple_builtins = new hash_map <nofree_string_hash, omp_simple_builtin>
> +    ();

Formatting.
> +
> +  omp_simple_builtin omp_builtins[] =
> +    {
> +      omp_simple_builtin
> +	("omp_get_initial_device", NULL, false,
> +	 new hsa_op_immed (GOMP_DEVICE_HOST, (BrigType16_t) BRIG_TYPE_S32)),

Ditto (and several times below).

> +  bool is_in_global_vars = TREE_CODE (decl) == VAR_DECL && is_global_var (decl);

Too long line.

> +  hsa_function_summary *s = hsa_summaries->get (cgraph_node::get_create (decl));

Ditto.

> +hsa_op_base::hsa_op_base (BrigKind16_t k): m_next (NULL), m_brig_op_offset (0),
> +  m_kind (k)

Space before : or put : on next line together with all the non-static data
member initializers.

> +/* Constructor of class representing HSA immediate values.  INTEGER_VALUE is the
> +   integer representation of the immediate value.  TYPE is BRIG type.  */
> +
> +hsa_op_immed::hsa_op_immed (HOST_WIDE_INT integer_value, BrigType16_t type)
> +  : hsa_op_with_type (BRIG_KIND_OPERAND_CONSTANT_BYTES, type),
> +  m_tree_value (NULL), m_brig_repr (NULL)

I'd say m_tree_value should be indented by 4 spaces instead of 2, below
hsa_op_with_type.

> +hsa_op_address::hsa_op_address (hsa_symbol *sym, hsa_op_reg *r,
> +				HOST_WIDE_INT offset)
> +  : hsa_op_base (BRIG_KIND_OPERAND_ADDRESS), m_symbol (sym), m_reg (r),
> +  m_imm_offset (offset)

See above (several times below too).

> +hsa_insn_br::hsa_insn_br (hsa_op_reg *ctrl)
> +: hsa_insn_basic (1, BRIG_OPCODE_CBR, BRIG_TYPE_B1, ctrl),
> +  m_width (BRIG_WIDTH_1)

Shift the 2 lines by 2 columns to the right (more times below).

> +      hsa_op_reg *tmp;
> +      tmp = new hsa_op_reg (hsa_get_segment_addr_type
> +			    (addr->m_symbol->m_segment));

      hsa_op_reg *tmp
	= new hsa_op_reg (hsa_get_segment_addr_type (addr->m_symbol->m_segment));
is too long, so perhaps a temporary?

> +  if (flag_tm)
> +    {
> +      HSA_SORRY_AT (UNKNOWN_LOCATION,
> +		    "support for HSA does not implement transactional memory");
> +      goto fail;
> +    }

Wouldn't (at least incrementally) be it much better to just give up if you
see any of the problematic constructs in the function, rather than
just giving up whenever -fgnu-tm is used?

> +      char *b = XNEWVEC (char, 64);
> +      sprintf (b, "__hsa_anonymous_%i", DECL_UID (decl));
> +      const char *ggc_str = ggc_alloc_string (b, strlen (b) + 1);
> +      free (b);

Ugh, why not just char b[64]; instead?  And use snprintf...
And then ggc_strdup?

> +unsigned
> +hsa_internal_fn::get_arity ()
> +{
> +  switch (m_fn)
> +    {
> +    case IFN_ACOS:
> +    case IFN_ASIN:
> +    case IFN_ATAN:
> +    case IFN_COS:
> +    case IFN_EXP:
> +    case IFN_EXP10:
> +    case IFN_EXP2:
> +    case IFN_EXPM1:
> +    case IFN_LOG:
> +    case IFN_LOG10:
> +    case IFN_LOG1P:
> +    case IFN_LOG2:
> +    case IFN_LOGB:
> +    case IFN_SIGNIFICAND:
> +    case IFN_SIN:
> +    case IFN_SQRT:
> +    case IFN_TAN:
> +    case IFN_CEIL:
> +    case IFN_FLOOR:
> +    case IFN_NEARBYINT:
> +    case IFN_RINT:
> +    case IFN_ROUND:
> +    case IFN_TRUNC:
> +      return 1;
> +    case IFN_ATAN2:
> +    case IFN_COPYSIGN:
> +    case IFN_FMOD:
> +    case IFN_POW:
> +    case IFN_REMAINDER:
> +    case IFN_SCALB:
> +    case IFN_LDEXP:
> +      return 2;
> +      break;
> +    case IFN_CLRSB:
> +    case IFN_CLZ:
> +    case IFN_CTZ:
> +    case IFN_FFS:
> +    case IFN_PARITY:
> +    case IFN_POPCOUNT:
> +    default:
> +      gcc_unreachable ();

There are various other IFNs (e.g. for __builtin_{add,sub,mul}_overflow,
lots of others).  How do you ensure you don't ICE on those?

Generally, can I ask you to search for each of the categories of formatting
and other issues in other spots and attempt to clean up or fix them too?
Like for the call formatting generally search for ERE '^[[:blank:]]*\(',
for long lines just use contrib/check_GNU_style.sh, look for
'=[[:blank:]]*$', '^:', '\):', look around even '^  :', etc.?

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 05/10] OpenMP lowering/expansion changes (gridification)
  2016-01-13 18:17 ` [hsa merge 05/10] OpenMP lowering/expansion changes (gridification) Martin Jambor
  2016-01-14 12:52   ` Jakub Jelinek
@ 2016-01-14 14:22   ` Alexander Monakov
  1 sibling, 0 replies; 77+ messages in thread
From: Alexander Monakov @ 2016-01-14 14:22 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches, Jakub Jelinek

Hi Martin,

while addressing Jakub's feedback can you also fix the typos below?
(noticed because I needed to reuse those hunks in gomp-nvptx)

On Wed, 13 Jan 2016, Martin Jambor wrote:
> +/* Return a target argument consisiting of DEVICE identifier, value identifier
> +   ID, and the actual VALUE.  */

"consisting"

> +/* If VALUE is an integer constant greater than -2^15 and smaller than 2^15,
> +   push one argument to ARGS with bot the DEVICE, ID and VALUE embeded in it,
> +   otherwise push an iedntifier (with DEVICE and ID) and the VALUE in two
> +   arguments.  */

"both", "embedded", "identifier"

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 10/10] HSA register allocator
  2016-01-13 17:55 ` [hsa merge 10/10] HSA register allocator Martin Jambor
@ 2016-01-14 14:41   ` Jakub Jelinek
  2016-01-15 15:09     ` Martin Jambor
  0 siblings, 1 reply; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-14 14:41 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches, Michael Matz

On Wed, Jan 13, 2016 at 06:39:35PM +0100, Martin Jambor wrote:
> +    for (phi = hbb->m_first_phi;
> +	 phi;
> +	 phi = phi->m_next ? as_a <hsa_insn_phi *> (phi->m_next): NULL)

Space before :

Ok with that change.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-14 13:04   ` Jakub Jelinek
@ 2016-01-15  1:19     ` Ian Lance Taylor
  2016-01-15 10:03       ` Richard Biener
                         ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Ian Lance Taylor @ 2016-01-15  1:19 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Martin Jambor, GCC Patches, David Edelsohn, Jeff Law

Jakub Jelinek <jakub@redhat.com> writes:

> On Wed, Jan 13, 2016 at 06:39:33PM +0100, Martin Jambor wrote:
>> the following patch adds a BRIG (binary representation of HSAIL)
>> representation description.  It is within a single header file
>> describing the binary structures and constants of the format.
>> 
>> The file comes from the HSA Foundation (I have only added the
>> HSA_BRIG_FORMAT_H macro and check and removed some weird comments
>> which are not present in proposed future versions of the file) and is
>> licensed under "University of Illinois/NCSA Open Source License."
>> 
>> The license is "GPL-compatible" according to FSF
>> (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
>> so I believe we can have it in GCC.  Nevertheless, it is not GPL and
>> there is no copyright assignment for it, but the situation is
>> hopefully analogous to some other libraries that have their upstream
>> elsewhere but we ship them as part of the GCC.
>> 
>> In the previous posting of this patch
>> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have
>> requested a permission from the steering committee to include this file
>> with a different upstream in GCC.  I have not received an official
>> reply but since I have been chosen to be the HSA maintainer, I tend to
>> think there were no legal objections against HSA going forward,
>> including this file.

Martin, could you ask the HSA Foundation or AMD or whoever if there is
any way they could remove the second requirement of the license?  It
adds yet another case where anybody distributing GCC has to list yet
another copyright notice.

Barring that, I would personally prefer that you write your own version
of this header file, defining the constants and structs that you need.
That's basically what we've done for ELF and COFF and Mach-O, several
times over.  For example, libiberty/simple-object-elf.c.

Barring that, I agree with Jakub that this looks like something that
should go in the top-level include subdirectory rather than the gcc
subdirectory.

Ian

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-13 17:55 ` [hsa merge 07/10] IPA-HSA pass Martin Jambor
  2016-01-14 12:59   ` Jakub Jelinek
@ 2016-01-15  9:52   ` Jan Hubicka
  2016-01-15 11:47     ` Martin Liška
  1 sibling, 1 reply; 77+ messages in thread
From: Jan Hubicka @ 2016-01-15  9:52 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches, Jan Hubicka, Martin Liska

> 2016-01-13  Martin Liska  <mliska@suse.cz>
> 	    Martin Jambor  <mjambor@suse.cz>
> 
> 	* ipa-hsa.c: New file.
> 	* lto-section-in.c (lto_section_name): Add hsa section name.
> 	* lto-streamer.h (lto_section_type): Add hsa section.
> 	* lto-partition.c: Include "hsa.h"
> 	(add_symbol_to_partition_1): Put hsa implementations into the
> 	same partition as host implementations.
> 	* timevar.def (TV_IPA_HSA): New.
> 
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 81a63a5..0a56170 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-prop.h"
>  #include "ipa-inline.h"
>  #include "lto-partition.h"
> +#include "hsa.h"
>  
>  vec<ltrans_partition> ltrans_partitions;
>  
> @@ -170,6 +171,24 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
>  	 Therefore put it into the same partition.  */
>        if (cnode->instrumented_version)
>  	add_symbol_to_partition_1 (part, cnode->instrumented_version);
> +
> +      /* Add an HSA associated with the symbol.  */
> +      if (hsa_summaries != NULL)
> +	{
> +	  hsa_function_summary *s = hsa_summaries->get (cnode);
> +	  if (s->m_kind == HSA_KERNEL)
> +	    {
> +	      /* Add binded function.  */
> +	      bool added = add_symbol_to_partition_1 (part,
> +						      s->m_binded_function);
> +	      gcc_assert (added);
> +	      if (symtab->dump_file)
> +		fprintf (symtab->dump_file,
> +			 "adding an HSA function (host/gpu) to the "
> +			 "partition: %s\n",
> +			 s->m_binded_function->name ());
> +	    }
> +	}

Do we really need to look that up in the hsa summary? Why these can not be partitioned the
usual way?

The patch looks OK for me modulo Jakub's comments.

Honza
>      }
>  
>    add_references_to_partition (part, node);
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 2765179..d9a5066 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -97,6 +97,7 @@ DEFTIMEVAR (TV_WHOPR_WPA_IO          , "whopr wpa I/O")
>  DEFTIMEVAR (TV_WHOPR_PARTITIONING    , "whopr partitioning")
>  DEFTIMEVAR (TV_WHOPR_LTRANS          , "whopr ltrans")
>  DEFTIMEVAR (TV_IPA_REFERENCE         , "ipa reference")
> +DEFTIMEVAR (TV_IPA_HSA		     , "ipa HSA")
>  DEFTIMEVAR (TV_IPA_PROFILE           , "ipa profile")
>  DEFTIMEVAR (TV_IPA_AUTOFDO           , "auto profile")
>  DEFTIMEVAR (TV_IPA_PURE_CONST        , "ipa pure const")

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-15  1:19     ` Ian Lance Taylor
@ 2016-01-15 10:03       ` Richard Biener
  2016-01-15 10:14       ` Martin Jambor
  2016-02-26 18:58       ` Martin Jambor
  2 siblings, 0 replies; 77+ messages in thread
From: Richard Biener @ 2016-01-15 10:03 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Jakub Jelinek, Martin Jambor, GCC Patches, David Edelsohn, Jeff Law

On Fri, Jan 15, 2016 at 2:18 AM, Ian Lance Taylor <ian@airs.com> wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
>
>> On Wed, Jan 13, 2016 at 06:39:33PM +0100, Martin Jambor wrote:
>>> the following patch adds a BRIG (binary representation of HSAIL)
>>> representation description.  It is within a single header file
>>> describing the binary structures and constants of the format.
>>>
>>> The file comes from the HSA Foundation (I have only added the
>>> HSA_BRIG_FORMAT_H macro and check and removed some weird comments
>>> which are not present in proposed future versions of the file) and is
>>> licensed under "University of Illinois/NCSA Open Source License."
>>>
>>> The license is "GPL-compatible" according to FSF
>>> (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
>>> so I believe we can have it in GCC.  Nevertheless, it is not GPL and
>>> there is no copyright assignment for it, but the situation is
>>> hopefully analogous to some other libraries that have their upstream
>>> elsewhere but we ship them as part of the GCC.
>>>
>>> In the previous posting of this patch
>>> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have
>>> requested a permission from the steering committee to include this file
>>> with a different upstream in GCC.  I have not received an official
>>> reply but since I have been chosen to be the HSA maintainer, I tend to
>>> think there were no legal objections against HSA going forward,
>>> including this file.
>
> Martin, could you ask the HSA Foundation or AMD or whoever if there is
> any way they could remove the second requirement of the license?  It
> adds yet another case where anybody distributing GCC has to list yet
> another copyright notice.
>
> Barring that, I would personally prefer that you write your own version
> of this header file, defining the constants and structs that you need.
> That's basically what we've done for ELF and COFF and Mach-O, several
> times over.  For example, libiberty/simple-object-elf.c.

Not sure how that would end up with sth that is not considered a derivative
(esp. if not clean-room "duplicated" from other available documentation).

So if that's ok then it must be ok to basically strip the license and put ours
on it doing some formatting / re-ordering (or renaming).

That said - stupid copyright stuff (for something that is maybe not
even copyrightable).

Richard.

> Barring that, I agree with Jakub that this looks like something that
> should go in the top-level include subdirectory rather than the gcc
> subdirectory.
>
> Ian

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-15  1:19     ` Ian Lance Taylor
  2016-01-15 10:03       ` Richard Biener
@ 2016-01-15 10:14       ` Martin Jambor
  2016-01-15 10:17         ` Richard Biener
  2016-01-15 10:37         ` Jakub Jelinek
  2016-02-26 18:58       ` Martin Jambor
  2 siblings, 2 replies; 77+ messages in thread
From: Martin Jambor @ 2016-01-15 10:14 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Jakub Jelinek, GCC Patches, David Edelsohn, Jeff Law

Hi,

On Thu, Jan 14, 2016 at 05:18:56PM -0800, Ian Lance Taylor wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> 
> > On Wed, Jan 13, 2016 at 06:39:33PM +0100, Martin Jambor wrote:
> >> the following patch adds a BRIG (binary representation of HSAIL)
> >> representation description.  It is within a single header file
> >> describing the binary structures and constants of the format.
> >> 
> >> The file comes from the HSA Foundation (I have only added the
> >> HSA_BRIG_FORMAT_H macro and check and removed some weird comments
> >> which are not present in proposed future versions of the file) and is
> >> licensed under "University of Illinois/NCSA Open Source License."
> >> 
> >> The license is "GPL-compatible" according to FSF
> >> (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
> >> so I believe we can have it in GCC.  Nevertheless, it is not GPL and
> >> there is no copyright assignment for it, but the situation is
> >> hopefully analogous to some other libraries that have their upstream
> >> elsewhere but we ship them as part of the GCC.
> >> 
> >> In the previous posting of this patch
> >> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have
> >> requested a permission from the steering committee to include this file
> >> with a different upstream in GCC.  I have not received an official
> >> reply but since I have been chosen to be the HSA maintainer, I tend to
> >> think there were no legal objections against HSA going forward,
> >> including this file.
> 
> Martin, could you ask the HSA Foundation or AMD or whoever if there is
> any way they could remove the second requirement of the license?  It
> adds yet another case where anybody distributing GCC has to list yet
> another copyright notice.

I will raise this with the HSA PRM group and perhaps there is a slight
chance that they will change this in the upcoming version of HSAIL.
But it is not going to happen soon enough.

> 
> Barring that, I would personally prefer that you write your own version
> of this header file, defining the constants and structs that you need.
> That's basically what we've done for ELF and COFF and Mach-O, several
> times over.  For example, libiberty/simple-object-elf.c.

Well, if we have done something like this before, I can go through the
exercise of copy'n'pasting everything from the PDF specification, if
that allowed us to "own" the file and put it under GPL 3.  But I must
say I do not know.

It is going to be a bit tedious job (and it would be good to double
check I made no mistakes somehow) but it is certainly doable.  I guess
I will embark on it after going through the rest of the review (unless
someone here tells me I should not, that is).

> 
> Barring that, I agree with Jakub that this looks like something that
> should go in the top-level include subdirectory rather than the gcc
> subdirectory.

Even if I "create" a copy of our own?  But sure, no problem.

Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-15 10:14       ` Martin Jambor
@ 2016-01-15 10:17         ` Richard Biener
  2016-01-15 10:37         ` Jakub Jelinek
  1 sibling, 0 replies; 77+ messages in thread
From: Richard Biener @ 2016-01-15 10:17 UTC (permalink / raw)
  To: Ian Lance Taylor, Jakub Jelinek, GCC Patches, David Edelsohn, Jeff Law

On Fri, Jan 15, 2016 at 11:14 AM, Martin Jambor <mjambor@suse.cz> wrote:
> Hi,
>
> On Thu, Jan 14, 2016 at 05:18:56PM -0800, Ian Lance Taylor wrote:
>> Jakub Jelinek <jakub@redhat.com> writes:
>>
>> > On Wed, Jan 13, 2016 at 06:39:33PM +0100, Martin Jambor wrote:
>> >> the following patch adds a BRIG (binary representation of HSAIL)
>> >> representation description.  It is within a single header file
>> >> describing the binary structures and constants of the format.
>> >>
>> >> The file comes from the HSA Foundation (I have only added the
>> >> HSA_BRIG_FORMAT_H macro and check and removed some weird comments
>> >> which are not present in proposed future versions of the file) and is
>> >> licensed under "University of Illinois/NCSA Open Source License."
>> >>
>> >> The license is "GPL-compatible" according to FSF
>> >> (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
>> >> so I believe we can have it in GCC.  Nevertheless, it is not GPL and
>> >> there is no copyright assignment for it, but the situation is
>> >> hopefully analogous to some other libraries that have their upstream
>> >> elsewhere but we ship them as part of the GCC.
>> >>
>> >> In the previous posting of this patch
>> >> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have
>> >> requested a permission from the steering committee to include this file
>> >> with a different upstream in GCC.  I have not received an official
>> >> reply but since I have been chosen to be the HSA maintainer, I tend to
>> >> think there were no legal objections against HSA going forward,
>> >> including this file.
>>
>> Martin, could you ask the HSA Foundation or AMD or whoever if there is
>> any way they could remove the second requirement of the license?  It
>> adds yet another case where anybody distributing GCC has to list yet
>> another copyright notice.
>
> I will raise this with the HSA PRM group and perhaps there is a slight
> chance that they will change this in the upcoming version of HSAIL.
> But it is not going to happen soon enough.
>
>>
>> Barring that, I would personally prefer that you write your own version
>> of this header file, defining the constants and structs that you need.
>> That's basically what we've done for ELF and COFF and Mach-O, several
>> times over.  For example, libiberty/simple-object-elf.c.
>
> Well, if we have done something like this before, I can go through the
> exercise of copy'n'pasting everything from the PDF specification, if
> that allowed us to "own" the file and put it under GPL 3.  But I must
> say I do not know.
>
> It is going to be a bit tedious job (and it would be good to double
> check I made no mistakes somehow) but it is certainly doable.  I guess
> I will embark on it after going through the rest of the review (unless
> someone here tells me I should not, that is).
>
>>
>> Barring that, I agree with Jakub that this looks like something that
>> should go in the top-level include subdirectory rather than the gcc
>> subdirectory.
>
> Even if I "create" a copy of our own?  But sure, no problem.

Btw, in the mean time we can also not ship the header and check
for its presence (thus require the user to install it in his system or
drop it in-place like isl/gmp).

So it shouldn't block the patches if we add a configure check for it
(HSA needs to be enabled with a switch anyway).

Richard.

> Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-15 10:14       ` Martin Jambor
  2016-01-15 10:17         ` Richard Biener
@ 2016-01-15 10:37         ` Jakub Jelinek
  2016-01-15 12:03           ` Jakub Jelinek
  2016-01-15 17:53           ` Mike Stump
  1 sibling, 2 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-15 10:37 UTC (permalink / raw)
  To: Ian Lance Taylor, GCC Patches, David Edelsohn, Jeff Law

On Fri, Jan 15, 2016 at 11:14:33AM +0100, Martin Jambor wrote:
> > Martin, could you ask the HSA Foundation or AMD or whoever if there is
> > any way they could remove the second requirement of the license?  It
> > adds yet another case where anybody distributing GCC has to list yet
> > another copyright notice.
> 
> I will raise this with the HSA PRM group and perhaps there is a slight
> chance that they will change this in the upcoming version of HSAIL.
> But it is not going to happen soon enough.

Under what license is
http://www.hsafoundation.com/html/Content/PRM/Topics/18_BRIG/_chpStr_BRIG_HSAIL_binary_format.htm
?  Sounds the same as the pdf to me.
Unlike the pdf version thereof, you could grab the <pre>...</pre> chunks
out of this fairly easily with recursive wget and some quick scripting.

There is conflicting:
It or any components may not be reproduced, republished, distributed,
transmitted, displayed, broadcast or otherwise exploited in any manner
without the express prior written permission of HSA Foundation.

HSA Foundation grants express permission to any current Founder, Promoter,
Supporter Contributor, Academic or Associate member of HSA Foundation to
copy and redistribute UNMODIFIED versions of this specification in any
fashion, provided that NO CHARGE is made for the specification and the
latest available update of the specification for any version of the API is
used whenever possible. Such distributed specification may be re-formatted
AS LONG AS the contents of the specification are not changed in any way.
The specification may be incorporated into a product that is sold as long as
such product includes significant independent work developed by the seller.
A link to the current version of this specification on the HSA Foundation
web-site should be included whenever possible with specification
distributions.

Now, extracting just portions of the spec, even if unmodifed and/or only
reformatted, might actually be considered modification.
But what you've posted looks to be generated in pretty much the same way,
and the chosen license in there is not in the specification.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15  9:52   ` Jan Hubicka
@ 2016-01-15 11:47     ` Martin Liška
  2016-01-16 10:00       ` Jan Hubicka
  0 siblings, 1 reply; 77+ messages in thread
From: Martin Liška @ 2016-01-15 11:47 UTC (permalink / raw)
  To: Jan Hubicka, Martin Jambor; +Cc: GCC Patches

On 01/15/2016 10:52 AM, Jan Hubicka wrote:
> Do we really need to look that up in the hsa summary? Why these can not be partitioned the
> usual way?

Hi.

Yes, it's needed as hsa-brig.c uses host function declaration of a kernel as a key for libgomp.
That's why we want to put the pair to a LTO partition.

Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-14 12:59   ` Jakub Jelinek
@ 2016-01-15 11:48     ` Martin Liška
  2016-01-15 14:53     ` Martin Jambor
  1 sibling, 0 replies; 77+ messages in thread
From: Martin Liška @ 2016-01-15 11:48 UTC (permalink / raw)
  To: gcc-patches

On 01/14/2016 01:58 PM, Jakub Jelinek wrote:
> Does it really need to be enabled whenever in_lto_p?
> I mean, if HSA is not configured in, I think the gate should be false too.

Sure, it can be removed, change will incorporated in final installed version of the file.

Thanks,
Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-15 10:37         ` Jakub Jelinek
@ 2016-01-15 12:03           ` Jakub Jelinek
  2016-01-15 17:23             ` Martin Jambor
  2016-01-15 17:53           ` Mike Stump
  1 sibling, 1 reply; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-15 12:03 UTC (permalink / raw)
  To: Ian Lance Taylor, GCC Patches, David Edelsohn, Jeff Law

On Fri, Jan 15, 2016 at 11:37:32AM +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 11:14:33AM +0100, Martin Jambor wrote:
> > > Martin, could you ask the HSA Foundation or AMD or whoever if there is
> > > any way they could remove the second requirement of the license?  It
> > > adds yet another case where anybody distributing GCC has to list yet
> > > another copyright notice.
> > 
> > I will raise this with the HSA PRM group and perhaps there is a slight
> > chance that they will change this in the upcoming version of HSAIL.
> > But it is not going to happen soon enough.
> 
> Under what license is
> http://www.hsafoundation.com/html/Content/PRM/Topics/18_BRIG/_chpStr_BRIG_HSAIL_binary_format.htm
> ?  Sounds the same as the pdf to me.
> Unlike the pdf version thereof, you could grab the <pre>...</pre> chunks
> out of this fairly easily with recursive wget and some quick scripting.

E.g.
for i in `seq 2 123`; do sed 's/\r$//;s/&lt;/</g;s/<pre[^>]*>/\n<pre>\n/g;s/<\/pre>/\n<\/pre>\n/g;s/<a name=[^>]*><\/a>//g' $i | sed -n '/^<pre>$/,/^<\/pre>$/{/^<.*pre>$/d;p}'; done
on downloaded (in the order of appearance in the toc) files, I get
following, which while it doesn't compile, I suppose some manual reordering
and if it is needed in C, also e.g. in case of typedef BrigModuleHeader* BrigModule_t; adding
struct before BrigModuleHeader or turning that struct also into a typedef, might make it work.
Now the question is if it covers all you care about.

typedef BrigModuleHeader* BrigModule_t;
typedef uint32_t BrigDataOffset32_t;
typedef uint32_t BrigCodeOffset32_t;
typedef uint32_t BrigOperandOffset32_t;
typedef BrigDataOffset32_t BrigDataOffsetString32_t;
typedef BrigDataOffset32_t BrigDataOffsetCodeList32_t;
typedef BrigDataOffset32_t BrigDataOffsetOperandList32_t;
typedef uint8_t BrigAlignment8_t;
enum BrigAlignment {
  BRIG_ALIGNMENT_NONE = 0,
  BRIG_ALIGNMENT_1 = 1,
  BRIG_ALIGNMENT_2 = 2,
  BRIG_ALIGNMENT_4 = 3,
  BRIG_ALIGNMENT_8 = 4,
  BRIG_ALIGNMENT_16 = 5,
  BRIG_ALIGNMENT_32 = 6,
  BRIG_ALIGNMENT_64 = 7,
  BRIG_ALIGNMENT_128 = 8,
  BRIG_ALIGNMENT_256 = 9
};
typedef uint8_t BrigAllocation8_t;
enum BrigAllocation {
  BRIG_ALLOCATION_NONE = 0,
  BRIG_ALLOCATION_PROGRAM = 1,
  BRIG_ALLOCATION_AGENT = 2,
  BRIG_ALLOCATION_AUTOMATIC = 3
};
typedef uint8_t BrigAluModifier8_t;
enum BrigAluModifierMask {
  BRIG_ALU_FTZ = 1
};

typedef uint8_t BrigAtomicOperation8_t;
enum BrigAtomicOperation {
  BRIG_ATOMIC_ADD = 0,
  BRIG_ATOMIC_AND = 1,
  BRIG_ATOMIC_CAS = 2,
  BRIG_ATOMIC_EXCH = 3,
  BRIG_ATOMIC_LD = 4,
  BRIG_ATOMIC_MAX = 5,
  BRIG_ATOMIC_MIN = 6,
  BRIG_ATOMIC_OR = 7,
  BRIG_ATOMIC_ST = 8,
  BRIG_ATOMIC_SUB = 9,
  BRIG_ATOMIC_WRAPDEC = 10,
  BRIG_ATOMIC_WRAPINC = 11,
  BRIG_ATOMIC_XOR = 12,
  BRIG_ATOMIC_WAIT_EQ = 13,
  BRIG_ATOMIC_WAIT_NE = 14,
  BRIG_ATOMIC_WAIT_LT = 15,
  BRIG_ATOMIC_WAIT_GTE = 16,
  BRIG_ATOMIC_WAITTIMEOUT_EQ = 17,
  BRIG_ATOMIC_WAITTIMEOUT_NE = 18,
  BRIG_ATOMIC_WAITTIMEOUT_LT = 19,
  BRIG_ATOMIC_WAITTIMEOUT_GTE = 20
};
struct BrigBase {
  uint16_t byteCount;
  BrigKind16_t kind;
}; 
typedef uint8_t BrigCompareOperation8_t;
enum BrigCompareOperation {
  BRIG_COMPARE_EQ = 0,
  BRIG_COMPARE_NE = 1,
  BRIG_COMPARE_LT = 2,
  BRIG_COMPARE_LE = 3,
  BRIG_COMPARE_GT = 4,
  BRIG_COMPARE_GE = 5,
  BRIG_COMPARE_EQU = 6,
  BRIG_COMPARE_NEU = 7,
  BRIG_COMPARE_LTU = 8,
  BRIG_COMPARE_LEU = 9,
  BRIG_COMPARE_GTU = 10,
  BRIG_COMPARE_GEU = 11,
  BRIG_COMPARE_NUM = 12,
  BRIG_COMPARE_NAN = 13,
  BRIG_COMPARE_SEQ = 14,
  BRIG_COMPARE_SNE = 15,
  BRIG_COMPARE_SLT = 16,
  BRIG_COMPARE_SLE = 17,
  BRIG_COMPARE_SGT = 18,
  BRIG_COMPARE_SGE = 19,
  BRIG_COMPARE_SGEU = 20,
  BRIG_COMPARE_SEQU = 21,
  BRIG_COMPARE_SNEU = 22,
  BRIG_COMPARE_SLTU = 23,
  BRIG_COMPARE_SLEU = 24,
  BRIG_COMPARE_SNUM = 25,
  BRIG_COMPARE_SNAN = 26,
  BRIG_COMPARE_SGTU = 27
};
typedef uint16_t BrigControlDirective16_t;
enum BrigControlDirective {
  BRIG_CONTROL_NONE = 0,
  BRIG_CONTROL_ENABLEBREAKEXCEPTIONS = 1,
  BRIG_CONTROL_ENABLEDETECTEXCEPTIONS = 2,
  BRIG_CONTROL_MAXDYNAMICGROUPSIZE = 3,
  BRIG_CONTROL_MAXFLATGRIDSIZE = 4,
  BRIG_CONTROL_MAXFLATWORKGROUPSIZE = 5,
  BRIG_CONTROL_REQUIREDDIM = 6,
  BRIG_CONTROL_REQUIREDGRIDSIZE = 7,
  BRIG_CONTROL_REQUIREDWORKGROUPSIZE = 8,
  BRIG_CONTROL_REQUIRENOPARTIALWORKGROUPS = 9
 }; 

 typedef uint32_t BrigExceptions32_t;
 enum BrigExceptionsMask {
   BRIG_EXCEPTIONS_INVALID_OPERATION = 1 << 0,
   BRIG_EXCEPTIONS_DIVIDE_BY_ZERO = 1 << 1,
   BRIG_EXCEPTIONS_OVERFLOW = 1 << 2,
   BRIG_EXCEPTIONS_UNDERFLOW = 1 << 3,
   BRIG_EXCEPTIONS_INEXACT = 1 << 4,
   BRIG_EXCEPTIONS_FIRST_USER_DEFINED = 1 << 16
 };

typedef uint8_t BrigExecutableModifier8_t;
enum BrigExecutableModifierMask {
  BRIG_EXECUTABLE_DEFINITION = 1
};
typedef uint8_t BrigImageChannelOrder8_t;
enum BrigImageChannelOrder {
  BRIG_CHANNEL_ORDER_A = 0,
  BRIG_CHANNEL_ORDER_R = 1,
  BRIG_CHANNEL_ORDER_RX = 2,
  BRIG_CHANNEL_ORDER_RG = 3,
  BRIG_CHANNEL_ORDER_RGX = 4,
  BRIG_CHANNEL_ORDER_RA = 5,
  BRIG_CHANNEL_ORDER_RGB = 6,
  BRIG_CHANNEL_ORDER_RGBX = 7,
  BRIG_CHANNEL_ORDER_RGBA = 8,
  BRIG_CHANNEL_ORDER_BGRA = 9,
  BRIG_CHANNEL_ORDER_ARGB = 10,
  BRIG_CHANNEL_ORDER_ABGR = 11,
  BRIG_CHANNEL_ORDER_SRGB = 12,
  BRIG_CHANNEL_ORDER_SRGBX = 13,
  BRIG_CHANNEL_ORDER_SRGBA = 14,
  BRIG_CHANNEL_ORDER_SBGRA = 15,
  BRIG_CHANNEL_ORDER_INTENSITY = 16,
  BRIG_CHANNEL_ORDER_LUMINANCE = 17,
  BRIG_CHANNEL_ORDER_DEPTH = 18,
  BRIG_CHANNEL_ORDER_DEPTH_STENCIL = 19,
  BRIG_CHANNEL_ORDER_FIRST_USER_DEFINED = 128
};

typedef uint8_t BrigImageChannelType8_t;
enum BrigImageChannelType {
  BRIG_CHANNEL_TYPE_SNORM_INT8 = 0,
  BRIG_CHANNEL_TYPE_SNORM_INT16 = 1,
  BRIG_CHANNEL_TYPE_UNORM_INT8 = 2,
  BRIG_CHANNEL_TYPE_UNORM_INT16 = 3,
  BRIG_CHANNEL_TYPE_UNORM_INT24 = 4,
  BRIG_CHANNEL_TYPE_UNORM_SHORT_555 = 5,
  BRIG_CHANNEL_TYPE_UNORM_SHORT_565 = 6,
  BRIG_CHANNEL_TYPE_UNORM_INT_101010 = 7,
  BRIG_CHANNEL_TYPE_SIGNED_INT8 = 8,
  BRIG_CHANNEL_TYPE_SIGNED_INT16 = 9,
  BRIG_CHANNEL_TYPE_SIGNED_INT32 = 10,
  BRIG_CHANNEL_TYPE_UNSIGNED_INT8 = 11,
  BRIG_CHANNEL_TYPE_UNSIGNED_INT16 = 12,
  BRIG_CHANNEL_TYPE_UNSIGNED_INT32 = 13,
  BRIG_CHANNEL_TYPE_HALF_FLOAT = 14,
  BRIG_CHANNEL_TYPE_FLOAT = 15,
  BRIG_CHANNEL_TYPE_FIRST_USER_DEFINED = 128
};

typedef uint8_t BrigImageGeometry8_t;
enum BrigImageGeometry {
  BRIG_GEOMETRY_1D = 0,
  BRIG_GEOMETRY_2D = 1,
  BRIG_GEOMETRY_3D = 2,
  BRIG_GEOMETRY_1DA = 3,
  BRIG_GEOMETRY_2DA = 4,
  BRIG_GEOMETRY_1DB = 5,
  BRIG_GEOMETRY_2DDEPTH = 6,
  BRIG_GEOMETRY_2DADEPTH = 7,
  BRIG_GEOMETRY_FIRST_USER_DEFINED = 128
};

typedef uint8_t BrigImageQuery8_t;
enum BrigImageQuery {
  BRIG_IMAGE_QUERY_WIDTH = 0,
  BRIG_IMAGE_QUERY_HEIGHT = 1,
  BRIG_IMAGE_QUERY_DEPTH = 2,
  BRIG_IMAGE_QUERY_ARRAY = 3,
  BRIG_IMAGE_QUERY_CHANNELORDER = 4,
  BRIG_IMAGE_QUERY_CHANNELTYPE = 5
};
typedef uint16_t BrigKind16_t;
enum BrigKind {
  BRIG_KIND_NONE = 0x0000,
  BRIG_KIND_DIRECTIVE_BEGIN = 0x1000,
      BRIG_KIND_DIRECTIVE_ARG_BLOCK_END = 0x1000,
      BRIG_KIND_DIRECTIVE_ARG_BLOCK_START = 0x1001,
      BRIG_KIND_DIRECTIVE_COMMENT = 0x1002,
      BRIG_KIND_DIRECTIVE_CONTROL = 0x1003,
      BRIG_KIND_DIRECTIVE_EXTENSION = 0x1004,
      BRIG_KIND_DIRECTIVE_FBARRIER = 0x1005,
      BRIG_KIND_DIRECTIVE_FUNCTION = 0x1006,
      BRIG_KIND_DIRECTIVE_INDIRECT_FUNCTION = 0x1007,
      BRIG_KIND_DIRECTIVE_KERNEL = 0x1008,
      BRIG_KIND_DIRECTIVE_LABEL = 0x1009,
      BRIG_KIND_DIRECTIVE_LOC = 0x100a,
      BRIG_KIND_DIRECTIVE_MODULE = 0x100b,
      BRIG_KIND_DIRECTIVE_PRAGMA = 0x100c,
      BRIG_KIND_DIRECTIVE_SIGNATURE = 0x100d,
      BRIG_KIND_DIRECTIVE_VARIABLE = 0x100e,
  BRIG_KIND_DIRECTIVE_END = 0x100f,
  BRIG_KIND_INST_BEGIN = 0x2000,
      BRIG_KIND_INST_ADDR = 0x2000,
      BRIG_KIND_INST_ATOMIC = 0x2001,
      BRIG_KIND_INST_BASIC = 0x2002,
      BRIG_KIND_INST_BR = 0x2003,
      BRIG_KIND_INST_CMP = 0x2004,
      BRIG_KIND_INST_CVT = 0x2005,
      BRIG_KIND_INST_IMAGE = 0x2006,
      BRIG_KIND_INST_LANE = 0x2007,
      BRIG_KIND_INST_MEM = 0x2008,
      BRIG_KIND_INST_MEM_FENCE = 0x2009,
      BRIG_KIND_INST_MOD = 0x200a,
      BRIG_KIND_INST_QUERY_IMAGE = 0x200b,
      BRIG_KIND_INST_QUERY_SAMPLER = 0x200c,
      BRIG_KIND_INST_QUEUE = 0x200d,
      BRIG_KIND_INST_SEG = 0x200e,
      BRIG_KIND_INST_SEG_CVT = 0x200f,
      BRIG_KIND_INST_SIGNAL = 0x2010,
      BRIG_KIND_INST_SOURCE_TYPE = 0x2011,
  BRIG_KIND_INST_END = 0x2012,
  BRIG_KIND_OPERAND_BEGIN = 0x3000,
      BRIG_KIND_OPERAND_ADDRESS = 0x3000,
      BRIG_KIND_OPERAND_ALIGN = 0x3001,
      BRIG_KIND_OPERAND_CODE_LIST = 0x3002,
      BRIG_KIND_OPERAND_CODE_REF = 0x3003,
      BRIG_KIND_OPERAND_CONSTANT_BYTES = 0x3004,
      BRIG_KIND_OPERAND_RESERVED = 0x3005,
      BRIG_KIND_OPERAND_CONSTANT_IMAGE = 0x3006,
      BRIG_KIND_OPERAND_CONSTANT_OPERAND_LIST = 0x3007,
      BRIG_KIND_OPERAND_CONSTANT_SAMPLER = 0x3008,
      BRIG_KIND_OPERAND_OPERAND_LIST = 0x3009,
      BRIG_KIND_OPERAND_REGISTER = 0x300a,
      BRIG_KIND_OPERAND_STRING = 0x300b,
      BRIG_KIND_OPERAND_WAVESIZE = 0x3009c, 
  BRIG_KIND_OPERAND_END = 0x300d
};

typedef uint8_t BrigLinkage8_t;
enum BrigLinkage {
  BRIG_LINKAGE_NONE = 0,
  BRIG_LINKAGE_PROGRAM = 1,
  BRIG_LINKAGE_MODULE = 2,
  BRIG_LINKAGE_FUNCTION = 3,
  BRIG_LINKAGE_ARG = 4
};
typedef uint8_t BrigMachineModel8_t;
enum BrigMachineModel {
  BRIG_MACHINE_SMALL = 0,
  BRIG_MACHINE_LARGE = 1
};
typedef uint8_t BrigMemoryModifier8_t;
enum BrigMemoryModifierMask {
  BRIG_MEMORY_CONST = 1
};
typedef uint8_t BrigMemoryOrder8_t;
enum BrigMemoryOrder {
  BRIG_MEMORY_ORDER_NONE = 0,
  BRIG_MEMORY_ORDER_RELAXED = 1,
  BRIG_MEMORY_ORDER_SC_ACQUIRE = 2,
  BRIG_MEMORY_ORDER_SC_RELEASE = 3,
  BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE = 4
};
typedef uint8_t BrigMemoryScope8_t;
enum BrigMemoryScope {
  BRIG_MEMORY_SCOPE_NONE = 0,
  BRIG_MEMORY_SCOPE_WORKITEM = 1,
  BRIG_MEMORY_SCOPE_WAVEFRONT = 2,
  BRIG_MEMORY_SCOPE_WORKGROUP = 3,
  BRIG_MEMORY_SCOPE_AGENT = 4,
  BRIG_MEMORY_SCOPE_SYSTEM = 5
};
struct BrigModuleHeader {
  char identification[8];
  BrigVersion32_t brigMajor;
  BrigVersion32_t brigMinor;
  uint64_t byteCount;
  uint8_t hash[64];
  uint32_t reserved;
  uint32_t sectionCount;
  uint64_t sectionIndex;
};

typedef uint16_t BrigOpcode16_t;
enum BrigOpcode {
  BRIG_OPCODE_NOP = 0,
  BRIG_OPCODE_ABS = 1,
  BRIG_OPCODE_ADD = 2,
  BRIG_OPCODE_BORROW = 3,
  BRIG_OPCODE_CARRY = 4,
  BRIG_OPCODE_CEIL = 5,
  BRIG_OPCODE_COPYSIGN = 6,
  BRIG_OPCODE_DIV = 7,
  BRIG_OPCODE_FLOOR = 8,
  BRIG_OPCODE_FMA = 9,
  BRIG_OPCODE_FRACT = 10,
  BRIG_OPCODE_MAD = 11,
  BRIG_OPCODE_MAX = 12,
  BRIG_OPCODE_MIN = 13,
  BRIG_OPCODE_MUL = 14,
  BRIG_OPCODE_MULHI = 15,
  BRIG_OPCODE_NEG = 16,
  BRIG_OPCODE_REM = 17,
  BRIG_OPCODE_RINT = 18,
  BRIG_OPCODE_SQRT = 19,
  BRIG_OPCODE_SUB = 20,
  BRIG_OPCODE_TRUNC = 21,
  BRIG_OPCODE_MAD24 = 22,
  BRIG_OPCODE_MAD24HI = 23,
  BRIG_OPCODE_MUL24 = 24,
  BRIG_OPCODE_MUL24HI = 25,
  BRIG_OPCODE_SHL = 26,
  BRIG_OPCODE_SHR = 27,
  BRIG_OPCODE_AND = 28,
  BRIG_OPCODE_NOT = 29,
  BRIG_OPCODE_OR = 30,
  BRIG_OPCODE_POPCOUNT = 31,
  BRIG_OPCODE_XOR = 32,
  BRIG_OPCODE_BITEXTRACT = 33,
  BRIG_OPCODE_BITINSERT = 34,
  BRIG_OPCODE_BITMASK = 35,
  BRIG_OPCODE_BITREV = 36,
  BRIG_OPCODE_BITSELECT = 37,
  BRIG_OPCODE_FIRSTBIT = 38,
  BRIG_OPCODE_LASTBIT = 39,
  BRIG_OPCODE_COMBINE = 40,
  BRIG_OPCODE_EXPAND = 41,
  BRIG_OPCODE_LDA = 42,
  BRIG_OPCODE_MOV = 43,
  BRIG_OPCODE_SHUFFLE = 44,
  BRIG_OPCODE_UNPACKHI = 45,
  BRIG_OPCODE_UNPACKLO = 46,
  BRIG_OPCODE_PACK = 47,
  BRIG_OPCODE_UNPACK = 48,
  BRIG_OPCODE_CMOV = 49,
  BRIG_OPCODE_CLASS = 50,
  BRIG_OPCODE_NCOS = 51,
  BRIG_OPCODE_NEXP2 = 52,
  BRIG_OPCODE_NFMA = 53,
  BRIG_OPCODE_NLOG2 = 54,
  BRIG_OPCODE_NRCP = 55,
  BRIG_OPCODE_NRSQRT = 56,
  BRIG_OPCODE_NSIN = 57,
  BRIG_OPCODE_NSQRT = 58,
  BRIG_OPCODE_BITALIGN = 59,
  BRIG_OPCODE_BYTEALIGN = 60,
  BRIG_OPCODE_PACKCVT = 61,
  BRIG_OPCODE_UNPACKCVT = 62,
  BRIG_OPCODE_LERP = 63,
  BRIG_OPCODE_SAD = 64,
  BRIG_OPCODE_SADHI = 65,
  BRIG_OPCODE_SEGMENTP = 66,
  BRIG_OPCODE_FTOS = 67,
  BRIG_OPCODE_STOF = 68,
  BRIG_OPCODE_CMP = 69,
  BRIG_OPCODE_CVT = 70,
  BRIG_OPCODE_LD = 71,
  BRIG_OPCODE_ST = 72,
  BRIG_OPCODE_ATOMIC = 73,
  BRIG_OPCODE_ATOMICNORET = 74,
  BRIG_OPCODE_SIGNAL = 75,
  BRIG_OPCODE_SIGNALNORET = 76,
  BRIG_OPCODE_MEMFENCE = 77,
  BRIG_OPCODE_RDIMAGE = 78,
  BRIG_OPCODE_LDIMAGE = 79,
  BRIG_OPCODE_STIMAGE = 80,
  BRIG_OPCODE_IMAGEFENCE = 81,
  BRIG_OPCODE_QUERYIMAGE = 82,
  BRIG_OPCODE_QUERYSAMPLER = 83,
  BRIG_OPCODE_CBR = 84,
  BRIG_OPCODE_BR = 85,
  BRIG_OPCODE_SBR = 86,
  BRIG_OPCODE_BARRIER = 87,
  BRIG_OPCODE_WAVEBARRIER = 88,
  BRIG_OPCODE_ARRIVEFBAR = 89,
  BRIG_OPCODE_INITFBAR = 90,
  BRIG_OPCODE_JOINFBAR = 91,
  BRIG_OPCODE_LEAVEFBAR = 92,
  BRIG_OPCODE_RELEASEFBAR = 93,
  BRIG_OPCODE_WAITFBAR = 94,
  BRIG_OPCODE_LDF = 95,
  BRIG_OPCODE_ACTIVELANECOUNT = 96,
  BRIG_OPCODE_ACTIVELANEID = 97,
  BRIG_OPCODE_ACTIVELANEMASK = 98,
  BRIG_OPCODE_ACTIVELANEPERMUTE = 99,
  BRIG_OPCODE_CALL = 100,
  BRIG_OPCODE_SCALL = 101,
  BRIG_OPCODE_ICALL = 102,
  BRIG_OPCODE_RET = 103,
  BRIG_OPCODE_ALLOCA = 104,
  BRIG_OPCODE_CURRENTWORKGROUPSIZE = 105,
  BRIG_OPCODE_CURRENTWORKITEMFLATID = 106,
  BRIG_OPCODE_DIM = 107,
  BRIG_OPCODE_GRIDGROUPS = 108,
  BRIG_OPCODE_GRIDSIZE = 109,
  BRIG_OPCODE_PACKETCOMPLETIONSIG = 110,
  BRIG_OPCODE_PACKETID = 111,
  BRIG_OPCODE_WORKGROUPID = 112,
  BRIG_OPCODE_WORKGROUPSIZE = 113,
  BRIG_OPCODE_WORKITEMABSID = 114,
  BRIG_OPCODE_WORKITEMFLATABSID = 115,
  BRIG_OPCODE_WORKITEMFLATID = 116,
  BRIG_OPCODE_WORKITEMID = 117,
  BRIG_OPCODE_CLEARDETECTEXCEPT = 118,
  BRIG_OPCODE_GETDETECTEXCEPT = 119,
  BRIG_OPCODE_SETDETECTEXCEPT = 120,
  BRIG_OPCODE_ADDQUEUEWRITEINDEX = 121,
  BRIG_OPCODE_CASQUEUEWRITEINDEX = 122,
  BRIG_OPCODE_LDQUEUEREADINDEX = 123,
  BRIG_OPCODE_LDQUEUEWRITEINDEX = 124,
  BRIG_OPCODE_STQUEUEREADINDEX = 125,
  BRIG_OPCODE_STQUEUEWRITEINDEX = 126,
  BRIG_OPCODE_CLOCK = 127,
  BRIG_OPCODE_CUID = 128,
  BRIG_OPCODE_DEBUGTRAP = 129,
  BRIG_OPCODE_GROUPBASEPTR = 130,
  BRIG_OPCODE_KERNARGBASEPTR = 131,
  BRIG_OPCODE_LANEID = 132,
  BRIG_OPCODE_MAXCUID = 133,
  BRIG_OPCODE_MAXWAVEID = 134,
  BRIG_OPCODE_NULLPTR = 135,
  BRIG_OPCODE_WAVEID = 136,
  BRIG_OPCODE_FIRST_USER_DEFINED = 32768,
 };

typedef uint8_t BrigPack8_t;
enum BrigPack {
  BRIG_PACK_NONE = 0,
  BRIG_PACK_PP = 1,
  BRIG_PACK_PS = 2,
  BRIG_PACK_SP = 3,
  BRIG_PACK_SS = 4,
  BRIG_PACK_S = 5,
  BRIG_PACK_P = 6,
  BRIG_PACK_PPSAT = 7,
  BRIG_PACK_PSSAT = 8,
  BRIG_PACK_SPSAT = 9,
  BRIG_PACK_SSSAT = 10,
  BRIG_PACK_SSAT = 11,
  BRIG_PACK_PSAT = 12
};
typedef uint8_t BrigProfile8_t;
enum BrigProfile {
  BRIG_PROFILE_BASE = 0,
  BRIG_PROFILE_FULL = 1
};
typedef uint16_t BrigRegisterKind16_t;
enum BrigRegisterKind {
  BRIG_REGISTER_KIND_CONTROL = 0,
  BRIG_REGISTER_KIND_SINGLE = 1,
  BRIG_REGISTER_KIND_DOUBLE = 2,
  BRIG_REGISTER_KIND_QUAD = 3
};
typedef uint8_t BrigRound8_t;
enum BrigRound {
  BRIG_ROUND_NONE = 0,
  BRIG_ROUND_FLOAT_DEFAULT = 1,
  BRIG_ROUND_FLOAT_NEAR_EVEN = 2,
  BRIG_ROUND_FLOAT_ZERO = 3,
  BRIG_ROUND_FLOAT_PLUS_INFINITY = 4,
  BRIG_ROUND_FLOAT_MINUS_INFINITY = 5,
  BRIG_ROUND_INTEGER_NEAR_EVEN = 6,
  BRIG_ROUND_INTEGER_ZERO = 7,
  BRIG_ROUND_INTEGER_PLUS_INFINITY = 8,
  BRIG_ROUND_INTEGER_MINUS_INFINITY = 9,
  BRIG_ROUND_INTEGER_NEAR_EVEN_SAT = 10,
  BRIG_ROUND_INTEGER_ZERO_SAT = 11,
  BRIG_ROUND_INTEGER_PLUS_INFINITY_SAT = 12,
  BRIG_ROUND_INTEGER_MINUS_INFINITY_SAT = 13,
  BRIG_ROUND_INTEGER_SIGNALING_NEAR_EVEN = 14,
  BRIG_ROUND_INTEGER_SIGNALING_ZERO = 15,
  BRIG_ROUND_INTEGER_SIGNALING_PLUS_INFINITY = 16,
  BRIG_ROUND_INTEGER_SIGNALING_MINUS_INFINITY = 17,
  BRIG_ROUND_INTEGER_SIGNALING_NEAR_EVEN_SAT = 18,
  BRIG_ROUND_INTEGER_SIGNALING_ZERO_SAT = 19,
  BRIG_ROUND_INTEGER_SIGNALING_PLUS_INFINITY_SAT = 20,
  BRIG_ROUND_INTEGER_SIGNALING_MINUS_INFINITY_SAT = 21
};

typedef uint8_t BrigSamplerAddressing8_t;
enum BrigSamplerAddressing {
  BRIG_ADDRESSING_UNDEFINED = 0,
  BRIG_ADDRESSING_CLAMP_TO_EDGE = 1,
  BRIG_ADDRESSING_CLAMP_TO_BORDER = 2,
  BRIG_ADDRESSING_REPEAT = 3,
  BRIG_ADDRESSING_MIRRORED_REPEAT = 4,
  BRIG_ADDRESSING_FIRST_USER_DEFINED = 128
};

typedef uint8_t BrigSamplerCoordNormalization8_t;
enum BrigSamplerCoordNormalization {
  BRIG_COORD_UNNORMALIZED = 0,
  BRIG_COORD_NORMALIZED = 1
};
typedef uint8_t BrigSamplerFilter8_t;
enum BrigSamplerFilter {
  BRIG_FILTER_NEAREST = 0,
  BRIG_FILTER_LINEAR = 1,
  BRIG_FILTER_FIRST_USER_DEFINED = 128
};

typedef uint8_t BrigSamplerQuery8_t;
enum BrigSamplerQuery {
  BRIG_SAMPLER_QUERY_ADDRESSING = 0,
  BRIG_SAMPLER_QUERY_COORD = 1,
  BRIG_SAMPLER_QUERY_FILTER = 2
};
typedef uint32_t BrigSectionIndex32_t;
enum BrigSectionIndex {
  BRIG_SECTION_INDEX_DATA = 0,
  BRIG_SECTION_INDEX_CODE = 1,
  BRIG_SECTION_INDEX_OPERAND = 2,
  BRIG_SECTION_INDEX_BEGIN_IMPLEMENTATION_DEFINED = 3
};
struct BrigSectionHeader {
  uint64_t byteCount;
  uint32_t headerByteCount;
  uint32_t nameLength;
  uint8_t name[1];
};
typedef uint8_t BrigSegCvtModifier8_t;
enum BrigSegCvtModifierMask {
  BRIG_SEG_CVT_NONULL = 1
};
typedef uint8_t BrigSegment8_t;
enum BrigSegment {
  BRIG_SEGMENT_NONE = 0,
  BRIG_SEGMENT_FLAT = 1,
  BRIG_SEGMENT_GLOBAL = 2,
  BRIG_SEGMENT_READONLY = 3,
  BRIG_SEGMENT_KERNARG = 4,
  BRIG_SEGMENT_GROUP = 5,
  BRIG_SEGMENT_PRIVATE = 6,
  BRIG_SEGMENT_SPILL = 7,
  BRIG_SEGMENT_ARG = 8,
  BRIG_SEGMENT_FIRST_USER_DEFINED = 128
};

enum {
  BRIG_TYPE_BASE_SIZE = 5,
  BRIG_TYPE_PACK_SIZE = 2,
  BRIG_TYPE_ARRAY_SIZE = 1,

  BRIG_TYPE_BASE_SHIFT = 0,
  BRIG_TYPE_PACK_SHIFT = BRIG_TYPE_BASE_SHIFT + BRIG_TYPE_BASE_SIZE,
  BRIG_TYPE_ARRAY_SHIFT = BRIG_TYPE_PACK_SHIFT + BRIG_TYPE_PACK_SIZE,

  BRIG_TYPE_BASE_MASK = ((1 << BRIG_TYPE_BASE_SIZE) - 1) << BRIG_TYPE_BASE_SHIFT,
  BRIG_TYPE_PACK_MASK = ((1 << BRIG_TYPE_PACK_SIZE) - 1) << BRIG_TYPE_PACK_SHIFT,
  BRIG_TYPE_ARRAY_MASK = ((1 << BRIG_TYPE_ARRAY_SIZE) - 1) << BRIG_TYPE_ARRAY_SHIFT,
 
  BRIG_TYPE_PACK_NONE = 0 << BRIG_TYPE_PACK_SHIFT,
  BRIG_TYPE_PACK_32 = 1 << BRIG_TYPE_PACK_SHIFT,
  BRIG_TYPE_PACK_64 = 2 << BRIG_TYPE_PACK_SHIFT,
  BRIG_TYPE_PACK_128 = 3 << BRIG_TYPE_PACK_SHIFT,

  BRIG_TYPE_ARRAY = 1 << BRIG_TYPE_ARRAY_SHIFT
 };

typedef uint16_t BrigType16_t;
enum BrigType {
  BRIG_TYPE_NONE = 0,
 
  BRIG_TYPE_U8 = 1,
  BRIG_TYPE_U16 = 2,
  BRIG_TYPE_U32 = 3,
  BRIG_TYPE_U64 = 4,
 
  BRIG_TYPE_S8 = 5,
  BRIG_TYPE_S16 = 6,
  BRIG_TYPE_S32 = 7,
  BRIG_TYPE_S64 = 8,
 
  BRIG_TYPE_F16 = 9,
  BRIG_TYPE_F32 = 10,
  BRIG_TYPE_F64 = 11,
 
  BRIG_TYPE_B1 = 12,
  BRIG_TYPE_B8 = 13,
  BRIG_TYPE_B16 = 14,
  BRIG_TYPE_B32 = 15,
  BRIG_TYPE_B64 = 16,
  BRIG_TYPE_B128 = 17,
 
  BRIG_TYPE_SAMP = 18,
  BRIG_TYPE_ROIMG = 19,
  BRIG_TYPE_WOIMG = 20,
  BRIG_TYPE_RWIMG = 21,
 
  BRIG_TYPE_SIG32 = 22,
  BRIG_TYPE_SIG64 = 23,
 
  BRIG_TYPE_U8X4 = BRIG_TYPE_U8 | BRIG_TYPE_PACK_32,
  BRIG_TYPE_U8X8 = BRIG_TYPE_U8 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_U8X16 = BRIG_TYPE_U8 | BRIG_TYPE_PACK_128,
 
  BRIG_TYPE_U16X2 = BRIG_TYPE_U16 | BRIG_TYPE_PACK_32,
  BRIG_TYPE_U16X4 = BRIG_TYPE_U16 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_U16X8 = BRIG_TYPE_U16 | BRIG_TYPE_PACK_128,
 
  BRIG_TYPE_U32X2 = BRIG_TYPE_U32 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_U32X4 = BRIG_TYPE_U32 | BRIG_TYPE_PACK_128,
 
  BRIG_TYPE_U64X2 = BRIG_TYPE_U64 | BRIG_TYPE_PACK_128,
 
  BRIG_TYPE_S8X4 = BRIG_TYPE_S8 | BRIG_TYPE_PACK_32,
  BRIG_TYPE_S8X8 = BRIG_TYPE_S8 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_S8X16 = BRIG_TYPE_S8 | BRIG_TYPE_PACK_128,
 
  BRIG_TYPE_S16X2 = BRIG_TYPE_S16 | BRIG_TYPE_PACK_32,
  BRIG_TYPE_S16X4 = BRIG_TYPE_S16 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_S16X8 = BRIG_TYPE_S16 | BRIG_TYPE_PACK_128,
 
  BRIG_TYPE_S32X2 = BRIG_TYPE_S32 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_S32X4 = BRIG_TYPE_S32 | BRIG_TYPE_PACK_128,
 
  BRIG_TYPE_S64X2 = BRIG_TYPE_S64 | BRIG_TYPE_PACK_128,
 
  BRIG_TYPE_F16X2 = BRIG_TYPE_F16 | BRIG_TYPE_PACK_32,
  BRIG_TYPE_F16X4 = BRIG_TYPE_F16 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_F16X8 = BRIG_TYPE_F16 | BRIG_TYPE_PACK_128,
 
  BRIG_TYPE_F32X2 = BRIG_TYPE_F32 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_F32X4 = BRIG_TYPE_F32 | BRIG_TYPE_PACK_128,
 
  BRIG_TYPE_F64X2 = BRIG_TYPE_F64 | BRIG_TYPE_PACK_128,

  BRIG_TYPE_U8_ARRAY = BRIG_TYPE_U8 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U16_ARRAY = BRIG_TYPE_U16 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U32_ARRAY = BRIG_TYPE_U32 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U64_ARRAY = BRIG_TYPE_U64 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_S8_ARRAY = BRIG_TYPE_S8 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S16_ARRAY = BRIG_TYPE_S16 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S32_ARRAY = BRIG_TYPE_S32 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S64_ARRAY = BRIG_TYPE_S64 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_F16_ARRAY = BRIG_TYPE_F16 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_F32_ARRAY = BRIG_TYPE_F32 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_F64_ARRAY = BRIG_TYPE_F64 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_B8_ARRAY = BRIG_TYPE_B8 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_B16_ARRAY = BRIG_TYPE_B16 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_B32_ARRAY = BRIG_TYPE_B32 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_B64_ARRAY = BRIG_TYPE_B64 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_B128_ARRAY = BRIG_TYPE_B128 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_SAMP_ARRAY = BRIG_TYPE_SAMP | BRIG_TYPE_ARRAY,
  BRIG_TYPE_ROIMG_ARRAY = BRIG_TYPE_ROIMG | BRIG_TYPE_ARRAY,
  BRIG_TYPE_WOIMG_ARRAY = BRIG_TYPE_WOIMG | BRIG_TYPE_ARRAY,
  BRIG_TYPE_RWIMG_ARRAY = BRIG_TYPE_RWIMG | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_SIG32_ARRAY = BRIG_TYPE_SIG32 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_SIG64_ARRAY = BRIG_TYPE_SIG64 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_U8X4_ARRAY = BRIG_TYPE_U8X4 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U8X8_ARRAY = BRIG_TYPE_U8X8 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U8X16_ARRAY = BRIG_TYPE_U8X16 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_U16X2_ARRAY = BRIG_TYPE_U16X2 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U16X4_ARRAY = BRIG_TYPE_U16X4 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U16X8_ARRAY = BRIG_TYPE_U16X8 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_U32X2_ARRAY = BRIG_TYPE_U32X2 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U32X4_ARRAY = BRIG_TYPE_U32X4 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_U64X2_ARRAY = BRIG_TYPE_U64X2 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_S8X4_ARRAY = BRIG_TYPE_S8X4 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S8X8_ARRAY = BRIG_TYPE_S8X8 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S8X16_ARRAY = BRIG_TYPE_S8X16 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_S16X2_ARRAY = BRIG_TYPE_S16X2 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S16X4_ARRAY = BRIG_TYPE_S16X4 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S16X8_ARRAY = BRIG_TYPE_S16X8 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_S32X2_ARRAY = BRIG_TYPE_S32X2 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S32X4_ARRAY = BRIG_TYPE_S32X4 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_S64X2_ARRAY = BRIG_TYPE_S64X2 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_F16X2_ARRAY = BRIG_TYPE_F16X2 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_F16X4_ARRAY = BRIG_TYPE_F16X4 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_F16X8_ARRAY = BRIG_TYPE_F16X8 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_F32X2_ARRAY = BRIG_TYPE_F32X2 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_F32X4_ARRAY = BRIG_TYPE_F32X4 | BRIG_TYPE_ARRAY,
 
  BRIG_TYPE_F64X2_ARRAY = BRIG_TYPE_F64X2 | BRIG_TYPE_ARRAY
};

struct BrigUInt64 {
  uint32_t lo;
  uint32_t hi;
 };
value = (uint64_t(hi) << 32) | uint64_t(lo)
typedef uint8_t BrigVariableModifier8_t;
enum BrigVariableModifierMask {
  BRIG_VARIABLE_DEFINITION = 1,
  BRIG_VARIABLE_CONST = 2
};


typedef_uint32_t BrigVersion32_t;
enum BrigVersion {
  BRIG_VERSION_HSAIL_MAJOR = 1,
  BRIG_VERSION_HSAIL_MINOR = 0,
  BRIG_VERSION_BRIG_MAJOR = 1,
  BRIG_VERSION_BRIG_MINOR = 0
};

typedef uint8_t BrigWidth8_t;
enum BrigWidth {
  BRIG_WIDTH_NONE = 0,
  BRIG_WIDTH_1 = 1,
  BRIG_WIDTH_2 = 2,
  BRIG_WIDTH_4 = 3,
  BRIG_WIDTH_8 = 4,
  BRIG_WIDTH_16 = 5,
  BRIG_WIDTH_32 = 6,
  BRIG_WIDTH_64 = 7,
  BRIG_WIDTH_128 = 8,
  BRIG_WIDTH_256 = 9,
  BRIG_WIDTH_512 = 10,
  BRIG_WIDTH_1024 = 11,
  BRIG_WIDTH_2048 = 12,
  BRIG_WIDTH_4096 = 13,
  BRIG_WIDTH_8192 = 14,
  BRIG_WIDTH_16384 = 15,
  BRIG_WIDTH_32768 = 16,
  BRIG_WIDTH_65536 = 17,
  BRIG_WIDTH_131072 = 18,
  BRIG_WIDTH_262144 = 19,
  BRIG_WIDTH_524288 = 20,
  BRIG_WIDTH_1048576 = 21,
  BRIG_WIDTH_2097152 = 22,
  BRIG_WIDTH_4194304 = 23,
  BRIG_WIDTH_8388608 = 24,
  BRIG_WIDTH_16777216 = 25,
  BRIG_WIDTH_33554432 = 26,
  BRIG_WIDTH_67108864 = 27,
  BRIG_WIDTH_134217728 = 28,
  BRIG_WIDTH_268435456 = 29,
  BRIG_WIDTH_536870912 = 30,
  BRIG_WIDTH_1073741824 = 31,
  BRIG_WIDTH_2147483648 = 32,
  BRIG_WIDTH_WAVESIZE = 33,
  BRIG_WIDTH_ALL = 34
};
struct BrigData {
  uint32_t byteCount; 
  uint8_t bytes[1];    
  };
struct BrigDirectiveArgBlock {
  BrigBase base;
};
struct BrigDirectiveComment {
  BrigBase base;
  BrigDataOffsetString32_t name;
};
struct BrigDirectiveControl {
  BrigBase base;
  BrigControlDirective16_t control;
  uint16_t reserved;
  BrigDataOffsetOperandList32_t operands;
};                
struct BrigDirectiveExecutable {
  BrigBase base;
  BrigDataOffsetString32_t name;
  uint16_t outArgCount;
  uint16_t inArgCount;
  BrigCodeOffset32_t firstInArg;
  BrigCodeOffset32_t firstCodeBlockEntry;
  BrigCodeOffset32_t nextModuleEntry;
  BrigExecutableModifier8_t modifier;
  BrigLinkage8_t linkage;
  uint16_t reserved;
};
struct BrigDirectiveExtension {
  BrigBase base;
  BrigDataOffsetString32_t name;
};
struct BrigDirectiveFbarrier {
  BrigBase base;
  BrigDataOffsetString32_t name;
  BrigVariableModifier8_t modifier;
  BrigLinkage8_t linkage;
  uint16_t reserved;
};
struct BrigDirectiveLabel {
  BrigBase base;
  BrigDataOffsetString32_t name;
};                        
struct BrigDirectiveLoc {
  BrigBase base;
  BrigDataOffsetString32_t filename;
  uint32_t line;
  uint32_t column;
};
struct BrigDirectiveModule {
  BrigBase base;
  BrigDataOffsetString32_t name;
  BrigVersion32_t hsailMajor;
  BrigVersion32_t hsailMinor;
  BrigProfile8_t profile;
  BrigMachineModel8_t machineModel;
  BrigRound8_t defaultFloatRound;
  uint8_t reserved;
};

struct BrigDirectiveNone {
  BrigBase base;
};
struct BrigDirectivePragma {
  BrigBase base;
  BrigDataOffsetOperandList32_t operands;
};
struct BrigDirectiveVariable {
  BrigBase base;
  BrigDataOffsetString32_t name;
  BrigOperandOffset32_t init;
  BrigType16_t type;
  BrigSegment8_t segment;
  BrigAlignment8_t align;
  BrigUInt64 dim;
  BrigVariableModifier8_t modifier;
  BrigLinkage8_t linkage;
  BrigAllocation8_t allocation;
  uint8_t reserved;
};  
struct BrigInstBase {
  BrigBase base;
  BrigOpcode16_t opcode;
  BrigType16_t type;
  BrigDataOffsetOperandList32_t operands;
};         

struct BrigInstAddr {
  BrigInstBase base; 
  BrigSegment8_t segment;
  uint8_t reserved[3];
};              
struct BrigInstAtomic {
  BrigInstBase base;
  BrigSegment8_t segment;
  BrigMemoryOrder8_t memoryOrder;
  BrigMemoryScope8_t memoryScope;
  BrigAtomicOperation8_t atomicOperation;
  uint8_t equivClass;
  uint8_t reserved[3];
};              
struct BrigInstBasic {
  BrigInstBase base; 
};
struct BrigInstBr {
  BrigInstBase base;
  BrigWidth8_t width;
  uint8_t reserved[3];
}; 
struct BrigInstCmp {
  BrigInstBase base;
  BrigType16_t sourceType;
  BrigAluModifier8_t modifier;
  BrigCompareOperation8_t compare;
  BrigPack8_t pack;
  uint8_t reserved[3];
};
struct BrigInstCvt {
  BrigInstBase base; 
  BrigType16_t sourceType;
  BrigAluModifier8_t modifier;
  BrigRound8_t round;
};
struct BrigInstImage {
  BrigInstBase base;
  BrigType16_t imageType;
  BrigType16_t coordType; 
  BrigImageGeometry8_t geometry;
  uint8_t equivClass;
  uint16_t reserved;
};  
struct BrigInstLane {
  BrigInstBase base;
  BrigType16_t sourceType;
  BrigWidth8_t width;
  uint8_t reserved;
};
struct BrigInstMem {
  BrigInstBase base; 
  BrigSegment8_t segment;
  BrigAlignment8_t align;
  uint8_t equivClass;
  BrigWidth8_t width;
  BrigMemoryModifier8_t modifier;
  uint8_t reserved[3];
};
struct BrigInstMemFence {
  BrigInstBase base;
  BrigMemoryOrder8_t memoryOrder;
  BrigMemoryScope8_t globalSegmentMemoryScope;
  BrigMemoryScope8_t groupSegmentMemoryScope;
  BrigMemoryScope8_t imageSegmentMemoryScope;
};    
struct BrigInstMod {
  BrigInstBase base;
  BrigAluModifier8_t modifier;
  BrigRound8_t round;
  BrigPack8_t pack;
  uint8_t reserved;
};  
struct BrigInstQueryImage {
  BrigInstBase base;
  BrigType16_t imageType;
  BrigImageGeometry8_t geometry;
  BrigImageQuery8_t query;
};
struct BrigInstQuerySampler {
  BrigInstBase base;
  BrigSamplerQuery8_t query;
  uint8_t reserved[3];
};
struct BrigInstQueue {
  BrigInstBase base;
  BrigSegment8_t segment;
  BrigMemoryOrder8_t memoryOrder;
  uint16_t reserved;
};
struct BrigInstSeg {
  BrigInstBase base;
  BrigSegment8_t segment;
  uint8_t reserved[3];
};
struct BrigInstSegCvt {
  BrigInstBase base;
  BrigType16_t sourceType;
  BrigSegment8_t segment;
  BrigSegCvtModifier8_t modifier;
};
struct BrigInstSignal {
  BrigInstBase base;
  BrigType16_t signalType;
  BrigMemoryOrder8_t memoryOrder;
  BrigAtomicOperation8_t signalOperation;
};
struct BrigInstSourceType {
  BrigInstBase base;
  BrigType16_t sourceType;
  uint16_t reserved;
};
struct BrigOperandAddress {
  BrigBase base; 
  BrigCodeOffset32_t symbol;
  BrigOperandOffset32_t reg;
  BrigUInt64 offset;  
};                    
struct BrigOperandAlign {
  BrigBase base; 
  BrigAlignment8_t align;
  uint8_t reserved[3];
};                    
                    
struct BrigOperandCodeList {
  BrigBase base;
  BrigDataOffsetCodeList32_t elements;
};                      
struct BrigOperandCodeRef {
  BrigBase base;
  BrigCodeOffset32_t ref;
};       
struct BrigOperandConstantBytes {
  BrigBase base;
  BrigType16_t type;
  uint16_t reserved;
  BrigDataOffsetString32_t bytes;
};
struct BrigOperandConstantImage {
  BrigBase base;
  BrigType16_t type;
  BrigImageGeometry8_t geometry;
  BrigImageChannelOrder8_t channelOrder;
  BrigImageChannelType8_t channelType;
  uint8_t reserved[3];
  BrigUInt64 width;
  BrigUInt64 height;
  BrigUInt64 depth;
  BrigUInt64 array;
};
struct BrigOperandConstantOperandList {
  BrigBase base;
  BrigType16_t type;
  uint16_t reserved;
  BrigDataOffsetOperandList32_t elements;
};    

struct BrigOperandConstantSampler {
  BrigBase base;
  BrigType16_t type;
  BrigSamplerCoordNormalization8_t coord;
  BrigSamplerFilter8_t filter;
  BrigSamplerAddressing8_t addressing;
  uint8_t reserved[3];
};

struct BrigOperandOperandList {
  BrigBase base;
  BrigDataOffsetOperandList32_t elements;
};                      
struct BrigOperandRegister {
  BrigBase base;
  BrigRegisterKind16_t regKind;
  uint16_t regNum;
};     
struct BrigOperandString {
  BrigBase base;
  BrigDataOffsetString32_t string;
};
struct BrigOperandWavesize {
  BrigBase base;
};      


	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 05/10] OpenMP lowering/expansion changes (gridification)
  2016-01-14 12:52   ` Jakub Jelinek
@ 2016-01-15 14:52     ` Martin Jambor
  0 siblings, 0 replies; 77+ messages in thread
From: Martin Jambor @ 2016-01-15 14:52 UTC (permalink / raw)
  To: gcc-patches

Thanks Jakub and Alex,

I have committed the following to the branch to address your comments:

2016-01-15  Martin Jambor  <mjambor@suse.cz>

	* gimple.h: Fixed comment of gimple_statement_omp_single_layout
	* omp-low.c (get_target_argument_value): Fixed spelling in its
	comment.
	(push_target_argument_according_to_value): Likewise.
	* tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): Renamed to
	OMP_CLAUSE__GRIDDIM__DIMENSION
---
 gcc/gimple.h            |  2 +-
 gcc/omp-low.c           | 12 ++++++------
 gcc/tree-pretty-print.c |  2 +-
 gcc/tree.h              |  5 +----
 4 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple.h b/gcc/gimple.h
index 7eef07c..6d15dab 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -730,7 +730,7 @@ struct GTY((tag("GSS_OMP_CONTINUE")))
   tree control_use;
 };
 
-/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_ORDERED */
+/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_TEAMS, GIMPLE_OMP_ORDERED */
 
 struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
   gimple_statement_omp_single_layout : public gimple_statement_omp
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index c534f5c..616c5bd 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12741,7 +12741,7 @@ grid_get_kernel_launch_attributes (gimple_stmt_iterator *gsi,
       if (OMP_CLAUSE_CODE (clause) != OMP_CLAUSE__GRIDDIM_)
 	continue;
 
-      unsigned dim = OMP_CLAUSE_GRIDDIM_DIMENSION (clause);
+      unsigned dim = OMP_CLAUSE__GRIDDIM__DIMENSION (clause);
       max_dim = MAX (dim, max_dim);
 
       grid_insert_store_range_dim (gsi, lattrs,
@@ -12788,7 +12788,7 @@ get_target_argument_identifier (int device, bool subseqent_param, int id)
   return fold_convert (ptr_type_node, t);
 }
 
-/* Return a target argument consisiting of DEVICE identifier, value identifier
+/* Return a target argument consisting of DEVICE identifier, value identifier
    ID, and the actual VALUE.  */
 
 static tree
@@ -12806,8 +12806,8 @@ get_target_argument_value (gimple_stmt_iterator *gsi, int device, int id,
 }
 
 /* If VALUE is an integer constant greater than -2^15 and smaller than 2^15,
-   push one argument to ARGS with bot the DEVICE, ID and VALUE embeded in it,
-   otherwise push an iedntifier (with DEVICE and ID) and the VALUE in two
+   push one argument to ARGS with both the DEVICE, ID and VALUE embedded in it,
+   otherwise push an identifier (with DEVICE and ID) and the VALUE in two
    arguments.  */
 
 static void
@@ -17693,7 +17693,7 @@ grid_attempt_target_gridification (gomp_target *target,
 	ws = build_zero_cst (uint32_type_node);
 
       tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE__GRIDDIM_);
-      OMP_CLAUSE_SET_GRIDDIM_DIMENSION (c, (unsigned int) i);
+      OMP_CLAUSE__GRIDDIM__DIMENSION (c) = i;
       OMP_CLAUSE__GRIDDIM__SIZE (c) = gs;
       OMP_CLAUSE__GRIDDIM__GROUP (c) = ws;
       OMP_CLAUSE_CHAIN (c) = gimple_omp_target_clauses (target);
@@ -17749,7 +17749,7 @@ grid_gridify_all_targets (gimple_seq *body_p)
   memset (&wi, 0, sizeof (wi));
   walk_gimple_seq_mod (body_p, grid_gridify_all_targets_stmt, NULL, &wi);
 }
-
+\f
 
 /* Main entry point.  */
 
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 31cea10..9c13d84 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -944,7 +944,7 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags)
 
     case OMP_CLAUSE__GRIDDIM_:
       pp_string (pp, "_griddim_(");
-      pp_unsigned_wide_integer (pp, OMP_CLAUSE_GRIDDIM_DIMENSION (clause));
+      pp_unsigned_wide_integer (pp, OMP_CLAUSE__GRIDDIM__DIMENSION (clause));
       pp_colon (pp);
       dump_generic_node (pp, OMP_CLAUSE__GRIDDIM__SIZE (clause), spc, flags,
 			 false);
diff --git a/gcc/tree.h b/gcc/tree.h
index e885ea1..9b987bb 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1636,12 +1636,9 @@ extern void protected_set_expr_location (tree, location_t);
 #define OMP_CLAUSE_TILE_LIST(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
 
-#define OMP_CLAUSE_GRIDDIM_DIMENSION(NODE) \
+#define OMP_CLAUSE__GRIDDIM__DIMENSION(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
    ->omp_clause.subcode.dimension)
-#define OMP_CLAUSE_SET_GRIDDIM_DIMENSION(NODE, DIMENSION) \
-  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
-   ->omp_clause.subcode.dimension = (DIMENSION))
 #define OMP_CLAUSE__GRIDDIM__SIZE(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_), 0)
 #define OMP_CLAUSE__GRIDDIM__GROUP(NODE) \
-- 
2.6.4

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-14 12:59   ` Jakub Jelinek
  2016-01-15 11:48     ` Martin Liška
@ 2016-01-15 14:53     ` Martin Jambor
  2016-01-15 15:01       ` Jakub Jelinek
  1 sibling, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-15 14:53 UTC (permalink / raw)
  To: gcc-patches

On Thu, Jan 14, 2016 at 01:58:58PM +0100, Jakub Jelinek wrote:
> Otherwise LGTM.
> 
> 	Jakub

Thanks Jakub, I have committed the following patch from Martin Liska
that addresses your comments.

Martin

2016-01-15  Martin Liska  <mliska@suse.cz>

	* ipa-hsa.c (process_hsa_functions): Fixed coding style.
	(ipa_hsa_read_section): Likewise.
	(ipa_hsa_read_section): Likewise.
	(pass_ipa_hsa::gate): Removed in_lto_p from the condition.
---
 gcc/ipa-hsa.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
index dd47995..769657f 100644
--- a/gcc/ipa-hsa.c
+++ b/gcc/ipa-hsa.c
@@ -86,8 +86,9 @@ process_hsa_functions (void)
 	{
 	  if (!check_warn_node_versionable (node))
 	    continue;
-	  cgraph_node *clone = node->create_virtual_clone
-	    (vec <cgraph_edge *> (), NULL, NULL, "hsa");
+	  cgraph_node *clone
+	    = node->create_virtual_clone (vec <cgraph_edge *> (),
+					  NULL, NULL, "hsa");
 	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
 
 	  clone->force_output = true;
@@ -102,8 +103,9 @@ process_hsa_functions (void)
 	{
 	  if (!check_warn_node_versionable (node))
 	    continue;
-	  cgraph_node *clone = node->create_virtual_clone
-	    (vec <cgraph_edge *> (), NULL, NULL, "hsa");
+	  cgraph_node *clone
+	    = node->create_virtual_clone (vec <cgraph_edge *> (),
+					  NULL, NULL, "hsa");
 	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
 
 	  if (!cgraph_local_p (node))
@@ -209,8 +211,8 @@ static void
 ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
 		       size_t len)
 {
-  const struct lto_function_header *header =
-    (const struct lto_function_header *) data;
+  const struct lto_function_header *header
+    = (const struct lto_function_header *) data;
   const int cfg_offset = sizeof (struct lto_function_header);
   const int main_offset = cfg_offset + header->cfg_size;
   const int string_offset = main_offset + header->main_size;
@@ -221,9 +223,9 @@ ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
   lto_input_block ib_main ((const char *) data + main_offset,
 			   header->main_size, file_data->mode_table);
 
-  data_in =
-    lto_data_in_create (file_data, (const char *) data + string_offset,
-			header->string_size, vNULL);
+  data_in
+    = lto_data_in_create (file_data, (const char *) data + string_offset,
+			  header->string_size, vNULL);
   count = streamer_read_uhwi (&ib_main);
 
   for (i = 0; i < count; i++)
@@ -317,7 +319,7 @@ public:
 bool
 pass_ipa_hsa::gate (function *)
 {
-  return hsa_gen_requested_p () || in_lto_p;
+  return hsa_gen_requested_p ();
 }
 
 } // anon namespace
-- 
2.6.4



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15 14:53     ` Martin Jambor
@ 2016-01-15 15:01       ` Jakub Jelinek
  2016-01-15 16:02         ` Martin Jambor
  0 siblings, 1 reply; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-15 15:01 UTC (permalink / raw)
  To: gcc-patches

On Fri, Jan 15, 2016 at 03:53:23PM +0100, Martin Jambor wrote:
> @@ -317,7 +319,7 @@ public:
>  bool
>  pass_ipa_hsa::gate (function *)
>  {
> -  return hsa_gen_requested_p () || in_lto_p;
> +  return hsa_gen_requested_p ();
>  }
>  
>  } // anon namespace

I actually didn't mean this, I mean more of:
  return (hsa_gen_requested_p ()
#ifdef ENABLE_HSA
	  || in_lto_p
#endif
	 );
or so.  Unless you arrange in lto-wrapper or where that if
HSA is enabled in any LTO input source, then it is enabled also in
lto1.  If you do that, your change is fine.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 09/10] Majority of the HSA back-end
  2016-01-14 14:05   ` Jakub Jelinek
@ 2016-01-15 15:08     ` Martin Jambor
  2016-01-15 15:18       ` Jakub Jelinek
  0 siblings, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-15 15:08 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, Martin Liska

Hi,

thanks Jakub.  Below you'll find a patch, which is mostly work of
Martin Liska, that should address all the review comments.  We have
then also went over the "XXX" marks (my bad that I forgot that Michael
uses this mark), removed half of them and turned the rest into TODOs.

Let me just quickly answer two comments as well:

On Thu, Jan 14, 2016 at 03:05:33PM +0100, Jakub Jelinek wrote:
> On Wed, Jan 13, 2016 at 06:39:34PM +0100, Martin Jambor wrote:
>
...
> > +#define HSA_WARN_MEMORY_ROUTINE "OpenMP device memory library routines have " \
> > +  "undefined semantics within target regions, support for HSA ignores them"
> 
> Well, if you don't support them in HSA target regions, you'd better punt and
> not error on them.

We don't error, apart from issuing a warning we basically ignore them.
I believe we can do it even in the long term and that it is in fact
useful because the standard says that the "effect" if these routines
is "unspecified" if they get called from a target region.

Perhaps this is even something we should warn about earlier in omp
lowering/expansion.

...

> > +unsigned
> > +hsa_internal_fn::get_arity ()
> > +{
> > +  switch (m_fn)
> > +    {
> > +    case IFN_ACOS:
> > +    case IFN_ASIN:
> > +    case IFN_ATAN:
> > +    case IFN_COS:
> > +    case IFN_EXP:
> > +    case IFN_EXP10:
> > +    case IFN_EXP2:
> > +    case IFN_EXPM1:
> > +    case IFN_LOG:
> > +    case IFN_LOG10:
> > +    case IFN_LOG1P:
> > +    case IFN_LOG2:
> > +    case IFN_LOGB:
> > +    case IFN_SIGNIFICAND:
> > +    case IFN_SIN:
> > +    case IFN_SQRT:
> > +    case IFN_TAN:
> > +    case IFN_CEIL:
> > +    case IFN_FLOOR:
> > +    case IFN_NEARBYINT:
> > +    case IFN_RINT:
> > +    case IFN_ROUND:
> > +    case IFN_TRUNC:
> > +      return 1;
> > +    case IFN_ATAN2:
> > +    case IFN_COPYSIGN:
> > +    case IFN_FMOD:
> > +    case IFN_POW:
> > +    case IFN_REMAINDER:
> > +    case IFN_SCALB:
> > +    case IFN_LDEXP:
> > +      return 2;
> > +      break;
> > +    case IFN_CLRSB:
> > +    case IFN_CLZ:
> > +    case IFN_CTZ:
> > +    case IFN_FFS:
> > +    case IFN_PARITY:
> > +    case IFN_POPCOUNT:
> > +    default:
> > +      gcc_unreachable ();
> 
> There are various other IFNs (e.g. for __builtin_{add,sub,mul}_overflow,
> lots of others).  How do you ensure you don't ICE on those?

Martin added a comment explaining this.  This can only be reached when
we already know we are processing a known builtin, filtered by
gen_hsa_insn_for_internal_fn_call.

Thanks for looking at the code,

Martin

2016-01-15  Martin Liska  <mliska@suse.cz>
            Martin Jambor  <mjambor@suse.cz>

	* hsa-brig.c (struct function_linkage_pair): Fix GNU coding style
	and replace sprintf with snprintf.
	(hsa_brig_section::init): Likewise.
	(hsa_brig_section::output): Likewise.
	(hsa_brig_section::get_ptr_by_offset): Likewise.
	(brig_string_slot_hasher::hash): Likewise.
	(brig_string_slot_hasher::equal): Likewise.
	(brig_string_slot_hasher::remove): Likewise.
	(brig_emit_string): Likewise.
	(brig_init): Likewise.
	(emit_directive_variable): Likewise.
	(emit_function_directives): Likewise.
	(emit_bb_label_directive): Likewise.
	(emit_immediate_scalar_to_buffer): Likewise.
	(hsa_op_immed::emit_to_buffer): Likewise.
	(emit_immediate_operand): Likewise.
	(emit_address_operand): Likewise.
	(emit_memory_insn): Likewise.
	(emit_alloca_insn): Likewise.
	(emit_cmp_insn): Likewise.
	(emit_branch_insn): Likewise.
	(emit_switch_insn): Likewise.
	(emit_call_insn): Likewise.
	(emit_arg_block_insn): Likewise.
	(emit_packed_insn): Likewise.
	(emit_basic_insn): Likewise.
	(hsa_brig_emit_function): Likewise.
	(hsa_output_global_variables): Likewise.
	(hsa_output_kernels): Likewise.
	(hsa_output_libgomp_mapping): Likewise.
	(hsa_output_brig): Likewise.
	* hsa-dump.c (dump_hsa_immed): Likewise.
	(dump_hsa_insn_1): Likewise.
	* hsa-gen.c (hsa_symbol::total_byte_size): Likewise.
	(hsa_init_simple_builtins): Likewise.
	(hsa_init_data_for_cfun): Likewise.
	(hsa_type_for_scalar_tree_type): Likewise.
	(get_symbol_for_decl): Likewise.
	(hsa_get_host_function): Likewise.
	(hsa_op_immed::hsa_op_immed): Likewise.
	(hsa_insn_mem::hsa_insn_mem): Likewise.
	(hsa_insn_atomic::hsa_insn_atomic): Likewise.
	(hsa_insn_seg::hsa_insn_seg): Likewise.
	(hsa_insn_srctype::hsa_insn_srctype): Likewise.
	(process_mem_base): Likewise.
	(gen_hsa_insns_for_bitfield): Likewise.
	(gen_hsa_insns_for_load): Likewise.
	(gen_hsa_insns_for_store): Likewise.
	(gen_hsa_insns_for_operation_assignment): Likewise.
	(gen_hsa_insns_for_switch_stmt): Likewise.
	(get_format_argument_type): Likewise.
	(gen_hsa_insns_for_direct_call): Likewise.
	(gen_hsa_insns_for_call_of_internal_fn): Likewise.
	(gen_hsa_insns_for_return): Likewise.
	(query_hsa_grid): Likewise.
	(gen_set_num_threads): Likewise.
	(gen_num_threads_for_dispatch): Likewise.
	(gen_get_num_teams): Likewise.
	(gen_get_team_num): Likewise.
	(gen_get_level): Likewise.
	(gen_hsa_alloca): Likewise.
	(gen_hsa_clrsb): Likewise.
	(gen_hsa_ffs): Likewise.
	(gen_hsa_popcount_to_dest): Likewise.
	(gen_hsa_parity): Likewise.
	(set_debug_value): Likewise.
	(omp_simple_builtin::generate): Likewise.
	(gen_hsa_insns_for_kernel_call): Likewise.
	(gen_hsa_unaryop_for_builtin): Likewise.
	(get_address_from_value): Likewise.
	(gen_hsa_ternary_atomic_for_builtin): Likewise.
	(gen_hsa_insn_for_internal_fn_call): Likewise.
	(gen_hsa_insns_for_call): Likewise.
	(gen_hsa_phi_from_gimple_phi): Likewise.
	(init_hsa_num_threads): Likewise.
	(gen_body_from_gimple): Likewise.
	(gen_function_decl_parameters): Likewise.
	(gen_function_def_parameters): Likewise.
	(hsa_generate_function_declaration): Likewise.
	(hsa_generate_internal_fn_decl): Likewise.
	(convert_switch_statements): Likewise.
	(expand_builtins): Likewise.
	(generate_hsa): Likewise.
	(pass_gen_hsail::execute): Likewise.
	* hsa.c (hsa_deinit_compilation_unit_data): Likewise.
	(hsa_get_declaration_name): Likewise.
	(hsa_internal_fn::get_arity): Likewise.
	(hsa_internal_fn::get_argument_type): Likewise.
	* hsa.h (struct hsa_symbol): Likewise.
---
 gcc/hsa-brig.c | 292 ++++++++++----------
 gcc/hsa-dump.c |  11 +-
 gcc/hsa-gen.c  | 832 ++++++++++++++++++++++++++++++---------------------------
 gcc/hsa.c      |  23 +-
 gcc/hsa.h      |  20 +-
 5 files changed, 615 insertions(+), 563 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 17e3398..7e71f6f 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -52,6 +52,10 @@ along with GCC; see the file COPYING3.  If not see
 
 #define BRIG_CHUNK_MAX_SIZE (64 * 1024)
 
+/* Required HSA section alignment.  */
+
+#define HSA_SECTION_ALIGNMENT 16
+
 /* Chunks of BRIG binary data.  */
 
 struct hsa_brig_data_chunk
@@ -72,7 +76,7 @@ public:
   const char *section_name;
   /* Size in bytes of all data stored in the section.  */
   unsigned total_size;
-  /* The size of the header of the section including padding. */
+  /* The size of the header of the section including padding.  */
   unsigned header_byte_count;
   /* The size of the header of the section without any padding.  */
   unsigned header_byte_delta;
@@ -80,7 +84,7 @@ public:
   /* Buffers of binary data, each containing BRIG_CHUNK_MAX_SIZE bytes.  */
   vec <struct hsa_brig_data_chunk> chunks;
 
-  /* More convenient access to the last chunk from the vector above. */
+  /* More convenient access to the last chunk from the vector above.  */
   struct hsa_brig_data_chunk *cur_chunk;
 
   void allocate_new_chunk ();
@@ -110,8 +114,8 @@ static vec <hsa_insn_sbr *> *switch_instructions;
 
 struct function_linkage_pair
 {
-  function_linkage_pair (tree decl, unsigned int off):
-    function_decl (decl), offset (off) {}
+  function_linkage_pair (tree decl, unsigned int off)
+    : function_decl (decl), offset (off) {}
 
   /* Declaration of called function.  */
   tree function_decl;
@@ -145,7 +149,7 @@ hsa_brig_section::init (const char *name)
      certainly wasn't to have the first character of name and padding, which
      are a part of sizeof (BrigSectionHeader), included in the first addend,
      this is what the disassembler expects.  */
-  total_size = sizeof (BrigSectionHeader) + strlen(section_name);
+  total_size = sizeof (BrigSectionHeader) + strlen (section_name);
   chunks.create (1);
   allocate_new_chunk ();
   header_byte_delta = total_size;
@@ -175,8 +179,8 @@ hsa_brig_section::output ()
 
   section_header.byteCount = htole64 (total_size);
   section_header.headerByteCount = htole32 (header_byte_count);
-  section_header.nameLength = htole32 (strlen(section_name));
-  assemble_string ((const char*) &section_header, 16);
+  section_header.nameLength = htole32 (strlen (section_name));
+  assemble_string ((const char *) &section_header, 16);
   assemble_string (section_name, (section_header.nameLength));
   memset (&padding, 0, sizeof (padding));
   /* This is also a consequence of the wrong header size computation described
@@ -229,14 +233,13 @@ hsa_brig_section::round_size_up (int factor)
 
 /* Return pointer to data by global OFFSET in the section.  */
 
-void*
+void *
 hsa_brig_section::get_ptr_by_offset (unsigned int offset)
 {
   gcc_assert (offset < total_size);
-
   offset -= header_byte_delta;
-  unsigned int i;
 
+  unsigned i;
   for (i = 0; offset >= chunks[i].size; i++)
     offset -= chunks[i].size;
 
@@ -272,8 +275,8 @@ brig_string_slot_hasher::hash (const value_type ds)
   int i;
 
   for (i = 0; i < ds->len; i++)
-     r = r * 67 + (unsigned)ds->s[i] - 113;
-  r = r * 67 + (unsigned)ds->prefix - 113;
+     r = r * 67 + (unsigned) ds->s[i] - 113;
+  r = r * 67 + (unsigned) ds->prefix - 113;
   return r;
 }
 
@@ -283,7 +286,8 @@ inline bool
 brig_string_slot_hasher::equal (const value_type ds1, const compare_type ds2)
 {
   if (ds1->len == ds2->len)
-    return ds1->prefix == ds2->prefix && memcmp (ds1->s, ds2->s, ds1->len) == 0;
+    return ds1->prefix == ds2->prefix
+      && memcmp (ds1->s, ds2->s, ds1->len) == 0;
 
   return 0;
 }
@@ -293,7 +297,7 @@ brig_string_slot_hasher::equal (const value_type ds1, const compare_type ds2)
 inline void
 brig_string_slot_hasher::remove (value_type ds)
 {
-  free (const_cast<char*> (ds->s));
+  free (const_cast<char *> (ds->s));
   free (ds);
 }
 
@@ -315,7 +319,6 @@ brig_emit_string (const char *str, char prefix = 0, bool sanitize = true)
   brig_string_slot **slot;
   char *str2;
 
-  /* XXX Sanitize the names without all the strdup.  */
   str2 = xstrdup (str);
 
   if (sanitize)
@@ -331,15 +334,16 @@ brig_emit_string (const char *str, char prefix = 0, bool sanitize = true)
       brig_string_slot *new_slot = XCNEW (brig_string_slot);
 
       /* In theory we should fill in BrigData but that would mean copying
-         the string to a buffer for no reason, so we just emulate it. */
+	 the string to a buffer for no reason, so we just emulate it.  */
       offset = brig_data.add (&hdr_len, sizeof (hdr_len));
       if (prefix)
-        brig_data.add (&prefix, 1);
+	brig_data.add (&prefix, 1);
 
       brig_data.add (str2, slen);
       brig_data.round_size_up (4);
 
-      /* XXX could use the string we just copied into brig_string->cur_chunk */
+      /* TODO: could use the string we just copied into
+	 brig_string->cur_chunk */
       new_slot->s = str2;
       new_slot->len = slen;
       new_slot->prefix = prefix;
@@ -395,8 +399,8 @@ brig_init (void)
 	part = main_input_filename;
       else
 	part++;
-      asprintf (&modname, "&__hsa_module_%s", part);
-      char* extension = strchr (modname, '.');
+      modname = concat ("&__hsa_module_", part, NULL);
+      char *extension = strchr (modname, '.');
       if (extension)
 	*extension = '\0';
 
@@ -421,7 +425,7 @@ brig_init (void)
   else
     moddir.name = brig_emit_string ("__hsa_module_unnamed", '&');
   moddir.base.kind = htole16 (BRIG_KIND_DIRECTIVE_MODULE);
-  moddir.hsailMajor = htole32 (BRIG_VERSION_HSAIL_MAJOR) ;
+  moddir.hsailMajor = htole32 (BRIG_VERSION_HSAIL_MAJOR);
   moddir.hsailMinor = htole32 (BRIG_VERSION_HSAIL_MINOR);
   moddir.profile = hsa_full_profile_p () ? BRIG_PROFILE_FULL: BRIG_PROFILE_BASE;
   if (hsa_machine_large_p ())
@@ -513,8 +517,8 @@ emit_directive_variable (struct hsa_symbol *symbol)
   else
     {
       char buf[64];
-      sprintf (buf, "__%s_%i", hsa_seg_name (symbol->m_segment),
-	       symbol->m_name_number);
+      snprintf (buf, 64, "__%s_%i", hsa_seg_name (symbol->m_segment),
+		symbol->m_name_number);
       name_offset = brig_emit_string (buf, prefix);
     }
 
@@ -527,8 +531,8 @@ emit_directive_variable (struct hsa_symbol *symbol)
   dirvar.align = MAX (hsa_natural_alignment (dirvar.type),
 		      (BrigAlignment8_t) BRIG_ALIGNMENT_4);
   dirvar.linkage = symbol->m_linkage;
-  dirvar.dim.lo = (uint32_t) symbol->m_dim;
-  dirvar.dim.hi = (uint32_t) ((unsigned long long) symbol->m_dim >> 32);
+  dirvar.dim.lo = symbol->m_dim;
+  dirvar.dim.hi = symbol->m_dim >> 32;
 
   /* Global variables are just declared and linked via HSA runtime.  */
   if (symbol->m_linkage != BRIG_ALLOCATION_PROGRAM)
@@ -565,7 +569,7 @@ emit_function_directives (hsa_function_representation *f, bool is_declaration)
       }
 
   name_offset = brig_emit_string (f->m_name, '&');
-  inarg_off = brig_code.total_size + sizeof(fndir)
+  inarg_off = brig_code.total_size + sizeof (fndir)
     + (f->m_output_arg ? sizeof (struct BrigDirectiveVariable) : 0);
   scoped_off = inarg_off
     + f->m_input_args.length () * sizeof (struct BrigDirectiveVariable);
@@ -603,15 +607,15 @@ emit_function_directives (hsa_function_representation *f, bool is_declaration)
   else
     {
       /* Internal function.  */
-      hsa_internal_fn **slot = hsa_emitted_internal_decls->find_slot
-	(f->m_internal_fn, INSERT);
+      hsa_internal_fn **slot
+	= hsa_emitted_internal_decls->find_slot (f->m_internal_fn, INSERT);
       hsa_internal_fn *int_fn = new hsa_internal_fn (f->m_internal_fn);
       int_fn->m_offset = brig_code.total_size;
       *slot = int_fn;
     }
 
   brig_code.add (&fndir, sizeof (fndir));
-  /* XXX terrible hack: we need to set instCount after we emit all
+  /* terrible hack: we need to set instCount after we emit all
      insns, but we need to emit directive in order, and we emit directives
      during insn emitting.  So we need to emit the FUNCTION directive
      early, then the insns, and then we need to set instCount, so remember
@@ -619,8 +623,8 @@ emit_function_directives (hsa_function_representation *f, bool is_declaration)
      directly to after fndir here.  */
   ptr_to_fndir
       = (BrigDirectiveExecutable *)(brig_code.cur_chunk->data
-                                    + brig_code.cur_chunk->size
-                                    - sizeof (fndir));
+				    + brig_code.cur_chunk->size
+				    - sizeof (fndir));
 
   if (f->m_output_arg)
     emit_directive_variable (f->m_output_arg);
@@ -651,14 +655,16 @@ static void
 emit_bb_label_directive (hsa_bb *hbb)
 {
   struct BrigDirectiveLabel lbldir;
-  char buf[32];
 
   lbldir.base.byteCount = htole16 (sizeof (lbldir));
   lbldir.base.kind = htole16 (BRIG_KIND_DIRECTIVE_LABEL);
-  sprintf (buf, "BB_%u_%i", DECL_UID (current_function_decl), hbb->m_index);
+  char buf[32];
+  snprintf (buf, 32, "BB_%u_%i", DECL_UID (current_function_decl),
+	    hbb->m_index);
   lbldir.name = htole32 (brig_emit_string (buf, '@'));
 
-  hbb->m_label_ref.m_directive_offset = brig_code.add (&lbldir, sizeof (lbldir));
+  hbb->m_label_ref.m_directive_offset = brig_code.add (&lbldir,
+						       sizeof (lbldir));
   brig_insn_count++;
 }
 
@@ -788,7 +794,7 @@ emit_immediate_scalar_to_buffer (tree value, char *data, unsigned need_len)
   tree type = TREE_TYPE (value);
   gcc_checking_assert (TREE_CODE (type) != VECTOR_TYPE);
 
-  unsigned data_len = tree_to_uhwi (TYPE_SIZE (type))/BITS_PER_UNIT;
+  unsigned data_len = tree_to_uhwi (TYPE_SIZE (type)) / BITS_PER_UNIT;
   if (INTEGRAL_TYPE_P (type)
       || (POINTER_TYPE_P (type) && TREE_CODE (value) == INTEGER_CST))
     switch (data_len)
@@ -803,7 +809,7 @@ emit_immediate_scalar_to_buffer (tree value, char *data, unsigned need_len)
 	bytes.b32 = (uint32_t) TREE_INT_CST_LOW (value);
 	break;
       case 8:
-	bytes.b64 = (uint64_t) int_cst_value (value);
+	bytes.b64 = (uint64_t) TREE_INT_CST_LOW (value);
 	break;
       default:
 	gcc_unreachable ();
@@ -861,8 +867,8 @@ hsa_op_immed::emit_to_buffer (tree value)
       for (i = 0; i < num; i++)
 	{
 	  unsigned actual;
-	  actual = emit_immediate_scalar_to_buffer
-	    (VECTOR_CST_ELT (value, i), p, 0);
+	  actual
+	    = emit_immediate_scalar_to_buffer (VECTOR_CST_ELT (value, i), p, 0);
 	  total_len -= actual;
 	  p += actual;
 	}
@@ -876,14 +882,16 @@ hsa_op_immed::emit_to_buffer (tree value)
     {
       gcc_assert (total_len % 2 == 0);
       unsigned actual;
-      actual = emit_immediate_scalar_to_buffer
-	(TREE_REALPART (value), p, total_len / 2);
+      actual
+	= emit_immediate_scalar_to_buffer (TREE_REALPART (value), p,
+					   total_len / 2);
 
       gcc_assert (actual == total_len / 2);
       p += actual;
 
-      actual = emit_immediate_scalar_to_buffer
-	(TREE_IMAGPART (value), p, total_len / 2);
+      actual
+	= emit_immediate_scalar_to_buffer (TREE_IMAGPART (value), p,
+					   total_len / 2);
       gcc_assert (actual == total_len / 2);
     }
   else if (TREE_CODE (value) == CONSTRUCTOR)
@@ -891,8 +899,8 @@ hsa_op_immed::emit_to_buffer (tree value)
       unsigned len = vec_safe_length (CONSTRUCTOR_ELTS (value));
       for (unsigned i = 0; i < len; i++)
 	{
-	  unsigned actual = emit_immediate_scalar_to_buffer
-	    (CONSTRUCTOR_ELT (value, i)->value, p, 0);
+	  tree v = CONSTRUCTOR_ELT (value, i)->value;
+	  unsigned actual = emit_immediate_scalar_to_buffer (v, p, 0);
 	  total_len -= actual;
 	  p += actual;
 	}
@@ -918,7 +926,7 @@ emit_immediate_operand (hsa_op_immed *imm)
   uint32_t byteCount = htole32 (imm->m_brig_repr_size);
   out.type = htole16 (imm->m_type);
   out.bytes = htole32 (brig_data.add (&byteCount, sizeof (byteCount)));
-  brig_operand.add (&out, sizeof(out));
+  brig_operand.add (&out, sizeof (out));
   brig_data.add (imm->m_brig_repr, imm->m_brig_repr_size);
   brig_data.round_size_up (4);
 }
@@ -970,8 +978,8 @@ emit_address_operand (hsa_op_address *addr)
 
   if (sizeof (addr->m_imm_offset) == 8)
     {
-      out.offset.lo = htole32 ((uint32_t)addr->m_imm_offset);
-      out.offset.hi = htole32 (((long long) addr->m_imm_offset) >> 32);
+      out.offset.lo = htole32 (addr->m_imm_offset);
+      out.offset.hi = htole32 (addr->m_imm_offset >> 32);
     }
   else
     {
@@ -1097,7 +1105,6 @@ emit_internal_fn_decl (hsa_internal_fn *fn)
   return e;
 }
 
-
 /* Enqueue all operands of INSN and return offset to BRIG data section
    to list of operand offsets.  */
 
@@ -1160,7 +1167,7 @@ emit_operands (hsa_op_base *op0, hsa_op_base *op1 = NULL,
 }
 
 /* Emit an HSA memory instruction and all necessary directives, schedule
-   necessary operands for writing .  */
+   necessary operands for writing.  */
 
 static void
 emit_memory_insn (hsa_insn_mem *mem)
@@ -1185,7 +1192,7 @@ emit_memory_insn (hsa_insn_mem *mem)
     repr.segment = addr->m_symbol->m_segment;
   else
     repr.segment = BRIG_SEGMENT_FLAT;
-  repr.modifier.allBits = 0 ;
+  repr.modifier.allBits = 0;
   repr.equivClass = mem->m_equiv_class;
   repr.align = mem->m_align;
   if (mem->m_opcode == BRIG_OPCODE_LD)
@@ -1225,7 +1232,7 @@ emit_signal_insn (hsa_insn_signal *mem)
 }
 
 /* Emit an HSA atomic memory instruction and all necessary directives, schedule
-   necessary operands for writing .  */
+   necessary operands for writing.  */
 
 static void
 emit_atomic_insn (hsa_insn_atomic *mem)
@@ -1263,7 +1270,7 @@ emit_atomic_insn (hsa_insn_atomic *mem)
 }
 
 /* Emit an HSA LDA instruction and all necessary directives, schedule
-   necessary operands for writing .  */
+   necessary operands for writing.  */
 
 static void
 emit_addr_insn (hsa_insn_basic *insn)
@@ -1289,7 +1296,7 @@ emit_addr_insn (hsa_insn_basic *insn)
 }
 
 /* Emit an HSA segment conversion instruction and all necessary directives,
-   schedule necessary operands for writing .  */
+   schedule necessary operands for writing.  */
 
 static void
 emit_segment_insn (hsa_insn_seg *seg)
@@ -1311,7 +1318,7 @@ emit_segment_insn (hsa_insn_seg *seg)
 }
 
 /* Emit an HSA alloca instruction and all necessary directives,
-   schedule necessary operands for writing .  */
+   schedule necessary operands for writing.  */
 
 static void
 emit_alloca_insn (hsa_insn_alloca *alloca)
@@ -1330,7 +1337,7 @@ emit_alloca_insn (hsa_insn_alloca *alloca)
   repr.base.type = htole16 (alloca->m_type);
   repr.base.operands = htole32 (emit_insn_operands (alloca));
   repr.segment = BRIG_SEGMENT_PRIVATE;
-  repr.modifier.allBits = 0 ;
+  repr.modifier.allBits = 0;
   repr.equivClass = 0;
   repr.align = alloca->m_align;
   repr.width = BRIG_WIDTH_NONE;
@@ -1340,7 +1347,7 @@ emit_alloca_insn (hsa_insn_alloca *alloca)
 }
 
 /* Emit an HSA comparison instruction and all necessary directives,
-   schedule necessary operands for writing .  */
+   schedule necessary operands for writing.  */
 
 static void
 emit_cmp_insn (hsa_insn_cmp *cmp)
@@ -1357,7 +1364,8 @@ emit_cmp_insn (hsa_insn_cmp *cmp)
   if (is_a <hsa_op_reg *> (cmp->get_op (1)))
     repr.sourceType = htole16 (as_a <hsa_op_reg *> (cmp->get_op (1))->m_type);
   else
-    repr.sourceType = htole16 (as_a <hsa_op_immed *> (cmp->get_op (1))->m_type);
+    repr.sourceType
+      = htole16 (as_a <hsa_op_immed *> (cmp->get_op (1))->m_type);
   repr.modifier.allBits = 0;
   repr.compare = cmp->m_compare;
   repr.pack = 0;
@@ -1367,7 +1375,7 @@ emit_cmp_insn (hsa_insn_cmp *cmp)
 }
 
 /* Emit an HSA branching instruction and all necessary directives, schedule
-   necessary operands for writing .  */
+   necessary operands for writing.  */
 
 static void
 emit_branch_insn (hsa_insn_br *br)
@@ -1395,8 +1403,9 @@ emit_branch_insn (hsa_insn_br *br)
       }
   gcc_assert (target);
 
-  repr.base.operands = htole32
-    (emit_operands (br->get_op (0), &hsa_bb_for_bb (target)->m_label_ref));
+  repr.base.operands
+    = htole32 (emit_operands (br->get_op (0),
+			      &hsa_bb_for_bb (target)->m_label_ref));
   memset (&repr.reserved, 0, sizeof (repr.reserved));
 
   brig_code.add (&repr, sizeof (repr));
@@ -1440,8 +1449,8 @@ emit_switch_insn (hsa_insn_sbr *sbr)
   /* For Conditional jumps the type is always B1.  */
   hsa_op_reg *index = as_a <hsa_op_reg *> (sbr->get_op (0));
   repr.base.type = htole16 (index->m_type);
-  repr.base.operands = htole32
-    (emit_operands (sbr->get_op (0), sbr->m_label_code_list));
+  repr.base.operands
+    = htole32 (emit_operands (sbr->get_op (0), sbr->m_label_code_list));
   memset (&repr.reserved, 0, sizeof (repr.reserved));
 
   brig_code.add (&repr, sizeof (repr));
@@ -1502,19 +1511,21 @@ emit_call_insn (hsa_insn_call *call)
   repr.base.opcode = htole16 (BRIG_OPCODE_CALL);
   repr.base.type = htole16 (BRIG_TYPE_NONE);
 
-  repr.base.operands = htole32
-    (emit_operands (call->m_result_code_list, &call->m_func,
-		    call->m_args_code_list));
+  repr.base.operands
+    = htole32 (emit_operands (call->m_result_code_list, &call->m_func,
+			      call->m_args_code_list));
 
   /* Internal functions have not set m_called_function.  */
   if (call->m_called_function)
-    function_call_linkage.safe_push
-      (function_linkage_pair (call->m_called_function,
-			      call->m_func.m_brig_op_offset));
+    {
+      function_linkage_pair pair (call->m_called_function,
+				  call->m_func.m_brig_op_offset);
+      function_call_linkage.safe_push (pair);
+    }
   else
     {
-      hsa_internal_fn *slot = hsa_emitted_internal_decls->find
-	(call->m_called_internal_fn);
+      hsa_internal_fn *slot
+	= hsa_emitted_internal_decls->find (call->m_called_internal_fn);
       gcc_assert (slot);
       gcc_assert (slot->m_offset > 0);
       call->m_func.m_directive_offset = slot->m_offset;
@@ -1543,15 +1554,17 @@ emit_arg_block_insn (hsa_insn_arg_block *insn)
 
 	for (unsigned i = 0; i < insn->m_call_insn->m_input_args.length (); i++)
 	  {
-	    insn->m_call_insn->m_args_code_list->m_offsets[i] = htole32
-	      (emit_directive_variable (insn->m_call_insn->m_input_args[i]));
+	    insn->m_call_insn->m_args_code_list->m_offsets[i]
+	      = htole32 (emit_directive_variable
+			 (insn->m_call_insn->m_input_args[i]));
 	    brig_insn_count++;
 	  }
 
 	if (insn->m_call_insn->m_output_arg)
 	  {
-	    insn->m_call_insn->m_result_code_list->m_offsets[0] = htole32
-	      (emit_directive_variable (insn->m_call_insn->m_output_arg));
+	    insn->m_call_insn->m_result_code_list->m_offsets[0]
+	      = htole32 (emit_directive_variable
+			 (insn->m_call_insn->m_output_arg));
 	    brig_insn_count++;
 	  }
 
@@ -1652,8 +1665,8 @@ emit_packed_insn (hsa_insn_packed *insn)
       for (unsigned i = 1; i < operand_count; i++)
 	{
 	  gcc_checking_assert (insn->get_op (i));
-	  insn->m_operand_list->m_offsets[i - 1] = htole32
-	    (enqueue_op (insn->get_op (i)));
+	  insn->m_operand_list->m_offsets[i - 1]
+	    = htole32 (enqueue_op (insn->get_op (i)));
 	}
 
       repr.base.operands = htole32 (emit_operands (insn->get_op (0),
@@ -1665,13 +1678,13 @@ emit_packed_insn (hsa_insn_packed *insn)
       for (unsigned i = 0; i < operand_count - 1; i++)
 	{
 	  gcc_checking_assert (insn->get_op (i));
-	  insn->m_operand_list->m_offsets[i] = htole32
-	    (enqueue_op (insn->get_op (i)));
+	  insn->m_operand_list->m_offsets[i]
+	    = htole32 (enqueue_op (insn->get_op (i)));
 	}
 
-      repr.base.operands = htole32
-	(emit_operands (insn->m_operand_list,
-			insn->get_op (insn->operand_count () - 1)));
+      unsigned ops = emit_operands (insn->m_operand_list,
+				    insn->get_op (insn->operand_count () - 1));
+      repr.base.operands = htole32 (ops);
     }
 
 
@@ -1696,7 +1709,7 @@ emit_basic_insn (hsa_insn_basic *insn)
   switch (insn->m_opcode)
     {
       /* And the bit-logical operations need bit types and whine about
-         arithmetic types :-/  */
+	 arithmetic types :-/  */
       case BRIG_OPCODE_AND:
       case BRIG_OPCODE_OR:
       case BRIG_OPCODE_XOR:
@@ -1717,8 +1730,7 @@ emit_basic_insn (hsa_insn_basic *insn)
 	repr.round = BRIG_ROUND_FLOAT_NEAR_EVEN;
       else
 	repr.round = 0;
-      /* We assume that destination and sources agree in packing
-         layout.  */
+      /* We assume that destination and sources agree in packing layout.  */
       if (insn->num_used_ops () >= 2)
 	repr.pack = BRIG_PACK_PP;
       else
@@ -1734,7 +1746,7 @@ emit_basic_insn (hsa_insn_basic *insn)
 }
 
 /* Emit an HSA instruction and all necessary directives, schedule necessary
-   operands for writing .  */
+   operands for writing.  */
 
 static void
 emit_insn (hsa_insn_basic *insn)
@@ -1886,8 +1898,8 @@ hsa_brig_emit_function (void)
 	  for (unsigned j = 0; j < sbr->m_jump_table.length (); j++)
 	    {
 	      hsa_bb *hbb = hsa_bb_for_bb (sbr->m_jump_table[j]);
-	      sbr->m_label_code_list->m_offsets[j] =
-		hbb->m_label_ref.m_directive_offset;
+	      sbr->m_label_code_list->m_offsets[j]
+		= hbb->m_label_ref.m_directive_offset;
 	    }
 	}
 
@@ -1914,10 +1926,7 @@ hsa_brig_emit_omp_symbols (void)
   emit_directive_variable (hsa_num_threads);
 }
 
-/* Unit constructor and destructor statements.  */
-
-static GTY(()) tree hsa_ctor_statements;
-static GTY(()) tree hsa_dtor_statements;
+static GTY(()) tree hsa_cdtor_statements[2];
 
 /* Create and return __hsa_global_variables symbol that contains
    all informations consumed by libgomp to link global variables
@@ -1961,8 +1970,8 @@ hsa_output_global_variables ()
       hsa_sanitize_name (copy);
 
       tree var_name = build_string (len, copy);
-      TREE_TYPE (var_name) = build_array_type
-	(char_type_node, build_index_type (size_int (len)));
+      TREE_TYPE (var_name)
+	= build_array_type (char_type_node, build_index_type (size_int (len)));
       free (copy);
 
       vec<constructor_elt, va_gc> *variable_info_vec = NULL;
@@ -2021,9 +2030,8 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
   for (unsigned i = 0; i < map_count; ++i)
     {
       tree decl = hsa_get_decl_kernel_mapping_decl (i);
-      CONSTRUCTOR_APPEND_ELT
-	(host_functions_vec, NULL_TREE,
-	 build_fold_addr_expr (hsa_get_host_function (decl)));
+      tree host_fn = build_fold_addr_expr (hsa_get_host_function (decl));
+      CONSTRUCTOR_APPEND_ELT (host_functions_vec, NULL_TREE, host_fn);
     }
   tree host_functions_ctor = build_constructor (host_functions_array_type,
 						host_functions_vec);
@@ -2070,8 +2078,9 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
 			 NULL_TREE);
 
   int_num_of_kernels = build_int_cstu (uint32_type_node, map_count);
-  tree kernel_info_vector_type = build_array_type
-    (kernel_info_type, build_index_type (int_num_of_kernels));
+  tree kernel_info_vector_type
+    = build_array_type (kernel_info_type,
+			build_index_type (int_num_of_kernels));
   TYPE_ARTIFICIAL (kernel_info_vector_type) = 1;
 
   vec<constructor_elt, va_gc> *kernel_info_vector_vec = NULL;
@@ -2089,8 +2098,8 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
       len++;
 
       tree kern_name = build_string (len, copy);
-      TREE_TYPE (kern_name) = build_array_type
-	(char_type_node, build_index_type (size_int (len)));
+      TREE_TYPE (kern_name)
+	= build_array_type (char_type_node, build_index_type (size_int (len)));
       free (copy);
 
       unsigned omp_size = hsa_get_decl_kernel_mapping_omp_size (i);
@@ -2100,9 +2109,9 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
 						     gridified_kernel_p);
       unsigned count = 0;
 
-      kernel_dependencies_vector_type = build_array_type
-	(build_pointer_type (char_type_node),
-	 build_index_type (size_int (0)));
+      kernel_dependencies_vector_type
+	= build_array_type (build_pointer_type (char_type_node),
+			    build_index_type (size_int (0)));
 
       vec<constructor_elt, va_gc> *kernel_dependencies_vec = NULL;
       if (hsa_decl_kernel_dependencies)
@@ -2114,9 +2123,9 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
 	      vec <const char *> *dependencies = *slot;
 	      count = dependencies->length ();
 
-	      kernel_dependencies_vector_type = build_array_type
-		(build_pointer_type (char_type_node),
-		 build_index_type (size_int (count)));
+	      kernel_dependencies_vector_type
+		= build_array_type (build_pointer_type (char_type_node),
+				    build_index_type (size_int (count)));
 	      TYPE_ARTIFICIAL (kernel_dependencies_vector_type) = 1;
 
 	      for (unsigned j = 0; j < count; j++)
@@ -2124,8 +2133,9 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
 		  const char *d = (*dependencies)[j];
 		  len = strlen (d);
 		  tree dependency_name = build_string (len, d);
-		  TREE_TYPE (dependency_name) = build_array_type
-		    (char_type_node, build_index_type (size_int (len)));
+		  TREE_TYPE (dependency_name)
+		    = build_array_type (char_type_node,
+					build_index_type (size_int (len)));
 
 		  CONSTRUCTOR_APPEND_ELT
 		    (kernel_dependencies_vec, NULL_TREE,
@@ -2163,8 +2173,9 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
 	  DECL_IGNORED_P (dependencies_list) = 1;
 	  DECL_EXTERNAL (dependencies_list) = 0;
 	  TREE_CONSTANT (dependencies_list) = 1;
-	  DECL_INITIAL (dependencies_list) = build_constructor
-	    (kernel_dependencies_vector_type, kernel_dependencies_vec);
+	  DECL_INITIAL (dependencies_list)
+	    = build_constructor (kernel_dependencies_vector_type,
+				 kernel_dependencies_vec);
 	  varpool_node::finalize_decl (dependencies_list);
 
 	  CONSTRUCTOR_APPEND_ELT (kernel_info_vec, NULL_TREE,
@@ -2275,18 +2286,16 @@ hsa_output_libgomp_mapping (tree brig_decl)
   DECL_INITIAL (hsa_img_descriptor) = img_desc_ctor;
   varpool_node::finalize_decl (hsa_img_descriptor);
 
-  /* Construct the "host_table" libgomp expects. */
-  tree libgomp_host_table_type = build_array_type (ptr_type_node,
-						   build_index_type
-						   (build_int_cst
-						    (integer_type_node, 4)));
+  /* Construct the "host_table" libgomp expects.  */
+  tree index_type = build_index_type (build_int_cst (integer_type_node, 4));
+  tree libgomp_host_table_type = build_array_type (ptr_type_node, index_type);
   TYPE_ARTIFICIAL (libgomp_host_table_type) = 1;
   vec<constructor_elt, va_gc> *libgomp_host_table_vec = NULL;
   tree host_func_table_addr = build_fold_addr_expr (host_func_table);
   CONSTRUCTOR_APPEND_ELT (libgomp_host_table_vec, NULL_TREE,
 			  host_func_table_addr);
-  offset_int func_table_size = wi::to_offset (TYPE_SIZE_UNIT (ptr_type_node))
-    * kernel_count;
+  offset_int func_table_size
+    = wi::to_offset (TYPE_SIZE_UNIT (ptr_type_node)) * kernel_count;
   CONSTRUCTOR_APPEND_ELT (libgomp_host_table_vec, NULL_TREE,
 			  fold_build2 (POINTER_PLUS_EXPR,
 				       TREE_TYPE (host_func_table_addr),
@@ -2315,8 +2324,8 @@ hsa_output_libgomp_mapping (tree brig_decl)
 
   /* Generate an initializer with a call to the registration routine.  */
 
-  tree offload_register = builtin_decl_explicit
-    (BUILT_IN_GOMP_OFFLOAD_REGISTER);
+  tree offload_register
+    = builtin_decl_explicit (BUILT_IN_GOMP_OFFLOAD_REGISTER);
   gcc_checking_assert (offload_register);
 
   append_to_statement_list
@@ -2327,12 +2336,13 @@ hsa_output_libgomp_mapping (tree brig_decl)
 		      build_fold_addr_expr (hsa_libgomp_host_table),
 		      build_int_cst (integer_type_node, GOMP_DEVICE_HSA),
 		      build_fold_addr_expr (hsa_img_descriptor)),
-     &hsa_ctor_statements);
+     &hsa_cdtor_statements[0]);
 
-  cgraph_build_static_cdtor ('I', hsa_ctor_statements, DEFAULT_INIT_PRIORITY);
+  cgraph_build_static_cdtor ('I', hsa_cdtor_statements[0],
+			     DEFAULT_INIT_PRIORITY);
 
-  tree offload_unregister = builtin_decl_explicit
-    (BUILT_IN_GOMP_OFFLOAD_UNREGISTER);
+  tree offload_unregister
+    = builtin_decl_explicit (BUILT_IN_GOMP_OFFLOAD_UNREGISTER);
   gcc_checking_assert (offload_unregister);
 
   append_to_statement_list
@@ -2343,14 +2353,11 @@ hsa_output_libgomp_mapping (tree brig_decl)
 		      build_fold_addr_expr (hsa_libgomp_host_table),
 		      build_int_cst (integer_type_node, GOMP_DEVICE_HSA),
 		      build_fold_addr_expr (hsa_img_descriptor)),
-     &hsa_dtor_statements);
-  cgraph_build_static_cdtor ('D', hsa_dtor_statements, DEFAULT_INIT_PRIORITY);
+     &hsa_cdtor_statements[1]);
+  cgraph_build_static_cdtor ('D', hsa_cdtor_statements[1],
+			     DEFAULT_INIT_PRIORITY);
 }
 
-/* Required HSA section alignment. */
-
-#define HSA_SECTION_ALIGNMENT 16
-
 /* Emit the brig module we have compiled to a section in the final assembly and
    also create a compile unit static constructor that will register the brig
    module with libgomp.  */
@@ -2369,8 +2376,8 @@ hsa_output_brig (void)
 
       BrigCodeOffset32_t *func_offset = function_offsets->get (p.function_decl);
       gcc_assert (*func_offset);
-      BrigOperandCodeRef *code_ref = (BrigOperandCodeRef *)
-	(brig_operand.get_ptr_by_offset (p.offset));
+      BrigOperandCodeRef *code_ref
+	= (BrigOperandCodeRef *) (brig_operand.get_ptr_by_offset (p.offset));
       gcc_assert (code_ref->base.kind == BRIG_KIND_OPERAND_CODE_REF);
       code_ref->ref = htole32 (*func_offset);
     }
@@ -2380,8 +2387,9 @@ hsa_output_brig (void)
      then change the linkage to program linkage.  Doing so, we will emit
      a valid BRIG image.  */
   if (hsa_failed_functions != NULL && emitted_declarations != NULL)
-    for (hash_map <tree, BrigDirectiveExecutable *>::iterator it =
-	 emitted_declarations->begin (); it != emitted_declarations->end ();
+    for (hash_map <tree, BrigDirectiveExecutable *>::iterator it
+	 = emitted_declarations->begin ();
+	 it != emitted_declarations->end ();
 	 ++it)
       {
 	if (hsa_failed_functions->contains ((*it).first))
@@ -2410,7 +2418,7 @@ hsa_output_brig (void)
 
   BrigModuleHeader module_header;
   memcpy (&module_header.identification, "HSA BRIG",
-	  sizeof(module_header.identification));
+	  sizeof (module_header.identification));
   module_header.brigMajor = htole32 (BRIG_VERSION_BRIG_MAJOR);
   module_header.brigMinor = htole32 (BRIG_VERSION_BRIG_MINOR);
   uint64_t section_index[3];
@@ -2423,17 +2431,21 @@ hsa_output_brig (void)
   operand_padding = HSA_SECTION_ALIGNMENT
     - brig_operand.total_size % HSA_SECTION_ALIGNMENT;
 
-  uint64_t module_size = sizeof (module_header) + sizeof (section_index)
-    + brig_data.total_size + data_padding
-    + brig_code.total_size + code_padding
-    + brig_operand.total_size + operand_padding;
+  uint64_t module_size = sizeof (module_header)
+    + sizeof (section_index)
+    + brig_data.total_size
+    + data_padding
+    + brig_code.total_size
+    + code_padding
+    + brig_operand.total_size
+    + operand_padding;
   gcc_assert ((module_size % 16) == 0);
   module_header.byteCount = htole64 (module_size);
   memset (&module_header.hash, 0, sizeof (module_header.hash));
   module_header.reserved = 0;
   module_header.sectionCount = htole32 (3);
   module_header.sectionIndex = htole64 (sizeof (module_header));
-  assemble_string ((const char *) &module_header, sizeof(module_header));
+  assemble_string ((const char *) &module_header, sizeof (module_header));
   uint64_t off = sizeof (module_header) + sizeof (section_index);
   section_index[0] = htole64 (off);
   off += brig_data.total_size + data_padding;
@@ -2443,7 +2455,7 @@ hsa_output_brig (void)
   assemble_string ((const char *) &section_index, sizeof (section_index));
 
   char padding[HSA_SECTION_ALIGNMENT];
-  memset (padding, 0, sizeof(padding));
+  memset (padding, 0, sizeof (padding));
 
   brig_data.output ();
   assemble_string (padding, data_padding);
diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
index aee525e..af79bcb 100644
--- a/gcc/hsa-dump.c
+++ b/gcc/hsa-dump.c
@@ -543,7 +543,7 @@ hsa_memsem_name (enum BrigMemoryOrder mo)
     }
 }
 
-/* Return textual name for memory scope. */
+/* Return textual name for memory scope.  */
 
 static const char *
 hsa_memscope_name (enum BrigMemoryScope scope)
@@ -649,8 +649,9 @@ dump_hsa_reg (FILE *f, hsa_op_reg *reg, bool dump_type = false)
 static void
 dump_hsa_immed (FILE *f, hsa_op_immed *imm)
 {
-  bool unsigned_int_type = (BRIG_TYPE_U8 | BRIG_TYPE_U16 | BRIG_TYPE_U32
-    | BRIG_TYPE_U64) & imm->m_type;
+  bool unsigned_int_type
+    = (BRIG_TYPE_U8 | BRIG_TYPE_U16 | BRIG_TYPE_U32 | BRIG_TYPE_U64)
+    & imm->m_type;
 
   if (imm->m_tree_value)
     print_generic_expr (f, imm->m_tree_value, 0);
@@ -662,7 +663,7 @@ dump_hsa_immed (FILE *f, hsa_op_immed *imm)
 	fprintf (f, HOST_WIDE_INT_PRINT_DEC, imm->m_int_value);
       else
 	fprintf (f, HOST_WIDE_INT_PRINT_UNSIGNED,
-		 (unsigned HOST_WIDE_INT)imm->m_int_value);
+		 (unsigned HOST_WIDE_INT) imm->m_int_value);
     }
 
   fprintf (f, " (%s)", hsa_type_name (imm->m_type));
@@ -967,7 +968,7 @@ dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int *indent)
 
       fprintf (f, "(");
       for (unsigned i = 0; i < call->m_input_args.length (); i++)
-        {
+	{
 	  fprintf (f, "%%__arg_%u", i);
 
 	  if (i != call->m_input_args.length () - 1)
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 2628fbc..6d1cc98 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -126,7 +126,7 @@ struct hsa_queue
 };
 
 /* Alloc pools for allocating basic hsa structures such as operands,
-   instructions and other basic entities.s */
+   instructions and other basic entities.  */
 static object_allocator<hsa_op_address> *hsa_allocp_operand_address;
 static object_allocator<hsa_op_immed> *hsa_allocp_operand_immed;
 static object_allocator<hsa_op_reg> *hsa_allocp_operand_reg;
@@ -158,11 +158,11 @@ static vec <hsa_insn_basic *> hsa_instructions;
 static vec <hsa_op_base *> hsa_operands;
 
 hsa_symbol::hsa_symbol ()
-: m_decl (NULL_TREE), m_name (NULL), m_name_number (0),
-  m_directive_offset (0), m_type (BRIG_TYPE_NONE),
-  m_segment (BRIG_SEGMENT_NONE), m_linkage (BRIG_LINKAGE_NONE), m_dim (0),
-  m_cst_value (NULL), m_global_scope_p (false), m_seen_error (false),
-  m_allocation (BRIG_ALLOCATION_AUTOMATIC)
+  : m_decl (NULL_TREE), m_name (NULL), m_name_number (0),
+    m_directive_offset (0), m_type (BRIG_TYPE_NONE),
+    m_segment (BRIG_SEGMENT_NONE), m_linkage (BRIG_LINKAGE_NONE), m_dim (0),
+    m_cst_value (NULL), m_global_scope_p (false), m_seen_error (false),
+    m_allocation (BRIG_ALLOCATION_AUTOMATIC)
 {
 }
 
@@ -170,18 +170,19 @@ hsa_symbol::hsa_symbol ()
 hsa_symbol::hsa_symbol (BrigType16_t type, BrigSegment8_t segment,
 			BrigLinkage8_t linkage, bool global_scope_p,
 			BrigAllocation allocation)
-: m_decl (NULL_TREE), m_name (NULL), m_name_number (0),
-  m_directive_offset (0), m_type (type), m_segment (segment),
-  m_linkage (linkage), m_dim (0), m_cst_value (NULL),
-  m_global_scope_p (global_scope_p), m_seen_error (false),
-  m_allocation (allocation)
+  : m_decl (NULL_TREE), m_name (NULL), m_name_number (0),
+    m_directive_offset (0), m_type (type), m_segment (segment),
+    m_linkage (linkage), m_dim (0), m_cst_value (NULL),
+    m_global_scope_p (global_scope_p), m_seen_error (false),
+    m_allocation (allocation)
 {
 }
 
 unsigned HOST_WIDE_INT
 hsa_symbol::total_byte_size ()
 {
-  unsigned HOST_WIDE_INT s = hsa_type_bit_size (~BRIG_TYPE_ARRAY_MASK & m_type);
+  unsigned HOST_WIDE_INT s
+    = hsa_type_bit_size (~BRIG_TYPE_ARRAY_MASK & m_type);
   gcc_assert (s % BITS_PER_UNIT == 0);
   s /= BITS_PER_UNIT;
 
@@ -213,15 +214,16 @@ hsa_symbol::fillup_for_decl (tree decl)
    should be set to number of SSA names used in the function.  */
 
 hsa_function_representation::hsa_function_representation
-  (tree fdecl, bool kernel_p, unsigned ssa_names_count): m_name (NULL),
-  m_reg_count (0), m_input_args (vNULL),
-  m_output_arg (NULL), m_spill_symbols (vNULL), m_global_symbols (vNULL),
-  m_private_variables (vNULL), m_called_functions (vNULL),
-  m_called_internal_fns (vNULL), m_hbb_count (0),
-  m_in_ssa (true), m_kern_p (kernel_p), m_declaration_p (false), m_decl (fdecl),
-  m_internal_fn (NULL), m_shadow_reg (NULL), m_kernel_dispatch_count (0),
-  m_maximum_omp_data_size (0), m_seen_error (false), m_temp_symbol_count (0),
-  m_ssa_map ()
+  (tree fdecl, bool kernel_p, unsigned ssa_names_count)
+  : m_name (NULL),
+    m_reg_count (0), m_input_args (vNULL),
+    m_output_arg (NULL), m_spill_symbols (vNULL), m_global_symbols (vNULL),
+    m_private_variables (vNULL), m_called_functions (vNULL),
+    m_called_internal_fns (vNULL), m_hbb_count (0),
+    m_in_ssa (true), m_kern_p (kernel_p), m_declaration_p (false),
+    m_decl (fdecl), m_internal_fn (NULL), m_shadow_reg (NULL),
+    m_kernel_dispatch_count (0), m_maximum_omp_data_size (0),
+    m_seen_error (false), m_temp_symbol_count (0), m_ssa_map ()
 {
   int sym_init_len = (vec_safe_length (cfun->local_decls) / 2) + 1;;
   m_local_symbols = new hash_table <hsa_noop_symbol_hasher> (sym_init_len);
@@ -230,16 +232,16 @@ hsa_function_representation::hsa_function_representation
 
 /* Constructor of class representing HSA function information that
    is derived for an internal function.  */
-hsa_function_representation::hsa_function_representation (hsa_internal_fn *fn):
-  m_reg_count (0), m_input_args (vNULL),
-  m_output_arg (NULL), m_local_symbols (NULL),
-  m_spill_symbols (vNULL), m_global_symbols (vNULL),
-  m_private_variables (vNULL), m_called_functions (vNULL),
-  m_called_internal_fns (vNULL), m_hbb_count (0),
-  m_in_ssa (true), m_kern_p (false), m_declaration_p (true), m_decl (NULL),
-  m_internal_fn (fn), m_shadow_reg (NULL), m_kernel_dispatch_count (0),
-  m_maximum_omp_data_size (0), m_seen_error (false), m_temp_symbol_count (0),
-  m_ssa_map () {}
+hsa_function_representation::hsa_function_representation (hsa_internal_fn *fn)
+  : m_reg_count (0), m_input_args (vNULL),
+    m_output_arg (NULL), m_local_symbols (NULL),
+    m_spill_symbols (vNULL), m_global_symbols (vNULL),
+    m_private_variables (vNULL), m_called_functions (vNULL),
+    m_called_internal_fns (vNULL), m_hbb_count (0),
+    m_in_ssa (true), m_kern_p (false), m_declaration_p (true), m_decl (NULL),
+    m_internal_fn (fn), m_shadow_reg (NULL), m_kernel_dispatch_count (0),
+    m_maximum_omp_data_size (0), m_seen_error (false), m_temp_symbol_count (0),
+    m_ssa_map () {}
 
 /* Destructor of class holding function/kernel-wide information and state.  */
 
@@ -344,14 +346,14 @@ static hash_map <nofree_string_hash, omp_simple_builtin> *omp_simple_builtins
 
 /* Warning messages for OMP builtins.  */
 
-#define HSA_WARN_LOCK_ROUTINE "support for HSA does not implement OpenMP lock " \
-  "routines"
+#define HSA_WARN_LOCK_ROUTINE "support for HSA does not implement OpenMP " \
+  "lock routines"
 #define HSA_WARN_TIMING_ROUTINE "support for HSA does not implement OpenMP " \
   "timing routines"
 #define HSA_WARN_MEMORY_ROUTINE "OpenMP device memory library routines have " \
   "undefined semantics within target regions, support for HSA ignores them"
-#define HSA_WARN_AFFINITY "Support for HSA does not implement OpenMP affinity " \
-  "featerues"
+#define HSA_WARN_AFFINITY "Support for HSA does not implement OpenMP " \
+  "affinity feateres"
 
 /* Initialize hash map with simple OMP builtins.  */
 
@@ -361,14 +363,14 @@ hsa_init_simple_builtins ()
   if (omp_simple_builtins != NULL)
     return;
 
-  omp_simple_builtins = new hash_map <nofree_string_hash, omp_simple_builtin>
-    ();
+  omp_simple_builtins
+    = new hash_map <nofree_string_hash, omp_simple_builtin> ();
 
   omp_simple_builtin omp_builtins[] =
     {
-      omp_simple_builtin
-	("omp_get_initial_device", NULL, false,
-	 new hsa_op_immed (GOMP_DEVICE_HOST, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_get_initial_device", NULL, false,
+			  new hsa_op_immed (GOMP_DEVICE_HOST,
+					    (BrigType16_t) BRIG_TYPE_S32)),
       omp_simple_builtin ("omp_is_initial_device", NULL, false,
 			  new hsa_op_immed (0, (BrigType16_t) BRIG_TYPE_S32)),
       omp_simple_builtin ("omp_get_dynamic", NULL, false,
@@ -388,25 +390,26 @@ hsa_init_simple_builtins ()
       omp_simple_builtin ("omp_target_alloc", HSA_WARN_MEMORY_ROUTINE, false,
 			  new hsa_op_immed (0, (BrigType16_t) BRIG_TYPE_U64)),
       omp_simple_builtin ("omp_target_free", HSA_WARN_MEMORY_ROUTINE, false),
-      omp_simple_builtin
-	("omp_target_is_present", HSA_WARN_MEMORY_ROUTINE,
-	 false, new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_target_is_present", HSA_WARN_MEMORY_ROUTINE,
+			  false,
+			  new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
       omp_simple_builtin ("omp_target_memcpy", HSA_WARN_MEMORY_ROUTINE, false,
 			  new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
-      omp_simple_builtin
-	("omp_target_memcpy_rect", HSA_WARN_MEMORY_ROUTINE,
-	 false, new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
-      omp_simple_builtin
-	("omp_target_associate_ptr", HSA_WARN_MEMORY_ROUTINE, false,
-	 new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
-      omp_simple_builtin
-	("omp_target_disassociate_ptr", HSA_WARN_MEMORY_ROUTINE,
-	 false, new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
-      omp_simple_builtin
-        ("omp_set_max_active_levels",
-	 "Support for HSA only allows only one active level, call to "
-	 "omp_set_max_active_levels will be ignored in the generated HSAIL",
-	 false, NULL),
+      omp_simple_builtin ("omp_target_memcpy_rect", HSA_WARN_MEMORY_ROUTINE,
+			  false,
+			  new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_target_associate_ptr", HSA_WARN_MEMORY_ROUTINE,
+			  false,
+			  new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_target_disassociate_ptr",
+			  HSA_WARN_MEMORY_ROUTINE,
+			  false,
+			  new hsa_op_immed (-1, (BrigType16_t) BRIG_TYPE_S32)),
+      omp_simple_builtin ("omp_set_max_active_levels",
+			  "Support for HSA only allows only one active level, "
+			  "call to omp_set_max_active_levels will be ignored "
+			  "in the generated HSAIL",
+			  false, NULL),
       omp_simple_builtin ("omp_get_max_active_levels", NULL, false,
 			  new hsa_op_immed (1, (BrigType16_t) BRIG_TYPE_S32)),
       omp_simple_builtin ("omp_in_final", NULL, false,
@@ -485,7 +488,8 @@ hsa_init_data_for_cfun ()
   hsa_allocp_inst_signal
     = new object_allocator<hsa_insn_signal> ("HSA signal instructions");
   hsa_allocp_inst_seg
-    = new object_allocator<hsa_insn_seg> ("HSA segment conversion instructions");
+    = new object_allocator<hsa_insn_seg> ("HSA segment conversion "
+					  "instructions");
   hsa_allocp_inst_cmp
     = new object_allocator<hsa_insn_cmp> ("HSA comparison instructions");
   hsa_allocp_inst_br
@@ -721,7 +725,7 @@ hsa_type_for_scalar_tree_type (const_tree type, bool min32int)
   if (min32int)
     {
       /* Registers/immediate operands can only be 32bit or more except for
-         f16.  */
+	 f16.  */
       if (res == BRIG_TYPE_U8 || res == BRIG_TYPE_U16)
 	res = BRIG_TYPE_U32;
       else if (res == BRIG_TYPE_S8 || res == BRIG_TYPE_S16)
@@ -863,7 +867,8 @@ get_symbol_for_decl (tree decl)
 
   dummy.m_decl = decl;
 
-  bool is_in_global_vars = TREE_CODE (decl) == VAR_DECL && is_global_var (decl);
+  bool is_in_global_vars
+    = TREE_CODE (decl) == VAR_DECL && is_global_var (decl);
 
   if (is_in_global_vars)
     slot = hsa_global_variable_symbols->find_slot (&dummy, INSERT);
@@ -916,7 +921,8 @@ get_symbol_for_decl (tree decl)
 tree
 hsa_get_host_function (tree decl)
 {
-  hsa_function_summary *s = hsa_summaries->get (cgraph_node::get_create (decl));
+  hsa_function_summary *s
+    = hsa_summaries->get (cgraph_node::get_create (decl));
   gcc_assert (s->m_kind != HSA_NONE);
   gcc_assert (s->m_gpu_implementation_p);
 
@@ -981,8 +987,8 @@ hsa_get_string_cst_symbol (tree string_cst)
 /* Constructor of the ancestor of all operands.  K is BRIG kind that identified
    what the operator is.  */
 
-hsa_op_base::hsa_op_base (BrigKind16_t k): m_next (NULL), m_brig_op_offset (0),
-  m_kind (k)
+hsa_op_base::hsa_op_base (BrigKind16_t k)
+  : m_next (NULL), m_brig_op_offset (0), m_kind (k)
 {
   hsa_operands.safe_push (this);
 }
@@ -1046,8 +1052,8 @@ hsa_op_immed::hsa_op_immed (tree tree_val, bool min32int)
     m_brig_repr_size = TREE_STRING_LENGTH (m_tree_value);
   else if (TREE_CODE (m_tree_value) == CONSTRUCTOR)
     {
-      m_brig_repr_size = tree_to_uhwi
-	(TYPE_SIZE_UNIT (TREE_TYPE (m_tree_value)));
+      m_brig_repr_size
+	= tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (m_tree_value)));
 
       /* Verify that all elements of a constructor are constants.  */
       for (unsigned i = 0;
@@ -1071,7 +1077,7 @@ hsa_op_immed::hsa_op_immed (tree tree_val, bool min32int)
 
 hsa_op_immed::hsa_op_immed (HOST_WIDE_INT integer_value, BrigType16_t type)
   : hsa_op_with_type (BRIG_KIND_OPERAND_CONSTANT_BYTES, type),
-  m_tree_value (NULL), m_brig_repr (NULL)
+    m_tree_value (NULL), m_brig_repr (NULL)
 {
   gcc_assert (hsa_type_integer_p (type));
   m_int_value = integer_value;
@@ -1101,8 +1107,8 @@ hsa_op_immed::hsa_op_immed (HOST_WIDE_INT integer_value, BrigType16_t type)
   memcpy (m_brig_repr, &bytes, m_brig_repr_size);
 }
 
-hsa_op_immed::hsa_op_immed ():
-  hsa_op_with_type (BRIG_KIND_NONE, BRIG_TYPE_NONE), m_brig_repr (NULL)
+hsa_op_immed::hsa_op_immed ()
+  : hsa_op_with_type (BRIG_KIND_NONE, BRIG_TYPE_NONE), m_brig_repr (NULL)
 {
 }
 
@@ -1134,8 +1140,8 @@ hsa_op_immed::set_type (BrigType16_t t)
 
 hsa_op_reg::hsa_op_reg (BrigType16_t t)
   : hsa_op_with_type (BRIG_KIND_OPERAND_REGISTER, t), m_gimple_ssa (NULL_TREE),
-  m_def_insn (NULL), m_spill_sym (NULL), m_order (hsa_cfun->m_reg_count++),
-  m_lr_begin (0), m_lr_end (0), m_reg_class (0), m_hard_num (0)
+    m_def_insn (NULL), m_spill_sym (NULL), m_order (hsa_cfun->m_reg_count++),
+    m_lr_begin (0), m_lr_end (0), m_reg_class (0), m_hard_num (0)
 {
 }
 
@@ -1207,19 +1213,19 @@ hsa_op_reg::verify_ssa ()
 hsa_op_address::hsa_op_address (hsa_symbol *sym, hsa_op_reg *r,
 				HOST_WIDE_INT offset)
   : hsa_op_base (BRIG_KIND_OPERAND_ADDRESS), m_symbol (sym), m_reg (r),
-  m_imm_offset (offset)
+    m_imm_offset (offset)
 {
 }
 
 hsa_op_address::hsa_op_address (hsa_symbol *sym, HOST_WIDE_INT offset)
   : hsa_op_base (BRIG_KIND_OPERAND_ADDRESS), m_symbol (sym), m_reg (NULL),
-  m_imm_offset (offset)
+    m_imm_offset (offset)
 {
 }
 
 hsa_op_address::hsa_op_address (hsa_op_reg *r, HOST_WIDE_INT offset)
   : hsa_op_base (BRIG_KIND_OPERAND_ADDRESS), m_symbol (NULL), m_reg (r),
-  m_imm_offset (offset)
+    m_imm_offset (offset)
 {
 }
 
@@ -1314,9 +1320,10 @@ hsa_op_reg::set_definition (hsa_insn_basic *insn)
    operand vector will contain (and which will be cleared).  OP is the opcode
    of the instruction.  This constructor does not set type.  */
 
-hsa_insn_basic::hsa_insn_basic (unsigned nops, int opc): m_prev (NULL),
-  m_next (NULL), m_bb (NULL), m_opcode (opc), m_number (0),
-  m_type (BRIG_TYPE_NONE), m_brig_offset (0)
+hsa_insn_basic::hsa_insn_basic (unsigned nops, int opc)
+  : m_prev (NULL),
+    m_next (NULL), m_bb (NULL), m_opcode (opc), m_number (0),
+    m_type (BRIG_TYPE_NONE), m_brig_offset (0)
 {
   if (nops > 0)
     m_operands.safe_grow_cleared (nops);
@@ -1381,9 +1388,9 @@ hsa_insn_basic::operand_count ()
 
 hsa_insn_basic::hsa_insn_basic (unsigned nops, int opc, BrigType16_t t,
 				hsa_op_base *arg0, hsa_op_base *arg1,
-				hsa_op_base *arg2, hsa_op_base *arg3):
-  m_prev (NULL), m_next (NULL), m_bb (NULL), m_opcode (opc),m_number (0),
-  m_type (t),  m_brig_offset (0)
+				hsa_op_base *arg2, hsa_op_base *arg3)
+ : m_prev (NULL), m_next (NULL), m_bb (NULL), m_opcode (opc),m_number (0),
+   m_type (t),  m_brig_offset (0)
 {
   if (nops > 0)
     m_operands.safe_grow_cleared (nops);
@@ -1486,8 +1493,8 @@ hsa_insn_phi::operator new (size_t)
    new instruction is automatically added to its uses list.  */
 
 hsa_insn_br::hsa_insn_br (hsa_op_reg *ctrl)
-: hsa_insn_basic (1, BRIG_OPCODE_CBR, BRIG_TYPE_B1, ctrl),
-  m_width (BRIG_WIDTH_1)
+  : hsa_insn_basic (1, BRIG_OPCODE_CBR, BRIG_TYPE_B1, ctrl),
+    m_width (BRIG_WIDTH_1)
 {
 }
 
@@ -1503,9 +1510,9 @@ hsa_insn_br::operator new (size_t)
    the index register.  */
 
 hsa_insn_sbr::hsa_insn_sbr (hsa_op_reg *index, unsigned jump_count)
-: hsa_insn_basic (1, BRIG_OPCODE_SBR, BRIG_TYPE_B1, index),
-  m_width (BRIG_WIDTH_1), m_jump_table (vNULL), m_default_bb (NULL),
-  m_label_code_list (new hsa_op_code_list (jump_count))
+  : hsa_insn_basic (1, BRIG_OPCODE_SBR, BRIG_TYPE_B1, index),
+    m_width (BRIG_WIDTH_1), m_jump_table (vNULL), m_default_bb (NULL),
+    m_label_code_list (new hsa_op_code_list (jump_count))
 {
 }
 
@@ -1558,7 +1565,7 @@ hsa_insn_cmp::operator new (size_t)
 hsa_insn_mem::hsa_insn_mem (int opc, BrigType16_t t, hsa_op_base *arg0,
 			    hsa_op_base *arg1)
   : hsa_insn_basic (2, opc, t, arg0, arg1),
-  m_align (hsa_natural_alignment (t)), m_equiv_class (0)
+    m_align (hsa_natural_alignment (t)), m_equiv_class (0)
 {
   gcc_checking_assert (opc == BRIG_OPCODE_LD || opc == BRIG_OPCODE_ST);
 }
@@ -1572,7 +1579,7 @@ hsa_insn_mem::hsa_insn_mem (unsigned nops, int opc, BrigType16_t t,
 			    hsa_op_base *arg0, hsa_op_base *arg1,
 			    hsa_op_base *arg2, hsa_op_base *arg3)
   : hsa_insn_basic (nops, opc, t, arg0, arg1, arg2, arg3),
-  m_align (hsa_natural_alignment (t)), m_equiv_class (0)
+    m_align (hsa_natural_alignment (t)), m_equiv_class (0)
 {
 }
 
@@ -1584,7 +1591,7 @@ hsa_insn_mem::operator new (size_t)
   return hsa_allocp_inst_mem->allocate_raw ();
 }
 
-/* Constructor of class representing atomic instructions and signals. OPC is
+/* Constructor of class representing atomic instructions and signals.  OPC is
    the principal opcode, aop is the specific atomic operation opcode.  T is the
    type of the instruction.  The instruction operands
    are provided as ARG[0-3].  */
@@ -1596,8 +1603,8 @@ hsa_insn_atomic::hsa_insn_atomic (int nops, int opc,
 				  hsa_op_base *arg1, hsa_op_base *arg2,
 				  hsa_op_base *arg3)
   : hsa_insn_mem (nops, opc, t, arg0, arg1, arg2, arg3), m_atomicop (aop),
-  m_memoryorder (memorder),
-  m_memoryscope (BRIG_MEMORY_SCOPE_SYSTEM)
+    m_memoryorder (memorder),
+    m_memoryscope (BRIG_MEMORY_SCOPE_SYSTEM)
 {
   gcc_checking_assert (opc == BRIG_OPCODE_ATOMICNORET ||
 		       opc == BRIG_OPCODE_ATOMIC ||
@@ -1645,7 +1652,7 @@ hsa_insn_seg::hsa_insn_seg (int opc, BrigType16_t dest, BrigType16_t srct,
 			    BrigSegment8_t seg, hsa_op_base *arg0,
 			    hsa_op_base *arg1)
   : hsa_insn_basic (2, opc, dest, arg0, arg1), m_src_type (srct),
-  m_segment (seg)
+    m_segment (seg)
 {
   gcc_checking_assert (opc == BRIG_OPCODE_STOF || opc == BRIG_OPCODE_FTOS);
 }
@@ -1663,14 +1670,14 @@ hsa_insn_seg::operator new (size_t)
 
 hsa_insn_call::hsa_insn_call (tree callee)
   : hsa_insn_basic (0, BRIG_OPCODE_CALL), m_called_function (callee),
-  m_output_arg (NULL), m_args_code_list (NULL), m_result_code_list (NULL)
+    m_output_arg (NULL), m_args_code_list (NULL), m_result_code_list (NULL)
 {
 }
 
 hsa_insn_call::hsa_insn_call (hsa_internal_fn *fn)
   : hsa_insn_basic (0, BRIG_OPCODE_CALL), m_called_function (NULL),
-  m_called_internal_fn (fn), m_output_arg (NULL), m_args_code_list (NULL),
-  m_result_code_list (NULL)
+    m_called_internal_fn (fn), m_output_arg (NULL), m_args_code_list (NULL),
+    m_result_code_list (NULL)
 {
 }
 
@@ -1698,7 +1705,7 @@ hsa_insn_call::~hsa_insn_call ()
 hsa_insn_arg_block::hsa_insn_arg_block (BrigKind brig_kind,
 					hsa_insn_call * call)
   : hsa_insn_basic (0, HSA_OPCODE_ARG_BLOCK), m_kind (brig_kind),
-  m_call_insn (call)
+    m_call_insn (call)
 {
 }
 
@@ -1757,7 +1764,7 @@ hsa_insn_srctype::hsa_insn_srctype (int nops, BrigOpcode opcode,
 				    hsa_op_base *arg0, hsa_op_base *arg1,
 				    hsa_op_base *arg2 = NULL)
   : hsa_insn_basic (nops, opcode, destt, arg0, arg1, arg2),
-  m_source_type (srct)
+    m_source_type (srct)
 {}
 
 /* New operator to allocate packed instruction from pool alloc.  */
@@ -1807,7 +1814,7 @@ hsa_insn_alloca::operator new (size_t)
 hsa_insn_alloca::hsa_insn_alloca (hsa_op_with_type *dest,
 				  hsa_op_with_type *size, unsigned alignment)
   : hsa_insn_basic (2, BRIG_OPCODE_ALLOCA, dest->m_type, dest, size),
-  m_align (BRIG_ALIGNMENT_8)
+    m_align (BRIG_ALIGNMENT_8)
 {
   gcc_assert (dest->m_type == BRIG_TYPE_U32);
   if (alignment)
@@ -1950,8 +1957,8 @@ process_mem_base (tree base, hsa_symbol **symbol, BrigType16_t *addrtype,
   if (TREE_CODE (base) == SSA_NAME)
     {
       gcc_assert (!*reg);
-      hsa_op_with_type *ssa = hsa_cfun->reg_for_gimple_ssa (base)->get_in_type
-	(*addrtype, hbb);
+      hsa_op_with_type *ssa
+	= hsa_cfun->reg_for_gimple_ssa (base)->get_in_type (*addrtype, hbb);
       *reg = dyn_cast <hsa_op_reg *> (ssa);
     }
   else if (TREE_CODE (base) == ADDR_EXPR)
@@ -2290,8 +2297,9 @@ gen_hsa_insns_for_bitfield (hsa_op_reg *dest, hsa_op_reg *value_reg,
       hsa_op_reg *value_reg_2 = new hsa_op_reg (dest->m_type);
       hsa_op_immed *c = new hsa_op_immed (left_shift, BRIG_TYPE_U32);
 
-      hsa_insn_basic *lshift = new hsa_insn_basic
-	(3, BRIG_OPCODE_SHL, value_reg_2->m_type, value_reg_2, value_reg, c);
+      hsa_insn_basic *lshift
+	= new hsa_insn_basic (3, BRIG_OPCODE_SHL, value_reg_2->m_type,
+			      value_reg_2, value_reg, c);
 
       hbb->append_insn (lshift);
 
@@ -2303,16 +2311,17 @@ gen_hsa_insns_for_bitfield (hsa_op_reg *dest, hsa_op_reg *value_reg,
       hsa_op_reg *value_reg_2 = new hsa_op_reg (dest->m_type);
       hsa_op_immed *c = new hsa_op_immed (right_shift, BRIG_TYPE_U32);
 
-      hsa_insn_basic *rshift = new hsa_insn_basic
-	(3, BRIG_OPCODE_SHR, value_reg_2->m_type, value_reg_2, value_reg, c);
+      hsa_insn_basic *rshift
+	= new hsa_insn_basic (3, BRIG_OPCODE_SHR, value_reg_2->m_type,
+			      value_reg_2, value_reg, c);
 
       hbb->append_insn (rshift);
 
       value_reg = value_reg_2;
     }
 
-    hsa_insn_basic *assignment = new hsa_insn_basic
-      (2, BRIG_OPCODE_MOV, dest->m_type, dest, value_reg);
+    hsa_insn_basic *assignment
+      = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type, dest, value_reg);
     hbb->append_insn (assignment);
 }
 
@@ -2404,19 +2413,22 @@ gen_hsa_insns_for_load (hsa_op_reg *dest, tree rhs, tree type, hsa_bb *hbb)
 	  hsa_op_immed *real_part = new hsa_op_immed (TREE_REALPART (rhs));
 	  hsa_op_immed *imag_part = new hsa_op_immed (TREE_IMAGPART (rhs));
 
-	  hsa_op_reg *real_part_reg = new hsa_op_reg
-	    (hsa_type_for_scalar_tree_type (TREE_TYPE (type), true));
-	  hsa_op_reg *imag_part_reg = new hsa_op_reg
-	    (hsa_type_for_scalar_tree_type (TREE_TYPE (type), true));
+	  hsa_op_reg *real_part_reg
+	    = new hsa_op_reg (hsa_type_for_scalar_tree_type (TREE_TYPE (type),
+							     true));
+	  hsa_op_reg *imag_part_reg
+	    = new hsa_op_reg (hsa_type_for_scalar_tree_type (TREE_TYPE (type),
+							     true));
 
 	  hsa_build_append_simple_mov (real_part_reg, real_part, hbb);
 	  hsa_build_append_simple_mov (imag_part_reg, imag_part, hbb);
 
 	  BrigType16_t src_type = hsa_bittype_for_type (real_part_reg->m_type);
 
-	  hsa_insn_packed *insn = new hsa_insn_packed
-	    (3, BRIG_OPCODE_COMBINE, dest->m_type, src_type, dest,
-	     real_part_reg, imag_part_reg);
+	  hsa_insn_packed *insn
+	    = new hsa_insn_packed (3, BRIG_OPCODE_COMBINE, dest->m_type,
+				   src_type, dest, real_part_reg,
+				   imag_part_reg);
 	  hbb->append_insn (insn);
 	}
       else
@@ -2429,22 +2441,23 @@ gen_hsa_insns_for_load (hsa_op_reg *dest, tree rhs, tree type, hsa_bb *hbb)
     {
       tree pack_type = TREE_TYPE (TREE_OPERAND (rhs, 0));
 
-      hsa_op_reg *packed_reg = new hsa_op_reg
-	(hsa_type_for_scalar_tree_type (pack_type, true));
+      hsa_op_reg *packed_reg
+	= new hsa_op_reg (hsa_type_for_scalar_tree_type (pack_type, true));
 
       tree complex_rhs = TREE_OPERAND (rhs, 0);
       gen_hsa_insns_for_load (packed_reg, complex_rhs, TREE_TYPE (complex_rhs),
 			      hbb);
 
-      hsa_op_reg *real_reg = new hsa_op_reg
-	(hsa_type_for_scalar_tree_type (type, true));
+      hsa_op_reg *real_reg
+	= new hsa_op_reg (hsa_type_for_scalar_tree_type (type, true));
 
-      hsa_op_reg *imag_reg = new hsa_op_reg
-	(hsa_type_for_scalar_tree_type (type, true));
+      hsa_op_reg *imag_reg
+	= new hsa_op_reg (hsa_type_for_scalar_tree_type (type, true));
 
       BrigKind16_t brig_type = packed_reg->m_type;
-      hsa_insn_packed *packed = new hsa_insn_packed
-	(3, BRIG_OPCODE_EXPAND, hsa_bittype_for_type (real_reg->m_type),
+      hsa_insn_packed *packed
+	= new hsa_insn_packed (3, BRIG_OPCODE_EXPAND,
+			       hsa_bittype_for_type (real_reg->m_type),
 	 brig_type, real_reg, imag_reg, packed_reg);
 
       hbb->append_insn (packed);
@@ -2502,9 +2515,10 @@ gen_hsa_insns_for_load (hsa_op_reg *dest, tree rhs, tree type, hsa_bb *hbb)
 	}
     }
   else
-    HSA_SORRY_ATV
-      (EXPR_LOCATION (rhs),
-       "support for HSA does not implement loading of expression %E", rhs);
+    HSA_SORRY_ATV (EXPR_LOCATION (rhs),
+		   "support for HSA does not implement loading "
+		   "of expression %E",
+		   rhs);
 }
 
 /* Return number of bits necessary for representation of a bit field,
@@ -2556,8 +2570,9 @@ gen_hsa_insns_for_store (tree lhs, hsa_op_base *src, hsa_bb *hbb)
   if (bitpos || (bitsize && type_bitsize != bitsize))
     {
       unsigned HOST_WIDE_INT mask = 0;
-      BrigType16_t mem_type = get_integer_type_by_bytes
-	(type_bitsize / BITS_PER_UNIT, !TYPE_UNSIGNED (TREE_TYPE (lhs)));
+      BrigType16_t mem_type
+	= get_integer_type_by_bytes (type_bitsize / BITS_PER_UNIT,
+				     !TYPE_UNSIGNED (TREE_TYPE (lhs)));
 
       for (unsigned i = 0; i < type_bitsize; i++)
 	if (i < bitpos || i >= bitpos + bitsize)
@@ -2575,11 +2590,13 @@ gen_hsa_insns_for_store (tree lhs, hsa_op_base *src, hsa_bb *hbb)
       /* AND the loaded value with prepared mask.  */
       hsa_op_reg *cleared_reg = new hsa_op_reg (mem_type);
 
-      hsa_op_immed *c = new hsa_op_immed
-	(mask, get_integer_type_by_bytes (type_bitsize / BITS_PER_UNIT, false));
+      BrigType16_t t
+	= get_integer_type_by_bytes (type_bitsize / BITS_PER_UNIT, false);
+      hsa_op_immed *c = new hsa_op_immed (mask, t);
 
-      hsa_insn_basic *clearing = new hsa_insn_basic
-	(3, BRIG_OPCODE_AND, mem_type, cleared_reg, value_reg, c);
+      hsa_insn_basic *clearing
+	= new hsa_insn_basic (3, BRIG_OPCODE_AND, mem_type, cleared_reg,
+			      value_reg, c);
       hbb->append_insn (clearing);
 
       /* Shift to left a value that is going to be stored.  */
@@ -2594,8 +2611,9 @@ gen_hsa_insns_for_store (tree lhs, hsa_op_base *src, hsa_bb *hbb)
 	  hsa_op_reg *shifted_value_reg = new hsa_op_reg (mem_type);
 	  c = new hsa_op_immed (bitpos, BRIG_TYPE_U32);
 
-	  hsa_insn_basic *basic = new hsa_insn_basic
-	    (3, BRIG_OPCODE_SHL, mem_type, shifted_value_reg, new_value_reg, c);
+	  hsa_insn_basic *basic
+	    = new hsa_insn_basic (3, BRIG_OPCODE_SHL, mem_type,
+				  shifted_value_reg, new_value_reg, c);
 	  hbb->append_insn (basic);
 
 	  new_value_reg = shifted_value_reg;
@@ -2616,11 +2634,10 @@ gen_hsa_insns_for_store (tree lhs, hsa_op_base *src, hsa_bb *hbb)
   hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_ST, mtype, src, addr);
   mem->set_align (req_align);
 
-  /* XXX The HSAIL verifier has another constraint: if the source
-     is an immediate then it must match the destination type.  If
-     it's a register the low bits will be used for sub-word stores.
-     We're always allocating new operands so we can modify the above
-     in place.  */
+  /* The HSAIL verifier has another constraint: if the source is an immediate
+     then it must match the destination type.  If it's a register the low bits
+     will be used for sub-word stores.  We're always allocating new operands so
+     we can modify the above in place.  */
   if (hsa_op_immed *imm = dyn_cast <hsa_op_immed *> (src))
     {
       if ((imm->m_type & BRIG_TYPE_PACK_MASK) == BRIG_TYPE_PACK_NONE)
@@ -2628,7 +2645,7 @@ gen_hsa_insns_for_store (tree lhs, hsa_op_base *src, hsa_bb *hbb)
       else
 	{
 	  /* ...and all vector immediates apparently need to be vectors of
-	     unsigned bytes. */
+	     unsigned bytes.  */
 	  unsigned bs = hsa_type_bit_size (imm->m_type);
 	  gcc_assert (bs == hsa_type_bit_size (mem->m_type));
 	  switch (bs)
@@ -3189,16 +3206,16 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb)
     case UNEQ_EXPR:
     case LTGT_EXPR:
       {
-	hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa
-	  (gimple_assign_lhs (assign));
+	hsa_op_reg *dest
+	  = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign));
 
 	gen_hsa_cmp_insn_from_gimple (code, rhs1, rhs2, dest, hbb);
 	return;
       }
     case COND_EXPR:
       {
-	hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa
-	  (gimple_assign_lhs (assign));
+	hsa_op_reg *dest
+	  = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign));
 	hsa_op_with_type *ctrl = NULL;
 	tree cond = rhs1;
 
@@ -3216,10 +3233,8 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb)
 	    ctrl = r;
 	  }
 
-	hsa_op_with_type *rhs2_reg = hsa_reg_or_immed_for_gimple_op
-	  (rhs2, hbb);
-	hsa_op_with_type *rhs3_reg = hsa_reg_or_immed_for_gimple_op
-	  (rhs3, hbb);
+	hsa_op_with_type *rhs2_reg = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
+	hsa_op_with_type *rhs3_reg = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
 
 	BrigType16_t btype = hsa_bittype_for_type (dest->m_type);
 	hsa_op_reg *tmp = new hsa_op_reg (btype);
@@ -3227,8 +3242,9 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb)
 	rhs2_reg->m_type = btype;
 	rhs3_reg->m_type = btype;
 
-	hsa_insn_basic *insn = new hsa_insn_basic
-	  (4, BRIG_OPCODE_CMOV, tmp->m_type, tmp, ctrl, rhs2_reg, rhs3_reg);
+	hsa_insn_basic *insn
+	  = new hsa_insn_basic (4, BRIG_OPCODE_CMOV, tmp->m_type, tmp, ctrl,
+				rhs2_reg, rhs3_reg);
 
 	hbb->append_insn (insn);
 
@@ -3242,8 +3258,8 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb)
       }
     case COMPLEX_EXPR:
       {
-	hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa
-	  (gimple_assign_lhs (assign));
+	hsa_op_reg *dest
+	  = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign));
 	hsa_op_with_type *rhs1_reg = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
 	hsa_op_with_type *rhs2_reg = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
 
@@ -3254,9 +3270,9 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb)
 	rhs1_reg = rhs1_reg->get_in_type (src_type, hbb);
 	rhs2_reg = rhs2_reg->get_in_type (src_type, hbb);
 
-	hsa_insn_packed *insn = new hsa_insn_packed
-	  (3, BRIG_OPCODE_COMBINE, dest->m_type, src_type, dest,
-	   rhs1_reg, rhs2_reg);
+	hsa_insn_packed *insn
+	  = new hsa_insn_packed (3, BRIG_OPCODE_COMBINE, dest->m_type, src_type,
+				 dest, rhs1_reg, rhs2_reg);
 	hbb->append_insn (insn);
 
 	return;
@@ -3378,8 +3394,8 @@ gen_hsa_insns_for_switch_stmt (gswitch *s, hsa_bb *hbb)
 
   hsa_insn_sbr *sbr = new hsa_insn_sbr (sub_index, size + 1);
   tree default_label = gimple_switch_default_label (s);
-  basic_block default_label_bb = label_to_block_fn
-    (func, CASE_LABEL (default_label));
+  basic_block default_label_bb = label_to_block_fn (func,
+						    CASE_LABEL (default_label));
 
   sbr->m_default_bb = default_label_bb;
 
@@ -3393,8 +3409,8 @@ gen_hsa_insns_for_switch_stmt (gswitch *s, hsa_bb *hbb)
       tree label = gimple_switch_label (s, i);
       basic_block bb = label_to_block_fn (func, CASE_LABEL (label));
 
-      unsigned HOST_WIDE_INT sub_low = tree_to_uhwi
-	(int_const_binop (MINUS_EXPR, CASE_LOW (label), lowest));
+      unsigned HOST_WIDE_INT sub_low
+	= tree_to_uhwi (int_const_binop (MINUS_EXPR, CASE_LOW (label), lowest));
 
       unsigned HOST_WIDE_INT sub_high = sub_low;
       tree high = CASE_HIGH (label);
@@ -3437,8 +3453,8 @@ get_format_argument_type (tree formal_arg_type, BrigType16_t actual_arg_type)
   if (formal_arg_type == NULL)
     return actual_arg_type;
 
-  BrigType16_t decl_type = hsa_type_for_scalar_tree_type
-    (formal_arg_type, false);
+  BrigType16_t decl_type
+    = hsa_type_for_scalar_tree_type (formal_arg_type, false);
   return mem_type_for_type (decl_type);
 }
 
@@ -3458,8 +3474,8 @@ gen_hsa_insns_for_direct_call (gimple *stmt, hsa_bb *hbb)
   hsa_cfun->m_called_functions.safe_push (call_insn->m_called_function);
 
   /* Argument block start.  */
-  hsa_insn_arg_block *arg_start = new hsa_insn_arg_block
-    (BRIG_KIND_DIRECTIVE_ARG_BLOCK_START, call_insn);
+  hsa_insn_arg_block *arg_start
+    = new hsa_insn_arg_block (BRIG_KIND_DIRECTIVE_ARG_BLOCK_START, call_insn);
   hbb->append_insn (arg_start);
 
   tree parm_type_chain = TYPE_ARG_TYPES (gimple_call_fntype (stmt));
@@ -3493,16 +3509,17 @@ gen_hsa_insns_for_direct_call (gimple *stmt, hsa_bb *hbb)
 	      return;
 	    }
 
-	  BrigType16_t formal_arg_type = get_format_argument_type
-	    (parm_decl_type, src->m_type);
+	  BrigType16_t formal_arg_type
+	    = get_format_argument_type (parm_decl_type, src->m_type);
 	  if (hsa_seen_error ())
 	    return;
 
 	  if (src->m_type != formal_arg_type)
 	    src = src->get_in_type (formal_arg_type, hbb);
 
-	  addr = gen_hsa_addr_for_arg
-	    (parm_decl_type != NULL_TREE ? parm_decl_type: TREE_TYPE (parm), i);
+	  addr
+	    = gen_hsa_addr_for_arg (parm_decl_type != NULL_TREE ?
+				    parm_decl_type: TREE_TYPE (parm), i);
 	  hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_ST, formal_arg_type,
 						src, addr);
 
@@ -3542,8 +3559,9 @@ gen_hsa_insns_for_direct_call (gimple *stmt, hsa_bb *hbb)
 	    }
 	  else
 	    {
-	      BrigType16_t mtype = mem_type_for_type
-		(hsa_type_for_scalar_tree_type (lhs_type, false));
+	      BrigType16_t mtype
+		= mem_type_for_type (hsa_type_for_scalar_tree_type (lhs_type,
+								    false));
 
 	      hsa_op_reg *dst = hsa_cfun->reg_for_gimple_ssa (result);
 	      result_insn = new hsa_insn_mem (BRIG_OPCODE_LD, mtype, dst, addr);
@@ -3568,8 +3586,8 @@ gen_hsa_insns_for_direct_call (gimple *stmt, hsa_bb *hbb)
     }
 
   /* Argument block end.  */
-  hsa_insn_arg_block *arg_end = new hsa_insn_arg_block
-    (BRIG_KIND_DIRECTIVE_ARG_BLOCK_END, call_insn);
+  hsa_insn_arg_block *arg_end
+    = new hsa_insn_arg_block (BRIG_KIND_DIRECTIVE_ARG_BLOCK_END, call_insn);
   hbb->append_insn (arg_end);
 }
 
@@ -3588,16 +3606,17 @@ gen_hsa_insns_for_call_of_internal_fn (gimple *stmt, hsa_bb *hbb)
   tree rhs1 = gimple_call_arg (stmt, 0);
   tree rhs1_type = TREE_TYPE (rhs1);
   enum internal_fn fn = gimple_call_internal_fn (stmt);
-  hsa_insn_call *call_insn = new hsa_insn_call
-    (new hsa_internal_fn (fn, tree_to_uhwi (TYPE_SIZE (rhs1_type))));
+  hsa_internal_fn *ifn
+    = new hsa_internal_fn (fn, tree_to_uhwi (TYPE_SIZE (rhs1_type)));
+  hsa_insn_call *call_insn = new hsa_insn_call (ifn);
 
   gcc_checking_assert (FLOAT_TYPE_P (rhs1_type));
 
   if (!hsa_emitted_internal_decls->find (call_insn->m_called_internal_fn))
     hsa_cfun->m_called_internal_fns.safe_push (call_insn->m_called_internal_fn);
 
-  hsa_insn_arg_block *arg_start = new hsa_insn_arg_block
-    (BRIG_KIND_DIRECTIVE_ARG_BLOCK_START, call_insn);
+  hsa_insn_arg_block *arg_start
+    = new hsa_insn_arg_block (BRIG_KIND_DIRECTIVE_ARG_BLOCK_START, call_insn);
   hbb->append_insn (arg_start);
 
   unsigned num_args = gimple_call_num_args (stmt);
@@ -3626,8 +3645,8 @@ gen_hsa_insns_for_call_of_internal_fn (gimple *stmt, hsa_bb *hbb)
   call_insn->m_result_code_list = new hsa_op_code_list (1);
 
   /* Argument block end.  */
-  hsa_insn_arg_block *arg_end = new hsa_insn_arg_block
-    (BRIG_KIND_DIRECTIVE_ARG_BLOCK_END, call_insn);
+  hsa_insn_arg_block *arg_end
+    = new hsa_insn_arg_block (BRIG_KIND_DIRECTIVE_ARG_BLOCK_END, call_insn);
   hbb->append_insn (arg_end);
 }
 
@@ -3651,8 +3670,9 @@ gen_hsa_insns_for_return (greturn *stmt, hsa_bb *hbb)
 	}
       else
 	{
-	  BrigType16_t mtype = mem_type_for_type
-	    (hsa_type_for_scalar_tree_type (TREE_TYPE (retval), false));
+	  BrigType16_t t = hsa_type_for_scalar_tree_type (TREE_TYPE (retval),
+							  false);
+	  BrigType16_t mtype = mem_type_for_type (t);
 
 	  /* Store of return value.  */
 	  hsa_op_with_type *src = hsa_reg_or_immed_for_gimple_op (retval, hbb);
@@ -3707,8 +3727,8 @@ query_hsa_grid (hsa_op_reg *dest, BrigType16_t opcode, int dimension,
 {
   /* We're using just one-dimensional kernels, so hard-coded
      dimension X.  */
-  hsa_op_immed *imm = new hsa_op_immed (dimension,
-					(BrigKind16_t) BRIG_TYPE_U32);
+  hsa_op_immed *imm
+    = new hsa_op_immed (dimension, (BrigKind16_t) BRIG_TYPE_U32);
   hsa_insn_basic *insn = new hsa_insn_basic (2, opcode, BRIG_TYPE_U32, NULL,
 					     imm);
   hbb->append_insn (insn);
@@ -3743,8 +3763,8 @@ gen_set_num_threads (tree value, hsa_bb *hbb)
   src = src->get_in_type (hsa_num_threads->m_type, hbb);
   hsa_op_address *addr = new hsa_op_address (hsa_num_threads);
 
-  hsa_insn_basic *basic = new hsa_insn_mem
-    (BRIG_OPCODE_ST, hsa_num_threads->m_type, src, addr);
+  hsa_insn_basic *basic
+    = new hsa_insn_mem (BRIG_OPCODE_ST, hsa_num_threads->m_type, src, addr);
   hbb->append_insn (basic);
 }
 
@@ -3841,35 +3861,35 @@ gen_num_threads_for_dispatch (hsa_bb *hbb)
   hsa_op_immed *limit = new hsa_op_immed (HSA_DEFAULT_NUM_THREADS,
 					  BRIG_TYPE_U32);
   hsa_op_reg *r = new hsa_op_reg (BRIG_TYPE_B1);
-  hbb->append_insn
-    (new hsa_insn_cmp (BRIG_COMPARE_LT, r->m_type, r, threads, limit));
+  hsa_insn_cmp * cmp
+    = new hsa_insn_cmp (BRIG_COMPARE_LT, r->m_type, r, threads, limit);
+  hbb->append_insn (cmp);
 
   BrigType16_t btype = hsa_bittype_for_type (threads->m_type);
   hsa_op_reg *tmp = new hsa_op_reg (threads->m_type);
 
-  hbb->append_insn
-    (new hsa_insn_basic (4, BRIG_OPCODE_CMOV, btype, tmp, r,
-			 threads, limit));
+  hbb->append_insn (new hsa_insn_basic (4, BRIG_OPCODE_CMOV, btype, tmp, r,
+					threads, limit));
 
   /* Step 2) If the number is equal to zero,
      return shadow->omp_num_threads.  */
   hsa_op_reg *shadow_reg_ptr = hsa_cfun->get_shadow_reg ();
 
   hsa_op_reg *shadow_thread_count = new hsa_op_reg (BRIG_TYPE_U32);
-  addr = new hsa_op_address
-   (shadow_reg_ptr, get_hsa_kernel_dispatch_offset ("omp_num_threads"));
-  hsa_insn_basic *basic = new hsa_insn_mem
-   (BRIG_OPCODE_LD, shadow_thread_count->m_type, shadow_thread_count, addr);
+  addr
+    = new hsa_op_address (shadow_reg_ptr,
+			  get_hsa_kernel_dispatch_offset ("omp_num_threads"));
+  hsa_insn_basic *basic
+    = new hsa_insn_mem (BRIG_OPCODE_LD, shadow_thread_count->m_type,
+			shadow_thread_count, addr);
   hbb->append_insn (basic);
 
   hsa_op_reg *tmp2 = new hsa_op_reg (threads->m_type);
   r = new hsa_op_reg (BRIG_TYPE_B1);
-  hbb->append_insn
-    (new hsa_insn_cmp (BRIG_COMPARE_EQ, r->m_type, r, tmp,
-		       new hsa_op_immed (0, shadow_thread_count->m_type)));
-  hbb->append_insn
-    (new hsa_insn_basic (4, BRIG_OPCODE_CMOV, btype, tmp2, r,
-			 shadow_thread_count, tmp));
+  hsa_op_immed *imm = new hsa_op_immed (0, shadow_thread_count->m_type);
+  hbb->append_insn (new hsa_insn_cmp (BRIG_COMPARE_EQ, r->m_type, r, tmp, imm));
+  hbb->append_insn (new hsa_insn_basic (4, BRIG_OPCODE_CMOV, btype, tmp2, r,
+					shadow_thread_count, tmp));
 
   hsa_op_base *dest = tmp2->get_in_type (BRIG_TYPE_U16, hbb);
 
@@ -3892,8 +3912,8 @@ gen_get_num_teams (gimple *stmt, hsa_bb *hbb)
   hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
   hsa_op_immed *one = new hsa_op_immed (1, dest->m_type);
 
-  hsa_insn_basic *basic = new hsa_insn_basic
-    (2, BRIG_OPCODE_MOV, dest->m_type, dest, one);
+  hsa_insn_basic *basic
+    = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type, dest, one);
 
   hbb->append_insn (basic);
 }
@@ -3913,8 +3933,8 @@ gen_get_team_num (gimple *stmt, hsa_bb *hbb)
   hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
   hsa_op_immed *zero = new hsa_op_immed (0, dest->m_type);
 
-  hsa_insn_basic *basic = new hsa_insn_basic
-    (2, BRIG_OPCODE_MOV, dest->m_type, dest, zero);
+  hsa_insn_basic *basic
+    = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type, dest, zero);
 
   hbb->append_insn (basic);
 }
@@ -3942,8 +3962,9 @@ gen_get_level (gimple *stmt, hsa_bb *hbb)
       return;
     }
 
-  hsa_op_address *addr = new hsa_op_address
-    (shadow_reg_ptr, get_hsa_kernel_dispatch_offset ("omp_level"));
+  hsa_op_address *addr
+    = new hsa_op_address (shadow_reg_ptr,
+			  get_hsa_kernel_dispatch_offset ("omp_level"));
 
   hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_LD, BRIG_TYPE_U64, NULL,
 					addr);
@@ -3990,10 +4011,10 @@ gen_hsa_alloca (gcall *call, hsa_bb *hbb)
       tree alignment_tree = gimple_call_arg (call, 1);
       if (TREE_CODE (alignment_tree) != INTEGER_CST)
 	{
-	  HSA_SORRY_ATV
-	    (gimple_location (call), "support for HSA does not implement "
-	     "__builtin_alloca_with_align with a non-constant "
-	     "alignment: %E", alignment_tree);
+	  HSA_SORRY_ATV (gimple_location (call),
+			 "support for HSA does not implement "
+			 "__builtin_alloca_with_align with a non-constant "
+			 "alignment: %E", alignment_tree);
 	}
 
       bit_alignment = tree_to_uhwi (alignment_tree);
@@ -4004,14 +4025,15 @@ gen_hsa_alloca (gcall *call, hsa_bb *hbb)
     ->get_in_type (BRIG_TYPE_U32, hbb);
   hsa_op_with_type *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
 
-  hsa_op_reg *tmp = new hsa_op_reg
-    (hsa_get_segment_addr_type (BRIG_SEGMENT_PRIVATE));
+  hsa_op_reg *tmp
+    = new hsa_op_reg (hsa_get_segment_addr_type (BRIG_SEGMENT_PRIVATE));
   hsa_insn_alloca *a = new hsa_insn_alloca (tmp, size, bit_alignment);
   hbb->append_insn (a);
 
-  hsa_insn_seg *seg = new hsa_insn_seg
-    (BRIG_OPCODE_STOF, hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT),
-     tmp->m_type, BRIG_SEGMENT_PRIVATE, dest, tmp);
+  hsa_insn_seg *seg
+    = new hsa_insn_seg (BRIG_OPCODE_STOF,
+			hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT),
+			tmp->m_type, BRIG_SEGMENT_PRIVATE, dest, tmp);
   hbb->append_insn (seg);
 }
 
@@ -4043,8 +4065,9 @@ gen_hsa_clrsb (gcall *call, hsa_bb *hbb)
   gen_hsa_binary_operation (BRIG_OPCODE_AND, and_reg, arg, c, hbb);
 
   hsa_op_reg *most_sign = new hsa_op_reg (BRIG_TYPE_B1);
-  hsa_insn_cmp *cmp = new hsa_insn_cmp
-    (BRIG_COMPARE_EQ, most_sign->m_type, most_sign, and_reg, c);
+  hsa_insn_cmp *cmp
+    = new hsa_insn_cmp (BRIG_COMPARE_EQ, most_sign->m_type, most_sign,
+			and_reg, c);
   hbb->append_insn (cmp);
 
   /* If the most significant bit is one, negate the input.  Otherwise
@@ -4059,8 +4082,9 @@ gen_hsa_clrsb (gcall *call, hsa_bb *hbb)
   /* Assign the value that can be used for FIRSTBIT instruction according
      to the most significant bit.  */
   hsa_op_reg *tmp = new hsa_op_reg (bittype);
-  hsa_insn_basic *cmov = new hsa_insn_basic
-    (4, BRIG_OPCODE_CMOV, bittype, tmp, most_sign, arg_neg, shifted_arg);
+  hsa_insn_basic *cmov
+    = new hsa_insn_basic (4, BRIG_OPCODE_CMOV, bittype, tmp, most_sign,
+			  arg_neg, shifted_arg);
   hbb->append_insn (cmov);
 
   hsa_op_reg *leading_bits = new hsa_op_reg (BRIG_TYPE_S32);
@@ -4070,16 +4094,14 @@ gen_hsa_clrsb (gcall *call, hsa_bb *hbb)
 
   /* Set flag if the input value is equal to zero.  */
   hsa_op_reg *is_zero = new hsa_op_reg (BRIG_TYPE_B1);
-  cmp = new hsa_insn_cmp
-    (BRIG_COMPARE_EQ, is_zero->m_type, is_zero, arg,
-     new hsa_op_immed (0, arg->m_type));
+  cmp = new hsa_insn_cmp (BRIG_COMPARE_EQ, is_zero->m_type, is_zero, arg,
+			  new hsa_op_immed (0, arg->m_type));
   hbb->append_insn (cmp);
 
   /* Return the number of leading bits, or 31 if the input value is zero.  */
-  cmov = new hsa_insn_basic
-    (4, BRIG_OPCODE_CMOV, BRIG_TYPE_B32, NULL, is_zero,
-     new hsa_op_immed (31, BRIG_TYPE_U32),
-     leading_bits->get_in_type (BRIG_TYPE_B32, hbb));
+  cmov = new hsa_insn_basic (4, BRIG_OPCODE_CMOV, BRIG_TYPE_B32, NULL, is_zero,
+			     new hsa_op_immed (31, BRIG_TYPE_U32),
+			     leading_bits->get_in_type (BRIG_TYPE_B32, hbb));
   hbb->append_insn (cmov);
   cmov->set_output_in_type (dest, 0, hbb);
 }
@@ -4102,13 +4124,14 @@ gen_hsa_ffs (gcall *call, hsa_bb *hbb)
   hsa_op_with_type *arg = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
 
   hsa_op_reg *tmp = new hsa_op_reg (BRIG_TYPE_U32);
-  hsa_insn_srctype *insn = new hsa_insn_srctype
-    (2, BRIG_OPCODE_LASTBIT, tmp->m_type, arg->m_type, tmp, arg);
+  hsa_insn_srctype *insn = new hsa_insn_srctype (2, BRIG_OPCODE_LASTBIT,
+						 tmp->m_type, arg->m_type,
+						 tmp, arg);
   hbb->append_insn (insn);
 
-  hsa_insn_basic *addition = new hsa_insn_basic
-    (3, BRIG_OPCODE_ADD, tmp->m_type, NULL, tmp,
-     new hsa_op_immed (1, tmp->m_type));
+  hsa_insn_basic *addition
+    = new hsa_insn_basic (3, BRIG_OPCODE_ADD, tmp->m_type, NULL, tmp,
+			  new hsa_op_immed (1, tmp->m_type));
   hbb->append_insn (addition);
   addition->set_output_in_type (dest, 0, hbb);
 }
@@ -4124,8 +4147,9 @@ gen_hsa_popcount_to_dest (hsa_op_reg *dest, hsa_op_with_type *arg, hsa_bb *hbb)
   if (!hsa_btype_p (arg->m_type))
     arg = arg->get_in_type (hsa_bittype_for_type (arg->m_type), hbb);
 
-  hsa_insn_srctype *popcount = new hsa_insn_srctype
-    (2, BRIG_OPCODE_POPCOUNT, BRIG_TYPE_U32, arg->m_type, NULL, arg);
+  hsa_insn_srctype *popcount
+    = new hsa_insn_srctype (2, BRIG_OPCODE_POPCOUNT, BRIG_TYPE_U32,
+			    arg->m_type, NULL, arg);
   hbb->append_insn (popcount);
   popcount->set_output_in_type (dest, 0, hbb);
 }
@@ -4148,9 +4172,9 @@ gen_hsa_parity (gcall *call, hsa_bb *hbb)
   hsa_op_reg *popcount = new hsa_op_reg (BRIG_TYPE_U32);
   gen_hsa_popcount_to_dest (popcount, arg, hbb);
 
-  hsa_insn_basic *insn = new hsa_insn_basic
-    (3, BRIG_OPCODE_REM, popcount->m_type, NULL, popcount,
-     new hsa_op_immed (2, popcount->m_type));
+  hsa_insn_basic *insn
+    = new hsa_insn_basic (3, BRIG_OPCODE_REM, popcount->m_type, NULL, popcount,
+			  new hsa_op_immed (2, popcount->m_type));
   hbb->append_insn (insn);
   insn->set_output_in_type (dest, 0, hbb);
 }
@@ -4182,8 +4206,9 @@ set_debug_value (hsa_bb *hbb, hsa_op_with_type *value)
   if (shadow_reg_ptr == NULL)
     return;
 
-  hsa_op_address *addr = new hsa_op_address
-    (shadow_reg_ptr, get_hsa_kernel_dispatch_offset ("debug"));
+  hsa_op_address *addr
+    = new hsa_op_address (shadow_reg_ptr,
+			  get_hsa_kernel_dispatch_offset ("debug"));
   hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_ST, BRIG_TYPE_U64, value,
 					addr);
   hbb->append_insn (mem);
@@ -4213,8 +4238,8 @@ omp_simple_builtin::generate (gimple *stmt, hsa_bb *hbb)
       hbb->append_insn (new hsa_insn_comment (m_name));
 
       hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
-      hsa_build_append_simple_mov
-	(dest, m_return_value->get_in_type (dest->m_type, hbb), hbb);
+      hsa_op_with_type *op = m_return_value->get_in_type (dest->m_type, hbb);
+      hsa_build_append_simple_mov (dest, op, hbb);
     }
 }
 
@@ -4326,8 +4351,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
 
   /* Get my kernel dispatch argument.  */
   hbb->append_insn (new hsa_insn_comment ("get kernel dispatch structure"));
-  addr = new hsa_op_address
-    (shadow_reg_ptr, get_hsa_kernel_dispatch_offset ("children_dispatches"));
+  HOST_WIDE_INT o = get_hsa_kernel_dispatch_offset ("children_dispatches");
+  addr = new hsa_op_address (shadow_reg_ptr, o);
 
   hsa_op_reg *shadow_reg_base_ptr = new hsa_op_reg (BRIG_TYPE_U64);
   mem = new hsa_insn_mem (BRIG_OPCODE_LD, BRIG_TYPE_U64, shadow_reg_base_ptr,
@@ -4344,12 +4369,12 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   hbb->append_insn (mem);
 
   /* Load an address of the command queue to a register.  */
-  hbb->append_insn (new hsa_insn_comment
-		    ("load base address of command queue"));
+  hbb->append_insn (new hsa_insn_comment ("load base address of command "
+					  "queue"));
 
   hsa_op_reg *queue_reg = new hsa_op_reg (BRIG_TYPE_U64);
-  addr = new hsa_op_address (shadow_reg,
-			     get_hsa_kernel_dispatch_offset ("queue"));
+  o = get_hsa_kernel_dispatch_offset ("queue");
+  addr = new hsa_op_address (shadow_reg, o);
 
   mem = new hsa_insn_mem (BRIG_OPCODE_LD, BRIG_TYPE_U64, queue_reg, addr);
 
@@ -4358,9 +4383,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   /* Load an address of prepared memory for a kernel arguments.  */
   hbb->append_insn (new hsa_insn_comment ("get a kernarg address"));
   hsa_op_reg *kernarg_reg = new hsa_op_reg (BRIG_TYPE_U64);
-
-  addr = new hsa_op_address
-    (shadow_reg, get_hsa_kernel_dispatch_offset ("kernarg_address"));
+  o = get_hsa_kernel_dispatch_offset ("kernarg_address");
+  addr = new hsa_op_address (shadow_reg, o);
 
   mem = new hsa_insn_mem (BRIG_OPCODE_LD, BRIG_TYPE_U64, kernarg_reg, addr);
   hbb->append_insn (mem);
@@ -4368,9 +4392,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   /* Load an kernel object we want to call.  */
   hbb->append_insn (new hsa_insn_comment ("get a kernel object"));
   hsa_op_reg *object_reg = new hsa_op_reg (BRIG_TYPE_U64);
-
-  addr = new hsa_op_address
-    (shadow_reg, get_hsa_kernel_dispatch_offset ("object"));
+  o = get_hsa_kernel_dispatch_offset ("object");
+  addr = new hsa_op_address (shadow_reg, o);
 
   mem = new hsa_insn_mem (BRIG_OPCODE_LD, BRIG_TYPE_U64, object_reg, addr);
   hbb->append_insn (mem);
@@ -4379,8 +4402,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   hbb->append_insn (new hsa_insn_comment ("get a signal by kernel call index"));
 
   hsa_op_reg *signal_reg = new hsa_op_reg (BRIG_TYPE_U64);
-  addr = new hsa_op_address
-    (shadow_reg, get_hsa_kernel_dispatch_offset ("signal"));
+  o = get_hsa_kernel_dispatch_offset ("signal");
+  addr = new hsa_op_address (shadow_reg, o);
   mem = new hsa_insn_mem (BRIG_OPCODE_LD, BRIG_TYPE_U64, signal_reg, addr);
   hbb->append_insn (mem);
 
@@ -4399,22 +4422,22 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   /* Get private segment size.  */
   hsa_op_reg *private_seg_reg = new hsa_op_reg (BRIG_TYPE_U32);
 
-  hbb->append_insn (new hsa_insn_comment
-		    ("get a kernel private segment size by kernel call index"));
+  hbb->append_insn (new hsa_insn_comment ("get a kernel private segment size "
+					  "by kernel call index"));
 
-  addr = new hsa_op_address
-    (shadow_reg, get_hsa_kernel_dispatch_offset ("private_segment_size"));
+  o = get_hsa_kernel_dispatch_offset ("private_segment_size");
+  addr = new hsa_op_address (shadow_reg, o);
   mem = new hsa_insn_mem (BRIG_OPCODE_LD, BRIG_TYPE_U32, private_seg_reg, addr);
   hbb->append_insn (mem);
 
   /* Get group segment size.  */
   hsa_op_reg *group_seg_reg = new hsa_op_reg (BRIG_TYPE_U32);
 
-  hbb->append_insn (new hsa_insn_comment
-		    ("get a kernel group segment size by kernel call index"));
+  hbb->append_insn (new hsa_insn_comment ("get a kernel group segment size "
+					  "by kernel call index"));
 
-  addr = new hsa_op_address
-    (shadow_reg, get_hsa_kernel_dispatch_offset ("group_segment_size"));
+  o = get_hsa_kernel_dispatch_offset ("group_segment_size");
+  addr = new hsa_op_address (shadow_reg, o);
   mem = new hsa_insn_mem (BRIG_OPCODE_LD, BRIG_TYPE_U32, group_seg_reg, addr);
   hbb->append_insn (mem);
 
@@ -4438,13 +4461,14 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   hsa_op_reg *queue_addr_reg = new hsa_op_reg (BRIG_TYPE_U64);
 
   c = new hsa_op_immed (addr_offset, BRIG_TYPE_U64);
-  hsa_insn_basic *insn = new hsa_insn_basic
-    (3, BRIG_OPCODE_ADD, BRIG_TYPE_U64, queue_addr_reg, queue_reg, c);
+  hsa_insn_basic *insn
+    = new hsa_insn_basic (3, BRIG_OPCODE_ADD, BRIG_TYPE_U64, queue_addr_reg,
+			  queue_reg, c);
 
   hbb->append_insn (insn);
 
-  hbb->append_insn (new hsa_insn_comment
-		    ("get base address of prepared packet"));
+  hbb->append_insn (new hsa_insn_comment ("get base address of prepared "
+					  "packet"));
 
   hsa_op_reg *queue_addr_value_reg = new hsa_op_reg (BRIG_TYPE_U64);
   addr = new hsa_op_address (queue_addr_reg);
@@ -4454,16 +4478,15 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
 
   c = new hsa_op_immed (sizeof (hsa_queue_packet), BRIG_TYPE_U64);
   hsa_op_reg *queue_packet_offset_reg = new hsa_op_reg (BRIG_TYPE_U64);
-  insn = new hsa_insn_basic
-    (3, BRIG_OPCODE_MUL, BRIG_TYPE_U64, queue_packet_offset_reg,
-     queue_index_reg, c);
+  insn = new hsa_insn_basic (3, BRIG_OPCODE_MUL, BRIG_TYPE_U64,
+			     queue_packet_offset_reg, queue_index_reg, c);
 
   hbb->append_insn (insn);
 
   hsa_op_reg *queue_packet_reg = new hsa_op_reg (BRIG_TYPE_U64);
-  insn = new hsa_insn_basic
-    (3, BRIG_OPCODE_ADD, BRIG_TYPE_U64, queue_packet_reg, queue_addr_value_reg,
-     queue_packet_offset_reg);
+  insn = new hsa_insn_basic (3, BRIG_OPCODE_ADD, BRIG_TYPE_U64,
+			     queue_packet_reg, queue_addr_value_reg,
+			     queue_packet_offset_reg);
 
   hbb->append_insn (insn);
 
@@ -4478,8 +4501,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
 			  addr);
   hbb->append_insn (mem);
 
-  hsa_op_with_type *packet_setup_u32 = packet_setup_reg->get_in_type
-    (BRIG_TYPE_U32, hbb);
+  hsa_op_with_type *packet_setup_u32
+    = packet_setup_reg->get_in_type (BRIG_TYPE_U32, hbb);
 
   hsa_op_reg *packet_setup_u32_2 = new hsa_op_reg (BRIG_TYPE_U32);
   c = new hsa_op_immed (1, BRIG_TYPE_U32);
@@ -4488,8 +4511,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
 
   hbb->append_insn (insn);
 
-  hsa_op_with_type *packet_setup_reg_2 = packet_setup_u32_2->get_in_type
-    (BRIG_TYPE_U16, hbb);
+  hsa_op_with_type *packet_setup_reg_2
+    = packet_setup_u32_2->get_in_type (BRIG_TYPE_U16, hbb);
 
   addr = new hsa_op_address (queue_packet_reg,
 			     offsetof (hsa_queue_packet, setup));
@@ -4501,8 +4524,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
      emit passed grid_size.  */
   hsa_op_reg *threads_reg = gen_num_threads_for_dispatch (hbb);
 
-  hbb->append_insn (new hsa_insn_comment
-		    ("set packet->grid_size_x = hsa_num_threads"));
+  hbb->append_insn (new hsa_insn_comment ("set packet->grid_size_x = "
+					  "hsa_num_threads"));
 
   addr = new hsa_op_address (queue_packet_reg,
 			     offsetof (hsa_queue_packet, grid_size_x));
@@ -4511,20 +4534,20 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   hbb->append_insn (mem);
 
   /* Write to shadow_reg->omp_num_threads = hsa_num_threads.  */
-  hbb->append_insn (new hsa_insn_comment
-		    ("set shadow_reg->omp_num_threads = hsa_num_threads"));
+  hbb->append_insn (new hsa_insn_comment ("set shadow_reg->omp_num_threads = "
+					  "hsa_num_threads"));
 
-  addr = new hsa_op_address
-    (shadow_reg, get_hsa_kernel_dispatch_offset ("omp_num_threads"));
-  hbb->append_insn
-    (new hsa_insn_mem (BRIG_OPCODE_ST, threads_reg->m_type, threads_reg, addr));
+  o = get_hsa_kernel_dispatch_offset ("omp_num_threads");
+  addr = new hsa_op_address (shadow_reg, o);
+  hbb->append_insn (new hsa_insn_mem (BRIG_OPCODE_ST, threads_reg->m_type,
+				      threads_reg, addr));
 
   /* Write to packet->workgroup_size_x.  */
-  hbb->append_insn (new hsa_insn_comment
-		    ("set packet->workgroup_size_x = hsa_num_threads"));
+  hbb->append_insn (new hsa_insn_comment ("set packet->workgroup_size_x = "
+					  "hsa_num_threads"));
 
-  addr = new hsa_op_address
-    (queue_packet_reg, offsetof (hsa_queue_packet, workgroup_size_x));
+  addr = new hsa_op_address (queue_packet_reg,
+			     offsetof (hsa_queue_packet, workgroup_size_x));
   mem = new hsa_insn_mem (BRIG_OPCODE_ST, BRIG_TYPE_U16, threads_reg,
 			  addr);
   hbb->append_insn (mem);
@@ -4568,8 +4591,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   /* Write to packet->private_segment_size.  */
   hbb->append_insn (new hsa_insn_comment ("set packet->private_segment_size"));
 
-  hsa_op_with_type *private_seg_reg_u16 = private_seg_reg->get_in_type
-    (BRIG_TYPE_U16, hbb);
+  hsa_op_with_type *private_seg_reg_u16
+    = private_seg_reg->get_in_type (BRIG_TYPE_U16, hbb);
 
   addr = new hsa_op_address (queue_packet_reg,
 			     offsetof (hsa_queue_packet, private_segment_size));
@@ -4580,8 +4603,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   /* Write to packet->group_segment_size.  */
   hbb->append_insn (new hsa_insn_comment ("set packet->group_segment_size"));
 
-  hsa_op_with_type *group_seg_reg_u16 = group_seg_reg->get_in_type
-    (BRIG_TYPE_U16, hbb);
+  hsa_op_with_type *group_seg_reg_u16
+    = group_seg_reg->get_in_type (BRIG_TYPE_U16, hbb);
 
   addr = new hsa_op_address (queue_packet_reg,
 			     offsetof (hsa_queue_packet, group_segment_size));
@@ -4602,9 +4625,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
 
   hsa_op_reg *omp_data_memory_reg = new hsa_op_reg (BRIG_TYPE_U64);
 
-  addr = new hsa_op_address
-    (shadow_reg, get_hsa_kernel_dispatch_offset ("omp_data_memory"));
-
+  o = get_hsa_kernel_dispatch_offset ("omp_data_memory");
+  addr = new hsa_op_address (shadow_reg, o);
   mem = new hsa_insn_mem (BRIG_OPCODE_LD, BRIG_TYPE_U64, omp_data_memory_reg,
 			  addr);
   hbb->append_insn (mem);
@@ -4638,8 +4660,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   else
     gcc_unreachable ();
 
-  hbb->append_insn (new hsa_insn_comment
-		    ("write memory pointer to packet->kernarg_address"));
+  hbb->append_insn (new hsa_insn_comment ("write memory pointer to "
+					  "packet->kernarg_address"));
 
   addr = new hsa_op_address (queue_packet_reg,
 			     offsetof (hsa_queue_packet, kernarg_address));
@@ -4647,8 +4669,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   hbb->append_insn (mem);
 
   /* Write to packet->kernarg_address.  */
-  hbb->append_insn (new hsa_insn_comment
-		    ("write argument0 to *packet->kernarg_address"));
+  hbb->append_insn (new hsa_insn_comment ("write argument0 to "
+					  "*packet->kernarg_address"));
 
   addr = new hsa_op_address (kernarg_reg);
 
@@ -4657,8 +4679,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   hbb->append_insn (mem);
 
   /* Pass shadow argument to another dispatched kernel module.  */
-  hbb->append_insn (new hsa_insn_comment
-		    ("write argument1 to *packet->kernarg_address"));
+  hbb->append_insn (new hsa_insn_comment ("write argument1 to "
+					  "*packet->kernarg_address"));
 
   addr = new hsa_op_address (kernarg_reg, 8);
   mem = new hsa_insn_mem (BRIG_OPCODE_ST, BRIG_TYPE_U64, shadow_reg, addr);
@@ -4673,8 +4695,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   hbb->append_insn (mem);
 
   /* Atomically write to packer->header.  */
-  hbb->append_insn
-    (new hsa_insn_comment ("store atomically to packet->header"));
+  hbb->append_insn (new hsa_insn_comment ("store atomically to "
+					  "packet->header"));
 
   addr = new hsa_op_address (queue_packet_reg,
 			     offsetof (hsa_queue_packet, header));
@@ -4682,9 +4704,10 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   /* Store 5122 << 16 + 1 to packet->header.  */
   c = new hsa_op_immed (70658, BRIG_TYPE_U32);
 
-  hsa_insn_atomic *atomic = new hsa_insn_atomic
-    (2, BRIG_OPCODE_ATOMICNORET, BRIG_ATOMIC_ST, BRIG_TYPE_B32,
-     BRIG_MEMORY_ORDER_SC_RELEASE, addr, c);
+  hsa_insn_atomic *atomic
+    = new hsa_insn_atomic (2, BRIG_OPCODE_ATOMICNORET, BRIG_ATOMIC_ST,
+			   BRIG_TYPE_B32, BRIG_MEMORY_ORDER_SC_RELEASE, addr,
+			   c);
   atomic->m_memoryscope = BRIG_MEMORY_SCOPE_SYSTEM;
 
   hbb->append_insn (atomic);
@@ -4737,9 +4760,9 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   new_hbb->append_insn (signal);
 
   hsa_op_reg *ctrl = new hsa_op_reg (BRIG_TYPE_B1);
-  hsa_insn_cmp *cmp = new hsa_insn_cmp
-    (BRIG_COMPARE_EQ, ctrl->m_type, ctrl, signal_result_reg,
-     new hsa_op_immed (1, signal_result_reg->m_type));
+  hsa_op_immed *imm = new hsa_op_immed (1, signal_result_reg->m_type);
+  hsa_insn_cmp *cmp = new hsa_insn_cmp (BRIG_COMPARE_EQ, ctrl->m_type, ctrl,
+					signal_result_reg, imm);
 
   new_hbb->append_insn (cmp);
   new_hbb->append_insn (new hsa_insn_br (ctrl));
@@ -4747,8 +4770,8 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   if (TREE_CODE (argument) == ADDR_EXPR)
     {
       /* Emit instructions that copy back OMP arguments to a caller kernel.  */
-      memcpy_hbb->append_insn
-	(new hsa_insn_comment ("OMP arg memcpy back instructions"));
+      memcpy_hbb->append_insn (new hsa_insn_comment ("OMP arg memcpy back "
+						     "instructions"));
 
       hsa_op_address *src_addr = new hsa_op_address (omp_var_decl);
       gen_hsa_memory_copy (memcpy_hbb, src_addr, dst_addr, omp_var_decl->m_dim);
@@ -4769,8 +4792,8 @@ gen_hsa_unaryop_for_builtin (BrigOpcode opcode, gimple *stmt, hsa_bb *hbb)
   if (!lhs)
     return;
   hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
-  hsa_op_with_type *op = hsa_reg_or_immed_for_gimple_op
-    (gimple_call_arg (stmt, 0), hbb);
+  hsa_op_with_type *op
+    = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 0), hbb);
   gen_hsa_unary_operation (opcode, dest, op, hbb);
 }
 
@@ -4820,8 +4843,8 @@ get_address_from_value (tree val, hsa_bb *hbb)
     case SSA_NAME:
       {
 	BrigType16_t addrtype = hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT);
-	hsa_op_base *reg = hsa_cfun->reg_for_gimple_ssa (val)->get_in_type
-	  (addrtype, hbb);
+	hsa_op_base *reg
+	  = hsa_cfun->reg_for_gimple_ssa (val)->get_in_type (addrtype, hbb);
 	return new hsa_op_address (NULL, as_a <hsa_op_reg *> (reg), 0);
       }
     case ADDR_EXPR:
@@ -4897,7 +4920,8 @@ get_memory_order (unsigned memmodel, location_t location)
 
 static void
 gen_hsa_ternary_atomic_for_builtin (bool ret_orig,
- 				    enum BrigAtomicOperation acode, gimple *stmt,
+ 				    enum BrigAtomicOperation acode,
+				    gimple *stmt,
 				    hsa_bb *hbb)
 {
   tree lhs = gimple_call_lhs (stmt);
@@ -4909,16 +4933,15 @@ gen_hsa_ternary_atomic_for_builtin (bool ret_orig,
 
   if (!tree_fits_uhwi_p (model))
     {
-      HSA_SORRY_ATV
-	(gimple_location (stmt),
-	 "support for HSA does not implement memory model %E", model);
+      HSA_SORRY_ATV (gimple_location (stmt),
+		     "support for HSA does not implement memory model %E",
+		     model);
       return;
     }
 
   unsigned HOST_WIDE_INT mmodel = tree_to_uhwi (model);
 
-  BrigMemoryOrder memorder = get_memory_order
-    (mmodel, gimple_location (stmt));
+  BrigMemoryOrder memorder = get_memory_order (mmodel, gimple_location (stmt));
 
   /* Certain atomic insns must have Bx memory types.  */
   switch (acode)
@@ -5089,7 +5112,7 @@ gen_hsa_insn_for_internal_fn_call (gcall *stmt, hsa_bb *hbb)
 	  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NSIN, stmt, hbb);
 	else
 	  gen_hsa_unaryop_builtin_call (stmt, hbb);
-        break;
+	break;
       }
 
     case IFN_CLRSB:
@@ -5291,9 +5314,10 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	tree model = gimple_call_arg (stmt, 1);
 	if (!tree_fits_uhwi_p (model))
 	  {
-	    HSA_SORRY_ATV
-	      (gimple_location (stmt),
-	       "support for HSA does not implement memory model: %E", model);
+	    HSA_SORRY_ATV (gimple_location (stmt),
+			   "support for HSA does not implement "
+			   "memory model: %E",
+			   model);
 	    return;
 	  }
 
@@ -5304,17 +5328,18 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	if (memorder != BRIG_MEMORY_ORDER_RELAXED
 	    && memorder != BRIG_MEMORY_ORDER_SC_RELEASE)
 	  {
-	    HSA_SORRY_ATV
-	      (gimple_location (stmt),
-	       "support for HSA does not implement memory model for "
-	       "ATOMIC_LD: %s", get_memory_order_name (mmodel));
+	    HSA_SORRY_ATV (gimple_location (stmt),
+			   "support for HSA does not implement "
+			   "memory model for ATOMIC_LD: %s",
+			   get_memory_order_name (mmodel));
 	    return;
 	  }
 
 	if (lhs)
 	  {
-	    mtype = mem_type_for_type
-	      (hsa_type_for_scalar_tree_type (TREE_TYPE (lhs), false));
+	    BrigType16_t t = hsa_type_for_scalar_tree_type (TREE_TYPE (lhs),
+							    false);
+	    mtype = mem_type_for_type (t);
 	    mtype = hsa_bittype_for_type (mtype);
 	    dest = hsa_cfun->reg_for_gimple_ssa (lhs);
 	  }
@@ -5435,15 +5460,15 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_8:
     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_16:
       {
-	/* XXX Ignore mem model for now.  */
+	/* TODO: Use the appropriate memory model for now.  */
 	tree type = TREE_TYPE (gimple_call_arg (stmt, 1));
 
-	BrigType16_t atype  = hsa_bittype_for_type
-	  (hsa_type_for_scalar_tree_type (type, false));
+	BrigType16_t atype
+	  = hsa_bittype_for_type (hsa_type_for_scalar_tree_type (type, false));
 
-	hsa_insn_atomic *atominsn = new hsa_insn_atomic
-	  (4, BRIG_OPCODE_ATOMIC, BRIG_ATOMIC_CAS, atype,
-	   BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE);
+	hsa_insn_atomic *atominsn
+	  = new hsa_insn_atomic (4, BRIG_OPCODE_ATOMIC, BRIG_ATOMIC_CAS, atype,
+				 BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE);
 	hsa_op_address *addr;
 	addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
 
@@ -5452,14 +5477,16 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	else
 	  dest = new hsa_op_reg (atype);
 
-	/* Should check what the memory scope is */
+	/* Should check what the memory scope is.  */
 	atominsn->m_memoryscope = BRIG_MEMORY_SCOPE_WORKGROUP;
 	atominsn->set_op (0, dest);
 	atominsn->set_op (1, addr);
-	atominsn->set_op
-	  (2, hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 1), hbb));
-	atominsn->set_op
-	  (3, hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 2), hbb));
+
+	hsa_op_with_type *op
+	  = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 1), hbb);
+	atominsn->set_op (2, op);
+	op = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 2), hbb);
+	atominsn->set_op (3, op);
 
 	hbb->append_insn (atominsn);
 	break;
@@ -5472,9 +5499,9 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	called = TREE_OPERAND (called, 0);
 	gcc_checking_assert (TREE_CODE (called) == FUNCTION_DECL);
 
-	hsa_add_kernel_dependency
-	  (hsa_cfun->m_decl,
-	   hsa_brig_function_name (hsa_get_declaration_name (called)));
+	const char *name
+	  = hsa_brig_function_name (hsa_get_declaration_name (called));
+	hsa_add_kernel_dependency (hsa_cfun->m_decl, name);
 	gen_hsa_insns_for_kernel_call (hbb, as_a <gcall *> (stmt));
 
 	break;
@@ -5543,9 +5570,10 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	    if (builtin == BUILT_IN_MEMPCPY)
 	      {
 		tmp = new hsa_op_reg (dst_reg->m_type);
-		hsa_insn_basic *add = new hsa_insn_basic
-		  (3, BRIG_OPCODE_ADD, tmp->m_type,
-		   tmp, dst_reg, new hsa_op_immed (n, dst_reg->m_type));
+		hsa_insn_basic *add
+		  = new hsa_insn_basic (3, BRIG_OPCODE_ADD, tmp->m_type,
+					tmp, dst_reg,
+					new hsa_op_immed (n, dst_reg->m_type));
 		hbb->append_insn (add);
 	      }
 	    else
@@ -5585,8 +5613,8 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 
 	hsa_op_address *dst_addr;
 	dst_addr = get_address_from_value (dst, hbb);
-	unsigned HOST_WIDE_INT constant = tree_to_uhwi
-	  (fold_convert (unsigned_char_type_node, c));
+	unsigned HOST_WIDE_INT constant
+	  = tree_to_uhwi (fold_convert (unsigned_char_type_node, c));
 
 	gen_hsa_memory_set (hbb, dst_addr, constant, n);
 
@@ -5704,8 +5732,8 @@ gen_hsa_phi_from_gimple_phi (gimple *phi_stmt, hsa_bb *hbb)
   hsa_insn_phi *hphi;
   unsigned count = gimple_phi_num_args (phi_stmt);
 
-  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa
-    (gimple_phi_result (phi_stmt));
+  hsa_op_reg *dest
+    = hsa_cfun->reg_for_gimple_ssa (gimple_phi_result (phi_stmt));
   hphi = new hsa_insn_phi (count, dest);
   hphi->m_bb = hbb->m_bb;
 
@@ -5742,8 +5770,9 @@ gen_hsa_phi_from_gimple_phi (gimple *phi_stmt, hsa_bb *hbb)
 						   hbb_src);
 
 	      hsa_op_reg *dest = new hsa_op_reg (BRIG_TYPE_U64);
-	      hsa_insn_basic *insn = new  hsa_insn_basic
-		(2, BRIG_OPCODE_LDA, BRIG_TYPE_U64, dest, addr);
+	      hsa_insn_basic *insn
+		= new hsa_insn_basic (2, BRIG_OPCODE_LDA, BRIG_TYPE_U64,
+				      dest, addr);
 	      hbb_src->append_insn (insn);
 
 	      hphi->set_op (i, dest);
@@ -5772,10 +5801,10 @@ gen_hsa_phi_from_gimple_phi (gimple *phi_stmt, hsa_bb *hbb)
    index of this BB (so that the constructor does not attempt to use
    hsa_cfun during its construction).  */
 
-hsa_bb::hsa_bb (basic_block cfg_bb, int idx): m_bb (cfg_bb),
-  m_first_insn (NULL), m_last_insn (NULL), m_first_phi (NULL),
-  m_last_phi (NULL), m_index (idx), m_liveout (BITMAP_ALLOC (NULL)),
-  m_livein (BITMAP_ALLOC (NULL))
+hsa_bb::hsa_bb (basic_block cfg_bb, int idx)
+  : m_bb (cfg_bb), m_first_insn (NULL), m_last_insn (NULL), m_first_phi (NULL),
+    m_last_phi (NULL), m_index (idx), m_liveout (BITMAP_ALLOC (NULL)),
+    m_livein (BITMAP_ALLOC (NULL))
 {
   gcc_assert (!cfg_bb->aux);
   cfg_bb->aux = this;
@@ -5784,10 +5813,10 @@ hsa_bb::hsa_bb (basic_block cfg_bb, int idx): m_bb (cfg_bb),
 /* Constructor of class containing HSA-specific information about a basic
    block.  CFG_BB is the CFG BB this HSA BB is associated with.  */
 
-hsa_bb::hsa_bb (basic_block cfg_bb): m_bb (cfg_bb),
-  m_first_insn (NULL), m_last_insn (NULL), m_first_phi (NULL),
-  m_last_phi (NULL), m_index (hsa_cfun->m_hbb_count++),
-  m_liveout (BITMAP_ALLOC (NULL)), m_livein (BITMAP_ALLOC (NULL))
+hsa_bb::hsa_bb (basic_block cfg_bb)
+  : m_bb (cfg_bb), m_first_insn (NULL), m_last_insn (NULL), m_first_phi (NULL),
+    m_last_phi (NULL), m_index (hsa_cfun->m_hbb_count++),
+    m_liveout (BITMAP_ALLOC (NULL)), m_livein (BITMAP_ALLOC (NULL))
 {
   gcc_assert (!cfg_bb->aux);
   cfg_bb->aux = this;
@@ -5836,10 +5865,10 @@ init_hsa_num_threads (void)
   hsa_bb *prologue = hsa_bb_for_bb (ENTRY_BLOCK_PTR_FOR_FN (cfun));
 
   /* Save the default value to private variable hsa_num_threads.  */
-  hsa_insn_basic *basic = new hsa_insn_mem
-    (BRIG_OPCODE_ST, hsa_num_threads->m_type,
-     new hsa_op_immed (0, hsa_num_threads->m_type),
-     new hsa_op_address (hsa_num_threads));
+  hsa_insn_basic *basic
+    = new hsa_insn_mem (BRIG_OPCODE_ST, hsa_num_threads->m_type,
+			new hsa_op_immed (0, hsa_num_threads->m_type),
+			new hsa_op_address (hsa_num_threads));
   prologue->append_insn (basic);
 }
 
@@ -5862,9 +5891,9 @@ gen_body_from_gimple ()
 	     to the same basic block.  */
 	  if (e->flags & EDGE_EH)
 	    {
-	      HSA_SORRY_AT
-		(UNKNOWN_LOCATION,
-		 "support for HSA does not implement exception handling");
+	      HSA_SORRY_AT (UNKNOWN_LOCATION,
+			    "support for HSA does not implement exception "
+			    "handling");
 	      return;
 	    }
 	}
@@ -5935,8 +5964,8 @@ gen_function_decl_parameters (hsa_function_representation *f,
     {
       f->m_output_arg = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_ARG,
 					BRIG_LINKAGE_NONE);
-      f->m_output_arg->m_type = hsa_type_for_tree_type
-	(result_type, &f->m_output_arg->m_dim);
+      f->m_output_arg->m_type
+	= hsa_type_for_tree_type (result_type, &f->m_output_arg->m_dim);
       f->m_output_arg->m_name = "res";
     }
 }
@@ -5957,10 +5986,10 @@ gen_function_def_parameters ()
     {
       struct hsa_symbol **slot;
 
-      hsa_symbol *arg = new hsa_symbol
-	(BRIG_TYPE_NONE,
-	 hsa_cfun->m_kern_p ? BRIG_SEGMENT_KERNARG : BRIG_SEGMENT_ARG,
-	 BRIG_LINKAGE_FUNCTION);
+      hsa_symbol *arg
+	= new hsa_symbol (BRIG_TYPE_NONE, hsa_cfun->m_kern_p
+			  ? BRIG_SEGMENT_KERNARG : BRIG_SEGMENT_ARG,
+			  BRIG_LINKAGE_FUNCTION);
       arg->fillup_for_decl (parm);
 
       hsa_cfun->m_input_args.safe_push (arg);
@@ -5997,8 +6026,9 @@ gen_function_def_parameters ()
 	  tree ddef = ssa_default_def (cfun, parm);
 	  if (ddef && !has_zero_uses (ddef))
 	    {
-	      BrigType16_t mtype = mem_type_for_type
-		(hsa_type_for_scalar_tree_type (TREE_TYPE (ddef), false));
+	      BrigType16_t t = hsa_type_for_scalar_tree_type (TREE_TYPE (ddef),
+							      false);
+	      BrigType16_t mtype = mem_type_for_type (t);
 	      hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (ddef);
 	      hsa_insn_mem *mem = new hsa_insn_mem (BRIG_OPCODE_LD, mtype,
 						    dest, parm_addr);
@@ -6033,8 +6063,8 @@ gen_function_def_parameters ()
 hsa_function_representation *
 hsa_generate_function_declaration (tree decl)
 {
-  hsa_function_representation *fun = new hsa_function_representation
-    (decl, false, 0);
+  hsa_function_representation *fun
+    = new hsa_function_representation (decl, false, 0);
 
   fun->m_declaration_p = true;
   fun->m_name = get_brig_function_name (decl);
@@ -6056,14 +6086,15 @@ hsa_generate_internal_fn_decl (hsa_internal_fn *fn)
 
   for (unsigned i = 0; i < fn->get_arity (); i++)
     {
-      hsa_symbol *arg = new hsa_symbol
-	(fn->get_argument_type (i), BRIG_SEGMENT_ARG, BRIG_LINKAGE_NONE);
+      hsa_symbol *arg
+	= new hsa_symbol (fn->get_argument_type (i), BRIG_SEGMENT_ARG,
+			  BRIG_LINKAGE_NONE);
       arg->m_name_number = i;
       fun->m_input_args.safe_push (arg);
     }
 
-  fun->m_output_arg = new hsa_symbol
-    (fn->get_argument_type (-1), BRIG_SEGMENT_ARG, BRIG_LINKAGE_NONE);
+  fun->m_output_arg = new hsa_symbol (fn->get_argument_type (-1),
+				      BRIG_SEGMENT_ARG, BRIG_LINKAGE_NONE);
   fun->m_output_arg->m_name = "res";
 
   return fun;
@@ -6194,8 +6225,8 @@ convert_switch_statements ()
 	tree index = gimple_switch_index (s);
 	tree index_type = TREE_TYPE (index);
 	tree default_label = gimple_switch_default_label (s);
-	basic_block default_label_bb = label_to_block_fn
-	  (func, CASE_LABEL (default_label));
+	basic_block default_label_bb
+	  = label_to_block_fn (func, CASE_LABEL (default_label));
 	basic_block cur_bb = bb;
 
 	auto_vec <edge> new_edges;
@@ -6226,8 +6257,9 @@ convert_switch_statements ()
 		    if (gimple_phi_arg_edge (phi, j) == e)
 		      {
 			tree imm = gimple_phi_arg_def (phi, j);
-			phi_todo_list.safe_push
-			  (new phi_definition (phi_index, i, imm));
+			phi_definition *p = new phi_definition (phi_index, i,
+								imm);
+			phi_todo_list.safe_push (p);
 			break;
 		      }
 		  }
@@ -6279,8 +6311,8 @@ convert_switch_statements ()
 		gsi_insert_before (&cond_gsi, assign2, GSI_SAME_STMT);
 		gsi_insert_before (&cond_gsi, assign3, GSI_SAME_STMT);
 
-		c = gimple_build_cond (NE_EXPR, tmp3, constant_boolean_node
-				       (false, boolean_type_node), NULL, NULL);
+		tree b = constant_boolean_node (false, boolean_type_node);
+		c = gimple_build_cond (NE_EXPR, tmp3, b, NULL, NULL);
 	      }
 	    else
 	      c = gimple_build_cond (EQ_EXPR, index, low, NULL, NULL);
@@ -6289,15 +6321,15 @@ convert_switch_statements ()
 
 	    gsi_insert_before (&cond_gsi, c, GSI_SAME_STMT);
 
-	    basic_block label_bb = label_to_block_fn
-	      (func, CASE_LABEL (label));
+	    basic_block label_bb
+	      = label_to_block_fn (func, CASE_LABEL (label));
 	    edge new_edge = make_edge (cur_bb, label_bb, EDGE_TRUE_VALUE);
 	    int prob_sum = sum_slice <int> (edge_probabilities, i, labels) +
 	       edge_probabilities[0];
 
 	    if (prob_sum)
-	      new_edge->probability = RDIV
-		(REG_BR_PROB_BASE * edge_probabilities[i], prob_sum);
+	      new_edge->probability
+		= RDIV (REG_BR_PROB_BASE * edge_probabilities[i], prob_sum);
 
 	    new_edge->count = edge_counts[i];
 	    new_edges.safe_push (new_edge);
@@ -6314,8 +6346,8 @@ convert_switch_statements ()
 		  }
 
 		edge next_edge = make_edge (cur_bb, next_bb, EDGE_FALSE_VALUE);
-		next_edge->probability = inverse_probability
-		  (new_edge->probability);
+		next_edge->probability
+		  = inverse_probability (new_edge->probability);
 		next_edge->count = edge_counts[0]
 		  + sum_slice <gcov_type> (edge_counts, i, labels);
 		next_bb->frequency = EDGE_FREQUENCY (next_edge);
@@ -6403,14 +6435,16 @@ expand_builtins ()
 	      tree imag_part = make_temp_ssa_name (rhs_type, NULL,
 						   "cexp_imag_part");
 
-	      tree cos_fndecl = mathfn_built_in
-		(rhs_type, fn == float_type_p ? BUILT_IN_COSF : BUILT_IN_COS);
+	      tree cos_fndecl
+		= mathfn_built_in (rhs_type, fn == float_type_p
+				   ? BUILT_IN_COSF : BUILT_IN_COS);
 	      gcall *cos = gimple_build_call (cos_fndecl, 1, rhs);
 	      gimple_call_set_lhs (cos, real_part);
 	      gsi_insert_before (&gsi, cos, GSI_SAME_STMT);
 
-	      tree sin_fndecl = mathfn_built_in
-		(rhs_type, fn == float_type_p ? BUILT_IN_SINF : BUILT_IN_SIN);
+	      tree sin_fndecl
+		= mathfn_built_in (rhs_type, fn == float_type_p
+				   ? BUILT_IN_SINF : BUILT_IN_SIN);
 	      gcall *sin = gimple_build_call (sin_fndecl, 1, rhs);
 	      gimple_call_set_lhs (sin, imag_part);
 	      gsi_insert_before (&gsi, sin, GSI_SAME_STMT);
@@ -6489,8 +6523,8 @@ generate_hsa (bool kernel)
 
   if (hsa_cfun->m_kern_p)
     {
-      hsa_function_summary *s = hsa_summaries->get
-	(cgraph_node::get (hsa_cfun->m_decl));
+      hsa_function_summary *s
+	= hsa_summaries->get (cgraph_node::get (hsa_cfun->m_decl));
       hsa_add_kern_decl_mapping (current_function_decl, hsa_cfun->m_name,
 				 hsa_cfun->m_maximum_omp_data_size,
 				 s->m_gridified_kernel_p);
@@ -6525,9 +6559,9 @@ const pass_data pass_data_gen_hsail =
 {
   GIMPLE_PASS,
   "hsagen",	 			/* name */
-  OPTGROUP_NONE,                        /* optinfo_flags */
+  OPTGROUP_NONE,			/* optinfo_flags */
   TV_NONE,				/* tv_id */
-  PROP_cfg | PROP_ssa,                  /* properties_required */
+  PROP_cfg | PROP_ssa,			/* properties_required */
   0,					/* properties_provided */
   0,					/* properties_destroyed */
   0,					/* todo_flags_start */
@@ -6559,8 +6593,8 @@ pass_gen_hsail::gate (function *f)
 unsigned int
 pass_gen_hsail::execute (function *)
 {
-  hsa_function_summary *s = hsa_summaries->get
-    (cgraph_node::get_create (current_function_decl));
+  hsa_function_summary *s
+    = hsa_summaries->get (cgraph_node::get_create (current_function_decl));
 
   convert_switch_statements ();
   expand_builtins ();
diff --git a/gcc/hsa.c b/gcc/hsa.c
index 0dd6184..ec23f81 100644
--- a/gcc/hsa.c
+++ b/gcc/hsa.c
@@ -41,7 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ctype.h"
 
 /* Structure containing intermediate HSA representation of the generated
-   function. */
+   function.  */
 class hsa_function_representation *hsa_cfun;
 
 /* Element of the mapping vector between a host decl and an HSA kernel.  */
@@ -61,7 +61,8 @@ struct GTY(()) hsa_decl_kernel_map_element
 /* Mapping between decls and corresponding HSA kernels in this compilation
    unit.  */
 
-static GTY (()) vec<hsa_decl_kernel_map_element, va_gc> *hsa_decl_kernel_mapping;
+static GTY (()) vec<hsa_decl_kernel_map_element, va_gc>
+  *hsa_decl_kernel_mapping;
 
 /* Mapping between decls and corresponding HSA kernels
    called by the function.  */
@@ -118,9 +119,10 @@ hsa_deinit_compilation_unit_data (void)
   delete hsa_failed_functions;
   delete hsa_emitted_internal_decls;
 
-  for (hash_table <hsa_noop_symbol_hasher>::iterator it =
-       hsa_global_variable_symbols->begin ();
-       it != hsa_global_variable_symbols->end (); ++it)
+  for (hash_table <hsa_noop_symbol_hasher>::iterator it
+       = hsa_global_variable_symbols->begin ();
+       it != hsa_global_variable_symbols->end ();
+       ++it)
     {
       hsa_symbol *sym = *it;
       delete sym;
@@ -727,10 +729,9 @@ hsa_get_declaration_name (tree decl)
 {
   if (!DECL_NAME (decl))
     {
-      char *b = XNEWVEC (char, 64);
-      sprintf (b, "__hsa_anonymous_%i", DECL_UID (decl));
-      const char *ggc_str = ggc_alloc_string (b, strlen (b) + 1);
-      free (b);
+      char buf[64];
+      snprintf (buf, 64, "__hsa_anonymous_%i", DECL_UID (decl));
+      const char *ggc_str = ggc_strdup (buf);
       return ggc_str;
     }
 
@@ -888,6 +889,8 @@ hsa_internal_fn::get_arity ()
     case IFN_PARITY:
     case IFN_POPCOUNT:
     default:
+      /* As we produce sorry message for unknown internal functions,
+	 reaching this label is definitely a bug.  */
       gcc_unreachable ();
     }
 }
@@ -935,6 +938,8 @@ hsa_internal_fn::get_argument_type (int n)
 	  return BRIG_TYPE_S32;
       }
     default:
+      /* As we produce sorry message for unknown internal functions,
+	 reaching this label is definitely a bug.  */
       gcc_unreachable ();
     }
 }
diff --git a/gcc/hsa.h b/gcc/hsa.h
index 29b29e2..f0436f3 100644
--- a/gcc/hsa.h
+++ b/gcc/hsa.h
@@ -40,7 +40,7 @@ hsa_gen_requested_p (void)
   return !flag_disable_hsa;
 }
 
-/* Standard warning message if we failed to generate HSAIL for a function */
+/* Standard warning message if we failed to generate HSAIL for a function.  */
 
 #define HSA_SORRY_MSG "could not emit HSAIL for the function"
 
@@ -75,7 +75,7 @@ struct hsa_symbol
   tree m_decl;
 
   /* Name of the symbol, that will be written into output and dumps.  Can be
-     NULL, see name_number below.*/
+     NULL, see name_number below.  */
   const char *m_name;
 
   /* If name is NULL, artificial name will be formed from the segment name and
@@ -115,7 +115,7 @@ private:
   hsa_symbol ();
 };
 
-/* Abstract class for HSA instruction operands. */
+/* Abstract class for HSA instruction operands.  */
 
 class hsa_op_base
 {
@@ -231,7 +231,7 @@ public:
      be 'c', 's', 'd' or 'q'.  */
   char m_reg_class;
   /* If allocated, the number of the HW register (within its HSA register
-     class). */
+     class).  */
   char m_hard_num;
 
 private:
@@ -318,7 +318,7 @@ is_a_helper <hsa_op_address *>::test (hsa_op_base *p)
   return p->m_kind == BRIG_KIND_OPERAND_ADDRESS;
 }
 
-/* A reference to code HSA operand. It can be either reference
+/* A reference to code HSA operand.  It can be either reference
    to a start of a BB or a start of a function.  */
 
 class hsa_op_code_ref : public hsa_op_base
@@ -624,10 +624,10 @@ public:
   /* The segment is of the memory access is either the segment of the symbol in
      the address operand or flat address is there is no symbol there.  */
 
-  /* Required alignment of the memory operation. */
+  /* Required alignment of the memory operation.  */
   BrigAlignment8_t m_align;
 
-  /* HSA equiv class, basically an alias set number. */
+  /* HSA equiv class, basically an alias set number.  */
   uint8_t m_equiv_class;
 
   /* TODO:  Add width modifier, perhaps also other things.  */
@@ -672,7 +672,7 @@ public:
   /* Things like acquire/release/aligned.  */
   enum BrigMemoryOrder m_memoryorder;
 
-  /* Scope of the atomic operation. */
+  /* Scope of the atomic operation.  */
   enum BrigMemoryScope m_memoryscope;
 
 private:
@@ -801,7 +801,7 @@ public:
 
   void *operator new (size_t);
 
-  /* Called function */
+  /* Called function.  */
   tree m_called_function;
 
   /* Called internal function.  */
@@ -1013,7 +1013,7 @@ public:
   hsa_insn_alloca (hsa_op_with_type *dest, hsa_op_with_type *size,
 		   unsigned alignment = 0);
 
-  /* Required alignment of the allocation. */
+  /* Required alignment of the allocation.  */
   BrigAlignment8_t m_align;
 
   /* Pool allocator.  */
-- 
2.6.4

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 10/10] HSA register allocator
  2016-01-14 14:41   ` Jakub Jelinek
@ 2016-01-15 15:09     ` Martin Jambor
  0 siblings, 0 replies; 77+ messages in thread
From: Martin Jambor @ 2016-01-15 15:09 UTC (permalink / raw)
  To: gcc-patches

Hi,

On Thu, Jan 14, 2016 at 03:41:34PM +0100, Jakub Jelinek wrote:
> On Wed, Jan 13, 2016 at 06:39:35PM +0100, Martin Jambor wrote:
> > +    for (phi = hbb->m_first_phi;
> > +	 phi;
> > +	 phi = phi->m_next ? as_a <hsa_insn_phi *> (phi->m_next): NULL)
> 
> Space before :
> 
> Ok with that change.
> 

I have committed the following patch from Martin to address this and a
few other code style issues.

Thanks,

Martin

2016-01-15  Martin Liska  <mliska@suse.cz>

	* hsa-regalloc.c (naive_outof_ssa): Fixed coding style.
	(linear_scan_regalloc): Likewise.
	(regalloc): Likewise.
---
 gcc/hsa-regalloc.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c
index 5a42beb..f8e83ecf 100644
--- a/gcc/hsa-regalloc.c
+++ b/gcc/hsa-regalloc.c
@@ -90,7 +90,7 @@ naive_outof_ssa (void)
 
     for (phi = hbb->m_first_phi;
 	 phi;
-	 phi = phi->m_next ? as_a <hsa_insn_phi *> (phi->m_next): NULL)
+	 phi = phi->m_next ? as_a <hsa_insn_phi *> (phi->m_next) : NULL)
       naive_process_phi (phi);
 
     /* Zap PHI nodes, they will be deallocated when everything else will.  */
@@ -525,7 +525,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes)
       else
 	after_end_number = insn_order;
       /* Everything live-out in this BB has at least an end point
-         after us. */
+	 after us.  */
       EXECUTE_IF_SET_IN_BITMAP (hbb->m_liveout, 0, bit, bi)
 	note_lr_end (ind2reg[bit], after_end_number);
 
@@ -549,7 +549,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes)
 	}
 
       /* Everything live-in in this BB has a start point before
-         our first insn.  */
+	 our first insn.  */
       int before_start_number;
       if (hbb->m_first_insn)
 	before_start_number = hbb->m_first_insn->m_number;
@@ -570,7 +570,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes)
 	   are defined at the start of the routine (prologue).  */
 	if (ind2reg[i]->m_lr_begin == insn_order)
 	  ind2reg[i]->m_lr_begin = 0;
-	/* All regs that have no use but a def will have lr_end == 0, 
+	/* All regs that have no use but a def will have lr_end == 0,
 	   they are actually live from def until after the insn they are
 	   defined in.  */
 	if (ind2reg[i]->m_lr_end == 0)
@@ -672,7 +672,7 @@ regalloc (void)
   basic_block bb;
   m_reg_class_desc classes[4];
 
-  /* If there are no registers used in the function, exit right away. */
+  /* If there are no registers used in the function, exit right away.  */
   if (hsa_cfun->m_reg_count == 0)
     return;
 
-- 
2.6.4

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 09/10] Majority of the HSA back-end
  2016-01-15 15:08     ` Martin Jambor
@ 2016-01-15 15:18       ` Jakub Jelinek
  0 siblings, 0 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-15 15:18 UTC (permalink / raw)
  To: GCC Patches, Martin Liska

On Fri, Jan 15, 2016 at 04:08:14PM +0100, Martin Jambor wrote:
> We don't error, apart from issuing a warning we basically ignore them.
> I believe we can do it even in the long term and that it is in fact
> useful because the standard says that the "effect" if these routines
> is "unspecified" if they get called from a target region.
> 
> Perhaps this is even something we should warn about earlier in omp
> lowering/expansion.

Well, only some of the omp.h functions are not allowed to be called from 
the target regions, others are.
For the others that have unspecified behavior, there is always the question
when it is desirable to warn.  In target construct body it might be a
warning candidate, the only possibility that it is not invoking unspec
behavior is if the target construct is not encountered, if it is in dead
code in that body, or just never encountered.
But if you have declare target routine, it is more controversial to warn,
because the routine can be run both on host (where it is fine) and on target
(where it is not), whether it calls the argument e.g. could depend on some
parameter or result of some function (say check whether it is in offloaded
region).

Anyway, thanks for fixing this, the patch is ok for trunk.  And after the
commit you're the maintainer, so it is up to you to review further changes
to it.  Please keep it nicely and consistently formatted in the future ;)

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15 15:01       ` Jakub Jelinek
@ 2016-01-15 16:02         ` Martin Jambor
  2016-01-15 16:10           ` Jakub Jelinek
  0 siblings, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-15 16:02 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Martin Liska

Hi,

On Fri, Jan 15, 2016 at 04:01:49PM +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 03:53:23PM +0100, Martin Jambor wrote:
> > @@ -317,7 +319,7 @@ public:
> >  bool
> >  pass_ipa_hsa::gate (function *)
> >  {
> > -  return hsa_gen_requested_p () || in_lto_p;
> > +  return hsa_gen_requested_p ();
> >  }
> >  
> >  } // anon namespace
> 
> I actually didn't mean this, I mean more of:
>   return (hsa_gen_requested_p ()
> #ifdef ENABLE_HSA
> 	  || in_lto_p
> #endif
> 	 );
> or so.  Unless you arrange in lto-wrapper or where that if
> HSA is enabled in any LTO input source, then it is enabled also in
> lto1.  If you do that, your change is fine.
> 

This pass only creates HSA specific clones of ungridified target and
parallel regions and functions marked with declare target.  Whether or
not any HSAIL is emitted is then controlled in the hsa-gen pass gate.
The in_lto_p part was in fact a relict of a previous implementation.

So while I agree that making such a change to lto-wrapper would be
beneficial (although then we should limit its activity only to those
nodes which come from enabled units), the change above does not make
the current situation worse.  I will make sure to look into
lto-wrapper but meanwhile I still prefer the new condition.

We have tested the new change and LTO compiled code with HSA enabled
and LTO linked it with HSA disabled and:
  1) if there was no gridified loop, the result was like HSA was
     disabled from the start

  2) if there was a gridified kernel, the compiler compiled the kernel
     for the host but did not register it with libgomp and it ended up
     as an unreachable function.

How do other accelerators cope with the situation when half of the
application is compiled with the accelerator disabled?  (Would some of
their calls to GOMP_target_ext lead to abort?)

Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15 16:02         ` Martin Jambor
@ 2016-01-15 16:10           ` Jakub Jelinek
  2016-01-15 16:38             ` Ilya Verbin
  0 siblings, 1 reply; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-15 16:10 UTC (permalink / raw)
  To: gcc-patches, Martin Liska

On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> How do other accelerators cope with the situation when half of the
> application is compiled with the accelerator disabled?  (Would some of
> their calls to GOMP_target_ext lead to abort?)

GOMP_target_ext should never abort (unless internal error), worst case it
just falls back into the host fallback.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15 16:10           ` Jakub Jelinek
@ 2016-01-15 16:38             ` Ilya Verbin
  2016-01-15 16:45               ` Jakub Jelinek
  0 siblings, 1 reply; 77+ messages in thread
From: Ilya Verbin @ 2016-01-15 16:38 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Martin Liska

On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > How do other accelerators cope with the situation when half of the
> > application is compiled with the accelerator disabled?  (Would some of
> > their calls to GOMP_target_ext lead to abort?)
> 
> GOMP_target_ext should never abort (unless internal error), worst case it
> just falls back into the host fallback.

Wouldn't that lead to hard-to-find problems in case of nonshared memory?
I mean when someone expects that all target regions are executed on the device,
but in fact some of them are silently executed on the host with different data
environment.

  -- Ilya

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15 16:38             ` Ilya Verbin
@ 2016-01-15 16:45               ` Jakub Jelinek
  2016-01-15 18:06                 ` Ilya Verbin
  2016-01-15 19:19                 ` [hsa merge 07/10] IPA-HSA pass Alexander Monakov
  0 siblings, 2 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-15 16:45 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: gcc-patches, Martin Liska

On Fri, Jan 15, 2016 at 07:38:14PM +0300, Ilya Verbin wrote:
> On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> > On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > > How do other accelerators cope with the situation when half of the
> > > application is compiled with the accelerator disabled?  (Would some of
> > > their calls to GOMP_target_ext lead to abort?)
> > 
> > GOMP_target_ext should never abort (unless internal error), worst case it
> > just falls back into the host fallback.
> 
> Wouldn't that lead to hard-to-find problems in case of nonshared memory?
> I mean when someone expects that all target regions are executed on the device,
> but in fact some of them are silently executed on the host with different data
> environment.

E.g. for HSA it really shouldn't matter, as it is shared memory accelerator.
For XeonPhi we hopefully can offload anything.  NVPTX is problematic,
because it can't offload all the code, but if it can be e.g. compile time
detected that it will not be possible, it can just provide offloaded code
for the target.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-15 12:03           ` Jakub Jelinek
@ 2016-01-15 17:23             ` Martin Jambor
  2016-01-16 11:43               ` Jakub Jelinek
  0 siblings, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-15 17:23 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ian Lance Taylor, GCC Patches, David Edelsohn, Jeff Law

On Fri, Jan 15, 2016 at 01:03:35PM +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 11:37:32AM +0100, Jakub Jelinek wrote:
> > On Fri, Jan 15, 2016 at 11:14:33AM +0100, Martin Jambor wrote:
> > > > Martin, could you ask the HSA Foundation or AMD or whoever if there is
> > > > any way they could remove the second requirement of the license?  It
> > > > adds yet another case where anybody distributing GCC has to list yet
> > > > another copyright notice.
> > > 
> > > I will raise this with the HSA PRM group and perhaps there is a slight
> > > chance that they will change this in the upcoming version of HSAIL.
> > > But it is not going to happen soon enough.
> > 
> > Under what license is
> > http://www.hsafoundation.com/html/Content/PRM/Topics/18_BRIG/_chpStr_BRIG_HSAIL_binary_format.htm
> > ?  Sounds the same as the pdf to me.
> > Unlike the pdf version thereof, you could grab the <pre>...</pre> chunks
> > out of this fairly easily with recursive wget and some quick scripting.
> 
> E.g.
> for i in `seq 2 123`; do sed 's/\r$//;s/&lt;/</g;s/<pre[^>]*>/\n<pre>\n/g;s/<\/pre>/\n<\/pre>\n/g;s/<a name=[^>]*><\/a>//g' $i | sed -n '/^<pre>$/,/^<\/pre>$/{/^<.*pre>$/d;p}'; done
> on downloaded (in the order of appearance in the toc) files, I get
> following, which while it doesn't compile, I suppose some manual reordering
> and if it is needed in C, also e.g. in case of typedef BrigModuleHeader* BrigModule_t; adding
> struct before BrigModuleHeader or turning that struct also into a typedef, might make it work.
> Now the question is if it covers all you care about.
> 

Yes it does.  We have massaged it just a little and it works fine (and
the compiler is also also basically the same binary-wise).  So we will
go with the following hsa-brig-format.h (in its old location in gcc/).

Thanks for this input, it really helped,

Martin


/* HSA BRIG (binary representation of HSAIL) 1.0.1 representation description.
   Copyright (C) 2016 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GCC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.

The contents of the file was created by extracting data structures, enum,
typedef and other definitions from HSA Programmer's Reference Manual Version
1.0.1 (http://www.hsafoundation.com/standards/).

HTML version is provided on the following link:
http://www.hsafoundation.com/html/Content/PRM/Topics/PRM_title_page.htm */

#ifndef HSA_BRIG_FORMAT_H
#define HSA_BRIG_FORMAT_H

struct BrigModuleHeader;
typedef uint16_t BrigKind16_t;
typedef uint32_t BrigVersion32_t;

typedef BrigModuleHeader *BrigModule_t;
typedef uint32_t BrigDataOffset32_t;
typedef uint32_t BrigCodeOffset32_t;
typedef uint32_t BrigOperandOffset32_t;
typedef BrigDataOffset32_t BrigDataOffsetString32_t;
typedef BrigDataOffset32_t BrigDataOffsetCodeList32_t;
typedef BrigDataOffset32_t BrigDataOffsetOperandList32_t;
typedef uint8_t BrigAlignment8_t;

enum BrigAlignment
{
  BRIG_ALIGNMENT_NONE = 0,
  BRIG_ALIGNMENT_1 = 1,
  BRIG_ALIGNMENT_2 = 2,
  BRIG_ALIGNMENT_4 = 3,
  BRIG_ALIGNMENT_8 = 4,
  BRIG_ALIGNMENT_16 = 5,
  BRIG_ALIGNMENT_32 = 6,
  BRIG_ALIGNMENT_64 = 7,
  BRIG_ALIGNMENT_128 = 8,
  BRIG_ALIGNMENT_256 = 9
};

typedef uint8_t BrigAllocation8_t;

enum BrigAllocation
{
  BRIG_ALLOCATION_NONE = 0,
  BRIG_ALLOCATION_PROGRAM = 1,
  BRIG_ALLOCATION_AGENT = 2,
  BRIG_ALLOCATION_AUTOMATIC = 3
};

typedef uint8_t BrigAluModifier8_t;

enum BrigAluModifierMask
{
  BRIG_ALU_FTZ = 1
};

typedef uint8_t BrigAtomicOperation8_t;

enum BrigAtomicOperation
{
  BRIG_ATOMIC_ADD = 0,
  BRIG_ATOMIC_AND = 1,
  BRIG_ATOMIC_CAS = 2,
  BRIG_ATOMIC_EXCH = 3,
  BRIG_ATOMIC_LD = 4,
  BRIG_ATOMIC_MAX = 5,
  BRIG_ATOMIC_MIN = 6,
  BRIG_ATOMIC_OR = 7,
  BRIG_ATOMIC_ST = 8,
  BRIG_ATOMIC_SUB = 9,
  BRIG_ATOMIC_WRAPDEC = 10,
  BRIG_ATOMIC_WRAPINC = 11,
  BRIG_ATOMIC_XOR = 12,
  BRIG_ATOMIC_WAIT_EQ = 13,
  BRIG_ATOMIC_WAIT_NE = 14,
  BRIG_ATOMIC_WAIT_LT = 15,
  BRIG_ATOMIC_WAIT_GTE = 16,
  BRIG_ATOMIC_WAITTIMEOUT_EQ = 17,
  BRIG_ATOMIC_WAITTIMEOUT_NE = 18,
  BRIG_ATOMIC_WAITTIMEOUT_LT = 19,
  BRIG_ATOMIC_WAITTIMEOUT_GTE = 20
};

struct BrigBase
{
  uint16_t byteCount;
  BrigKind16_t kind;
};

typedef uint8_t BrigCompareOperation8_t;

enum BrigCompareOperation
{
  BRIG_COMPARE_EQ = 0,
  BRIG_COMPARE_NE = 1,
  BRIG_COMPARE_LT = 2,
  BRIG_COMPARE_LE = 3,
  BRIG_COMPARE_GT = 4,
  BRIG_COMPARE_GE = 5,
  BRIG_COMPARE_EQU = 6,
  BRIG_COMPARE_NEU = 7,
  BRIG_COMPARE_LTU = 8,
  BRIG_COMPARE_LEU = 9,
  BRIG_COMPARE_GTU = 10,
  BRIG_COMPARE_GEU = 11,
  BRIG_COMPARE_NUM = 12,
  BRIG_COMPARE_NAN = 13,
  BRIG_COMPARE_SEQ = 14,
  BRIG_COMPARE_SNE = 15,
  BRIG_COMPARE_SLT = 16,
  BRIG_COMPARE_SLE = 17,
  BRIG_COMPARE_SGT = 18,
  BRIG_COMPARE_SGE = 19,
  BRIG_COMPARE_SGEU = 20,
  BRIG_COMPARE_SEQU = 21,
  BRIG_COMPARE_SNEU = 22,
  BRIG_COMPARE_SLTU = 23,
  BRIG_COMPARE_SLEU = 24,
  BRIG_COMPARE_SNUM = 25,
  BRIG_COMPARE_SNAN = 26,
  BRIG_COMPARE_SGTU = 27
};

typedef uint16_t BrigControlDirective16_t;

enum BrigControlDirective
{
  BRIG_CONTROL_NONE = 0,
  BRIG_CONTROL_ENABLEBREAKEXCEPTIONS = 1,
  BRIG_CONTROL_ENABLEDETECTEXCEPTIONS = 2,
  BRIG_CONTROL_MAXDYNAMICGROUPSIZE = 3,
  BRIG_CONTROL_MAXFLATGRIDSIZE = 4,
  BRIG_CONTROL_MAXFLATWORKGROUPSIZE = 5,
  BRIG_CONTROL_REQUIREDDIM = 6,
  BRIG_CONTROL_REQUIREDGRIDSIZE = 7,
  BRIG_CONTROL_REQUIREDWORKGROUPSIZE = 8,
  BRIG_CONTROL_REQUIRENOPARTIALWORKGROUPS = 9
};

typedef uint32_t BrigExceptions32_t;

enum BrigExceptionsMask
{
  BRIG_EXCEPTIONS_INVALID_OPERATION = 1 << 0,
  BRIG_EXCEPTIONS_DIVIDE_BY_ZERO = 1 << 1,
  BRIG_EXCEPTIONS_OVERFLOW = 1 << 2,
  BRIG_EXCEPTIONS_UNDERFLOW = 1 << 3,
  BRIG_EXCEPTIONS_INEXACT = 1 << 4,
  BRIG_EXCEPTIONS_FIRST_USER_DEFINED = 1 << 16
};

typedef uint8_t BrigExecutableModifier8_t;

enum BrigExecutableModifierMask
{
  BRIG_EXECUTABLE_DEFINITION = 1
};

typedef uint8_t BrigImageChannelOrder8_t;

enum BrigImageChannelOrder
{
  BRIG_CHANNEL_ORDER_A = 0,
  BRIG_CHANNEL_ORDER_R = 1,
  BRIG_CHANNEL_ORDER_RX = 2,
  BRIG_CHANNEL_ORDER_RG = 3,
  BRIG_CHANNEL_ORDER_RGX = 4,
  BRIG_CHANNEL_ORDER_RA = 5,
  BRIG_CHANNEL_ORDER_RGB = 6,
  BRIG_CHANNEL_ORDER_RGBX = 7,
  BRIG_CHANNEL_ORDER_RGBA = 8,
  BRIG_CHANNEL_ORDER_BGRA = 9,
  BRIG_CHANNEL_ORDER_ARGB = 10,
  BRIG_CHANNEL_ORDER_ABGR = 11,
  BRIG_CHANNEL_ORDER_SRGB = 12,
  BRIG_CHANNEL_ORDER_SRGBX = 13,
  BRIG_CHANNEL_ORDER_SRGBA = 14,
  BRIG_CHANNEL_ORDER_SBGRA = 15,
  BRIG_CHANNEL_ORDER_INTENSITY = 16,
  BRIG_CHANNEL_ORDER_LUMINANCE = 17,
  BRIG_CHANNEL_ORDER_DEPTH = 18,
  BRIG_CHANNEL_ORDER_DEPTH_STENCIL = 19,
  BRIG_CHANNEL_ORDER_FIRST_USER_DEFINED = 128
};

typedef uint8_t BrigImageChannelType8_t;

enum BrigImageChannelType
{
  BRIG_CHANNEL_TYPE_SNORM_INT8 = 0,
  BRIG_CHANNEL_TYPE_SNORM_INT16 = 1,
  BRIG_CHANNEL_TYPE_UNORM_INT8 = 2,
  BRIG_CHANNEL_TYPE_UNORM_INT16 = 3,
  BRIG_CHANNEL_TYPE_UNORM_INT24 = 4,
  BRIG_CHANNEL_TYPE_UNORM_SHORT_555 = 5,
  BRIG_CHANNEL_TYPE_UNORM_SHORT_565 = 6,
  BRIG_CHANNEL_TYPE_UNORM_INT_101010 = 7,
  BRIG_CHANNEL_TYPE_SIGNED_INT8 = 8,
  BRIG_CHANNEL_TYPE_SIGNED_INT16 = 9,
  BRIG_CHANNEL_TYPE_SIGNED_INT32 = 10,
  BRIG_CHANNEL_TYPE_UNSIGNED_INT8 = 11,
  BRIG_CHANNEL_TYPE_UNSIGNED_INT16 = 12,
  BRIG_CHANNEL_TYPE_UNSIGNED_INT32 = 13,
  BRIG_CHANNEL_TYPE_HALF_FLOAT = 14,
  BRIG_CHANNEL_TYPE_FLOAT = 15,
  BRIG_CHANNEL_TYPE_FIRST_USER_DEFINED = 128
};

typedef uint8_t BrigImageGeometry8_t;

enum BrigImageGeometry
{
  BRIG_GEOMETRY_1D = 0,
  BRIG_GEOMETRY_2D = 1,
  BRIG_GEOMETRY_3D = 2,
  BRIG_GEOMETRY_1DA = 3,
  BRIG_GEOMETRY_2DA = 4,
  BRIG_GEOMETRY_1DB = 5,
  BRIG_GEOMETRY_2DDEPTH = 6,
  BRIG_GEOMETRY_2DADEPTH = 7,
  BRIG_GEOMETRY_FIRST_USER_DEFINED = 128
};

typedef uint8_t BrigImageQuery8_t;

enum BrigImageQuery
{
  BRIG_IMAGE_QUERY_WIDTH = 0,
  BRIG_IMAGE_QUERY_HEIGHT = 1,
  BRIG_IMAGE_QUERY_DEPTH = 2,
  BRIG_IMAGE_QUERY_ARRAY = 3,
  BRIG_IMAGE_QUERY_CHANNELORDER = 4,
  BRIG_IMAGE_QUERY_CHANNELTYPE = 5
};

enum BrigKind
{
  BRIG_KIND_NONE = 0x0000,
  BRIG_KIND_DIRECTIVE_BEGIN = 0x1000,
  BRIG_KIND_DIRECTIVE_ARG_BLOCK_END = 0x1000,
  BRIG_KIND_DIRECTIVE_ARG_BLOCK_START = 0x1001,
  BRIG_KIND_DIRECTIVE_COMMENT = 0x1002,
  BRIG_KIND_DIRECTIVE_CONTROL = 0x1003,
  BRIG_KIND_DIRECTIVE_EXTENSION = 0x1004,
  BRIG_KIND_DIRECTIVE_FBARRIER = 0x1005,
  BRIG_KIND_DIRECTIVE_FUNCTION = 0x1006,
  BRIG_KIND_DIRECTIVE_INDIRECT_FUNCTION = 0x1007,
  BRIG_KIND_DIRECTIVE_KERNEL = 0x1008,
  BRIG_KIND_DIRECTIVE_LABEL = 0x1009,
  BRIG_KIND_DIRECTIVE_LOC = 0x100a,
  BRIG_KIND_DIRECTIVE_MODULE = 0x100b,
  BRIG_KIND_DIRECTIVE_PRAGMA = 0x100c,
  BRIG_KIND_DIRECTIVE_SIGNATURE = 0x100d,
  BRIG_KIND_DIRECTIVE_VARIABLE = 0x100e,
  BRIG_KIND_DIRECTIVE_END = 0x100f,
  BRIG_KIND_INST_BEGIN = 0x2000,
  BRIG_KIND_INST_ADDR = 0x2000,
  BRIG_KIND_INST_ATOMIC = 0x2001,
  BRIG_KIND_INST_BASIC = 0x2002,
  BRIG_KIND_INST_BR = 0x2003,
  BRIG_KIND_INST_CMP = 0x2004,
  BRIG_KIND_INST_CVT = 0x2005,
  BRIG_KIND_INST_IMAGE = 0x2006,
  BRIG_KIND_INST_LANE = 0x2007,
  BRIG_KIND_INST_MEM = 0x2008,
  BRIG_KIND_INST_MEM_FENCE = 0x2009,
  BRIG_KIND_INST_MOD = 0x200a,
  BRIG_KIND_INST_QUERY_IMAGE = 0x200b,
  BRIG_KIND_INST_QUERY_SAMPLER = 0x200c,
  BRIG_KIND_INST_QUEUE = 0x200d,
  BRIG_KIND_INST_SEG = 0x200e,
  BRIG_KIND_INST_SEG_CVT = 0x200f,
  BRIG_KIND_INST_SIGNAL = 0x2010,
  BRIG_KIND_INST_SOURCE_TYPE = 0x2011,
  BRIG_KIND_INST_END = 0x2012,
  BRIG_KIND_OPERAND_BEGIN = 0x3000,
  BRIG_KIND_OPERAND_ADDRESS = 0x3000,
  BRIG_KIND_OPERAND_ALIGN = 0x3001,
  BRIG_KIND_OPERAND_CODE_LIST = 0x3002,
  BRIG_KIND_OPERAND_CODE_REF = 0x3003,
  BRIG_KIND_OPERAND_CONSTANT_BYTES = 0x3004,
  BRIG_KIND_OPERAND_RESERVED = 0x3005,
  BRIG_KIND_OPERAND_CONSTANT_IMAGE = 0x3006,
  BRIG_KIND_OPERAND_CONSTANT_OPERAND_LIST = 0x3007,
  BRIG_KIND_OPERAND_CONSTANT_SAMPLER = 0x3008,
  BRIG_KIND_OPERAND_OPERAND_LIST = 0x3009,
  BRIG_KIND_OPERAND_REGISTER = 0x300a,
  BRIG_KIND_OPERAND_STRING = 0x300b,
  BRIG_KIND_OPERAND_WAVESIZE = 0x3009c,
  BRIG_KIND_OPERAND_END = 0x300d
};

typedef uint8_t BrigLinkage8_t;

enum BrigLinkage
{
  BRIG_LINKAGE_NONE = 0,
  BRIG_LINKAGE_PROGRAM = 1,
  BRIG_LINKAGE_MODULE = 2,
  BRIG_LINKAGE_FUNCTION = 3,
  BRIG_LINKAGE_ARG = 4
};

typedef uint8_t BrigMachineModel8_t;

enum BrigMachineModel
{
  BRIG_MACHINE_SMALL = 0,
  BRIG_MACHINE_LARGE = 1
};

typedef uint8_t BrigMemoryModifier8_t;

enum BrigMemoryModifierMask
{
  BRIG_MEMORY_CONST = 1
};

typedef uint8_t BrigMemoryOrder8_t;

enum BrigMemoryOrder
{
  BRIG_MEMORY_ORDER_NONE = 0,
  BRIG_MEMORY_ORDER_RELAXED = 1,
  BRIG_MEMORY_ORDER_SC_ACQUIRE = 2,
  BRIG_MEMORY_ORDER_SC_RELEASE = 3,
  BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE = 4
};

typedef uint8_t BrigMemoryScope8_t;

enum BrigMemoryScope
{
  BRIG_MEMORY_SCOPE_NONE = 0,
  BRIG_MEMORY_SCOPE_WORKITEM = 1,
  BRIG_MEMORY_SCOPE_WAVEFRONT = 2,
  BRIG_MEMORY_SCOPE_WORKGROUP = 3,
  BRIG_MEMORY_SCOPE_AGENT = 4,
  BRIG_MEMORY_SCOPE_SYSTEM = 5
};

struct BrigModuleHeader
{
  char identification[8];
  BrigVersion32_t brigMajor;
  BrigVersion32_t brigMinor;
  uint64_t byteCount;
  uint8_t hash[64];
  uint32_t reserved;
  uint32_t sectionCount;
  uint64_t sectionIndex;
};

typedef uint16_t BrigOpcode16_t;

enum BrigOpcode
{
  BRIG_OPCODE_NOP = 0,
  BRIG_OPCODE_ABS = 1,
  BRIG_OPCODE_ADD = 2,
  BRIG_OPCODE_BORROW = 3,
  BRIG_OPCODE_CARRY = 4,
  BRIG_OPCODE_CEIL = 5,
  BRIG_OPCODE_COPYSIGN = 6,
  BRIG_OPCODE_DIV = 7,
  BRIG_OPCODE_FLOOR = 8,
  BRIG_OPCODE_FMA = 9,
  BRIG_OPCODE_FRACT = 10,
  BRIG_OPCODE_MAD = 11,
  BRIG_OPCODE_MAX = 12,
  BRIG_OPCODE_MIN = 13,
  BRIG_OPCODE_MUL = 14,
  BRIG_OPCODE_MULHI = 15,
  BRIG_OPCODE_NEG = 16,
  BRIG_OPCODE_REM = 17,
  BRIG_OPCODE_RINT = 18,
  BRIG_OPCODE_SQRT = 19,
  BRIG_OPCODE_SUB = 20,
  BRIG_OPCODE_TRUNC = 21,
  BRIG_OPCODE_MAD24 = 22,
  BRIG_OPCODE_MAD24HI = 23,
  BRIG_OPCODE_MUL24 = 24,
  BRIG_OPCODE_MUL24HI = 25,
  BRIG_OPCODE_SHL = 26,
  BRIG_OPCODE_SHR = 27,
  BRIG_OPCODE_AND = 28,
  BRIG_OPCODE_NOT = 29,
  BRIG_OPCODE_OR = 30,
  BRIG_OPCODE_POPCOUNT = 31,
  BRIG_OPCODE_XOR = 32,
  BRIG_OPCODE_BITEXTRACT = 33,
  BRIG_OPCODE_BITINSERT = 34,
  BRIG_OPCODE_BITMASK = 35,
  BRIG_OPCODE_BITREV = 36,
  BRIG_OPCODE_BITSELECT = 37,
  BRIG_OPCODE_FIRSTBIT = 38,
  BRIG_OPCODE_LASTBIT = 39,
  BRIG_OPCODE_COMBINE = 40,
  BRIG_OPCODE_EXPAND = 41,
  BRIG_OPCODE_LDA = 42,
  BRIG_OPCODE_MOV = 43,
  BRIG_OPCODE_SHUFFLE = 44,
  BRIG_OPCODE_UNPACKHI = 45,
  BRIG_OPCODE_UNPACKLO = 46,
  BRIG_OPCODE_PACK = 47,
  BRIG_OPCODE_UNPACK = 48,
  BRIG_OPCODE_CMOV = 49,
  BRIG_OPCODE_CLASS = 50,
  BRIG_OPCODE_NCOS = 51,
  BRIG_OPCODE_NEXP2 = 52,
  BRIG_OPCODE_NFMA = 53,
  BRIG_OPCODE_NLOG2 = 54,
  BRIG_OPCODE_NRCP = 55,
  BRIG_OPCODE_NRSQRT = 56,
  BRIG_OPCODE_NSIN = 57,
  BRIG_OPCODE_NSQRT = 58,
  BRIG_OPCODE_BITALIGN = 59,
  BRIG_OPCODE_BYTEALIGN = 60,
  BRIG_OPCODE_PACKCVT = 61,
  BRIG_OPCODE_UNPACKCVT = 62,
  BRIG_OPCODE_LERP = 63,
  BRIG_OPCODE_SAD = 64,
  BRIG_OPCODE_SADHI = 65,
  BRIG_OPCODE_SEGMENTP = 66,
  BRIG_OPCODE_FTOS = 67,
  BRIG_OPCODE_STOF = 68,
  BRIG_OPCODE_CMP = 69,
  BRIG_OPCODE_CVT = 70,
  BRIG_OPCODE_LD = 71,
  BRIG_OPCODE_ST = 72,
  BRIG_OPCODE_ATOMIC = 73,
  BRIG_OPCODE_ATOMICNORET = 74,
  BRIG_OPCODE_SIGNAL = 75,
  BRIG_OPCODE_SIGNALNORET = 76,
  BRIG_OPCODE_MEMFENCE = 77,
  BRIG_OPCODE_RDIMAGE = 78,
  BRIG_OPCODE_LDIMAGE = 79,
  BRIG_OPCODE_STIMAGE = 80,
  BRIG_OPCODE_IMAGEFENCE = 81,
  BRIG_OPCODE_QUERYIMAGE = 82,
  BRIG_OPCODE_QUERYSAMPLER = 83,
  BRIG_OPCODE_CBR = 84,
  BRIG_OPCODE_BR = 85,
  BRIG_OPCODE_SBR = 86,
  BRIG_OPCODE_BARRIER = 87,
  BRIG_OPCODE_WAVEBARRIER = 88,
  BRIG_OPCODE_ARRIVEFBAR = 89,
  BRIG_OPCODE_INITFBAR = 90,
  BRIG_OPCODE_JOINFBAR = 91,
  BRIG_OPCODE_LEAVEFBAR = 92,
  BRIG_OPCODE_RELEASEFBAR = 93,
  BRIG_OPCODE_WAITFBAR = 94,
  BRIG_OPCODE_LDF = 95,
  BRIG_OPCODE_ACTIVELANECOUNT = 96,
  BRIG_OPCODE_ACTIVELANEID = 97,
  BRIG_OPCODE_ACTIVELANEMASK = 98,
  BRIG_OPCODE_ACTIVELANEPERMUTE = 99,
  BRIG_OPCODE_CALL = 100,
  BRIG_OPCODE_SCALL = 101,
  BRIG_OPCODE_ICALL = 102,
  BRIG_OPCODE_RET = 103,
  BRIG_OPCODE_ALLOCA = 104,
  BRIG_OPCODE_CURRENTWORKGROUPSIZE = 105,
  BRIG_OPCODE_CURRENTWORKITEMFLATID = 106,
  BRIG_OPCODE_DIM = 107,
  BRIG_OPCODE_GRIDGROUPS = 108,
  BRIG_OPCODE_GRIDSIZE = 109,
  BRIG_OPCODE_PACKETCOMPLETIONSIG = 110,
  BRIG_OPCODE_PACKETID = 111,
  BRIG_OPCODE_WORKGROUPID = 112,
  BRIG_OPCODE_WORKGROUPSIZE = 113,
  BRIG_OPCODE_WORKITEMABSID = 114,
  BRIG_OPCODE_WORKITEMFLATABSID = 115,
  BRIG_OPCODE_WORKITEMFLATID = 116,
  BRIG_OPCODE_WORKITEMID = 117,
  BRIG_OPCODE_CLEARDETECTEXCEPT = 118,
  BRIG_OPCODE_GETDETECTEXCEPT = 119,
  BRIG_OPCODE_SETDETECTEXCEPT = 120,
  BRIG_OPCODE_ADDQUEUEWRITEINDEX = 121,
  BRIG_OPCODE_CASQUEUEWRITEINDEX = 122,
  BRIG_OPCODE_LDQUEUEREADINDEX = 123,
  BRIG_OPCODE_LDQUEUEWRITEINDEX = 124,
  BRIG_OPCODE_STQUEUEREADINDEX = 125,
  BRIG_OPCODE_STQUEUEWRITEINDEX = 126,
  BRIG_OPCODE_CLOCK = 127,
  BRIG_OPCODE_CUID = 128,
  BRIG_OPCODE_DEBUGTRAP = 129,
  BRIG_OPCODE_GROUPBASEPTR = 130,
  BRIG_OPCODE_KERNARGBASEPTR = 131,
  BRIG_OPCODE_LANEID = 132,
  BRIG_OPCODE_MAXCUID = 133,
  BRIG_OPCODE_MAXWAVEID = 134,
  BRIG_OPCODE_NULLPTR = 135,
  BRIG_OPCODE_WAVEID = 136,
  BRIG_OPCODE_FIRST_USER_DEFINED = 32768
};

typedef uint8_t BrigPack8_t;

enum BrigPack
{
  BRIG_PACK_NONE = 0,
  BRIG_PACK_PP = 1,
  BRIG_PACK_PS = 2,
  BRIG_PACK_SP = 3,
  BRIG_PACK_SS = 4,
  BRIG_PACK_S = 5,
  BRIG_PACK_P = 6,
  BRIG_PACK_PPSAT = 7,
  BRIG_PACK_PSSAT = 8,
  BRIG_PACK_SPSAT = 9,
  BRIG_PACK_SSSAT = 10,
  BRIG_PACK_SSAT = 11,
  BRIG_PACK_PSAT = 12
};

typedef uint8_t BrigProfile8_t;

enum BrigProfile
{
  BRIG_PROFILE_BASE = 0,
  BRIG_PROFILE_FULL = 1
};

typedef uint16_t BrigRegisterKind16_t;

enum BrigRegisterKind
{
  BRIG_REGISTER_KIND_CONTROL = 0,
  BRIG_REGISTER_KIND_SINGLE = 1,
  BRIG_REGISTER_KIND_DOUBLE = 2,
  BRIG_REGISTER_KIND_QUAD = 3
};

typedef uint8_t BrigRound8_t;

enum BrigRound
{
  BRIG_ROUND_NONE = 0,
  BRIG_ROUND_FLOAT_DEFAULT = 1,
  BRIG_ROUND_FLOAT_NEAR_EVEN = 2,
  BRIG_ROUND_FLOAT_ZERO = 3,
  BRIG_ROUND_FLOAT_PLUS_INFINITY = 4,
  BRIG_ROUND_FLOAT_MINUS_INFINITY = 5,
  BRIG_ROUND_INTEGER_NEAR_EVEN = 6,
  BRIG_ROUND_INTEGER_ZERO = 7,
  BRIG_ROUND_INTEGER_PLUS_INFINITY = 8,
  BRIG_ROUND_INTEGER_MINUS_INFINITY = 9,
  BRIG_ROUND_INTEGER_NEAR_EVEN_SAT = 10,
  BRIG_ROUND_INTEGER_ZERO_SAT = 11,
  BRIG_ROUND_INTEGER_PLUS_INFINITY_SAT = 12,
  BRIG_ROUND_INTEGER_MINUS_INFINITY_SAT = 13,
  BRIG_ROUND_INTEGER_SIGNALING_NEAR_EVEN = 14,
  BRIG_ROUND_INTEGER_SIGNALING_ZERO = 15,
  BRIG_ROUND_INTEGER_SIGNALING_PLUS_INFINITY = 16,
  BRIG_ROUND_INTEGER_SIGNALING_MINUS_INFINITY = 17,
  BRIG_ROUND_INTEGER_SIGNALING_NEAR_EVEN_SAT = 18,
  BRIG_ROUND_INTEGER_SIGNALING_ZERO_SAT = 19,
  BRIG_ROUND_INTEGER_SIGNALING_PLUS_INFINITY_SAT = 20,
  BRIG_ROUND_INTEGER_SIGNALING_MINUS_INFINITY_SAT = 21
};

typedef uint8_t BrigSamplerAddressing8_t;

enum BrigSamplerAddressing
{
  BRIG_ADDRESSING_UNDEFINED = 0,
  BRIG_ADDRESSING_CLAMP_TO_EDGE = 1,
  BRIG_ADDRESSING_CLAMP_TO_BORDER = 2,
  BRIG_ADDRESSING_REPEAT = 3,
  BRIG_ADDRESSING_MIRRORED_REPEAT = 4,
  BRIG_ADDRESSING_FIRST_USER_DEFINED = 128
};

typedef uint8_t BrigSamplerCoordNormalization8_t;

enum BrigSamplerCoordNormalization
{
  BRIG_COORD_UNNORMALIZED = 0,
  BRIG_COORD_NORMALIZED = 1
};

typedef uint8_t BrigSamplerFilter8_t;

enum BrigSamplerFilter
{
  BRIG_FILTER_NEAREST = 0,
  BRIG_FILTER_LINEAR = 1,
  BRIG_FILTER_FIRST_USER_DEFINED = 128
};

typedef uint8_t BrigSamplerQuery8_t;

enum BrigSamplerQuery
{
  BRIG_SAMPLER_QUERY_ADDRESSING = 0,
  BRIG_SAMPLER_QUERY_COORD = 1,
  BRIG_SAMPLER_QUERY_FILTER = 2
};

typedef uint32_t BrigSectionIndex32_t;

enum BrigSectionIndex
{
  BRIG_SECTION_INDEX_DATA = 0,
  BRIG_SECTION_INDEX_CODE = 1,
  BRIG_SECTION_INDEX_OPERAND = 2,
  BRIG_SECTION_INDEX_BEGIN_IMPLEMENTATION_DEFINED = 3
};

struct BrigSectionHeader
{
  uint64_t byteCount;
  uint32_t headerByteCount;
  uint32_t nameLength;
  uint8_t name[1];
};

typedef uint8_t BrigSegCvtModifier8_t;

enum BrigSegCvtModifierMask
{
  BRIG_SEG_CVT_NONULL = 1
};

typedef uint8_t BrigSegment8_t;

enum BrigSegment
{
  BRIG_SEGMENT_NONE = 0,
  BRIG_SEGMENT_FLAT = 1,
  BRIG_SEGMENT_GLOBAL = 2,
  BRIG_SEGMENT_READONLY = 3,
  BRIG_SEGMENT_KERNARG = 4,
  BRIG_SEGMENT_GROUP = 5,
  BRIG_SEGMENT_PRIVATE = 6,
  BRIG_SEGMENT_SPILL = 7,
  BRIG_SEGMENT_ARG = 8,
  BRIG_SEGMENT_FIRST_USER_DEFINED = 128
};

enum
{
  BRIG_TYPE_BASE_SIZE = 5,
  BRIG_TYPE_PACK_SIZE = 2,
  BRIG_TYPE_ARRAY_SIZE = 1,

  BRIG_TYPE_BASE_SHIFT = 0,
  BRIG_TYPE_PACK_SHIFT = BRIG_TYPE_BASE_SHIFT + BRIG_TYPE_BASE_SIZE,
  BRIG_TYPE_ARRAY_SHIFT = BRIG_TYPE_PACK_SHIFT + BRIG_TYPE_PACK_SIZE,

  BRIG_TYPE_BASE_MASK = ((1 << BRIG_TYPE_BASE_SIZE) - 1)
			<< BRIG_TYPE_BASE_SHIFT,
  BRIG_TYPE_PACK_MASK = ((1 << BRIG_TYPE_PACK_SIZE) - 1)
			<< BRIG_TYPE_PACK_SHIFT,
  BRIG_TYPE_ARRAY_MASK = ((1 << BRIG_TYPE_ARRAY_SIZE) - 1)
			 << BRIG_TYPE_ARRAY_SHIFT,

  BRIG_TYPE_PACK_NONE = 0 << BRIG_TYPE_PACK_SHIFT,
  BRIG_TYPE_PACK_32 = 1 << BRIG_TYPE_PACK_SHIFT,
  BRIG_TYPE_PACK_64 = 2 << BRIG_TYPE_PACK_SHIFT,
  BRIG_TYPE_PACK_128 = 3 << BRIG_TYPE_PACK_SHIFT,

  BRIG_TYPE_ARRAY = 1 << BRIG_TYPE_ARRAY_SHIFT
};

typedef uint16_t BrigType16_t;

enum BrigType
{
  BRIG_TYPE_NONE = 0,

  BRIG_TYPE_U8 = 1,
  BRIG_TYPE_U16 = 2,
  BRIG_TYPE_U32 = 3,
  BRIG_TYPE_U64 = 4,

  BRIG_TYPE_S8 = 5,
  BRIG_TYPE_S16 = 6,
  BRIG_TYPE_S32 = 7,
  BRIG_TYPE_S64 = 8,

  BRIG_TYPE_F16 = 9,
  BRIG_TYPE_F32 = 10,
  BRIG_TYPE_F64 = 11,

  BRIG_TYPE_B1 = 12,
  BRIG_TYPE_B8 = 13,
  BRIG_TYPE_B16 = 14,
  BRIG_TYPE_B32 = 15,
  BRIG_TYPE_B64 = 16,
  BRIG_TYPE_B128 = 17,

  BRIG_TYPE_SAMP = 18,
  BRIG_TYPE_ROIMG = 19,
  BRIG_TYPE_WOIMG = 20,
  BRIG_TYPE_RWIMG = 21,

  BRIG_TYPE_SIG32 = 22,
  BRIG_TYPE_SIG64 = 23,

  BRIG_TYPE_U8X4 = BRIG_TYPE_U8 | BRIG_TYPE_PACK_32,
  BRIG_TYPE_U8X8 = BRIG_TYPE_U8 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_U8X16 = BRIG_TYPE_U8 | BRIG_TYPE_PACK_128,

  BRIG_TYPE_U16X2 = BRIG_TYPE_U16 | BRIG_TYPE_PACK_32,
  BRIG_TYPE_U16X4 = BRIG_TYPE_U16 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_U16X8 = BRIG_TYPE_U16 | BRIG_TYPE_PACK_128,

  BRIG_TYPE_U32X2 = BRIG_TYPE_U32 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_U32X4 = BRIG_TYPE_U32 | BRIG_TYPE_PACK_128,

  BRIG_TYPE_U64X2 = BRIG_TYPE_U64 | BRIG_TYPE_PACK_128,

  BRIG_TYPE_S8X4 = BRIG_TYPE_S8 | BRIG_TYPE_PACK_32,
  BRIG_TYPE_S8X8 = BRIG_TYPE_S8 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_S8X16 = BRIG_TYPE_S8 | BRIG_TYPE_PACK_128,

  BRIG_TYPE_S16X2 = BRIG_TYPE_S16 | BRIG_TYPE_PACK_32,
  BRIG_TYPE_S16X4 = BRIG_TYPE_S16 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_S16X8 = BRIG_TYPE_S16 | BRIG_TYPE_PACK_128,

  BRIG_TYPE_S32X2 = BRIG_TYPE_S32 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_S32X4 = BRIG_TYPE_S32 | BRIG_TYPE_PACK_128,

  BRIG_TYPE_S64X2 = BRIG_TYPE_S64 | BRIG_TYPE_PACK_128,

  BRIG_TYPE_F16X2 = BRIG_TYPE_F16 | BRIG_TYPE_PACK_32,
  BRIG_TYPE_F16X4 = BRIG_TYPE_F16 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_F16X8 = BRIG_TYPE_F16 | BRIG_TYPE_PACK_128,

  BRIG_TYPE_F32X2 = BRIG_TYPE_F32 | BRIG_TYPE_PACK_64,
  BRIG_TYPE_F32X4 = BRIG_TYPE_F32 | BRIG_TYPE_PACK_128,

  BRIG_TYPE_F64X2 = BRIG_TYPE_F64 | BRIG_TYPE_PACK_128,

  BRIG_TYPE_U8_ARRAY = BRIG_TYPE_U8 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U16_ARRAY = BRIG_TYPE_U16 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U32_ARRAY = BRIG_TYPE_U32 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U64_ARRAY = BRIG_TYPE_U64 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_S8_ARRAY = BRIG_TYPE_S8 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S16_ARRAY = BRIG_TYPE_S16 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S32_ARRAY = BRIG_TYPE_S32 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S64_ARRAY = BRIG_TYPE_S64 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_F16_ARRAY = BRIG_TYPE_F16 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_F32_ARRAY = BRIG_TYPE_F32 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_F64_ARRAY = BRIG_TYPE_F64 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_B8_ARRAY = BRIG_TYPE_B8 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_B16_ARRAY = BRIG_TYPE_B16 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_B32_ARRAY = BRIG_TYPE_B32 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_B64_ARRAY = BRIG_TYPE_B64 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_B128_ARRAY = BRIG_TYPE_B128 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_SAMP_ARRAY = BRIG_TYPE_SAMP | BRIG_TYPE_ARRAY,
  BRIG_TYPE_ROIMG_ARRAY = BRIG_TYPE_ROIMG | BRIG_TYPE_ARRAY,
  BRIG_TYPE_WOIMG_ARRAY = BRIG_TYPE_WOIMG | BRIG_TYPE_ARRAY,
  BRIG_TYPE_RWIMG_ARRAY = BRIG_TYPE_RWIMG | BRIG_TYPE_ARRAY,

  BRIG_TYPE_SIG32_ARRAY = BRIG_TYPE_SIG32 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_SIG64_ARRAY = BRIG_TYPE_SIG64 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_U8X4_ARRAY = BRIG_TYPE_U8X4 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U8X8_ARRAY = BRIG_TYPE_U8X8 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U8X16_ARRAY = BRIG_TYPE_U8X16 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_U16X2_ARRAY = BRIG_TYPE_U16X2 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U16X4_ARRAY = BRIG_TYPE_U16X4 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U16X8_ARRAY = BRIG_TYPE_U16X8 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_U32X2_ARRAY = BRIG_TYPE_U32X2 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_U32X4_ARRAY = BRIG_TYPE_U32X4 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_U64X2_ARRAY = BRIG_TYPE_U64X2 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_S8X4_ARRAY = BRIG_TYPE_S8X4 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S8X8_ARRAY = BRIG_TYPE_S8X8 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S8X16_ARRAY = BRIG_TYPE_S8X16 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_S16X2_ARRAY = BRIG_TYPE_S16X2 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S16X4_ARRAY = BRIG_TYPE_S16X4 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S16X8_ARRAY = BRIG_TYPE_S16X8 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_S32X2_ARRAY = BRIG_TYPE_S32X2 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_S32X4_ARRAY = BRIG_TYPE_S32X4 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_S64X2_ARRAY = BRIG_TYPE_S64X2 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_F16X2_ARRAY = BRIG_TYPE_F16X2 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_F16X4_ARRAY = BRIG_TYPE_F16X4 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_F16X8_ARRAY = BRIG_TYPE_F16X8 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_F32X2_ARRAY = BRIG_TYPE_F32X2 | BRIG_TYPE_ARRAY,
  BRIG_TYPE_F32X4_ARRAY = BRIG_TYPE_F32X4 | BRIG_TYPE_ARRAY,

  BRIG_TYPE_F64X2_ARRAY = BRIG_TYPE_F64X2 | BRIG_TYPE_ARRAY
};

struct BrigUInt64
{
  uint32_t lo;
  uint32_t hi;
};

typedef uint8_t BrigVariableModifier8_t;

enum BrigVariableModifierMask
{
  BRIG_VARIABLE_DEFINITION = 1,
  BRIG_VARIABLE_CONST = 2
};

enum BrigVersion
{
  BRIG_VERSION_HSAIL_MAJOR = 1,
  BRIG_VERSION_HSAIL_MINOR = 0,
  BRIG_VERSION_BRIG_MAJOR = 1,
  BRIG_VERSION_BRIG_MINOR = 0
};

typedef uint8_t BrigWidth8_t;

enum BrigWidth
{
  BRIG_WIDTH_NONE = 0,
  BRIG_WIDTH_1 = 1,
  BRIG_WIDTH_2 = 2,
  BRIG_WIDTH_4 = 3,
  BRIG_WIDTH_8 = 4,
  BRIG_WIDTH_16 = 5,
  BRIG_WIDTH_32 = 6,
  BRIG_WIDTH_64 = 7,
  BRIG_WIDTH_128 = 8,
  BRIG_WIDTH_256 = 9,
  BRIG_WIDTH_512 = 10,
  BRIG_WIDTH_1024 = 11,
  BRIG_WIDTH_2048 = 12,
  BRIG_WIDTH_4096 = 13,
  BRIG_WIDTH_8192 = 14,
  BRIG_WIDTH_16384 = 15,
  BRIG_WIDTH_32768 = 16,
  BRIG_WIDTH_65536 = 17,
  BRIG_WIDTH_131072 = 18,
  BRIG_WIDTH_262144 = 19,
  BRIG_WIDTH_524288 = 20,
  BRIG_WIDTH_1048576 = 21,
  BRIG_WIDTH_2097152 = 22,
  BRIG_WIDTH_4194304 = 23,
  BRIG_WIDTH_8388608 = 24,
  BRIG_WIDTH_16777216 = 25,
  BRIG_WIDTH_33554432 = 26,
  BRIG_WIDTH_67108864 = 27,
  BRIG_WIDTH_134217728 = 28,
  BRIG_WIDTH_268435456 = 29,
  BRIG_WIDTH_536870912 = 30,
  BRIG_WIDTH_1073741824 = 31,
  BRIG_WIDTH_2147483648 = 32,
  BRIG_WIDTH_WAVESIZE = 33,
  BRIG_WIDTH_ALL = 34
};

struct BrigData
{
  uint32_t byteCount;
  uint8_t bytes[1];
};

struct BrigDirectiveArgBlock
{
  BrigBase base;
};

struct BrigDirectiveComment
{
  BrigBase base;
  BrigDataOffsetString32_t name;
};

struct BrigDirectiveControl
{
  BrigBase base;
  BrigControlDirective16_t control;
  uint16_t reserved;
  BrigDataOffsetOperandList32_t operands;
};

struct BrigDirectiveExecutable
{
  BrigBase base;
  BrigDataOffsetString32_t name;
  uint16_t outArgCount;
  uint16_t inArgCount;
  BrigCodeOffset32_t firstInArg;
  BrigCodeOffset32_t firstCodeBlockEntry;
  BrigCodeOffset32_t nextModuleEntry;
  BrigExecutableModifier8_t modifier;
  BrigLinkage8_t linkage;
  uint16_t reserved;
};

struct BrigDirectiveExtension
{
  BrigBase base;
  BrigDataOffsetString32_t name;
};

struct BrigDirectiveFbarrier
{
  BrigBase base;
  BrigDataOffsetString32_t name;
  BrigVariableModifier8_t modifier;
  BrigLinkage8_t linkage;
  uint16_t reserved;
};

struct BrigDirectiveLabel
{
  BrigBase base;
  BrigDataOffsetString32_t name;
};

struct BrigDirectiveLoc
{
  BrigBase base;
  BrigDataOffsetString32_t filename;
  uint32_t line;
  uint32_t column;
};

struct BrigDirectiveModule
{
  BrigBase base;
  BrigDataOffsetString32_t name;
  BrigVersion32_t hsailMajor;
  BrigVersion32_t hsailMinor;
  BrigProfile8_t profile;
  BrigMachineModel8_t machineModel;
  BrigRound8_t defaultFloatRound;
  uint8_t reserved;
};

struct BrigDirectiveNone
{
  BrigBase base;
};

struct BrigDirectivePragma
{
  BrigBase base;
  BrigDataOffsetOperandList32_t operands;
};

struct BrigDirectiveVariable
{
  BrigBase base;
  BrigDataOffsetString32_t name;
  BrigOperandOffset32_t init;
  BrigType16_t type;
  BrigSegment8_t segment;
  BrigAlignment8_t align;
  BrigUInt64 dim;
  BrigVariableModifier8_t modifier;
  BrigLinkage8_t linkage;
  BrigAllocation8_t allocation;
  uint8_t reserved;
};

struct BrigInstBase
{
  BrigBase base;
  BrigOpcode16_t opcode;
  BrigType16_t type;
  BrigDataOffsetOperandList32_t operands;
};

struct BrigInstAddr
{
  BrigInstBase base;
  BrigSegment8_t segment;
  uint8_t reserved[3];
};

struct BrigInstAtomic
{
  BrigInstBase base;
  BrigSegment8_t segment;
  BrigMemoryOrder8_t memoryOrder;
  BrigMemoryScope8_t memoryScope;
  BrigAtomicOperation8_t atomicOperation;
  uint8_t equivClass;
  uint8_t reserved[3];
};

struct BrigInstBasic
{
  BrigInstBase base;
};

struct BrigInstBr
{
  BrigInstBase base;
  BrigWidth8_t width;
  uint8_t reserved[3];
};

struct BrigInstCmp
{
  BrigInstBase base;
  BrigType16_t sourceType;
  BrigAluModifier8_t modifier;
  BrigCompareOperation8_t compare;
  BrigPack8_t pack;
  uint8_t reserved[3];
};

struct BrigInstCvt
{
  BrigInstBase base;
  BrigType16_t sourceType;
  BrigAluModifier8_t modifier;
  BrigRound8_t round;
};

struct BrigInstImage
{
  BrigInstBase base;
  BrigType16_t imageType;
  BrigType16_t coordType;
  BrigImageGeometry8_t geometry;
  uint8_t equivClass;
  uint16_t reserved;
};

struct BrigInstLane
{
  BrigInstBase base;
  BrigType16_t sourceType;
  BrigWidth8_t width;
  uint8_t reserved;
};

struct BrigInstMem
{
  BrigInstBase base;
  BrigSegment8_t segment;
  BrigAlignment8_t align;
  uint8_t equivClass;
  BrigWidth8_t width;
  BrigMemoryModifier8_t modifier;
  uint8_t reserved[3];
};

struct BrigInstMemFence
{
  BrigInstBase base;
  BrigMemoryOrder8_t memoryOrder;
  BrigMemoryScope8_t globalSegmentMemoryScope;
  BrigMemoryScope8_t groupSegmentMemoryScope;
  BrigMemoryScope8_t imageSegmentMemoryScope;
};

struct BrigInstMod
{
  BrigInstBase base;
  BrigAluModifier8_t modifier;
  BrigRound8_t round;
  BrigPack8_t pack;
  uint8_t reserved;
};

struct BrigInstQueryImage
{
  BrigInstBase base;
  BrigType16_t imageType;
  BrigImageGeometry8_t geometry;
  BrigImageQuery8_t query;
};

struct BrigInstQuerySampler
{
  BrigInstBase base;
  BrigSamplerQuery8_t query;
  uint8_t reserved[3];
};

struct BrigInstQueue
{
  BrigInstBase base;
  BrigSegment8_t segment;
  BrigMemoryOrder8_t memoryOrder;
  uint16_t reserved;
};

struct BrigInstSeg
{
  BrigInstBase base;
  BrigSegment8_t segment;
  uint8_t reserved[3];
};

struct BrigInstSegCvt
{
  BrigInstBase base;
  BrigType16_t sourceType;
  BrigSegment8_t segment;
  BrigSegCvtModifier8_t modifier;
};

struct BrigInstSignal
{
  BrigInstBase base;
  BrigType16_t signalType;
  BrigMemoryOrder8_t memoryOrder;
  BrigAtomicOperation8_t signalOperation;
};

struct BrigInstSourceType
{
  BrigInstBase base;
  BrigType16_t sourceType;
  uint16_t reserved;
};

struct BrigOperandAddress
{
  BrigBase base;
  BrigCodeOffset32_t symbol;
  BrigOperandOffset32_t reg;
  BrigUInt64 offset;
};

struct BrigOperandAlign
{
  BrigBase base;
  BrigAlignment8_t align;
  uint8_t reserved[3];
};

struct BrigOperandCodeList
{
  BrigBase base;
  BrigDataOffsetCodeList32_t elements;
};

struct BrigOperandCodeRef
{
  BrigBase base;
  BrigCodeOffset32_t ref;
};

struct BrigOperandConstantBytes
{
  BrigBase base;
  BrigType16_t type;
  uint16_t reserved;
  BrigDataOffsetString32_t bytes;
};

struct BrigOperandConstantImage
{
  BrigBase base;
  BrigType16_t type;
  BrigImageGeometry8_t geometry;
  BrigImageChannelOrder8_t channelOrder;
  BrigImageChannelType8_t channelType;
  uint8_t reserved[3];
  BrigUInt64 width;
  BrigUInt64 height;
  BrigUInt64 depth;
  BrigUInt64 array;
};

struct BrigOperandConstantOperandList
{
  BrigBase base;
  BrigType16_t type;
  uint16_t reserved;
  BrigDataOffsetOperandList32_t elements;
};

struct BrigOperandConstantSampler
{
  BrigBase base;
  BrigType16_t type;
  BrigSamplerCoordNormalization8_t coord;
  BrigSamplerFilter8_t filter;
  BrigSamplerAddressing8_t addressing;
  uint8_t reserved[3];
};

struct BrigOperandOperandList
{
  BrigBase base;
  BrigDataOffsetOperandList32_t elements;
};

struct BrigOperandRegister
{
  BrigBase base;
  BrigRegisterKind16_t regKind;
  uint16_t regNum;
};

struct BrigOperandString
{
  BrigBase base;
  BrigDataOffsetString32_t string;
};

struct BrigOperandWavesize
{
  BrigBase base;
};

#endif /* HSA_BRIG_FORMAT_H */

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-15 10:37         ` Jakub Jelinek
  2016-01-15 12:03           ` Jakub Jelinek
@ 2016-01-15 17:53           ` Mike Stump
  2016-01-15 20:23             ` Richard Biener
  1 sibling, 1 reply; 77+ messages in thread
From: Mike Stump @ 2016-01-15 17:53 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ian Lance Taylor, GCC Patches, David Edelsohn, Jeff Law

On Jan 15, 2016, at 2:37 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> HSA Foundation grants express permission to any current Founder, Promoter,
> Supporter Contributor, Academic or Associate member of HSA Foundation to
> copy and redistribute UNMODIFIED versions of this specification

So, this isn’t the GNU way.  We need to get permission from and they need to grant us, or they need to sign an assignment or it needs to be reimplemented.

They need to ask themselves, if they want us to support their standard or not.  Getting permission might take a week to a month, but, it is better to go that route.  If they don’t want to grant us what we want, then they are likely to want to sue users of our compiler, and in that case, we are better not putting it in in the first place.

My vote would be for the SC to nix this until the issue is resolved.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15 16:45               ` Jakub Jelinek
@ 2016-01-15 18:06                 ` Ilya Verbin
  2016-01-20 18:54                   ` Ilya Verbin
  2016-01-15 19:19                 ` [hsa merge 07/10] IPA-HSA pass Alexander Monakov
  1 sibling, 1 reply; 77+ messages in thread
From: Ilya Verbin @ 2016-01-15 18:06 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Martin Liska

On Fri, Jan 15, 2016 at 17:45:22 +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 07:38:14PM +0300, Ilya Verbin wrote:
> > On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> > > On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > > > How do other accelerators cope with the situation when half of the
> > > > application is compiled with the accelerator disabled?  (Would some of
> > > > their calls to GOMP_target_ext lead to abort?)
> > > 
> > > GOMP_target_ext should never abort (unless internal error), worst case it
> > > just falls back into the host fallback.
> > 
> > Wouldn't that lead to hard-to-find problems in case of nonshared memory?
> > I mean when someone expects that all target regions are executed on the device,
> > but in fact some of them are silently executed on the host with different data
> > environment.
> 
> E.g. for HSA it really shouldn't matter, as it is shared memory accelerator.
> For XeonPhi we hopefully can offload anything.

As you said, if compilation of target image fails with ICE or somehow, host
fallback and offloading to other targets should still work:
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00951.html
That patch was not applied, but it can be simulated by -foffload=disable,
I've created a testcase:

$ cat main.c

#pragma omp declare target
int x;
#pragma omp end declare target
extern int foo ();

int main ()
{
  int shared_mem = 0;
  #pragma omp target map (alloc: x, shared_mem)
    {
      x = 10;
      shared_mem = 1;
    }

  x = 20;
  int r = foo ();
  if (!shared_mem && r != 100)
    __builtin_abort ();
  return 0;
}


$ cat liba.c 

#pragma omp declare target
extern int x;
#pragma omp end declare target

int foo ()
{
  int r;
  #pragma omp target map (from: r) map (alloc: x)
    r = x * x;
  return r;
}


$ gcc -fopenmp -fPIC -shared liba.c -o liba.so -foffload=disable
$ gcc -fopenmp -L. -la main.c


Currently it prints "libgomp: Target function wasn't mapped", but after this
change:

--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1390,7 +1390,7 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
       splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
       gomp_mutex_unlock (&devicep->lock);
       if (tgt_fn == NULL)
-	gomp_fatal ("Target function wasn't mapped");
+	return NULL;

... it will fail at __builtin_abort, but without -foffload=disable it will pass.

  -- Ilya

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15 16:45               ` Jakub Jelinek
  2016-01-15 18:06                 ` Ilya Verbin
@ 2016-01-15 19:19                 ` Alexander Monakov
  2016-01-15 19:32                   ` Jakub Jelinek
  1 sibling, 1 reply; 77+ messages in thread
From: Alexander Monakov @ 2016-01-15 19:19 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ilya Verbin, gcc-patches, Martin Liska

On Fri, 15 Jan 2016, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 07:38:14PM +0300, Ilya Verbin wrote:
> > On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> > > On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > > > How do other accelerators cope with the situation when half of the
> > > > application is compiled with the accelerator disabled?  (Would some of
> > > > their calls to GOMP_target_ext lead to abort?)
> > > 
> > > GOMP_target_ext should never abort (unless internal error), worst case it
> > > just falls back into the host fallback.

Agreed -- the way it aborts today rather than using host fallback looks rather
surprising to me.

> > Wouldn't that lead to hard-to-find problems in case of nonshared memory?
> > I mean when someone expects that all target regions are executed on the device,
> > but in fact some of them are silently executed on the host with different data
> > environment.
> 
> E.g. for HSA it really shouldn't matter, as it is shared memory accelerator.
> For XeonPhi we hopefully can offload anything.  NVPTX is problematic,
> because it can't offload all the code, 

Sorry, can you clarify -- what do you mean by "can't offload"?

> but if it can be e.g. compile time detected that it will not be possible, it
> can just provide offloaded code for the target.

(as a result of previous confusion I can't follow this part either)

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15 19:19                 ` [hsa merge 07/10] IPA-HSA pass Alexander Monakov
@ 2016-01-15 19:32                   ` Jakub Jelinek
  2016-01-15 19:46                     ` Alexander Monakov
  0 siblings, 1 reply; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-15 19:32 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Ilya Verbin, gcc-patches, Martin Liska

On Fri, Jan 15, 2016 at 10:19:13PM +0300, Alexander Monakov wrote:
> Sorry, can you clarify -- what do you mean by "can't offload"?

I meant stuff like setjmp/longjmp, exceptions?, alloca (I know your changes
might fix this one), computed goto, non-local goto, and the like, which I
believe nvptx doesn't support.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15 19:32                   ` Jakub Jelinek
@ 2016-01-15 19:46                     ` Alexander Monakov
  2016-01-15 19:55                       ` Jakub Jelinek
  0 siblings, 1 reply; 77+ messages in thread
From: Alexander Monakov @ 2016-01-15 19:46 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ilya Verbin, gcc-patches, Martin Liska

On Fri, 15 Jan 2016, Jakub Jelinek wrote:

> On Fri, Jan 15, 2016 at 10:19:13PM +0300, Alexander Monakov wrote:
> > Sorry, can you clarify -- what do you mean by "can't offload"?
> 
> I meant stuff like setjmp/longjmp, exceptions?, alloca (I know your changes
> might fix this one), computed goto, non-local goto, and the like, which I
> believe nvptx doesn't support.

Right, but such issues are diagnosed as a compile-time error; the run-time
stage is simply not reached.

Did you mean that eventually GCC might change and somehow allow compilation to
run to completion even though offloaded code cannot be fully generated?

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15 19:46                     ` Alexander Monakov
@ 2016-01-15 19:55                       ` Jakub Jelinek
  0 siblings, 0 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-15 19:55 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Ilya Verbin, gcc-patches, Martin Liska

On Fri, Jan 15, 2016 at 10:46:32PM +0300, Alexander Monakov wrote:
> On Fri, 15 Jan 2016, Jakub Jelinek wrote:
> 
> > On Fri, Jan 15, 2016 at 10:19:13PM +0300, Alexander Monakov wrote:
> > > Sorry, can you clarify -- what do you mean by "can't offload"?
> > 
> > I meant stuff like setjmp/longjmp, exceptions?, alloca (I know your changes
> > might fix this one), computed goto, non-local goto, and the like, which I
> > believe nvptx doesn't support.
> 
> Right, but such issues are diagnosed as a compile-time error; the run-time
> stage is simply not reached.
> 
> Did you mean that eventually GCC might change and somehow allow compilation to
> run to completion even though offloaded code cannot be fully generated?

Yeah, at least depending on some option, either
downgrade all errors in the offloading compiler into warnings that just
result in the offloading image for the particular accelerator not being
created, or issue errors, but still allow the linking.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-15 17:53           ` Mike Stump
@ 2016-01-15 20:23             ` Richard Biener
  0 siblings, 0 replies; 77+ messages in thread
From: Richard Biener @ 2016-01-15 20:23 UTC (permalink / raw)
  To: Mike Stump, Jakub Jelinek
  Cc: Ian Lance Taylor, GCC Patches, David Edelsohn, Jeff Law

On January 15, 2016 6:50:51 PM GMT+01:00, Mike Stump <mikestump@comcast.net> wrote:
>On Jan 15, 2016, at 2:37 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> HSA Foundation grants express permission to any current Founder,
>Promoter,
>> Supporter Contributor, Academic or Associate member of HSA Foundation
>to
>> copy and redistribute UNMODIFIED versions of this specification
>
>So, this isn’t the GNU way.  We need to get permission from and they
>need to grant us, or they need to sign an assignment or it needs to be
>reimplemented.
>
>They need to ask themselves, if they want us to support their standard
>or not.  Getting permission might take a week to a month, but, it is
>better to go that route.  If they don’t want to grant us what we want,
>then they are likely to want to sue users of our compiler, and in that
>case, we are better not putting it in in the first place.
>
>My vote would be for the SC to nix this until the issue is resolved.

It's a non copyrightable set of magic numbers associated to identifiers (we could even change).

Richard.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 09/10] Majority of the HSA back-end
  2016-01-13 18:11 ` [hsa merge 09/10] Majority of the HSA back-end Martin Jambor
  2016-01-14 14:05   ` Jakub Jelinek
@ 2016-01-15 23:49   ` Martin Jambor
  2016-01-16  8:58     ` Jakub Jelinek
  1 sibling, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-15 23:49 UTC (permalink / raw)
  To: gcc-patches

Hi,

bootstrapping on i686-linux revealed the need for the following simple
patch.  I've run into two types of compilation errors on
powerpc-ibm-aix (no htolenn functions and ASM_GENERATE_INTERNAL_LABEL
somehow expanding to undeclared rs6000_xcoff_strip_dollar).  I plan to
workaround them quickly by making most of the contents of hsa-*.c
files compiled only conditionally (and leave potential hsa support on
non-linux platforms for later), but I will not have time to do the
change and test it properly until Monday.

But that will hopefully really be it,

Martin


2016-01-16  Martin Jambor  <mjambor@suse.cz>

	* hsa-dump.c (dump_hsa_symbol): Add missing argumet cast.

diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
index af79bcb..c5f1f69 100644
--- a/gcc/hsa-dump.c
+++ b/gcc/hsa-dump.c
@@ -720,7 +720,7 @@ dump_hsa_symbol (FILE *f, hsa_symbol *symbol)
 	   hsa_type_name (symbol->m_type & ~BRIG_TYPE_ARRAY_MASK), name);
 
   if (symbol->m_type & BRIG_TYPE_ARRAY_MASK)
-    fprintf (f, "[%lu]", symbol->m_dim);
+    fprintf (f, "[%lu]", (unsigned long) symbol->m_dim);
 }
 
 /* Dump textual representation of HSA IL operand OP to file F.  */

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 09/10] Majority of the HSA back-end
  2016-01-15 23:49   ` Martin Jambor
@ 2016-01-16  8:58     ` Jakub Jelinek
  2016-01-16  9:23       ` Jakub Jelinek
  2016-01-18 15:33       ` Martin Jambor
  0 siblings, 2 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-16  8:58 UTC (permalink / raw)
  To: gcc-patches

On Sat, Jan 16, 2016 at 12:49:12AM +0100, Martin Jambor wrote:
> bootstrapping on i686-linux revealed the need for the following simple
> patch.  I've run into two types of compilation errors on
> powerpc-ibm-aix (no htolenn functions and ASM_GENERATE_INTERNAL_LABEL
> somehow expanding to undeclared rs6000_xcoff_strip_dollar).  I plan to
> workaround them quickly by making most of the contents of hsa-*.c
> files compiled only conditionally (and leave potential hsa support on
> non-linux platforms for later), but I will not have time to do the
> change and test it properly until Monday.
> 
> But that will hopefully really be it,

IMHO you'd be best to write your own helpers for conversion to little
endian (and back).
gcc configure already has AC_C_BIGENDIAN (dunno how it handles pdp endian
host though, so not sure if it is safe to rely on that), for recent GCC
you can use __BYTE_ORDER__ macro to check endianity and __builtin_bswap*.
So perhaps just
#if GCC_VERSION >= 4006
// use __BYTE_ORDER__ and __builtin_bswap or nothing
#else
// provide a safe slower default, with shifts and masking
#endif

As for rs6000_xcoff_strip_dollar, look at other sources that use it what
headers they do include, bet you want to #include "tm_p.h" to make it work.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 09/10] Majority of the HSA back-end
  2016-01-16  8:58     ` Jakub Jelinek
@ 2016-01-16  9:23       ` Jakub Jelinek
  2016-01-18 15:33       ` Martin Jambor
  1 sibling, 0 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-16  9:23 UTC (permalink / raw)
  To: gcc-patches

On Sat, Jan 16, 2016 at 09:58:51AM +0100, Jakub Jelinek wrote:
> IMHO you'd be best to write your own helpers for conversion to little
> endian (and back).
> gcc configure already has AC_C_BIGENDIAN (dunno how it handles pdp endian
> host though, so not sure if it is safe to rely on that), for recent GCC
> you can use __BYTE_ORDER__ macro to check endianity and __builtin_bswap*.
> So perhaps just
> #if GCC_VERSION >= 4006
> // use __BYTE_ORDER__ and __builtin_bswap or nothing
> #else
> // provide a safe slower default, with shifts and masking
> #endif

Note e.g. libiberty/simple-object* uses the slow version with shifts and
masking all the time, so just optimizing the case of recent GCC host (and
therefore also bootstrapped compilers) might be enough.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15 11:47     ` Martin Liška
@ 2016-01-16 10:00       ` Jan Hubicka
  2016-01-25 15:22         ` Martin Liška
  0 siblings, 1 reply; 77+ messages in thread
From: Jan Hubicka @ 2016-01-16 10:00 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jan Hubicka, Martin Jambor, GCC Patches

> On 01/15/2016 10:52 AM, Jan Hubicka wrote:
> > Do we really need to look that up in the hsa summary? Why these can not be partitioned the
> > usual way?
> 
> Hi.
> 
> Yes, it's needed as hsa-brig.c uses host function declaration of a kernel as a key for libgomp.
> That's why we want to put the pair to a LTO partition.

Can't it be represented via explicit REF_ADDR or something like that?

Honza
> 
> Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-15 17:23             ` Martin Jambor
@ 2016-01-16 11:43               ` Jakub Jelinek
  2016-01-18 15:22                 ` Martin Jambor
  0 siblings, 1 reply; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-16 11:43 UTC (permalink / raw)
  To: Ian Lance Taylor, GCC Patches, David Edelsohn, Jeff Law

On Fri, Jan 15, 2016 at 06:23:05PM +0100, Martin Jambor wrote:
>   BRIG_KIND_OPERAND_REGISTER = 0x300a,
>   BRIG_KIND_OPERAND_STRING = 0x300b,
>   BRIG_KIND_OPERAND_WAVESIZE = 0x3009c,
>   BRIG_KIND_OPERAND_END = 0x300d

The above looks weird, I'd have expected BRIG_KIND_OPERAND_WAVESIZE
to be 0x300c instead.  Bug in the standard?
As typedef uint16_t BrigKind16_t;, I'm afraid this doesn't even fit
into the data type.  Note the original brig header you've posted
had this fixed.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-16 11:43               ` Jakub Jelinek
@ 2016-01-18 15:22                 ` Martin Jambor
  0 siblings, 0 replies; 77+ messages in thread
From: Martin Jambor @ 2016-01-18 15:22 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

Hi,

On Sat, Jan 16, 2016 at 12:43:07PM +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 06:23:05PM +0100, Martin Jambor wrote:
> >   BRIG_KIND_OPERAND_REGISTER = 0x300a,
> >   BRIG_KIND_OPERAND_STRING = 0x300b,
> >   BRIG_KIND_OPERAND_WAVESIZE = 0x3009c,
> >   BRIG_KIND_OPERAND_END = 0x300d
> 
> The above looks weird, I'd have expected BRIG_KIND_OPERAND_WAVESIZE
> to be 0x300c instead.  Bug in the standard?
> As typedef uint16_t BrigKind16_t;, I'm afraid this doesn't even fit
> into the data type.  Note the original brig header you've posted
> had this fixed.
> 

That is clearly a bug.  We did not catch it whe comparing the compiler
binary because we never use this constant.  Have you found this by
hand or did you do any more systematic comparison?

BRIG is always validated when finalized and I belive that fortunately
this particular bug would be caught by that as would majority of
similar "random" ones.

I am going to commit the following patch to the branch.

Thanks for spotting this.

Martin


2016-01-18  Martin Jambor  <mjambor@suse.cz>

	* hsa-brig-format.h (BrigKind): Fix the value of
	BRIG_KIND_OPERAND_WAVESIZE.
---
 gcc/hsa-brig-format.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/hsa-brig-format.h b/gcc/hsa-brig-format.h
index 247799b..e1c6cd2 100644
--- a/gcc/hsa-brig-format.h
+++ b/gcc/hsa-brig-format.h
@@ -303,7 +303,7 @@ enum BrigKind
   BRIG_KIND_OPERAND_OPERAND_LIST = 0x3009,
   BRIG_KIND_OPERAND_REGISTER = 0x300a,
   BRIG_KIND_OPERAND_STRING = 0x300b,
-  BRIG_KIND_OPERAND_WAVESIZE = 0x3009c,
+  BRIG_KIND_OPERAND_WAVESIZE = 0x300c,
   BRIG_KIND_OPERAND_END = 0x300d
 };
 
-- 
2.6.4


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 09/10] Majority of the HSA back-end
  2016-01-16  8:58     ` Jakub Jelinek
  2016-01-16  9:23       ` Jakub Jelinek
@ 2016-01-18 15:33       ` Martin Jambor
  2016-01-18 15:48         ` Jakub Jelinek
  1 sibling, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-18 15:33 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

Hi,

On Sat, Jan 16, 2016 at 09:58:51AM +0100, Jakub Jelinek wrote:
> On Sat, Jan 16, 2016 at 12:49:12AM +0100, Martin Jambor wrote:
> > bootstrapping on i686-linux revealed the need for the following simple
> > patch.  I've run into two types of compilation errors on
> > powerpc-ibm-aix (no htolenn functions and ASM_GENERATE_INTERNAL_LABEL
> > somehow expanding to undeclared rs6000_xcoff_strip_dollar).  I plan to
> > workaround them quickly by making most of the contents of hsa-*.c
> > files compiled only conditionally (and leave potential hsa support on
> > non-linux platforms for later), but I will not have time to do the
> > change and test it properly until Monday.
> > 
> > But that will hopefully really be it,
> 
> IMHO you'd be best to write your own helpers for conversion to little
> endian (and back).
> gcc configure already has AC_C_BIGENDIAN (dunno how it handles pdp endian
> host though, so not sure if it is safe to rely on that), for recent GCC
> you can use __BYTE_ORDER__ macro to check endianity and __builtin_bswap*.
> So perhaps just
> #if GCC_VERSION >= 4006
> // use __BYTE_ORDER__ and __builtin_bswap or nothing
> #else
> // provide a safe slower default, with shifts and masking
> #endif
> 
> As for rs6000_xcoff_strip_dollar, look at other sources that use it what
> headers they do include, bet you want to #include "tm_p.h" to make it work.
> 

thanks for the suggestion.  With the following two patches, I can
compile HSA branch on powerpc-aix.  I'm going to prepare a new patch
with them, bootstrap it on x86_64, i686 and ppc-aix and unless
something new pops up again, I will commit it either at nigh today or
early morning tomorrow.

I have tested the slow paths of little endian conversion only very
rudimentarily but I did.  OTOH, I am actually not quite sure how 64
bit-wide numbers are spaced out on PDP-endian systems.  But I guess it
is OK to fix those only later if I am wrong.

I am also willing to incorporate any feedback later, even if it is
only a matter of style.

Thanks,

Martin


2016-01-18  Martin Jambor  <mjambor@suse.cz>

	* hsa-brig.c: Include target.h and tm_p.h.
---
 gcc/hsa-brig.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 9260c21..ee06804 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -23,6 +23,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
+#include "target.h"
+#include "tm_p.h"
 #include "is-a.h"
 #include "vec.h"
 #include "hash-table.h"
-- 
2.6.4


2016-01-18  Martin Jambor  <mjambor@suse.cz>

	* hsa-brig.c (lendian16): New function.  Changed all uses of htole16
	to use it.
	(lendian32): New function.  Changed all uses of htole32 to use it.
	(lendian64): New function.  Changed all uses of htole64 to use it.
---
 gcc/hsa-brig.c | 412 ++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 245 insertions(+), 167 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index d4e644f..9260c21 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -44,6 +44,83 @@ along with GCC; see the file COPYING3.  If not see
 #include "hsa.h"
 #include "gomp-constants.h"
 
+/* Convert VAL to little endian form, if necessary.  */
+
+static uint16_t
+lendian16 (uint16_t val)
+{
+#if GCC_VERSION >= 4006
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  return val;
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  return __builtin_bswap16 (val);
+#else   /* __ORDER_PDP_ENDIAN__ */
+  return val;
+#endif
+#else
+// provide a safe slower default, with shifts and masking
+#ifndef WORDS_BIGENDIAN
+  return val;
+#else
+  return (val >> 8) | (val << 8);
+#endif
+#endif
+}
+
+/* Convert VAL to little endian form, if necessary.  */
+
+static uint32_t
+lendian32 (uint32_t val)
+{
+#if GCC_VERSION >= 4006
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  return val;
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  return __builtin_bswap32 (val);
+#else  /* __ORDER_PDP_ENDIAN__ */
+  return (val >> 16) | (val << 16);
+#endif
+#else
+// provide a safe slower default, with shifts and masking
+#ifndef WORDS_BIGENDIAN
+  return val;
+#else
+  val  = ((val & 0xff00ff00) >> 8) | ((val & 0xff00ff) << 8);
+  return (val >> 16) | (val << 16);
+#endif
+#endif
+}
+
+/* Convert VAL to little endian form, if necessary.  */
+
+static uint64_t
+lendian64 (uint64_t val)
+{
+#if GCC_VERSION >= 4006
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  return val;
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  return __builtin_bswap64 (val);
+#else  /* __ORDER_PDP_ENDIAN__ */
+  return (((val & 0xffff) << 48)
+	  | ((val & 0xffff0000) << 16)
+	  | ((val & 0xffff00000000) >> 16)
+	  | ((val & 0xffff000000000000) >> 48));
+#endif
+#else
+// provide a safe slower default, with shifts and masking
+#ifndef WORDS_BIGENDIAN
+  return val;
+#else
+  val = (((val & 0xff00ff00ff00ff00ll) >> 8)
+	 | ((val & 0x00ff00ff00ff00ffll) << 8));
+  val = ((( val & 0xffff0000ffff0000ll) >> 16)
+	 | (( val & 0x0000ffff0000ffffll) << 16));
+  return (val >> 32) | (val << 32);
+#endif
+#endif
+}
+
 #define BRIG_ELF_SECTION_NAME ".brig"
 #define BRIG_LABEL_STRING "hsa_brig"
 #define BRIG_SECTION_DATA_NAME    "hsa_data"
@@ -177,9 +254,9 @@ hsa_brig_section::output ()
   struct BrigSectionHeader section_header;
   char padding[8];
 
-  section_header.byteCount = htole64 (total_size);
-  section_header.headerByteCount = htole32 (header_byte_count);
-  section_header.nameLength = htole32 (strlen (section_name));
+  section_header.byteCount = lendian64 (total_size);
+  section_header.headerByteCount = lendian32 (header_byte_count);
+  section_header.nameLength = lendian32 (strlen (section_name));
   assemble_string ((const char *) &section_header, 16);
   assemble_string (section_name, (section_header.nameLength));
   memset (&padding, 0, sizeof (padding));
@@ -314,7 +391,7 @@ brig_emit_string (const char *str, char prefix = 0, bool sanitize = true)
 {
   unsigned slen = strlen (str);
   unsigned offset, len = slen + (prefix ? 1 : 0);
-  uint32_t hdr_len = htole32 (len);
+  uint32_t hdr_len = lendian32 (len);
   brig_string_slot s_slot;
   brig_string_slot **slot;
   char *str2;
@@ -389,7 +466,7 @@ brig_init (void)
 
   struct BrigDirectiveModule moddir;
   memset (&moddir, 0, sizeof (moddir));
-  moddir.base.byteCount = htole16 (sizeof (moddir));
+  moddir.base.byteCount = lendian16 (sizeof (moddir));
 
   char *modname;
   if (main_input_filename && *main_input_filename != '\0')
@@ -424,9 +501,9 @@ brig_init (void)
     }
   else
     moddir.name = brig_emit_string ("__hsa_module_unnamed", '&');
-  moddir.base.kind = htole16 (BRIG_KIND_DIRECTIVE_MODULE);
-  moddir.hsailMajor = htole32 (BRIG_VERSION_HSAIL_MAJOR);
-  moddir.hsailMinor = htole32 (BRIG_VERSION_HSAIL_MINOR);
+  moddir.base.kind = lendian16 (BRIG_KIND_DIRECTIVE_MODULE);
+  moddir.hsailMajor = lendian32 (BRIG_VERSION_HSAIL_MAJOR);
+  moddir.hsailMinor = lendian32 (BRIG_VERSION_HSAIL_MINOR);
   moddir.profile = hsa_full_profile_p () ? BRIG_PROFILE_FULL: BRIG_PROFILE_BASE;
   if (hsa_machine_large_p ())
     moddir.machineModel = BRIG_MACHINE_LARGE;
@@ -500,8 +577,8 @@ emit_directive_variable (struct hsa_symbol *symbol)
     return symbol->m_directive_offset;
 
   memset (&dirvar, 0, sizeof (dirvar));
-  dirvar.base.byteCount = htole16 (sizeof (dirvar));
-  dirvar.base.kind = htole16 (BRIG_KIND_DIRECTIVE_VARIABLE);
+  dirvar.base.byteCount = lendian16 (sizeof (dirvar));
+  dirvar.base.kind = lendian16 (BRIG_KIND_DIRECTIVE_VARIABLE);
   dirvar.allocation = symbol->m_allocation;
 
   char prefix = symbol->m_global_scope_p ? '&' : '%';
@@ -522,9 +599,9 @@ emit_directive_variable (struct hsa_symbol *symbol)
       name_offset = brig_emit_string (buf, prefix);
     }
 
-  dirvar.name = htole32 (name_offset);
+  dirvar.name = lendian32 (name_offset);
   dirvar.init = 0;
-  dirvar.type = htole16 (symbol->m_type);
+  dirvar.type = lendian16 (symbol->m_type);
   dirvar.segment = symbol->m_segment;
   /* TODO: Once we are able to access global variables, we must copy their
      alignment.  */
@@ -542,7 +619,7 @@ emit_directive_variable (struct hsa_symbol *symbol)
   if (symbol->m_cst_value)
     {
       dirvar.modifier |= BRIG_VARIABLE_CONST;
-      dirvar.init = htole32 (enqueue_op (symbol->m_cst_value));
+      dirvar.init = lendian32 (enqueue_op (symbol->m_cst_value));
     }
 
   symbol->m_directive_offset = brig_code.add (&dirvar, sizeof (dirvar));
@@ -583,15 +660,15 @@ emit_function_directives (hsa_function_representation *f, bool is_declaration)
   next_toplev_off = scoped_off + count * sizeof (struct BrigDirectiveVariable);
 
   memset (&fndir, 0, sizeof (fndir));
-  fndir.base.byteCount = htole16 (sizeof (fndir));
-  fndir.base.kind = htole16 (f->m_kern_p ? BRIG_KIND_DIRECTIVE_KERNEL
-			     : BRIG_KIND_DIRECTIVE_FUNCTION);
-  fndir.name = htole32 (name_offset);
-  fndir.inArgCount = htole16 (f->m_input_args.length ());
-  fndir.outArgCount = htole16 (f->m_output_arg ? 1 : 0);
-  fndir.firstInArg = htole32 (inarg_off);
-  fndir.firstCodeBlockEntry = htole32 (scoped_off);
-  fndir.nextModuleEntry = htole32 (next_toplev_off);
+  fndir.base.byteCount = lendian16 (sizeof (fndir));
+  fndir.base.kind = lendian16 (f->m_kern_p ? BRIG_KIND_DIRECTIVE_KERNEL
+			       : BRIG_KIND_DIRECTIVE_FUNCTION);
+  fndir.name = lendian32 (name_offset);
+  fndir.inArgCount = lendian16 (f->m_input_args.length ());
+  fndir.outArgCount = lendian16 (f->m_output_arg ? 1 : 0);
+  fndir.firstInArg = lendian32 (inarg_off);
+  fndir.firstCodeBlockEntry = lendian32 (scoped_off);
+  fndir.nextModuleEntry = lendian32 (next_toplev_off);
   fndir.linkage = f->get_linkage ();
   if (!f->m_declaration_p)
     fndir.modifier |= BRIG_EXECUTABLE_DEFINITION;
@@ -656,12 +733,12 @@ emit_bb_label_directive (hsa_bb *hbb)
 {
   struct BrigDirectiveLabel lbldir;
 
-  lbldir.base.byteCount = htole16 (sizeof (lbldir));
-  lbldir.base.kind = htole16 (BRIG_KIND_DIRECTIVE_LABEL);
+  lbldir.base.byteCount = lendian16 (sizeof (lbldir));
+  lbldir.base.kind = lendian16 (BRIG_KIND_DIRECTIVE_LABEL);
   char buf[32];
   snprintf (buf, 32, "BB_%u_%i", DECL_UID (current_function_decl),
 	    hbb->m_index);
-  lbldir.name = htole32 (brig_emit_string (buf, '@'));
+  lbldir.name = lendian32 (brig_emit_string (buf, '@'));
 
   hbb->m_label_ref.m_directive_offset = brig_code.add (&lbldir,
 						       sizeof (lbldir));
@@ -921,11 +998,11 @@ emit_immediate_operand (hsa_op_immed *imm)
   struct BrigOperandConstantBytes out;
 
   memset (&out, 0, sizeof (out));
-  out.base.byteCount = htole16 (sizeof (out));
-  out.base.kind = htole16 (BRIG_KIND_OPERAND_CONSTANT_BYTES);
-  uint32_t byteCount = htole32 (imm->m_brig_repr_size);
-  out.type = htole16 (imm->m_type);
-  out.bytes = htole32 (brig_data.add (&byteCount, sizeof (byteCount)));
+  out.base.byteCount = lendian16 (sizeof (out));
+  out.base.kind = lendian16 (BRIG_KIND_OPERAND_CONSTANT_BYTES);
+  uint32_t byteCount = lendian32 (imm->m_brig_repr_size);
+  out.type = lendian16 (imm->m_type);
+  out.bytes = lendian32 (brig_data.add (&byteCount, sizeof (byteCount)));
   brig_operand.add (&out, sizeof (out));
   brig_data.add (imm->m_brig_repr, imm->m_brig_repr_size);
   brig_data.round_size_up (4);
@@ -938,9 +1015,9 @@ emit_register_operand (hsa_op_reg *reg)
 {
   struct BrigOperandRegister out;
 
-  out.base.byteCount = htole16 (sizeof (out));
-  out.base.kind = htole16 (BRIG_KIND_OPERAND_REGISTER);
-  out.regNum = htole32 (reg->m_hard_num);
+  out.base.byteCount = lendian16 (sizeof (out));
+  out.base.kind = lendian16 (BRIG_KIND_OPERAND_REGISTER);
+  out.regNum = lendian32 (reg->m_hard_num);
 
   switch (regtype_for_type (reg->m_type))
     {
@@ -970,21 +1047,21 @@ emit_address_operand (hsa_op_address *addr)
 {
   struct BrigOperandAddress out;
 
-  out.base.byteCount = htole16 (sizeof (out));
-  out.base.kind = htole16 (BRIG_KIND_OPERAND_ADDRESS);
+  out.base.byteCount = lendian16 (sizeof (out));
+  out.base.kind = lendian16 (BRIG_KIND_OPERAND_ADDRESS);
   out.symbol = addr->m_symbol
-    ? htole32 (emit_directive_variable (addr->m_symbol)) : 0;
-  out.reg = addr->m_reg ? htole32 (enqueue_op (addr->m_reg)) : 0;
+    ? lendian32 (emit_directive_variable (addr->m_symbol)) : 0;
+  out.reg = addr->m_reg ? lendian32 (enqueue_op (addr->m_reg)) : 0;
 
   if (sizeof (addr->m_imm_offset) == 8)
     {
-      out.offset.lo = htole32 (addr->m_imm_offset);
-      out.offset.hi = htole32 (addr->m_imm_offset >> 32);
+      out.offset.lo = lendian32 (addr->m_imm_offset);
+      out.offset.hi = lendian32 (addr->m_imm_offset >> 32);
     }
   else
     {
       gcc_assert (sizeof (addr->m_imm_offset) == 4);
-      out.offset.lo = htole32 (addr->m_imm_offset);
+      out.offset.lo = lendian32 (addr->m_imm_offset);
       out.offset.hi = 0;
     }
 
@@ -998,9 +1075,9 @@ emit_code_ref_operand (hsa_op_code_ref *ref)
 {
   struct BrigOperandCodeRef out;
 
-  out.base.byteCount = htole16 (sizeof (out));
-  out.base.kind = htole16 (BRIG_KIND_OPERAND_CODE_REF);
-  out.ref = htole32 (ref->m_directive_offset);
+  out.base.byteCount = lendian16 (sizeof (out));
+  out.base.kind = lendian16 (BRIG_KIND_OPERAND_CODE_REF);
+  out.ref = lendian32 (ref->m_directive_offset);
   brig_operand.add (&out, sizeof (out));
 }
 
@@ -1015,12 +1092,12 @@ emit_code_list_operand (hsa_op_code_list *code_list)
   for (unsigned i = 0; i < args; i++)
     gcc_assert (code_list->m_offsets[i]);
 
-  out.base.byteCount = htole16 (sizeof (out));
-  out.base.kind = htole16 (BRIG_KIND_OPERAND_CODE_LIST);
+  out.base.byteCount = lendian16 (sizeof (out));
+  out.base.kind = lendian16 (BRIG_KIND_OPERAND_CODE_LIST);
 
-  uint32_t byteCount = htole32 (4 * args);
+  uint32_t byteCount = lendian32 (4 * args);
 
-  out.elements = htole32 (brig_data.add (&byteCount, sizeof (byteCount)));
+  out.elements = lendian32 (brig_data.add (&byteCount, sizeof (byteCount)));
   brig_data.add (code_list->m_offsets.address (), args * sizeof (uint32_t));
   brig_data.round_size_up (4);
   brig_operand.add (&out, sizeof (out));
@@ -1037,12 +1114,12 @@ emit_operand_list_operand (hsa_op_operand_list *operand_list)
   for (unsigned i = 0; i < args; i++)
     gcc_assert (operand_list->m_offsets[i]);
 
-  out.base.byteCount = htole16 (sizeof (out));
-  out.base.kind = htole16 (BRIG_KIND_OPERAND_OPERAND_LIST);
+  out.base.byteCount = lendian16 (sizeof (out));
+  out.base.kind = lendian16 (BRIG_KIND_OPERAND_OPERAND_LIST);
 
-  uint32_t byteCount = htole32 (4 * args);
+  uint32_t byteCount = lendian32 (4 * args);
 
-  out.elements = htole32 (brig_data.add (&byteCount, sizeof (byteCount)));
+  out.elements = lendian32 (brig_data.add (&byteCount, sizeof (byteCount)));
   brig_data.add (operand_list->m_offsets.address (), args * sizeof (uint32_t));
   brig_data.round_size_up (4);
   brig_operand.add (&out, sizeof (out));
@@ -1118,10 +1195,10 @@ emit_insn_operands (hsa_insn_basic *insn)
   operand_offsets.safe_grow (l);
 
   for (unsigned i = 0; i < l; i++)
-    operand_offsets[i] = htole32 (enqueue_op (insn->get_op (i)));
+    operand_offsets[i] = lendian32 (enqueue_op (insn->get_op (i)));
 
   /* We have N operands so use 4 * N for the byte_count.  */
-  uint32_t byte_count = htole32 (4 * l);
+  uint32_t byte_count = lendian32 (4 * l);
 
   unsigned offset = brig_data.add (&byte_count, sizeof (byte_count));
   brig_data.add (operand_offsets.address (),
@@ -1155,7 +1232,7 @@ emit_operands (hsa_op_base *op0, hsa_op_base *op1 = NULL,
   unsigned l = operand_offsets.length ();
 
   /* We have N operands so use 4 * N for the byte_count.  */
-  uint32_t byte_count = htole32 (4 * l);
+  uint32_t byte_count = lendian32 (4 * l);
 
   unsigned offset = brig_data.add (&byte_count, sizeof (byte_count));
   brig_data.add (operand_offsets.address (),
@@ -1182,11 +1259,11 @@ emit_memory_insn (hsa_insn_mem *mem)
      random stuff (which we do not want so that we can test things don't
      change).  */
   memset (&repr, 0, sizeof (repr));
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_MEM);
-  repr.base.opcode = htole16 (mem->m_opcode);
-  repr.base.type = htole16 (mem->m_type);
-  repr.base.operands = htole32 (emit_insn_operands (mem));
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_MEM);
+  repr.base.opcode = lendian16 (mem->m_opcode);
+  repr.base.type = lendian16 (mem->m_type);
+  repr.base.operands = lendian32 (emit_insn_operands (mem));
 
   if (addr->m_symbol)
     repr.segment = addr->m_symbol->m_segment;
@@ -1217,11 +1294,11 @@ emit_signal_insn (hsa_insn_signal *mem)
      random stuff (which we do not want so that we can test things don't
      change).  */
   memset (&repr, 0, sizeof (repr));
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_SIGNAL);
-  repr.base.opcode = htole16 (mem->m_opcode);
-  repr.base.type = htole16 (mem->m_type);
-  repr.base.operands = htole32 (emit_insn_operands (mem));
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_SIGNAL);
+  repr.base.opcode = lendian16 (mem->m_opcode);
+  repr.base.type = lendian16 (mem->m_type);
+  repr.base.operands = lendian32 (emit_insn_operands (mem));
 
   repr.memoryOrder = mem->m_memoryorder;
   repr.signalOperation = mem->m_atomicop;
@@ -1251,11 +1328,11 @@ emit_atomic_insn (hsa_insn_atomic *mem)
      random stuff (which we do not want so that we can test things don't
      change).  */
   memset (&repr, 0, sizeof (repr));
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_ATOMIC);
-  repr.base.opcode = htole16 (mem->m_opcode);
-  repr.base.type = htole16 (mem->m_type);
-  repr.base.operands = htole32 (emit_insn_operands (mem));
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_ATOMIC);
+  repr.base.opcode = lendian16 (mem->m_opcode);
+  repr.base.type = lendian16 (mem->m_type);
+  repr.base.operands = lendian32 (emit_insn_operands (mem));
 
   if (addr->m_symbol)
     repr.segment = addr->m_symbol->m_segment;
@@ -1279,11 +1356,11 @@ emit_addr_insn (hsa_insn_basic *insn)
 
   hsa_op_address *addr = as_a <hsa_op_address *> (insn->get_op (1));
 
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_ADDR);
-  repr.base.opcode = htole16 (insn->m_opcode);
-  repr.base.type = htole16 (insn->m_type);
-  repr.base.operands = htole32 (emit_insn_operands (insn));
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_ADDR);
+  repr.base.opcode = lendian16 (insn->m_opcode);
+  repr.base.type = lendian16 (insn->m_type);
+  repr.base.operands = lendian32 (emit_insn_operands (insn));
 
   if (addr->m_symbol)
     repr.segment = addr->m_symbol->m_segment;
@@ -1303,12 +1380,12 @@ emit_segment_insn (hsa_insn_seg *seg)
 {
   struct BrigInstSegCvt repr;
 
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_SEG_CVT);
-  repr.base.opcode = htole16 (seg->m_opcode);
-  repr.base.type = htole16 (seg->m_type);
-  repr.base.operands = htole32 (emit_insn_operands (seg));
-  repr.sourceType = htole16 (as_a <hsa_op_reg *> (seg->get_op (1))->m_type);
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_SEG_CVT);
+  repr.base.opcode = lendian16 (seg->m_opcode);
+  repr.base.type = lendian16 (seg->m_type);
+  repr.base.operands = lendian32 (emit_insn_operands (seg));
+  repr.sourceType = lendian16 (as_a <hsa_op_reg *> (seg->get_op (1))->m_type);
   repr.segment = seg->m_segment;
   repr.modifier = 0;
 
@@ -1331,11 +1408,11 @@ emit_alloca_insn (hsa_insn_alloca *alloca)
      random stuff (which we do not want so that we can test things don't
      change).  */
   memset (&repr, 0, sizeof (repr));
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_MEM);
-  repr.base.opcode = htole16 (alloca->m_opcode);
-  repr.base.type = htole16 (alloca->m_type);
-  repr.base.operands = htole32 (emit_insn_operands (alloca));
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_MEM);
+  repr.base.opcode = lendian16 (alloca->m_opcode);
+  repr.base.type = lendian16 (alloca->m_type);
+  repr.base.operands = lendian32 (emit_insn_operands (alloca));
   repr.segment = BRIG_SEGMENT_PRIVATE;
   repr.modifier = 0;
   repr.equivClass = 0;
@@ -1355,17 +1432,18 @@ emit_cmp_insn (hsa_insn_cmp *cmp)
   struct BrigInstCmp repr;
 
   memset (&repr, 0, sizeof (repr));
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_CMP);
-  repr.base.opcode = htole16 (cmp->m_opcode);
-  repr.base.type = htole16 (cmp->m_type);
-  repr.base.operands = htole32 (emit_insn_operands (cmp));
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_CMP);
+  repr.base.opcode = lendian16 (cmp->m_opcode);
+  repr.base.type = lendian16 (cmp->m_type);
+  repr.base.operands = lendian32 (emit_insn_operands (cmp));
 
   if (is_a <hsa_op_reg *> (cmp->get_op (1)))
-    repr.sourceType = htole16 (as_a <hsa_op_reg *> (cmp->get_op (1))->m_type);
+    repr.sourceType
+      = lendian16 (as_a <hsa_op_reg *> (cmp->get_op (1))->m_type);
   else
     repr.sourceType
-      = htole16 (as_a <hsa_op_immed *> (cmp->get_op (1))->m_type);
+      = lendian16 (as_a <hsa_op_immed *> (cmp->get_op (1))->m_type);
   repr.modifier = 0;
   repr.compare = cmp->m_compare;
   repr.pack = 0;
@@ -1388,12 +1466,12 @@ emit_branch_insn (hsa_insn_br *br)
 
   /* At the moment we only handle direct conditional jumps.  */
   gcc_assert (br->m_opcode == BRIG_OPCODE_CBR);
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_BR);
-  repr.base.opcode = htole16 (br->m_opcode);
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_BR);
+  repr.base.opcode = lendian16 (br->m_opcode);
   repr.width = BRIG_WIDTH_1;
   /* For Conditional jumps the type is always B1.  */
-  repr.base.type = htole16 (BRIG_TYPE_B1);
+  repr.base.type = lendian16 (BRIG_TYPE_B1);
 
   FOR_EACH_EDGE (e, ei, br->m_bb->succs)
     if (e->flags & EDGE_TRUE_VALUE)
@@ -1404,8 +1482,8 @@ emit_branch_insn (hsa_insn_br *br)
   gcc_assert (target);
 
   repr.base.operands
-    = htole32 (emit_operands (br->get_op (0),
-			      &hsa_bb_for_bb (target)->m_label_ref));
+    = lendian32 (emit_operands (br->get_op (0),
+				&hsa_bb_for_bb (target)->m_label_ref));
   memset (&repr.reserved, 0, sizeof (repr.reserved));
 
   brig_code.add (&repr, sizeof (repr));
@@ -1420,14 +1498,14 @@ emit_unconditional_jump (hsa_op_code_ref *reference)
 {
   struct BrigInstBr repr;
 
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_BR);
-  repr.base.opcode = htole16 (BRIG_OPCODE_BR);
-  repr.base.type = htole16 (BRIG_TYPE_NONE);
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_BR);
+  repr.base.opcode = lendian16 (BRIG_OPCODE_BR);
+  repr.base.type = lendian16 (BRIG_TYPE_NONE);
   /* Direct branches to labels must be width(all).  */
   repr.width = BRIG_WIDTH_ALL;
 
-  repr.base.operands = htole32 (emit_operands (reference));
+  repr.base.operands = lendian32 (emit_operands (reference));
   memset (&repr.reserved, 0, sizeof (repr.reserved));
   brig_code.add (&repr, sizeof (repr));
   brig_insn_count++;
@@ -1442,15 +1520,15 @@ emit_switch_insn (hsa_insn_sbr *sbr)
   struct BrigInstBr repr;
 
   gcc_assert (sbr->m_opcode == BRIG_OPCODE_SBR);
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_BR);
-  repr.base.opcode = htole16 (sbr->m_opcode);
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_BR);
+  repr.base.opcode = lendian16 (sbr->m_opcode);
   repr.width = BRIG_WIDTH_1;
   /* For Conditional jumps the type is always B1.  */
   hsa_op_reg *index = as_a <hsa_op_reg *> (sbr->get_op (0));
-  repr.base.type = htole16 (index->m_type);
+  repr.base.type = lendian16 (index->m_type);
   repr.base.operands
-    = htole32 (emit_operands (sbr->get_op (0), sbr->m_label_code_list));
+    = lendian32 (emit_operands (sbr->get_op (0), sbr->m_label_code_list));
   memset (&repr.reserved, 0, sizeof (repr.reserved));
 
   brig_code.add (&repr, sizeof (repr));
@@ -1470,17 +1548,17 @@ emit_cvt_insn (hsa_insn_cvt *insn)
   struct BrigInstCvt repr;
   BrigType16_t srctype;
 
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_CVT);
-  repr.base.opcode = htole16 (insn->m_opcode);
-  repr.base.type = htole16 (insn->m_type);
-  repr.base.operands = htole32 (emit_insn_operands (insn));
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_CVT);
+  repr.base.opcode = lendian16 (insn->m_opcode);
+  repr.base.type = lendian16 (insn->m_type);
+  repr.base.operands = lendian32 (emit_insn_operands (insn));
 
   if (is_a <hsa_op_reg *> (insn->get_op (1)))
     srctype = as_a <hsa_op_reg *> (insn->get_op (1))->m_type;
   else
     srctype = as_a <hsa_op_immed *> (insn->get_op (1))->m_type;
-  repr.sourceType = htole16 (srctype);
+  repr.sourceType = lendian16 (srctype);
   repr.modifier = 0;
   /* float to smaller float requires a rounding setting (we default
      to 'near'.  */
@@ -1506,14 +1584,14 @@ emit_call_insn (hsa_insn_call *call)
 {
   struct BrigInstBr repr;
 
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_BR);
-  repr.base.opcode = htole16 (BRIG_OPCODE_CALL);
-  repr.base.type = htole16 (BRIG_TYPE_NONE);
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_BR);
+  repr.base.opcode = lendian16 (BRIG_OPCODE_CALL);
+  repr.base.type = lendian16 (BRIG_TYPE_NONE);
 
   repr.base.operands
-    = htole32 (emit_operands (call->m_result_code_list, &call->m_func,
-			      call->m_args_code_list));
+    = lendian32 (emit_operands (call->m_result_code_list, &call->m_func,
+				call->m_args_code_list));
 
   /* Internal functions have not set m_called_function.  */
   if (call->m_called_function)
@@ -1548,23 +1626,23 @@ emit_arg_block_insn (hsa_insn_arg_block *insn)
     case BRIG_KIND_DIRECTIVE_ARG_BLOCK_START:
       {
 	struct BrigDirectiveArgBlock repr;
-	repr.base.byteCount = htole16 (sizeof (repr));
-	repr.base.kind = htole16 (insn->m_kind);
+	repr.base.byteCount = lendian16 (sizeof (repr));
+	repr.base.kind = lendian16 (insn->m_kind);
 	brig_code.add (&repr, sizeof (repr));
 
 	for (unsigned i = 0; i < insn->m_call_insn->m_input_args.length (); i++)
 	  {
 	    insn->m_call_insn->m_args_code_list->m_offsets[i]
-	      = htole32 (emit_directive_variable
-			 (insn->m_call_insn->m_input_args[i]));
+	      = lendian32 (emit_directive_variable
+			   (insn->m_call_insn->m_input_args[i]));
 	    brig_insn_count++;
 	  }
 
 	if (insn->m_call_insn->m_output_arg)
 	  {
 	    insn->m_call_insn->m_result_code_list->m_offsets[0]
-	      = htole32 (emit_directive_variable
-			 (insn->m_call_insn->m_output_arg));
+	      = lendian32 (emit_directive_variable
+			   (insn->m_call_insn->m_output_arg));
 	    brig_insn_count++;
 	  }
 
@@ -1573,8 +1651,8 @@ emit_arg_block_insn (hsa_insn_arg_block *insn)
     case BRIG_KIND_DIRECTIVE_ARG_BLOCK_END:
       {
 	struct BrigDirectiveArgBlock repr;
-	repr.base.byteCount = htole16 (sizeof (repr));
-	repr.base.kind = htole16 (insn->m_kind);
+	repr.base.byteCount = lendian16 (sizeof (repr));
+	repr.base.kind = lendian16 (insn->m_kind);
 	brig_code.add (&repr, sizeof (repr));
 	break;
       }
@@ -1593,8 +1671,8 @@ emit_comment_insn (hsa_insn_comment *insn)
   struct BrigDirectiveComment repr;
   memset (&repr, 0, sizeof (repr));
 
-  repr.base.byteCount = htole16 (sizeof (repr));
-  repr.base.kind = htole16 (insn->m_opcode);
+  repr.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.kind = lendian16 (insn->m_opcode);
   repr.name = brig_emit_string (insn->m_comment, '\0', false);
   brig_code.add (&repr, sizeof (repr));
 }
@@ -1607,13 +1685,13 @@ emit_queue_insn (hsa_insn_queue *insn)
   BrigInstQueue repr;
   memset (&repr, 0, sizeof (repr));
 
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_QUEUE);
-  repr.base.opcode = htole16 (insn->m_opcode);
-  repr.base.type = htole16 (insn->m_type);
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_QUEUE);
+  repr.base.opcode = lendian16 (insn->m_opcode);
+  repr.base.type = lendian16 (insn->m_type);
   repr.segment = BRIG_SEGMENT_GLOBAL;
   repr.memoryOrder = BRIG_MEMORY_ORDER_SC_RELEASE;
-  repr.base.operands = htole32 (emit_insn_operands (insn));
+  repr.base.operands = lendian32 (emit_insn_operands (insn));
   brig_data.round_size_up (4);
   brig_code.add (&repr, sizeof (repr));
 
@@ -1631,13 +1709,13 @@ emit_srctype_insn (hsa_insn_srctype *insn)
   gcc_checking_assert (operand_count >= 2);
 
   memset (&repr, 0, sizeof (repr));
-  repr.sourceType = htole16 (insn->m_source_type);
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_SOURCE_TYPE);
-  repr.base.opcode = htole16 (insn->m_opcode);
-  repr.base.type = htole16 (insn->m_type);
+  repr.sourceType = lendian16 (insn->m_source_type);
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_SOURCE_TYPE);
+  repr.base.opcode = lendian16 (insn->m_opcode);
+  repr.base.type = lendian16 (insn->m_type);
 
-  repr.base.operands = htole32 (emit_insn_operands (insn));
+  repr.base.operands = lendian32 (emit_insn_operands (insn));
   brig_code.add (&repr, sizeof (struct BrigInstSourceType));
   brig_insn_count++;
 }
@@ -1653,11 +1731,11 @@ emit_packed_insn (hsa_insn_packed *insn)
   gcc_checking_assert (operand_count >= 2);
 
   memset (&repr, 0, sizeof (repr));
-  repr.sourceType = htole16 (insn->m_source_type);
-  repr.base.base.byteCount = htole16 (sizeof (repr));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_SOURCE_TYPE);
-  repr.base.opcode = htole16 (insn->m_opcode);
-  repr.base.type = htole16 (insn->m_type);
+  repr.sourceType = lendian16 (insn->m_source_type);
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_SOURCE_TYPE);
+  repr.base.opcode = lendian16 (insn->m_opcode);
+  repr.base.type = lendian16 (insn->m_type);
 
   if (insn->m_opcode == BRIG_OPCODE_COMBINE)
     {
@@ -1666,11 +1744,11 @@ emit_packed_insn (hsa_insn_packed *insn)
 	{
 	  gcc_checking_assert (insn->get_op (i));
 	  insn->m_operand_list->m_offsets[i - 1]
-	    = htole32 (enqueue_op (insn->get_op (i)));
+	    = lendian32 (enqueue_op (insn->get_op (i)));
 	}
 
-      repr.base.operands = htole32 (emit_operands (insn->get_op (0),
-						   insn->m_operand_list));
+      repr.base.operands = lendian32 (emit_operands (insn->get_op (0),
+						     insn->m_operand_list));
     }
   else if (insn->m_opcode == BRIG_OPCODE_EXPAND)
     {
@@ -1679,12 +1757,12 @@ emit_packed_insn (hsa_insn_packed *insn)
 	{
 	  gcc_checking_assert (insn->get_op (i));
 	  insn->m_operand_list->m_offsets[i]
-	    = htole32 (enqueue_op (insn->get_op (i)));
+	    = lendian32 (enqueue_op (insn->get_op (i)));
 	}
 
       unsigned ops = emit_operands (insn->m_operand_list,
 				    insn->get_op (insn->operand_count () - 1));
-      repr.base.operands = htole32 (ops);
+      repr.base.operands = lendian32 (ops);
     }
 
 
@@ -1703,9 +1781,9 @@ emit_basic_insn (hsa_insn_basic *insn)
   BrigType16_t type;
 
   memset (&repr, 0, sizeof (repr));
-  repr.base.base.byteCount = htole16 (sizeof (BrigInstBasic));
-  repr.base.base.kind = htole16 (BRIG_KIND_INST_BASIC);
-  repr.base.opcode = htole16 (insn->m_opcode);
+  repr.base.base.byteCount = lendian16 (sizeof (BrigInstBasic));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_BASIC);
+  repr.base.opcode = lendian16 (insn->m_opcode);
   switch (insn->m_opcode)
     {
       /* And the bit-logical operations need bit types and whine about
@@ -1720,8 +1798,8 @@ emit_basic_insn (hsa_insn_basic *insn)
 	type = insn->m_type;
 	break;
     }
-  repr.base.type = htole16 (type);
-  repr.base.operands = htole32 (emit_insn_operands (insn));
+  repr.base.type = lendian16 (type);
+  repr.base.operands = lendian32 (emit_insn_operands (insn));
 
   if ((type & BRIG_TYPE_PACK_MASK) != BRIG_TYPE_PACK_NONE)
     {
@@ -1736,8 +1814,8 @@ emit_basic_insn (hsa_insn_basic *insn)
       else
 	repr.pack = BRIG_PACK_P;
       repr.reserved = 0;
-      repr.base.base.byteCount = htole16 (sizeof (BrigInstMod));
-      repr.base.base.kind = htole16 (BRIG_KIND_INST_MOD);
+      repr.base.base.byteCount = lendian16 (sizeof (BrigInstMod));
+      repr.base.base.kind = lendian16 (BRIG_KIND_INST_MOD);
       brig_code.add (&repr, sizeof (struct BrigInstMod));
     }
   else
@@ -2379,7 +2457,7 @@ hsa_output_brig (void)
       BrigOperandCodeRef *code_ref
 	= (BrigOperandCodeRef *) (brig_operand.get_ptr_by_offset (p.offset));
       gcc_assert (code_ref->base.kind == BRIG_KIND_OPERAND_CODE_REF);
-      code_ref->ref = htole32 (*func_offset);
+      code_ref->ref = lendian32 (*func_offset);
     }
 
   /* Iterate all function declarations and if we meet a function that should
@@ -2419,8 +2497,8 @@ hsa_output_brig (void)
   BrigModuleHeader module_header;
   memcpy (&module_header.identification, "HSA BRIG",
 	  sizeof (module_header.identification));
-  module_header.brigMajor = htole32 (BRIG_VERSION_BRIG_MAJOR);
-  module_header.brigMinor = htole32 (BRIG_VERSION_BRIG_MINOR);
+  module_header.brigMajor = lendian32 (BRIG_VERSION_BRIG_MAJOR);
+  module_header.brigMinor = lendian32 (BRIG_VERSION_BRIG_MINOR);
   uint64_t section_index[3];
 
   int data_padding, code_padding, operand_padding;
@@ -2440,18 +2518,18 @@ hsa_output_brig (void)
     + brig_operand.total_size
     + operand_padding;
   gcc_assert ((module_size % 16) == 0);
-  module_header.byteCount = htole64 (module_size);
+  module_header.byteCount = lendian64 (module_size);
   memset (&module_header.hash, 0, sizeof (module_header.hash));
   module_header.reserved = 0;
-  module_header.sectionCount = htole32 (3);
-  module_header.sectionIndex = htole64 (sizeof (module_header));
+  module_header.sectionCount = lendian32 (3);
+  module_header.sectionIndex = lendian64 (sizeof (module_header));
   assemble_string ((const char *) &module_header, sizeof (module_header));
   uint64_t off = sizeof (module_header) + sizeof (section_index);
-  section_index[0] = htole64 (off);
+  section_index[0] = lendian64 (off);
   off += brig_data.total_size + data_padding;
-  section_index[1] = htole64 (off);
+  section_index[1] = lendian64 (off);
   off += brig_code.total_size + code_padding;
-  section_index[2] = htole64 (off);
+  section_index[2] = lendian64 (off);
   assemble_string ((const char *) &section_index, sizeof (section_index));
 
   char padding[HSA_SECTION_ALIGNMENT];
-- 
2.6.4


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 09/10] Majority of the HSA back-end
  2016-01-18 15:33       ` Martin Jambor
@ 2016-01-18 15:48         ` Jakub Jelinek
  0 siblings, 0 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-18 15:48 UTC (permalink / raw)
  To: gcc-patches

Hi!

PDP endian is
      gcc_assert (!BYTES_BIG_ENDIAN);
      gcc_assert (WORDS_BIG_ENDIAN);
and 16-bit words, thus within uint16_t it is little endian, and the
16-bit words are ordered in larger units in big endian order.

> +#else
> +  val  = ((val & 0xff00ff00) >> 8) | ((val & 0xff00ff) << 8);

Too many spaces before =?

> +  return (val >> 16) | (val << 16);
> +#endif
> +#endif
> +}
> +
> +/* Convert VAL to little endian form, if necessary.  */
> +
> +static uint64_t
> +lendian64 (uint64_t val)
> +{
> +#if GCC_VERSION >= 4006
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> +  return val;
> +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> +  return __builtin_bswap64 (val);
> +#else  /* __ORDER_PDP_ENDIAN__ */
> +  return (((val & 0xffff) << 48)
> +	  | ((val & 0xffff0000) << 16)
> +	  | ((val & 0xffff00000000) >> 16)
> +	  | ((val & 0xffff000000000000) >> 48));

You are missing ll suffixes on the large constants.

That said, PDP endian host will not work with your patch if the system
compiler is not GCC >= 4.6, and most likely you are relying on __CHAR_BIT__
== 8 on the host too.  Guess it can be handled incrementally though.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 00/10] Merge of HSA branch
  2016-01-13 17:55 [hsa merge 00/10] Merge of HSA branch Martin Jambor
                   ` (9 preceding siblings ...)
  2016-01-13 18:17 ` [hsa merge 05/10] OpenMP lowering/expansion changes (gridification) Martin Jambor
@ 2016-01-19 10:45 ` Martin Jambor
  2016-01-19 14:25   ` Richard Biener
  10 siblings, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-19 10:45 UTC (permalink / raw)
  To: GCC Patches

Hi,

On Wed, Jan 13, 2016 at 06:39:25PM +0100, Martin Jambor wrote:
> Hi,
> 
> this is hopefully the last big re-post of the HSA patches...

I have committed the combined patch as revision 232549 after
bootstrapping and testing all languages on x86_64-linux and i686-linux
and verifying I did not break powerpc-aix more than it was before.

I will be updating gcc offloading wiki in a few days, meanwhile you
can use README.hsa file from the branch:

https://gcc.gnu.org/viewcvs/gcc/branches/hsa/gcc/README.hsa?view=markup

I will be also posting followup testsuite patches.

> 
> Thanks everybody for patience and feedback.  While we are of course
> opened for mor more of it, let's also hope the approval process will
> finish soon as it should now.

I can't but repeat my thanks, especially to Jakub for the review and
help with the many last-minute issues.

Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 00/10] Merge of HSA branch
  2016-01-19 10:45 ` [hsa merge 00/10] Merge of HSA branch Martin Jambor
@ 2016-01-19 14:25   ` Richard Biener
  2016-01-21 23:10     ` Gerald Pfeifer
  0 siblings, 1 reply; 77+ messages in thread
From: Richard Biener @ 2016-01-19 14:25 UTC (permalink / raw)
  To: GCC Patches

On Tue, Jan 19, 2016 at 11:45 AM, Martin Jambor <mjambor@suse.cz> wrote:
> Hi,
>
> On Wed, Jan 13, 2016 at 06:39:25PM +0100, Martin Jambor wrote:
>> Hi,
>>
>> this is hopefully the last big re-post of the HSA patches...
>
> I have committed the combined patch as revision 232549 after
> bootstrapping and testing all languages on x86_64-linux and i686-linux
> and verifying I did not break powerpc-aix more than it was before.
>
> I will be updating gcc offloading wiki in a few days, meanwhile you
> can use README.hsa file from the branch:
>
> https://gcc.gnu.org/viewcvs/gcc/branches/hsa/gcc/README.hsa?view=markup
>
> I will be also posting followup testsuite patches.
>
>>
>> Thanks everybody for patience and feedback.  While we are of course
>> opened for mor more of it, let's also hope the approval process will
>> finish soon as it should now.
>
> I can't but repeat my thanks, especially to Jakub for the review and
> help with the many last-minute issues.

I think the merge warrants a NEWS entry on gcc.gnu.org/

Richard.

> Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 02/10] Modifications to libgomp proper
  2016-01-13 17:55 ` [hsa merge 02/10] Modifications to libgomp proper Martin Jambor
@ 2016-01-20 13:37   ` Ilya Verbin
  2016-01-20 14:48   ` Ilya Verbin
  1 sibling, 0 replies; 77+ messages in thread
From: Ilya Verbin @ 2016-01-20 13:37 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches, Jakub Jelinek

On Wed, Jan 13, 2016 at 18:39:27 +0100, Martin Jambor wrote:
> diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> index 68f7b2c..58ef595 100644
> --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> @@ -528,7 +528,7 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst_ptr, const void *src_ptr,
>  
>  extern "C" void
>  GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
> -			void *async_data)
> +			void **, void *async_data)
>  {
>    TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p, async_data = %p)", device,
>  	 tgt_fn, tgt_vars, async_data);
> @@ -544,7 +544,7 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
>  }
>  
>  extern "C" void
> -GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars)
> +GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars, void **)
>  {
>    TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p)", device, tgt_fn, tgt_vars);

This breaks GOMP_OFFLOAD_run.  Committed as obvious.


2016-01-20  Ilya Verbin  <ilya.verbin@intel.com>

liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_run): Pass extra NULL
	to GOMP_OFFLOAD_async_run.


diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 58ef595..57accb4 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -548,5 +548,5 @@ GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars, void **)
 {
   TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p)", device, tgt_fn, tgt_vars);
 
-  GOMP_OFFLOAD_async_run (device, tgt_fn, tgt_vars, NULL);
+  GOMP_OFFLOAD_async_run (device, tgt_fn, tgt_vars, NULL, NULL);
 }


  -- Ilya

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 02/10] Modifications to libgomp proper
  2016-01-13 17:55 ` [hsa merge 02/10] Modifications to libgomp proper Martin Jambor
  2016-01-20 13:37   ` Ilya Verbin
@ 2016-01-20 14:48   ` Ilya Verbin
  2016-01-20 14:55     ` Jakub Jelinek
  1 sibling, 1 reply; 77+ messages in thread
From: Ilya Verbin @ 2016-01-20 14:48 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches, Jakub Jelinek

On Wed, Jan 13, 2016 at 18:39:27 +0100, Martin Jambor wrote:
> 	* task.c (GOMP_PLUGIN_target_task_completion): Free
> 	firstprivate_copies.

Also this change caused 3 fails on intelmicemul:

FAIL: libgomp.c/target-32.c execution test
FAIL: libgomp.c/target-33.c execution test
FAIL: libgomp.c/target-34.c execution test

Because ttask->firstprivate_copies is uninitialized for !GOMP_OFFLOAD_CAP_SHARED_MEM.

(gdb) p ttask->firstprivate_copies
$1 = (void *) 0x1
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x0000003b076800dc in free () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003b076800dc in free () from /lib64/libc.so.6
#1  0x00007ffff7dda871 in GOMP_PLUGIN_target_task_completion (data=0x624ac0) at gcc/libgomp/task.c:585
[...]


OK for trunk?

libgomp/
	* task.c (gomp_create_target_task): Set firstprivate_copies to NULL.

diff --git a/libgomp/task.c b/libgomp/task.c
index 0f45c44..38d4e9b 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -683,6 +683,7 @@ gomp_create_target_task (struct gomp_device_descr *devicep,
   ttask->state = state;
   ttask->task = task;
   ttask->team = team;
+  ttask->firstprivate_copies = NULL;
   task->fn = NULL;
   task->fn_data = ttask;
   task->final_task = 0;

  -- Ilya

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 02/10] Modifications to libgomp proper
  2016-01-20 14:48   ` Ilya Verbin
@ 2016-01-20 14:55     ` Jakub Jelinek
  0 siblings, 0 replies; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-20 14:55 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Martin Jambor, GCC Patches

On Wed, Jan 20, 2016 at 05:47:59PM +0300, Ilya Verbin wrote:
> OK for trunk?
> 
> libgomp/
> 	* task.c (gomp_create_target_task): Set firstprivate_copies to NULL.
> 
> diff --git a/libgomp/task.c b/libgomp/task.c
> index 0f45c44..38d4e9b 100644
> --- a/libgomp/task.c
> +++ b/libgomp/task.c
> @@ -683,6 +683,7 @@ gomp_create_target_task (struct gomp_device_descr *devicep,
>    ttask->state = state;
>    ttask->task = task;
>    ttask->team = team;
> +  ttask->firstprivate_copies = NULL;
>    task->fn = NULL;
>    task->fn_data = ttask;
>    task->final_task = 0;

Ok (though, eventually I'd prefer if free (ttask->firstprivate_copies) is
only performed for the shared mem async tasks and not other one.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-15 18:06                 ` Ilya Verbin
@ 2016-01-20 18:54                   ` Ilya Verbin
  2016-01-21 11:39                     ` Alexander Monakov
  2016-01-22 10:19                     ` Jakub Jelinek
  0 siblings, 2 replies; 77+ messages in thread
From: Ilya Verbin @ 2016-01-20 18:54 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: gcc-patches, Martin Jambor, Alexander Monakov, Thomas Schwinge,
	Kirill Yukhin

On Fri, Jan 15, 2016 at 21:05:47 +0300, Ilya Verbin wrote:
> On Fri, Jan 15, 2016 at 17:45:22 +0100, Jakub Jelinek wrote:
> > On Fri, Jan 15, 2016 at 07:38:14PM +0300, Ilya Verbin wrote:
> > > On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> > > > On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > > > > How do other accelerators cope with the situation when half of the
> > > > > application is compiled with the accelerator disabled?  (Would some of
> > > > > their calls to GOMP_target_ext lead to abort?)
> > > > 
> > > > GOMP_target_ext should never abort (unless internal error), worst case it
> > > > just falls back into the host fallback.
> > > 
> > > Wouldn't that lead to hard-to-find problems in case of nonshared memory?
> > > I mean when someone expects that all target regions are executed on the device,
> > > but in fact some of them are silently executed on the host with different data
> > > environment.
> > 
> > E.g. for HSA it really shouldn't matter, as it is shared memory accelerator.
> > For XeonPhi we hopefully can offload anything.
> 
> As you said, if compilation of target image fails with ICE or somehow, host
> fallback and offloading to other targets should still work:
> https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00951.html
> That patch was not applied, but it can be simulated by -foffload=disable,

I agree that OpenMP doesn't guarantee that all target regions must be executed
on the device, but in this case a user can't be sure that some library function
always will offload (because the library might be replaced by fallback version),
and he/she will have to write something like:

{
  map_data_to_target ();
  some_library1_fn_with_offload ();
  get_data_from_target ();   /* ! */
  send_data_to_target ();    /* ! */
  some_library2_fn_with_offload ();
  get_data_from_target ();   /* ! */
  send_data_to_target ();    /* ! */
  some_library3_fn_with_offload ();
  unmap_data_from_target ();
}

If you're OK with this, I'll install this patch:


libgomp/
	* target.c (gomp_get_target_fn_addr): Allow host fallback if target
	function wasn't mapped to the device with non-shared memory.

diff --git a/libgomp/target.c b/libgomp/target.c
index f1f5849..96fe3d5 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1436,12 +1436,7 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
       splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
       gomp_mutex_unlock (&devicep->lock);
       if (tgt_fn == NULL)
-	{
-	  if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
-	    return NULL;
-	  else
-	    gomp_fatal ("Target function wasn't mapped");
-	}
+	return NULL;
 
       return (void *) tgt_fn->tgt_offset;
     }

  -- Ilya

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-20 18:54                   ` Ilya Verbin
@ 2016-01-21 11:39                     ` Alexander Monakov
  2016-01-22 10:19                     ` Jakub Jelinek
  1 sibling, 0 replies; 77+ messages in thread
From: Alexander Monakov @ 2016-01-21 11:39 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Jakub Jelinek, gcc-patches, Martin Jambor, Thomas Schwinge,
	Kirill Yukhin

On Wed, 20 Jan 2016, Ilya Verbin wrote:
> I agree that OpenMP doesn't guarantee that all target regions must be executed
> on the device, but in this case a user can't be sure that some library function
> always will offload (because the library might be replaced by fallback version),
> and he/she will have to write something like:

I think there should be a way to allow the OpenMP runtime deduce what data
needs to be resynced on target region entries/exits in presence of fallback
execution; explicit copying via map(from/to:...) is a too big hammer for that.
I wonder if it was discussed.

It would be nice to be able to apply the idea of "debug counters" to target
region offloading in order to automatically bisect offload miscompilations:
force fallback execution for target region entries for Nth and next
executions; bisect by N to find the first incorrectly executing offload
region.  If the implementation cannot count on source program fully handling
arbitrary fallbacks, this idea doesn't work in general.

Alexander

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 00/10] Merge of HSA branch
  2016-01-19 14:25   ` Richard Biener
@ 2016-01-21 23:10     ` Gerald Pfeifer
  2016-01-27 14:20       ` Martin Jambor
  0 siblings, 1 reply; 77+ messages in thread
From: Gerald Pfeifer @ 2016-01-21 23:10 UTC (permalink / raw)
  To: Martin Jambor, Richard Biener; +Cc: GCC Patches

On Tue, 19 Jan 2016, Richard Biener wrote:
> I think the merge warrants a NEWS entry on gcc.gnu.org/

...and gcc-6/changes.html. :-)

Martin, happy to help.  Want to propose some text (or even patch)?

Gerald

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-20 18:54                   ` Ilya Verbin
  2016-01-21 11:39                     ` Alexander Monakov
@ 2016-01-22 10:19                     ` Jakub Jelinek
  2016-01-28 11:36                       ` Partial Offloading (was: [hsa merge 07/10] IPA-HSA pass) Thomas Schwinge
  1 sibling, 1 reply; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-22 10:19 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: gcc-patches, Martin Jambor, Alexander Monakov, Thomas Schwinge,
	Kirill Yukhin

On Wed, Jan 20, 2016 at 09:53:30PM +0300, Ilya Verbin wrote:
> If you're OK with this, I'll install this patch:
> 
> 
> libgomp/
> 	* target.c (gomp_get_target_fn_addr): Allow host fallback if target
> 	function wasn't mapped to the device with non-shared memory.

Ok, thanks.

> diff --git a/libgomp/target.c b/libgomp/target.c
> index f1f5849..96fe3d5 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1436,12 +1436,7 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
>        splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
>        gomp_mutex_unlock (&devicep->lock);
>        if (tgt_fn == NULL)
> -	{
> -	  if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> -	    return NULL;
> -	  else
> -	    gomp_fatal ("Target function wasn't mapped");
> -	}
> +	return NULL;
>  
>        return (void *) tgt_fn->tgt_offset;
>      }
> 
>   -- Ilya

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-16 10:00       ` Jan Hubicka
@ 2016-01-25 15:22         ` Martin Liška
  2016-01-25 15:24           ` Jakub Jelinek
  0 siblings, 1 reply; 77+ messages in thread
From: Martin Liška @ 2016-01-25 15:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener, Jakub Jelinek, Jan Hubicka, Martin Jambor

[-- Attachment #1: Type: text/plain, Size: 311 bytes --]

On 01/16/2016 11:00 AM, Jan Hubicka wrote:
> Can't it be represented via explicit REF_ADDR or something like that?
> 
> Honza

Hi.

Sure, I've just done a patch that can do that. However, as we're currently in stage4,
that change would probably require explicit permission of a release manager?

Thanks,
Martin

[-- Attachment #2: 0001-HSA-simplify-partitioning-of-HSA-kernels-and-host-im.patch --]
[-- Type: text/x-patch, Size: 2520 bytes --]

From 9639fff94d043c55b55bfb12bb086032db565f0a Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 25 Jan 2016 16:11:00 +0100
Subject: [PATCH] HSA: simplify partitioning of HSA kernels and host impls.

gcc/lto/ChangeLog:

2016-01-25  Martin Liska  <mliska@suse.cz>

	* lto-partition.c (add_symbol_to_partition_1): Remove usage
	of hsa_summaries.

gcc/ChangeLog:

2016-01-25  Martin Liska  <mliska@suse.cz>

	* hsa.c (hsa_summary_t::link_functions): Create IPA_REF_ADDR
	reference for an HSA kernel and its host function.
---
 gcc/hsa.c               |  5 +++++
 gcc/lto/lto-partition.c | 19 -------------------
 2 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/gcc/hsa.c b/gcc/hsa.c
index ec23f81..f0b3205 100644
--- a/gcc/hsa.c
+++ b/gcc/hsa.c
@@ -781,6 +781,11 @@ hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
   TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
   TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
   DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
+
+  /* Create reference between a kernel and a corresponding host implementation
+     to quarantee LTO streaming to a same LTRANS.  */
+  if (kind == HSA_KERNEL)
+    gpu->create_reference (host, IPA_REF_ADDR);
 }
 
 /* Add a HOST function to HSA summaries.  */
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index eb28fed..9eb63c2 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -34,7 +34,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "ipa-inline.h"
 #include "lto-partition.h"
-#include "hsa.h"
 
 vec<ltrans_partition> ltrans_partitions;
 
@@ -171,24 +170,6 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
 	 Therefore put it into the same partition.  */
       if (cnode->instrumented_version)
 	add_symbol_to_partition_1 (part, cnode->instrumented_version);
-
-      /* Add an HSA associated with the symbol.  */
-      if (hsa_summaries != NULL)
-	{
-	  hsa_function_summary *s = hsa_summaries->get (cnode);
-	  if (s->m_kind == HSA_KERNEL)
-	    {
-	      /* Add binded function.  */
-	      bool added = add_symbol_to_partition_1 (part,
-						      s->m_binded_function);
-	      gcc_assert (added);
-	      if (symtab->dump_file)
-		fprintf (symtab->dump_file,
-			 "adding an HSA function (host/gpu) to the "
-			 "partition: %s\n",
-			 s->m_binded_function->name ());
-	    }
-	}
     }
 
   add_references_to_partition (part, node);
-- 
2.7.0


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-25 15:22         ` Martin Liška
@ 2016-01-25 15:24           ` Jakub Jelinek
  2016-01-25 23:41             ` Jan Hubicka
  0 siblings, 1 reply; 77+ messages in thread
From: Jakub Jelinek @ 2016-01-25 15:24 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches, Richard Biener, Jan Hubicka, Martin Jambor

On Mon, Jan 25, 2016 at 04:21:50PM +0100, Martin Liška wrote:
> On 01/16/2016 11:00 AM, Jan Hubicka wrote:
> > Can't it be represented via explicit REF_ADDR or something like that?
> > 
> > Honza
> 
> Hi.
> 
> Sure, I've just done a patch that can do that. However, as we're currently in stage4,
> that change would probably require explicit permission of a release manager?

If Honza is fine with it and you've tested it, this is ok for trunk.

> >From 9639fff94d043c55b55bfb12bb086032db565f0a Mon Sep 17 00:00:00 2001
> From: marxin <mliska@suse.cz>
> Date: Mon, 25 Jan 2016 16:11:00 +0100
> Subject: [PATCH] HSA: simplify partitioning of HSA kernels and host impls.
> 
> gcc/lto/ChangeLog:
> 
> 2016-01-25  Martin Liska  <mliska@suse.cz>
> 
> 	* lto-partition.c (add_symbol_to_partition_1): Remove usage
> 	of hsa_summaries.
> 
> gcc/ChangeLog:
> 
> 2016-01-25  Martin Liska  <mliska@suse.cz>
> 
> 	* hsa.c (hsa_summary_t::link_functions): Create IPA_REF_ADDR
> 	reference for an HSA kernel and its host function.
> ---
>  gcc/hsa.c               |  5 +++++
>  gcc/lto/lto-partition.c | 19 -------------------
>  2 files changed, 5 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/hsa.c b/gcc/hsa.c
> index ec23f81..f0b3205 100644
> --- a/gcc/hsa.c
> +++ b/gcc/hsa.c
> @@ -781,6 +781,11 @@ hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
>    TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
>    TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
>    DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
> +
> +  /* Create reference between a kernel and a corresponding host implementation
> +     to quarantee LTO streaming to a same LTRANS.  */
> +  if (kind == HSA_KERNEL)
> +    gpu->create_reference (host, IPA_REF_ADDR);
>  }
>  
>  /* Add a HOST function to HSA summaries.  */
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index eb28fed..9eb63c2 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -34,7 +34,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-prop.h"
>  #include "ipa-inline.h"
>  #include "lto-partition.h"
> -#include "hsa.h"
>  
>  vec<ltrans_partition> ltrans_partitions;
>  
> @@ -171,24 +170,6 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
>  	 Therefore put it into the same partition.  */
>        if (cnode->instrumented_version)
>  	add_symbol_to_partition_1 (part, cnode->instrumented_version);
> -
> -      /* Add an HSA associated with the symbol.  */
> -      if (hsa_summaries != NULL)
> -	{
> -	  hsa_function_summary *s = hsa_summaries->get (cnode);
> -	  if (s->m_kind == HSA_KERNEL)
> -	    {
> -	      /* Add binded function.  */
> -	      bool added = add_symbol_to_partition_1 (part,
> -						      s->m_binded_function);
> -	      gcc_assert (added);
> -	      if (symtab->dump_file)
> -		fprintf (symtab->dump_file,
> -			 "adding an HSA function (host/gpu) to the "
> -			 "partition: %s\n",
> -			 s->m_binded_function->name ());
> -	    }
> -	}
>      }
>  
>    add_references_to_partition (part, node);
> -- 
> 2.7.0
> 


	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-25 15:24           ` Jakub Jelinek
@ 2016-01-25 23:41             ` Jan Hubicka
  2016-01-27 12:38               ` Martin Liška
  0 siblings, 1 reply; 77+ messages in thread
From: Jan Hubicka @ 2016-01-25 23:41 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Martin Liška, gcc-patches, Richard Biener, Jan Hubicka,
	Martin Jambor

> On Mon, Jan 25, 2016 at 04:21:50PM +0100, Martin Liška wrote:
> > On 01/16/2016 11:00 AM, Jan Hubicka wrote:
> > > Can't it be represented via explicit REF_ADDR or something like that?
> > > 
> > > Honza
> > 
> > Hi.
> > 
> > Sure, I've just done a patch that can do that. However, as we're currently in stage4,
> > that change would probably require explicit permission of a release manager?
> 
> If Honza is fine with it and you've tested it, this is ok for trunk.

It looks fine to me.

Honza

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 07/10] IPA-HSA pass
  2016-01-25 23:41             ` Jan Hubicka
@ 2016-01-27 12:38               ` Martin Liška
  0 siblings, 0 replies; 77+ messages in thread
From: Martin Liška @ 2016-01-27 12:38 UTC (permalink / raw)
  To: Jan Hubicka, Jakub Jelinek; +Cc: gcc-patches, Richard Biener, Martin Jambor

On 01/26/2016 12:41 AM, Jan Hubicka wrote:
>> On Mon, Jan 25, 2016 at 04:21:50PM +0100, Martin Liška wrote:
>>> On 01/16/2016 11:00 AM, Jan Hubicka wrote:
>>>> Can't it be represented via explicit REF_ADDR or something like that?
>>>>
>>>> Honza
>>>
>>> Hi.
>>>
>>> Sure, I've just done a patch that can do that. However, as we're currently in stage4,
>>> that change would probably require explicit permission of a release manager?
>>
>> If Honza is fine with it and you've tested it, this is ok for trunk.
> 
> It looks fine to me.
> 
> Honza
> 

I've just bootregtested the patch on x86_64-linux-pc and I'm going to install
it to trunk.

Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 00/10] Merge of HSA branch
  2016-01-21 23:10     ` Gerald Pfeifer
@ 2016-01-27 14:20       ` Martin Jambor
  2016-01-28 15:18         ` Gerald Pfeifer
  0 siblings, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-01-27 14:20 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: Richard Biener, GCC Patches

Hi,

sorry for getting so late to this:

On Thu, Jan 21, 2016 at 05:10:17PM -0600, Gerald Pfeifer wrote:
> On Tue, 19 Jan 2016, Richard Biener wrote:
> > I think the merge warrants a NEWS entry on gcc.gnu.org/
> 
> ...and gcc-6/changes.html. :-)
> 
> Martin, happy to help.  Want to propose some text (or even patch)?
> 

So what would you think about the following?  Perhaps it is too
verbose but I wanted to mention the few areas users should know have
changed, if they happen to try HSA out.  I can certainly cut it down a
bit.

Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.52
diff -u -r1.52 changes.html
--- changes.html	25 Jan 2016 15:09:55 -0000	1.52
+++ changes.html	27 Jan 2016 14:15:49 -0000
@@ -272,6 +272,30 @@
    </ul>
 <!-- <h3 id="avr">AVR</h3> -->
 
+<h3 id="hsa">Heterogeneous Systems Architecture</h3>
+   <ul>
+     <li><p>GCC can now generate HSAIL for simple OpenMP device constructs
+       if configured with <code>--enable-offload-targets=hsa</code>.  A new
+       libgomp plugin then run these HSAIL kernels implementing these
+       constructs on HSA capable GPUs via standard HSA run-time.</p>
+       
+       <p>If the HSA compilation back-end determines it cannot output HSAIL
+       for a particular input, it gives a warning by default.  These
+       warnings can be suppressed with <code>-Wno-hsa</code>.  To give a
+       few examples, the HSA back-end does not implement compilation of
+       code using function pointers and variable-sized variables and
+       parameters, functions with variadic arguments as well as a number of
+       other less common programming constructs.</p>
+
+       <p>When compilation for HSA is enabled, the compiler attempts to
+       compile composite OpenMP constructs
+	 <blockquote><pre>
+#pragma omp target teams distribute parallel for</pre></blockquote>
+	 into parallel HSA GPU kernels.</p>
+     </li>
+   </ul>
+
+
 <h3 id="x86">IA-32/x86-64</h3>
    <ul>
      <li>GCC now supports the Intel CPU named Skylake with AVX-512 extensions

The change to the news on the main page might then be:

Index: index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.992
diff -u -r1.992 index.html
--- index.html	24 Jan 2016 23:54:36 -0000	1.992
+++ index.html	27 Jan 2016 14:16:25 -0000
@@ -52,6 +52,13 @@
 
 <dl class="news">
 
+<dt><span> Heterogeneous Systems Architecture support in GCC</span>
+     <span class="date">[2016-01-27]</span></dt>
+     <dd><a href="http://www.hsafoundation.com/"> Heterogeneous Systems
+     Architecture 1.0</a> <a href="https://gcc.gnu.org/gcc-6/changes.html#hsa">
+     support was added to GCC.</a>  Contributed by Martin Jambor, Martin Liška
+     and Michael Matz from SUSE.</dd>
+
 <dt><span><a href="gcc-5/">GCC 5.3</a> released</span>
     <span class="date">[2015-12-04]</span></dt>
     <dd></dd>

Any comments welcome.

Thanks,

Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Partial Offloading (was: [hsa merge 07/10] IPA-HSA pass)
  2016-01-22 10:19                     ` Jakub Jelinek
@ 2016-01-28 11:36                       ` Thomas Schwinge
  2016-02-17 16:41                         ` Ilya Verbin
  0 siblings, 1 reply; 77+ messages in thread
From: Thomas Schwinge @ 2016-01-28 11:36 UTC (permalink / raw)
  To: Jakub Jelinek, Ilya Verbin, Martin Jambor, Alexander Monakov
  Cc: gcc-patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 244 bytes --]

Hi!

> [...]

I made an attempt to capture the recent discussion (plus my own
ideas/understanding) in this new section:
<https://gcc.gnu.org/wiki/Offloading#Partial_Offloading>.  Please
change/extend, as required.


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 00/10] Merge of HSA branch
  2016-01-27 14:20       ` Martin Jambor
@ 2016-01-28 15:18         ` Gerald Pfeifer
  2016-02-02 14:20           ` Martin Jambor
  0 siblings, 1 reply; 77+ messages in thread
From: Gerald Pfeifer @ 2016-01-28 15:18 UTC (permalink / raw)
  To: Martin Jambor; +Cc: Richard Biener, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2656 bytes --]

Hi Martin,

On Wed, 27 Jan 2016, Martin Jambor wrote:
> So what would you think about the following?  Perhaps it is too
> verbose but I wanted to mention the few areas users should know 
> have changed, if they happen to try HSA out.  I can certainly 
> cut it down a bit.

I like this.  In fact, in one or two spots we even may want to
be a little bit longer on the changes.html page.  (Definitely a
good idea to split this into the shorter announcement and the
longer one for the release notes.)

> Index: changes.html
> ===================================================================
> +     <li><p>GCC can now generate HSAIL for simple OpenMP device constructs
> +       if configured with <code>--enable-offload-targets=hsa</code>.  A new
> +       libgomp plugin then run these HSAIL kernels implementing these
> +       constructs on HSA capable GPUs via standard HSA run-time.</p>

"run" -> "runs"

"via a standard HSA"

"run-time" -> "run time" (since it's a noun)

Will everyone know what HSAIL is?  Should we expand the acronym
at first or note (in parenthesis what it is)?

> +       <p>If the HSA compilation back-end determines it cannot output HSAIL
> +       for a particular input, it gives a warning by default.  These
> +       warnings can be suppressed with <code>-Wno-hsa</code>.  To give a
> +       few examples, the HSA back-end does not implement compilation of

"back end"

> +       code using function pointers and variable-sized variables and

"variable-sized variables" is a bit confusing. ;-)  Is there a
better term?  "variable-sized arrays" I know, for example.

> +       <p>When compilation for HSA is enabled, the compiler attempts to
> +       compile composite OpenMP constructs
> +	 <blockquote><pre>
> +#pragma omp target teams distribute parallel for</pre></blockquote>
> +	 into parallel HSA GPU kernels.</p>

Above we had "HSAIL kernels", here "HSA GPU kernels".  Should we
converge on one?

> Index: index.html
> ===================================================================
> +<dt><span> Heterogeneous Systems Architecture support in GCC</span>

Here I would omit the "in GCC" since this is our main page.

> +     <span class="date">[2016-01-27]</span></dt>
> +     <dd><a href="http://www.hsafoundation.com/"> Heterogeneous Systems
> +     Architecture 1.0</a> <a href="https://gcc.gnu.org/gcc-6/changes.html#hsa">
> +     support was added to GCC.</a>  Contributed by Martin Jambor, Martin Liška
> +     and Michael Matz from SUSE.</dd>

How about "to GCC</a>, contributed by..."?

Not too long for the main page I'd say.


This is okay with the changes/considering the questions above.

Gerald

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 01/10] Configury changes and new options
  2016-01-13 17:55 ` [hsa merge 01/10] Configury changes and new options Martin Jambor
  2016-01-14 12:35   ` Jakub Jelinek
@ 2016-02-02 12:53   ` Thomas Schwinge
  1 sibling, 0 replies; 77+ messages in thread
From: Thomas Schwinge @ 2016-02-02 12:53 UTC (permalink / raw)
  To: Martin Jambor, GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 19492 bytes --]

Hi!

On Wed, 13 Jan 2016 18:39:26 +0100, Martin Jambor <mjambor@suse.cz> wrote:
> this patch contains changes to the configuration mechanism and offload
> bits, so that users can build compilers with HSA support.
> 
> It is a re-post of
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00714.html, which, has
> already been approved by Jakub after a few changes
> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01284.html).

On top of that, I applied four patches "as obvious":

> --- a/libgomp/plugin/configfrag.ac
> +++ b/libgomp/plugin/configfrag.ac
> @@ -81,6 +81,62 @@ AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
>  AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
>  AC_SUBST(PLUGIN_NVPTX_LIBS)
>  
> +# Look for HSA run-time, its includes and libraries
> +
> +HSA_RUNTIME_INCLUDE=
> +HSA_RUNTIME_LIB=
> +AC_SUBST(HSA_RUNTIME_INCLUDE)
> +AC_SUBST(HSA_RUNTIME_LIB)
> +HSA_RUNTIME_CPPFLAGS=
> +HSA_RUNTIME_LDFLAGS=
> +
> +AC_ARG_WITH(hsa-runtime,
> +	[AS_HELP_STRING([--with-hsa-runtime=PATH],
> +		[specify prefix directory for installed HSA run-time package.
> +		 Equivalent to --with-hsa-runtime-include=PATH/include
> +		 plus --with-hsa-runtime-lib=PATH/lib])])
> +AC_ARG_WITH(hsa-runtime-include,
> +	[AS_HELP_STRING([--with-hsa-runtime-include=PATH],
> +		[specify directory for installed HSA run-time include files])])
> +AC_ARG_WITH(hsa-runtime-lib,
> +	[AS_HELP_STRING([--with-hsa-runtime-lib=PATH],
> +		[specify directory for the installed HSA run-time library])])
> +if test "x$with_hsa_runtime" != x; then
> +  HSA_RUNTIME_INCLUDE=$with_hsa_runtime/include
> +  HSA_RUNTIME_LIB=$with_hsa_runtime/lib
> +fi
> +if test "x$with_hsa_runtime_include" != x; then
> +  HSA_RUNTIME_INCLUDE=$with_hsa_runtime_include
> +fi
> +if test "x$with_hsa_runtime_lib" != x; then
> +  HSA_RUNTIME_LIB=$with_hsa_runtime_lib
> +fi
> +if test "x$HSA_RUNTIME_INCLUDE" != x; then
> +  HSA_RUNTIME_CPPFLAGS=-I$HSA_RUNTIME_INCLUDE
> +fi
> +if test "x$HSA_RUNTIME_LIB" != x; then
> +  HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
> +fi
> +
> +AC_ARG_WITH(hsa-kmt-lib,
> +	[AS_HELP_STRING([--with-hsa-kmt-lib=PATH],
> +		[specify directory for installed HSA KMT library.])])
> +if test "x$with_hsa_kmt_lib" != x; then
> +  HSA_RUNTIME_LDFLAGS="$HSA_RUNTIME_LDFLAGS -L$with_hsa_kmt_lib"
> +  HSA_RUNTIME_LIB=
> +fi
> +
> +PLUGIN_HSA=0
> +PLUGIN_HSA_CPPFLAGS=
> +PLUGIN_HSA_LDFLAGS=
> +PLUGIN_HSA_LIBS=
> +AC_SUBST(PLUGIN_HSA)
> +AC_SUBST(PLUGIN_HSA_CPPFLAGS)
> +AC_SUBST(PLUGIN_HSA_LDFLAGS)
> +AC_SUBST(PLUGIN_HSA_LIBS)
> +
> +
> +
>  # Get offload targets and path to install tree of offloading compiler.
>  offload_additional_options=
>  offload_additional_lib_paths=
> @@ -122,6 +178,49 @@ if test x"$enable_offload_targets" != x; then
>  	    ;;
>  	esac
>  	;;
> +      hsa*)
> +	case "${target}" in
> +	  x86_64-*-*)
> +	    case " ${CC} ${CFLAGS} " in
> +	      *" -m32 "*)
> +	        PLUGIN_HSA=0
> +		;;
> +	      *)
> +	        tgt_name=hsa
> +	        PLUGIN_HSA=$tgt
> +	        PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
> +	        PLUGIN_HSA_LDFLAGS=$HSA_RUNTIME_LDFLAGS
> +	        PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
> +
> +	        PLUGIN_HSA_save_CPPFLAGS=$CPPFLAGS
> +	        CPPFLAGS="$PLUGIN_HSA_CPPFLAGS $CPPFLAGS"
> +	        PLUGIN_HSA_save_LDFLAGS=$LDFLAGS
> +	        LDFLAGS="$PLUGIN_HSA_LDFLAGS $LDFLAGS"
> +	        PLUGIN_HSA_save_LIBS=$LIBS
> +	        LIBS="$PLUGIN_HSA_LIBS $LIBS"
> +
> +	        AC_LINK_IFELSE(
> +	          [AC_LANG_PROGRAM(
> +	            [#include "hsa.h"],
> +	              [hsa_status_t status = hsa_init ()])],
> +	          [PLUGIN_HSA=1])
> +	        CPPFLAGS=$PLUGIN_HSA_save_CPPFLAGS
> +	        LDFLAGS=$PLUGIN_HSA_save_LDFLAGS
> +	        LIBS=$PLUGIN_HSA_save_LIBS
> +	        case $PLUGIN_HSA in
> +	          hsa*)
> +	            HSA_PLUGIN=0
> +	            AC_MSG_ERROR([HSA run-time package required for HSA support])
> +	            ;;
> +	        esac
> +		;;
> +	      esac
> +    	    ;;
> +	  *-*-*)
> +	    PLUGIN_HSA=0
> +            ;;
> +        esac
> +        ;;
>        *)
>  	AC_MSG_ERROR([unknown offload target specified])
>  	;;

For the cases where PLUGIN_HSA is set to "0", we also shouldn't configure
libgomp for hsa offloading: loading the plugin will expectedly fail if we
don't build it; committed in r233070:

commit bf57e97e03d046a6109d976bc098e197f220d1b9
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Feb 2 12:48:04 2016 +0000

    libgomp: Don't configure for offloading target if we don't build the corresponding plugin
    
    	libgomp/
    	* plugin/configfrag.ac: Don't configure for offloading target if
    	we don't build the corresponding plugin.
    	* configure: Regenerate.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233070 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog            | 6 ++++++
 libgomp/configure            | 7 ++++++-
 libgomp/plugin/configfrag.ac | 7 ++++++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index c94fa27..610074c 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,9 @@
+2016-02-02  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* plugin/configfrag.ac: Don't configure for offloading target if
+	we don't build the corresponding plugin.
+	* configure: Regenerate.
+
 2016-02-01  Nathan Sidwell  <nathan@codesourcery.com>
 
 	* testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c: New.
diff --git libgomp/configure libgomp/configure
index 1410bc7..d34a350 100755
--- libgomp/configure
+++ libgomp/configure
@@ -15323,6 +15323,7 @@ if test x"$enable_offload_targets" != x; then
   for tgt in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
     tgt_dir=`echo $tgt | grep '=' | sed 's/.*=//'`
     tgt=`echo $tgt | sed 's/=.*//'`
+    tgt_name=
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
 	tgt_name=intelmic
@@ -15424,7 +15425,11 @@ rm -f core conftest.err conftest.$ac_objext \
 	as_fn_error "unknown offload target specified" "$LINENO" 5
 	;;
     esac
-    if test x"$offload_targets" = x; then
+    if test x"$tgt_name" = x; then
+      # Don't configure libgomp for this offloading target if we don't build
+      # the corresponding plugin.
+      continue
+    elif test x"$offload_targets" = x; then
       offload_targets=$tgt_name
     else
       offload_targets=$offload_targets,$tgt_name
diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
index 2a9d9f9..01815d4 100644
--- libgomp/plugin/configfrag.ac
+++ libgomp/plugin/configfrag.ac
@@ -146,6 +146,7 @@ if test x"$enable_offload_targets" != x; then
   for tgt in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
     tgt_dir=`echo $tgt | grep '=' | sed 's/.*=//'`
     tgt=`echo $tgt | sed 's/=.*//'`
+    tgt_name=
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
 	tgt_name=intelmic
@@ -225,7 +226,11 @@ if test x"$enable_offload_targets" != x; then
 	AC_MSG_ERROR([unknown offload target specified])
 	;;
     esac
-    if test x"$offload_targets" = x; then
+    if test x"$tgt_name" = x; then
+      # Don't configure libgomp for this offloading target if we don't build
+      # the corresponding plugin.
+      continue
+    elif test x"$offload_targets" = x; then
       offload_targets=$tgt_name
     else
       offload_targets=$offload_targets,$tgt_name


No separate offloading compiler will be searched/invoked for hsa
offloading; committed in r233071:

commit 071822e70273f6df33adee3ac278c229caeabf0a
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Feb 2 12:48:21 2016 +0000

    libgomp: For hsa offloading, compilation is all handled by the target compiler
    
    	libgomp/
    	* plugin/configfrag.ac (offload_additional_options)
    	(offload_additional_lib_paths): Don't amend for hsa offloading.
    	* configure: Regenerate.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233071 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog            | 4 ++++
 libgomp/configure            | 5 ++++-
 libgomp/plugin/configfrag.ac | 5 ++++-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 610074c..6cae4d7 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,5 +1,9 @@
 2016-02-02  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* plugin/configfrag.ac (offload_additional_options)
+	(offload_additional_lib_paths): Don't amend for hsa offloading.
+	* configure: Regenerate.
+
 	* plugin/configfrag.ac: Don't configure for offloading target if
 	we don't build the corresponding plugin.
 	* configure: Regenerate.
diff --git libgomp/configure libgomp/configure
index d34a350..f643bf8 100755
--- libgomp/configure
+++ libgomp/configure
@@ -15434,7 +15434,10 @@ rm -f core conftest.err conftest.$ac_objext \
     else
       offload_targets=$offload_targets,$tgt_name
     fi
-    if test x"$tgt_dir" != x; then
+    if test "$tgt_name" = hsa; then
+      # Offloading compilation is all handled by the target compiler.
+      :
+    elif test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
       offload_additional_lib_paths="$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib:$tgt_dir/lib32"
     else
diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
index 01815d4..b9209cb 100644
--- libgomp/plugin/configfrag.ac
+++ libgomp/plugin/configfrag.ac
@@ -235,7 +235,10 @@ if test x"$enable_offload_targets" != x; then
     else
       offload_targets=$offload_targets,$tgt_name
     fi
-    if test x"$tgt_dir" != x; then
+    if test "$tgt_name" = hsa; then
+      # Offloading compilation is all handled by the target compiler.
+      :
+    elif test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
       offload_additional_lib_paths="$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib:$tgt_dir/lib32"
     else


The hsa libgomp plugin needs to find the libraries it's been linked
against; committed in r233072:

commit f1deafd651ec9c8bb8196ab32a637a13d279461f
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Feb 2 12:48:31 2016 +0000

    libgomp: Use HSA_RUNTIME_LIB, HSA_KMT_LIB in the testsuite
    
    	libgomp/
    	* plugin/configfrag.ac (HSA_KMT_LIB, HSA_KMT_LDFLAGS): New
    	variables.
    	* testsuite/libgomp-test-support.exp.in (hsa_runtime_lib)
    	(hsa_kmt_lib): Set variables.
    	* testsuite/lib/libgomp.exp (libgomp_init): Use them to amend
    	always_ld_library_path.
    	* Makefile.in: Regenerate.
    	* configure: Likewise.
    	* testsuite/Makefile.in: Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233072 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog                             | 10 ++++++++++
 libgomp/Makefile.in                           |  1 +
 libgomp/configure                             | 16 +++++++++++-----
 libgomp/plugin/configfrag.ac                  | 11 ++++++++---
 libgomp/testsuite/Makefile.in                 |  1 +
 libgomp/testsuite/lib/libgomp.exp             | 10 +++++++++-
 libgomp/testsuite/libgomp-test-support.exp.in |  2 ++
 7 files changed, 42 insertions(+), 9 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 6cae4d7..806e23d 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,5 +1,15 @@
 2016-02-02  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* plugin/configfrag.ac (HSA_KMT_LIB, HSA_KMT_LDFLAGS): New
+	variables.
+	* testsuite/libgomp-test-support.exp.in (hsa_runtime_lib)
+	(hsa_kmt_lib): Set variables.
+	* testsuite/lib/libgomp.exp (libgomp_init): Use them to amend
+	always_ld_library_path.
+	* Makefile.in: Regenerate.
+	* configure: Likewise.
+	* testsuite/Makefile.in: Likewise.
+
 	* plugin/configfrag.ac (offload_additional_options)
 	(offload_additional_lib_paths): Don't amend for hsa offloading.
 	* configure: Regenerate.
diff --git libgomp/Makefile.in libgomp/Makefile.in
index bbfac4e..88c8517 100644
--- libgomp/Makefile.in
+++ libgomp/Makefile.in
@@ -268,6 +268,7 @@ FC = @FC@
 FCFLAGS = @FCFLAGS@
 FGREP = @FGREP@
 GREP = @GREP@
+HSA_KMT_LIB = @HSA_KMT_LIB@
 HSA_RUNTIME_INCLUDE = @HSA_RUNTIME_INCLUDE@
 HSA_RUNTIME_LIB = @HSA_RUNTIME_LIB@
 INSTALL = @INSTALL@
diff --git libgomp/configure libgomp/configure
index f643bf8..8d03eb6 100755
--- libgomp/configure
+++ libgomp/configure
@@ -637,6 +637,7 @@ PLUGIN_HSA_LIBS
 PLUGIN_HSA_LDFLAGS
 PLUGIN_HSA_CPPFLAGS
 PLUGIN_HSA
+HSA_KMT_LIB
 HSA_RUNTIME_LIB
 HSA_RUNTIME_INCLUDE
 PLUGIN_NVPTX_LIBS
@@ -11144,7 +11145,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11147 "configure"
+#line 11148 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11250,7 +11251,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11253 "configure"
+#line 11254 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15292,6 +15293,9 @@ if test "x$HSA_RUNTIME_LIB" != x; then
   HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
 fi
 
+HSA_KMT_LIB=
+
+HSA_KMT_LDFLAGS=
 
 # Check whether --with-hsa-kmt-lib was given.
 if test "${with_hsa_kmt_lib+set}" = set; then :
@@ -15299,8 +15303,10 @@ if test "${with_hsa_kmt_lib+set}" = set; then :
 fi
 
 if test "x$with_hsa_kmt_lib" != x; then
-  HSA_RUNTIME_LDFLAGS="$HSA_RUNTIME_LDFLAGS -L$with_hsa_kmt_lib"
-  HSA_RUNTIME_LIB=
+  HSA_KMT_LIB=$with_hsa_kmt_lib
+fi
+if test "x$HSA_KMT_LIB" != x; then
+  HSA_KMT_LDFLAGS=-L$HSA_KMT_LIB
 fi
 
 PLUGIN_HSA=0
@@ -15378,7 +15384,7 @@ rm -f core conftest.err conftest.$ac_objext \
 	        tgt_name=hsa
 	        PLUGIN_HSA=$tgt
 	        PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
-	        PLUGIN_HSA_LDFLAGS=$HSA_RUNTIME_LDFLAGS
+	        PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
 	        PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
 
 	        PLUGIN_HSA_save_CPPFLAGS=$CPPFLAGS
diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
index b9209cb..88b4156 100644
--- libgomp/plugin/configfrag.ac
+++ libgomp/plugin/configfrag.ac
@@ -118,12 +118,17 @@ if test "x$HSA_RUNTIME_LIB" != x; then
   HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
 fi
 
+HSA_KMT_LIB=
+AC_SUBST(HSA_KMT_LIB)
+HSA_KMT_LDFLAGS=
 AC_ARG_WITH(hsa-kmt-lib,
 	[AS_HELP_STRING([--with-hsa-kmt-lib=PATH],
 		[specify directory for installed HSA KMT library.])])
 if test "x$with_hsa_kmt_lib" != x; then
-  HSA_RUNTIME_LDFLAGS="$HSA_RUNTIME_LDFLAGS -L$with_hsa_kmt_lib"
-  HSA_RUNTIME_LIB=
+  HSA_KMT_LIB=$with_hsa_kmt_lib
+fi
+if test "x$HSA_KMT_LIB" != x; then
+  HSA_KMT_LDFLAGS=-L$HSA_KMT_LIB
 fi
 
 PLUGIN_HSA=0
@@ -190,7 +195,7 @@ if test x"$enable_offload_targets" != x; then
 	        tgt_name=hsa
 	        PLUGIN_HSA=$tgt
 	        PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
-	        PLUGIN_HSA_LDFLAGS=$HSA_RUNTIME_LDFLAGS
+	        PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
 	        PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
 
 	        PLUGIN_HSA_save_CPPFLAGS=$CPPFLAGS
diff --git libgomp/testsuite/Makefile.in libgomp/testsuite/Makefile.in
index 1fae9e8..4dbb406 100644
--- libgomp/testsuite/Makefile.in
+++ libgomp/testsuite/Makefile.in
@@ -111,6 +111,7 @@ FC = @FC@
 FCFLAGS = @FCFLAGS@
 FGREP = @FGREP@
 GREP = @GREP@
+HSA_KMT_LIB = @HSA_KMT_LIB@
 HSA_RUNTIME_INCLUDE = @HSA_RUNTIME_INCLUDE@
 HSA_RUNTIME_LIB = @HSA_RUNTIME_LIB@
 INSTALL = @INSTALL@
diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp
index 6dc1e8e..bb9b15f 100644
--- libgomp/testsuite/lib/libgomp.exp
+++ libgomp/testsuite/lib/libgomp.exp
@@ -184,7 +184,7 @@ proc libgomp_init { args } {
     }
     lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/.."
 
-    # For build-tree testing, also consider the CUDA paths used for builing.
+    # For build-tree testing, also consider the library paths used for builing.
     # For installed testing, we assume all that to be provided in the sysroot.
     if { $blddir != "" } {
 	global cuda_driver_include
@@ -200,6 +200,14 @@ proc libgomp_init { args } {
 	    lappend ALWAYS_CFLAGS "additional_flags=-L$cuda_driver_lib"
 	    append always_ld_library_path ":$cuda_driver_lib"
 	}
+	global hsa_runtime_lib
+	global hsa_kmt_lib
+	if { $hsa_runtime_lib != "" } {
+	    append always_ld_library_path ":$hsa_runtime_lib"
+	}
+	if { $hsa_kmt_lib != "" } {
+	    append always_ld_library_path ":$hsa_kmt_lib"
+	}
     }
 
     # We use atomic operations in the testcases to validate results.
diff --git libgomp/testsuite/libgomp-test-support.exp.in libgomp/testsuite/libgomp-test-support.exp.in
index 764bec0..5a724fb 100644
--- libgomp/testsuite/libgomp-test-support.exp.in
+++ libgomp/testsuite/libgomp-test-support.exp.in
@@ -1,4 +1,6 @@
 set cuda_driver_include "@CUDA_DRIVER_INCLUDE@"
 set cuda_driver_lib "@CUDA_DRIVER_LIB@"
+set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
+set hsa_kmt_lib "@HSA_KMT_LIB@"
 
 set offload_targets "@offload_targets@"


Skip (unsupported) hsa offloading for OpenACC test cases; committed in
r233073:

commit 6637013a46441d9808a681526d249bfad10877fe
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Feb 2 12:48:41 2016 +0000

    libgomp: Skip hsa offloading for OpenACC test cases
    
    	libgomp/
    	* testsuite/lib/libgomp.exp: Skip hsa offloading for OpenACC test
    	cases.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233073 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog                 | 3 +++
 libgomp/testsuite/lib/libgomp.exp | 6 ++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 806e23d..6c9bf6a 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,5 +1,8 @@
 2016-02-02  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* testsuite/lib/libgomp.exp: Skip hsa offloading for OpenACC test
+	cases.
+
 	* plugin/configfrag.ac (HSA_KMT_LIB, HSA_KMT_LDFLAGS): New
 	variables.
 	* testsuite/libgomp-test-support.exp.in (hsa_runtime_lib)
diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp
index bb9b15f..a4c9d83 100644
--- libgomp/testsuite/lib/libgomp.exp
+++ libgomp/testsuite/lib/libgomp.exp
@@ -42,15 +42,17 @@ load_file libgomp-test-support.exp
 set offload_targets_s [split $offload_targets ","]
 set offload_targets_s_openacc {}
 foreach offload_target_openacc $offload_targets_s {
+    # Translate to OpenACC names, or skip if not yet supported.
     switch $offload_target_openacc {
 	intelmic {
-	    # Skip; will all FAIL because of missing
-	    # GOMP_OFFLOAD_CAP_OPENACC_200.
 	    continue
 	}
 	nvptx {
 	    set offload_target_openacc "nvidia"
 	}
+	hsa {
+	    continue
+	}
     }
     lappend offload_targets_s_openacc "$offload_target_openacc"
 }


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 00/10] Merge of HSA branch
  2016-01-28 15:18         ` Gerald Pfeifer
@ 2016-02-02 14:20           ` Martin Jambor
  0 siblings, 0 replies; 77+ messages in thread
From: Martin Jambor @ 2016-02-02 14:20 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: GCC Patches

Hi,

On Thu, Jan 28, 2016 at 08:18:27AM -0700, Gerald Pfeifer wrote:
> 
> This is okay with the changes/considering the questions above.
> 

thanks for the feedback.  I have committed the following after
incorporating the comments.

Martin

Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.52
diff -u -r1.52 changes.html
--- changes.html	25 Jan 2016 15:09:55 -0000	1.52
+++ changes.html	2 Feb 2016 14:09:11 -0000
@@ -272,6 +272,30 @@
    </ul>
 <!-- <h3 id="avr">AVR</h3> -->
 
+<h3 id="hsa">Heterogeneous Systems Architecture</h3>
+   <ul>
+     <li><p>GCC can now generate HSAIL (Heterogeneous System Architecture
+       Intermediate Language) for simple OpenMP device constructs if
+       configured with <code>--enable-offload-targets=hsa</code>.  A new
+       libgomp plugin then runs the HSA GPU kernels implementing these
+       constructs on HSA capable GPUs via a standard HSA run time.</p>
+       
+       <p>If the HSA compilation back end determines it cannot output HSAIL
+       for a particular input, it gives a warning by default.  These
+       warnings can be suppressed with <code>-Wno-hsa</code>.  To give a few
+       examples, the HSA back end does not implement compilation of code
+       using function pointers, automatic allocation of variable sized
+       arrays, functions with variadic arguments as well as a number of
+       other less common programming constructs.</p>
+
+       <p>When compilation for HSA is enabled, the compiler attempts to
+       compile composite OpenMP constructs
+	 <blockquote><pre>
+#pragma omp target teams distribute parallel for</pre></blockquote>
+	 into parallel HSA GPU kernels.</p>
+     </li>
+   </ul>
+
 <h3 id="x86">IA-32/x86-64</h3>
    <ul>
      <li>GCC now supports the Intel CPU named Skylake with AVX-512 extensions




Index: index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.993
diff -u -r1.993 index.html
--- index.html	30 Jan 2016 06:01:48 -0000	1.993
+++ index.html	2 Feb 2016 14:10:25 -0000
@@ -50,6 +50,13 @@
 <h2 id="news">News</h2>
 <dl>
 
+<dt><span> Heterogeneous Systems Architecture support</span>
+     <span class="date">[2016-01-27]</span></dt>
+     <dd><a href="http://www.hsafoundation.com/"> Heterogeneous Systems
+     Architecture 1.0</a> <a href="https://gcc.gnu.org/gcc-6/changes.html#hsa">
+     support was added to GCC</a>, contributed by Martin Jambor, Martin Liška
+     and Michael Matz from SUSE.</dd>
+
 <dt><span><a href="gcc-5/">GCC 5.3</a> released</span>
     <span class="date">[2015-12-04]</span></dt>
     <dd></dd>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: Partial Offloading (was: [hsa merge 07/10] IPA-HSA pass)
  2016-01-28 11:36                       ` Partial Offloading (was: [hsa merge 07/10] IPA-HSA pass) Thomas Schwinge
@ 2016-02-17 16:41                         ` Ilya Verbin
  0 siblings, 0 replies; 77+ messages in thread
From: Ilya Verbin @ 2016-02-17 16:41 UTC (permalink / raw)
  To: Thomas Schwinge, Jakub Jelinek
  Cc: Martin Jambor, Alexander Monakov, gcc-patches, Kirill Yukhin

On Thu, Jan 28, 2016 at 12:36:19 +0100, Thomas Schwinge wrote:
> I made an attempt to capture the recent discussion (plus my own
> ideas/understanding) in this new section:
> <https://gcc.gnu.org/wiki/Offloading#Partial_Offloading>.  Please
> change/extend, as required.

Thanks for summarizing this.


I'm not very happy how -foffload=disable works in GCC 6, here is a testcase:

int main ()
{
  int x = 10;
  #pragma omp target data map (from: x)
    #pragma omp target map (alloc: x)
      x = 20;
  if (x != 10 && x != 20)
    __builtin_abort ();
}

On the system with non-shared accelerator it will abort, because "#pragma omp
target data" behaves like offloading is enabled, but "#pragma omp target" runs
on the host.  As the result, at the end of the *target data* region, it tries to
receive x from target and receives 0, or crashes.

We can forbid -foffload=disable option, but I think it's very useful, e.g. for
comparing performance of host vs. accelerator using the same compiler, etc.
Or if the system contains 2 different accelerators, someone might want to
compile only for the first, but libgomp will load 2 plugins, and the program
will crash (instead of doing fallback) if it will try to use the second device.

So, maybe we still need something like this patch?
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01033.html

  -- Ilya

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-01-15  1:19     ` Ian Lance Taylor
  2016-01-15 10:03       ` Richard Biener
  2016-01-15 10:14       ` Martin Jambor
@ 2016-02-26 18:58       ` Martin Jambor
  2016-02-26 19:28         ` Mike Stump
  2 siblings, 1 reply; 77+ messages in thread
From: Martin Jambor @ 2016-02-26 18:58 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Jakub Jelinek, GCC Patches, David Edelsohn, Jeff Law

Hi,

I hope I've got some good news:

On Thu, Jan 14, 2016 at 05:18:56PM -0800, Ian Lance Taylor wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
>
> > On Wed, Jan 13, 2016 at 06:39:33PM +0100, Martin Jambor wrote:
> >> the following patch adds a BRIG (binary representation of HSAIL)
> >> representation description.  It is within a single header file
> >> describing the binary structures and constants of the format.
> >>
> >> The file comes from the HSA Foundation (I have only added the
> >> HSA_BRIG_FORMAT_H macro and check and removed some weird comments
> >> which are not present in proposed future versions of the file) and is
> >> licensed under "University of Illinois/NCSA Open Source License."
> >>
> >> The license is "GPL-compatible" according to FSF
> >> (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
> >> so I believe we can have it in GCC.  Nevertheless, it is not GPL and
> >> there is no copyright assignment for it, but the situation is
> >> hopefully analogous to some other libraries that have their upstream
> >> elsewhere but we ship them as part of the GCC.
> >>
> >> In the previous posting of this patch
> >> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have
> >> requested a permission from the steering committee to include this file
> >> with a different upstream in GCC.  I have not received an official
> >> reply but since I have been chosen to be the HSA maintainer, I tend to
> >> think there were no legal objections against HSA going forward,
> >> including this file.
>
> Martin, could you ask the HSA Foundation or AMD or whoever if there is
> any way they could remove the second requirement of the license?  It
> adds yet another case where anybody distributing GCC has to list yet
> another copyright notice.
>

I have asked HSA foundation to do just that and apparently they agreed
to change the licensing of the file (in upcoming versions of HSA) to
the MIT license.  IIUC, the reading of the license header would then
be the one below.  I hope that means the problematic requirements will
be gone and we will be able to just use their file.  If, however, you
still think there will be issues preventing us from doing that, please
let me know as soon as possible.

Thanks,

Martin



The license is going to be:

The MIT License (MIT)

Copyright (c) 2016, HSA Foundation, Inc

 * All rights reserved.



Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE."

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [hsa merge 08/10] HSAIL BRIG description header file
  2016-02-26 18:58       ` Martin Jambor
@ 2016-02-26 19:28         ` Mike Stump
  0 siblings, 0 replies; 77+ messages in thread
From: Mike Stump @ 2016-02-26 19:28 UTC (permalink / raw)
  To: Martin Jambor
  Cc: Ian Lance Taylor, Jakub Jelinek, GCC Patches, David Edelsohn, Jeff Law

On Feb 26, 2016, at 10:58 AM, Martin Jambor <mjambor@suse.cz> wrote:
> I have asked HSA foundation

Thanks.

> The license is going to be:
> 
> The MIT License (MIT)

Wonderful.

^ permalink raw reply	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2016-02-26 19:28 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-13 17:55 [hsa merge 00/10] Merge of HSA branch Martin Jambor
2016-01-13 17:55 ` [hsa merge 01/10] Configury changes and new options Martin Jambor
2016-01-14 12:35   ` Jakub Jelinek
2016-02-02 12:53   ` Thomas Schwinge
2016-01-13 17:55 ` [hsa merge 04/10] Avoid extraneous remapping in copy_gimple_seq_and_replace_locals Martin Jambor
2016-01-13 18:02   ` Jakub Jelinek
2016-01-13 17:55 ` [hsa merge 10/10] HSA register allocator Martin Jambor
2016-01-14 14:41   ` Jakub Jelinek
2016-01-15 15:09     ` Martin Jambor
2016-01-13 17:55 ` [hsa merge 02/10] Modifications to libgomp proper Martin Jambor
2016-01-20 13:37   ` Ilya Verbin
2016-01-20 14:48   ` Ilya Verbin
2016-01-20 14:55     ` Jakub Jelinek
2016-01-13 17:55 ` [hsa merge 07/10] IPA-HSA pass Martin Jambor
2016-01-14 12:59   ` Jakub Jelinek
2016-01-15 11:48     ` Martin Liška
2016-01-15 14:53     ` Martin Jambor
2016-01-15 15:01       ` Jakub Jelinek
2016-01-15 16:02         ` Martin Jambor
2016-01-15 16:10           ` Jakub Jelinek
2016-01-15 16:38             ` Ilya Verbin
2016-01-15 16:45               ` Jakub Jelinek
2016-01-15 18:06                 ` Ilya Verbin
2016-01-20 18:54                   ` Ilya Verbin
2016-01-21 11:39                     ` Alexander Monakov
2016-01-22 10:19                     ` Jakub Jelinek
2016-01-28 11:36                       ` Partial Offloading (was: [hsa merge 07/10] IPA-HSA pass) Thomas Schwinge
2016-02-17 16:41                         ` Ilya Verbin
2016-01-15 19:19                 ` [hsa merge 07/10] IPA-HSA pass Alexander Monakov
2016-01-15 19:32                   ` Jakub Jelinek
2016-01-15 19:46                     ` Alexander Monakov
2016-01-15 19:55                       ` Jakub Jelinek
2016-01-15  9:52   ` Jan Hubicka
2016-01-15 11:47     ` Martin Liška
2016-01-16 10:00       ` Jan Hubicka
2016-01-25 15:22         ` Martin Liška
2016-01-25 15:24           ` Jakub Jelinek
2016-01-25 23:41             ` Jan Hubicka
2016-01-27 12:38               ` Martin Liška
2016-01-13 17:55 ` [hsa merge 03/10] HSA libgomp plugin Martin Jambor
2016-01-14 12:37   ` Jakub Jelinek
2016-01-13 17:55 ` [hsa merge 06/10] Pass manager changes Martin Jambor
2016-01-14 12:52   ` Jakub Jelinek
2016-01-13 18:11 ` [hsa merge 09/10] Majority of the HSA back-end Martin Jambor
2016-01-14 14:05   ` Jakub Jelinek
2016-01-15 15:08     ` Martin Jambor
2016-01-15 15:18       ` Jakub Jelinek
2016-01-15 23:49   ` Martin Jambor
2016-01-16  8:58     ` Jakub Jelinek
2016-01-16  9:23       ` Jakub Jelinek
2016-01-18 15:33       ` Martin Jambor
2016-01-18 15:48         ` Jakub Jelinek
2016-01-13 18:12 ` [hsa merge 08/10] HSAIL BRIG description header file Martin Jambor
2016-01-14 13:04   ` Jakub Jelinek
2016-01-15  1:19     ` Ian Lance Taylor
2016-01-15 10:03       ` Richard Biener
2016-01-15 10:14       ` Martin Jambor
2016-01-15 10:17         ` Richard Biener
2016-01-15 10:37         ` Jakub Jelinek
2016-01-15 12:03           ` Jakub Jelinek
2016-01-15 17:23             ` Martin Jambor
2016-01-16 11:43               ` Jakub Jelinek
2016-01-18 15:22                 ` Martin Jambor
2016-01-15 17:53           ` Mike Stump
2016-01-15 20:23             ` Richard Biener
2016-02-26 18:58       ` Martin Jambor
2016-02-26 19:28         ` Mike Stump
2016-01-13 18:17 ` [hsa merge 05/10] OpenMP lowering/expansion changes (gridification) Martin Jambor
2016-01-14 12:52   ` Jakub Jelinek
2016-01-15 14:52     ` Martin Jambor
2016-01-14 14:22   ` Alexander Monakov
2016-01-19 10:45 ` [hsa merge 00/10] Merge of HSA branch Martin Jambor
2016-01-19 14:25   ` Richard Biener
2016-01-21 23:10     ` Gerald Pfeifer
2016-01-27 14:20       ` Martin Jambor
2016-01-28 15:18         ` Gerald Pfeifer
2016-02-02 14:20           ` Martin Jambor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).