public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [gomp4] various OpenACC/PTX built-ins and a reduction tweak
@ 2014-09-17  0:33 Cesar Philippidis
  2014-09-17  8:44 ` Tobias Burnus
  2014-09-18 18:43 ` [gomp4] various OpenACC/PTX built-ins and a reduction tweak Thomas Schwinge
  0 siblings, 2 replies; 9+ messages in thread
From: Cesar Philippidis @ 2014-09-17  0:33 UTC (permalink / raw)
  To: gcc-patches, fortran, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 1229 bytes --]

The patch introduces the following OpenACC/PTX-specific built-ins:

  * GOACC_ntid
  * GOACC_tid
  * GOACC_nctaid
  * GOACC_ctaid
  * acc_on_device
  * GOACC_get_thread_num
  * GOACC_get_num_threads

Of these functions, the only one part of the OpenACC spec is
acc_on_device. The other functions are helpers for omp-low.c. In
particular, I'm using GOACC_get_thread_num and GOACC_get_num_threads to
determine the number of accelerator threads available to the reduction
clause. Current GOACC_get_num_threads is num_gangs * vector_length, but
value is subject to change later on. It's probably a premature to
include the PTX built-ins right now, but I'd like to middle end of our
internal OpenACC branch in sync with gomp-4_0-branch.

This patch also allows OpenACC reductions to process the array holding
partial reductions on the accelerator, instead of copying that array
back to the host. Currently, this only happens when num_gangs = 1. For
PTX targets, we're going to need to use another kernel to process the
array of partial results because PTX lacks inter-CTA synchronization
(we're currently mapping gangs to CTAs). That's why I was working on the
routine clause recently.

Is this OK for gomp-4_0-branch?

Thanks,
Cesar

[-- Attachment #2: builtins-gomp4.diff --]
[-- Type: text/x-patch, Size: 36188 bytes --]

2014-09-16  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* builtins.c (expand_builtin_acc_on_device): New function.
	(expand_oacc_builtin): New function.
	(expand_builtin): Handle BUILT_IN_GOACC_NCTAID, BUILT_IN_GOACC_CTAID,
	BUILT_IN_GOACC_NTID, BUILT_IN_GOACC_TID, BUILT_IN_GOACC_GET_THREAD_NUM
	and BUILT_IN_GOACC_GET_NUM_THREADS.
	(is_simple_builtin): Handle BUILT_IN_GOACC_NTID and BUILT_IN_GOACC_TID.
	(is_inexpensive_builtin): Handle BUILT_IN_ACC_ON_DEVICE.
	* gcc/builtins.def (DEF_GOACC_BUILTIN): Temporarily make COND always
	true.
	(DEF_GOACC_BUILTIN_COMPILER): New.

	* gcc/oacc-builtins.def (BUILT_IN_GOACC_NTID, BUILT_IN_GOACC_TID,
	BUILT_IN_GOACC_NCTAID, BUILT_IN_GOACC_CTAID, BUILT_IN_ACC_ON_DEVICE,
	BUILT_IN_GOACC_GET_THREAD_NUM, BUILT_IN_GOACC_GET_NUM_THREADS): New
	built-ins.
	* gcc/omp-low.c (finish_reduction_on_host): New function.
	(oacc_host_nthreads): New function.
	(lower_reduction_clauses): Process the array of partial reductions
	on the accelerator is num_gangs = 1.
	(expand_omp_for_static_nochunk): Use BUILT_IN_GOACC_GET_NUM_THREADS and
	BUILT_IN_GOACC_GET_THREAD_NUM for nthreads and threadid, respectively,
	with GF_OMP_FOR_KIND_OACC_LOOP.
	(expand_omp_for_static_chunk): Likewise.
	(expand_omp_target): Likewise.
	(initialize_reduction_data): Adjust memory maps for the case where
	the partial reductions are processed on the accelerator.
	(finalize_reduction_data): Handle reductions on the accelerator.
	(process_reduction_data): Likewise.


	gcc/fortran/
	* f95-lang.c (gfc_init_builtin_functions): Define
	DEF_GOACC_BUILTIN_COMPILER.
	* types.def (DEF_FUNCTION_TYPE_0): Define DEF_FUNCTION_TYPE_1 and
	DEF_FUNCTION_TYPE_3.

	gcc/testsuite/
	* c-c++-common/goacc/goacc_builtins.c: New test.


diff --git a/gcc/builtins.c b/gcc/builtins.c
index 975f696..fa1ac2d 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5747,6 +5747,131 @@ expand_stack_save (void)
   return ret;
 }
 
+
+/* Expand OpenACC acc_on_device.
+
+   This has to happen late (that is, not in early folding; expand_builtin_*,
+   rather than fold_builtin_*), as we have to act differently for host and
+   acceleration device.  */
+
+static rtx
+expand_builtin_acc_on_device (tree exp, rtx target ATTRIBUTE_UNUSED)
+{
+  if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
+    return NULL_RTX;
+
+  tree arg, v1, v2, ret;
+  location_t loc;
+
+  arg = CALL_EXPR_ARG (exp, 0);
+  arg = builtin_save_expr (arg);
+  loc = EXPR_LOCATION (exp);
+
+  /* Build: (arg == v1 || arg == v2) ? 1 : 0.  */
+
+#ifdef ACCEL_COMPILER
+  v1 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_not_host */ 3);
+  v2 = build_int_cst (TREE_TYPE (arg), ACCEL_COMPILER_acc_device);
+#else
+  v1 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_none */ 0);
+  v2 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_host */ 2);
+#endif
+
+  v1 = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg, v1);
+  v2 = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg, v2);
+
+  /* Can't use TRUTH_ORIF_EXPR, as that is not supported by
+     expand_expr_real*.  */
+  ret = fold_build3_loc (loc, COND_EXPR, integer_type_node, v1, v1, v2);
+  ret = fold_build3_loc (loc, COND_EXPR, integer_type_node,
+			 ret, integer_one_node, integer_zero_node);
+
+  return expand_normal (ret);
+}
+
+
+/* Expand a thread-id/thread-count builtin for OpenACC.  */
+static rtx
+expand_oacc_builtin (enum built_in_function fcode, tree exp, rtx target)
+{
+  tree arg0 = NULL_TREE;
+  bool has_arg0 = false;
+  rtx result = const0_rtx;
+  rtx arg;
+
+  enum insn_code icode = CODE_FOR_nothing;
+  switch (fcode)
+    {
+    case BUILT_IN_GOACC_NTID:
+#ifdef HAVE_oacc_ntid
+      icode = CODE_FOR_oacc_ntid;
+#endif
+      has_arg0 = true;
+      result = const1_rtx;
+      break;
+    case BUILT_IN_GOACC_TID:
+#ifdef HAVE_oacc_ntid
+      icode = CODE_FOR_oacc_tid;
+#endif
+      has_arg0 = true;
+      break;
+    case BUILT_IN_GOACC_NCTAID:
+#ifdef HAVE_oacc_ntid
+      icode = CODE_FOR_oacc_nctaid;
+#endif
+      has_arg0 = true;
+      result = const1_rtx;
+      break;
+    case BUILT_IN_GOACC_CTAID:
+#ifdef HAVE_oacc_ntid
+      icode = CODE_FOR_oacc_ctaid;
+#endif
+      has_arg0 = true;
+      break;
+    case BUILT_IN_GOACC_GET_THREAD_NUM:
+#ifdef HAVE_oacc_threadnum
+      icode = CODE_FOR_oacc_threadnum;
+#endif
+      result = const0_rtx;
+      break;
+    case BUILT_IN_GOACC_GET_NUM_THREADS:
+#ifdef HAVE_oacc_numthreads
+      icode = CODE_FOR_oacc_numthreads;
+#endif
+      result = const1_rtx;
+      break;
+    default:
+      break;
+    }
+
+  if (has_arg0)
+    {
+      arg0 = CALL_EXPR_ARG (exp, 0);
+
+      gcc_assert (TREE_CODE (arg0) == INTEGER_CST);
+      arg = expand_normal (arg0);
+    }
+  if (icode != CODE_FOR_nothing)
+    {
+      enum machine_mode mode = insn_data[icode].operand[0].mode;
+      rtx tmp = target;
+      rtx insn;
+      if (!REG_P (tmp) || GET_MODE (tmp) != mode)
+	tmp = gen_reg_rtx (mode);
+      if (arg0)
+	insn = GEN_FCN (icode) (tmp, arg);
+      else
+	insn = GEN_FCN (icode) (tmp);
+      if (insn != NULL_RTX)
+	{
+	  emit_insn (insn);
+	  return tmp;
+        }
+    }
+
+  return result;
+}
+
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
    (and in mode MODE if that's convenient).
@@ -6816,6 +6941,20 @@ expand_builtin (tree exp, rtx target, rtx subtarget, enum machine_mode mode,
       expand_builtin_cilk_pop_frame (exp);
       return const0_rtx;
 
+    case BUILT_IN_ACC_ON_DEVICE:
+      target = expand_builtin_acc_on_device (exp, target);
+      if (target)
+	return target;
+      break;
+
+    case BUILT_IN_GOACC_NCTAID:
+    case BUILT_IN_GOACC_CTAID:
+    case BUILT_IN_GOACC_NTID:
+    case BUILT_IN_GOACC_TID:
+    case BUILT_IN_GOACC_GET_THREAD_NUM:
+    case BUILT_IN_GOACC_GET_NUM_THREADS:
+      return expand_oacc_builtin (fcode, exp, target);
+
     default:	/* just do library call, if unknown builtin */
       break;
     }
@@ -12663,6 +12802,9 @@ is_simple_builtin (tree decl)
       case BUILT_IN_EH_FILTER:
       case BUILT_IN_EH_POINTER:
       case BUILT_IN_EH_COPY_VALUES:
+	/* Just a special register access.  */
+      case BUILT_IN_GOACC_NTID:
+      case BUILT_IN_GOACC_TID:
 	return true;
 
       default:
@@ -12748,6 +12890,7 @@ is_inexpensive_builtin (tree decl)
       case BUILT_IN_LABS:
       case BUILT_IN_LLABS:
       case BUILT_IN_PREFETCH:
+      case BUILT_IN_ACC_ON_DEVICE:
 	return true;
 
       default:
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 2ef896e..ef3267b 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -146,12 +146,18 @@ along with GCC; see the file COPYING3.  If not see
   DEF_BUILTIN (ENUM, NAME, BUILT_IN_NORMAL, BT_LAST, BT_LAST, false, false, \
 	       false, ATTR_LAST, false, false)
 
-/* Builtin used by the implementation of GNU OpenACC.  None of these are
-   actually implemented in the compiler; they're all in libgomp.  */
+/* Builtin used by the implementation of GNU OpenACC.  Few of these are
+   actually implemented in the compiler; most are in libgomp.  */
 #undef DEF_GOACC_BUILTIN
 #define DEF_GOACC_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
-               false, true, true, ATTRS, false, flag_openacc)
+               false, true, true, ATTRS, false, \
+	       (/* TODO */ true || flag_openacc))
+#undef DEF_GOACC_BUILTIN_COMPILER
+#define DEF_GOACC_BUILTIN_COMPILER(ENUM, NAME, TYPE, ATTRS) \
+  DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
+               true, true, true, ATTRS, false, \
+	       (/* TODO */ true || flag_openacc))
 
 /* Builtin used by the implementation of GNU OpenMP.  None of these are
    actually implemented in the compiler; they're all in libgomp.  */
@@ -159,7 +165,7 @@ along with GCC; see the file COPYING3.  If not see
 #define DEF_GOMP_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
                false, true, true, ATTRS, false, \
-	       (flag_openmp || flag_tree_parallelize_loops))
+	       (/* TODO */ true || flag_openmp || flag_tree_parallelize_loops))
 
 /* Builtin used by implementation of Cilk Plus.  Most of these are decomposed
    by the compiler but a few are implemented in libcilkrts.  */ 
diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index e7c64b7..85f0ed6 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -1093,7 +1093,11 @@ gfc_init_builtin_functions (void)
 #define DEF_GOACC_BUILTIN(code, name, type, attr) \
       gfc_define_builtin ("__builtin_" name, builtin_types[type], \
 			  code, name, attr);
+#undef DEF_GOACC_BUILTIN_COMPILER
+#define DEF_GOACC_BUILTIN_COMPILER(code, name, type, attr) \
+      gfc_define_builtin (name, builtin_types[type], code, name, attr);
 #include "../oacc-builtins.def"
+#undef DEF_GOACC_BUILTIN_COMPILER
 #undef DEF_GOACC_BUILTIN
     }
 
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 59ac4c3..1dce308 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -82,6 +82,7 @@ DEF_FUNCTION_TYPE_0 (BT_FN_VOID, BT_VOID)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTR, BT_VOID, BT_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_INT_INT, BT_INT, BT_INT)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
 DEF_FUNCTION_TYPE_1 (BT_FN_PTR_PTR, BT_PTR, BT_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_INT, BT_VOID, BT_INT)
@@ -144,6 +145,7 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I2_INT, BT_VOID, BT_VOLATILE_PTR, BT_I2, BT
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I4_INT, BT_VOID, BT_VOLATILE_PTR, BT_I4, BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I8_INT, BT_VOID, BT_VOLATILE_PTR, BT_I8, BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I16_INT, BT_VOID, BT_VOLATILE_PTR, BT_I16, BT_INT)
+DEF_FUNCTION_TYPE_3 (BT_FN_VOID_INT_PTR_INT, BT_VOID, BT_INT, BT_PTR, BT_INT)
 
 DEF_FUNCTION_TYPE_4 (BT_FN_VOID_OMPFN_PTR_UINT_UINT,
                      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT)
diff --git a/gcc/oacc-builtins.def b/gcc/oacc-builtins.def
index dfb688c..909a917 100644
--- a/gcc/oacc-builtins.def
+++ b/gcc/oacc-builtins.def
@@ -39,3 +39,17 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel",
 		   ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE, "GOACC_update",
 		   BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_NTID, "GOACC_ntid",
+		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_TID, "GOACC_tid",
+		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_NCTAID, "GOACC_nctaid",
+		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_CTAID, "GOACC_ctaid",
+		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GOACC_BUILTIN_COMPILER (BUILT_IN_ACC_ON_DEVICE, "acc_on_device",
+			    BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_THREAD_NUM, "GOACC_get_thread_num",
+		   BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_NUM_THREADS, "GOACC_get_num_threads",
+		   BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 698dc79..08b825c 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -236,6 +236,3 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update",
 		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams",
 		  BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST)
-
-DEF_GOMP_BUILTIN (BUILT_IN_OMP_SET_NUM_THREADS, "omp_set_num_threads",
-		  BT_FN_VOID_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 927a294..c4a6a90 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -238,6 +238,88 @@ omp_get_id (tree node)
   return IDENTIFIER_POINTER(get_identifier (temp_name));
 }
 
+/* Determines if the reduction array should be processed on the host.
+   This is done to avoid launching multiple kernels to synchronize
+   threads across PTX Cooperative Thread Arrays.  */
+static bool
+finish_reduction_on_host (omp_context *ctx)
+{
+  /* Currently, OpenACC gangs are mapped onto PTX CTAs.  Return false
+     if the num_gangs may be set to something other than one.  */
+  for (omp_context *oc = ctx; oc; oc = oc->outer)
+    {
+      tree c, t;
+      int gangs;
+
+      if (gimple_code (oc->stmt) == GIMPLE_OACC_PARALLEL)
+	{
+	  c = gimple_oacc_parallel_clauses (oc->stmt);
+	  t = find_omp_clause (c, OMP_CLAUSE_NUM_GANGS);
+	  if (t)
+	    {
+	      t = fold_convert_loc (OMP_CLAUSE_LOCATION (t),
+				    integer_type_node,
+				    OMP_CLAUSE_NUM_GANGS_EXPR (t));
+
+	      if (TREE_CODE (t) != INTEGER_CST)
+		return true;
+
+	      gangs = TREE_INT_CST_LOW (t);
+
+	      if (gangs > 1)
+		return true;
+	    }
+	  break;
+	}
+    }
+
+  return false;
+}
+
+/* Determine the number of threads OpenACC threads.  Currently, this is
+   num_gangs * vector_length.  */
+
+static tree
+oacc_host_nthreads (omp_context *ctx)
+{
+  tree nthreads, vector_length, gangs, clauses;
+
+  gangs = fold_convert (sizetype, integer_one_node);
+  vector_length = gangs;
+
+  /* The reduction clause may be nested inside a loop directive.
+     Scan for the innermost vector_length clause.  */
+  for (omp_context *oc = ctx; oc; oc = oc->outer)
+    {
+      if (gimple_code (oc->stmt) != GIMPLE_OACC_PARALLEL)
+	continue;
+
+      clauses = gimple_oacc_parallel_clauses (oc->stmt);
+
+      vector_length = find_omp_clause (clauses, OMP_CLAUSE_VECTOR_LENGTH);
+      if (vector_length)
+	vector_length = fold_convert_loc (OMP_CLAUSE_LOCATION (vector_length),
+					  sizetype,
+					  OMP_CLAUSE_VECTOR_LENGTH_EXPR
+					  (vector_length));
+      else
+	vector_length = fold_convert (sizetype, integer_one_node);
+
+      gangs = find_omp_clause (clauses, OMP_CLAUSE_NUM_GANGS);
+      if (gangs)
+        gangs = fold_convert_loc (OMP_CLAUSE_LOCATION (gangs), sizetype,
+				  OMP_CLAUSE_NUM_GANGS_EXPR (gangs));
+      else
+	gangs = fold_convert (sizetype, integer_one_node);
+
+      break;
+    }
+
+  nthreads = fold_build2 (MULT_EXPR, sizetype, gangs, vector_length);
+
+  return nthreads;
+}
+
 /* Holds a decl for __OPENMP_TARGET__.  */
 static GTY(()) tree offload_symbol_decl;
 
@@ -4356,6 +4438,10 @@ lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *stmt_list,
 }
 
 
+static void
+finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
+			 omp_context *ctx, bool receiver = false);
+
 /* Generate code to implement the REDUCTION clauses.  */
 
 static void
@@ -4433,61 +4519,26 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 	  tree t = NULL_TREE, array, nthreads;
 	  tree type = get_base_type (var);
 
-	  /* First ensure that the current tid is less than vector_length.  */
-	  tree exit_label = create_artificial_label (UNKNOWN_LOCATION);
-	  tree reduction_label = create_artificial_label (UNKNOWN_LOCATION);
-
 	  /* Get the current thread id.  */
-	  tree call = builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM);
-	  gimple stmt = gimple_build_call (call, 1, integer_zero_node);
-	  tree fntype = gimple_call_fntype (stmt);
-	  tree tid = create_tmp_var (TREE_TYPE (fntype), NULL);
+	  tree call = builtin_decl_explicit (BUILT_IN_GOACC_GET_THREAD_NUM);
+	  tree tid = create_tmp_var (TREE_TYPE (TREE_TYPE (call)), NULL);
+	  gimple stmt = gimple_build_call (call, 0);
 	  gimple_call_set_lhs (stmt, tid);
 	  gimple_seq_add_stmt (stmt_seqp, stmt);
 
 	  /* Find the total number of threads.  A reduction clause
 	     only appears inside a loop construction or a combined
 	     parallel and loop construct.  */
-	  tree c;
-
-	  if (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR)
-	    c = gimple_oacc_parallel_clauses (ctx->outer->stmt);
-	  else
-	    c = gimple_oacc_parallel_clauses (ctx->stmt);
 
-	  t = find_omp_clause (c, OMP_CLAUSE_VECTOR_LENGTH);
-
-	  if (t)
-	    {
-	      t = fold_convert_loc (OMP_CLAUSE_LOCATION (t),
-				    integer_type_node,
-				    OMP_CLAUSE_VECTOR_LENGTH_EXPR (t));
-	    }
-
-	  if (!t)
-	    t = integer_one_node;
+	  call = builtin_decl_explicit (BUILT_IN_GOACC_GET_NUM_THREADS);
+	  t = create_tmp_var (TREE_TYPE (TREE_TYPE (call)), NULL);
+	  stmt = gimple_build_call (call, 0);
+	  gimple_call_set_lhs (stmt, t);
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
 
-	  /* Extract the number of threads.  */
 	  nthreads = create_tmp_var (sizetype, NULL);
 	  gimplify_assign (nthreads, fold_build1 (NOP_EXPR, sizetype, t),
 			   stmt_seqp);
-	  stmt = gimple_build_assign_with_ops  (MINUS_EXPR, nthreads, nthreads,
-				 fold_build1 (NOP_EXPR, sizetype,
-					      integer_one_node));
-	  gimple_seq_add_stmt (stmt_seqp, stmt);
-
-	  /* If tid >= nthreads, goto exit_label.  */
-	  t = create_tmp_var (sizetype, NULL);
-	  gimplify_assign (t, fold_build1 (NOP_EXPR, sizetype, tid),
-			   stmt_seqp);
-	  stmt = gimple_build_cond (GT_EXPR, t, nthreads, exit_label,
-				    reduction_label);
-	  gimple_seq_add_stmt (stmt_seqp, stmt);
-
-	  /* Place the reduction_label here.  */
-
-	  gimple_seq_add_stmt (stmt_seqp,
-			       gimple_build_label (reduction_label));
 
 	  /* Now insert the partial reductions into the array.  */
 
@@ -4510,9 +4561,11 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 	  gimplify_assign (offset, TYPE_SIZE_UNIT (type),
 			   stmt_seqp);
 	  t = create_tmp_var (sizetype, NULL);
-	  gimplify_assign (t, unshare_expr (fold_build1 (NOP_EXPR, sizetype,
-							 tid)),
-			   stmt_seqp);
+
+	  /* Calculate the stack offset to be array[tid+1].  */
+	  x = fold_build2 (PLUS_EXPR, sizetype, build_int_cst (sizetype, 1),
+			   fold_build1 (NOP_EXPR, sizetype, tid));
+	  gimplify_assign (t, unshare_expr (x), stmt_seqp);
 	  stmt = gimple_build_assign_with_ops (MULT_EXPR, offset, offset, t);
 	  gimple_seq_add_stmt (stmt_seqp, stmt);
 
@@ -4528,8 +4581,50 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 	  x = unshare_expr (build_simple_mem_ref (ptr));
 	  stmt = gimplify_assign (x, new_var, stmt_seqp);
 
-	  /* Place exit label here.  */
-	  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (exit_label));
+	  /* Synchronize the threads and finish up the reduction.  */
+
+	  tree next = create_artificial_label (UNKNOWN_LOCATION);
+	  tree reduction_exit = create_artificial_label (UNKNOWN_LOCATION);
+
+	  /* Synchronize all of the threads.  */
+	  call = builtin_decl_explicit (BUILT_IN_SYNC_SYNCHRONIZE);
+	  stmt = gimple_build_call (call, 0);
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+	  /* Jump to the exit label if tid != 0.  */
+	  tree t1 = create_tmp_var (sizetype, NULL);
+	  tree t2 = create_tmp_var (sizetype, NULL);
+	  gimplify_assign (t1, fold_build1 (NOP_EXPR, sizetype, tid),
+			   stmt_seqp);
+	  gimplify_assign (t2, fold_build1 (NOP_EXPR, sizetype,
+					    integer_zero_node),
+			   stmt_seqp);
+	  stmt = gimple_build_cond (NE_EXPR, t1, t2, reduction_exit, next);
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
+	  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (next));
+
+	  if (finish_reduction_on_host (ctx))
+	    {
+	      /* Set the last element of the array to be 1 if this kernel
+		 is executed on the accelerator.  */
+	      call = builtin_decl_explicit (BUILT_IN_ACC_ON_DEVICE);
+	      tree lhs = create_tmp_var (TREE_TYPE (TREE_TYPE (call)), NULL);
+	      stmt = gimple_build_call (call, 1, build_int_cst
+					(integer_type_node, 2));
+	      gimple_call_set_lhs (stmt, lhs);
+	      gimple_seq_add_stmt (stmt_seqp, stmt);
+
+	      x = unshare_expr (build_simple_mem_ref (array));
+	      stmt = gimplify_assign (x, convert (TREE_TYPE (new_var),
+						  fold_build1 (TRUTH_NOT_EXPR,
+							       sizetype, lhs)),
+				      stmt_seqp);
+	    }
+	    else
+	      finalize_reduction_data (clauses, nthreads, stmt_seqp, ctx,
+				       true);
+
+	  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (reduction_exit));
 
 	  return;
 	    }
@@ -5644,9 +5739,9 @@ expand_oacc_offload (struct omp_region *region)
   tree openmp_target = get_offload_symbol_decl ();
   tree fnaddr = build_fold_addr_expr (child_fn);
   g = gimple_build_call (builtin_decl_explicit (start_ix), 10, device,
-			 fnaddr, build_fold_addr_expr (openmp_target),
-			 t1, t2, t3, t4,
-			 t_num_gangs, t_num_workers, t_vector_length);
+                    fnaddr, build_fold_addr_expr (openmp_target),
+                    t1, t2, t3, t4,
+                    t_num_gangs, t_num_workers, t_vector_length);
   gimple_set_location (g, gimple_location (entry_stmt));
   gsi_insert_before (&gsi, g, GSI_SAME_STMT);
 }
@@ -6913,8 +7008,10 @@ expand_omp_for_static_nochunk (struct omp_region *region,
       threadid = build_call_expr (threadid, 0);
       break;
     case GF_OMP_FOR_KIND_OACC_LOOP:
-      nthreads = integer_one_node;
-      threadid = integer_zero_node;
+      nthreads = builtin_decl_explicit (BUILT_IN_GOACC_GET_NUM_THREADS);
+      nthreads = build_call_expr (nthreads, 0);
+      threadid = builtin_decl_explicit (BUILT_IN_GOACC_GET_THREAD_NUM);
+      threadid = build_call_expr (threadid, 0);
       break;
     default:
       gcc_unreachable ();
@@ -6922,6 +7019,15 @@ expand_omp_for_static_nochunk (struct omp_region *region,
   nthreads = fold_convert (itype, nthreads);
   nthreads = force_gimple_operand_gsi (&gsi, nthreads, true, NULL_TREE,
 				       true, GSI_SAME_STMT);
+
+  /* Ensure nthreads is at least 1.  BUILT_IN_GOACC_NTID returns 0 for a target
+     that does not have a specific expansion.  */
+  nthreads
+    = fold_build2 (MAX_EXPR, itype, nthreads,
+		   fold_convert (TREE_TYPE (nthreads), integer_one_node));
+  nthreads = force_gimple_operand_gsi (&gsi, nthreads, true, NULL_TREE,
+				       true, GSI_SAME_STMT);
+
   threadid = fold_convert (itype, threadid);
   threadid = force_gimple_operand_gsi (&gsi, threadid, true, NULL_TREE,
 				       true, GSI_SAME_STMT);
@@ -7317,8 +7423,10 @@ expand_omp_for_static_chunk (struct omp_region *region,
       threadid = build_call_expr (threadid, 0);
       break;
     case GF_OMP_FOR_KIND_OACC_LOOP:
-      nthreads = integer_one_node;
-      threadid = integer_zero_node;
+      nthreads = builtin_decl_explicit (BUILT_IN_GOACC_GET_NUM_THREADS);
+      nthreads = build_call_expr (nthreads, 0);
+      threadid = builtin_decl_explicit (BUILT_IN_GOACC_GET_THREAD_NUM);
+      threadid = build_call_expr (threadid, 0);
       break;
     default:
       gcc_unreachable ();
@@ -7326,6 +7434,15 @@ expand_omp_for_static_chunk (struct omp_region *region,
   nthreads = fold_convert (itype, nthreads);
   nthreads = force_gimple_operand_gsi (&gsi, nthreads, true, NULL_TREE,
 				       true, GSI_SAME_STMT);
+
+  /* Ensure nthreads is at least 1.  BUILT_IN_GOACC_NTID returns 0 for a target
+     that does not have a specific expansion.  */
+  nthreads
+    = fold_build2 (MAX_EXPR, itype, nthreads,
+		   fold_convert (TREE_TYPE (nthreads), integer_one_node));
+  nthreads = force_gimple_operand_gsi (&gsi, nthreads, true, NULL_TREE,
+				       true, GSI_SAME_STMT);
+
   threadid = fold_convert (itype, threadid);
   threadid = force_gimple_operand_gsi (&gsi, threadid, true, NULL_TREE,
 				       true, GSI_SAME_STMT);
@@ -9390,6 +9507,7 @@ expand_omp_target (struct omp_region *region)
     g = gimple_build_call (builtin_decl_explicit (start_ix), 6, device,
 			   build_fold_addr_expr (openmp_target),
 			   t1, t2, t3, t4);
+
   gimple_set_location (g, gimple_location (entry_stmt));
   gsi_insert_before (&gsi, g, GSI_SAME_STMT);
   if (kind != GF_OMP_TARGET_KIND_REGION)
@@ -9782,6 +9900,14 @@ initialize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
   tree (*gimple_omp_clauses) (const_gimple);
   void (*gimple_omp_set_clauses) (gimple, tree);
 
+  /* Increment nthreads by one, so the kernel can return the host type
+     in the last element of the array.  */
+  t = create_tmp_var (sizetype, NULL);
+  gimplify_assign (t, fold_build2 (PLUS_EXPR, sizetype,
+				   fold_build1 (NOP_EXPR, sizetype, nthreads),
+				   build_int_cst (sizetype, 1)), stmt_seqp);
+  nthreads = t;
+
   /* Find the innermost PARALLEL openmp context.  FIXME: OpenACC kernels
      may require extra care unless they are converted to openmp for loops.  */
 
@@ -9817,7 +9943,6 @@ initialize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
 			 fold_convert (TREE_TYPE (nthreads),
 				       TYPE_SIZE_UNIT (type)));
       gimple_seq_add_stmt (stmt_seqp, stmt);
-
       size = create_tmp_var (sizetype, NULL);
       gimplify_assign (size, fold_build1 (NOP_EXPR, sizetype, t), stmt_seqp);
 
@@ -9837,7 +9962,8 @@ initialize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
 	 most clause so that copy-out works.  */
       tree x = array;
       t = build_omp_clause (gimple_location (ctx->stmt), OMP_CLAUSE_MAP);
-      OMP_CLAUSE_MAP_KIND (t) = OMP_CLAUSE_MAP_FORCE_FROM;
+      OMP_CLAUSE_MAP_KIND (t) = finish_reduction_on_host (ctx) ?
+	OMP_CLAUSE_MAP_FORCE_FROM : OMP_CLAUSE_MAP_FORCE_ALLOC;
       OMP_CLAUSE_DECL (t) = x;
       OMP_CLAUSE_CHAIN (t) = NULL;
       if (oc)
@@ -9857,53 +9983,103 @@ initialize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
 
 static void
 finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
-			 omp_context *ctx)
+			 omp_context *ctx, bool receiver)
 {
   gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
 
-  tree c, var, array, loop_header, loop_body, loop_exit, type;
+  tree c, x, var, array, loop_header, loop_body, loop_exit, type, ptype;
   gimple stmt;
 
+  /* Update nthreads in case the reduction kernel was executed on the
+     host.  */
+  if (!receiver)
+    {
+      for (c = clauses; c && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION;
+	   c = OMP_CLAUSE_CHAIN (c));
+
+      /* Set up reduction variable, var.  Because it's not gimple register,
+	 it needs to be treated as a reference.  */
+      var = OMP_CLAUSE_DECL (c);
+      type = get_base_type (var);
+      ptype = build_pointer_type (type);
+      if (receiver)
+	var = lookup_decl_in_outer_ctx (var, ctx);
+
+      /* Extract array[0] into mem.  */
+      array = lookup_reduction (omp_get_id (OMP_CLAUSE_DECL (c)), ctx);
+      tree mem = create_tmp_var (type, NULL);
+      gimplify_assign (mem, build_simple_mem_ref (array), stmt_seqp);
+
+      tree l1 = create_artificial_label (UNKNOWN_LOCATION);
+      tree l2 = create_artificial_label (UNKNOWN_LOCATION);
+
+      x = create_tmp_var (integer_type_node, NULL);
+      gimplify_assign (x, convert (integer_type_node, mem), stmt_seqp);
+      stmt = gimple_build_cond (EQ_EXPR, x,
+				integer_zero_node, l1, l2);
+
+      gimple_seq_add_stmt (stmt_seqp, stmt);
+      gimple_seq_add_stmt (stmt_seqp, gimple_build_label (l1));
+      gimplify_assign (nthreads, build_int_cst (sizetype, 1), stmt_seqp);
+      gimple_seq_add_stmt (stmt_seqp, gimple_build_label (l2));
+    }
+
   /* Create for loop.
 
      let var = the original reduction variable
      let array = reduction variable array
 
-     var = array[0]
-     for (i = 1; i < nthreads; i++)
+     var = array[1]
+     for (i = 2; i < nthreads; i++)
        var op= array[i]
- */
+  */
 
   loop_header = create_artificial_label (UNKNOWN_LOCATION);
   loop_body = create_artificial_label (UNKNOWN_LOCATION);
   loop_exit = create_artificial_label (UNKNOWN_LOCATION);
 
   /* Initialize the reduction variables to be value of the first array
-     element.  */
+     element.  FIXME: A parallel loop should use the original reduction
+     variable as the initial value.  */
   for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
     {
       if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
 	continue;
 
-      tree_code reduction_code = OMP_CLAUSE_REDUCTION_CODE (c);
-
-      /* reduction(-:var) sums up the partial results, so it acts
-	 identically to reduction(+:var).  */
-      if (reduction_code == MINUS_EXPR)
-        reduction_code = PLUS_EXPR;
-
       /* Set up reduction variable, var.  Becuase it's not gimple register,
          it needs to be treated as a reference.  */
       var = OMP_CLAUSE_DECL (c);
       type = get_base_type (var);
-      tree ptr = lookup_reduction (omp_get_id (OMP_CLAUSE_DECL (c)), ctx);
+      ptype = build_pointer_type (type);
+      if (receiver)
+	var = lookup_decl_in_outer_ctx (var, ctx);
+      array = lookup_reduction (omp_get_id (OMP_CLAUSE_DECL (c)), ctx);
 
-      /* Extract array[0] into mem.  */
+      if (receiver)
+	{
+	  tree t = create_tmp_var (ptype, NULL);
+	  array = build_receiver_ref (array, false, ctx->outer);
+	  gimplify_assign (t, array, stmt_seqp);
+	  array = t;
+	}
+
+      /* Calculate the array offset.  */
+      tree offset = create_tmp_var (sizetype, NULL);
+      gimplify_assign (offset, TYPE_SIZE_UNIT (type), stmt_seqp);
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, offset, offset,
+					   build_int_cst (sizetype, 1));
+      gimple_seq_add_stmt (stmt_seqp, stmt);
+
+      tree ptr = create_tmp_var (TREE_TYPE (array), NULL);
+      stmt = gimple_build_assign_with_ops (POINTER_PLUS_EXPR, ptr, array,
+					   offset);
+      gimple_seq_add_stmt (stmt_seqp, stmt);
+
+      /* Extract array[1] into mem.  */
       tree mem = create_tmp_var (type, NULL);
       gimplify_assign (mem, build_simple_mem_ref (ptr), stmt_seqp);
 
       /* Find the original reduction variable.  */
-      tree x = build_outer_var_ref (var, ctx);
       if (is_reference (var))
 	var = build_simple_mem_ref (var);
 
@@ -9913,16 +10089,15 @@ finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
 
   /* Create an index variable and set it to one.  */
   tree ix = create_tmp_var (sizetype, NULL);
-  gimplify_assign (ix, fold_build1 (NOP_EXPR, sizetype, integer_one_node),
-		   stmt_seqp);
+  gimplify_assign (ix, build_int_cst (sizetype, 2), stmt_seqp);
 
   /* Insert the loop header label here.  */
   gimple_seq_add_stmt (stmt_seqp, gimple_build_label (loop_header));
 
-  /* Loop if ix >= nthreads.  */
-  tree x = create_tmp_var (sizetype, NULL);
+  /* Loop if ix < nthreads.  */
+  x = create_tmp_var (sizetype, NULL);
   gimplify_assign (x, fold_build1 (NOP_EXPR, sizetype, nthreads), stmt_seqp);
-  stmt = gimple_build_cond (GE_EXPR, ix, x, loop_exit, loop_body);
+  stmt = gimple_build_cond (GT_EXPR, ix, x, loop_exit, loop_body);
   gimple_seq_add_stmt (stmt_seqp, stmt);
 
   /* Insert the loop body label here.  */
@@ -9944,8 +10119,19 @@ finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
       /* Set up reduction variable var.  */
       var = OMP_CLAUSE_DECL (c);
       type = get_base_type (var);
+      ptype = build_pointer_type (type);
+      if (receiver)
+	var = lookup_decl_in_outer_ctx (var, ctx);
       array = lookup_reduction (omp_get_id (OMP_CLAUSE_DECL (c)), ctx);
 
+      if (receiver)
+	{
+	  tree t = create_tmp_var (ptype, NULL);
+	  array = build_receiver_ref (array, false, ctx->outer);
+	  gimplify_assign (t, array, stmt_seqp);
+	  array = t;
+	}
+
       /* Calculate the array offset.  */
       tree offset = create_tmp_var (sizetype, NULL);
       gimplify_assign (offset, TYPE_SIZE_UNIT (type), stmt_seqp);
@@ -9962,7 +10148,6 @@ finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
       gimplify_assign (mem, build_simple_mem_ref (ptr), stmt_seqp);
 
       /* Find the original reduction variable.  */
-      tree x = build_outer_var_ref (var, ctx);
       if (is_reference (var))
 	var = build_simple_mem_ref (var);
 
@@ -10026,7 +10211,6 @@ process_reduction_data (gimple_seq *body, gimple_seq *in_stmt_seqp,
 
   for (gsi = gsi_start (*body); !gsi_end_p (gsi); gsi_next (&gsi))
     {
-      tree call;
       tree clauses, nthreads, t, c;
       bool reduction_found = false;
  
@@ -10034,6 +10218,7 @@ process_reduction_data (gimple_seq *body, gimple_seq *in_stmt_seqp,
 
       switch (gimple_code (stmt))
 	{
+	  /* FIXME: A reduction may also appear in an oacc parallel.  */
 	case GIMPLE_OMP_FOR:
 	  clauses = gimple_omp_for_clauses (stmt);
 
@@ -10051,55 +10236,15 @@ process_reduction_data (gimple_seq *body, gimple_seq *in_stmt_seqp,
 	  ctx = maybe_lookup_ctx (stmt);
 	  t = NULL_TREE;
 
-	  /* The reduction clause may be nested inside a loop directive.
-	     Scan for the innermost vector_length clause.  */
-	  for (omp_context *oc = ctx; oc; oc = oc->outer)
-	    {
-	      switch (gimple_code (oc->stmt))
-		{
-		case GIMPLE_OACC_PARALLEL:
-		  c = gimple_oacc_parallel_clauses (oc->stmt);
-		  break;
-		case GIMPLE_OMP_FOR:
-		  c = gimple_omp_for_clauses (oc->stmt);
-		  break;
-		default:
-		  c = NULL_TREE;
-		  break;
-		}
-
-	      if (c && gimple_code (oc->stmt) == GIMPLE_OACC_PARALLEL)
-		{
-		  t = find_omp_clause (c, OMP_CLAUSE_VECTOR_LENGTH);
-		  if (t)
-		    t = fold_convert_loc (OMP_CLAUSE_LOCATION (t),
-					  integer_type_node,
-					  OMP_CLAUSE_VECTOR_LENGTH_EXPR (t));
-		  break;
-		}
-	    }
-
-	  if (!t)
-	    t = integer_one_node;
-
 	  /* Extract the number of threads.  */
-	  nthreads = create_tmp_var (TREE_TYPE (t), NULL);
+	  nthreads = create_tmp_var (sizetype, NULL);
+	  t = oacc_host_nthreads (ctx);
 	  gimplify_assign (nthreads, t, in_stmt_seqp);
 
-	  /* Ensure nthreads >= 1.  */
-	  stmt = gimple_build_assign_with_ops (MAX_EXPR, nthreads, nthreads,
-				          fold_convert(TREE_TYPE (nthreads),
-						       integer_one_node));
-	  gimple_seq_add_stmt (in_stmt_seqp, stmt);
-
-	  /* Set the number of threads.  */
-	  /* FIXME: This needs to handle accelerators  */
-	  call = builtin_decl_explicit (BUILT_IN_OMP_SET_NUM_THREADS);
-	  stmt = gimple_build_call (call, 1, nthreads);
-	  gimple_seq_add_stmt (in_stmt_seqp, stmt);
-
 	  initialize_reduction_data (clauses, nthreads, in_stmt_seqp, ctx);
-	  finalize_reduction_data (clauses, nthreads, out_stmt_seqp, ctx);
+
+	  if (finish_reduction_on_host (ctx))
+	    finalize_reduction_data (clauses, nthreads, out_stmt_seqp, ctx);
 	  break;
 	default:
 	  // Scan for other directives which support reduction here.
diff --git a/gcc/testsuite/c-c++-common/goacc/goacc_builtins.c b/gcc/testsuite/c-c++-common/goacc/goacc_builtins.c
new file mode 100644
index 0000000..7fa0df8
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/goacc_builtins.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+
+void
+ntid (void)
+{
+  const int ntid_x = __builtin_GOACC_ntid (0);
+  const int ntid_y = __builtin_GOACC_ntid (1);
+  const int ntid_z = __builtin_GOACC_ntid (2);
+
+  /* { dg-final { scan-assembler-not "__builtin_GOACC_ntid" } } */
+}
+
+void
+tid (void)
+{
+  const int tid_x = __builtin_GOACC_tid (0);
+  const int tid_y = __builtin_GOACC_tid (1);
+  const int tid_z = __builtin_GOACC_tid (2);
+
+  /* { dg-final { scan-assembler-not "__builtin_GOACC_tid" } } */
+}
+
+void
+nctaid (void)
+{
+  const int nctaid_x = __builtin_GOACC_nctaid (0);
+  const int nctaid_y = __builtin_GOACC_nctaid (1);
+  const int nctaid_z = __builtin_GOACC_nctaid (2);
+
+  /* { dg-final { scan-assembler-not "__builtin_GOACC_nctaid" } } */
+}
+
+void
+ctaid (void)
+{
+  const int ctaid_x = __builtin_GOACC_ctaid (0);
+  const int ctaid_y = __builtin_GOACC_ctaid (1);
+  const int ctaid_z = __builtin_GOACC_ctaid (2);
+
+  /* { dg-final { scan-assembler-not "__builtin_GOACC_ctaid" } } */
+}
+
+void
+on_device (void)
+{
+  const int on_host = __builtin_acc_on_device (0);
+  const int on_accelerator = __builtin_acc_on_device (1);
+
+  /* { dg-final { scan-assembler-not "__built_in_acc_on_device" } } */
+}
+
+void
+acc_get_thread_num (void)
+{
+  const int thread_num = __builtin_GOACC_get_thread_num ();
+
+  /* { dg-final { scan-assembler-not "__builtin_GOACC_get_thread_num" } } */
+}
+
+void
+acc_get_num_threads (void)
+{
+  const int num_threads = __builtin_GOACC_get_num_threads ();
+
+  /* { dg-final { scan-assembler-not "__builtin_GOACC_get_num_threads" } } */
+}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gomp4] various OpenACC/PTX built-ins and a reduction tweak
  2014-09-17  0:33 [gomp4] various OpenACC/PTX built-ins and a reduction tweak Cesar Philippidis
@ 2014-09-17  8:44 ` Tobias Burnus
  2014-09-17  8:50   ` Jakub Jelinek
  2014-09-18 18:43 ` [gomp4] various OpenACC/PTX built-ins and a reduction tweak Thomas Schwinge
  1 sibling, 1 reply; 9+ messages in thread
From: Tobias Burnus @ 2014-09-17  8:44 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches, fortran, Thomas Schwinge

Hi,

Cesar Philippidis wrote:
> The patch introduces the following OpenACC/PTX-specific built-ins:
...

It is not completely clear how they are supposed to get used. Should the
user call them directly in some cases? Or are they only used internally?

acc_on_device sounds like a function which would be in C/C++ made available
to the user via #define acc_on_device __builtin_acc_on_device.

However, the rest looks as if it should rather be an internal function
instead of a builtin. Or should the user really ever call the builtin
directly?

Regarding Fortran: Builtins aren't directly available to the user. You have to
wrap them into an intrinsic to make them available. If they have to be made
available via a module (e.g. via "module acc) - you have to create a virtual
module, which provides the intrinsic. If you don't want to convert the whole
module, you could create an auxiliar module (e.g. acc_internal_) which provides
only those bits - and then include it ("use,intrinsic :: ...") it in the
main module - written in normal Fortran.

But in any case, it would be helpful to know how those new built ins are intended
to be used.

Tobias

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gomp4] various OpenACC/PTX built-ins and a reduction tweak
  2014-09-17  8:44 ` Tobias Burnus
@ 2014-09-17  8:50   ` Jakub Jelinek
  2014-09-17 14:12     ` Cesar Philippidis
  2014-09-18 18:01     ` [gomp4] OpenACC acc_on_device (was: various OpenACC/PTX built-ins and a reduction tweak) Thomas Schwinge
  0 siblings, 2 replies; 9+ messages in thread
From: Jakub Jelinek @ 2014-09-17  8:50 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Cesar Philippidis, gcc-patches, fortran, Thomas Schwinge

On Wed, Sep 17, 2014 at 10:44:12AM +0200, Tobias Burnus wrote:
> Cesar Philippidis wrote:
> > The patch introduces the following OpenACC/PTX-specific built-ins:
> ...
> 
> It is not completely clear how they are supposed to get used. Should the
> user call them directly in some cases? Or are they only used internally?
> 
> acc_on_device sounds like a function which would be in C/C++ made available
> to the user via #define acc_on_device __builtin_acc_on_device.

And not just providing acc_on_device prototype in some header?  Without
looking at the OpenACC standard, it sounds like this function could be
similar to omp_is_initial_device, so can and should be handled supposedly
similarly.
> 
> However, the rest looks as if it should rather be an internal function
> instead of a builtin. Or should the user really ever call the builtin
> directly?

GOMP_* functions are builtins and not internal functions too, all those
functions are library functions, while the user typically doesn't call them
directly, they still are implemented in the library.  Internal functions are
used for something that doesn't have a library implementation and is not
something user can call directly.

> Regarding Fortran: Builtins aren't directly available to the user. You have to
> wrap them into an intrinsic to make them available. If they have to be made
> available via a module (e.g. via "module acc) - you have to create a virtual
> module, which provides the intrinsic. If you don't want to convert the whole
> module, you could create an auxiliar module (e.g. acc_internal_) which provides
> only those bits - and then include it ("use,intrinsic :: ...") it in the
> main module - written in normal Fortran.

For the user callable fortran functions, for OpenMP libgomp just provides
*_ entrypoints to * functions.  Perhaps acc_on_device_ could be provided
too.

	Jakub

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gomp4] various OpenACC/PTX built-ins and a reduction tweak
  2014-09-17  8:50   ` Jakub Jelinek
@ 2014-09-17 14:12     ` Cesar Philippidis
  2014-09-18 18:01     ` [gomp4] OpenACC acc_on_device (was: various OpenACC/PTX built-ins and a reduction tweak) Thomas Schwinge
  1 sibling, 0 replies; 9+ messages in thread
From: Cesar Philippidis @ 2014-09-17 14:12 UTC (permalink / raw)
  To: Jakub Jelinek, Tobias Burnus; +Cc: gcc-patches, fortran, Thomas Schwinge

On 09/17/2014 01:49 AM, Jakub Jelinek wrote:
> On Wed, Sep 17, 2014 at 10:44:12AM +0200, Tobias Burnus wrote:
>> Cesar Philippidis wrote:
>>> The patch introduces the following OpenACC/PTX-specific built-ins:
>> ...
>>
>> It is not completely clear how they are supposed to get used. Should the
>> user call them directly in some cases? Or are they only used internally?
>>
>> acc_on_device sounds like a function which would be in C/C++ made available
>> to the user via #define acc_on_device __builtin_acc_on_device.
> 
> And not just providing acc_on_device prototype in some header?  Without
> looking at the OpenACC standard, it sounds like this function could be
> similar to omp_is_initial_device, so can and should be handled supposedly
> similarly.

All of the functions are internal except for acc_on_device. We do have a
proper interface for it, but it's part of the libgomp runtime. Should I
remove acc_on_device from this patch until the runtime is ready?

>> However, the rest looks as if it should rather be an internal function
>> instead of a builtin. Or should the user really ever call the builtin
>> directly?
> 
> GOMP_* functions are builtins and not internal functions too, all those
> functions are library functions, while the user typically doesn't call them
> directly, they still are implemented in the library.  Internal functions are
> used for something that doesn't have a library implementation and is not
> something user can call directly.

All of the GOACC_* functions introduced in this patch are internal
functions without library functions, although that may change for
GOACC_get_thread_num and GOACC_get_num_threads. Is there a better way to
handle internal functions or is it correct as-is? I'm not that familiar
with built-in functions.

>> Regarding Fortran: Builtins aren't directly available to the user. You have to
>> wrap them into an intrinsic to make them available. If they have to be made
>> available via a module (e.g. via "module acc) - you have to create a virtual
>> module, which provides the intrinsic. If you don't want to convert the whole
>> module, you could create an auxiliar module (e.g. acc_internal_) which provides
>> only those bits - and then include it ("use,intrinsic :: ...") it in the
>> main module - written in normal Fortran.
> 
> For the user callable fortran functions, for OpenMP libgomp just provides
> *_ entrypoints to * functions.  Perhaps acc_on_device_ could be provided
> too.

Cesar

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [gomp4] OpenACC acc_on_device (was: various OpenACC/PTX built-ins and a reduction tweak)
  2014-09-17  8:50   ` Jakub Jelinek
  2014-09-17 14:12     ` Cesar Philippidis
@ 2014-09-18 18:01     ` Thomas Schwinge
  2014-10-31 10:56       ` [gomp4] OpenACC acc_on_device Thomas Schwinge
  1 sibling, 1 reply; 9+ messages in thread
From: Thomas Schwinge @ 2014-09-18 18:01 UTC (permalink / raw)
  To: Jakub Jelinek, Tobias Burnus, Cesar Philippidis; +Cc: gcc-patches, fortran

[-- Attachment #1: Type: text/plain, Size: 31272 bytes --]

Hi!

Here is my OpenACC acc_on_device patch, in a more complete form, with
test cases and all that.  Thanks, Cesar, for getting the ball rolling!

On Wed, 17 Sep 2014 10:49:54 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Sep 17, 2014 at 10:44:12AM +0200, Tobias Burnus wrote:
> > Cesar Philippidis wrote:
> > > The patch introduces the following OpenACC/PTX-specific built-ins:
> > ...
> > 
> > It is not completely clear how they are supposed to get used. Should the
> > user call them directly in some cases? Or are they only used internally?
> > 
> > acc_on_device sounds like a function which would be in C/C++ made available
> > to the user via #define acc_on_device __builtin_acc_on_device.
> 
> And not just providing acc_on_device prototype in some header?

Yes, just a prototype.  And next to DEF_GOACC_BUILTIN (configured the
same as DEF_GOMP_BUILTIN), I add a new DEF_GOACC_BUILTIN_COMPILER that is
configured to always provide the __builtin_[...] variant, but the
un-prefixed [...]  only if -fopenacc is in effect.  Does that look
alright?

> Without
> looking at the OpenACC standard, it sounds like this function could be
> similar to omp_is_initial_device, so can and should be handled supposedly
> similarly.

I think we've been talking about this at the Cauldron, where you agreed
that omp_is_initial_device should also be implemented as a builtin.  (Or
am I confusing things?)

> > However, the rest looks as if it should rather be an internal function
> > instead of a builtin. Or should the user really ever call the builtin
> > directly?
> 
> GOMP_* functions are builtins and not internal functions too, all those
> functions are library functions, while the user typically doesn't call them
> directly, they still are implemented in the library.  Internal functions are
> used for something that doesn't have a library implementation and is not
> something user can call directly.

> > Regarding Fortran: Builtins aren't directly available to the user. You have to
> > wrap them into an intrinsic to make them available. If they have to be made
> > available via a module (e.g. via "module acc) - you have to create a virtual
> > module, which provides the intrinsic. If you don't want to convert the whole
> > module, you could create an auxiliar module (e.g. acc_internal_) which provides
> > only those bits - and then include it ("use,intrinsic :: ...") it in the
> > main module - written in normal Fortran.

This I have not yet addressed -- please see the TODO comments in the
gcc/fortran/ files as well as Fortran test cases.

> For the user callable fortran functions, for OpenMP libgomp just provides
> *_ entrypoints to * functions.  Perhaps acc_on_device_ could be provided
> too.

This is what I had done already.

Does that patch look good?  (With the Fortran things still to be
addressed.)  (And, obviously this is not yet based on the Tobias/Jim
Fortran module/header rewrite.)

commit 8efbd08ed058d7ed3c43e10fbff0eac35b4defc9
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Fri Jul 4 11:45:05 2014 +0000

    OpenACC acc_on_device.
    
    	gcc/
    	* builtins.def (DEF_GOACC_BUILTIN_COMPILER): New macro.
    	* oacc-builtins.def (BUILT_IN_GOACC_UPDATE): New builtin.
    	* builtins.c (expand_builtin_acc_on_device): New function.
    	(expand_builtin): Use it to handle BUILT_IN_ACC_ON_DEVICE.
    	(is_inexpensive_builtin): Handle BUILT_IN_ACC_ON_DEVICE.
    	gcc/fortran/
    	* f95-lang.c (DEF_GOACC_BUILTIN_COMPILER): New macro.
    	* types.def (BT_FN_INT_INT): New type.
    	gcc/testsuite/
    	* c-c++-common/goacc/acc_on_device-1.c: New file.
    	* c-c++-common/goacc/acc_on_device-2.c: Likewise.
    	* c-c++-common/goacc/acc_on_device-2-off.c: Likewise.
    	* gfortran.dg/goacc/acc_on_device-1.f95: Likewise.
    	* gfortran.dg/goacc/acc_on_device-2.f95: Likewise.
    	* gfortran.dg/goacc/acc_on_device-2-off.f95: Likewise.
    	libgomp/
    	* libgomp.map (OACC_2.0): Add acc_on_device, acc_on_device_.
    	* fortran.c: Include "openacc.h".
    	(acc_on_device_): New function.
    	* oacc-parallel.c: Include "openacc.h".
    	(acc_on_device): New function.
    	* openacc.f90 (acc_device_kind, acc_device_none)
    	(acc_device_default, acc_device_host, acc_device_not_host): New
    	parameters.
    	(acc_on_device): New function declaration.
    	* openacc_lib.h (acc_device_kind, acc_device_none)
    	(acc_device_default, acc_device_host, acc_device_not_host): New
    	parameters.
    	(acc_on_device): New function declaration.
    	* openacc.h (acc_device_t): New enum.
    	(acc_on_device): New function declaration.
    	* testsuite/libgomp.oacc-c/acc_on_device-1.c: New file.
    	* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
    	* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
---
 gcc/ChangeLog.gomp                                 |  8 ++++
 gcc/builtins.c                                     | 50 ++++++++++++++++++++
 gcc/builtins.def                                   |  8 +++-
 gcc/fortran/ChangeLog.gomp                         |  5 ++
 gcc/fortran/f95-lang.c                             |  5 ++
 gcc/fortran/types.def                              |  1 +
 gcc/oacc-builtins.def                              |  2 +
 gcc/testsuite/ChangeLog.gomp                       |  9 ++++
 gcc/testsuite/c-c++-common/goacc/acc_on_device-1.c | 20 ++++++++
 .../c-c++-common/goacc/acc_on_device-2-off.c       | 17 +++++++
 gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c | 17 +++++++
 .../gfortran.dg/goacc/acc_on_device-1.f95          | 22 +++++++++
 .../gfortran.dg/goacc/acc_on_device-2-off.f95      | 39 ++++++++++++++++
 .../gfortran.dg/goacc/acc_on_device-2.f95          | 40 ++++++++++++++++
 libgomp/ChangeLog.gomp                             | 22 +++++++++
 libgomp/fortran.c                                  |  8 ++++
 libgomp/libgomp.map                                |  3 ++
 libgomp/oacc-parallel.c                            | 10 ++++
 libgomp/openacc.f90                                | 17 ++++++-
 libgomp/openacc.h                                  | 13 +++++-
 libgomp/openacc_lib.h                              | 16 ++++++-
 libgomp/testsuite/libgomp.oacc-c/acc_on_device-1.c | 54 ++++++++++++++++++++++
 .../libgomp.oacc-fortran/acc_on_device-1-1.f90     | 39 ++++++++++++++++
 .../libgomp.oacc-fortran/acc_on_device-1-2.f       | 39 ++++++++++++++++
 .../libgomp.oacc-fortran/acc_on_device-1-3.f       | 39 ++++++++++++++++
 25 files changed, 498 insertions(+), 5 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index ef9a81d..0c25a27 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,11 @@
+2014-09-18  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* builtins.def (DEF_GOACC_BUILTIN_COMPILER): New macro.
+	* oacc-builtins.def (BUILT_IN_GOACC_UPDATE): New builtin.
+	* builtins.c (expand_builtin_acc_on_device): New function.
+	(expand_builtin): Use it to handle BUILT_IN_ACC_ON_DEVICE.
+	(is_inexpensive_builtin): Handle BUILT_IN_ACC_ON_DEVICE.
+
 2014-09-08  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* configure.ac (offload_targets): Remove.
diff --git gcc/builtins.c gcc/builtins.c
index 975f696..5b2ebcc 100644
--- gcc/builtins.c
+++ gcc/builtins.c
@@ -5747,6 +5747,49 @@ expand_stack_save (void)
   return ret;
 }
 
+
+/* Expand OpenACC acc_on_device.
+
+   This has to happen late (that is, not in early folding; expand_builtin_*,
+   rather than fold_builtin_*), as we have to act differently for host and
+   acceleration device (ACCEL_COMPILER conditional).  */
+
+static rtx
+expand_builtin_acc_on_device (tree exp, rtx target ATTRIBUTE_UNUSED)
+{
+  if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
+    return NULL_RTX;
+
+  tree arg, v1, v2, ret;
+  location_t loc;
+
+  arg = CALL_EXPR_ARG (exp, 0);
+  arg = builtin_save_expr (arg);
+  loc = EXPR_LOCATION (exp);
+
+  /* Build: (arg == v1 || arg == v2) ? 1 : 0.  */
+
+#ifdef ACCEL_COMPILER
+  v1 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_not_host */ 3);
+  v2 = build_int_cst (TREE_TYPE (arg), ACCEL_COMPILER_acc_device);
+#else
+  v1 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_none */ 0);
+  v2 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_host */ 2);
+#endif
+
+  v1 = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg, v1);
+  v2 = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg, v2);
+
+  /* Can't use TRUTH_ORIF_EXPR, as that is not supported by
+     expand_expr_real*.  */
+  ret = fold_build3_loc (loc, COND_EXPR, integer_type_node, v1, v1, v2);
+  ret = fold_build3_loc (loc, COND_EXPR, integer_type_node,
+			 ret, integer_one_node, integer_zero_node);
+
+  return expand_normal (ret);
+}
+
+
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
    (and in mode MODE if that's convenient).
@@ -6816,6 +6859,12 @@ expand_builtin (tree exp, rtx target, rtx subtarget, enum machine_mode mode,
       expand_builtin_cilk_pop_frame (exp);
       return const0_rtx;
 
+    case BUILT_IN_ACC_ON_DEVICE:
+      target = expand_builtin_acc_on_device (exp, target);
+      if (target)
+	return target;
+      break;
+
     default:	/* just do library call, if unknown builtin */
       break;
     }
@@ -12748,6 +12797,7 @@ is_inexpensive_builtin (tree decl)
       case BUILT_IN_LABS:
       case BUILT_IN_LLABS:
       case BUILT_IN_PREFETCH:
+      case BUILT_IN_ACC_ON_DEVICE:
 	return true;
 
       default:
diff --git gcc/builtins.def gcc/builtins.def
index 2ef896e..b9b8e74 100644
--- gcc/builtins.def
+++ gcc/builtins.def
@@ -146,12 +146,16 @@ along with GCC; see the file COPYING3.  If not see
   DEF_BUILTIN (ENUM, NAME, BUILT_IN_NORMAL, BT_LAST, BT_LAST, false, false, \
 	       false, ATTR_LAST, false, false)
 
-/* Builtin used by the implementation of GNU OpenACC.  None of these are
-   actually implemented in the compiler; they're all in libgomp.  */
+/* Builtin used by the implementation of GNU OpenACC.  Few of these are
+   actually implemented in the compiler; most are in libgomp.  */
 #undef DEF_GOACC_BUILTIN
 #define DEF_GOACC_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
                false, true, true, ATTRS, false, flag_openacc)
+#undef DEF_GOACC_BUILTIN_COMPILER
+#define DEF_GOACC_BUILTIN_COMPILER(ENUM, NAME, TYPE, ATTRS) \
+  DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
+               flag_openacc, true, true, ATTRS, false, true)
 
 /* Builtin used by the implementation of GNU OpenMP.  None of these are
    actually implemented in the compiler; they're all in libgomp.  */
diff --git gcc/fortran/ChangeLog.gomp gcc/fortran/ChangeLog.gomp
index cc7c888..df86db7 100644
--- gcc/fortran/ChangeLog.gomp
+++ gcc/fortran/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-09-18  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* f95-lang.c (DEF_GOACC_BUILTIN_COMPILER): New macro.
+	* types.def (BT_FN_INT_INT): New type.
+
 2014-09-08  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* gfortran.h (enum OMP_LIST_FIRST, OMP_LIST_LAST): New
diff --git gcc/fortran/f95-lang.c gcc/fortran/f95-lang.c
index e7c64b7..1b017b1 100644
--- gcc/fortran/f95-lang.c
+++ gcc/fortran/f95-lang.c
@@ -1093,7 +1093,12 @@ gfc_init_builtin_functions (void)
 #define DEF_GOACC_BUILTIN(code, name, type, attr) \
       gfc_define_builtin ("__builtin_" name, builtin_types[type], \
 			  code, name, attr);
+#undef DEF_GOACC_BUILTIN_COMPILER
+      /* TODO: this is not doing the right thing.  */
+#define DEF_GOACC_BUILTIN_COMPILER(code, name, type, attr) \
+      gfc_define_builtin (name, builtin_types[type], code, name, attr);
 #include "../oacc-builtins.def"
+#undef DEF_GOACC_BUILTIN_COMPILER
 #undef DEF_GOACC_BUILTIN
     }
 
diff --git gcc/fortran/types.def gcc/fortran/types.def
index 59ac4c3..6c2fdc0 100644
--- gcc/fortran/types.def
+++ gcc/fortran/types.def
@@ -82,6 +82,7 @@ DEF_FUNCTION_TYPE_0 (BT_FN_VOID, BT_VOID)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTR, BT_VOID, BT_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_INT_INT, BT_INT, BT_INT)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
 DEF_FUNCTION_TYPE_1 (BT_FN_PTR_PTR, BT_PTR, BT_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_INT, BT_VOID, BT_INT)
diff --git gcc/oacc-builtins.def gcc/oacc-builtins.def
index dfb688c..e4bc756 100644
--- gcc/oacc-builtins.def
+++ gcc/oacc-builtins.def
@@ -39,3 +39,5 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel",
 		   ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE, "GOACC_update",
 		   BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+DEF_GOACC_BUILTIN_COMPILER (BUILT_IN_ACC_ON_DEVICE, "acc_on_device",
+			    BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index 4427521..e210c6b 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,3 +1,12 @@
+2014-09-18  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* c-c++-common/goacc/acc_on_device-1.c: New file.
+	* c-c++-common/goacc/acc_on_device-2.c: Likewise.
+	* c-c++-common/goacc/acc_on_device-2-off.c: Likewise.
+	* gfortran.dg/goacc/acc_on_device-1.f95: Likewise.
+	* gfortran.dg/goacc/acc_on_device-2.f95: Likewise.
+	* gfortran.dg/goacc/acc_on_device-2-off.f95: Likewise.
+
 2014-09-08  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* gfortran.dg/goacc/private-1.f95: New test.
diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-1.c gcc/testsuite/c-c++-common/goacc/acc_on_device-1.c
new file mode 100644
index 0000000..d0e137b
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/acc_on_device-1.c
@@ -0,0 +1,20 @@
+/* Have to enable optimizations, as otherwise builtins won't be expanded.  */
+/* { dg-additional-options "-O -fdump-rtl-expand" } */
+
+int
+f (void)
+{
+  int r = 0;
+
+  r |= acc_on_device ();
+  r |= acc_on_device (1, 2);
+  r |= acc_on_device (3.14);
+  r |= acc_on_device ("hello");
+
+  return r;
+}
+
+/* Unsuitable to be handled as a builtin, so we're expecting four calls.
+   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 4 "expand" } } */
+
+/* { dg-final { cleanup-rtl-dump "expand" } } */
diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c
new file mode 100644
index 0000000..ddc43ab
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c
@@ -0,0 +1,17 @@
+/* Have to enable optimizations, as otherwise builtins won't be expanded.  */
+/* { dg-additional-options "-O -fdump-rtl-expand -fno-openacc" } */
+
+typedef enum acc_device_t { acc_device_X = 123 } acc_device_t;
+extern int acc_on_device (acc_device_t);
+
+int
+f (void)
+{
+  const int dev = acc_device_X;
+  return acc_on_device (dev);
+}
+
+/* Without -fopenacc, we're expecting one call.
+   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 1 "expand" } } */
+
+/* { dg-final { cleanup-rtl-dump "expand" } } */
diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
new file mode 100644
index 0000000..65b4ae6
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
@@ -0,0 +1,17 @@
+/* Have to enable optimizations, as otherwise builtins won't be expanded.  */
+/* { dg-additional-options "-O -fdump-rtl-expand" } */
+
+typedef enum acc_device_t { acc_device_X = 123 } acc_device_t;
+extern int acc_on_device (acc_device_t);
+
+int
+f (void)
+{
+  const int dev = acc_device_X;
+  return acc_on_device (dev);
+}
+
+/* With -fopenacc, we're expecting the builtin to be expanded, so no calls.
+   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 0 "expand" } } */
+
+/* { dg-final { cleanup-rtl-dump "expand" } } */
diff --git gcc/testsuite/gfortran.dg/goacc/acc_on_device-1.f95 gcc/testsuite/gfortran.dg/goacc/acc_on_device-1.f95
new file mode 100644
index 0000000..9dfde26
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/acc_on_device-1.f95
@@ -0,0 +1,22 @@
+! Have to enable optimizations, as otherwise builtins won't be expanded.
+! { dg-additional-options "-O -fdump-rtl-expand" }
+
+logical function f ()
+  implicit none
+
+  external acc_on_device
+  logical (4) acc_on_device
+
+  f = .false.
+  f = f .or. acc_on_device ()
+  f = f .or. acc_on_device (1, 2)
+  f = f .or. acc_on_device (3.14)
+  f = f .or. acc_on_device ("hello")
+
+  return
+end function f
+
+! Unsuitable to be handled as a builtin, so we're expecting four calls.
+! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 4 "expand" } }
+
+! { dg-final { cleanup-rtl-dump "expand" } }
diff --git gcc/testsuite/gfortran.dg/goacc/acc_on_device-2-off.f95 gcc/testsuite/gfortran.dg/goacc/acc_on_device-2-off.f95
new file mode 100644
index 0000000..cf28264
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/acc_on_device-2-off.f95
@@ -0,0 +1,39 @@
+! Have to enable optimizations, as otherwise builtins won't be expanded.
+! { dg-additional-options "-O -fdump-rtl-expand -fno-openacc" }
+
+module openacc_kinds
+  implicit none
+
+  integer, parameter :: acc_device_kind = 4
+
+end module openacc_kinds
+
+module openacc
+  use openacc_kinds
+  implicit none
+
+  integer (acc_device_kind), parameter :: acc_device_host = 2
+
+  interface
+     function acc_on_device (dev)
+       use openacc_kinds
+       logical (4) :: acc_on_device
+       integer (acc_device_kind), intent (in) :: dev
+     end function acc_on_device
+  end interface
+end module openacc
+
+logical (4) function f ()
+  use openacc
+  implicit none
+
+  integer (4), parameter :: dev = 2
+
+  f = acc_on_device (dev)
+  return
+end function f
+
+! Without -fopenacc, we're expecting one call.
+! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 1 "expand" } }
+
+! { dg-final { cleanup-rtl-dump "expand" } }
diff --git gcc/testsuite/gfortran.dg/goacc/acc_on_device-2.f95 gcc/testsuite/gfortran.dg/goacc/acc_on_device-2.f95
new file mode 100644
index 0000000..7730a60
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/acc_on_device-2.f95
@@ -0,0 +1,40 @@
+! Have to enable optimizations, as otherwise builtins won't be expanded.
+! { dg-additional-options "-O -fdump-rtl-expand" }
+
+module openacc_kinds
+  implicit none
+
+  integer, parameter :: acc_device_kind = 4
+
+end module openacc_kinds
+
+module openacc
+  use openacc_kinds
+  implicit none
+
+  integer (acc_device_kind), parameter :: acc_device_host = 2
+
+  interface
+     function acc_on_device (dev)
+       use openacc_kinds
+       logical (4) :: acc_on_device
+       integer (acc_device_kind), intent (in) :: dev
+     end function acc_on_device
+  end interface
+end module openacc
+
+logical (4) function f ()
+  use openacc
+  implicit none
+
+  integer (4), parameter :: dev = 2
+
+  f = acc_on_device (dev)
+  return
+end function f
+
+! With -fopenacc, we're expecting the builtin to be expanded, so no calls.
+! TODO: not working.
+! { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 0 "expand" { xfail *-*-* } } }
+
+! { dg-final { cleanup-rtl-dump "expand" } }
diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 5b2a39d..8d774ee 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,25 @@
+2014-09-18  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* libgomp.map (OACC_2.0): Add acc_on_device, acc_on_device_.
+	* fortran.c: Include "openacc.h".
+	(acc_on_device_): New function.
+	* oacc-parallel.c: Include "openacc.h".
+	(acc_on_device): New function.
+	* openacc.f90 (acc_device_kind, acc_device_none)
+	(acc_device_default, acc_device_host, acc_device_not_host): New
+	parameters.
+	(acc_on_device): New function declaration.
+	* openacc_lib.h (acc_device_kind, acc_device_none)
+	(acc_device_default, acc_device_host, acc_device_not_host): New
+	parameters.
+	(acc_on_device): New function declaration.
+	* openacc.h (acc_device_t): New enum.
+	(acc_on_device): New function declaration.
+	* testsuite/libgomp.oacc-c/acc_on_device-1.c: New file.
+	* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
+	* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
+
 2014-07-09  Thomas Schwinge  <thomas@codesourcery.com>
 	    Jakub Jelinek  <jakub@redhat.com>
 
diff --git libgomp/fortran.c libgomp/fortran.c
index 1f30c51..28c83cc 100644
--- libgomp/fortran.c
+++ libgomp/fortran.c
@@ -26,6 +26,7 @@
 
 #include "libgomp.h"
 #include "libgomp_f.h"
+#include "openacc.h"
 #include <stdlib.h>
 #include <limits.h>
 
@@ -73,6 +74,7 @@ ialias_redirect (omp_get_num_devices)
 ialias_redirect (omp_get_num_teams)
 ialias_redirect (omp_get_team_num)
 ialias_redirect (omp_is_initial_device)
+ialias_redirect (acc_on_device)
 #endif
 
 #ifndef LIBGOMP_GNU_SYMBOL_VERSIONING
@@ -492,3 +494,9 @@ omp_is_initial_device_ (void)
 {
   return omp_is_initial_device ();
 }
+
+int32_t
+acc_on_device_ (const int32_t *dev)
+{
+  return acc_on_device (*dev);
+}
diff --git libgomp/libgomp.map libgomp/libgomp.map
index c575be3..69a4d83 100644
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -234,6 +234,9 @@ GOMP_4.0.1 {
 } GOMP_4.0;
 
 OACC_2.0 {
+  global:
+	acc_on_device;
+	acc_on_device_;
 };
 
 GOACC_2.0 {
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 79b6254..02fbb12 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -27,6 +27,7 @@
 
 #include "libgomp.h"
 #include "libgomp_g.h"
+#include "openacc.h"
 
 void
 GOACC_parallel (int device, void (*fn) (void *), const void *openmp_target,
@@ -128,3 +129,12 @@ GOACC_update (int device, const void *openmp_target, size_t mapnum,
     }
   GOMP_target_update (device, openmp_target, mapnum, hostaddrs, sizes, kinds_);
 }
+
+/* TODO: Move elsewhere.  */
+int
+acc_on_device (acc_device_t dev)
+{
+  /* Just rely on the compiler builtin.  */
+  return __builtin_acc_on_device (dev);
+}
+ialias (acc_on_device)
diff --git libgomp/openacc.f90 libgomp/openacc.f90
index b2a79f6..70b58d6 100644
--- libgomp/openacc.f90
+++ libgomp/openacc.f90
@@ -1,6 +1,6 @@
 !  OpenACC Runtime Library Definitions.
 
-!  Copyright (C) 2013 Free Software Foundation, Inc.
+!  Copyright (C) 2013-2014 Free Software Foundation, Inc.
 
 !  Contributed by Thomas Schwinge <thomas@codesourcery.com>.
 
@@ -28,6 +28,8 @@
 module openacc_kinds
   implicit none
 
+  integer, parameter :: acc_device_kind = 4
+
 end module openacc_kinds
 
 module openacc
@@ -36,4 +38,17 @@ module openacc
 
   integer, parameter :: openacc_version = 201306
 
+  integer (acc_device_kind), parameter :: acc_device_none = 0
+  integer (acc_device_kind), parameter :: acc_device_default = 1
+  integer (acc_device_kind), parameter :: acc_device_host = 2
+  integer (acc_device_kind), parameter :: acc_device_not_host = 3
+
+  interface
+     function acc_on_device (dev)
+       use openacc_kinds
+       logical (4) :: acc_on_device
+       integer (acc_device_kind), intent (in) :: dev
+     end function acc_on_device
+  end interface
+
 end module openacc
diff --git libgomp/openacc.h libgomp/openacc.h
index a6f7ec94..cde7429 100644
--- libgomp/openacc.h
+++ libgomp/openacc.h
@@ -37,7 +37,18 @@ extern "C" {
 #else
 # define __GOACC_NOTHROW __attribute__ ((__nothrow__))
 #endif
-  
+
+typedef enum acc_device_t
+  {
+    acc_device_none = 0,
+    acc_device_default, /* This has to be a distinct value, as no
+			   return value can match it.  */
+    acc_device_host = 2,
+    acc_device_not_host = 3
+  } acc_device_t;
+
+int acc_on_device (acc_device_t __dev) __GOACC_NOTHROW;
+
 #ifdef __cplusplus
 }
 #endif
diff --git libgomp/openacc_lib.h libgomp/openacc_lib.h
index d19c95c..be49100 100644
--- libgomp/openacc_lib.h
+++ libgomp/openacc_lib.h
@@ -1,6 +1,6 @@
 !  OpenACC Runtime Library Definitions.                   -*- mode: fortran -*-
 
-!  Copyright (C) 2013 Free Software Foundation, Inc.
+!  Copyright (C) 2013-2014 Free Software Foundation, Inc.
 
 !  Contributed by Thomas Schwinge <thomas@codesourcery.com>.
 
@@ -27,3 +27,17 @@
 
       integer openacc_version
       parameter (openacc_version = 201306)
+
+      integer acc_device_kind
+      parameter (acc_device_kind = 4)
+      integer (acc_device_kind) acc_device_none
+      parameter (acc_device_none = 0)
+      integer (acc_device_kind) acc_device_default
+      parameter (acc_device_default = 1)
+      integer (acc_device_kind) acc_device_host
+      parameter (acc_device_host = 2)
+      integer (acc_device_kind) acc_device_not_host
+      parameter (acc_device_not_host = 3)
+
+      external acc_on_device
+      logical (4) acc_on_device
diff --git libgomp/testsuite/libgomp.oacc-c/acc_on_device-1.c libgomp/testsuite/libgomp.oacc-c/acc_on_device-1.c
new file mode 100644
index 0000000..f216587
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c/acc_on_device-1.c
@@ -0,0 +1,54 @@
+/* Disable the acc_on_device builtin; we want to test the libgomp library
+   function.  */
+/* TODO: Remove -DACC_DEVICE_TYPE_host once that is set by the test harness.  */
+/* { dg-additional-options "-fno-builtin-acc_on_device -DACC_DEVICE_TYPE_host" } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+int
+main (int argc, char *argv[])
+{
+  /* Host.  */
+
+  {
+    if (!acc_on_device (acc_device_none))
+      abort ();
+    if (!acc_on_device (acc_device_host))
+      abort ();
+    if (acc_on_device (acc_device_not_host))
+      abort ();
+  }
+
+
+  /* Host via offloading fallback mode.  */
+
+#pragma acc parallel if(0)
+  {
+    if (!acc_on_device (acc_device_none))
+      abort ();
+    if (!acc_on_device (acc_device_host))
+      abort ();
+    if (acc_on_device (acc_device_not_host))
+      abort ();
+  }
+
+
+#if !ACC_DEVICE_TYPE_host
+
+  /* Offloaded.  */
+
+#pragma acc parallel
+  {
+    if (acc_on_device (acc_device_none))
+      abort ();
+    if (acc_on_device (acc_device_host))
+      abort ();
+    if (!acc_on_device (acc_device_not_host))
+      abort ();
+  }
+
+#endif
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90 libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
new file mode 100644
index 0000000..c4597a6
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
@@ -0,0 +1,39 @@
+! TODO: Remove -DACC_DEVICE_TYPE_host once that is set by the test harness.
+! { dg-additional-options "-cpp -DACC_DEVICE_TYPE_host" }
+! TODO: Have to disable the acc_on_device builtin for we want to test the
+! libgomp library function?  The command line option
+! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not for
+! Fortran.
+
+use openacc
+implicit none
+
+! Host.
+
+if (.not. acc_on_device (acc_device_none)) call abort
+if (.not. acc_on_device (acc_device_host)) call abort
+if (acc_on_device (acc_device_not_host)) call abort
+
+
+! Host via offloading fallback mode.
+
+!$acc parallel if(.false.)
+if (.not. acc_on_device (acc_device_none)) call abort
+if (.not. acc_on_device (acc_device_host)) call abort
+if (acc_on_device (acc_device_not_host)) call abort
+!$acc end parallel
+
+
+#if !ACC_DEVICE_TYPE_host
+
+! Offloaded.
+
+!$acc parallel
+if (acc_on_device (acc_device_none)) call abort
+if (acc_on_device (acc_device_host)) call abort
+if (.not. acc_on_device (acc_device_not_host)) call abort
+!$acc end parallel
+
+#endif
+
+end
diff --git libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
new file mode 100644
index 0000000..3787e1e
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
@@ -0,0 +1,39 @@
+! TODO: Remove -DACC_DEVICE_TYPE_host once that is set by the test harness.
+! { dg-additional-options "-cpp -DACC_DEVICE_TYPE_host" }
+! TODO: Have to disable the acc_on_device builtin for we want to test
+! the libgomp library function?  The command line option
+! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not
+! for Fortran.
+
+      USE OPENACC
+      IMPLICIT NONE
+
+!Host.
+
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
+
+
+!Host via offloading fallback mode.
+
+!$ACC PARALLEL IF(.FALSE.)
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
+!$ACC END PARALLEL
+
+
+#if !ACC_DEVICE_TYPE_host
+
+! Offloaded.
+
+!$ACC PARALLEL
+      IF (ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
+!$ACC END PARALLEL
+
+#endif
+
+      END
diff --git libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
new file mode 100644
index 0000000..1ee5926
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
@@ -0,0 +1,39 @@
+! TODO: Remove -DACC_DEVICE_TYPE_host once that is set by the test harness.
+! { dg-additional-options "-cpp -DACC_DEVICE_TYPE_host" }
+! TODO: Have to disable the acc_on_device builtin for we want to test
+! the libgomp library function?  The command line option
+! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not
+! for Fortran.
+
+      IMPLICIT NONE
+      INCLUDE "openacc_lib.h"
+
+!Host.
+
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
+
+
+!Host via offloading fallback mode.
+
+!$ACC PARALLEL IF(.FALSE.)
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
+!$ACC END PARALLEL
+
+
+#if !ACC_DEVICE_TYPE_host
+
+! Offloaded.
+
+!$ACC PARALLEL
+      IF (ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
+      IF (ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
+      IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
+!$ACC END PARALLEL
+
+#endif
+
+      END

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gomp4] various OpenACC/PTX built-ins and a reduction tweak
  2014-09-17  0:33 [gomp4] various OpenACC/PTX built-ins and a reduction tweak Cesar Philippidis
  2014-09-17  8:44 ` Tobias Burnus
@ 2014-09-18 18:43 ` Thomas Schwinge
  2014-09-18 19:10   ` Cesar Philippidis
  2014-11-05 15:39   ` [gomp4] Remove unused BUILT_IN_OMP_SET_NUM_THREADS (was: various OpenACC/PTX built-ins and a reduction tweak) Thomas Schwinge
  1 sibling, 2 replies; 9+ messages in thread
From: Thomas Schwinge @ 2014-09-18 18:43 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches, fortran

[-- Attachment #1: Type: text/plain, Size: 6252 bytes --]

Hi Cesar!

On Tue, 16 Sep 2014 17:32:54 -0700, Cesar Philippidis <cesar@codesourcery.com> wrote:
> The patch introduces the following OpenACC/PTX-specific built-ins:
> 
>   * GOACC_ntid
>   * GOACC_tid
>   * GOACC_nctaid
>   * GOACC_ctaid
>   * acc_on_device
>   * GOACC_get_thread_num
>   * GOACC_get_num_threads
> 
> Of these functions, the only one part of the OpenACC spec is
> acc_on_device. The other functions are helpers for omp-low.c. In
> particular, I'm using GOACC_get_thread_num and GOACC_get_num_threads to
> determine the number of accelerator threads available to the reduction
> clause. Current GOACC_get_num_threads is num_gangs * vector_length, but
> value is subject to change later on. It's probably a premature to
> include the PTX built-ins right now, but I'd like to middle end of our
> internal OpenACC branch in sync with gomp-4_0-branch.

In my opinion (and we've once only very briefly discussed this
internally), exposing concepts such as TID or CTA is wrong at this
abstraction level, for these are PTX concepts, but we'd like OpenACC
concepts here: numbers and IDs of gangs, workers, vector-length.  That
said, I'm not opposed to having these committed to gomp-4_0-branch, and
we'll fix it in the following.  (And -- full disclosure ;-) -- it was me
who internally added GOACC_ntid/GOACC_tid, when I needed those as
prototypes, and never got around to re-doing that properly.)


> This patch also allows OpenACC reductions to process the array holding
> partial reductions on the accelerator, instead of copying that array
> back to the host. Currently, this only happens when num_gangs = 1. For
> PTX targets, we're going to need to use another kernel to process the
> array of partial results because PTX lacks inter-CTA synchronization
> (we're currently mapping gangs to CTAs). That's why I was working on the
> routine clause recently.
> 
> Is this OK for gomp-4_0-branch?

If we agree to fix this up as discussed above (and I guess we have no
chance but to agree on this), I'd say so.  A few comments, none of which
are blocking:

> --- a/gcc/builtins.def
> +++ b/gcc/builtins.def

>  #undef DEF_GOACC_BUILTIN
>  #define DEF_GOACC_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>    DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
> -               false, true, true, ATTRS, false, flag_openacc)
> +               false, true, true, ATTRS, false, \
> +	       (/* TODO */ true || flag_openacc))

Hack that, I hope, will soon be addressed properly.

>  #define DEF_GOMP_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>    DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
>                 false, true, true, ATTRS, false, \
> -	       (flag_openmp || flag_tree_parallelize_loops))
> +	       (/* TODO */ true || flag_openmp || flag_tree_parallelize_loops))

Likewise.


> --- a/gcc/omp-builtins.def
> +++ b/gcc/omp-builtins.def
> @@ -236,6 +236,3 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update",
>  		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams",
>  		  BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST)
> -
> -DEF_GOMP_BUILTIN (BUILT_IN_OMP_SET_NUM_THREADS, "omp_set_num_threads",
> -		  BT_FN_VOID_INT, ATTR_CONST_NOTHROW_LEAF_LIST)

To avoid confusion: that has been added to gomp-4_0-branch earlier, and
is now reverted to the trunk state.


> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> +static bool
> +finish_reduction_on_host (omp_context *ctx)

> +static tree
> +oacc_host_nthreads (omp_context *ctx)

> +static void
> +finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,

> @@ -4433,61 +4519,26 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)

> @@ -9782,6 +9900,14 @@ initialize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,

> @@ -9913,16 +10089,15 @@ finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,

> @@ -10026,7 +10211,6 @@ process_reduction_data (gimple_seq *body, gimple_seq *in_stmt_seqp,

Have not reviewed in detail, but that also applies to previous reduction
changes, so OK to apply on top of that.


> @@ -5644,9 +5739,9 @@ expand_oacc_offload (struct omp_region *region)
>    tree openmp_target = get_offload_symbol_decl ();
>    tree fnaddr = build_fold_addr_expr (child_fn);
>    g = gimple_build_call (builtin_decl_explicit (start_ix), 10, device,
> -			 fnaddr, build_fold_addr_expr (openmp_target),
> -			 t1, t2, t3, t4,
> -			 t_num_gangs, t_num_workers, t_vector_length);
> +                    fnaddr, build_fold_addr_expr (openmp_target),
> +                    t1, t2, t3, t4,
> +                    t_num_gangs, t_num_workers, t_vector_length);
>    gimple_set_location (g, gimple_location (entry_stmt));
>    gsi_insert_before (&gsi, g, GSI_SAME_STMT);
>  }

Unintentional?


> @@ -6913,8 +7008,10 @@ expand_omp_for_static_nochunk (struct omp_region *region,

> +  /* Ensure nthreads is at least 1.  BUILT_IN_GOACC_NTID returns 0 for a target
> +     that does not have a specific expansion.  */

Remains to be decided.

> @@ -7317,8 +7423,10 @@ expand_omp_for_static_chunk (struct omp_region *region,

> +  /* Ensure nthreads is at least 1.  BUILT_IN_GOACC_NTID returns 0 for a target
> +     that does not have a specific expansion.  */
> +  nthreads
> +    = fold_build2 (MAX_EXPR, itype, nthreads,
> +		   fold_convert (TREE_TYPE (nthreads), integer_one_node));
> +  nthreads = force_gimple_operand_gsi (&gsi, nthreads, true, NULL_TREE,
> +				       true, GSI_SAME_STMT);

Likewise.


>    /* Initialize the reduction variables to be value of the first array
> -     element.  */
> +     element.  FIXME: A parallel loop should use the original reduction
> +     variable as the initial value.  */

Is that
<http://news.gmane.org/find-root.php?message_id=%3C87wqaxh0x7.fsf%40kepler.schwinge.homeip.net%3E>,
and where you had sent that question to the OpenACC technical mailing
list, weeks ago, which has not yet been answered as far as I know --
might be a good idea to "ping" them?


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gomp4] various OpenACC/PTX built-ins and a reduction tweak
  2014-09-18 18:43 ` [gomp4] various OpenACC/PTX built-ins and a reduction tweak Thomas Schwinge
@ 2014-09-18 19:10   ` Cesar Philippidis
  2014-11-05 15:39   ` [gomp4] Remove unused BUILT_IN_OMP_SET_NUM_THREADS (was: various OpenACC/PTX built-ins and a reduction tweak) Thomas Schwinge
  1 sibling, 0 replies; 9+ messages in thread
From: Cesar Philippidis @ 2014-09-18 19:10 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, fortran

On 09/18/2014 11:43 AM, Thomas Schwinge wrote:
> Hi Cesar!
> 
> On Tue, 16 Sep 2014 17:32:54 -0700, Cesar Philippidis <cesar@codesourcery.com> wrote:
>> The patch introduces the following OpenACC/PTX-specific built-ins:
>>
>>   * GOACC_ntid
>>   * GOACC_tid
>>   * GOACC_nctaid
>>   * GOACC_ctaid
>>   * acc_on_device
>>   * GOACC_get_thread_num
>>   * GOACC_get_num_threads
>>
>> Of these functions, the only one part of the OpenACC spec is
>> acc_on_device. The other functions are helpers for omp-low.c. In
>> particular, I'm using GOACC_get_thread_num and GOACC_get_num_threads to
>> determine the number of accelerator threads available to the reduction
>> clause. Current GOACC_get_num_threads is num_gangs * vector_length, but
>> value is subject to change later on. It's probably a premature to
>> include the PTX built-ins right now, but I'd like to middle end of our
>> internal OpenACC branch in sync with gomp-4_0-branch.
> 
> In my opinion (and we've once only very briefly discussed this
> internally), exposing concepts such as TID or CTA is wrong at this
> abstraction level, for these are PTX concepts, but we'd like OpenACC
> concepts here: numbers and IDs of gangs, workers, vector-length.  That
> said, I'm not opposed to having these committed to gomp-4_0-branch, and
> we'll fix it in the following.  (And -- full disclosure ;-) -- it was me
> who internally added GOACC_ntid/GOACC_tid, when I needed those as
> prototypes, and never got around to re-doing that properly.)
> 
> 
>> This patch also allows OpenACC reductions to process the array holding
>> partial reductions on the accelerator, instead of copying that array
>> back to the host. Currently, this only happens when num_gangs = 1. For
>> PTX targets, we're going to need to use another kernel to process the
>> array of partial results because PTX lacks inter-CTA synchronization
>> (we're currently mapping gangs to CTAs). That's why I was working on the
>> routine clause recently.
>>
>> Is this OK for gomp-4_0-branch?
> 
> If we agree to fix this up as discussed above (and I guess we have no
> chance but to agree on this), I'd say so.  A few comments, none of which
> are blocking:

That sounds fair.

>> --- a/gcc/builtins.def
>> +++ b/gcc/builtins.def
> 
>>  #undef DEF_GOACC_BUILTIN
>>  #define DEF_GOACC_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>>    DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
>> -               false, true, true, ATTRS, false, flag_openacc)
>> +               false, true, true, ATTRS, false, \
>> +	       (/* TODO */ true || flag_openacc))
> 
> Hack that, I hope, will soon be addressed properly.
> 
>>  #define DEF_GOMP_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>>    DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
>>                 false, true, true, ATTRS, false, \
>> -	       (flag_openmp || flag_tree_parallelize_loops))
>> +	       (/* TODO */ true || flag_openmp || flag_tree_parallelize_loops))
> 
> Likewise.
> 
> 
>> --- a/gcc/omp-builtins.def
>> +++ b/gcc/omp-builtins.def
>> @@ -236,6 +236,3 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update",
>>  		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
>>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams",
>>  		  BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST)
>> -
>> -DEF_GOMP_BUILTIN (BUILT_IN_OMP_SET_NUM_THREADS, "omp_set_num_threads",
>> -		  BT_FN_VOID_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
> 
> To avoid confusion: that has been added to gomp-4_0-branch earlier, and
> is now reverted to the trunk state.
> 
> 
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
> 
>> +static bool
>> +finish_reduction_on_host (omp_context *ctx)
> 
>> +static tree
>> +oacc_host_nthreads (omp_context *ctx)
> 
>> +static void
>> +finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
> 
>> @@ -4433,61 +4519,26 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
> 
>> @@ -9782,6 +9900,14 @@ initialize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
> 
>> @@ -9913,16 +10089,15 @@ finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
> 
>> @@ -10026,7 +10211,6 @@ process_reduction_data (gimple_seq *body, gimple_seq *in_stmt_seqp,
> 
> Have not reviewed in detail, but that also applies to previous reduction
> changes, so OK to apply on top of that.
> 
> 
>> @@ -5644,9 +5739,9 @@ expand_oacc_offload (struct omp_region *region)
>>    tree openmp_target = get_offload_symbol_decl ();
>>    tree fnaddr = build_fold_addr_expr (child_fn);
>>    g = gimple_build_call (builtin_decl_explicit (start_ix), 10, device,
>> -			 fnaddr, build_fold_addr_expr (openmp_target),
>> -			 t1, t2, t3, t4,
>> -			 t_num_gangs, t_num_workers, t_vector_length);
>> +                    fnaddr, build_fold_addr_expr (openmp_target),
>> +                    t1, t2, t3, t4,
>> +                    t_num_gangs, t_num_workers, t_vector_length);
>>    gimple_set_location (g, gimple_location (entry_stmt));
>>    gsi_insert_before (&gsi, g, GSI_SAME_STMT);
>>  }
> 
> Unintentional?
> 
> 
>> @@ -6913,8 +7008,10 @@ expand_omp_for_static_nochunk (struct omp_region *region,
> 
>> +  /* Ensure nthreads is at least 1.  BUILT_IN_GOACC_NTID returns 0 for a target
>> +     that does not have a specific expansion.  */
> 
> Remains to be decided.
> 
>> @@ -7317,8 +7423,10 @@ expand_omp_for_static_chunk (struct omp_region *region,
> 
>> +  /* Ensure nthreads is at least 1.  BUILT_IN_GOACC_NTID returns 0 for a target
>> +     that does not have a specific expansion.  */
>> +  nthreads
>> +    = fold_build2 (MAX_EXPR, itype, nthreads,
>> +		   fold_convert (TREE_TYPE (nthreads), integer_one_node));
>> +  nthreads = force_gimple_operand_gsi (&gsi, nthreads, true, NULL_TREE,
>> +				       true, GSI_SAME_STMT);
> 
> Likewise.
> 
> 
>>    /* Initialize the reduction variables to be value of the first array
>> -     element.  */
>> +     element.  FIXME: A parallel loop should use the original reduction
>> +     variable as the initial value.  */
>
> Is that
> <http://news.gmane.org/find-root.php?message_id=%3C87wqaxh0x7.fsf%40kepler.schwinge.homeip.net%3E>,
> and where you had sent that question to the OpenACC technical mailing
> list, weeks ago, which has not yet been answered as far as I know --
> might be a good idea to "ping" them?

Yes. I'll ping them.

I'll make the changes you suggested in a follow-up patch. Since we both
touch builtins.c, I'll let your patch go in first since you already have
one ready. Then I'll rebase the revised version of this patch on top of
your patch.

Thanks,
Cesar

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [gomp4] OpenACC acc_on_device
  2014-09-18 18:01     ` [gomp4] OpenACC acc_on_device (was: various OpenACC/PTX built-ins and a reduction tweak) Thomas Schwinge
@ 2014-10-31 10:56       ` Thomas Schwinge
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Schwinge @ 2014-10-31 10:56 UTC (permalink / raw)
  To: gcc-patches
  Cc: fortran, Tobias Burnus, Cesar Philippidis, James Norris, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 7950 bytes --]

Hi!

On Thu, 18 Sep 2014 20:01:02 +0200, I wrote:
> Here is my OpenACC acc_on_device patch, in a more complete form, with
> test cases and all that.
> 
> On Wed, 17 Sep 2014 10:49:54 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Wed, Sep 17, 2014 at 10:44:12AM +0200, Tobias Burnus wrote:
> > > Cesar Philippidis wrote:
> > > > The patch introduces the following OpenACC/PTX-specific built-ins:
> > > ...
> > > 
> > > It is not completely clear how they are supposed to get used. Should the
> > > user call them directly in some cases? Or are they only used internally?
> > > 
> > > acc_on_device sounds like a function which would be in C/C++ made available
> > > to the user via #define acc_on_device __builtin_acc_on_device.
> > 
> > And not just providing acc_on_device prototype in some header?
> 
> Yes, just a prototype.  And next to DEF_GOACC_BUILTIN (configured the
> same as DEF_GOMP_BUILTIN), I add a new DEF_GOACC_BUILTIN_COMPILER that is
> configured to always provide the __builtin_[...] variant, but the
> un-prefixed [...]  only if -fopenacc is in effect.  Does that look
> alright?
> 
> > Without
> > looking at the OpenACC standard, it sounds like this function could be
> > similar to omp_is_initial_device, so can and should be handled supposedly
> > similarly.
> 
> I think we've been talking about this at the Cauldron, where you agreed
> that omp_is_initial_device should also be implemented as a builtin.  (Or
> am I confusing things?)
> 
> > > However, the rest looks as if it should rather be an internal function
> > > instead of a builtin. Or should the user really ever call the builtin
> > > directly?
> > 
> > GOMP_* functions are builtins and not internal functions too, all those
> > functions are library functions, while the user typically doesn't call them
> > directly, they still are implemented in the library.  Internal functions are
> > used for something that doesn't have a library implementation and is not
> > something user can call directly.
> 
> > > Regarding Fortran: Builtins aren't directly available to the user. You have to
> > > wrap them into an intrinsic to make them available. If they have to be made
> > > available via a module (e.g. via "module acc) - you have to create a virtual
> > > module, which provides the intrinsic. If you don't want to convert the whole
> > > module, you could create an auxiliar module (e.g. acc_internal_) which provides
> > > only those bits - and then include it ("use,intrinsic :: ...") it in the
> > > main module - written in normal Fortran.
> 
> This I have not yet addressed -- please see the TODO comments in the
> gcc/fortran/ files as well as Fortran test cases.
> 
> > For the user callable fortran functions, for OpenMP libgomp just provides
> > *_ entrypoints to * functions.  Perhaps acc_on_device_ could be provided
> > too.
> 
> This is what I had done already.
> 
> Does that patch look good?  (With the Fortran things still to be
> addressed.)

(Checked in, back then, to gomp-4_0-branch in r215506.)

>     	gcc/testsuite/
>     	* c-c++-common/goacc/acc_on_device-1.c: New file.
>     	* c-c++-common/goacc/acc_on_device-2.c: Likewise.
>     	* c-c++-common/goacc/acc_on_device-2-off.c: Likewise.

Here is a patch, checked in to gomp-4_0-branch in r216953, to make
acc_on_device-1.c C-only (implicitly declared functions are only
"supported" in C), and make the others actually fit for C++ -- and XFAIL
the C++ case.  How to resolve that one?

commit b1a009fdf340acf1840c1b6c9022be69a8f0a661
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Oct 31 10:39:44 2014 +0000

    Make acc_on_device test cases fit for C++.
    
    	gcc/testsuite/
    	* c-c++-common/goacc/acc_on_device-1.c: Move...
    	* gcc.dg/goacc/acc_on_device-1.c: ... here.
    	(dg-additional-options): Add -std=c89.
    	* c-c++-common/goacc/acc_on_device-2-off.c: Extend for C++.
    	* c-c++-common/goacc/acc_on_device-2.c: Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@216953 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog.gomp                             |  8 ++++++++
 gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c   | 10 +++++++++-
 gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c       | 16 ++++++++++++++--
 .../{c-c++-common => gcc.dg}/goacc/acc_on_device-1.c     |  2 +-
 4 files changed, 32 insertions(+), 4 deletions(-)

diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index 10232bc..2489b39 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,3 +1,11 @@
+2014-10-30  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* c-c++-common/goacc/acc_on_device-1.c: Move...
+	* gcc.dg/goacc/acc_on_device-1.c: ... here.
+	(dg-additional-options): Add -std=c89.
+	* c-c++-common/goacc/acc_on_device-2-off.c: Extend for C++.
+	* c-c++-common/goacc/acc_on_device-2.c: Likewise.
+
 2014-10-20  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* gcc.dg/goacc/sb-1.c: Move file...
diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c
index ddc43ab..25d21ad 100644
--- gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c
+++ gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c
@@ -1,13 +1,21 @@
 /* Have to enable optimizations, as otherwise builtins won't be expanded.  */
 /* { dg-additional-options "-O -fdump-rtl-expand -fno-openacc" } */
 
+#if __cplusplus
+extern "C" {
+#endif
+
 typedef enum acc_device_t { acc_device_X = 123 } acc_device_t;
 extern int acc_on_device (acc_device_t);
 
+#if __cplusplus
+}
+#endif
+
 int
 f (void)
 {
-  const int dev = acc_device_X;
+  const acc_device_t dev = acc_device_X;
   return acc_on_device (dev);
 }
 
diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
index 65b4ae6..d5389a9 100644
--- gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
+++ gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
@@ -1,17 +1,29 @@
 /* Have to enable optimizations, as otherwise builtins won't be expanded.  */
 /* { dg-additional-options "-O -fdump-rtl-expand" } */
 
+#if __cplusplus
+extern "C" {
+#endif
+
 typedef enum acc_device_t { acc_device_X = 123 } acc_device_t;
 extern int acc_on_device (acc_device_t);
 
+#if __cplusplus
+}
+#endif
+
 int
 f (void)
 {
-  const int dev = acc_device_X;
+  const acc_device_t dev = acc_device_X;
   return acc_on_device (dev);
 }
 
 /* With -fopenacc, we're expecting the builtin to be expanded, so no calls.
-   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 0 "expand" } } */
+   TODO: in C++, even under extern "C", the use of enum for acc_device_t
+   perturbs expansion as a builtin, which expects an int parameter.  It's fine
+   when changing acc_device_t to plain int, but that's not what we're doing in
+   <openacc.h>.
+   { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]*\\\"acc_on_device" 0 "expand" { xfail c++ } } } */
 
 /* { dg-final { cleanup-rtl-dump "expand" } } */
diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-1.c gcc/testsuite/gcc.dg/goacc/acc_on_device-1.c
similarity index 82%
rename from gcc/testsuite/c-c++-common/goacc/acc_on_device-1.c
rename to gcc/testsuite/gcc.dg/goacc/acc_on_device-1.c
index e606b88..1a0276e 100644
--- gcc/testsuite/c-c++-common/goacc/acc_on_device-1.c
+++ gcc/testsuite/gcc.dg/goacc/acc_on_device-1.c
@@ -1,5 +1,5 @@
 /* Have to enable optimizations, as otherwise builtins won't be expanded.  */
-/* { dg-additional-options "-O -fdump-rtl-expand -Wno-implicit-function-declaration" } */
+/* { dg-additional-options "-O -fdump-rtl-expand -std=c89 -Wno-implicit-function-declaration" } */
 
 int
 f (void)


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [gomp4] Remove unused BUILT_IN_OMP_SET_NUM_THREADS (was: various OpenACC/PTX built-ins and a reduction tweak)
  2014-09-18 18:43 ` [gomp4] various OpenACC/PTX built-ins and a reduction tweak Thomas Schwinge
  2014-09-18 19:10   ` Cesar Philippidis
@ 2014-11-05 15:39   ` Thomas Schwinge
  1 sibling, 0 replies; 9+ messages in thread
From: Thomas Schwinge @ 2014-11-05 15:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: Cesar Philippidis

[-- Attachment #1: Type: text/plain, Size: 2276 bytes --]

Hi!

On Thu, 18 Sep 2014 20:43:20 +0200, I wrote:
> On Tue, 16 Sep 2014 17:32:54 -0700, Cesar Philippidis <cesar@codesourcery.com> wrote:
> > The patch [...]

> > --- a/gcc/omp-builtins.def
> > +++ b/gcc/omp-builtins.def
> > @@ -236,6 +236,3 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update",
> >  		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
> >  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams",
> >  		  BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST)
> > -
> > -DEF_GOMP_BUILTIN (BUILT_IN_OMP_SET_NUM_THREADS, "omp_set_num_threads",
> > -		  BT_FN_VOID_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
> 
> To avoid confusion: that has been added to gomp-4_0-branch earlier, and
> is now reverted to the trunk state.

I have now actually removed this; r217135:

commit d2579456a7b9008ba19cabc88393f83334324bdd
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Nov 5 15:38:02 2014 +0000

    Remove unused BUILT_IN_OMP_SET_NUM_THREADS.
    
    	gcc/
    	* omp-builtins.def (BUILT_IN_OMP_SET_NUM_THREADS): Remove.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217135 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp   | 4 ++++
 gcc/omp-builtins.def | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index ce98a18..ae1afd0 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,7 @@
+2014-11-05  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* omp-builtins.def (BUILT_IN_OMP_SET_NUM_THREADS): Remove.
+
 2014-11-03  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* builtins.def (DEF_GOACC_BUILTIN): Revert erroneous checkin.
diff --git gcc/omp-builtins.def gcc/omp-builtins.def
index 698dc79..08b825c 100644
--- gcc/omp-builtins.def
+++ gcc/omp-builtins.def
@@ -236,6 +236,3 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update",
 		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams",
 		  BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST)
-
-DEF_GOMP_BUILTIN (BUILT_IN_OMP_SET_NUM_THREADS, "omp_set_num_threads",
-		  BT_FN_VOID_INT, ATTR_CONST_NOTHROW_LEAF_LIST)


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-11-05 15:39 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-17  0:33 [gomp4] various OpenACC/PTX built-ins and a reduction tweak Cesar Philippidis
2014-09-17  8:44 ` Tobias Burnus
2014-09-17  8:50   ` Jakub Jelinek
2014-09-17 14:12     ` Cesar Philippidis
2014-09-18 18:01     ` [gomp4] OpenACC acc_on_device (was: various OpenACC/PTX built-ins and a reduction tweak) Thomas Schwinge
2014-10-31 10:56       ` [gomp4] OpenACC acc_on_device Thomas Schwinge
2014-09-18 18:43 ` [gomp4] various OpenACC/PTX built-ins and a reduction tweak Thomas Schwinge
2014-09-18 19:10   ` Cesar Philippidis
2014-11-05 15:39   ` [gomp4] Remove unused BUILT_IN_OMP_SET_NUM_THREADS (was: various OpenACC/PTX built-ins and a reduction tweak) Thomas Schwinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).