public inbox for fortran@gcc.gnu.org
 help / color / mirror / Atom feed
From: Cesar Philippidis <cesar@codesourcery.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	Fortran List	<fortran@gcc.gnu.org>,
	Jakub Jelinek <jakub@redhat.com>
Subject: [PATCH,openacc] check for compatible loop parallelism with acc routine calls
Date: Thu, 16 Jun 2016 03:12:00 -0000	[thread overview]
Message-ID: <5762190F.4030102@codesourcery.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 2424 bytes --]

This patch addresses the following problems with acc routines:

 * incorrectly permitting 'acc seq' loops to call gang, worker and
   vector routines

 * lto-wrapper errors when a function or subroutine isn't marked as
   'acc routine'

The solution to the first problem is straightforward. It only required a
small change to oacc_loop_fixed_partitions. The solution to the second
problem is more involved, since it required changes to the fortran FE,
gimplifier, the behavior of flag_generate_offload, and libgomp.

Starting with the the fortran changes, this patch updates the way that
the fortran FE handles the 'acc routine' attribute in modules. Before,
it only recorded that a function was marked as an acc routine. With this
patch, it now records the level of parallelism the routine has. This is
necessary for the middle end to validate compatible parallelism between
the loop calling the routine and the routine itself.

The second set of changes involves teaching the gimplifier to error when
it detects a function call to an non-acc routines inside an OpenACC
offloaded region. Actually, I relaxed non-acc routines by excluding
calls to builtin functions, including those prefixed with _gfortran_.
Nvptx does have a newlib c library, and it also has a subset of
libgfortran. Still, this solution is probably not optimal.

Next, I had to modify the openacc header files in libgomp to mark
acc_on_device as an acc routine. Unfortunately, this meant that I had to
build the opeancc.mod module for gfortran with -fopenacc. But doing
that, caused caused gcc to stream offloaded code to the openacc.o object
file. So, I've updated the behavior of flag_generate_offload such that
minus one indicates that the user specified -foffload=disable, and that
will prevent gcc from streaming offloaded lto code. The alternative was
to hack libtool to build libgomp with -foffload=disable.

Is this patch OK for trunk?

There are still a couple of other quirks with routines we'll need to
address with a follow up patch. Namely, passing scalar dummy arguments
causes to subroutines trips up the nvptx worker and vector state
propagator if the actual argument is a local variable. That's because
the nvptx state propagator only forwards the pointer to the worker and
vector threads, and not the actual variable itself. Consequently, those
pointers dereference garbage. This is a problem with pass-by-reference
in general.

Cesar


[-- Attachment #2: acc-subroutines-20160615.diff --]
[-- Type: text/x-patch, Size: 24095 bytes --]

2016-06-15  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* cgraphunit.c (ipa_passes): Only stream offloaded code when
	flag_generate_offload is positive.
	(symbol_table::compile): Likewise.
	* common.opt (flag_generate_offload): Update comment on its usage.
	* gimplify.c (gimplify_call_expr): Verify that function calls inside
	OpenACC offloaded regions are 'acc routines'.
	* ipa-inline-analysis.c (inline_generate_summary): Update the usage of
	flag_generate_offload.
	* lto-streamer.c (gate_lto_out): Likewise.
	* omp-low.c (oacc_loop_fixed_partitions): Consider SEQ loop when
	validing loop parallelism restrictions.
	* opts.c (common_handle_option): Set x_flag_generate_offload to minus
	one with -foffload=disable.
	* passes.c (ipa_write_summaries): Update usage of flag_generate_offload.
	* toplev.c (compile_file): Likewise.
	* tree.c (free_lang_data):  Likewise.

	gcc/fortran/
	* gfortran.h (enum oacc_function): New enum.
	* module.c (oacc_function): New DECIO_MIO_NAME.
	(mio_symbol_attribute): Handle oacc_function attributes.
	* openmp.c (gfc_oacc_routine_dims): Use enum oacc_function to capture
	acc routine geometry.
	(gfc_match_oacc_routine): Update call to gfc_oacc_routine_dims.
	* symbol.c (oacc_function_types): New const mstring.
	* trans-decl.c (add_attributes_to_decl): Update handling of
	oacc_function.

	gcc/testsuite/
	* c-c++-common/goacc/routine-3.c: Add test coverage for seq loops.
	* c-c++-common/goacc/routine-6.c: New test.
	* gfortran.dg/goacc/routine-7.f90: New test.
	* gfortran.dg/goacc/routine-8.f90: New test.

	libgomp/
	* Makefile.am (openacc.lo): New target.
	(openacc.mod): Build with -fopenacc -foffload=disable.
	* Makefile.in: Regenerate.
	* openacc.f90 (function_on_device_h): Make 'acc routine seq'.
	* openacc.h (acc_on_device): Likewise.
	* openacc_lib.h (acc_on_device): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Filter out warning.
	* testsuite/libgomp.oacc-fortran/routine-7.f90: Update test case to
	properly utilize acc parallelism.

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 4bfcad7..5dd211c 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2292,12 +2292,12 @@ ipa_passes (void)
     }
 
   /* Some targets need to handle LTO assembler output specially.  */
-  if (flag_generate_lto || flag_generate_offload)
+  if (flag_generate_lto || flag_generate_offload > 0)
     targetm.asm_out.lto_start ();
 
   if (!in_lto_p)
     {
-      if (g->have_offload)
+      if (g->have_offload && flag_generate_offload > 0)
 	{
 	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
 	  lto_stream_offload_p = true;
@@ -2312,7 +2312,7 @@ ipa_passes (void)
 	}
     }
 
-  if (flag_generate_lto || flag_generate_offload)
+  if (flag_generate_lto || flag_generate_offload > 0)
     targetm.asm_out.lto_end ();
 
   if (!flag_ltrans && (in_lto_p || !flag_lto || flag_fat_lto_objects))
@@ -2393,11 +2393,11 @@ symbol_table::compile (void)
   state = IPA;
 
   /* Offloading requires LTO infrastructure.  */
-  if (!in_lto_p && g->have_offload)
+  if (!in_lto_p && g->have_offload && flag_generate_offload >= 0)
     flag_generate_offload = 1;
 
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
-  if (flag_generate_lto || flag_generate_offload)
+  if (flag_generate_lto || flag_generate_offload > 0)
     lto_streamer_hooks_init ();
 
   /* Don't run the IPA passes if there was any error or sorry messages.  */
diff --git a/gcc/common.opt b/gcc/common.opt
index f0d7196..9560e08 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -73,7 +73,8 @@ int *param_values
 Variable
 int flag_generate_lto
 
-; Nonzero if we should write GIMPLE bytecode for offload compilation.
+; Positive if we should write GIMPLE bytecode for offload compilation.
+; Negative if the user explicitly passed -foffload=disable.
 Variable
 int flag_generate_offload = 0
 
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 0bb71cb..fac94ca 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -303,6 +303,15 @@ enum save_state
 { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
 };
 
+/* Flags to keep track of ACC routine states.  */
+enum oacc_function
+{ OACC_FUNCTION_NONE = 0,
+  OACC_FUNCTION_SEQ,
+  OACC_FUNCTION_GANG,
+  OACC_FUNCTION_WORKER,
+  OACC_FUNCTION_VECTOR
+};
+
 /* Strings for all symbol attributes.  We use these for dumping the
    parse tree, in error messages, and also when reading and writing
    modules.  In symbol.c.  */
@@ -312,6 +321,7 @@ extern const mstring intents[];
 extern const mstring access_types[];
 extern const mstring ifsrc_types[];
 extern const mstring save_status[];
+extern const mstring oacc_function_types[];
 
 /* Enumeration of all the generic intrinsic functions.  Used by the
    backend for identification of a function.  */
@@ -862,7 +872,7 @@ typedef struct
   unsigned oacc_declare_link:1;
 
   /* This is an OpenACC acclerator function at level N - 1  */
-  unsigned oacc_function:3;
+  ENUM_BITFIELD (oacc_function) oacc_function:3;
 
   /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
   unsigned ext_attr:EXT_ATTR_NUM;
diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 6d3860e..e3ed2a0 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -2095,6 +2095,7 @@ DECL_MIO_NAME (procedure_type)
 DECL_MIO_NAME (ref_type)
 DECL_MIO_NAME (sym_flavor)
 DECL_MIO_NAME (sym_intent)
+DECL_MIO_NAME (oacc_function)
 #undef DECL_MIO_NAME
 
 /* Symbol attributes are stored in list with the first three elements
@@ -2116,6 +2117,8 @@ mio_symbol_attribute (symbol_attribute *attr)
   attr->proc = MIO_NAME (procedure_type) (attr->proc, procedures);
   attr->if_source = MIO_NAME (ifsrc) (attr->if_source, ifsrc_types);
   attr->save = MIO_NAME (save_state) (attr->save, save_status);
+  attr->oacc_function = MIO_NAME (oacc_function) (attr->oacc_function,
+						  oacc_function_types);
 
   ext_attr = attr->ext_attr;
   mio_integer ((int *) &ext_attr);
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 2c92794..96fc2fd 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1716,21 +1716,31 @@ gfc_match_oacc_cache (void)
 
 /* Determine the loop level for a routine.   */
 
-static int
+static oacc_function
 gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
 {
   int level = -1;
+  oacc_function ret = OACC_FUNCTION_SEQ;
 
   if (clauses)
     {
       unsigned mask = 0;
 
       if (clauses->gang)
-	level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
+	{
+	  level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
+	  ret = OACC_FUNCTION_GANG;
+	}
       if (clauses->worker)
-	level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
+	{
+	  level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
+	  ret = OACC_FUNCTION_WORKER;
+	}
       if (clauses->vector)
-	level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
+	{
+	  level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
+	  ret = OACC_FUNCTION_VECTOR;
+	}
       if (clauses->seq)
 	level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
 
@@ -1741,7 +1751,7 @@ gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
   if (level < 0)
     level = GOMP_DIM_MAX;
 
-  return level;
+  return ret;
 }
 
 match
@@ -1834,7 +1844,7 @@ gfc_match_oacc_routine (void)
 				       &old_loc))
 	goto cleanup;
       gfc_current_ns->proc_name->attr.oacc_function
-	= gfc_oacc_routine_dims (c) + 1;
+	= gfc_oacc_routine_dims (c);
     }
 
   if (n)
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 0ee7dec..b1dd32b 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -87,6 +87,15 @@ const mstring save_status[] =
     minit ("IMPLICIT-SAVE", SAVE_IMPLICIT),
 };
 
+const mstring oacc_function_types[] =
+{
+  minit ("NONE", OACC_FUNCTION_NONE),
+  minit ("OACC_FUNCTION_SEQ", OACC_FUNCTION_SEQ),
+  minit ("OACC_FUNCTION_GANG", OACC_FUNCTION_GANG),
+  minit ("OACC_FUNCTION_WORKER", OACC_FUNCTION_WORKER),
+  minit ("OACC_FUNCTION_VECTOR", OACC_FUNCTION_VECTOR)
+};
+
 /* This is to make sure the backend generates setup code in the correct
    order.  */
 
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 2f5e434..0b8d638 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -1331,7 +1331,22 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
     {
       tree dims = NULL_TREE;
       int ix;
-      int level = sym_attr.oacc_function - 1;
+      int level = GOMP_DIM_MAX;
+
+      switch (sym_attr.oacc_function)
+	{
+	case OACC_FUNCTION_GANG:
+	  level = GOMP_DIM_GANG;
+	  break;
+	case OACC_FUNCTION_WORKER:
+	  level = GOMP_DIM_WORKER;
+	  break;
+	case OACC_FUNCTION_VECTOR:
+	  level = GOMP_DIM_VECTOR;
+	  break;
+	case OACC_FUNCTION_SEQ:
+	default:;
+	}
 
       for (ix = GOMP_DIM_MAX; ix--;)
 	dims = tree_cons (build_int_cst (boolean_type_node, ix >= level),
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ae8b4fc..6a9ab3c 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2697,6 +2697,26 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value)
     CALL_EXPR_FN (*expr_p) = build1 (NOP_EXPR, fnptrtype,
 				     CALL_EXPR_FN (*expr_p));
 
+  /* Check if this function is being called from inside an OpenACC
+     offloaded region.  If so, verify that this function has been
+     declared as an 'acc routine'.  Defer loop parallelism geometry
+     checking until oacc_device_lower.  */
+
+  const char *name = fndecl == NULL_TREE ? ""
+    : IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  gimplify_omp_ctx *ctx = gimplify_omp_ctxp;
+  if (ctx != NULL && (ctx->region_type == ORT_ACC
+		      || ctx->region_type == ORT_ACC_PARALLEL
+		      || ctx->region_type == ORT_ACC_KERNELS)
+      && !is_builtin_fn (fndecl)
+      && !(lang_GNU_Fortran () && strncmp (name, "_gfortran_", 10) == 0)
+      && get_oacc_fn_attrib (fndecl) == NULL_TREE)
+    {
+      error ("%qE is not an %<acc routine%>", fndecl);
+      CALL_EXPR_FN (*expr_p) = NULL_TREE;
+      return GS_OK;
+    }
+
   return ret;
 }
 
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 5d67218..c37a5ab 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -4179,7 +4179,8 @@ inline_generate_summary (void)
 
   /* When not optimizing, do not bother to analyze.  Inlining is still done
      because edge redirection needs to happen there.  */
-  if (!optimize && !flag_generate_lto && !flag_generate_offload && !flag_wpa)
+  if (!optimize && !flag_generate_lto && flag_generate_offload <= 0
+      && !flag_wpa)
     return;
 
   if (!inline_summaries)
diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index bfde1fe..e1848be 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -309,7 +309,7 @@ lto_streamer_init (void)
 bool
 gate_lto_out (void)
 {
-  return ((flag_generate_lto || flag_generate_offload || in_lto_p)
+  return ((flag_generate_lto || flag_generate_offload > 0 || in_lto_p)
 	  /* Don't bother doing anything if the program has errors.  */
 	  && !seen_error ());
 }
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 22e5909..7824048 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -19420,7 +19420,8 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
     {
       unsigned outermost = this_mask & -this_mask;
 
-      if (outermost && outermost <= outer_mask)
+      if ((outermost && outermost <= outer_mask)
+	  || (this_mask && (loop->parent->flags & OLF_SEQ)))
 	{
 	  if (noisy)
 	    {
diff --git a/gcc/opts.c b/gcc/opts.c
index e80331f..eec91f8 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1930,6 +1930,7 @@ common_handle_option (struct gcc_options *opts,
 		&& (p[7] == ',' || p[7] == '\0'))
 	      {
 		opts->x_flag_disable_hsa = true;
+		opts->x_flag_generate_offload = -1;
 		break;
 	      }
 
diff --git a/gcc/passes.c b/gcc/passes.c
index 0565cfa..9c1b902 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2531,7 +2531,7 @@ ipa_write_summaries (void)
   struct cgraph_node *node;
   struct cgraph_node **order;
 
-  if ((!flag_generate_lto && !flag_generate_offload) || seen_error ())
+  if ((!flag_generate_lto && flag_generate_offload <= 0) || seen_error ())
     return;
 
   select_what_to_stream ();
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-3.c b/gcc/testsuite/c-c++-common/goacc/routine-3.c
index b322d26..fabae1f 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-3.c
@@ -49,7 +49,7 @@ main ()
   int red = 0;
 #pragma acc parallel copy (red)
   {
-    /* Independent/seq loop tests.  */
+    /* Independent loop tests.  */
 #pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
     for (int i = 0; i < 10; i++)
       red += gang ();
@@ -62,6 +62,19 @@ main ()
     for (int i = 0; i < 10; i++)
       red += vector ();
 
+    /* Seq loop tests.  */
+#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += gang (); /* { dg-error "incorrectly nested" } */
+
+#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += worker (); /* { dg-error "incorrectly nested" } */
+
+#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += vector (); /* { dg-error "incorrectly nested" } */
+    
     /* Gang routine tests.  */
 #pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" } */
     for (int i = 0; i < 10; i++)
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-6.c b/gcc/testsuite/c-c++-common/goacc/routine-6.c
new file mode 100644
index 0000000..e95954b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/routine-6.c
@@ -0,0 +1,26 @@
+/* Test calls to non-routines.  */
+
+int
+sum (int a, int b)
+{
+  return a + b;
+}
+
+int
+main ()
+{
+  int c = 0, i;
+
+#pragma acc parallel loop reduction(+:c)
+  for (i = 0; i < 100; i++)
+    c += sum (i, i); /* { dg-error "'sum' is not an 'acc routine'" } */
+
+  /* Built-in functions are permitted.  */
+#pragma acc parallel
+  {
+    if (c < 0)
+      __builtin_abort ();
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-7.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-7.f90
new file mode 100644
index 0000000..27b08b5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-7.f90
@@ -0,0 +1,25 @@
+! Test calls to non-acc routines.
+
+program test
+  implicit none
+  integer c, i
+
+  c = 0
+
+  !$acc parallel loop reduction(+:c)
+  do i = 0, 100
+     c = c + sum (i, i) ! { dg-error "'sum' is not an 'acc routine'" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel
+  if (c .le. 0) call abort
+  !$acc end parallel
+  
+contains
+  integer function sum(a, b)
+    integer a, b
+    sum = a + b
+  end function sum
+  
+end program test
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-8.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-8.f90
new file mode 100644
index 0000000..d2cb51a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-8.f90
@@ -0,0 +1,122 @@
+! Check routine calls with insufficient parallelism.
+
+! { dg-do compile }
+! { dg-additional-options "-cpp -O0" }
+
+#define M 8
+#define N 32
+
+program main
+  integer :: i
+  integer :: a(N)
+  integer :: b(M * N)
+
+  do i = 1, N
+    a(i) = 0
+  end do
+
+  !$acc parallel copy (a)
+  !$acc loop seq
+    do i = 1, N
+      call seq (a)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne.N) call abort
+  end do
+
+  !$acc parallel copy (a)
+  !$acc loop seq ! { dg-message "containing loop here" }
+    do i = 1, N 
+      call gang (a) ! { dg-error "incorrectly nested OpenACC loop parallelism" }
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. (N + (N * (-1 * i)))) call abort
+  end do
+
+  do i = 1, N
+    b(i) = i
+  end do
+
+  !$acc parallel copy (b)
+  !$acc loop seq ! { dg-message "containing loop here" }
+    do i = 1, N
+      call worker (b) ! { dg-error "incorrectly nested OpenACC loop parallelism" }
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. N + i) call abort
+  end do
+
+  do i = 1, N
+    a(i) = i
+  end do
+
+  !$acc parallel copy (a)
+  !$acc loop seq ! { dg-message "containing loop here" }
+    do i = 1, N
+      call vector (a) ! { dg-error "incorrectly nested OpenACC loop parallelism" }
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 0) call abort
+  end do
+
+contains
+
+subroutine vector (a)
+  !$acc routine vector
+  integer, intent (inout) :: a(N)
+  integer :: i
+
+  !$acc loop vector
+  do i = 1, N
+    a(i) = a(i) - a(i) 
+  end do
+
+end subroutine vector
+
+subroutine worker (b)
+  !$acc routine worker
+  integer, intent (inout) :: b(M*N)
+  integer :: i, j
+
+  !$acc loop worker
+  do i = 1, N
+  !$acc loop vector
+    do j = 1, M
+      b(j + ((i - 1) * M)) = b(j + ((i - 1) * M)) + 1
+    end do
+  end do
+
+end subroutine worker
+
+subroutine gang (a)
+  !$acc routine gang
+  integer, intent (inout) :: a(N)
+  integer :: i
+
+  !$acc loop gang
+  do i = 1, N
+    a(i) = a(i) - i 
+  end do
+
+end subroutine gang
+
+subroutine seq (a)
+  !$acc routine seq
+  integer, intent (inout) :: a(M)
+  integer :: i
+
+  do i = 1, N
+    a(i) = a(i) + 1
+  end do
+
+end subroutine seq
+
+end program main
diff --git a/gcc/toplev.c b/gcc/toplev.c
index f51d2cb..47d8e2e 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -556,7 +556,7 @@ compile_file (void)
      We used to emit an undefined reference here, but this produces
      link errors if an object file with IL is stored into a shared
      library without invoking lto1.  */
-  if (flag_generate_lto || flag_generate_offload)
+  if (flag_generate_lto || flag_generate_offload > 0)
     {
 #if defined ASM_OUTPUT_ALIGNED_DECL_COMMON
       ASM_OUTPUT_ALIGNED_DECL_COMMON (asm_out_file, NULL_TREE,
diff --git a/gcc/tree.c b/gcc/tree.c
index fd0e692..e6712c2 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6000,7 +6000,7 @@ free_lang_data (void)
 
   /* If we are the LTO frontend we have freed lang-specific data already.  */
   if (in_lto_p
-      || (!flag_generate_lto && !flag_generate_offload))
+      || (!flag_generate_lto && flag_generate_offload <= 0))
     return 0;
 
   /* Allocate and assign alias sets to the standard integer types
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index a3e1c2b..085478b 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -87,8 +87,10 @@ omp_lib_kinds.mod: omp_lib.mod
 	:
 openacc_kinds.mod: openacc.mod
 	:
-openacc.mod: openacc.lo
-	:
+openacc.lo: openacc.f90
+	$(LTFCCOMPILE) -fopenacc -foffload=disable -c -o $@ $^
+openacc.mod: openacc.f90
+	$(FC) $(FCFLAGS) -fopenacc -foffload=disable -c $<
 %.mod: %.f90
 	$(FC) $(FCFLAGS) -fsyntax-only $<
 fortran.lo: libgomp_f.h
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 88c8517..baf0f8d 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -1286,8 +1286,10 @@ omp_lib_kinds.mod: omp_lib.mod
 	:
 openacc_kinds.mod: openacc.mod
 	:
-openacc.mod: openacc.lo
-	:
+openacc.lo: openacc.f90
+	$(LTFCCOMPILE) -fopenacc -foffload=disable -c -o $@ $^
+openacc.mod: openacc.f90
+	$(FC) $(FCFLAGS) -fopenacc -foffload=disable -c $<
 %.mod: %.f90
 	$(FC) $(FCFLAGS) -fsyntax-only $<
 fortran.lo: libgomp_f.h
diff --git a/libgomp/openacc.f90 b/libgomp/openacc.f90
index 4b71489..98ba493 100644
--- a/libgomp/openacc.f90
+++ b/libgomp/openacc.f90
@@ -128,6 +128,7 @@ module openacc_internal
 
     function acc_on_device_h (d)
       import
+      !$acc routine seq
       integer (acc_device_kind) d
       logical acc_on_device_h
     end function
@@ -719,6 +720,7 @@ end subroutine
 function acc_on_device_h (d)
   use openacc_internal, only: acc_on_device_l
   use openacc_kinds
+  !$acc routine seq
   integer (acc_device_kind) d
   logical acc_on_device_h
   if (acc_on_device_l (d) .eq. 1) then
diff --git a/libgomp/openacc.h b/libgomp/openacc.h
index 7ea8794..094db50 100644
--- a/libgomp/openacc.h
+++ b/libgomp/openacc.h
@@ -83,6 +83,9 @@ void acc_shutdown (acc_device_t) __GOACC_NOTHROW;
 #ifdef __cplusplus
 int acc_on_device (int __arg) __GOACC_NOTHROW;
 #else
+#ifdef _OPENACC
+#pragma acc routine seq
+#endif
 int acc_on_device (acc_device_t __arg) __GOACC_NOTHROW;
 #endif
 void *acc_malloc (size_t) __GOACC_NOTHROW;
diff --git a/libgomp/openacc_lib.h b/libgomp/openacc_lib.h
index a3f94d7..d627857 100644
--- a/libgomp/openacc_lib.h
+++ b/libgomp/openacc_lib.h
@@ -142,6 +142,7 @@
       interface acc_on_device
         function acc_on_device_h (devicetype)
           import acc_device_kind
+!$acc routine seq
           logical acc_on_device_h
           integer (acc_device_kind) devicetype
         end function
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
index d6ff44d..02b1f15 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
@@ -78,7 +78,7 @@ main(int argc, char **argv)
 
 #pragma acc parallel copy (a[0:N])
   {
-#pragma acc loop seq
+#pragma acc loop /* { dg-warning "insufficient partitioning" } */
     for (i = 0; i < N; i++)
       gang (&a[0]);
   }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90 b/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90
index 200188e..27cda44 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90
@@ -1,121 +1,101 @@
+! Test acc routines.
 
 ! { dg-do run }
-! { dg-additional-options "-cpp" }
 
-#define M 8
-#define N 32
+module size
+  integer, parameter :: N = 32
+end module size
 
 program main
+  use size
+  implicit none
+
   integer :: i
   integer :: a(N)
-  integer :: b(M * N)
 
-  do i = 1, N
-    a(i) = 0
-  end do
-
-  !$acc parallel copy (a)
-  !$acc loop seq
-    do i = 1, N
-      call seq (a)
-    end do
+  !$acc parallel
+  call seq (a)
   !$acc end parallel
 
   do i = 1, N
-    if (a(i) .ne.N) call abort
+    if (a(i) .ne. 4) call abort
   end do
 
-  !$acc parallel copy (a)
-  !$acc loop seq
-    do i = 1, N 
-      call gang (a)
-    end do
+  !$acc parallel
+  call gang (a)
   !$acc end parallel
 
   do i = 1, N
-    if (a(i) .ne. (N + (N * (-1 * i)))) call abort
+    if (a(i) .ne. 3) call abort
   end do
 
-  do i = 1, N
-    b(i) = i
-  end do
-
-  !$acc parallel copy (b)
-  !$acc loop seq
-    do i = 1, N
-      call worker (b)
-    end do
+  !$acc parallel
+  call worker (a)
   !$acc end parallel
 
   do i = 1, N
-    if (b(i) .ne. N + i) call abort
-  end do
-
-  do i = 1, N
-    a(i) = i
+    if (a(i) .ne. 2) call abort
   end do
 
-  !$acc parallel copy (a)
-  !$acc loop seq
-    do i = 1, N
-      call vector (a)
-    end do
+  !$acc parallel
+  call vector (a)
   !$acc end parallel
 
   do i = 1, N
-    if (a(i) .ne. 0) call abort
+    if (a(i) .ne. 1) call abort
   end do
 
 contains
 
 subroutine vector (a)
+  use size
+  implicit none
   !$acc routine vector
   integer, intent (inout) :: a(N)
   integer :: i
 
   !$acc loop vector
   do i = 1, N
-    a(i) = a(i) - a(i) 
+    a(i) = 1
   end do
-
 end subroutine vector
 
-subroutine worker (b)
+subroutine worker (a)
+  use size
+  implicit none
   !$acc routine worker
-  integer, intent (inout) :: b(M*N)
-  integer :: i, j
+  integer, intent (inout) :: a(N)
+  integer :: i
 
   !$acc loop worker
   do i = 1, N
-  !$acc loop vector
-    do j = 1, M
-      b(j + ((i - 1) * M)) = b(j + ((i - 1) * M)) + 1
-    end do
+    a(i) = 2
   end do
-
 end subroutine worker
 
 subroutine gang (a)
+  use size
+  implicit none
   !$acc routine gang
   integer, intent (inout) :: a(N)
   integer :: i
 
   !$acc loop gang
   do i = 1, N
-    a(i) = a(i) - i 
+    a(i) = 3
   end do
-
 end subroutine gang
 
 subroutine seq (a)
+  use size
+  implicit none
   !$acc routine seq
-  integer, intent (inout) :: a(M)
+  integer, intent (inout) :: a(N)
   integer :: i
 
   do i = 1, N
-    a(i) = a(i) + 1
+    a(i) = 4
   end do
-
 end subroutine seq
 
 end program main

             reply	other threads:[~2016-06-16  3:12 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-16  3:12 Cesar Philippidis [this message]
2016-06-17 14:42 ` Jakub Jelinek
2016-06-23 16:05   ` Cesar Philippidis
2016-06-29 14:11     ` Thomas Schwinge
2016-06-29 14:35       ` Jakub Jelinek
2016-06-29 15:31       ` Cesar Philippidis
2019-02-28 21:12 ` [PR72741] Encode OpenACC 'routine' directive inside Fortran module files Thomas Schwinge
2019-02-28 21:17   ` Jakub Jelinek
2019-03-13 17:50     ` [PR72741] Encode OpenACC 'routine' directive's level of parallelism " Thomas Schwinge
2019-03-13 22:13       ` Thomas Koenig
2019-03-14  7:38         ` Thomas Schwinge
2019-03-20 10:09           ` Thomas Schwinge
2019-03-21 20:49             ` Thomas Koenig
2019-03-21 19:47       ` Thomas Schwinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5762190F.4030102@codesourcery.com \
    --to=cesar@codesourcery.com \
    --cc=fortran@gcc.gnu.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).