public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* OpenMP: Generate SIMD clones for functions with "declare target"
@ 2022-09-14 17:32 Sandra Loosemore
  2022-09-14 18:12 ` Jakub Jelinek
  2022-09-14 21:45 ` Thomas Schwinge
  0 siblings, 2 replies; 10+ messages in thread
From: Sandra Loosemore @ 2022-09-14 17:32 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1063 bytes --]

This patch is part of the ongoing effort to find more SIMD optimization 
opportunities in OpenMP code.  Here we are looking for functions that 
have the "omp declare target" attribute that are also suitable 
candidates for automatic SIMD cloning.  I've made the filter quite 
conservative, but maybe it could be improved with some further analysis. 
  I added a command-line flag to disable this in case it is buggy :-P or 
leads to excessive code bloat without improving performance in some 
cases, otherwise the SIMD clones are generated in the same way and at 
the same optimization levels as the existing simdclone pass.

I had to modify the TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN hook 
to add a boolean argument to control diagnostics, since GCC shouldn't 
complain about types the target doesn't support in cases where the user 
didn't explicitly ask for clones to be created.  I tested on 
x86_64-linux-gnu-amdgcn, plain x86_64-linux-gnu, and aarch64-linux-gnu 
to get coverage of all 3 backends that implement this hook.  OK for 
mainline?

-Sandra

[-- Attachment #2: 0001-OpenMP-Generate-SIMD-clones-for-functions-with-decla.patch --]
[-- Type: text/x-patch, Size: 22746 bytes --]

From 77df203f8ec191e036580d17b7fa83ae517a8018 Mon Sep 17 00:00:00 2001
From: Sandra Loosemore <sandra@codesourcery.com>
Date: Wed, 14 Sep 2022 00:20:25 +0000
Subject: [PATCH] OpenMP: Generate SIMD clones for functions with "declare
 target"

This patch causes the IPA simdclone pass to generate clones for
functions with the "omp declare target" attribute as if they had
"omp declare simd", provided the function appears to be suitable for
SIMD execution.  The filter is conservative, rejecting functions
that write memory or that call other functions not known to be safe.
A new option -fopenmp-target-simd-clone is added to control this
transformation; it's enabled by default.

gcc/ChangeLog:

	* c-family/c.opt (fopenmp-target-simd-clone): New option.
	* fortran/lang.opt (fopenmp-target-simd-clone): New option.
	* doc/invoke.texi (-fno-openmp-target-simd-clone): Document.
	* omp-simd-clone.cc (auto_simd_check_stmt): New function.
	(mark_auto_simd_clone): New function.
	(expand_simd_clones): Also check for cloneable functions with
	"omp declare target".  Pass error_p argument to
	simd_clone.compute_vecsize_and_simdlen target hook.
	* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
	Add bool error_p argument.
	* doc/tm.texi: Regenerated.
	* config/aarch64/aarch64.cc
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/gcn/gcn.cc
	(gcn_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/i386/i386.cc
	(ix86_simd_clone_compute_vecsize_and_simdlen): Update.

gcc/testsuite/ChangeLog:

	* gcc.dg/gomp/target-simd-clone-1.c: New.
	* gcc.dg/gomp/target-simd-clone-2.c: New.
	* gcc.dg/gomp/target-simd-clone-3.c: New.
	* gcc.dg/gomp/target-simd-clone-4.c: New.
---
 gcc/c-family/c.opt                            |   4 +
 gcc/config/aarch64/aarch64.cc                 |  24 ++-
 gcc/config/gcn/gcn.cc                         |  10 +-
 gcc/config/i386/i386.cc                       |  25 ++-
 gcc/doc/invoke.texi                           |  13 +-
 gcc/doc/tm.texi                               |   2 +-
 gcc/fortran/lang.opt                          |   4 +
 gcc/omp-simd-clone.cc                         | 178 +++++++++++++++++-
 gcc/target.def                                |   2 +-
 .../gcc.dg/gomp/target-simd-clone-1.c         |  19 ++
 .../gcc.dg/gomp/target-simd-clone-2.c         |  18 ++
 .../gcc.dg/gomp/target-simd-clone-3.c         |  17 ++
 .../gcc.dg/gomp/target-simd-clone-4.c         |  16 ++
 13 files changed, 301 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index ff6fe861534..0be39ae7709 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1993,6 +1993,10 @@ fopenmp-simd
 C ObjC C++ ObjC++ Var(flag_openmp_simd)
 Enable OpenMP's SIMD directives.
 
+fopenmp-target-simd-clone
+C ObjC C++ ObjC++ Var(flag_openmp_target_simd_clone) Init(1)
+Generate SIMD clones for functions with the OpenMP declare target directive.
+
 foperator-names
 C++ ObjC++
 Recognize C++ keywords like \"compl\" and \"xor\".
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f199e77cd42..42c5d281537 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26612,7 +26612,8 @@ currently_supported_simd_type (tree t, tree b)
 static int
 aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					struct cgraph_simd_clone *clonei,
-					tree base_type, int num)
+					tree base_type, int num,
+					bool error_p)
 {
   tree t, ret_type;
   unsigned int elt_bits, count;
@@ -26630,8 +26631,9 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  || const_simdlen > 1024
 	  || (const_simdlen & (const_simdlen - 1)) != 0))
     {
-      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %wd", const_simdlen);
+      if (error_p)
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported simdlen %wd", const_simdlen);
       return 0;
     }
 
@@ -26639,7 +26641,9 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   if (TREE_CODE (ret_type) != VOID_TYPE
       && !currently_supported_simd_type (ret_type, base_type))
     {
-      if (TYPE_SIZE (ret_type) != TYPE_SIZE (base_type))
+      if (!error_p)
+	;
+      else if (TYPE_SIZE (ret_type) != TYPE_SIZE (base_type))
 	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 		    "GCC does not currently support mixed size types "
 		    "for %<simd%> functions");
@@ -26666,7 +26670,9 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
 	  && !currently_supported_simd_type (arg_type, base_type))
 	{
-	  if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type))
+	  if (!error_p)
+	    ;
+	  else if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type))
 	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 			"GCC does not currently support mixed size types "
 			"for %<simd%> functions");
@@ -26696,9 +26702,11 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       if (clonei->simdlen.is_constant (&const_simdlen)
 	  && maybe_ne (vec_bits, 64U) && maybe_ne (vec_bits, 128U))
 	{
-	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "GCC does not currently support simdlen %wd for type %qT",
-		      const_simdlen, base_type);
+	  if (error_p)
+	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			"GCC does not currently support simdlen %wd for "
+			"type %qT",
+			const_simdlen, base_type);
 	  return 0;
 	}
     }
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index ceb69000807..771cfac9672 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4562,7 +4562,8 @@ static int
 gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node),
 					    struct cgraph_simd_clone *clonei,
 					    tree base_type,
-					    int ARG_UNUSED (num))
+					    int ARG_UNUSED (num),
+					    bool error_p)
 {
   unsigned int elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
 
@@ -4572,9 +4573,10 @@ gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node
     {
       /* Note that x86 has a similar message that is likely to trigger on
 	 sizes that are OK for gcn; the user can't win.  */
-      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %wd (amdgcn)",
-		  clonei->simdlen.to_constant ());
+      if (error_p)
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported simdlen %wd (amdgcn)",
+		    clonei->simdlen.to_constant ());
       return 0;
     }
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index c4d0e36e9c0..98e5a3f28fe 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23647,7 +23647,8 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 static int
 ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					     struct cgraph_simd_clone *clonei,
-					     tree base_type, int num)
+					     tree base_type, int num,
+					     bool error_p)
 {
   int ret = 1;
 
@@ -23656,8 +23657,9 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  || clonei->simdlen > 1024
 	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
     {
-      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %wd", clonei->simdlen.to_constant ());
+      if (error_p)
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported simdlen %wd", clonei->simdlen.to_constant ());
       return 0;
     }
 
@@ -23677,8 +23679,9 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  break;
 	/* FALLTHRU */
       default:
-	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		    "unsupported return type %qT for simd", ret_type);
+	if (error_p)
+	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		      "unsupported return type %qT for simd", ret_type);
 	return 0;
       }
 
@@ -23707,8 +23710,9 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	default:
 	  if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM)
 	    break;
-	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "unsupported argument type %qT for simd", arg_type);
+	  if (error_p)
+	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			"unsupported argument type %qT for simd", arg_type);
 	  return 0;
 	}
     }
@@ -23784,9 +23788,10 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	cnt /= clonei->vecsize_float;
       if (cnt > (TARGET_64BIT ? 16 : 8))
 	{
-	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "unsupported simdlen %wd",
-		      clonei->simdlen.to_constant ());
+	  if (error_p)
+	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			"unsupported simdlen %wd",
+			clonei->simdlen.to_constant ());
 	  return 0;
 	}
       }
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8def6baa904..f822091af09 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -204,7 +204,7 @@ in the following sections.
 -flax-vector-conversions  -fms-extensions @gol
 -foffload=@var{arg}  -foffload-options=@var{arg} @gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
--fopenmp  -fopenmp-simd @gol
+-fopenmp  -fopenmp-simd  -fno-openmp-target-simd-clone @gol
 -fpermitted-flt-eval-methods=@var{standard} @gol
 -fplan9-extensions  -fsigned-bitfields  -funsigned-bitfields @gol
 -fsigned-char  -funsigned-char  -fsso-struct=@var{endianness}}
@@ -2749,6 +2749,17 @@ Enable handling of OpenMP's SIMD directives with @code{#pragma omp}
 in C/C++ and @code{!$omp} in Fortran. Other OpenMP directives
 are ignored.
 
+@item -fno-openmp-target-simd-clone
+@opindex fno-openmp-target-simd-clone
+@cindex OpenMP target SIMD clone
+In addition to generating SIMD clones for functions marked with the
+@code{declare simd} directive, by default, GCC also generates clones
+for functions marked with the OpenMP @code{declare target} directive
+that are suitable for vectorization.
+You can disable this behavior and restrict SIMD clone generation only
+to functions explicitly marked @code{declare simd} using
+@option{-fno-openmp-target-simd}.
+
 @item -fpermitted-flt-eval-methods=@var{style}
 @opindex fpermitted-flt-eval-methods
 @opindex fpermitted-flt-eval-methods=c11
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c3001c6ded9..d0a366f1908 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6249,7 +6249,7 @@ The default is @code{NULL_TREE} which means to not vectorize scatter
 stores.
 @end deftypefn
 
-@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int})
+@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}, @var{bool})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
 @var{simdlen} field if it was previously 0.
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index b18a6d3f4f9..58f7d3a2835 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -712,6 +712,10 @@ fopenmp-simd
 Fortran
 ; Documented in C
 
+fopenmp-target-simd-clone
+Fortran
+; Documented in C
+
 fpack-derived
 Fortran Var(flag_pack_derived)
 Try to lay out derived types as compactly as possible.
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 34cbee5afcd..07c7bad0e2c 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -51,6 +51,151 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "omp-simd-clone.h"
+#include "omp-low.h"
+
+/* Helper function for mark_auto_simd_clone; return false if the statement
+   violates restrictions for an "omp declare simd" function.  Specifically,
+   the function must not
+   - throw or call setjmp/longjmp
+   - write memory that could alias parallel calls
+   - include openmp directives or calls
+   - call functions that might do those things */
+
+static bool
+auto_simd_check_stmt (gimple *stmt, tree outer)
+{
+  tree decl;
+
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_CALL:
+      decl = gimple_call_fndecl (stmt);
+
+      /* We can't know whether indirect calls are safe.  */
+      if (decl == NULL_TREE)
+	return false;
+
+      /* Calls to functions that are CONST or PURE are ok.  */
+      if (gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE))
+	break;
+
+      /* Calls to functions that are already marked "omp declare simd" are
+	 OK.  */
+      if (lookup_attribute ("omp declare simd", DECL_ATTRIBUTES (decl)))
+	break;
+
+      /* Let recursive calls to the current function through.  */
+      if (decl == outer)
+	break;
+
+      /* Other function calls are not permitted.  */
+      return false;
+
+      /* OpenMP directives are not permitted.  */
+    CASE_GIMPLE_OMP:
+      return false;
+
+      /* Conservatively reject all EH-related constructs.  */
+    case GIMPLE_CATCH:
+    case GIMPLE_EH_FILTER:
+    case GIMPLE_EH_MUST_NOT_THROW:
+    case GIMPLE_EH_ELSE:
+    case GIMPLE_EH_DISPATCH:
+    case GIMPLE_RESX:
+    case GIMPLE_TRY:
+      return false;
+
+      /* Asms are not permitted since we don't know what they do.  */
+    case GIMPLE_ASM:
+      return false;
+
+    default:
+      break;
+    }
+
+  /* Memory writes are not permitted.
+     FIXME: this could be relaxed a little to permit writes to
+     function-local variables that could not alias other instances
+     of the function running in parallel.  */
+  if (gimple_store_p (stmt))
+    return false;
+  else
+    return true;
+}
+
+/* If the function NODE appears suitable for auto-annotation with "declare
+   simd", add and return such an attribute, otherwise return null.  */
+
+static tree
+mark_auto_simd_clone (struct cgraph_node *node)
+{
+  tree decl = node->decl;
+  tree t;
+  machine_mode m;
+  tree result;
+  basic_block bb;
+
+  /* Nothing to do if the function isn't a declaration or doesn't
+     have a body.  */
+  if (!node->definition || !node->has_gimple_body_p ())
+    return NULL_TREE;
+
+  /* Nothing to do if the function already has the "omp declare simd"
+     attribute, is marked noclone, or is not "omp declare target".  */
+  if (lookup_attribute ("omp declare simd", DECL_ATTRIBUTES (decl))
+      || lookup_attribute ("noclone", DECL_ATTRIBUTES (decl))
+      || !lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
+    return NULL_TREE;
+
+  /* Backends will check for vectorizable arguments/return types in a
+     target-specific way, but we can immediately filter out functions
+     that have non-scalar arguments/return types.  Also, atomic types
+     trigger warnings in simd_clone_clauses_extract.  */
+  t = TREE_TYPE (TREE_TYPE (decl));
+  m = TYPE_MODE (t);
+  if (!(VOID_TYPE_P (t) || is_a <scalar_mode> (m)) || TYPE_ATOMIC (t))
+    return NULL_TREE;
+
+  if (TYPE_ARG_TYPES (TREE_TYPE (decl)))
+    {
+      for (tree temp = TYPE_ARG_TYPES (TREE_TYPE (decl));
+	   temp; temp = TREE_CHAIN (temp))
+	{
+	  t = TREE_VALUE (temp);
+	  m = TYPE_MODE (t);
+	  if (!(VOID_TYPE_P (t) || is_a <scalar_mode> (m)) || TYPE_ATOMIC (t))
+	    return NULL_TREE;
+	}
+    }
+  else
+    {
+      for (tree temp = DECL_ARGUMENTS (decl); temp; temp = DECL_CHAIN (temp))
+	{
+	  t = TREE_TYPE (temp);
+	  m = TYPE_MODE (t);
+	  if (!(VOID_TYPE_P (t) || is_a <scalar_mode> (m)) || TYPE_ATOMIC (t))
+	    return NULL_TREE;
+	}
+    }
+
+  /* Scan the function body to see if it is suitable for SIMD-ization.  */
+  node->get_body ();
+
+  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (decl))
+    {
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	if (!auto_simd_check_stmt (gsi_stmt (gsi), decl))
+	  return NULL_TREE;
+    }
+
+  /* All is good.  */
+  result = tree_cons (get_identifier ("omp declare simd"), NULL,
+		      DECL_ATTRIBUTES (decl));
+  DECL_ATTRIBUTES (decl) = result;
+  return result;
+}
+
 
 /* Return the number of elements in vector type VECTYPE, which is associated
    with a SIMD clone.  At present these always have a constant length.  */
@@ -1683,13 +1828,31 @@ simd_clone_adjust (struct cgraph_node *node)
 void
 expand_simd_clones (struct cgraph_node *node)
 {
-  tree attr = lookup_attribute ("omp declare simd",
-				DECL_ATTRIBUTES (node->decl));
-  if (attr == NULL_TREE
-      || node->inlined_to
+  tree attr;
+  bool error_p = true;
+
+  if (node->inlined_to
       || lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
     return;
 
+  attr = lookup_attribute ("omp declare simd",
+			   DECL_ATTRIBUTES (node->decl));
+
+  /* See if we can add an "omp declare simd" directive implicitly
+     before giving up.  */
+  /* FIXME: OpenACC "#pragma acc routine" translates into
+     "omp declare target", but appears also to have some other effects
+     that conflict with generating SIMD clones, causing ICEs.  So don't
+     do this if we've got OpenACC instead of OpenMP.  */
+  if (attr == NULL_TREE
+      && flag_openmp_target_simd_clone && !flag_openacc)
+    {
+      attr = mark_auto_simd_clone (node);
+      error_p = false;
+    }
+  if (attr == NULL_TREE)
+    return;
+
   /* Ignore
      #pragma omp declare simd
      extern int foo ();
@@ -1714,13 +1877,15 @@ expand_simd_clones (struct cgraph_node *node)
 
       poly_uint64 orig_simdlen = clone_info->simdlen;
       tree base_type = simd_clone_compute_base_data_type (node, clone_info);
+
       /* The target can return 0 (no simd clones should be created),
 	 1 (just one ISA of simd clones should be created) or higher
 	 count of ISA variants.  In that case, clone_info is initialized
 	 for the first ISA variant.  */
       int count
 	= targetm.simd_clone.compute_vecsize_and_simdlen (node, clone_info,
-							  base_type, 0);
+							  base_type, 0,
+							  error_p);
       if (count == 0)
 	continue;
 
@@ -1745,7 +1910,8 @@ expand_simd_clones (struct cgraph_node *node)
 	      /* And call the target hook again to get the right ISA.  */
 	      targetm.simd_clone.compute_vecsize_and_simdlen (node, clone,
 							      base_type,
-							      i / 2);
+							      i / 2,
+							      error_p);
 	      if ((i & 1) != 0)
 		clone->inbranch = 1;
 	    }
diff --git a/gcc/target.def b/gcc/target.def
index 4d49ffc2c88..6e830bed52a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1634,7 +1634,7 @@ fields in @var{simd_clone} structure pointed by @var{clone_info} argument and al
 not determined by the bitsize (in which case @var{simdlen} is always used).\n\
 The hook should return 0 if SIMD clones shouldn't be emitted,\n\
 or number of @var{vecsize_mangle} variants that should be emitted.",
-int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL)
+int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int, bool), NULL)
 
 DEFHOOK
 (adjust,
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
new file mode 100644
index 00000000000..c367d704002
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
@@ -0,0 +1,19 @@
+/* { dg-options "-fopenmp -fdump-tree-optimized -O2" } */
+
+/* Test that simd clones are generated for functions with "declare target".  */
+
+#pragma omp declare target
+int addit(int a, int b, int c)
+{
+  return a + b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-tree-dump "_ZGVbN4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVbM4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVcN4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVcM4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVdN8vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVdM8vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVeN16vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVeM16vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c
new file mode 100644
index 00000000000..28df4282623
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c
@@ -0,0 +1,18 @@
+/* { dg-options "-fopenmp -fdump-tree-optimized -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but unsuitable arguments.  */
+
+struct s {
+  int a;
+  int b;
+};
+  
+#pragma omp declare target
+int addit (struct s x)
+{
+  return x.a + x.b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-tree-dump-not "_Z.*_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c
new file mode 100644
index 00000000000..807a2f9204d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c
@@ -0,0 +1,17 @@
+/* { dg-options "-fopenmp -fdump-tree-optimized -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but that call possibly side-effecting functions 
+   in the body.  */
+
+extern int f (int);
+
+#pragma omp declare target
+int addit(int a, int b, int c)
+{
+  return f(a) + b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-tree-dump-not "_Z.*_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c
new file mode 100644
index 00000000000..76bbcf43b03
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c
@@ -0,0 +1,16 @@
+/* { dg-options "-fopenmp -fdump-tree-optimized -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but that write memory in the body.  */
+
+extern int save;
+
+#pragma omp declare target
+int addit(int a, int b, int c)
+{
+  save = c;
+  return a + b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-tree-dump-not "_Z.*_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: OpenMP: Generate SIMD clones for functions with "declare target"
  2022-09-14 17:32 OpenMP: Generate SIMD clones for functions with "declare target" Sandra Loosemore
@ 2022-09-14 18:12 ` Jakub Jelinek
  2022-09-22  3:17   ` [PATCH v2] " Sandra Loosemore
  2022-09-14 21:45 ` Thomas Schwinge
  1 sibling, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2022-09-14 18:12 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: gcc-patches

On Wed, Sep 14, 2022 at 11:32:11AM -0600, Sandra Loosemore wrote:
> This patch is part of the ongoing effort to find more SIMD optimization
> opportunities in OpenMP code.  Here we are looking for functions that have
> the "omp declare target" attribute that are also suitable candidates for
> automatic SIMD cloning.  I've made the filter quite conservative, but maybe
> it could be improved with some further analysis.  I added a command-line
> flag to disable this in case it is buggy :-P or leads to excessive code
> bloat without improving performance in some cases, otherwise the SIMD clones
> are generated in the same way and at the same optimization levels as the
> existing simdclone pass.
> 
> I had to modify the TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN hook to
> add a boolean argument to control diagnostics, since GCC shouldn't complain
> about types the target doesn't support in cases where the user didn't
> explicitly ask for clones to be created.  I tested on
> x86_64-linux-gnu-amdgcn, plain x86_64-linux-gnu, and aarch64-linux-gnu to
> get coverage of all 3 backends that implement this hook.  OK for mainline?

declare simd is an ABI relevant declarative directive, while declare target
is not, all the latter does is say whether the function should be also (or
only) compilable on an offloading target.
Creating simd clones under some option for random declare target functions
(note, declare target is partly auto-discovered property) is perhaps fine
for functions not exported from the translation unit where it is purely
an optimization, but otherwise it is a significant ABI problem,
you export whole new bunch of new exports on the definition side and expect
those to be exported on the use side.  If you compile one TU with
-fopenmp-target-simd-clone and another one without it, program might not
link anymore.  And worse, as it is decided based on the exact implementation
of the function, I assume you can't do that automatically for functions
not defined locally, but whether something has simd clones or not might
change over time based on how you change the implementation.
Say libfoo.so exports a declare target function foo, which is initially
implemented without say using inline asm (or calling one of the "bad"
functions or using exceptions etc.), but then a bugfix comes and needs
to use inline asm or something else in the implementation.  Previously
libfoo.so would export the simd clones, but now it doesn't, so the ABI
of the library changes.

If it is pure optimization thing and purely keyed on the definition,
all the simd clones should be local to the TU, never exported from it.

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: OpenMP: Generate SIMD clones for functions with "declare target"
  2022-09-14 17:32 OpenMP: Generate SIMD clones for functions with "declare target" Sandra Loosemore
  2022-09-14 18:12 ` Jakub Jelinek
@ 2022-09-14 21:45 ` Thomas Schwinge
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Schwinge @ 2022-09-14 21:45 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: gcc-patches

Hi Sandra!

Commenting on just one single item:

On 2022-09-14T11:32:11-0600, Sandra Loosemore <sandra@codesourcery.com> wrote:
> --- a/gcc/omp-simd-clone.cc
> +++ b/gcc/omp-simd-clone.cc

>  void
>  expand_simd_clones (struct cgraph_node *node)
>  {
> -  tree attr = lookup_attribute ("omp declare simd",
> -                             DECL_ATTRIBUTES (node->decl));
> -  if (attr == NULL_TREE
> -      || node->inlined_to
> +  tree attr;
> +  bool error_p = true;
> +
> +  if (node->inlined_to
>        || lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
>      return;
>
> +  attr = lookup_attribute ("omp declare simd",
> +                        DECL_ATTRIBUTES (node->decl));
> +
> +  /* See if we can add an "omp declare simd" directive implicitly
> +     before giving up.  */
> +  /* FIXME: OpenACC "#pragma acc routine" translates into
> +     "omp declare target", but appears also to have some other effects
> +     that conflict with generating SIMD clones, causing ICEs.  So don't
> +     do this if we've got OpenACC instead of OpenMP.  */

Uh, ICEs...  (But I suppose this processing is not relevant for OpenACC
'routine's.)

However, OpenACC and OpenMP support may be active at the same time...

> +  if (attr == NULL_TREE
> +      && flag_openmp_target_simd_clone && !flag_openacc)

..., so '!flag_openacc' is not the right check here.  Instead you'd do
'!oacc_get_fn_attrib (DECL_ATTRIBUTES (node->decl))' (untested) or
similar.

> +    {
> +      attr = mark_auto_simd_clone (node);
> +      error_p = false;
> +    }
> +  if (attr == NULL_TREE)
> +    return;


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] Re: OpenMP: Generate SIMD clones for functions with "declare target"
  2022-09-14 18:12 ` Jakub Jelinek
@ 2022-09-22  3:17   ` Sandra Loosemore
  2022-09-30 10:37     ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Sandra Loosemore @ 2022-09-22  3:17 UTC (permalink / raw)
  To: Jakub Jelinek, Thomas Schwinge; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1432 bytes --]

On 9/14/22 12:12, Jakub Jelinek wrote:

> If it is pure optimization thing and purely keyed on the definition,
> all the simd clones should be local to the TU, never exported from it.

OK, here is a revised patch that addresses that.  x86_64 target also 
generates a different set of clones for functions with internal linkage 
vs external so I hacked that to treat these implicit clones in the same 
way as other internal clones.

There is an existing problem with internal "declare simd" clones in that 
nothing ever DCEs clones that end up not being useful, or does a scan of 
the code in the compilation unit before clone generation to avoid 
generating useless clones in the first place.  I haven't tried to solve 
that problem, but I did attempt to mitigate it for these implicit 
"declare target" clones by tagging the option 
OPT_LEVELS_2_PLUS_SPEED_ONLY (instead of enabling it by default all the 
time) so the clones are not generated by default at -Os and -Og.  I 
added a couple new test cases to check this.

On 9/14/22 15:45, Thomas Schwinge wrote:
> However, OpenACC and OpenMP support may be active at the same time...
> 
>> +  if (attr == NULL_TREE
>> +      && flag_openmp_target_simd_clone && !flag_openacc)
> 
> ..., so '!flag_openacc' is not the right check here.  Instead you'd do
> '!oacc_get_fn_attrib (DECL_ATTRIBUTES (node->decl))' (untested) or
> similar.

This is fixed now too.

OK to check in?

-Sandra

[-- Attachment #2: declare-target-simd-clones-v2.patch --]
[-- Type: text/x-patch, Size: 28028 bytes --]

From dfdb9a2162978b964863f351c814211dca8e9a3f Mon Sep 17 00:00:00 2001
From: Sandra Loosemore <sandra@codesourcery.com>
Date: Thu, 22 Sep 2022 02:16:42 +0000
Subject: [PATCH] OpenMP: Generate SIMD clones for functions with "declare
 target"

This patch causes the IPA simdclone pass to generate clones for
functions with the "omp declare target" attribute as if they had
"omp declare simd", provided the function appears to be suitable for
SIMD execution.  The filter is conservative, rejecting functions
that write memory or that call other functions not known to be safe.
A new option -fopenmp-target-simd-clone is added to control this
transformation; it's enabled at -O2 and higher.

gcc/ChangeLog:

	* common.opt (fopenmp-target-simd-clone): New option.
	* opts.cc (default_options_table): Add -fopenmp-target-simd-clone.
	* doc/invoke.texi (-fopenmp-target-simd-clone): Document.
	* omp-simd-clone.cc (auto_simd_check_stmt): New function.
	(mark_auto_simd_clone): New function.
	(simd_clone_create): Add force_local argument, make the symbol
	have internal linkage if it is true.
	(expand_simd_clones): Also check for cloneable functions with
	"omp declare target".  Pass explicit_p argument to
	simd_clone.compute_vecsize_and_simdlen target hook.
	* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
	Add bool explicit_p argument.
	* doc/tm.texi: Regenerated.
	* config/aarch64/aarch64.cc
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/gcn/gcn.cc
	(gcn_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/i386/i386.cc
	(ix86_simd_clone_compute_vecsize_and_simdlen): Update.

gcc/testsuite/ChangeLog:

	* gcc.dg/gomp/target-simd-clone-1.c: New.
	* gcc.dg/gomp/target-simd-clone-2.c: New.
	* gcc.dg/gomp/target-simd-clone-3.c: New.
	* gcc.dg/gomp/target-simd-clone-4.c: New.
	* gcc.dg/gomp/target-simd-clone-5.c: New.
	* gcc.dg/gomp/target-simd-clone-6.c: New.
---
 gcc/common.opt                                |   4 +
 gcc/config/aarch64/aarch64.cc                 |  24 +-
 gcc/config/gcn/gcn.cc                         |  10 +-
 gcc/config/i386/i386.cc                       |  27 +-
 gcc/doc/invoke.texi                           |  12 +-
 gcc/doc/tm.texi                               |   2 +-
 gcc/omp-simd-clone.cc                         | 237 ++++++++++++++++--
 gcc/opts.cc                                   |   1 +
 gcc/target.def                                |   2 +-
 .../gcc.dg/gomp/target-simd-clone-1.c         |  18 ++
 .../gcc.dg/gomp/target-simd-clone-2.c         |  18 ++
 .../gcc.dg/gomp/target-simd-clone-3.c         |  17 ++
 .../gcc.dg/gomp/target-simd-clone-4.c         |  16 ++
 .../gcc.dg/gomp/target-simd-clone-5.c         |  13 +
 .../gcc.dg/gomp/target-simd-clone-6.c         |  13 +
 15 files changed, 362 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-5.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-6.c

diff --git a/gcc/common.opt b/gcc/common.opt
index fba90ff6dcb..c735c62a8d4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2217,6 +2217,10 @@ fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
 
+fopenmp-target-simd-clone
+Common Var(flag_openmp_target_simd_clone) Optimization
+Generate SIMD clones for functions with the OpenMP declare target directive.
+
 fopt-info
 Common Var(flag_opt_info) Optimization
 Enable all optimization info dumps on stderr.
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f199e77cd42..c6d282c55ef 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26612,7 +26612,8 @@ currently_supported_simd_type (tree t, tree b)
 static int
 aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					struct cgraph_simd_clone *clonei,
-					tree base_type, int num)
+					tree base_type, int num,
+					bool explicit_p)
 {
   tree t, ret_type;
   unsigned int elt_bits, count;
@@ -26630,8 +26631,9 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  || const_simdlen > 1024
 	  || (const_simdlen & (const_simdlen - 1)) != 0))
     {
-      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %wd", const_simdlen);
+      if (explicit_p)
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported simdlen %wd", const_simdlen);
       return 0;
     }
 
@@ -26639,7 +26641,9 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   if (TREE_CODE (ret_type) != VOID_TYPE
       && !currently_supported_simd_type (ret_type, base_type))
     {
-      if (TYPE_SIZE (ret_type) != TYPE_SIZE (base_type))
+      if (!explicit_p)
+	;
+      else if (TYPE_SIZE (ret_type) != TYPE_SIZE (base_type))
 	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 		    "GCC does not currently support mixed size types "
 		    "for %<simd%> functions");
@@ -26666,7 +26670,9 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
 	  && !currently_supported_simd_type (arg_type, base_type))
 	{
-	  if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type))
+	  if (!explicit_p)
+	    ;
+	  else if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type))
 	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 			"GCC does not currently support mixed size types "
 			"for %<simd%> functions");
@@ -26696,9 +26702,11 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       if (clonei->simdlen.is_constant (&const_simdlen)
 	  && maybe_ne (vec_bits, 64U) && maybe_ne (vec_bits, 128U))
 	{
-	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "GCC does not currently support simdlen %wd for type %qT",
-		      const_simdlen, base_type);
+	  if (explicit_p)
+	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			"GCC does not currently support simdlen %wd for "
+			"type %qT",
+			const_simdlen, base_type);
 	  return 0;
 	}
     }
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index ceb69000807..5c80b8df852 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4562,7 +4562,8 @@ static int
 gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node),
 					    struct cgraph_simd_clone *clonei,
 					    tree base_type,
-					    int ARG_UNUSED (num))
+					    int ARG_UNUSED (num),
+					    bool explicit_p)
 {
   unsigned int elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
 
@@ -4572,9 +4573,10 @@ gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node
     {
       /* Note that x86 has a similar message that is likely to trigger on
 	 sizes that are OK for gcn; the user can't win.  */
-      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %wd (amdgcn)",
-		  clonei->simdlen.to_constant ());
+      if (explicit_p)
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported simdlen %wd (amdgcn)",
+		    clonei->simdlen.to_constant ());
       return 0;
     }
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index c4d0e36e9c0..99ae388ad56 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23647,7 +23647,8 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 static int
 ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					     struct cgraph_simd_clone *clonei,
-					     tree base_type, int num)
+					     tree base_type, int num,
+					     bool explicit_p)
 {
   int ret = 1;
 
@@ -23656,8 +23657,9 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  || clonei->simdlen > 1024
 	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
     {
-      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %wd", clonei->simdlen.to_constant ());
+      if (explicit_p)
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported simdlen %wd", clonei->simdlen.to_constant ());
       return 0;
     }
 
@@ -23677,8 +23679,9 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  break;
 	/* FALLTHRU */
       default:
-	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		    "unsupported return type %qT for simd", ret_type);
+	if (explicit_p)
+	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		      "unsupported return type %qT for simd", ret_type);
 	return 0;
       }
 
@@ -23707,13 +23710,14 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	default:
 	  if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM)
 	    break;
-	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "unsupported argument type %qT for simd", arg_type);
+	  if (explicit_p)
+	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			"unsupported argument type %qT for simd", arg_type);
 	  return 0;
 	}
     }
 
-  if (!TREE_PUBLIC (node->decl))
+  if (!TREE_PUBLIC (node->decl) || !explicit_p)
     {
       /* If the function isn't exported, we can pick up just one ISA
 	 for the clones.  */
@@ -23784,9 +23788,10 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	cnt /= clonei->vecsize_float;
       if (cnt > (TARGET_64BIT ? 16 : 8))
 	{
-	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "unsupported simdlen %wd",
-		      clonei->simdlen.to_constant ());
+	  if (explicit_p)
+	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			"unsupported simdlen %wd",
+			clonei->simdlen.to_constant ());
 	  return 0;
 	}
       }
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8def6baa904..e05739a334c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -204,7 +204,7 @@ in the following sections.
 -flax-vector-conversions  -fms-extensions @gol
 -foffload=@var{arg}  -foffload-options=@var{arg} @gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
--fopenmp  -fopenmp-simd @gol
+-fopenmp  -fopenmp-simd  -fopenmp-target-simd-clone @gol
 -fpermitted-flt-eval-methods=@var{standard} @gol
 -fplan9-extensions  -fsigned-bitfields  -funsigned-bitfields @gol
 -fsigned-char  -funsigned-char  -fsso-struct=@var{endianness}}
@@ -2749,6 +2749,16 @@ Enable handling of OpenMP's SIMD directives with @code{#pragma omp}
 in C/C++ and @code{!$omp} in Fortran. Other OpenMP directives
 are ignored.
 
+@item -fopenmp-target-simd-clone
+@opindex fopenmp-target-simd-clone
+@cindex OpenMP target SIMD clone
+In addition to generating SIMD clones for functions marked with the
+@code{declare simd} directive, GCC also generates clones
+for functions marked with the OpenMP @code{declare target} directive
+that are suitable for vectorization when this option is in effect.
+It is enabled by default at @option{-O2} and higher (but not @option{-Os}
+or @option{-Og}).
+
 @item -fpermitted-flt-eval-methods=@var{style}
 @opindex fpermitted-flt-eval-methods
 @opindex fpermitted-flt-eval-methods=c11
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c3001c6ded9..d0a366f1908 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6249,7 +6249,7 @@ The default is @code{NULL_TREE} which means to not vectorize scatter
 stores.
 @end deftypefn
 
-@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int})
+@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}, @var{bool})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
 @var{simdlen} field if it was previously 0.
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 34cbee5afcd..f9e98b099d1 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -51,6 +51,152 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "omp-simd-clone.h"
+#include "omp-low.h"
+#include "omp-general.h"
+
+/* Helper function for mark_auto_simd_clone; return false if the statement
+   violates restrictions for an "omp declare simd" function.  Specifically,
+   the function must not
+   - throw or call setjmp/longjmp
+   - write memory that could alias parallel calls
+   - include openmp directives or calls
+   - call functions that might do those things */
+
+static bool
+auto_simd_check_stmt (gimple *stmt, tree outer)
+{
+  tree decl;
+
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_CALL:
+      decl = gimple_call_fndecl (stmt);
+
+      /* We can't know whether indirect calls are safe.  */
+      if (decl == NULL_TREE)
+	return false;
+
+      /* Calls to functions that are CONST or PURE are ok.  */
+      if (gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE))
+	break;
+
+      /* Calls to functions that are already marked "omp declare simd" are
+	 OK.  */
+      if (lookup_attribute ("omp declare simd", DECL_ATTRIBUTES (decl)))
+	break;
+
+      /* Let recursive calls to the current function through.  */
+      if (decl == outer)
+	break;
+
+      /* Other function calls are not permitted.  */
+      return false;
+
+      /* OpenMP directives are not permitted.  */
+    CASE_GIMPLE_OMP:
+      return false;
+
+      /* Conservatively reject all EH-related constructs.  */
+    case GIMPLE_CATCH:
+    case GIMPLE_EH_FILTER:
+    case GIMPLE_EH_MUST_NOT_THROW:
+    case GIMPLE_EH_ELSE:
+    case GIMPLE_EH_DISPATCH:
+    case GIMPLE_RESX:
+    case GIMPLE_TRY:
+      return false;
+
+      /* Asms are not permitted since we don't know what they do.  */
+    case GIMPLE_ASM:
+      return false;
+
+    default:
+      break;
+    }
+
+  /* Memory writes are not permitted.
+     FIXME: this could be relaxed a little to permit writes to
+     function-local variables that could not alias other instances
+     of the function running in parallel.  */
+  if (gimple_store_p (stmt))
+    return false;
+  else
+    return true;
+}
+
+/* If the function NODE appears suitable for auto-annotation with "declare
+   simd", add and return such an attribute, otherwise return null.  */
+
+static tree
+mark_auto_simd_clone (struct cgraph_node *node)
+{
+  tree decl = node->decl;
+  tree t;
+  machine_mode m;
+  tree result;
+  basic_block bb;
+
+  /* Nothing to do if the function isn't a definition or doesn't
+     have a body.  */
+  if (!node->definition || !node->has_gimple_body_p ())
+    return NULL_TREE;
+
+  /* Nothing to do if the function already has the "omp declare simd"
+     attribute, is marked noclone, or is not "omp declare target".  */
+  if (lookup_attribute ("omp declare simd", DECL_ATTRIBUTES (decl))
+      || lookup_attribute ("noclone", DECL_ATTRIBUTES (decl))
+      || !lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
+    return NULL_TREE;
+
+  /* Backends will check for vectorizable arguments/return types in a
+     target-specific way, but we can immediately filter out functions
+     that have non-scalar arguments/return types.  Also, atomic types
+     trigger warnings in simd_clone_clauses_extract.  */
+  t = TREE_TYPE (TREE_TYPE (decl));
+  m = TYPE_MODE (t);
+  if (!(VOID_TYPE_P (t) || is_a <scalar_mode> (m)) || TYPE_ATOMIC (t))
+    return NULL_TREE;
+
+  if (TYPE_ARG_TYPES (TREE_TYPE (decl)))
+    {
+      for (tree temp = TYPE_ARG_TYPES (TREE_TYPE (decl));
+	   temp; temp = TREE_CHAIN (temp))
+	{
+	  t = TREE_VALUE (temp);
+	  m = TYPE_MODE (t);
+	  if (!(VOID_TYPE_P (t) || is_a <scalar_mode> (m)) || TYPE_ATOMIC (t))
+	    return NULL_TREE;
+	}
+    }
+  else
+    {
+      for (tree temp = DECL_ARGUMENTS (decl); temp; temp = DECL_CHAIN (temp))
+	{
+	  t = TREE_TYPE (temp);
+	  m = TYPE_MODE (t);
+	  if (!(VOID_TYPE_P (t) || is_a <scalar_mode> (m)) || TYPE_ATOMIC (t))
+	    return NULL_TREE;
+	}
+    }
+
+  /* Scan the function body to see if it is suitable for SIMD-ization.  */
+  node->get_body ();
+
+  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (decl))
+    {
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	if (!auto_simd_check_stmt (gsi_stmt (gsi), decl))
+	  return NULL_TREE;
+    }
+
+  /* All is good.  */
+  result = tree_cons (get_identifier ("omp declare simd"), NULL,
+		      DECL_ATTRIBUTES (decl));
+  DECL_ATTRIBUTES (decl) = result;
+  return result;
+}
+
 
 /* Return the number of elements in vector type VECTYPE, which is associated
    with a SIMD clone.  At present these always have a constant length.  */
@@ -430,10 +576,12 @@ simd_clone_mangle (struct cgraph_node *node,
   return get_identifier (str);
 }
 
-/* Create a simd clone of OLD_NODE and return it.  */
+/* Create a simd clone of OLD_NODE and return it.  If FORCE_LOCAL is true,
+   create it as a local symbol, otherwise copy the symbol linkage and
+   visibility attributes from OLD_NODE.  */
 
 static struct cgraph_node *
-simd_clone_create (struct cgraph_node *old_node)
+simd_clone_create (struct cgraph_node *old_node, bool force_local)
 {
   struct cgraph_node *new_node;
   if (old_node->definition)
@@ -463,23 +611,38 @@ simd_clone_create (struct cgraph_node *old_node)
     return new_node;
 
   set_decl_built_in_function (new_node->decl, NOT_BUILT_IN, 0);
-  TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (old_node->decl);
-  DECL_COMDAT (new_node->decl) = DECL_COMDAT (old_node->decl);
-  DECL_WEAK (new_node->decl) = DECL_WEAK (old_node->decl);
-  DECL_EXTERNAL (new_node->decl) = DECL_EXTERNAL (old_node->decl);
-  DECL_VISIBILITY_SPECIFIED (new_node->decl)
-    = DECL_VISIBILITY_SPECIFIED (old_node->decl);
-  DECL_VISIBILITY (new_node->decl) = DECL_VISIBILITY (old_node->decl);
-  DECL_DLLIMPORT_P (new_node->decl) = DECL_DLLIMPORT_P (old_node->decl);
-  if (DECL_ONE_ONLY (old_node->decl))
-    make_decl_one_only (new_node->decl, DECL_ASSEMBLER_NAME (new_node->decl));
-
-  /* The method cgraph_version_clone_with_body () will force the new
-     symbol local.  Undo this, and inherit external visibility from
-     the old node.  */
-  new_node->local = old_node->local;
-  new_node->externally_visible = old_node->externally_visible;
-  new_node->calls_declare_variant_alt = old_node->calls_declare_variant_alt;
+  if (force_local)
+    {
+      TREE_PUBLIC (new_node->decl) = 0;
+      DECL_COMDAT (new_node->decl) = 0;
+      DECL_WEAK (new_node->decl) = 0;
+      DECL_EXTERNAL (new_node->decl) = 0;
+      DECL_VISIBILITY_SPECIFIED (new_node->decl) = 0;
+      DECL_VISIBILITY (new_node->decl) = VISIBILITY_DEFAULT;
+      DECL_DLLIMPORT_P (new_node->decl) = 0;
+    }
+  else
+    {
+      TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (old_node->decl);
+      DECL_COMDAT (new_node->decl) = DECL_COMDAT (old_node->decl);
+      DECL_WEAK (new_node->decl) = DECL_WEAK (old_node->decl);
+      DECL_EXTERNAL (new_node->decl) = DECL_EXTERNAL (old_node->decl);
+      DECL_VISIBILITY_SPECIFIED (new_node->decl)
+	= DECL_VISIBILITY_SPECIFIED (old_node->decl);
+      DECL_VISIBILITY (new_node->decl) = DECL_VISIBILITY (old_node->decl);
+      DECL_DLLIMPORT_P (new_node->decl) = DECL_DLLIMPORT_P (old_node->decl);
+      if (DECL_ONE_ONLY (old_node->decl))
+	make_decl_one_only (new_node->decl,
+			    DECL_ASSEMBLER_NAME (new_node->decl));
+
+      /* The method cgraph_version_clone_with_body () will force the new
+	 symbol local.  Undo this, and inherit external visibility from
+	 the old node.  */
+      new_node->local = old_node->local;
+      new_node->externally_visible = old_node->externally_visible;
+      new_node->calls_declare_variant_alt
+	= old_node->calls_declare_variant_alt;
+    }
 
   return new_node;
 }
@@ -1683,13 +1846,32 @@ simd_clone_adjust (struct cgraph_node *node)
 void
 expand_simd_clones (struct cgraph_node *node)
 {
-  tree attr = lookup_attribute ("omp declare simd",
-				DECL_ATTRIBUTES (node->decl));
-  if (attr == NULL_TREE
-      || node->inlined_to
+  tree attr;
+  bool explicit_p = true;
+
+  if (node->inlined_to
       || lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
     return;
 
+  attr = lookup_attribute ("omp declare simd",
+			   DECL_ATTRIBUTES (node->decl));
+
+  /* See if we can add an "omp declare simd" directive implicitly
+     before giving up.  */
+  /* FIXME: OpenACC "#pragma acc routine" translates into
+     "omp declare target", but appears also to have some other effects
+     that conflict with generating SIMD clones, causing ICEs.  So don't
+     do this if we've got OpenACC instead of OpenMP.  */
+  if (attr == NULL_TREE
+      && flag_openmp_target_simd_clone
+      && !oacc_get_fn_attrib (node->decl))
+    {
+      attr = mark_auto_simd_clone (node);
+      explicit_p = false;
+    }
+  if (attr == NULL_TREE)
+    return;
+
   /* Ignore
      #pragma omp declare simd
      extern int foo ();
@@ -1714,13 +1896,15 @@ expand_simd_clones (struct cgraph_node *node)
 
       poly_uint64 orig_simdlen = clone_info->simdlen;
       tree base_type = simd_clone_compute_base_data_type (node, clone_info);
+
       /* The target can return 0 (no simd clones should be created),
 	 1 (just one ISA of simd clones should be created) or higher
 	 count of ISA variants.  In that case, clone_info is initialized
 	 for the first ISA variant.  */
       int count
 	= targetm.simd_clone.compute_vecsize_and_simdlen (node, clone_info,
-							  base_type, 0);
+							  base_type, 0,
+							  explicit_p);
       if (count == 0)
 	continue;
 
@@ -1745,7 +1929,8 @@ expand_simd_clones (struct cgraph_node *node)
 	      /* And call the target hook again to get the right ISA.  */
 	      targetm.simd_clone.compute_vecsize_and_simdlen (node, clone,
 							      base_type,
-							      i / 2);
+							      i / 2,
+							      explicit_p);
 	      if ((i & 1) != 0)
 		clone->inbranch = 1;
 	    }
@@ -1763,7 +1948,7 @@ expand_simd_clones (struct cgraph_node *node)
 	  /* Only when we are sure we want to create the clone actually
 	     clone the function (or definitions) or create another
 	     extern FUNCTION_DECL (for prototypes without definitions).  */
-	  struct cgraph_node *n = simd_clone_create (node);
+	  struct cgraph_node *n = simd_clone_create (node, !explicit_p);
 	  if (n == NULL)
 	    {
 	      if (i == 0)
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 54e57f36755..b8ca6fdca82 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -658,6 +658,7 @@ static const struct default_options default_options_table[] =
       REORDER_BLOCKS_ALGORITHM_STC },
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_ftree_loop_vectorize, NULL, 1 },
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_ftree_slp_vectorize, NULL, 1 },
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fopenmp_target_simd_clone, NULL, 1 },
 #ifdef INSN_SCHEDULING
   /* Only run the pre-regalloc scheduling pass if optimizing for speed.  */
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 1 },
diff --git a/gcc/target.def b/gcc/target.def
index 4d49ffc2c88..6e830bed52a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1634,7 +1634,7 @@ fields in @var{simd_clone} structure pointed by @var{clone_info} argument and al
 not determined by the bitsize (in which case @var{simdlen} is always used).\n\
 The hook should return 0 if SIMD clones shouldn't be emitted,\n\
 or number of @var{vecsize_mangle} variants that should be emitted.",
-int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL)
+int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int, bool), NULL)
 
 DEFHOOK
 (adjust,
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
new file mode 100644
index 00000000000..ab027a60970
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
@@ -0,0 +1,18 @@
+/* { dg-options "-fopenmp -O2" } */
+
+/* Test that simd clones are generated for functions with "declare target".  */
+
+#pragma omp declare target
+int addit(int a, int b, int c)
+{
+  return a + b;
+}
+#pragma omp end declare target
+
+/* Although addit has external linkage, we expect clones to be generated as
+   for a function with internal linkage.  */
+
+/* { dg-final { scan-assembler "\\.type.*_ZGVbN4vvv_addit,.*function" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-assembler "\\.type.*_ZGVbM4vvv_addit,.*function" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-assembler-not "\\.globl.*_ZGVbN4vvv_addit" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-assembler-not "\\.globl.*_ZGVbM4vvv_addit" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c
new file mode 100644
index 00000000000..0ccbfe1d765
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c
@@ -0,0 +1,18 @@
+/* { dg-options "-fopenmp -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but unsuitable arguments.  */
+
+struct s {
+  int a;
+  int b;
+};
+  
+#pragma omp declare target
+int addit (struct s x)
+{
+  return x.a + x.b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c
new file mode 100644
index 00000000000..c313cfe53b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c
@@ -0,0 +1,17 @@
+/* { dg-options "-fopenmp -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but that call possibly side-effecting functions 
+   in the body.  */
+
+extern int f (int);
+
+#pragma omp declare target
+int addit(int a, int b, int c)
+{
+  return f(a) + b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */
+
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c
new file mode 100644
index 00000000000..e32b22f6a59
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c
@@ -0,0 +1,16 @@
+/* { dg-options "-fopenmp -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but that write memory in the body.  */
+
+extern int save;
+
+#pragma omp declare target
+int addit(int a, int b, int c)
+{
+  save = c;
+  return a + b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-5.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-5.c
new file mode 100644
index 00000000000..d39a9ab737f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-5.c
@@ -0,0 +1,13 @@
+/* { dg-options "-fopenmp -Os" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" at -Os.  */
+
+#pragma omp declare target
+int addit(int a, int b, int c)
+{
+  return a + b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-6.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-6.c
new file mode 100644
index 00000000000..a0c529b1c4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-6.c
@@ -0,0 +1,13 @@
+/* { dg-options "-fopenmp -Og" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" at -Og.  */
+
+#pragma omp declare target
+int addit(int a, int b, int c)
+{
+  return a + b;
+}
+#pragma omp end declare target
+
+/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] Re: OpenMP: Generate SIMD clones for functions with "declare target"
  2022-09-22  3:17   ` [PATCH v2] " Sandra Loosemore
@ 2022-09-30 10:37     ` Jakub Jelinek
  2022-10-17  1:23       ` [PATCH v3] " Sandra Loosemore
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2022-09-30 10:37 UTC (permalink / raw)
  To: Sandra Loosemore, Jan Hubicka; +Cc: Thomas Schwinge, gcc-patches

On Wed, Sep 21, 2022 at 09:17:18PM -0600, Sandra Loosemore wrote:
> On 9/14/22 12:12, Jakub Jelinek wrote:
> 
> > If it is pure optimization thing and purely keyed on the definition,
> > all the simd clones should be local to the TU, never exported from it.
> 
> OK, here is a revised patch that addresses that.  x86_64 target also
> generates a different set of clones for functions with internal linkage vs
> external so I hacked that to treat these implicit clones in the same way as
> other internal clones.
> 
> There is an existing problem with internal "declare simd" clones in that
> nothing ever DCEs clones that end up not being useful, or does a scan of the
> code in the compilation unit before clone generation to avoid generating
> useless clones in the first place.  I haven't tried to solve that problem,
> but I did attempt to mitigate it for these implicit "declare target" clones
> by tagging the option OPT_LEVELS_2_PLUS_SPEED_ONLY (instead of enabling it
> by default all the time) so the clones are not generated by default at -Os
> and -Og.  I added a couple new test cases to check this.

We've discussed this at Cauldron.  Especially for this patch, but less
urgently for explicit declare simd on non-exported functions (less urgently
just because people don't mark everything declare simd usually) solving the
above is essential.  I don't say it can't be done incrementally, but if the
patch is added to trunk, it needs to be solved before 13 branches.
We need to arrange cgraph to process the declare simd clones after the
callers of the corresponding main function, so that by the time we try to
post-IPA optimize the clones we can see if they were actually used or not
and if not, throw them away.

On the other side, for the implicit declare simd (in explicit case it is
user's choice), maybe it might be useful to actually see if the function clone
is vectorizable before deciding whether to actually make use of it.
Because I doubt it will be a good optimization if we clone it, push
arguments into vectors, then because vectorization failed take it appart,
do a serial loop, create return vector from the scalar results and return.
Though, thinking more about it, for the amdgcn case maybe it is worth even
in that case if we manage to vectorize the caller.  Because if failed
vectorization on admgcn means we perform significantly slower, it can be
helpful to have even partial vectorization, vectorize statements that can
be vectorized and for others use a scalar loop.  Our vectorizer is not
prepared to do that right now I believe (which is why e.g. for
#pragma omp ordered simd we just make the whole loop non-vectorizable,
rather than using a scalar loop for stuff in there and vectorize the rest),
but with this optimization we'd effectively achieve that at least at
function call boundaries (though, only in one direction, if the caller can
be vectorized and callee can't; no optimization if caller can't and callee
could be).

> +/* Helper function for mark_auto_simd_clone; return false if the statement
> +   violates restrictions for an "omp declare simd" function.  Specifically,
> +   the function must not
> +   - throw or call setjmp/longjmp
> +   - write memory that could alias parallel calls
> +   - include openmp directives or calls
> +   - call functions that might do those things */
> +
> +static bool
> +auto_simd_check_stmt (gimple *stmt, tree outer)
> +{
> +  tree decl;
> +
> +  switch (gimple_code (stmt))
> +    {
> +    case GIMPLE_CALL:
> +      decl = gimple_call_fndecl (stmt);
> +
> +      /* We can't know whether indirect calls are safe.  */
> +      if (decl == NULL_TREE)
> +	return false;

What about internal function calls?  Are all of them undesirable, or
some of them?  We do have const / pure ifns, ...
> +
> +      /* Calls to functions that are CONST or PURE are ok.  */
> +      if (gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE))
> +	break;
> +
> +      /* Calls to functions that are already marked "omp declare simd" are
> +	 OK.  */
> +      if (lookup_attribute ("omp declare simd", DECL_ATTRIBUTES (decl)))
> +	break;

You could instead look up the cgraph simd clone info for the function...

> +      /* OpenMP directives are not permitted.  */
> +    CASE_GIMPLE_OMP:
> +      return false;

This makes no sense.  The function is called on low GIMPLE during IPA,
there are no GOMP_* statements at this point in the IL, everything has
been expanded.  Most of OpenMP directives though end up calling
libgomp APIs which aren't pure/const and don't have declare simd
attribute...
Exception can be say master construct, or static scheduling nowait
worksharing loop.

> +      /* Conservatively reject all EH-related constructs.  */
> +    case GIMPLE_CATCH:
> +    case GIMPLE_EH_FILTER:
> +    case GIMPLE_EH_MUST_NOT_THROW:
> +    case GIMPLE_EH_ELSE:
> +    case GIMPLE_EH_DISPATCH:
> +    case GIMPLE_RESX:
> +    case GIMPLE_TRY:

Most of these won't appear in low gimple either, I think GIMPLE_RESX
does and GIMPLE_EH_DISPATCH too, the rest probably can't.

> +      return false;
> +
> +      /* Asms are not permitted since we don't know what they do.  */
> +    case GIMPLE_ASM:
> +      return false;

What about volatile stmts?  Even volatile loads should be punted on.

> +
> +    default:
> +      break;
> +    }
> +
> +  /* Memory writes are not permitted.
> +     FIXME: this could be relaxed a little to permit writes to
> +     function-local variables that could not alias other instances
> +     of the function running in parallel.  */
> +  if (gimple_store_p (stmt))
> +    return false;
> +  else
> +    return true;
> +}

> +  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (decl))
> +    {
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
> +	   gsi_next (&gsi))
> +	if (!auto_simd_check_stmt (gsi_stmt (gsi), decl))
> +	  return NULL_TREE;
> +    }

If you want to punt on exceptions, I guess you could punt on EDGE_EH
or EDGE_ABNORMAL edges out of basic blocks.

> +  attr = lookup_attribute ("omp declare simd",
> +			   DECL_ATTRIBUTES (node->decl));
> +
> +  /* See if we can add an "omp declare simd" directive implicitly
> +     before giving up.  */
> +  /* FIXME: OpenACC "#pragma acc routine" translates into
> +     "omp declare target", but appears also to have some other effects
> +     that conflict with generating SIMD clones, causing ICEs.  So don't
> +     do this if we've got OpenACC instead of OpenMP.  */
> +  if (attr == NULL_TREE
> +      && flag_openmp_target_simd_clone
> +      && !oacc_get_fn_attrib (node->decl))

I admit I don't remember where exactly the simd clone happens wrt. other
IPA passes, but I think it is late pass; so, does it happen for GCN
offloading only in the lto1 offloading compiler?
Shouldn't the auto optimization be then done only in the offloading
lto1 for GCN then (say guard on targetm boolean)?

Otherwise, if we do it say for host offloading fallback as well
(I think it is still undesirable for PTX offloading because it is a waste of
time, there is no vectorization there, it is SIMT instead), it might be
a good idea to check cgraph that the function has at least one caller.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-options "-fopenmp -O2" } */
> +
> +/* Test that simd clones are generated for functions with "declare target".  */
> +
> +#pragma omp declare target
> +int addit(int a, int b, int c)
> +{
> +  return a + b;
> +}
> +#pragma omp end declare target

Because in cases like this where nothing calls it in the same TU and not LTO
optimizing, creating the internal clones is pure waste of energy.  Nothing
will vectorize using those.

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] Re: OpenMP: Generate SIMD clones for functions with "declare target"
  2022-09-30 10:37     ` Jakub Jelinek
@ 2022-10-17  1:23       ` Sandra Loosemore
  2022-10-20 14:07         ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Sandra Loosemore @ 2022-10-17  1:23 UTC (permalink / raw)
  To: Jakub Jelinek, Jan Hubicka; +Cc: Thomas Schwinge, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5640 bytes --]

On 9/30/22 04:37, Jakub Jelinek wrote:

> We've discussed this at Cauldron.  Especially for this patch, but less
> urgently for explicit declare simd on non-exported functions (less urgently
> just because people don't mark everything declare simd usually) solving the
> above is essential.  I don't say it can't be done incrementally, but if the
> patch is added to trunk, it needs to be solved before 13 branches.
> We need to arrange cgraph to process the declare simd clones after the
> callers of the corresponding main function, so that by the time we try to
> post-IPA optimize the clones we can see if they were actually used or not
> and if not, throw them away.
> 
> On the other side, for the implicit declare simd (in explicit case it is
> user's choice), maybe it might be useful to actually see if the function clone
> is vectorizable before deciding whether to actually make use of it.
> Because I doubt it will be a good optimization if we clone it, push
> arguments into vectors, then because vectorization failed take it appart,
> do a serial loop, create return vector from the scalar results and return.
> Though, thinking more about it, for the amdgcn case maybe it is worth even
> in that case if we manage to vectorize the caller.  Because if failed
> vectorization on admgcn means we perform significantly slower, it can be
> helpful to have even partial vectorization, vectorize statements that can
> be vectorized and for others use a scalar loop.  Our vectorizer is not
> prepared to do that right now I believe (which is why e.g. for
> #pragma omp ordered simd we just make the whole loop non-vectorizable,
> rather than using a scalar loop for stuff in there and vectorize the rest),
> but with this optimization we'd effectively achieve that at least at
> function call boundaries (though, only in one direction, if the caller can
> be vectorized and callee can't; no optimization if caller can't and callee
> could be).

My sense is that the first approach would be more straightforward than 
the second one, and I am willing to continue to work on that.  However, 
I think I need some direction to get started, as I presently know 
nothing about cgraph and I was unable to find any useful overview or 
interface documentation in the GCC internals manual.  Is this as simple 
as inserting an existing pass into the passlist to clean up after 
vectorization, or does it involve writing something more or less from 
scratch?

> 
>> +      /* OpenMP directives are not permitted.  */
>> +    CASE_GIMPLE_OMP:
>> +      return false;
> 
> This makes no sense.  The function is called on low GIMPLE during IPA,
> there are no GOMP_* statements at this point in the IL, everything has
> been expanded.  Most of OpenMP directives though end up calling
> libgomp APIs which aren't pure/const and don't have declare simd
> attribute...
> Exception can be say master construct, or static scheduling nowait
> worksharing loop.
> 
>> +      /* Conservatively reject all EH-related constructs.  */
>> +    case GIMPLE_CATCH:
>> +    case GIMPLE_EH_FILTER:
>> +    case GIMPLE_EH_MUST_NOT_THROW:
>> +    case GIMPLE_EH_ELSE:
>> +    case GIMPLE_EH_DISPATCH:
>> +    case GIMPLE_RESX:
>> +    case GIMPLE_TRY:
> 
> Most of these won't appear in low gimple either, I think GIMPLE_RESX
> does and GIMPLE_EH_DISPATCH too, the rest probably can't.

OK, this was my bad.  I cut and pasted this from some code that was 
originally for the OMP lowering pass.  I've moved the entire 
plausibility filter to a new pass that runs just before OMP lowering. 
It seems easier to detect the things that are invalid in a cloneable 
function when they are still in a form closer to the source constructs.
>> +      return false;
>> +
>> +      /* Asms are not permitted since we don't know what they do.  */
>> +    case GIMPLE_ASM:
>> +      return false;
> 
> What about volatile stmts?  Even volatile loads should be punted on.

That's fixed now too.
> 
>> +  attr = lookup_attribute ("omp declare simd",
>> +			   DECL_ATTRIBUTES (node->decl));
>> +
>> +  /* See if we can add an "omp declare simd" directive implicitly
>> +     before giving up.  */
>> +  /* FIXME: OpenACC "#pragma acc routine" translates into
>> +     "omp declare target", but appears also to have some other effects
>> +     that conflict with generating SIMD clones, causing ICEs.  So don't
>> +     do this if we've got OpenACC instead of OpenMP.  */
>> +  if (attr == NULL_TREE
>> +      && flag_openmp_target_simd_clone
>> +      && !oacc_get_fn_attrib (node->decl))
> 
> I admit I don't remember where exactly the simd clone happens wrt. other
> IPA passes, but I think it is late pass; so, does it happen for GCN
> offloading only in the lto1 offloading compiler?
> Shouldn't the auto optimization be then done only in the offloading
> lto1 for GCN then (say guard on targetm boolean)?

I'm afraid I don't know much about offloading, but I was under the 
impression it all goes through the same compilation process, just with a 
different target?

> Otherwise, if we do it say for host offloading fallback as well
> (I think it is still undesirable for PTX offloading because it is a waste of
> time, there is no vectorization there, it is SIMT instead), it might be
> a good idea to check cgraph that the function has at least one caller.

As I said previously, I don't understand cgraph, but in my new patch I 
arranged things so that the implicit clones are only created if there is 
also a call to the function found in an OMP loop (not just one caller 
anywhere).  So this should be fixed now.

New patch attached.  Is this one OK for mainline?

-Sandra

[-- Attachment #2: auto-declare-simd-oct16.patch --]
[-- Type: text/x-patch, Size: 42353 bytes --]

commit eb1fe25b125cd153b2c661f1c54e61fcecf4754b
Author: Sandra Loosemore <sandra@codesourcery.com>
Date:   Sat Oct 15 02:03:03 2022 +0000

    OpenMP: Generate SIMD clones for functions with "declare target"
    
    This patch causes the IPA simdclone pass to generate clones for
    functions with the "omp declare target" attribute as if they had
    "omp declare simd", provided the function appears to be suitable for
    SIMD execution.  The filter is conservative, rejecting functions
    that write memory or that call other functions not known to be safe.
    A new option -fopenmp-target-simd-clone is added to control this
    transformation; it's enabled at -O2 and higher.
    
    gcc/ChangeLog:
    
            * common.opt (fopenmp-target-simd-clone): New option.
            * config/aarch64/aarch64.cc
            (aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
            * config/gcn/gcn.cc
            (gcn_simd_clone_compute_vecsize_and_simdlen): Update.
            * config/i386/i386.cc
            (ix86_simd_clone_compute_vecsize_and_simdlen): Update.
            * doc/invoke.texi (-fopenmp-target-simd-clone): Document.
            * doc/tm.texi: Regenerated.
            * function.h (struct function): Add auto_simd_clone_candidate and
            auto_simd_call_candidate bitfields.
            * lto-streamer-in.cc (input_struct_function_base): Handle the new
            bitfields.
            * lto-streamer-out.cc (output_struct_function_base): Likewise.
            * omp-low.cc (auto_declare_simd_signature_ok): New.
            (check_auto_declare_simd_stmt): New.
            (check_auto_declare_simd_op): New.
            (maybe_mark_auto_declare_simd): New.
            (auto_declare_simd_scan_for_calls): New.
            (auto_declare_simd_scan_for_loops): New.
            (execute_auto_declare_simd): New.
            (pass_data_omp_auto_declare_simd): New.
            (pass_omp_auto_declare_simd): New.
            (make_pass_omp_auto_declare_simd): New.
            * omp-simd-clone.cc (simd_clone_create): Add force_local argument,
            make the symbol have internal linkage if it is true.
            (expand_simd_clones): Also check for cloneable functions with
            "omp declare target".  Pass explicit_p argument to
            simd_clone.compute_vecsize_and_simdlen target hook.
            * opts.cc (default_options_table): Add -fopenmp-target-simd-clone.
            * passes.def: Add pass_omp_auto_declare_simd.
            * target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
            Add bool explicit_p argument.
            * tree-pass.h: (make_pass_omp_auto_declare_simd): Declare.
    
    gcc/testsuite/ChangeLog:
    
            * g++.dg/gomp/target-simd-clone-1.C: New.
            * g++.dg/gomp/target-simd-clone-2.C: New.
            * gcc.dg/gomp/target-simd-clone-1.c: New.
            * gcc.dg/gomp/target-simd-clone-2.c: New.
            * gcc.dg/gomp/target-simd-clone-3.c: New.
            * gcc.dg/gomp/target-simd-clone-4.c: New.
            * gcc.dg/gomp/target-simd-clone-5.c: New.
            * gcc.dg/gomp/target-simd-clone-6.c: New.
            * gcc.dg/gomp/target-simd-clone-7.c: New.
            * gcc.dg/gomp/target-simd-clone-8.c: New.

diff --git a/gcc/common.opt b/gcc/common.opt
index bce3e514f65..5ee7d96d65d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2214,6 +2214,10 @@ fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
 
+fopenmp-target-simd-clone
+Common Var(flag_openmp_target_simd_clone) Optimization
+Generate SIMD clones for functions with the OpenMP declare target directive.
+
 fopt-info
 Common Var(flag_opt_info) Optimization
 Enable all optimization info dumps on stderr.
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 1d0f994f281..2aafd0d2dd7 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26698,7 +26698,8 @@ currently_supported_simd_type (tree t, tree b)
 static int
 aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					struct cgraph_simd_clone *clonei,
-					tree base_type, int num)
+					tree base_type, int num,
+					bool explicit_p)
 {
   tree t, ret_type;
   unsigned int elt_bits, count;
@@ -26716,8 +26717,9 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  || const_simdlen > 1024
 	  || (const_simdlen & (const_simdlen - 1)) != 0))
     {
-      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %wd", const_simdlen);
+      if (explicit_p)
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported simdlen %wd", const_simdlen);
       return 0;
     }
 
@@ -26725,7 +26727,9 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   if (TREE_CODE (ret_type) != VOID_TYPE
       && !currently_supported_simd_type (ret_type, base_type))
     {
-      if (TYPE_SIZE (ret_type) != TYPE_SIZE (base_type))
+      if (!explicit_p)
+	;
+      else if (TYPE_SIZE (ret_type) != TYPE_SIZE (base_type))
 	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 		    "GCC does not currently support mixed size types "
 		    "for %<simd%> functions");
@@ -26752,7 +26756,9 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
 	  && !currently_supported_simd_type (arg_type, base_type))
 	{
-	  if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type))
+	  if (!explicit_p)
+	    ;
+	  else if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type))
 	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
 			"GCC does not currently support mixed size types "
 			"for %<simd%> functions");
@@ -26782,9 +26788,11 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
       if (clonei->simdlen.is_constant (&const_simdlen)
 	  && maybe_ne (vec_bits, 64U) && maybe_ne (vec_bits, 128U))
 	{
-	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "GCC does not currently support simdlen %wd for type %qT",
-		      const_simdlen, base_type);
+	  if (explicit_p)
+	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			"GCC does not currently support simdlen %wd for "
+			"type %qT",
+			const_simdlen, base_type);
 	  return 0;
 	}
     }
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 3dc294c2d2f..d7059a2091c 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5024,7 +5024,8 @@ static int
 gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node),
 					    struct cgraph_simd_clone *clonei,
 					    tree base_type,
-					    int ARG_UNUSED (num))
+					    int ARG_UNUSED (num),
+					    bool explicit_p)
 {
   if (known_eq (clonei->simdlen, 0U))
     clonei->simdlen = 64;
@@ -5032,9 +5033,10 @@ gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node
     {
       /* Note that x86 has a similar message that is likely to trigger on
 	 sizes that are OK for gcn; the user can't win.  */
-      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %wd (amdgcn)",
-		  clonei->simdlen.to_constant ());
+      if (explicit_p)
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported simdlen %wd (amdgcn)",
+		    clonei->simdlen.to_constant ());
       return 0;
     }
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 480db35f6cd..5d8bb693ca9 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23626,7 +23626,8 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
 static int
 ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 					     struct cgraph_simd_clone *clonei,
-					     tree base_type, int num)
+					     tree base_type, int num,
+					     bool explicit_p)
 {
   int ret = 1;
 
@@ -23635,8 +23636,9 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  || clonei->simdlen > 1024
 	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
     {
-      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		  "unsupported simdlen %wd", clonei->simdlen.to_constant ());
+      if (explicit_p)
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported simdlen %wd", clonei->simdlen.to_constant ());
       return 0;
     }
 
@@ -23656,8 +23658,9 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	  break;
 	/* FALLTHRU */
       default:
-	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		    "unsupported return type %qT for simd", ret_type);
+	if (explicit_p)
+	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		      "unsupported return type %qT for simd", ret_type);
 	return 0;
       }
 
@@ -23686,13 +23689,14 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	default:
 	  if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM)
 	    break;
-	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "unsupported argument type %qT for simd", arg_type);
+	  if (explicit_p)
+	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			"unsupported argument type %qT for simd", arg_type);
 	  return 0;
 	}
     }
 
-  if (!TREE_PUBLIC (node->decl))
+  if (!TREE_PUBLIC (node->decl) || !explicit_p)
     {
       /* If the function isn't exported, we can pick up just one ISA
 	 for the clones.  */
@@ -23763,9 +23767,10 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
 	cnt /= clonei->vecsize_float;
       if (cnt > (TARGET_64BIT ? 16 : 8))
 	{
-	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-		      "unsupported simdlen %wd",
-		      clonei->simdlen.to_constant ());
+	  if (explicit_p)
+	    warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+			"unsupported simdlen %wd",
+			clonei->simdlen.to_constant ());
 	  return 0;
 	}
       }
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c176e2dc646..75ed54e8cce 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -204,7 +204,7 @@ in the following sections.
 -flax-vector-conversions  -fms-extensions @gol
 -foffload=@var{arg}  -foffload-options=@var{arg} @gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
--fopenmp  -fopenmp-simd @gol
+-fopenmp  -fopenmp-simd  -fopenmp-target-simd-clone @gol
 -fpermitted-flt-eval-methods=@var{standard} @gol
 -fplan9-extensions  -fsigned-bitfields  -funsigned-bitfields @gol
 -fsigned-char  -funsigned-char -fstrict-flex-arrays[=@var{n}] @gol
@@ -2761,6 +2761,16 @@ Enable handling of OpenMP's @code{simd}, @code{declare simd},
 @code{[[omp::directive(...)]]} and @code{[[omp::sequence(...)]]} in C++
 and @code{!$omp} in Fortran.  Other OpenMP directives are ignored.
 
+@item -fopenmp-target-simd-clone
+@opindex fopenmp-target-simd-clone
+@cindex OpenMP target SIMD clone
+In addition to generating SIMD clones for functions marked with the
+@code{declare simd} directive, GCC also generates clones
+for functions marked with the OpenMP @code{declare target} directive
+that are suitable for vectorization when this option is in effect.
+It is enabled by default at @option{-O2} and higher (but not @option{-Os}
+or @option{-Og}).
+
 @item -fpermitted-flt-eval-methods=@var{style}
 @opindex fpermitted-flt-eval-methods
 @opindex fpermitted-flt-eval-methods=c11
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 110f8dfa0a9..78b8871fb87 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6272,7 +6272,7 @@ The default is @code{NULL_TREE} which means to not vectorize scatter
 stores.
 @end deftypefn
 
-@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int})
+@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}, @var{bool})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
 @var{simdlen} field if it was previously 0.
diff --git a/gcc/function.h b/gcc/function.h
index 098613766be..a706fdd0476 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -438,6 +438,15 @@ struct GTY(()) function {
 
   /* Set if there are any OMP_TARGET regions in the function.  */
   unsigned int has_omp_target : 1;
+
+  /* Set if this function is suitable for auto-creation of OpenMP SIMD
+     clones.  */
+  unsigned int auto_simd_clone_candidate : 1;
+
+  /* Set if this function is called in a context where an OpenMP SIMD
+     clone might be used.  Only useful on functions that are also identified
+     as auto_simd_clone_candidate.  */
+  unsigned int auto_simd_call_candidate : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index fa896340daf..f4c054c2c53 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1318,6 +1318,8 @@ input_struct_function_base (struct function *fn, class data_in *data_in,
   fn->calls_eh_return = bp_unpack_value (&bp, 1);
   fn->has_force_vectorize_loops = bp_unpack_value (&bp, 1);
   fn->has_simduid_loops = bp_unpack_value (&bp, 1);
+  fn->auto_simd_clone_candidate = bp_unpack_value (&bp, 1);
+  fn->auto_simd_call_candidate = bp_unpack_value (&bp, 1);
   fn->va_list_fpr_size = bp_unpack_value (&bp, 8);
   fn->va_list_gpr_size = bp_unpack_value (&bp, 8);
   fn->last_clique = bp_unpack_value (&bp, sizeof (short) * 8);
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 2e7af03888b..a434e221664 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -2278,6 +2278,8 @@ output_struct_function_base (struct output_block *ob, struct function *fn)
   bp_pack_value (&bp, fn->calls_eh_return, 1);
   bp_pack_value (&bp, fn->has_force_vectorize_loops, 1);
   bp_pack_value (&bp, fn->has_simduid_loops, 1);
+  bp_pack_value (&bp, fn->auto_simd_clone_candidate, 1);
+  bp_pack_value (&bp, fn->auto_simd_call_candidate, 1);
   bp_pack_value (&bp, fn->va_list_fpr_size, 8);
   bp_pack_value (&bp, fn->va_list_gpr_size, 8);
   bp_pack_value (&bp, fn->last_clique, sizeof (short) * 8);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index dc42c752017..c7b8b0b894d 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -15055,4 +15055,335 @@ make_pass_diagnose_omp_blocks (gcc::context *ctxt)
 }
 \f
 
+/* Simple pass to identify functions marked "omp declare target" that are
+   also candidates for "omp declare simd".  The simd_clone IPA pass
+   generates the clones, here we just mark candidates.  It's easier to do
+   the semantic checks on both functions and call sites before OMP
+   lowering.  */
+
+
+/* Predicate to check whether the type signature and attributes of
+   FNDECL appear to be suitable for auto-declare-simd.  This does not
+   check its body, only that it has one.  */
+static bool
+auto_declare_simd_signature_ok (tree fndecl)
+{
+
+  /* Ignore functions without a definition.  */
+  if (!gimple_has_body_p (fndecl))
+    return false;
+
+  /* Ignore functions that are marked "noclone", "omp declare simd",
+     or that aren't marked "omp declare target".  */
+  tree attribs = DECL_ATTRIBUTES (fndecl);
+  if (lookup_attribute ("noclone", attribs)
+      || lookup_attribute ("omp declare simd", attribs)
+      || !lookup_attribute ("omp declare target", attribs))
+    return false;
+
+  /* FIXME: OpenACC "#pragma acc routine" translates into
+     "omp declare target", but appears also to have some other effects
+     that conflict with generating SIMD clones, causing ICEs.  So don't
+     do this if we've got OpenACC instead of OpenMP.  */
+  if (oacc_get_fn_attrib (fndecl))
+    return false;
+
+  /* Filter out functions with argument types that are not vectorizable.
+     Backends will check for vectorizable arguments/return types in a
+     target-specific way when clones are expanded in the simd_clone pass,
+     but we can immediately filter out functions that have non-scalar
+     arguments/return types.  Also, atomic types trigger warnings in
+     simd_clone_clauses_extract.  */
+  if (TYPE_ARG_TYPES (TREE_TYPE (fndecl)) == NULL_TREE)
+    return false;
+  tree t = TREE_TYPE (TREE_TYPE (fndecl));
+  machine_mode m = TYPE_MODE (t);
+  if (!(VOID_TYPE_P (t) || is_a <scalar_mode> (m)) || TYPE_ATOMIC (t))
+    return false;
+
+  if (TYPE_ARG_TYPES (TREE_TYPE (fndecl)))
+    {
+      for (tree temp = TYPE_ARG_TYPES (TREE_TYPE (fndecl));
+	   temp; temp = TREE_CHAIN (temp))
+	{
+	  t = TREE_VALUE (temp);
+	  m = TYPE_MODE (t);
+	  if (!(VOID_TYPE_P (t) || is_a <scalar_mode> (m)) || TYPE_ATOMIC (t))
+	    return false;
+	}
+    }
+  else
+    {
+      for (tree temp = DECL_ARGUMENTS (fndecl); temp; temp = DECL_CHAIN (temp))
+	{
+	  t = TREE_TYPE (temp);
+	  m = TYPE_MODE (t);
+	  if (!(VOID_TYPE_P (t) || is_a <scalar_mode> (m)) || TYPE_ATOMIC (t))
+	    return false;
+	}
+    }
+
+  /* Success!  */
+  return true;
+}
+
+/* This code walker is used on the body of a function that is potentially
+   auto-declare-simd, to filter out things that are not suitable for
+   SIMD cloning.  */
+static tree
+check_auto_declare_simd_stmt (gimple_stmt_iterator *gsi_p,
+			      bool *handled_ops_p,
+			      struct walk_stmt_info *wi ATTRIBUTE_UNUSED)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  tree decl, attribs;
+
+  /* Arbitrary non-null value to return to indicate failed check.  */
+  tree retval = integer_one_node;
+
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_CALL:
+      decl = gimple_call_fndecl (stmt);
+
+      /* We can't know whether indirect calls are safe.  */
+      if (decl == NULL_TREE)
+	{
+	  *handled_ops_p = true;
+	  return retval;
+	}
+
+      /* Calls to functions that are CONST or PURE are ok.  */
+      if (gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE))
+	break;
+
+      /* Calls to functions that are already marked "omp declare simd" are
+	 OK.  */
+      attribs = DECL_ATTRIBUTES (decl);
+      if (lookup_attribute ("omp declare simd", attribs)
+	  || lookup_attribute ("omp declare auto-simd", attribs))
+	break;
+
+      /* Other function calls are not permitted.  */
+      *handled_ops_p = true;
+      return retval;
+
+      /* OpenMP directives are not permitted.  */
+    CASE_GIMPLE_OMP:
+      *handled_ops_p = true;
+      return retval;
+
+      /* Conservatively reject all EH-related constructs.  */
+    case GIMPLE_CATCH:
+    case GIMPLE_EH_FILTER:
+    case GIMPLE_EH_MUST_NOT_THROW:
+    case GIMPLE_EH_ELSE:
+    case GIMPLE_EH_DISPATCH:
+    case GIMPLE_RESX:
+    case GIMPLE_TRY:
+      *handled_ops_p = true;
+      return retval;
+
+      /* Asms are not permitted since we don't know what they do.  */
+    case GIMPLE_ASM:
+      *handled_ops_p = true;
+      return retval;
+
+    default:
+      break;
+    }
+
+  /* Memory writes are not permitted, nor are statements with volatile
+     operands.
+     FIXME: this could be relaxed a little to permit writes to
+     function-local variables that could not alias other instances
+     of the function running in parallel.  */
+  if (gimple_store_p (stmt) || gimple_has_volatile_ops (stmt))
+    {
+      *handled_ops_p = true;
+      return retval;
+    }
+  else
+    return NULL_TREE;
+}
+
+/* Walker to detect operands that would be invalid in an
+   auto-declare-simd function, specifically volatile memory accesses.
+   It seems that gimple_has_volatile_ops has not been
+   initialized yet when this pass runs, so we have to do our own scan
+   for this.  */
+static tree
+check_auto_declare_simd_op (tree *op, int *walk_subtrees ATTRIBUTE_UNUSED,
+			    void *data ATTRIBUTE_UNUSED)
+{
+  if (TREE_THIS_VOLATILE (*op))
+    return *op;
+  else
+    return NULL_TREE;
+}
+
+
+/* Check whether FNDECL is suitable for auto declare simd, and set
+   the attribute on it if so.  */
+void
+maybe_mark_auto_declare_simd (tree fndecl)
+{
+  /* Ignore functions that already have been identified as auto-simd
+     candidates.  */
+  struct function *fn = DECL_STRUCT_FUNCTION (fndecl);
+  gcc_assert (fn);
+  if (fn->auto_simd_clone_candidate)
+    return;
+
+  /* Check the type signature and attributes.  */
+  if (!auto_declare_simd_signature_ok (fndecl))
+    return;
+
+  /* Walk the function body to check for things that would interfere with
+     vectorization, or be invalid in an "omp declare simd" function.  */
+  gimple_seq body = gimple_body (fndecl);
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  if (walk_gimple_seq (body, check_auto_declare_simd_stmt,
+		       check_auto_declare_simd_op, &wi))
+    return;
+
+  /* Mark the function as a candidate, for further processing by the
+     simd_clone pass.  */
+  fn->auto_simd_clone_candidate = 1;
+}
+
+/* This code walker is used to scan for calls to functions in contexts
+   where the vectorizer might be able to make use of a SIMD clone.  */
+
+tree
+auto_declare_simd_scan_for_calls (gimple_stmt_iterator *gsi_p,
+				  bool *handled_ops_p ATTRIBUTE_UNUSED,
+				  struct walk_stmt_info *wi ATTRIBUTE_UNUSED)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_CALL:
+      {
+	tree fndecl = gimple_call_fndecl (stmt);
+	if (fndecl && auto_declare_simd_signature_ok (fndecl))
+	  {
+	    struct function *fn = DECL_STRUCT_FUNCTION (fndecl);
+	    if (fn)
+	      fn->auto_simd_call_candidate = 1;
+	  }
+      }
+      break;
+    default:
+      break;
+    }
+  return NULL_TREE;
+}
+
+/* This code walker is used to scan for OMP loops where we might find
+   calls to functions that are candidates for SIMD cloning.  */
+
+tree
+auto_declare_simd_scan_for_loops (gimple_stmt_iterator *gsi_p,
+				  bool *handled_ops_p ATTRIBUTE_UNUSED,
+				  struct walk_stmt_info *wi ATTRIBUTE_UNUSED)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_OMP_FOR:
+      {
+	struct walk_stmt_info inner_wi;
+	memset (&inner_wi, 0, sizeof (inner_wi));
+	walk_gimple_seq (gimple_omp_body (stmt),
+			 auto_declare_simd_scan_for_calls, NULL, &inner_wi);
+      }
+      *handled_ops_p = true;
+      break;
+
+    default:
+      break;
+    }
+  return NULL_TREE;
+}
+
+
+/* Main function for this pass.  */
+
+static unsigned int
+execute_auto_declare_simd (void)
+{
+  gimple_seq body = gimple_body (current_function_decl);
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+
+  /* First scan the current function to see if it is a candidate for auto
+     declare simd.  */
+  maybe_mark_auto_declare_simd (current_function_decl);
+
+  /* Now scan the current function to see if it contains calls to things that
+     might be candidates for use of auto declare simd clones, in a context
+     where such a clone might actually be used -- namely, if this function is
+     already marked as "declare simd", or has calls inside an OMP loop.
+
+     Note that the caller and callee may be compiled in either order, that
+     is why we have to use two different bits in the struct function and
+     combine them later in the simd_clone IPA pass.  */
+
+  tree attribs = DECL_ATTRIBUTES (current_function_decl);
+  if (lookup_attribute ("omp declare simd", attribs))
+    walk_gimple_seq (body, auto_declare_simd_scan_for_calls, NULL, &wi);
+  else
+    walk_gimple_seq (body, auto_declare_simd_scan_for_loops, NULL, &wi);
+  return 0;
+}
+
+
+namespace {
+
+const pass_data pass_data_omp_auto_declare_simd =
+{
+  GIMPLE_PASS, /* type */
+  "*omp_auto_declare_simd", /* name */
+  OPTGROUP_OMP, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_gimple_any, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_omp_auto_declare_simd : public gimple_opt_pass
+{
+public:
+  pass_omp_auto_declare_simd (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_omp_auto_declare_simd, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  bool gate (function *) final override
+  {
+    return ((flag_openacc || flag_openmp || flag_openmp_simd)
+	    && targetm.simd_clone.compute_vecsize_and_simdlen != NULL
+	    && flag_openmp_target_simd_clone
+	    && flag_tree_loop_vectorize);
+  }
+  unsigned int execute (function *) final override
+    {
+      return execute_auto_declare_simd ();
+    }
+
+}; // class pass_omp_auto_declare_simd
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_omp_auto_declare_simd (gcc::context *ctxt)
+{
+  return new pass_omp_auto_declare_simd (ctxt);
+}
+\f
+
 #include "gt-omp-low.h"
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 34cbee5afcd..a1ab05600dd 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -51,6 +51,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "omp-simd-clone.h"
+#include "omp-low.h"
+#include "omp-general.h"
 
 /* Return the number of elements in vector type VECTYPE, which is associated
    with a SIMD clone.  At present these always have a constant length.  */
@@ -430,10 +432,12 @@ simd_clone_mangle (struct cgraph_node *node,
   return get_identifier (str);
 }
 
-/* Create a simd clone of OLD_NODE and return it.  */
+/* Create a simd clone of OLD_NODE and return it.  If FORCE_LOCAL is true,
+   create it as a local symbol, otherwise copy the symbol linkage and
+   visibility attributes from OLD_NODE.  */
 
 static struct cgraph_node *
-simd_clone_create (struct cgraph_node *old_node)
+simd_clone_create (struct cgraph_node *old_node, bool force_local)
 {
   struct cgraph_node *new_node;
   if (old_node->definition)
@@ -463,23 +467,38 @@ simd_clone_create (struct cgraph_node *old_node)
     return new_node;
 
   set_decl_built_in_function (new_node->decl, NOT_BUILT_IN, 0);
-  TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (old_node->decl);
-  DECL_COMDAT (new_node->decl) = DECL_COMDAT (old_node->decl);
-  DECL_WEAK (new_node->decl) = DECL_WEAK (old_node->decl);
-  DECL_EXTERNAL (new_node->decl) = DECL_EXTERNAL (old_node->decl);
-  DECL_VISIBILITY_SPECIFIED (new_node->decl)
-    = DECL_VISIBILITY_SPECIFIED (old_node->decl);
-  DECL_VISIBILITY (new_node->decl) = DECL_VISIBILITY (old_node->decl);
-  DECL_DLLIMPORT_P (new_node->decl) = DECL_DLLIMPORT_P (old_node->decl);
-  if (DECL_ONE_ONLY (old_node->decl))
-    make_decl_one_only (new_node->decl, DECL_ASSEMBLER_NAME (new_node->decl));
-
-  /* The method cgraph_version_clone_with_body () will force the new
-     symbol local.  Undo this, and inherit external visibility from
-     the old node.  */
-  new_node->local = old_node->local;
-  new_node->externally_visible = old_node->externally_visible;
-  new_node->calls_declare_variant_alt = old_node->calls_declare_variant_alt;
+  if (force_local)
+    {
+      TREE_PUBLIC (new_node->decl) = 0;
+      DECL_COMDAT (new_node->decl) = 0;
+      DECL_WEAK (new_node->decl) = 0;
+      DECL_EXTERNAL (new_node->decl) = 0;
+      DECL_VISIBILITY_SPECIFIED (new_node->decl) = 0;
+      DECL_VISIBILITY (new_node->decl) = VISIBILITY_DEFAULT;
+      DECL_DLLIMPORT_P (new_node->decl) = 0;
+    }
+  else
+    {
+      TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (old_node->decl);
+      DECL_COMDAT (new_node->decl) = DECL_COMDAT (old_node->decl);
+      DECL_WEAK (new_node->decl) = DECL_WEAK (old_node->decl);
+      DECL_EXTERNAL (new_node->decl) = DECL_EXTERNAL (old_node->decl);
+      DECL_VISIBILITY_SPECIFIED (new_node->decl)
+	= DECL_VISIBILITY_SPECIFIED (old_node->decl);
+      DECL_VISIBILITY (new_node->decl) = DECL_VISIBILITY (old_node->decl);
+      DECL_DLLIMPORT_P (new_node->decl) = DECL_DLLIMPORT_P (old_node->decl);
+      if (DECL_ONE_ONLY (old_node->decl))
+	make_decl_one_only (new_node->decl,
+			    DECL_ASSEMBLER_NAME (new_node->decl));
+
+      /* The method cgraph_version_clone_with_body () will force the new
+	 symbol local.  Undo this, and inherit external visibility from
+	 the old node.  */
+      new_node->local = old_node->local;
+      new_node->externally_visible = old_node->externally_visible;
+      new_node->calls_declare_variant_alt
+	= old_node->calls_declare_variant_alt;
+    }
 
   return new_node;
 }
@@ -1683,13 +1702,34 @@ simd_clone_adjust (struct cgraph_node *node)
 void
 expand_simd_clones (struct cgraph_node *node)
 {
-  tree attr = lookup_attribute ("omp declare simd",
-				DECL_ATTRIBUTES (node->decl));
-  if (attr == NULL_TREE
-      || node->inlined_to
+  tree attr;
+  bool explicit_p = true;
+
+  if (node->inlined_to
       || lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
     return;
 
+  attr = lookup_attribute ("omp declare simd",
+			   DECL_ATTRIBUTES (node->decl));
+
+  /* See if we can add an "omp declare simd" directive implicitly
+     before giving up.  */
+  if (attr == NULL_TREE
+      && flag_openmp_target_simd_clone)
+    {
+      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+      if (fn && fn->auto_simd_clone_candidate
+	  && (fn->auto_simd_call_candidate || in_lto_p))
+	{
+	  attr = tree_cons (get_identifier ("omp declare simd"), NULL,
+			    DECL_ATTRIBUTES (node->decl));
+	  DECL_ATTRIBUTES (node->decl) = attr;
+	  explicit_p = false;
+	}
+    }
+  if (attr == NULL_TREE)
+    return;
+
   /* Ignore
      #pragma omp declare simd
      extern int foo ();
@@ -1714,13 +1754,15 @@ expand_simd_clones (struct cgraph_node *node)
 
       poly_uint64 orig_simdlen = clone_info->simdlen;
       tree base_type = simd_clone_compute_base_data_type (node, clone_info);
+
       /* The target can return 0 (no simd clones should be created),
 	 1 (just one ISA of simd clones should be created) or higher
 	 count of ISA variants.  In that case, clone_info is initialized
 	 for the first ISA variant.  */
       int count
 	= targetm.simd_clone.compute_vecsize_and_simdlen (node, clone_info,
-							  base_type, 0);
+							  base_type, 0,
+							  explicit_p);
       if (count == 0)
 	continue;
 
@@ -1745,7 +1787,8 @@ expand_simd_clones (struct cgraph_node *node)
 	      /* And call the target hook again to get the right ISA.  */
 	      targetm.simd_clone.compute_vecsize_and_simdlen (node, clone,
 							      base_type,
-							      i / 2);
+							      i / 2,
+							      explicit_p);
 	      if ((i & 1) != 0)
 		clone->inbranch = 1;
 	    }
@@ -1763,7 +1806,7 @@ expand_simd_clones (struct cgraph_node *node)
 	  /* Only when we are sure we want to create the clone actually
 	     clone the function (or definitions) or create another
 	     extern FUNCTION_DECL (for prototypes without definitions).  */
-	  struct cgraph_node *n = simd_clone_create (node);
+	  struct cgraph_node *n = simd_clone_create (node, !explicit_p);
 	  if (n == NULL)
 	    {
 	      if (i == 0)
diff --git a/gcc/opts.cc b/gcc/opts.cc
index ae079fcd20e..8d5eb2bbbce 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -658,6 +658,7 @@ static const struct default_options default_options_table[] =
       REORDER_BLOCKS_ALGORITHM_STC },
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_ftree_loop_vectorize, NULL, 1 },
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_ftree_slp_vectorize, NULL, 1 },
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fopenmp_target_simd_clone, NULL, 1 },
 #ifdef INSN_SCHEDULING
   /* Only run the pre-regalloc scheduling pass if optimizing for speed.  */
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 1 },
diff --git a/gcc/passes.def b/gcc/passes.def
index 939ec3e29c8..4b29a2c5800 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_diagnose_omp_blocks);
   NEXT_PASS (pass_diagnose_tm_blocks);
   NEXT_PASS (pass_omp_oacc_kernels_decompose);
+  NEXT_PASS (pass_omp_auto_declare_simd);
   NEXT_PASS (pass_lower_omp);
   NEXT_PASS (pass_lower_cf);
   NEXT_PASS (pass_lower_tm);
diff --git a/gcc/target.def b/gcc/target.def
index a3d3b04a165..0e41a3796ec 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1634,7 +1634,7 @@ fields in @var{simd_clone} structure pointed by @var{clone_info} argument and al
 not determined by the bitsize (in which case @var{simdlen} is always used).\n\
 The hook should return 0 if SIMD clones shouldn't be emitted,\n\
 or number of @var{vecsize_mangle} variants that should be emitted.",
-int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL)
+int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int, bool), NULL)
 
 DEFHOOK
 (adjust,
diff --git a/gcc/testsuite/g++.dg/gomp/target-simd-clone-1.C b/gcc/testsuite/g++.dg/gomp/target-simd-clone-1.C
new file mode 100644
index 00000000000..85eaffc5e84
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/target-simd-clone-1.C
@@ -0,0 +1,26 @@
+/* { dg-options "-fopenmp -O2" } */
+
+/* Test that simd clones are generated for functions with "declare target".  */
+
+#pragma omp declare target
+__attribute__ ((__noinline__)) int addit (int a, int b)
+{
+  return a + b;
+}
+#pragma omp end declare target
+
+void callit (int *a, int *b, int *c)
+{
+  int i;
+  #pragma omp for simd
+  for (i = 0; i < 16; i++)
+    c[i] = addit (a[i], b[i]);
+}
+
+/* Although addit has external linkage, we expect clones to be generated as
+   for a function with internal linkage.  */
+
+/* { dg-final { scan-assembler "\\.type\\t_ZGVbN4vv__Z5additii, @function" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-assembler "\\.type\\t_ZGVbM4vv__Z5additii, @function" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-assembler-not "\\.globl\\t_ZGVbN4vv__Z5additii" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-assembler-not "\\.globl\\t_ZGVbM4vv__Z5additii" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/g++.dg/gomp/target-simd-clone-2.C b/gcc/testsuite/g++.dg/gomp/target-simd-clone-2.C
new file mode 100644
index 00000000000..cff5bfdcef3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/target-simd-clone-2.C
@@ -0,0 +1,23 @@
+/* { dg-options "-fopenmp -O2" } */
+
+/* Test that simd clones are not generated for "declare target"
+   functions that throw.  */
+
+#pragma omp declare target
+__attribute__ ((__noinline__)) int addit (int a, int b)
+{
+  if (a < 0) throw -1;
+  return a + b;
+}
+#pragma omp end declare target
+
+void callit (int *a, int *b, int *c)
+{
+  int i;
+  #pragma omp for simd
+  for (i = 0; i < 16; i++)
+    c[i] = addit (a[i], b[i]);
+}
+
+/* { dg-final { scan-assembler-not "\\.type\\t_ZGVbN4vv__Z5additii, @function" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-assembler-not "\\.type\\t_ZGVbM4vv__Z5additii, @function" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
new file mode 100644
index 00000000000..6b5a196169e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
@@ -0,0 +1,26 @@
+/* { dg-options "-fopenmp -O2" } */
+
+/* Test that simd clones are generated for functions with "declare target".  */
+
+#pragma omp declare target
+__attribute__ ((__noinline__)) int addit (int a, int b)
+{
+  return a + b;
+}
+#pragma omp end declare target
+
+void callit (int *a, int *b, int *c)
+{
+  int i;
+  #pragma omp for simd
+  for (i = 0; i < 16; i++)
+    c[i] = addit (a[i], b[i]);
+}
+
+/* Although addit has external linkage, we expect clones to be generated as
+   for a function with internal linkage.  */
+
+/* { dg-final { scan-assembler "\\.type\\t_ZGVbN4vv_addit, @function" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-assembler "\\.type\\t_ZGVbM4vv_addit, @function" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-assembler-not "\\.globl\\t_ZGVbN4vv_addit" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-assembler-not "\\.globl\\t_ZGVbM4vv_addit" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c
new file mode 100644
index 00000000000..4049bb33784
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c
@@ -0,0 +1,26 @@
+/* { dg-options "-fopenmp -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but unsuitable arguments.  */
+
+struct s {
+  int a;
+  int b;
+};
+  
+#pragma omp declare target
+__attribute__ ((__noinline__)) int addit (struct s x)
+{
+  return x.a + x.b;
+}
+#pragma omp end declare target
+
+void callit (struct s *ss, int *c)
+{
+  int i;
+  #pragma omp for simd
+  for (i = 0; i < 16; i++)
+    c[i] = addit (ss[i]);
+}
+
+/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c
new file mode 100644
index 00000000000..a03e4764f8f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c
@@ -0,0 +1,25 @@
+/* { dg-options "-fopenmp -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but that call possibly side-effecting functions 
+   in the body.  */
+
+extern int f (int);
+
+#pragma omp declare target
+__attribute__ ((__noinline__)) int addit(int a, int b)
+{
+  return f(a) + b;
+}
+#pragma omp end declare target
+
+void callit (int *a, int *b, int *c)
+{
+  int i;
+  #pragma omp for simd
+  for (i = 0; i < 16; i++)
+    c[i] = addit (a[i], b[i]);
+}
+
+/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */
+
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c
new file mode 100644
index 00000000000..7161b927d7b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c
@@ -0,0 +1,24 @@
+/* { dg-options "-fopenmp -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but that write memory in the body.  */
+
+extern int save;
+
+#pragma omp declare target
+__attribute__ ((__noinline__)) int addit(int a, int b)
+{
+  save = a;
+  return a + b;
+}
+#pragma omp end declare target
+
+void callit (int *a, int *b, int *c)
+{
+  int i;
+  #pragma omp for simd
+  for (i = 0; i < 16; i++)
+    c[i] = addit (a[i], b[i]);
+}
+
+/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-5.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-5.c
new file mode 100644
index 00000000000..f32ff9cd5d0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-5.c
@@ -0,0 +1,21 @@
+/* { dg-options "-fopenmp -Os" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" at -Os.  */
+
+#pragma omp declare target
+__attribute__ ((__noinline__)) int addit (int a, int b)
+{
+  return a + b;
+}
+#pragma omp end declare target
+
+void callit (int *a, int *b, int *c)
+{
+  int i;
+  #pragma omp for simd
+  for (i = 0; i < 16; i++)
+    c[i] = addit (a[i], b[i]);
+}
+
+/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-6.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-6.c
new file mode 100644
index 00000000000..2a69d2b16e8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-6.c
@@ -0,0 +1,21 @@
+/* { dg-options "-fopenmp -Og" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" at -Og.  */
+
+#pragma omp declare target
+__attribute__ ((__noinline__)) int addit (int a, int b)
+{
+  return a + b;
+}
+#pragma omp end declare target
+
+void callit (int *a, int *b, int *c)
+{
+  int i;
+  #pragma omp for simd
+  for (i = 0; i < 16; i++)
+    c[i] = addit (a[i], b[i]);
+}
+
+/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-7.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-7.c
new file mode 100644
index 00000000000..40627d69857
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-7.c
@@ -0,0 +1,22 @@
+/* { dg-options "-fopenmp -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" when there is no call site likely to use them.  */
+
+#pragma omp declare target
+__attribute__ ((__noinline__)) int addit (int a, int b)
+{
+  return a + b;
+}
+#pragma omp end declare target
+
+void callit (int *a, int *b, int *c)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    /* This call is not in an OMP loop.  */
+    c[i] = addit (a[i], b[i]);
+}
+
+/* { dg-final { scan-assembler-not "\\.type\\t_ZGVbN4vv_addit, @function" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-assembler-not "\\.type\\t_ZGVbM4vv_addit, @function" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-8.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-8.c
new file mode 100644
index 00000000000..4a05ade1d0e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-8.c
@@ -0,0 +1,23 @@
+/* { dg-options "-fopenmp -O2" } */
+
+/* Test that simd clones are not generated for functions with 
+   "declare target" but that read volatile memory in the body.  */
+
+extern volatile int save;
+
+#pragma omp declare target
+__attribute__ ((__noinline__)) int addit(int a, int b)
+{
+  return save + a + b;
+}
+#pragma omp end declare target
+
+void callit (int *a, int *b, int *c)
+{
+  int i;
+  #pragma omp for simd
+  for (i = 0; i < 16; i++)
+    c[i] = addit (a[i], b[i]);
+}
+
+/* { dg-final { scan-assembler-not "_Z.*_addit" { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 4dfe05ed8e0..c6ac85522fa 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -424,6 +424,7 @@ extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_omp_oacc_kernels_decompose (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_omp_auto_declare_simd (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_omp_target_link (gcc::context *ctxt);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] Re: OpenMP: Generate SIMD clones for functions with "declare target"
  2022-10-17  1:23       ` [PATCH v3] " Sandra Loosemore
@ 2022-10-20 14:07         ` Jakub Jelinek
  2022-10-27  2:27           ` Sandra Loosemore
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2022-10-20 14:07 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: Jan Hubicka, gcc-patches, Thomas Schwinge

On Sun, Oct 16, 2022 at 07:23:05PM -0600, Sandra Loosemore wrote:
> My sense is that the first approach would be more straightforward than the
> second one, and I am willing to continue to work on that.  However, I think
> I need some direction to get started, as I presently know nothing about
> cgraph and I was unable to find any useful overview or interface
> documentation in the GCC internals manual.  Is this as simple as inserting
> an existing pass into the passlist to clean up after vectorization, or does
> it involve writing something more or less from scratch?

We (as I've discovered during the work on assumptions) have
TODO_discard_function which when returned from an execute pass throws away
a function completely (except now assumption functions for which it doesn't
release body; this could be done in some pass shortly after IPA, or
alternatively before expansion).  But another thing that needs to be done is for the
non-public declare simd clones (both explicit and implicit from your patch)
to be ordered in cgraph after anything that has a cgraph edge to its
original function.  I don't know how to do that, you should talk to Honza,
Richi or Martin about that.
I think the current behavior is that callees are processed before callers
if possible (unless there are cycles), which is certainly what we want for
say assume functions, or IPA RA etc.  But in case of non-public simd clones
we want to do it the other way around (at the expense of IPA RA), so that
we can throw away functions which aren't needed.

> > I admit I don't remember where exactly the simd clone happens wrt. other
> > IPA passes, but I think it is late pass; so, does it happen for GCN
> > offloading only in the lto1 offloading compiler?
> > Shouldn't the auto optimization be then done only in the offloading
> > lto1 for GCN then (say guard on targetm boolean)?
> 
> I'm afraid I don't know much about offloading, but I was under the
> impression it all goes through the same compilation process, just with a
> different target?

I've looked at it today and it seems late ipa passes are executed after LTO
bytecode is streamed back in.
If you say try:
#pragma omp declare simd
int foo (int x) { return x; }

int
main ()
{
  int a[64] = {};
  #pragma omp target map(a)
  #pragma omp simd
  for (int i = 0; i < 64; i++)
    a[i] = foo (a[i]);
}
with
gcc -foffload-options='-fdump-tree-all -fdump-ipa-all' -fdump-tree-all -fdump-ipa-all -O2 -fopenmp a.c -o a
you ought to see the simdclone dump both as a.c.*i.simdclone and a.x*.mkoffload.*i.simdclone
where the former is what is done for the host code (and host fallback),
while the latter is what is done in the offloading lto.
Can't verify it 100% because I have only nvptx-none offloading configured
and in that case pass_omp_simd_clone::gate is disabled in offloading lto
because targetm.simd_clone.compute_vecsize_and_simdlen is NULL for nvptx.
But it is non-NULL for gcn.

Thus, IMHO it is exactly the pass_omp_simd_clone pass where you want to
implement this auto-simdization discovery, guarded with
#ifdef ACCEL_COMPILER and the new option (which means it will be done
only for gcn and not on the host right now).  And do it at the start of
ipa_omp_simd_clone, before the
  FOR_EACH_FUNCTION (node)
    expand_simd_clones (node);
loop, or, if it is purely local decision for each function, at the
start of expand_simd_clones with similar guarding, punt on functions
with "noclone" attribute, or !node->definition.  You need to repeat the
  if (node->has_gimple_body_p ())
    node->get_body ();
to get body before you analyze it.

And please put the new functions for such analysis into omp-simd-clone.cc
where they belong.

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] Re: OpenMP: Generate SIMD clones for functions with "declare target"
  2022-10-20 14:07         ` Jakub Jelinek
@ 2022-10-27  2:27           ` Sandra Loosemore
  2022-10-27 10:09             ` Thomas Schwinge
  0 siblings, 1 reply; 10+ messages in thread
From: Sandra Loosemore @ 2022-10-27  2:27 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jan Hubicka, gcc-patches, Thomas Schwinge

On 10/20/22 08:07, Jakub Jelinek wrote:
> Thus, IMHO it is exactly the pass_omp_simd_clone pass where you want to
> implement this auto-simdization discovery, guarded with
> #ifdef ACCEL_COMPILER and the new option (which means it will be done
> only for gcn and not on the host right now).

I'm running into a practical difficulty with making this controlled by a 
static #ifdef: namely, testing.

One of my test cases examines the .s output to make sure that the clones 
are emitted as local symbols and not global.  I have not been able to 
find the symbol linkage information in any of the dump files, and I have 
also not been able to figure out how to get a .s file from the offload 
compiler even outside of the DejaGnu test harness.  (It's possible I am 
just an extreme dummy about the latter problem, but so far none of my 
colleagues here has been able to give me a recipe either.)

On top of that, I worry that this should be tested more broadly than for 
the one target we're presently focusing on (AMD GCN), and we'll get much 
more regular test coverage if it's also enabled for x86_64 target which 
has the necessary compute_vecsize_and_simdlen target hook.

I remember Carlos O'Donnell used to have a favorite mantra, "design for 
test".  So, maybe generalize the new -fopenmp-target-simd-clone option 
to take a parameter to force clones to be generated on the OpenMP host 
for test purposes?  The "declare target" directive already has a clause

device_type(host|nohost|any)

that defaults to "any"; maybe we could use that syntax like
-fopenmp-target-simd-clone=any
and use the intersection of the two sets to determine what to 
auto-generate clones for?

-Sandra

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] Re: OpenMP: Generate SIMD clones for functions with "declare target"
  2022-10-27  2:27           ` Sandra Loosemore
@ 2022-10-27 10:09             ` Thomas Schwinge
  2022-10-27 20:40               ` Sandra Loosemore
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Schwinge @ 2022-10-27 10:09 UTC (permalink / raw)
  To: Sandra Loosemore, Jakub Jelinek; +Cc: Jan Hubicka, gcc-patches

Hi!

On 2022-10-26T20:27:19-0600, Sandra Loosemore <sandra@codesourcery.com> wrote:
> On 10/20/22 08:07, Jakub Jelinek wrote:
>> Thus, IMHO it is exactly the pass_omp_simd_clone pass where you want to
>> implement this auto-simdization discovery, guarded with
>> #ifdef ACCEL_COMPILER and the new option (which means it will be done
>> only for gcn and not on the host right now).
>
> I'm running into a practical difficulty with making this controlled by a
> static #ifdef: namely, testing.
>
> One of my test cases examines the .s output to make sure that the clones
> are emitted as local symbols and not global.  I have not been able to
> find the symbol linkage information in any of the dump files

Hmm, also some of '-fdump-ipa-all-details' doesn't help here?

> and I have
> also not been able to figure out how to get a .s file from the offload
> compiler even outside of the DejaGnu test harness.  (It's possible I am
> just an extreme dummy about the latter problem, but so far none of my
> colleagues here has been able to give me a recipe either.)

Right, currently only 'scan-offload-tree-dump[...]',
'scan-offload-rtl-dump[...]' are implemented; I assume
'scan-offload-assembler[...]' could be added without too much effort.

> On top of that, I worry that this should be tested more broadly than for
> the one target we're presently focusing on (AMD GCN), and we'll get much
> more regular test coverage if it's also enabled for x86_64 target which
> has the necessary compute_vecsize_and_simdlen target hook.
>
> I remember Carlos O'Donnell used to have a favorite mantra, "design for
> test".

Heh, I don't remember him ever saying that to me -- but maybe that's
because this is what I do anyway.  ;-P

> So, maybe generalize the new -fopenmp-target-simd-clone option
> to take a parameter to force clones to be generated on the OpenMP host
> for test purposes?  The "declare target" directive already has a clause
>
> device_type(host|nohost|any)
>
> that defaults to "any"; maybe we could use that syntax like
> -fopenmp-target-simd-clone=any
> and use the intersection of the two sets to determine what to
> auto-generate clones for?

Seems reasonable to me (but I'm missing a lot of context here).


There anyway is a goal (far out) to get rid of compilation-time
'#ifdef ACCEL_COMPILER' etc., and instead make such code dependent on a
command-line flag (or some other state), so that it's possible to use the
the same compiler for target (host) as well as offload target compilation.
(For example, to simulate offloading compilation with standard
x86_64-pc-linux-gnu GCC.)


And/or, where you implement the logic to "make sure that the clones
are emitted as local symbols and not global", do emit some "tag" in the
dump file, and the scan for that?

Random examples that I just remembered:

'gcc/omp-offload.cc:execute_oacc_loop_designation' handling of
'OMP_CLAUSE_NOHOST', and how that's scanned (host-side) in test cases
such as 'libgomp/testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c',
'libgomp/testsuite/libgomp.oacc-fortran/routine-nohost-1.f90'.

'gcc/config/nvptx/nvptx.cc:nvptx_find_sese' doing
'fprintf (dump_file, "SESE regions:"); [...]', and that's scanned in:

    libgomp/testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c-/* Match {N->N(.N)+} */
    libgomp/testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c:/* { dg-final { scan-offload-rtl-dump "SESE regions:.* \[0-9\]+{\[0-9\]+->\[0-9\]+(\\.\[0-9\]+)+}" "mach" } } */

(You'd be doing this at the 'scan-offload-tree-dump[...]' level, I
suppose.)


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] Re: OpenMP: Generate SIMD clones for functions with "declare target"
  2022-10-27 10:09             ` Thomas Schwinge
@ 2022-10-27 20:40               ` Sandra Loosemore
  0 siblings, 0 replies; 10+ messages in thread
From: Sandra Loosemore @ 2022-10-27 20:40 UTC (permalink / raw)
  To: Thomas Schwinge, Jakub Jelinek; +Cc: Jan Hubicka, gcc-patches

On 10/27/22 04:09, Thomas Schwinge wrote:
> Hi!
> 
> On 2022-10-26T20:27:19-0600, Sandra Loosemore <sandra@codesourcery.com> wrote:
>> One of my test cases examines the .s output to make sure that the clones
>> are emitted as local symbols and not global.  I have not been able to
>> find the symbol linkage information in any of the dump files
> 
> Hmm, also some of '-fdump-ipa-all-details' doesn't help here?

Maybe I'm not looking at the right dump file, but all I see is names of 
functions in the dumps and nothing about symbol linkage/visibility, even 
with -details.

> And/or, where you implement the logic to "make sure that the clones
> are emitted as local symbols and not global", do emit some "tag" in the
> dump file, and the scan for that?
> 
> Random examples that I just remembered:
> 
> 'gcc/omp-offload.cc:execute_oacc_loop_designation' handling of
> 'OMP_CLAUSE_NOHOST', and how that's scanned (host-side) in test cases
> such as 'libgomp/testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c',
> 'libgomp/testsuite/libgomp.oacc-fortran/routine-nohost-1.f90'.
> 
> 'gcc/config/nvptx/nvptx.cc:nvptx_find_sese' doing
> 'fprintf (dump_file, "SESE regions:"); [...]', and that's scanned in:
> 
>      libgomp/testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c-/* Match {N->N(.N)+} */
>      libgomp/testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c:/* { dg-final { scan-offload-rtl-dump "SESE regions:.* \[0-9\]+{\[0-9\]+->\[0-9\]+(\\.\[0-9\]+)+}" "mach" } } */
> 
> (You'd be doing this at the 'scan-offload-tree-dump[...]' level, I
> suppose.)

I guess customizing the dump output from the simdclone pass with the 
information I need is the easiest solution.  I'm still concerned about 
getting adequate routine test coverage, though, when it's so specialized 
to a particular offload target.

Thanks for the help!  :-)

-Sandra

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-10-27 20:40 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-14 17:32 OpenMP: Generate SIMD clones for functions with "declare target" Sandra Loosemore
2022-09-14 18:12 ` Jakub Jelinek
2022-09-22  3:17   ` [PATCH v2] " Sandra Loosemore
2022-09-30 10:37     ` Jakub Jelinek
2022-10-17  1:23       ` [PATCH v3] " Sandra Loosemore
2022-10-20 14:07         ` Jakub Jelinek
2022-10-27  2:27           ` Sandra Loosemore
2022-10-27 10:09             ` Thomas Schwinge
2022-10-27 20:40               ` Sandra Loosemore
2022-09-14 21:45 ` Thomas Schwinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).