public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/3] OpenMP SIMD routines
@ 2022-08-09 13:23 Andrew Stubbs
  2022-08-09 13:23 ` [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors Andrew Stubbs
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Andrew Stubbs @ 2022-08-09 13:23 UTC (permalink / raw)
  To: gcc-patches

This patch series implements OpenMP "simd" routines for amdgcn, and also
adds support for "simd inbranch" routines for amdgcn, x86_64, and
aarch64 (probably; I can't easily test the aarch64 part).

I can approve patch 2 myself, but it depends on patch 1, so I include it
here for context and completeness.

I first tried "mask_mode = DImode" for amdgcn, but that does not
produce great results: it ends up generating code to turn the
mask into a vector and then back into the exact same mask, so I have
settled on "mask_mode = VOIDmode" for now (in fact that uses fewer
argument registers in many cases, so maybe it's better anyway).
Additionally, I found that the x86_64 truth vectors cannot always be
converted to the mask types specified by the backend, so I have pulled
that code out completely.

Therefore, this patch series includes only "mask_mode == VOIDmode" support,
but it is still a step towards full SIMD clone support.

I have not included dump-scans in the testcases for aarch64, but the
testcases will still test correctness.  The aarch64 maintainers can very
easily add those scans if they choose.  No other architecture has
backend support for the clones at this time.

OK for mainline (patches 1 & 3)?

Thanks

Andrew

Andrew Stubbs (3):
  omp-simd-clone: Allow fixed-lane vectors
  amdgcn: OpenMP SIMD routine support
  vect: inbranch SIMD clones

 gcc/config/gcn/gcn.cc                         |  63 ++++++++
 gcc/doc/tm.texi                               |   3 +
 gcc/omp-simd-clone.cc                         |  21 ++-
 gcc/target.def                                |   3 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c |   2 +
 .../gcc.dg/vect/vect-simd-clone-16.c          |  89 ++++++++++++
 .../gcc.dg/vect/vect-simd-clone-16b.c         |  14 ++
 .../gcc.dg/vect/vect-simd-clone-16c.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-16d.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-16e.c         |  14 ++
 .../gcc.dg/vect/vect-simd-clone-16f.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17.c          |  89 ++++++++++++
 .../gcc.dg/vect/vect-simd-clone-17b.c         |  14 ++
 .../gcc.dg/vect/vect-simd-clone-17c.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17d.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17e.c         |  14 ++
 .../gcc.dg/vect/vect-simd-clone-17f.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18.c          |  89 ++++++++++++
 .../gcc.dg/vect/vect-simd-clone-18b.c         |  14 ++
 .../gcc.dg/vect/vect-simd-clone-18c.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18d.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18e.c         |  14 ++
 .../gcc.dg/vect/vect-simd-clone-18f.c         |  16 +++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c |   2 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c |   1 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c |   1 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c |   1 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c |   2 +
 gcc/tree-if-conv.cc                           |  39 ++++-
 gcc/tree-vect-stmts.cc                        | 134 ++++++++++++++----
 30 files changed, 734 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c

-- 
2.37.0



* [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors
  2022-08-09 13:23 [PATCH 0/3] OpenMP SIMD routines Andrew Stubbs
@ 2022-08-09 13:23 ` Andrew Stubbs
  2022-08-26 11:04   ` Jakub Jelinek
  2022-08-09 13:23 ` [PATCH 2/3] amdgcn: OpenMP SIMD routine support Andrew Stubbs
  2022-08-09 13:23 ` [PATCH 3/3] vect: inbranch SIMD clones Andrew Stubbs
  2 siblings, 1 reply; 21+ messages in thread
From: Andrew Stubbs @ 2022-08-09 13:23 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 813 bytes --]


The vecsize_int/vecsize_float fields carry an assumption that all arguments
will use the same vector bitsize, with the number of lanes varying according
to the element size, but this is inappropriate on targets where the number of
lanes is fixed and the bitsize varies (e.g. amdgcn).

With this change, the vecsize fields can be left zero and the vectorization
factor will be the same for all types.

gcc/ChangeLog:

	* doc/tm.texi: Regenerate.
	* omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero
	vecsize.
	(simd_clone_adjust_argument_types): Likewise.
	* target.def (compute_vecsize_and_simdlen): Document the new
	vecsize_int and vecsize_float semantics.
---
 gcc/doc/tm.texi       |  3 +++
 gcc/omp-simd-clone.cc | 20 +++++++++++++++-----
 gcc/target.def        |  3 +++
 3 files changed, 21 insertions(+), 5 deletions(-)


[-- Attachment #2: 0001-omp-simd-clone-Allow-fixed-lane-vectors.patch --]
[-- Type: text/x-patch; name="0001-omp-simd-clone-Allow-fixed-lane-vectors.patch", Size: 3278 bytes --]

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 92bda1a7e14..c3001c6ded9 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6253,6 +6253,9 @@ stores.
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
 @var{simdlen} field if it was previously 0.
+@var{vecsize_mangle} is a marker for the backend only. @var{vecsize_int} and
+@var{vecsize_float} should be left zero on targets where the number of lanes is
+not determined by the bitsize (in which case @var{simdlen} is always used).
 The hook should return 0 if SIMD clones shouldn't be emitted,
 or number of @var{vecsize_mangle} variants that should be emitted.
 @end deftypefn
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 58bd68b129b..258d3c6377f 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
     veclen = node->simdclone->vecsize_int;
   else
     veclen = node->simdclone->vecsize_float;
-  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
+  if (known_eq (veclen, 0))
+    veclen = node->simdclone->simdlen;
+  else
+    veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
   if (multiple_p (veclen, node->simdclone->simdlen))
     veclen = node->simdclone->simdlen;
   if (POINTER_TYPE_P (t))
@@ -618,8 +621,12 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	    veclen = sc->vecsize_int;
 	  else
 	    veclen = sc->vecsize_float;
-	  veclen = exact_div (veclen,
-			      GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
+	  if (known_eq (veclen, 0))
+	    veclen = sc->simdlen;
+	  else
+	    veclen = exact_div (veclen,
+				GET_MODE_BITSIZE
+				(SCALAR_TYPE_MODE (parm_type)));
 	  if (multiple_p (veclen, sc->simdlen))
 	    veclen = sc->simdlen;
 	  adj.op = IPA_PARAM_OP_NEW;
@@ -669,8 +676,11 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	veclen = sc->vecsize_int;
       else
 	veclen = sc->vecsize_float;
-      veclen = exact_div (veclen,
-			  GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
+      if (known_eq (veclen, 0))
+	veclen = sc->simdlen;
+      else
+	veclen = exact_div (veclen,
+			    GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
       if (multiple_p (veclen, sc->simdlen))
 	veclen = sc->simdlen;
       if (sc->mask_mode != VOIDmode)
diff --git a/gcc/target.def b/gcc/target.def
index 2a7fa68f83d..4d49ffc2c88 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1629,6 +1629,9 @@ DEFHOOK
 "This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}\n\
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also\n\
 @var{simdlen} field if it was previously 0.\n\
+@var{vecsize_mangle} is a marker for the backend only. @var{vecsize_int} and\n\
+@var{vecsize_float} should be left zero on targets where the number of lanes is\n\
+not determined by the bitsize (in which case @var{simdlen} is always used).\n\
 The hook should return 0 if SIMD clones shouldn't be emitted,\n\
 or number of @var{vecsize_mangle} variants that should be emitted.",
 int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL)


* [PATCH 2/3] amdgcn: OpenMP SIMD routine support
  2022-08-09 13:23 [PATCH 0/3] OpenMP SIMD routines Andrew Stubbs
  2022-08-09 13:23 ` [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors Andrew Stubbs
@ 2022-08-09 13:23 ` Andrew Stubbs
  2022-08-30 14:53   ` Andrew Stubbs
  2022-08-09 13:23 ` [PATCH 3/3] vect: inbranch SIMD clones Andrew Stubbs
  2 siblings, 1 reply; 21+ messages in thread
From: Andrew Stubbs @ 2022-08-09 13:23 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1301 bytes --]


Enable and configure SIMD clones for amdgcn.  This affects both the __simd__
function attribute and the OpenMP "declare simd" directive.

Note that the masked SIMD variants are generated, but the middle end doesn't
actually support calling them yet.

gcc/ChangeLog:

	* config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): New.
	(gcn_simd_clone_adjust): New.
	(gcn_simd_clone_usable): New.
	(TARGET_SIMD_CLONE_ADJUST): New.
	(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New.
	(TARGET_SIMD_CLONE_USABLE): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-simd-clone-1.c: Add dg-warning.
	* gcc.dg/vect/vect-simd-clone-2.c: Add dg-warning.
	* gcc.dg/vect/vect-simd-clone-3.c: Add dg-warning.
	* gcc.dg/vect/vect-simd-clone-4.c: Add dg-warning.
	* gcc.dg/vect/vect-simd-clone-5.c: Add dg-warning.
	* gcc.dg/vect/vect-simd-clone-8.c: Add dg-warning.
---
 gcc/config/gcn/gcn.cc                         | 63 +++++++++++++++++++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c |  2 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c |  2 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c |  1 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c |  1 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c |  1 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c |  2 +
 7 files changed, 72 insertions(+)


[-- Attachment #2: 0002-amdgcn-OpenMP-SIMD-routine-support.patch --]
[-- Type: text/x-patch; name="0002-amdgcn-OpenMP-SIMD-routine-support.patch", Size: 5387 bytes --]

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 96295e23aad..ceb69000807 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -52,6 +52,7 @@
 #include "rtl-iter.h"
 #include "dwarf2.h"
 #include "gimple.h"
+#include "cgraph.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -4555,6 +4556,61 @@ gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED (type_of_cost),
   return 1;
 }
 
+/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN.  */
+
+static int
+gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node),
+					    struct cgraph_simd_clone *clonei,
+					    tree base_type,
+					    int ARG_UNUSED (num))
+{
+  unsigned int elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
+
+  if (known_eq (clonei->simdlen, 0U))
+    clonei->simdlen = 64;
+  else if (maybe_ne (clonei->simdlen, 64U))
+    {
+      /* Note that x86 has a similar message that is likely to trigger on
+	 sizes that are OK for gcn; the user can't win.  */
+      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		  "unsupported simdlen %wd (amdgcn)",
+		  clonei->simdlen.to_constant ());
+      return 0;
+    }
+
+  clonei->vecsize_mangle = 'n';
+  clonei->vecsize_int = 0;
+  clonei->vecsize_float = 0;
+
+  /* DImode ought to be more natural here, but VOIDmode produces better code,
+     at present, due to the shift-and-test steps not being optimized away
+     inside the in-branch clones.  */
+  clonei->mask_mode = VOIDmode;
+
+  return 1;
+}
+
+/* Implement TARGET_SIMD_CLONE_ADJUST.  */
+
+static void
+gcn_simd_clone_adjust (struct cgraph_node *ARG_UNUSED (node))
+{
+  /* This hook has to be defined when
+     TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN is defined, but we don't
+     need it to do anything yet.  */
+}
+
+/* Implement TARGET_SIMD_CLONE_USABLE.  */
+
+static int
+gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
+{
+  /* We don't need to do anything here because
+     gcn_simd_clone_compute_vecsize_and_simdlen currently only returns one
+     possibility.  */
+  return 0;
+}
+
 /* }}}  */
 /* {{{ md_reorg pass.  */
 
@@ -6643,6 +6699,13 @@ gcn_dwarf_register_span (rtx rtl)
 #define TARGET_SECTION_TYPE_FLAGS gcn_section_type_flags
 #undef  TARGET_SCALAR_MODE_SUPPORTED_P
 #define TARGET_SCALAR_MODE_SUPPORTED_P gcn_scalar_mode_supported_p
+#undef  TARGET_SIMD_CLONE_ADJUST
+#define TARGET_SIMD_CLONE_ADJUST gcn_simd_clone_adjust
+#undef  TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
+#define TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN \
+  gcn_simd_clone_compute_vecsize_and_simdlen
+#undef  TARGET_SIMD_CLONE_USABLE
+#define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable
 #undef  TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \
   gcn_small_register_classes_for_mode_p
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
index 50429049500..cd65fc343f1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
@@ -56,3 +56,5 @@ main ()
   return 0;
 }
 
+/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */
+/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
index f89c73a961b..ffcbf9380d6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
@@ -50,3 +50,5 @@ main ()
   return 0;
 }
 
+/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */
+/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c
index 75ce696ed66..18d68779cc5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c
@@ -43,3 +43,4 @@ main ()
   return 0;
 }
 
+/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 15 } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
index debbe77b79d..e9af0b83162 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
@@ -46,3 +46,4 @@ main ()
   return 0;
 }
 
+/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 17 } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
index 6a098d9a51a..46da496524d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
@@ -41,3 +41,4 @@ main ()
   return 0;
 }
 
+/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 15 } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
index 1bfd19dc8ab..f414285a170 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
@@ -92,3 +92,5 @@ main ()
   return 0;
 }
 
+/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 17 } */
+/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 24 } */


* [PATCH 3/3] vect: inbranch SIMD clones
  2022-08-09 13:23 [PATCH 0/3] OpenMP SIMD routines Andrew Stubbs
  2022-08-09 13:23 ` [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors Andrew Stubbs
  2022-08-09 13:23 ` [PATCH 2/3] amdgcn: OpenMP SIMD routine support Andrew Stubbs
@ 2022-08-09 13:23 ` Andrew Stubbs
  2022-09-09 14:31   ` Jakub Jelinek
  2 siblings, 1 reply; 21+ messages in thread
From: Andrew Stubbs @ 2022-08-09 13:23 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4229 bytes --]


There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).

This patch adds support for a subset of the possible cases (those using
mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
so there should be no regressions.

The supported subset should cover all cases needed by amdgcn, at present.

gcc/ChangeLog:

	* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
	for mask arguments also.
	* tree-if-conv.cc: Include cgraph.h.
	(if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
	(predicate_statements): Pass the predicate to SIMD functions.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls
	to clones with mask arguments, in some cases.
	Generate the mask vector arguments.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-simd-clone-16.c: New test.
	* gcc.dg/vect/vect-simd-clone-16b.c: New test.
	* gcc.dg/vect/vect-simd-clone-16c.c: New test.
	* gcc.dg/vect/vect-simd-clone-16d.c: New test.
	* gcc.dg/vect/vect-simd-clone-16e.c: New test.
	* gcc.dg/vect/vect-simd-clone-16f.c: New test.
	* gcc.dg/vect/vect-simd-clone-17.c: New test.
	* gcc.dg/vect/vect-simd-clone-17b.c: New test.
	* gcc.dg/vect/vect-simd-clone-17c.c: New test.
	* gcc.dg/vect/vect-simd-clone-17d.c: New test.
	* gcc.dg/vect/vect-simd-clone-17e.c: New test.
	* gcc.dg/vect/vect-simd-clone-17f.c: New test.
	* gcc.dg/vect/vect-simd-clone-18.c: New test.
	* gcc.dg/vect/vect-simd-clone-18b.c: New test.
	* gcc.dg/vect/vect-simd-clone-18c.c: New test.
	* gcc.dg/vect/vect-simd-clone-18d.c: New test.
	* gcc.dg/vect/vect-simd-clone-18e.c: New test.
	* gcc.dg/vect/vect-simd-clone-18f.c: New test.
---
 gcc/omp-simd-clone.cc                         |   1 +
 .../gcc.dg/vect/vect-simd-clone-16.c          |  89 ++++++++++++
 .../gcc.dg/vect/vect-simd-clone-16b.c         |  14 ++
 .../gcc.dg/vect/vect-simd-clone-16c.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-16d.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-16e.c         |  14 ++
 .../gcc.dg/vect/vect-simd-clone-16f.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17.c          |  89 ++++++++++++
 .../gcc.dg/vect/vect-simd-clone-17b.c         |  14 ++
 .../gcc.dg/vect/vect-simd-clone-17c.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17d.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17e.c         |  14 ++
 .../gcc.dg/vect/vect-simd-clone-17f.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18.c          |  89 ++++++++++++
 .../gcc.dg/vect/vect-simd-clone-18b.c         |  14 ++
 .../gcc.dg/vect/vect-simd-clone-18c.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18d.c         |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18e.c         |  14 ++
 .../gcc.dg/vect/vect-simd-clone-18f.c         |  16 +++
 gcc/tree-if-conv.cc                           |  39 ++++-
 gcc/tree-vect-stmts.cc                        | 134 ++++++++++++++----
 21 files changed, 641 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c


[-- Attachment #2: 0003-vect-inbranch-SIMD-clones.patch --]
[-- Type: text/x-patch; name="0003-vect-inbranch-SIMD-clones.patch", Size: 30721 bytes --]

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 258d3c6377f..58e3dc8b2e9 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -716,6 +716,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	}
       sc->args[i].orig_type = base_type;
       sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
+      sc->args[i].vector_type = adj.type;
     }
 
   if (node->definition)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
new file mode 100644
index 00000000000..ffaabb30d1e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd
+TYPE __attribute__((noinline))
+foo (TYPE a)
+{
+  return a + 1;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noinline))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+    b[i] = a[i]<1 ? foo(a[i]) : a[i];
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noinline))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+    b[i] = a[i]<1 ? foo(a[i]) : a[i];
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+    if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1))
+	|| ((TYPE)i >= 1 && b[i] != (TYPE)i))
+      {
+	__builtin_printf ("error at %d\n", i);
+	__builtin_exit (1);
+      }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+    a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+     factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size);
+      check_masked (b, size);
+    }
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size-4);
+      check_masked (b, size-4);
+    }
+
+  return 0;
+}
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
new file mode 100644
index 00000000000..a503ef85238
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE float
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
new file mode 100644
index 00000000000..6563879df71
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE short
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=short.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
new file mode 100644
index 00000000000..6c5e69482e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE char
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=char.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
new file mode 100644
index 00000000000..6690844deae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE double
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
new file mode 100644
index 00000000000..e7b35a6a2dc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE __INT64_TYPE__
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=int64.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
new file mode 100644
index 00000000000..6f5d374a417
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd uniform(b)
+TYPE __attribute__((noinline))
+foo (TYPE a, TYPE b)
+{
+  return a + b;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noinline))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+    b[i] = a[i]<1 ? foo(a[i], 1) : a[i];
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noinline))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+    b[i] = a[i]<1 ? foo(a[i], 1) : a[i];
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+    if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1))
+	|| ((TYPE)i >= 1 && b[i] != (TYPE)i))
+      {
+	__builtin_printf ("error at %d\n", i);
+	__builtin_exit (1);
+      }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+    a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+     factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size);
+      check_masked (b, size);
+    }
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size-4);
+      check_masked (b, size-4);
+    }
+
+  return 0;
+}
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
new file mode 100644
index 00000000000..1e2c3ab11b3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE float
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
new file mode 100644
index 00000000000..007001de669
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE short
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=short.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
new file mode 100644
index 00000000000..abb85a4ceee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE char
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=char.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
new file mode 100644
index 00000000000..2c1d8a659bd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE double
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
new file mode 100644
index 00000000000..582e690304f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE __INT64_TYPE__
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=int64.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
new file mode 100644
index 00000000000..750a3f92b62
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd uniform(b)
+TYPE __attribute__((noinline))
+foo (TYPE b, TYPE a)
+{
+  return a + b;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noinline))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+    b[i] = a[i]<1 ? foo(1, a[i]) : a[i];
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noinline))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+    b[i] = a[i]<1 ? foo(1, a[i]) : a[i];
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+    if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1))
+	|| ((TYPE)i >= 1 && b[i] != (TYPE)i))
+      {
+	__builtin_printf ("error at %d\n", i);
+	__builtin_exit (1);
+      }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+    a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+     factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size);
+      check_masked (b, size);
+    }
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size-4);
+      check_masked (b, size-4);
+    }
+
+  return 0;
+}
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
new file mode 100644
index 00000000000..a77ccf3bfcc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE float
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
new file mode 100644
index 00000000000..bee5f338abe
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE short
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=short.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
new file mode 100644
index 00000000000..a749edefdd7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE char
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=char.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
new file mode 100644
index 00000000000..061e0dc2621
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE double
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
new file mode 100644
index 00000000000..a3037f5809a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE __INT64_TYPE__
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=int64.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234..82b21add802 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -122,6 +122,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dse.h"
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
+#include "cgraph.h"
 
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
@@ -1054,7 +1055,8 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
    - it is a GIMPLE_LABEL or a GIMPLE_COND,
-   - it is builtins call.  */
+   - it is a builtin call,
+   - it is a call to a function with a SIMD clone.  */
 
 static bool
 if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
@@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
 	tree fndecl = gimple_call_fndecl (stmt);
 	if (fndecl)
 	  {
+	    /* We can vectorize some builtins and functions with SIMD
+	       clones.  */
 	    int flags = gimple_call_flags (stmt);
+	    struct cgraph_node *node = cgraph_node::get (fndecl);
 	    if ((flags & ECF_CONST)
 		&& !(flags & ECF_LOOPING_CONST_OR_PURE)
-		/* We can only vectorize some builtins at the moment,
-		   so restrict if-conversion to those.  */
 		&& fndecl_built_in_p (fndecl))
 	      return true;
+	    else if (node && node->simd_clones != NULL)
+	      {
+		need_to_predicate = true;
+		return true;
+	      }
 	  }
 	return false;
       }
@@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop)
 	      gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
 	      update_stmt (stmt);
 	    }
+
+	  /* Add a predicate parameter to functions that have a SIMD clone.
+	     This will cause the vectorizer to match the "in branch" clone
+	     variants because they also have the extra parameter, and serves
+	     to build the mask vector in a natural way.  */
+	  gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
+	  if (call && !gimple_call_internal_p (call))
+	    {
+	      tree orig_fndecl = gimple_call_fndecl (call);
+	      int orig_nargs = gimple_call_num_args (call);
+	      auto_vec<tree> args;
+	      for (int i = 0; i < orig_nargs; i++)
+		args.safe_push (gimple_call_arg (call, i));
+	      args.safe_push (cond);
+
+	      /* Replace the call with a new one that has the extra
+		 parameter.  The FUNCTION_DECL remains unchanged so that
+		 the vectorizer can find the SIMD clones.  This call will
+		 either be deleted or replaced at that time, so the
+		 mismatch is short-lived and we can live with it.  */
+	      gcall *new_call = gimple_build_call_vec (orig_fndecl, args);
+	      gimple_call_set_lhs (new_call, gimple_call_lhs (call));
+	      gsi_replace (&gsi, new_call, true);
+	    }
+
 	  lhs = gimple_get_lhs (gsi_stmt (gsi));
 	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
 	    ssa_names.add (lhs);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f582d238984..2214d216c15 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4049,16 +4049,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	  || thisarginfo.dt == vect_external_def)
 	gcc_assert (thisarginfo.vectype == NULL_TREE);
       else
-	{
-	  gcc_assert (thisarginfo.vectype != NULL_TREE);
-	  if (VECTOR_BOOLEAN_TYPE_P (thisarginfo.vectype))
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "vector mask arguments are not supported\n");
-	      return false;
-	    }
-	}
+	gcc_assert (thisarginfo.vectype != NULL_TREE);
 
       /* For linear arguments, the analyze phase should have saved
 	 the base and step in STMT_VINFO_SIMD_CLONE_INFO.  */
@@ -4151,9 +4142,6 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	if (target_badness < 0)
 	  continue;
 	this_badness += target_badness * 512;
-	/* FORNOW: Have to add code to add the mask argument.  */
-	if (n->simdclone->inbranch)
-	  continue;
 	for (i = 0; i < nargs; i++)
 	  {
 	    switch (n->simdclone->args[i].arg_type)
@@ -4191,7 +4179,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		i = -1;
 		break;
 	      case SIMD_CLONE_ARG_TYPE_MASK:
-		gcc_unreachable ();
+		break;
 	      }
 	    if (i == (size_t) -1)
 	      break;
@@ -4217,18 +4205,55 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     return false;
 
   for (i = 0; i < nargs; i++)
-    if ((arginfo[i].dt == vect_constant_def
-	 || arginfo[i].dt == vect_external_def)
-	&& bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
-      {
-	tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i));
-	arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
-							  slp_node);
-	if (arginfo[i].vectype == NULL
-	    || !constant_multiple_p (bestn->simdclone->simdlen,
-				     simd_clone_subparts (arginfo[i].vectype)))
+    {
+      if ((arginfo[i].dt == vect_constant_def
+	   || arginfo[i].dt == vect_external_def)
+	  && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
+	{
+	  tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i));
+	  arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
+							    slp_node);
+	  if (arginfo[i].vectype == NULL
+	      || !constant_multiple_p (bestn->simdclone->simdlen,
+				       simd_clone_subparts (arginfo[i].vectype)))
+	    return false;
+	}
+
+      if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR
+	  && VECTOR_BOOLEAN_TYPE_P (bestn->simdclone->args[i].vector_type))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "vector mask arguments are not supported.\n");
 	  return false;
-      }
+	}
+
+      if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK
+	  && bestn->simdclone->mask_mode == VOIDmode
+	  && (simd_clone_subparts (bestn->simdclone->args[i].vector_type)
+	      != simd_clone_subparts (arginfo[i].vectype)))
+	{
+	  /* FORNOW we only have partial support for vector-type masks that
+	     can't hold all of simdlen.  */
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+			     vect_location,
+			     "in-branch vector clones are not yet"
+			     " supported for mismatched vector sizes.\n");
+	  return false;
+	}
+      if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK
+	  && bestn->simdclone->mask_mode != VOIDmode)
+	{
+	  /* FORNOW don't support integer-type masks.  */
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+			     vect_location,
+			     "in-branch vector clones are not yet"
+			     " supported for integer mask modes.\n");
+	  return false;
+	}
+    }
 
   fndecl = bestn->decl;
   nunits = bestn->simdclone->simdlen;
@@ -4417,6 +4442,65 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		    }
 		}
 	      break;
+	    case SIMD_CLONE_ARG_TYPE_MASK:
+	      atype = bestn->simdclone->args[i].vector_type;
+	      if (bestn->simdclone->mask_mode != VOIDmode)
+		{
+		  /* FORNOW: this is disabled above.  */
+		  gcc_unreachable ();
+		}
+	      else
+		{
+		  tree elt_type = TREE_TYPE (atype);
+		  tree one = fold_convert (elt_type, integer_one_node);
+		  tree zero = fold_convert (elt_type, integer_zero_node);
+		  o = vector_unroll_factor (nunits,
+					    simd_clone_subparts (atype));
+		  for (m = j * o; m < (j + 1) * o; m++)
+		    {
+		      if (simd_clone_subparts (atype)
+			  < simd_clone_subparts (arginfo[i].vectype))
+			{
+			  /* The mask type has fewer elements than simdlen.  */
+
+			  /* FORNOW */
+			  gcc_unreachable ();
+			}
+		      else if (simd_clone_subparts (atype)
+			       == simd_clone_subparts (arginfo[i].vectype))
+			{
+			  /* The SIMD clone function has the same number of
+			     elements as the current function.  */
+			  if (m == 0)
+			    {
+			      vect_get_vec_defs_for_operand (vinfo, stmt_info,
+							     o * ncopies,
+							     op,
+							     &vec_oprnds[i]);
+			      vec_oprnds_i[i] = 0;
+			    }
+			  vec_oprnd0 = vec_oprnds[i][vec_oprnds_i[i]++];
+			  vec_oprnd0
+			    = build3 (VEC_COND_EXPR, atype, vec_oprnd0,
+				      build_vector_from_val (atype, one),
+				      build_vector_from_val (atype, zero));
+			  gassign *new_stmt
+			    = gimple_build_assign (make_ssa_name (atype),
+						   vec_oprnd0);
+			  vect_finish_stmt_generation (vinfo, stmt_info,
+						       new_stmt, gsi);
+			  vargs.safe_push (gimple_assign_lhs (new_stmt));
+			}
+		      else
+			{
+			  /* The mask type has more elements than simdlen.  */
+
+			  /* FORNOW */
+			  gcc_unreachable ();
+			}
+		    }
+		}
+	      break;
 	    case SIMD_CLONE_ARG_TYPE_UNIFORM:
 	      vargs.safe_push (op);
 	      break;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors
  2022-08-09 13:23 ` [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors Andrew Stubbs
@ 2022-08-26 11:04   ` Jakub Jelinek
  2022-08-30 14:52     ` Andrew Stubbs
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Jelinek @ 2022-08-26 11:04 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc-patches

On Tue, Aug 09, 2022 at 02:23:48PM +0100, Andrew Stubbs wrote:
> 
> The vecsize_int/vecsize_float fields carry an assumption that all arguments
> will use the same bitsize, varying the number of lanes according to the
> element size, but this is inappropriate on targets where the number of lanes
> is fixed and the bitsize varies (e.g. amdgcn).
> 
> With this change the vecsize can be left zero and the vectorization factor will
> be the same for all types.
> 
> gcc/ChangeLog:
> 
> 	* doc/tm.texi: Regenerate.
> 	* omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero
> 	vecsize.
> 	(simd_clone_adjust_argument_types): Likewise.
> 	* target.def (compute_vecsize_and_simdlen): Document the new
> 	vecsize_int and vecsize_float semantics.

LGTM, except for a formatting nit.

> @@ -618,8 +621,12 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>  	    veclen = sc->vecsize_int;
>  	  else
>  	    veclen = sc->vecsize_float;
> -	  veclen = exact_div (veclen,
> -			      GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
> +	  if (known_eq (veclen, 0))
> +	    veclen = sc->simdlen;
> +	  else
> +	    veclen = exact_div (veclen,
> +				GET_MODE_BITSIZE
> +				(SCALAR_TYPE_MODE (parm_type)));

Macro name on one line and ( on another is too ugly, can you please use:
	    veclen
	      = exact_div (veclen,
			   GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
or:
	    {
	      scalar_mode m = SCALAR_TYPE_MODE (parm_type);
	      veclen = exact_div (veclen, GET_MODE_BITSIZE (m));
	    }
?

	Jakub


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors
  2022-08-26 11:04   ` Jakub Jelinek
@ 2022-08-30 14:52     ` Andrew Stubbs
  2022-08-30 16:54       ` Rainer Orth
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Stubbs @ 2022-08-30 14:52 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 406 bytes --]

On 26/08/2022 12:04, Jakub Jelinek wrote:
>> gcc/ChangeLog:
>>
>> 	* doc/tm.texi: Regenerate.
>> 	* omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero
>> 	vecsize.
>> 	(simd_clone_adjust_argument_types): Likewise.
>> 	* target.def (compute_vecsize_and_simdlen): Document the new
>> 	vecsize_int and vecsize_float semantics.
> 
> LGTM, except for a formatting nit.

Here's what I pushed.

Andrew

[-- Attachment #2: 220830-allow-fixed-lane-vectors.patch --]
[-- Type: text/plain, Size: 3969 bytes --]

omp-simd-clone: Allow fixed-lane vectors

The vecsize_int/vecsize_float fields carry an assumption that all arguments
will use the same bitsize, varying the number of lanes according to the
element size, but this is inappropriate on targets where the number of lanes
is fixed and the bitsize varies (e.g. amdgcn).

With this change the vecsize can be left zero and the vectorization factor will
be the same for all types.

gcc/ChangeLog:

	* doc/tm.texi: Regenerate.
	* omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero
	vecsize.
	(simd_clone_adjust_argument_types): Likewise.
	* target.def (compute_vecsize_and_simdlen): Document the new
	vecsize_int and vecsize_float semantics.

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 92bda1a7e14..c3001c6ded9 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6253,6 +6253,9 @@ stores.
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
 @var{simdlen} field if it was previously 0.
+@var{vecsize_mangle} is a marker for the backend only. @var{vecsize_int} and
+@var{vecsize_float} should be left zero on targets where the number of lanes is
+not determined by the bitsize (in which case @var{simdlen} is always used).
 The hook should return 0 if SIMD clones shouldn't be emitted,
 or number of @var{vecsize_mangle} variants that should be emitted.
 @end deftypefn
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 58bd68b129b..68ee4c2c3b0 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
     veclen = node->simdclone->vecsize_int;
   else
     veclen = node->simdclone->vecsize_float;
-  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
+  if (known_eq (veclen, 0))
+    veclen = node->simdclone->simdlen;
+  else
+    veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
   if (multiple_p (veclen, node->simdclone->simdlen))
     veclen = node->simdclone->simdlen;
   if (POINTER_TYPE_P (t))
@@ -618,8 +621,12 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	    veclen = sc->vecsize_int;
 	  else
 	    veclen = sc->vecsize_float;
-	  veclen = exact_div (veclen,
-			      GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
+	  if (known_eq (veclen, 0))
+	    veclen = sc->simdlen;
+	  else
+	    veclen
+	      = exact_div (veclen,
+			   GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
 	  if (multiple_p (veclen, sc->simdlen))
 	    veclen = sc->simdlen;
 	  adj.op = IPA_PARAM_OP_NEW;
@@ -669,8 +676,11 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	veclen = sc->vecsize_int;
       else
 	veclen = sc->vecsize_float;
-      veclen = exact_div (veclen,
-			  GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
+      if (known_eq (veclen, 0))
+	veclen = sc->simdlen;
+      else
+	veclen = exact_div (veclen,
+			    GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
       if (multiple_p (veclen, sc->simdlen))
 	veclen = sc->simdlen;
       if (sc->mask_mode != VOIDmode)
diff --git a/gcc/target.def b/gcc/target.def
index 2a7fa68f83d..4d49ffc2c88 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1629,6 +1629,9 @@ DEFHOOK
 "This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}\n\
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also\n\
 @var{simdlen} field if it was previously 0.\n\
+@var{vecsize_mangle} is a marker for the backend only. @var{vecsize_int} and\n\
+@var{vecsize_float} should be left zero on targets where the number of lanes is\n\
+not determined by the bitsize (in which case @var{simdlen} is always used).\n\
 The hook should return 0 if SIMD clones shouldn't be emitted,\n\
 or number of @var{vecsize_mangle} variants that should be emitted.",
 int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] amdgcn: OpenMP SIMD routine support
  2022-08-09 13:23 ` [PATCH 2/3] amdgcn: OpenMP SIMD routine support Andrew Stubbs
@ 2022-08-30 14:53   ` Andrew Stubbs
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Stubbs @ 2022-08-30 14:53 UTC (permalink / raw)
  To: gcc-patches

On 09/08/2022 14:23, Andrew Stubbs wrote:
> 
> Enable and configure SIMD clones for amdgcn.  This affects both the __simd__
> function attribute and the OpenMP "declare simd" directive.
> 
> Note that the masked SIMD variants are generated, but the middle end doesn't
> actually support calling them yet.
> 
> gcc/ChangeLog:
> 
> 	* config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): New.
> 	(gcn_simd_clone_adjust): New.
> 	(gcn_simd_clone_usable): New.
> 	(TARGET_SIMD_CLONE_ADJUST): New.
> 	(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New.
> 	(TARGET_SIMD_CLONE_USABLE): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-simd-clone-1.c: Add dg-warning.
> 	* gcc.dg/vect/vect-simd-clone-2.c: Add dg-warning.
> 	* gcc.dg/vect/vect-simd-clone-3.c: Add dg-warning.
> 	* gcc.dg/vect/vect-simd-clone-4.c: Add dg-warning.
> 	* gcc.dg/vect/vect-simd-clone-5.c: Add dg-warning.
> 	* gcc.dg/vect/vect-simd-clone-8.c: Add dg-warning.

The dependency was approved, so this is now committed.

Andrew


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors
  2022-08-30 14:52     ` Andrew Stubbs
@ 2022-08-30 16:54       ` Rainer Orth
  2022-08-31  7:11         ` Martin Liška
  2022-08-31  8:29         ` Jakub Jelinek
  0 siblings, 2 replies; 21+ messages in thread
From: Rainer Orth @ 2022-08-30 16:54 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Jakub Jelinek, gcc-patches

Hi Andrew,

> On 26/08/2022 12:04, Jakub Jelinek wrote:
>>> gcc/ChangeLog:
>>>
>>> 	* doc/tm.texi: Regenerate.
>>> 	* omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero
>>> 	vecsize.
>>> 	(simd_clone_adjust_argument_types): Likewise.
>>> 	* target.def (compute_vecsize_and_simdlen): Document the new
>>> 	vecsize_int and vecsize_float semantics.
>> LGTM, except for a formatting nit.
> diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
> index 58bd68b129b..68ee4c2c3b0 100644
> --- a/gcc/omp-simd-clone.cc
> +++ b/gcc/omp-simd-clone.cc
> @@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
>      veclen = node->simdclone->vecsize_int;
>    else
>      veclen = node->simdclone->vecsize_float;
> -  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
> +  if (known_eq (veclen, 0))
> +    veclen = node->simdclone->simdlen;
> +  else
> +    veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
>    if (multiple_p (veclen, node->simdclone->simdlen))
>      veclen = node->simdclone->simdlen;
>    if (POINTER_TYPE_P (t))

this broke bootstrap on (at least) i386-pc-solaris2.11 and
sparc-sun-solaris2.11:

In file included from /vol/gcc/src/hg/master/local/gcc/coretypes.h:475,
                 from /vol/gcc/src/hg/master/local/gcc/omp-simd-clone.cc:23:
/vol/gcc/src/hg/master/local/gcc/poly-int.h: In instantiation of 'typename if_nonpoly<Cb, bool>::type maybe_ne(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long long unsigned int; Cb = int; typename if_nonpoly<Cb, bool>::type = bool]':
/vol/gcc/src/hg/master/local/gcc/omp-simd-clone.cc:507:7:   required from here
/vol/gcc/src/hg/master/local/gcc/poly-int.h:1295:22: error: comparison of integer expressions of different signedness: 'const long long unsigned int' and 'const int' [-Werror=sign-compare]
 1295 |   return a.coeffs[0] != b;
      |          ~~~~~~~~~~~~^~~~

Changing the three instances of 0 to 0U seems to fix this.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors
  2022-08-30 16:54       ` Rainer Orth
@ 2022-08-31  7:11         ` Martin Liška
  2022-08-31  8:29         ` Jakub Jelinek
  1 sibling, 0 replies; 21+ messages in thread
From: Martin Liška @ 2022-08-31  7:11 UTC (permalink / raw)
  To: Rainer Orth, Andrew Stubbs; +Cc: Jakub Jelinek, gcc-patches

On 8/30/22 18:54, Rainer Orth wrote:
> Hi Andrew,
> 
>> On 26/08/2022 12:04, Jakub Jelinek wrote:
>>>> gcc/ChangeLog:
>>>>
>>>> 	* doc/tm.texi: Regenerate.
>>>> 	* omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero
>>>> 	vecsize.
>>>> 	(simd_clone_adjust_argument_types): Likewise.
>>>> 	* target.def (compute_vecsize_and_simdlen): Document the new
>>>> 	vecsize_int and vecsize_float semantics.
>>> LGTM, except for a formatting nit.
>> diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
>> index 58bd68b129b..68ee4c2c3b0 100644
>> --- a/gcc/omp-simd-clone.cc
>> +++ b/gcc/omp-simd-clone.cc
>> @@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
>>      veclen = node->simdclone->vecsize_int;
>>    else
>>      veclen = node->simdclone->vecsize_float;
>> -  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
>> +  if (known_eq (veclen, 0))
>> +    veclen = node->simdclone->simdlen;
>> +  else
>> +    veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
>>    if (multiple_p (veclen, node->simdclone->simdlen))
>>      veclen = node->simdclone->simdlen;
>>    if (POINTER_TYPE_P (t))
> 
> this broke bootstrap on (at least) i386-pc-solaris2.11 and
> sparc-sun-solaris2.11:
> 
> In file included from /vol/gcc/src/hg/master/local/gcc/coretypes.h:475,
>                  from /vol/gcc/src/hg/master/local/gcc/omp-simd-clone.cc:23:
> /vol/gcc/src/hg/master/local/gcc/poly-int.h: In instantiation of 'typename if_nonpoly<Cb, bool>::type maybe_ne(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long long unsigned int; Cb = int; typename if_nonpoly<Cb, bool>::type = bool]':
> /vol/gcc/src/hg/master/local/gcc/omp-simd-clone.cc:507:7:   required from here
> /vol/gcc/src/hg/master/local/gcc/poly-int.h:1295:22: error: comparison of integer expressions of different signedness: 'const long long unsigned int' and 'const int' [-Werror=sign-compare]
>  1295 |   return a.coeffs[0] != b;
>       |          ~~~~~~~~~~~~^~~~

I noticed the very same warning on x86_64-linux-gnu as well.

Cheers,
Martin

> 
> Changing the three instances of 0 to 0U seems to fix this.
> 
> 	Rainer
> 



* Re: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors
  2022-08-30 16:54       ` Rainer Orth
  2022-08-31  7:11         ` Martin Liška
@ 2022-08-31  8:29         ` Jakub Jelinek
  2022-08-31  8:35           ` Andrew Stubbs
  1 sibling, 1 reply; 21+ messages in thread
From: Jakub Jelinek @ 2022-08-31  8:29 UTC (permalink / raw)
  To: Rainer Orth; +Cc: Andrew Stubbs, gcc-patches

On Tue, Aug 30, 2022 at 06:54:49PM +0200, Rainer Orth wrote:
> > --- a/gcc/omp-simd-clone.cc
> > +++ b/gcc/omp-simd-clone.cc
> > @@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
> >      veclen = node->simdclone->vecsize_int;
> >    else
> >      veclen = node->simdclone->vecsize_float;
> > -  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
> > +  if (known_eq (veclen, 0))
> > +    veclen = node->simdclone->simdlen;
> > +  else
> > +    veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
> >    if (multiple_p (veclen, node->simdclone->simdlen))
> >      veclen = node->simdclone->simdlen;
> >    if (POINTER_TYPE_P (t))
> 
> this broke bootstrap on (at least) i386-pc-solaris2.11 and
> sparc-sun-solaris2.11:
> 
> In file included from /vol/gcc/src/hg/master/local/gcc/coretypes.h:475,
>                  from /vol/gcc/src/hg/master/local/gcc/omp-simd-clone.cc:23:
> /vol/gcc/src/hg/master/local/gcc/poly-int.h: In instantiation of 'typename if_nonpoly<Cb, bool>::type maybe_ne(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long long unsigned int; Cb = int; typename if_nonpoly<Cb, bool>::type = bool]':
> /vol/gcc/src/hg/master/local/gcc/omp-simd-clone.cc:507:7:   required from here
> /vol/gcc/src/hg/master/local/gcc/poly-int.h:1295:22: error: comparison of integer expressions of different signedness: 'const long long unsigned int' and 'const int' [-Werror=sign-compare]
>  1295 |   return a.coeffs[0] != b;
>       |          ~~~~~~~~~~~~^~~~
> 
> Changing the three instances of 0 to 0U seems to fix this.

It broke bootstrap for me on x86_64-linux and i686-linux too.

I've bootstrapped/regtested the following patch on both overnight
and committed to unbreak bootstrap for others.

2022-08-31  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
	    Jakub Jelinek  <jakub@redhat.com>

	* omp-simd-clone.cc (simd_clone_adjust_return_type,
	simd_clone_adjust_argument_types): Use known_eq (veclen, 0U)
	instead of known_eq (veclen, 0) to avoid -Wsign-compare warnings.

--- gcc/omp-simd-clone.cc.jj	2022-08-30 23:10:02.054456930 +0200
+++ gcc/omp-simd-clone.cc	2022-08-30 23:51:03.601664615 +0200
@@ -504,7 +504,7 @@ simd_clone_adjust_return_type (struct cg
     veclen = node->simdclone->vecsize_int;
   else
     veclen = node->simdclone->vecsize_float;
-  if (known_eq (veclen, 0))
+  if (known_eq (veclen, 0U))
     veclen = node->simdclone->simdlen;
   else
     veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
@@ -621,7 +621,7 @@ simd_clone_adjust_argument_types (struct
 	    veclen = sc->vecsize_int;
 	  else
 	    veclen = sc->vecsize_float;
-	  if (known_eq (veclen, 0))
+	  if (known_eq (veclen, 0U))
 	    veclen = sc->simdlen;
 	  else
 	    veclen
@@ -676,7 +676,7 @@ simd_clone_adjust_argument_types (struct
 	veclen = sc->vecsize_int;
       else
 	veclen = sc->vecsize_float;
-      if (known_eq (veclen, 0))
+      if (known_eq (veclen, 0U))
 	veclen = sc->simdlen;
       else
 	veclen = exact_div (veclen,


	Jakub



* Re: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors
  2022-08-31  8:29         ` Jakub Jelinek
@ 2022-08-31  8:35           ` Andrew Stubbs
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Stubbs @ 2022-08-31  8:35 UTC (permalink / raw)
  To: Jakub Jelinek, Rainer Orth; +Cc: gcc-patches

On 31/08/2022 09:29, Jakub Jelinek wrote:
> On Tue, Aug 30, 2022 at 06:54:49PM +0200, Rainer Orth wrote:
>>> --- a/gcc/omp-simd-clone.cc
>>> +++ b/gcc/omp-simd-clone.cc
>>> @@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
>>>       veclen = node->simdclone->vecsize_int;
>>>     else
>>>       veclen = node->simdclone->vecsize_float;
>>> -  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
>>> +  if (known_eq (veclen, 0))
>>> +    veclen = node->simdclone->simdlen;
>>> +  else
>>> +    veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
>>>     if (multiple_p (veclen, node->simdclone->simdlen))
>>>       veclen = node->simdclone->simdlen;
>>>     if (POINTER_TYPE_P (t))
>>
>> this broke bootstrap on (at least) i386-pc-solaris2.11 and
>> sparc-sun-solaris2.11:
>>
>> In file included from /vol/gcc/src/hg/master/local/gcc/coretypes.h:475,
>>                   from /vol/gcc/src/hg/master/local/gcc/omp-simd-clone.cc:23:
>> /vol/gcc/src/hg/master/local/gcc/poly-int.h: In instantiation of 'typename if_nonpoly<Cb, bool>::type maybe_ne(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long long unsigned int; Cb = int; typename if_nonpoly<Cb, bool>::type = bool]':
>> /vol/gcc/src/hg/master/local/gcc/omp-simd-clone.cc:507:7:   required from here
>> /vol/gcc/src/hg/master/local/gcc/poly-int.h:1295:22: error: comparison of integer expressions of different signedness: 'const long long unsigned int' and 'const int' [-Werror=sign-compare]
>>   1295 |   return a.coeffs[0] != b;
>>        |          ~~~~~~~~~~~~^~~~
>>
>> Changing the three instances of 0 to 0U seems to fix this.
> 
> It broke bootstrap for me on x86_64-linux and i686-linux too.
> 
> I've bootstrapped/regtested the following patch on both overnight
> and committed to unbreak bootstrap for others.

Apologies everyone. :-(

I did a full build and test on x86_64, but not a bootstrap, and 
apparently it was fine with my not-so-new compiler.

Andrew



* Re: [PATCH 3/3] vect: inbranch SIMD clones
  2022-08-09 13:23 ` [PATCH 3/3] vect: inbranch SIMD clones Andrew Stubbs
@ 2022-09-09 14:31   ` Jakub Jelinek
  2022-09-14  8:09     ` Richard Biener
  2022-11-30 15:17     ` Andrew Stubbs
  0 siblings, 2 replies; 21+ messages in thread
From: Jakub Jelinek @ 2022-09-09 14:31 UTC (permalink / raw)
  To: Andrew Stubbs, Richard Biener; +Cc: gcc-patches

On Tue, Aug 09, 2022 at 02:23:50PM +0100, Andrew Stubbs wrote:
> 
> There has been support for generating "inbranch" SIMD clones for a long time,
> but nothing actually uses them (as far as I can see).

Thanks for working on this.

Note, there is another case where the inbranch SIMD clones could be used
and I even thought it was implemented, but apparently it isn't or it doesn't
work:
#ifndef TYPE
#define TYPE int
#endif

/* A simple function that will be cloned.  */
#pragma omp declare simd inbranch
TYPE __attribute__((noinline))
foo (TYPE a)
{
  return a + 1;
}

/* Check that "inbranch" clones are called correctly.  */

void __attribute__((noinline))
masked (TYPE * __restrict a, TYPE * __restrict b, int size)
{
  #pragma omp simd
  for (int i = 0; i < size; i++)
    b[i] = foo(a[i]);
}

Here, IMHO we should use the inbranch clone for vectorization (better
than not vectorizing it, worse than when we'd have a notinbranch clone)
and just use mask of all ones.
But sure, it can be done incrementally, just mentioning it for completeness.

> This patch adds support for a sub-set of possible cases (those using
> mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
> so there should be no regressions.
> 
> The sub-set of support should cover all cases needed by amdgcn, at present.
> 
> gcc/ChangeLog:
> 
> 	* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
> 	for mask arguments also.
> 	* tree-if-conv.cc: Include cgraph.h.
> 	(if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
> 	(predicate_statements): Pass the predicate to SIMD functions.
> 	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls
> 	to clones with mask arguments, in some cases.
> 	Generate the mask vector arguments.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-simd-clone-16.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-16b.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-16c.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-16d.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-16e.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-16f.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-17.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-17b.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-17c.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-17d.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-17e.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-17f.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-18.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-18b.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-18c.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-18d.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-18e.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-18f.c: New test.

> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
>  	tree fndecl = gimple_call_fndecl (stmt);
>  	if (fndecl)
>  	  {
> +	    /* We can vectorize some builtins and functions with SIMD
> +	       clones.  */
>  	    int flags = gimple_call_flags (stmt);
> +	    struct cgraph_node *node = cgraph_node::get (fndecl);
>  	    if ((flags & ECF_CONST)
>  		&& !(flags & ECF_LOOPING_CONST_OR_PURE)
> -		/* We can only vectorize some builtins at the moment,
> -		   so restrict if-conversion to those.  */
>  		&& fndecl_built_in_p (fndecl))
>  	      return true;
> +	    else if (node && node->simd_clones != NULL)
> +	      {
> +		need_to_predicate = true;

I think it would be worth it to check that at least one of the
node->simd_clones clones has ->inbranch set, because if all calls
are declare simd notinbranch, then predicating the loop will be just a
wasted effort.

> +		return true;
> +	      }
>  	  }
>  	return false;
>        }
> @@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop)
>  	      gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
>  	      update_stmt (stmt);
>  	    }
> +
> +	  /* Add a predicate parameter to functions that have a SIMD clone.
> +	     This will cause the vectorizer to match the "in branch" clone
> +	     variants because they also have the extra parameter, and serves
> +	     to build the mask vector in a natural way.  */
> +	  gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> +	  if (call && !gimple_call_internal_p (call))
> +	    {
> +	      tree orig_fndecl = gimple_call_fndecl (call);
> +	      int orig_nargs = gimple_call_num_args (call);
> +	      auto_vec<tree> args;
> +	      for (int i=0; i < orig_nargs; i++)
> +		args.safe_push (gimple_call_arg (call, i));
> +	      args.safe_push (cond);
> +
> +	      /* Replace the call with a new one that has the extra
> +		 parameter.  The FUNCTION_DECL remains unchanged so that
> +		 the vectorizer can find the SIMD clones.  This call will
> +		 either be deleted or replaced at that time, so the
> +		 mismatch is short-lived and we can live with it.  */
> +	      gcall *new_call = gimple_build_call_vec (orig_fndecl, args);
> +	      gimple_call_set_lhs (new_call, gimple_call_lhs (call));
> +	      gsi_replace (&gsi, new_call, true);

I think this is way too dangerous to represent conditional calls that way,
there is nothing to distinguish those from non-conditional calls.
I think I'd prefer (but please see what Richi thinks too) to represent
the conditional calls as a call to a new internal function, say
IFN_COND_CALL or IFN_MASK_CALL, which would have the arguments the original
call had, plus 2 extra ones first (or 3?), one that would be saved copy of
original gimple_call_fn (i.e. usually &fndecl), another one that would be the
condition (and dunno about whether we need also something to represent
gimple_call_fntype, or whether we simply should punt during ifcvt
on conditional calls where gimple_call_fntype is incompatible with
the function type of fndecl.  Another question is about
gimple_call_chain.  Punt or copy it over to the ifn and back.

	Jakub



* Re: [PATCH 3/3] vect: inbranch SIMD clones
  2022-09-09 14:31   ` Jakub Jelinek
@ 2022-09-14  8:09     ` Richard Biener
  2022-09-14  8:34       ` Jakub Jelinek
  2022-11-30 15:17     ` Andrew Stubbs
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Biener @ 2022-09-14  8:09 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Andrew Stubbs, gcc-patches

On Fri, 9 Sep 2022, Jakub Jelinek wrote:

> On Tue, Aug 09, 2022 at 02:23:50PM +0100, Andrew Stubbs wrote:
> > 
> > There has been support for generating "inbranch" SIMD clones for a long time,
> > but nothing actually uses them (as far as I can see).
> 
> Thanks for working on this.
> 
> Note, there is another case where the inbranch SIMD clones could be used
> and I even thought it was implemented, but apparently it isn't or it doesn't
> work:
> #ifndef TYPE
> #define TYPE int
> #endif
> 
> /* A simple function that will be cloned.  */
> #pragma omp declare simd inbranch
> TYPE __attribute__((noinline))
> foo (TYPE a)
> {
>   return a + 1;
> }
> 
> /* Check that "inbranch" clones are called correctly.  */
> 
> void __attribute__((noinline))
> masked (TYPE * __restrict a, TYPE * __restrict b, int size)
> {
>   #pragma omp simd
>   for (int i = 0; i < size; i++)
>     b[i] = foo(a[i]);
> }
> 
> Here, IMHO we should use the inbranch clone for vectorization (better
> than not vectorizing it, worse than when we'd have a notinbranch clone)
> and just use mask of all ones.
> But sure, it can be done incrementally, just mentioning it for completeness.
> 
> > This patch adds support for a sub-set of possible cases (those using
> > mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
> > so there should be no regressions.
> > 
> > The sub-set of support should cover all cases needed by amdgcn, at present.
> > 
> > gcc/ChangeLog:
> > 
> > 	* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
> > 	for mask arguments also.
> > 	* tree-if-conv.cc: Include cgraph.h.
> > 	(if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
> > 	(predicate_statements): Pass the predicate to SIMD functions.
> > 	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls
> > 	to clones with mask arguments, in some cases.
> > 	Generate the mask vector arguments.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 	* gcc.dg/vect/vect-simd-clone-16.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-16b.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-16c.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-16d.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-16e.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-16f.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-17.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-17b.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-17c.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-17d.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-17e.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-17f.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-18.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-18b.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-18c.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-18d.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-18e.c: New test.
> > 	* gcc.dg/vect/vect-simd-clone-18f.c: New test.
> 
> > --- a/gcc/tree-if-conv.cc
> > +++ b/gcc/tree-if-conv.cc
> > @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
> >  	tree fndecl = gimple_call_fndecl (stmt);
> >  	if (fndecl)
> >  	  {
> > +	    /* We can vectorize some builtins and functions with SIMD
> > +	       clones.  */
> >  	    int flags = gimple_call_flags (stmt);
> > +	    struct cgraph_node *node = cgraph_node::get (fndecl);
> >  	    if ((flags & ECF_CONST)
> >  		&& !(flags & ECF_LOOPING_CONST_OR_PURE)
> > -		/* We can only vectorize some builtins at the moment,
> > -		   so restrict if-conversion to those.  */
> >  		&& fndecl_built_in_p (fndecl))
> >  	      return true;
> > +	    else if (node && node->simd_clones != NULL)
> > +	      {
> > +		need_to_predicate = true;
> 
> I think it would be worth it to check that at least one of the
> node->simd_clones clones has ->inbranch set, because if all calls
> are declare simd notinbranch, then predicating the loop will be just a
> wasted effort.
> 
> > +		return true;
> > +	      }
> >  	  }
> >  	return false;
> >        }
> > @@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop)
> >  	      gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
> >  	      update_stmt (stmt);
> >  	    }
> > +
> > +	  /* Add a predicate parameter to functions that have a SIMD clone.
> > +	     This will cause the vectorizer to match the "in branch" clone
> > +	     variants because they also have the extra parameter, and serves
> > +	     to build the mask vector in a natural way.  */
> > +	  gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> > +	  if (call && !gimple_call_internal_p (call))
> > +	    {
> > +	      tree orig_fndecl = gimple_call_fndecl (call);
> > +	      int orig_nargs = gimple_call_num_args (call);
> > +	      auto_vec<tree> args;
> > +	      for (int i=0; i < orig_nargs; i++)
> > +		args.safe_push (gimple_call_arg (call, i));
> > +	      args.safe_push (cond);
> > +
> > +	      /* Replace the call with a new one that has the extra
> > +		 parameter.  The FUNCTION_DECL remains unchanged so that
> > +		 the vectorizer can find the SIMD clones.  This call will
> > +		 either be deleted or replaced at that time, so the
> > +		 mismatch is short-lived and we can live with it.  */
> > +	      gcall *new_call = gimple_build_call_vec (orig_fndecl, args);
> > +	      gimple_call_set_lhs (new_call, gimple_call_lhs (call));
> > +	      gsi_replace (&gsi, new_call, true);
> 
> I think this is way too dangerous to represent conditional calls that way,
> there is nothing to distinguish those from non-conditional calls.
> I think I'd prefer (but please see what Richi thinks too) to represent
> the conditional calls as a call to a new internal function, say
> IFN_COND_CALL or IFN_MASK_CALL, which would have the arguments the original
> call had, plus 2 extra ones first (or 3?), one that would be saved copy of
> original gimple_call_fn (i.e. usually &fndecl), another one that would be the
> condition (and dunno about whether we need also something to represent
> gimple_call_fntype, or whether we simply should punt during ifcvt
> on conditional calls where gimple_call_fntype is incompatible with
> the function type of fndecl.  Another question is about
> gimple_call_chain.  Punt or copy it over to the ifn and back.

Are nested functions a thing for OpenMP?  But yes, punt on them
for now.

I agree that a conditional call should be explicit, but the above is
only transitional between if-conversion and vectorization, right?
Do we support indirect calls here?  As Jakub says one possibility
is to do

 .IFN_COND/MASK_CALL (fn-addr, condition/mask, ...)

another would be

 fnptr = cond ? fn : &nop_call;
 (*fnptr) (...);

thus replace the called function with conditional "nop".  How
to exactly represent that NOP probably isn't too important
when it's transitional until vectorization only, even NULL
might work there.  Of course having the function address
in a computation might confuse parts of the vectorizer.  OTOH
it allows to keep the original call (and the chain and any
other info we'd have to preserve otherwise).

Richard.


* Re: [PATCH 3/3] vect: inbranch SIMD clones
  2022-09-14  8:09     ` Richard Biener
@ 2022-09-14  8:34       ` Jakub Jelinek
  0 siblings, 0 replies; 21+ messages in thread
From: Jakub Jelinek @ 2022-09-14  8:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: Andrew Stubbs, gcc-patches

On Wed, Sep 14, 2022 at 08:09:08AM +0000, Richard Biener wrote:
> Are nested functions a thing for OpenMP?  But yes, punt on them
> for now.

For Fortran certainly because they are part of the language, for C
too because they are a GNU extension.
But declare simd is mostly best effort, so we can at least for now punt.

> I agree that a conditional call should be explicit, but the above is
> only transitional between if-conversion and vectorization, right?
> Do we support indirect calls here?  As Jakub says one possibility
> is to do
> 
>  .IFN_COND/MASK_CALL (fn-addr, condition/mask, ...)
> 
> another would be
> 
>  fnptr = cond ? fn : &nop_call;
>  (*fnptr) (...);
> 
> thus replace the called function with conditional "nop".  How
> to exactly represent that NOP probably isn't too important
> when it's transitional until vectorization only, even NULL
> might work there.  Of course having the function address
> in a computation might confuse parts of the vectorizer.  OTOH
> it allows to keep the original call (and the chain and any
> other info we'd have to preserve otherwise).

On the ifn one can preserve those too and the advantage is that
it would be just one command rather than 2, but I'm not opposed
to the other way either.

	Jakub



* Re: [PATCH 3/3] vect: inbranch SIMD clones
  2022-09-09 14:31   ` Jakub Jelinek
  2022-09-14  8:09     ` Richard Biener
@ 2022-11-30 15:17     ` Andrew Stubbs
  2022-11-30 15:37       ` Jakub Jelinek
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Stubbs @ 2022-11-30 15:17 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3404 bytes --]

On 09/09/2022 15:31, Jakub Jelinek wrote:
>> --- a/gcc/tree-if-conv.cc
>> +++ b/gcc/tree-if-conv.cc
>> @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
>>   	tree fndecl = gimple_call_fndecl (stmt);
>>   	if (fndecl)
>>   	  {
>> +	    /* We can vectorize some builtins and functions with SIMD
>> +	       clones.  */
>>   	    int flags = gimple_call_flags (stmt);
>> +	    struct cgraph_node *node = cgraph_node::get (fndecl);
>>   	    if ((flags & ECF_CONST)
>>   		&& !(flags & ECF_LOOPING_CONST_OR_PURE)
>> -		/* We can only vectorize some builtins at the moment,
>> -		   so restrict if-conversion to those.  */
>>   		&& fndecl_built_in_p (fndecl))
>>   	      return true;
>> +	    else if (node && node->simd_clones != NULL)
>> +	      {
>> +		need_to_predicate = true;
> 
> I think it would be worth it to check that at least one of the
> node->simd_clones clones has ->inbranch set, because if all calls
> are declare simd notinbranch, then predicating the loop will be just a
> wasted effort.
> 
>> +		return true;
>> +	      }
>>   	  }
>>   	return false;
>>         }
>> @@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop)
>>   	      gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
>>   	      update_stmt (stmt);
>>   	    }
>> +
>> +	  /* Add a predicate parameter to functions that have a SIMD clone.
>> +	     This will cause the vectorizer to match the "in branch" clone
>> +	     variants because they also have the extra parameter, and serves
>> +	     to build the mask vector in a natural way.  */
>> +	  gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
>> +	  if (call && !gimple_call_internal_p (call))
>> +	    {
>> +	      tree orig_fndecl = gimple_call_fndecl (call);
>> +	      int orig_nargs = gimple_call_num_args (call);
>> +	      auto_vec<tree> args;
>> +	      for (int i=0; i < orig_nargs; i++)
>> +		args.safe_push (gimple_call_arg (call, i));
>> +	      args.safe_push (cond);
>> +
>> +	      /* Replace the call with a new one that has the extra
>> +		 parameter.  The FUNCTION_DECL remains unchanged so that
>> +		 the vectorizer can find the SIMD clones.  This call will
>> +		 either be deleted or replaced at that time, so the
>> +		 mismatch is short-lived and we can live with it.  */
>> +	      gcall *new_call = gimple_build_call_vec (orig_fndecl, args);
>> +	      gimple_call_set_lhs (new_call, gimple_call_lhs (call));
>> +	      gsi_replace (&gsi, new_call, true);
> 
> I think this is way too dangerous to represent conditional calls that way,
> there is nothing to distinguish those from non-conditional calls.
> I think I'd prefer (but please see what Richi thinks too) to represent
> the conditional calls as a call to a new internal function, say
> IFN_COND_CALL or IFN_MASK_CALL, which would have the arguments the original
> call had, plus 2 extra ones first (or 3?), one that would be saved copy of
> original gimple_call_fn (i.e. usually &fndecl), another one that would be the
> condition (and dunno about whether we need also something to represent
> gimple_call_fntype, or whether we simply should punt during ifcvt
> on conditional calls where gimple_call_fntype is incompatible with
> the function type of fndecl.  Another question is about
> gimple_call_chain.  Punt or copy it over to the ifn and back.

The attached should resolve these issues.

OK for mainline?

Andrew

[-- Attachment #2: 221130-inbranch-simd-clones.patch --]
[-- Type: text/plain, Size: 37207 bytes --]

vect: inbranch SIMD clones

There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).

This patch adds support for a sub-set of possible cases (those using
mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
so there should be no regressions.

The sub-set of support should cover all cases needed by amdgcn, at present.

gcc/ChangeLog:

	* internal-fn.cc (expand_MASK_CALL): New.
	* internal-fn.def (MASK_CALL): New.
	* internal-fn.h (expand_MASK_CALL): New prototype.
	* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
	for mask arguments also.
	* tree-if-conv.cc: Include cgraph.h.
	(if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
	(predicate_statements): Convert functions to IFN_MASK_CALL.
	* tree-vect-loop.cc (vect_get_datarefs_in_loop): Recognise
	IFN_MASK_CALL as a SIMD function call.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
	IFN_MASK_CALL as an inbranch SIMD function call.
	Generate the mask vector arguments.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-simd-clone-16.c: New test.
	* gcc.dg/vect/vect-simd-clone-16b.c: New test.
	* gcc.dg/vect/vect-simd-clone-16c.c: New test.
	* gcc.dg/vect/vect-simd-clone-16d.c: New test.
	* gcc.dg/vect/vect-simd-clone-16e.c: New test.
	* gcc.dg/vect/vect-simd-clone-16f.c: New test.
	* gcc.dg/vect/vect-simd-clone-17.c: New test.
	* gcc.dg/vect/vect-simd-clone-17b.c: New test.
	* gcc.dg/vect/vect-simd-clone-17c.c: New test.
	* gcc.dg/vect/vect-simd-clone-17d.c: New test.
	* gcc.dg/vect/vect-simd-clone-17e.c: New test.
	* gcc.dg/vect/vect-simd-clone-17f.c: New test.
	* gcc.dg/vect/vect-simd-clone-18.c: New test.
	* gcc.dg/vect/vect-simd-clone-18b.c: New test.
	* gcc.dg/vect/vect-simd-clone-18c.c: New test.
	* gcc.dg/vect/vect-simd-clone-18d.c: New test.
	* gcc.dg/vect/vect-simd-clone-18e.c: New test.
	* gcc.dg/vect/vect-simd-clone-18f.c: New test.

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9471f543191..d9e11bfc62a 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4527,3 +4527,10 @@ void
 expand_ASSUME (internal_fn, gcall *)
 {
 }
+
+void
+expand_MASK_CALL (internal_fn, gcall *)
+{
+  /* This IFN should only exist between ifcvt and vect passes.  */
+  gcc_unreachable ();
+}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 61516dab66d..301c3780659 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -466,6 +466,9 @@ DEF_INTERNAL_FN (TRAP, ECF_CONST | ECF_LEAF | ECF_NORETURN
 DEF_INTERNAL_FN (ASSUME, ECF_CONST | ECF_LEAF | ECF_NOTHROW
 			 | ECF_LOOPING_CONST_OR_PURE, NULL)
 
+/* For if-conversion of inbranch SIMD clones.  */
+DEF_INTERNAL_FN (MASK_CALL, ECF_NOVOPS, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_FLT_FLOATN_FN
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 21b1ce43df6..ced92c041bb 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -244,6 +244,7 @@ extern void expand_SHUFFLEVECTOR (internal_fn, gcall *);
 extern void expand_SPACESHIP (internal_fn, gcall *);
 extern void expand_TRAP (internal_fn, gcall *);
 extern void expand_ASSUME (internal_fn, gcall *);
+extern void expand_MASK_CALL (internal_fn, gcall *);
 
 extern bool vectorized_internal_fn_supported_p (internal_fn, tree);
 
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 21d69aa8747..afb7d99747b 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -937,6 +937,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	}
       sc->args[i].orig_type = base_type;
       sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
+      sc->args[i].vector_type = adj.type;
     }
 
   if (node->definition)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
new file mode 100644
index 00000000000..ffaabb30d1e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd
+TYPE __attribute__((noinline))
+foo (TYPE a)
+{
+  return a + 1;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noinline))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+    b[i] = a[i]<1 ? foo(a[i]) : a[i];
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noinline))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+    b[i] = a[i]<1 ? foo(a[i]) : a[i];
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+    if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1))
+	|| ((TYPE)i >= 1 && b[i] != (TYPE)i))
+      {
+	__builtin_printf ("error at %d\n", i);
+	__builtin_exit (1);
+      }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+    a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+     factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size);
+      check_masked (b, size);
+    }
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size-4);
+      check_masked (b, size-4);
+    }
+
+  return 0;
+}
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
new file mode 100644
index 00000000000..a503ef85238
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE float
+#include "vect-simd-clone-16.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
new file mode 100644
index 00000000000..6563879df71
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE short
+#include "vect-simd-clone-16.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=short.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
new file mode 100644
index 00000000000..6c5e69482e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE char
+#include "vect-simd-clone-16.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=char.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
new file mode 100644
index 00000000000..6690844deae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE double
+#include "vect-simd-clone-16.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
new file mode 100644
index 00000000000..e7b35a6a2dc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE __INT64_TYPE__
+#include "vect-simd-clone-16.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=int64.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
new file mode 100644
index 00000000000..6f5d374a417
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd uniform(b)
+TYPE __attribute__((noinline))
+foo (TYPE a, TYPE b)
+{
+  return a + b;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noinline))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+    b[i] = a[i]<1 ? foo(a[i], 1) : a[i];
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noinline))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+    b[i] = a[i]<1 ? foo(a[i], 1) : a[i];
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+    if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1))
+	|| ((TYPE)i >= 1 && b[i] != (TYPE)i))
+      {
+	__builtin_printf ("error at %d\n", i);
+	__builtin_exit (1);
+      }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+    a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+     factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size);
+      check_masked (b, size);
+    }
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size-4);
+      check_masked (b, size-4);
+    }
+
+  return 0;
+}
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
new file mode 100644
index 00000000000..1e2c3ab11b3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE float
+#include "vect-simd-clone-17.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
new file mode 100644
index 00000000000..007001de669
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE short
+#include "vect-simd-clone-17.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=short.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
new file mode 100644
index 00000000000..abb85a4ceee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE char
+#include "vect-simd-clone-17.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=char.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
new file mode 100644
index 00000000000..2c1d8a659bd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE double
+#include "vect-simd-clone-17.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
new file mode 100644
index 00000000000..582e690304f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE __INT64_TYPE__
+#include "vect-simd-clone-17.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=int64.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
new file mode 100644
index 00000000000..750a3f92b62
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd uniform(b)
+TYPE __attribute__((noinline))
+foo (TYPE b, TYPE a)
+{
+  return a + b;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noinline))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+    b[i] = a[i]<1 ? foo(1, a[i]) : a[i];
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noinline))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+    b[i] = a[i]<1 ? foo(1, a[i]) : a[i];
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+    if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1))
+	|| ((TYPE)i >= 1 && b[i] != (TYPE)i))
+      {
+	__builtin_printf ("error at %d\n", i);
+	__builtin_exit (1);
+      }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+    a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+     factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size);
+      check_masked (b, size);
+    }
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size-4);
+      check_masked (b, size-4);
+    }
+
+  return 0;
+}
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
new file mode 100644
index 00000000000..a77ccf3bfcc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE float
+#include "vect-simd-clone-18.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
new file mode 100644
index 00000000000..bee5f338abe
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE short
+#include "vect-simd-clone-18.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=short.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
new file mode 100644
index 00000000000..a749edefdd7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE char
+#include "vect-simd-clone-18.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=char.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
new file mode 100644
index 00000000000..061e0dc2621
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE double
+#include "vect-simd-clone-18.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
new file mode 100644
index 00000000000..a3037f5809a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE __INT64_TYPE__
+#include "vect-simd-clone-18.c"
+
+/* Ensure the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=int64.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 64b20b4a9e1..127ee873d25 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -123,6 +123,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dse.h"
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
+#include "cgraph.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -1063,7 +1064,8 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
    - it is a GIMPLE_LABEL or a GIMPLE_COND,
-   - it is builtins call.  */
+   - it is a builtin call, or
+   - it is a call to a function with a SIMD clone.  */
 
 static bool
 if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
@@ -1083,13 +1085,23 @@ if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
 	tree fndecl = gimple_call_fndecl (stmt);
 	if (fndecl)
 	  {
+	    /* We can vectorize some builtins and functions with SIMD
+	       "inbranch" clones.  */
 	    int flags = gimple_call_flags (stmt);
+	    struct cgraph_node *node = cgraph_node::get (fndecl);
 	    if ((flags & ECF_CONST)
 		&& !(flags & ECF_LOOPING_CONST_OR_PURE)
-		/* We can only vectorize some builtins at the moment,
-		   so restrict if-conversion to those.  */
 		&& fndecl_built_in_p (fndecl))
 	      return true;
+	    else if (node && node->simd_clones != NULL)
+	      /* Ensure that at least one clone can be "inbranch".  */
+	      for (struct cgraph_node *n = node->simd_clones; n != NULL;
+		   n = n->simdclone->next_clone)
+		if (n->simdclone->inbranch)
+		  {
+		    need_to_predicate = true;
+		    return true;
+		  }
 	  }
 	return false;
       }
@@ -2603,6 +2615,29 @@ predicate_statements (loop_p loop)
 	      gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
 	      update_stmt (stmt);
 	    }
+
+	  /* Convert functions that have a SIMD clone to IFN_MASK_CALL.  This
+	     will cause the vectorizer to match the "in branch" clone variants,
+	     and serves to build the mask vector in a natural way.  */
+	  gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
+	  if (call && !gimple_call_internal_p (call))
+	    {
+	      tree orig_fn = gimple_call_fn (call);
+	      int orig_nargs = gimple_call_num_args (call);
+	      auto_vec<tree> args;
+	      args.safe_push (orig_fn);
+	      for (int i = 0; i < orig_nargs; i++)
+		args.safe_push (gimple_call_arg (call, i));
+	      args.safe_push (cond);
+
+	      /* Replace the call with an IFN_MASK_CALL that has the extra
+		 condition parameter.  */
+	      gcall *new_call = gimple_build_call_internal_vec (IFN_MASK_CALL,
+								args);
+	      gimple_call_set_lhs (new_call, gimple_call_lhs (call));
+	      gsi_replace (&gsi, new_call, true);
+	    }
+
 	  lhs = gimple_get_lhs (gsi_stmt (gsi));
 	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
 	    ssa_names.add (lhs);
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index aacbb12580c..60efd4d7525 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2121,6 +2121,15 @@ vect_get_datarefs_in_loop (loop_p loop, basic_block *bbs,
 	    if (is_gimple_call (stmt) && loop->safelen)
 	      {
 		tree fndecl = gimple_call_fndecl (stmt), op;
+		if (fndecl == NULL_TREE
+		    && gimple_call_internal_p (stmt)
+		    && gimple_call_internal_fn (stmt) == IFN_MASK_CALL)
+		  {
+		    fndecl = gimple_call_arg (stmt, 0);
+		    gcc_checking_assert (TREE_CODE (fndecl) == ADDR_EXPR);
+		    fndecl = TREE_OPERAND (fndecl, 0);
+		    gcc_checking_assert (TREE_CODE (fndecl) == FUNCTION_DECL);
+		  }
 		if (fndecl != NULL_TREE)
 		  {
 		    cgraph_node *node = cgraph_node::get (fndecl);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 5485da58b38..0310c80c79d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3987,6 +3987,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
   size_t i, nargs;
   tree lhs, rtype, ratype;
   vec<constructor_elt, va_gc> *ret_ctor_elts = NULL;
+  int arg_offset = 0;
 
   /* Is STMT a vectorizable call?   */
   gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt);
@@ -3994,6 +3995,16 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     return false;
 
   fndecl = gimple_call_fndecl (stmt);
+  if (fndecl == NULL_TREE
+      && gimple_call_internal_p (stmt)
+      && gimple_call_internal_fn (stmt) == IFN_MASK_CALL)
+    {
+      fndecl = gimple_call_arg (stmt, 0);
+      gcc_checking_assert (TREE_CODE (fndecl) == ADDR_EXPR);
+      fndecl = TREE_OPERAND (fndecl, 0);
+      gcc_checking_assert (TREE_CODE (fndecl) == FUNCTION_DECL);
+      arg_offset = 1;
+    }
   if (fndecl == NULL_TREE)
     return false;
 
@@ -4024,7 +4035,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     return false;
 
   /* Process function arguments.  */
-  nargs = gimple_call_num_args (stmt);
+  nargs = gimple_call_num_args (stmt) - arg_offset;
 
   /* Bail out if the function has zero arguments.  */
   if (nargs == 0)
@@ -4042,7 +4053,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       thisarginfo.op = NULL_TREE;
       thisarginfo.simd_lane_linear = false;
 
-      op = gimple_call_arg (stmt, i);
+      op = gimple_call_arg (stmt, i + arg_offset);
       if (!vect_is_simple_use (op, vinfo, &thisarginfo.dt,
 			       &thisarginfo.vectype)
 	  || thisarginfo.dt == vect_uninitialized_def)
@@ -4057,16 +4068,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	  || thisarginfo.dt == vect_external_def)
 	gcc_assert (thisarginfo.vectype == NULL_TREE);
       else
-	{
-	  gcc_assert (thisarginfo.vectype != NULL_TREE);
-	  if (VECTOR_BOOLEAN_TYPE_P (thisarginfo.vectype))
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "vector mask arguments are not supported\n");
-	      return false;
-	    }
-	}
+	gcc_assert (thisarginfo.vectype != NULL_TREE);
 
       /* For linear arguments, the analyze phase should have saved
 	 the base and step in STMT_VINFO_SIMD_CLONE_INFO.  */
@@ -4159,9 +4161,6 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	if (target_badness < 0)
 	  continue;
 	this_badness += target_badness * 512;
-	/* FORNOW: Have to add code to add the mask argument.  */
-	if (n->simdclone->inbranch)
-	  continue;
 	for (i = 0; i < nargs; i++)
 	  {
 	    switch (n->simdclone->args[i].arg_type)
@@ -4169,7 +4168,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	      case SIMD_CLONE_ARG_TYPE_VECTOR:
 		if (!useless_type_conversion_p
 			(n->simdclone->args[i].orig_type,
-			 TREE_TYPE (gimple_call_arg (stmt, i))))
+			 TREE_TYPE (gimple_call_arg (stmt, i + arg_offset))))
 		  i = -1;
 		else if (arginfo[i].dt == vect_constant_def
 			 || arginfo[i].dt == vect_external_def
@@ -4199,7 +4198,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		i = -1;
 		break;
 	      case SIMD_CLONE_ARG_TYPE_MASK:
-		gcc_unreachable ();
+		break;
 	      }
 	    if (i == (size_t) -1)
 	      break;
@@ -4225,18 +4224,55 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     return false;
 
   for (i = 0; i < nargs; i++)
-    if ((arginfo[i].dt == vect_constant_def
-	 || arginfo[i].dt == vect_external_def)
-	&& bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
-      {
-	tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i));
-	arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
-							  slp_node);
-	if (arginfo[i].vectype == NULL
-	    || !constant_multiple_p (bestn->simdclone->simdlen,
-				     simd_clone_subparts (arginfo[i].vectype)))
+    {
+      if ((arginfo[i].dt == vect_constant_def
+	   || arginfo[i].dt == vect_external_def)
+	  && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
+	{
+	  tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i + arg_offset));
+	  arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
+							    slp_node);
+	  if (arginfo[i].vectype == NULL
+	      || !constant_multiple_p (bestn->simdclone->simdlen,
+				       simd_clone_subparts (arginfo[i].vectype)))
+	    return false;
+	}
+
+      if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR
+	  && VECTOR_BOOLEAN_TYPE_P (bestn->simdclone->args[i].vector_type))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "vector mask arguments are not supported.\n");
 	  return false;
-      }
+	}
+
+      if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK
+	  && bestn->simdclone->mask_mode == VOIDmode
+	  && (simd_clone_subparts (bestn->simdclone->args[i].vector_type)
+	      != simd_clone_subparts (arginfo[i].vectype)))
+	{
+	  /* FORNOW we only have partial support for vector-type masks that
+	     can't hold all of simdlen.  */
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+			     vect_location,
+			     "in-branch vector clones are not yet"
+			     " supported for mismatched vector sizes.\n");
+	  return false;
+	}
+      if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK
+	  && bestn->simdclone->mask_mode != VOIDmode)
+	{
+	  /* FORNOW don't support integer-type masks.  */
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+			     vect_location,
+			     "in-branch vector clones are not yet"
+			     " supported for integer mask modes.\n");
+	  return false;
+	}
+    }
 
   fndecl = bestn->decl;
   nunits = bestn->simdclone->simdlen;
@@ -4326,7 +4362,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	{
 	  unsigned int k, l, m, o;
 	  tree atype;
-	  op = gimple_call_arg (stmt, i);
+	  op = gimple_call_arg (stmt, i + arg_offset);
 	  switch (bestn->simdclone->args[i].arg_type)
 	    {
 	    case SIMD_CLONE_ARG_TYPE_VECTOR:
@@ -4425,6 +4461,65 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		    }
 		}
 	      break;
+	    case SIMD_CLONE_ARG_TYPE_MASK:
+	      atype = bestn->simdclone->args[i].vector_type;
+	      if (bestn->simdclone->mask_mode != VOIDmode)
+		{
+		  /* FORNOW: this is disabled above.  */
+		  gcc_unreachable ();
+		}
+	      else
+		{
+		  tree elt_type = TREE_TYPE (atype);
+		  tree one = fold_convert (elt_type, integer_one_node);
+		  tree zero = fold_convert (elt_type, integer_zero_node);
+		  o = vector_unroll_factor (nunits,
+					    simd_clone_subparts (atype));
+		  for (m = j * o; m < (j + 1) * o; m++)
+		    {
+		      if (simd_clone_subparts (atype)
+			  < simd_clone_subparts (arginfo[i].vectype))
+			{
+			  /* The mask type has fewer elements than simdlen.  */
+
+			  /* FORNOW */
+			  gcc_unreachable ();
+			}
+		      else if (simd_clone_subparts (atype)
+			       == simd_clone_subparts (arginfo[i].vectype))
+			{
+			  /* The SIMD clone function has the same number of
+			     elements as the current function.  */
+			  if (m == 0)
+			    {
+			      vect_get_vec_defs_for_operand (vinfo, stmt_info,
+							     o * ncopies,
+							     op,
+							     &vec_oprnds[i]);
+			      vec_oprnds_i[i] = 0;
+			    }
+			  vec_oprnd0 = vec_oprnds[i][vec_oprnds_i[i]++];
+			  vec_oprnd0
+			    = build3 (VEC_COND_EXPR, atype, vec_oprnd0,
+				      build_vector_from_val (atype, one),
+				      build_vector_from_val (atype, zero));
+			  gassign *new_stmt
+			    = gimple_build_assign (make_ssa_name (atype),
+						   vec_oprnd0);
+			  vect_finish_stmt_generation (vinfo, stmt_info,
+						       new_stmt, gsi);
+			  vargs.safe_push (gimple_assign_lhs (new_stmt));
+			}
+		      else
+			{
+			  /* The mask type has more elements than simdlen.  */
+
+			  /* FORNOW */
+			  gcc_unreachable ();
+			}
+		    }
+		}
+	      break;
 	    case SIMD_CLONE_ARG_TYPE_UNIFORM:
 	      vargs.safe_push (op);
 	      break;


* Re: [PATCH 3/3] vect: inbranch SIMD clones
  2022-11-30 15:17     ` Andrew Stubbs
@ 2022-11-30 15:37       ` Jakub Jelinek
  2022-12-01 13:35         ` Andrew Stubbs
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Jelinek @ 2022-11-30 15:37 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Richard Biener, gcc-patches

On Wed, Nov 30, 2022 at 03:17:30PM +0000, Andrew Stubbs wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
> @@ -0,0 +1,89 @@
> +/* { dg-require-effective-target vect_simd_clones } */
> +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
...
> +/* Ensure that the in-branch simd clones are used on targets that support
> +   them.  These counts include all calls and definitions.  */
> +
> +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */

Maybe better add -ffat-lto-objects to dg-additional-options and drop
the dg-skip-if (if it works with that, for all similar tests)?

> @@ -1063,7 +1064,8 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
>     A statement is if-convertible if:
>     - it is an if-convertible GIMPLE_ASSIGN,
>     - it is a GIMPLE_LABEL or a GIMPLE_COND,
> -   - it is builtins call.  */
> +   - it is builtins call.

s/call\./call,/ above

> +   - it is a call to a function with a SIMD clone.  */
>  
>  static bool
>  if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
> @@ -1083,13 +1085,23 @@ if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
>  	tree fndecl = gimple_call_fndecl (stmt);
>  	if (fndecl)
>  	  {
> +	    /* We can vectorize some builtins and functions with SIMD
> +	       "inbranch" clones.  */
>  	    int flags = gimple_call_flags (stmt);
> +	    struct cgraph_node *node = cgraph_node::get (fndecl);
>  	    if ((flags & ECF_CONST)
>  		&& !(flags & ECF_LOOPING_CONST_OR_PURE)
> -		/* We can only vectorize some builtins at the moment,
> -		   so restrict if-conversion to those.  */
>  		&& fndecl_built_in_p (fndecl))
>  	      return true;
> +	    else if (node && node->simd_clones != NULL)

I don't see much value in the "else " above, the if branch returns
if condition is true, so just
	    if (node && node->simd_clones != NULL)
would do it.

> +	      /* Ensure that at least one clone can be "inbranch".  */
> +	      for (struct cgraph_node *n = node->simd_clones; n != NULL;
> +		   n = n->simdclone->next_clone)
> +		if (n->simdclone->inbranch)
> +		  {
> +		    need_to_predicate = true;
> +		    return true;
> +		  }
>  	  }
>  	return false;
>        }
> @@ -2603,6 +2615,29 @@ predicate_statements (loop_p loop)
>  	      gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
>  	      update_stmt (stmt);
>  	    }
> +
> +	  /* Convert functions that have a SIMD clone to IFN_MASK_CALL.  This
> +	     will cause the vectorizer to match the "in branch" clone variants,
> +	     and serves to build the mask vector in a natural way.  */
> +	  gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> +	  if (call && !gimple_call_internal_p (call))
> +	    {
> +	      tree orig_fn = gimple_call_fn (call);
> +	      int orig_nargs = gimple_call_num_args (call);
> +	      auto_vec<tree> args;
> +	      args.safe_push (orig_fn);
> +	      for (int i=0; i < orig_nargs; i++)

Formatting - int i = 0;

> +		args.safe_push (gimple_call_arg (call, i));
> +	      args.safe_push (cond);
> +
> +	      /* Replace the call with a IFN_MASK_CALL that has the extra
> +		 condition parameter. */
> +	      gcall *new_call = gimple_build_call_internal_vec (IFN_MASK_CALL,
> +								args);
> +	      gimple_call_set_lhs (new_call, gimple_call_lhs (call));
> +	      gsi_replace (&gsi, new_call, true);
> +	    }
> +
>  	  lhs = gimple_get_lhs (gsi_stmt (gsi));
>  	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
>  	    ssa_names.add (lhs);
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3987,6 +3987,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>    size_t i, nargs;
>    tree lhs, rtype, ratype;
>    vec<constructor_elt, va_gc> *ret_ctor_elts = NULL;
> +  int arg_offset = 0;
>  
>    /* Is STMT a vectorizable call?   */
>    gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt);
> @@ -3994,6 +3995,16 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>      return false;
>  
>    fndecl = gimple_call_fndecl (stmt);
> +  if (fndecl == NULL_TREE
> +      && gimple_call_internal_p (stmt)
> +      && gimple_call_internal_fn (stmt) == IFN_MASK_CALL)

Replace the above 2 lines with
      && gimple_call_internal_p (stmt, IFN_MASK_CALL))
?

> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -2121,6 +2121,15 @@ vect_get_datarefs_in_loop (loop_p loop, basic_block *bbs,
>             if (is_gimple_call (stmt) && loop->safelen)
>               {
>                 tree fndecl = gimple_call_fndecl (stmt), op;
> +               if (fndecl == NULL_TREE
> +                   && gimple_call_internal_p (stmt)
> +                   && gimple_call_internal_fn (stmt) == IFN_MASK_CALL)

Similarly.

Otherwise LGTM.

	Jakub



* Re: [PATCH 3/3] vect: inbranch SIMD clones
  2022-11-30 15:37       ` Jakub Jelinek
@ 2022-12-01 13:35         ` Andrew Stubbs
  2022-12-01 14:16           ` Jakub Jelinek
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Stubbs @ 2022-12-01 13:35 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 957 bytes --]

On 30/11/2022 15:37, Jakub Jelinek wrote:
> On Wed, Nov 30, 2022 at 03:17:30PM +0000, Andrew Stubbs wrote:
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
>> @@ -0,0 +1,89 @@
>> +/* { dg-require-effective-target vect_simd_clones } */
>> +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
>> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> ...
>> +/* Ensure that the in-branch simd clones are used on targets that support
>> +   them.  These counts include all calls and definitions.  */
>> +
>> +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
> 
> Maybe better add -ffat-lto-objects to dg-additional-options and drop
> the dg-skip-if (if it works with that, for all similar tests)?

The tests are already run with -ffat-lto-objects and the test still 
fails (pattern found zero times). I don't know why.

Aside from that, I've made all the other changes you requested.

OK now?

Andrew

[-- Attachment #2: 221201-inbranch-simd-clones.patch --]
[-- Type: text/plain, Size: 37207 bytes --]

vect: inbranch SIMD clones

There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).

This patch adds support for a subset of the possible cases (those using
mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
so there should be no regressions.

The supported subset should cover all cases needed by amdgcn at present.

gcc/ChangeLog:

	* internal-fn.cc (expand_MASK_CALL): New.
	* internal-fn.def (MASK_CALL): New.
	* internal-fn.h (expand_MASK_CALL): New prototype.
	* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
	for mask arguments also.
	* tree-if-conv.cc: Include cgraph.h.
	(if_convertible_stmt_p): Allow if-conversion of calls to functions
	with SIMD clones.
	(predicate_statements): Convert functions to IFN_MASK_CALL.
	* tree-vect-loop.cc (vect_get_datarefs_in_loop): Recognise
	IFN_MASK_CALL as a SIMD function call.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
	IFN_MASK_CALL as an inbranch SIMD function call.
	Generate the mask vector arguments.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-simd-clone-16.c: New test.
	* gcc.dg/vect/vect-simd-clone-16b.c: New test.
	* gcc.dg/vect/vect-simd-clone-16c.c: New test.
	* gcc.dg/vect/vect-simd-clone-16d.c: New test.
	* gcc.dg/vect/vect-simd-clone-16e.c: New test.
	* gcc.dg/vect/vect-simd-clone-16f.c: New test.
	* gcc.dg/vect/vect-simd-clone-17.c: New test.
	* gcc.dg/vect/vect-simd-clone-17b.c: New test.
	* gcc.dg/vect/vect-simd-clone-17c.c: New test.
	* gcc.dg/vect/vect-simd-clone-17d.c: New test.
	* gcc.dg/vect/vect-simd-clone-17e.c: New test.
	* gcc.dg/vect/vect-simd-clone-17f.c: New test.
	* gcc.dg/vect/vect-simd-clone-18.c: New test.
	* gcc.dg/vect/vect-simd-clone-18b.c: New test.
	* gcc.dg/vect/vect-simd-clone-18c.c: New test.
	* gcc.dg/vect/vect-simd-clone-18d.c: New test.
	* gcc.dg/vect/vect-simd-clone-18e.c: New test.
	* gcc.dg/vect/vect-simd-clone-18f.c: New test.
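To illustrate the tree-if-conv.cc change, predicate_statements rewrites a
guarded call roughly as follows (simplified GIMPLE with hypothetical SSA
names; the real dump syntax may differ):

```
  /* Before if-conversion: the call is guarded by a branch.  */
  if (a_2 < 1)
    b_3 = foo (a_2);

  /* After predicate_statements: the guard becomes a trailing argument,
     and vectorizable_simd_clone_call later matches this against an
     "inbranch" clone, using the condition to build the mask vector.  */
  b_3 = .MASK_CALL (foo, a_2, _cond_5);
```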

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9471f543191..d9e11bfc62a 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4527,3 +4527,10 @@ void
 expand_ASSUME (internal_fn, gcall *)
 {
 }
+
+void
+expand_MASK_CALL (internal_fn, gcall *)
+{
+  /* This IFN should only exist between ifcvt and vect passes.  */
+  gcc_unreachable ();
+}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 61516dab66d..301c3780659 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -466,6 +466,9 @@ DEF_INTERNAL_FN (TRAP, ECF_CONST | ECF_LEAF | ECF_NORETURN
 DEF_INTERNAL_FN (ASSUME, ECF_CONST | ECF_LEAF | ECF_NOTHROW
 			 | ECF_LOOPING_CONST_OR_PURE, NULL)
 
+/* For if-conversion of inbranch SIMD clones.  */
+DEF_INTERNAL_FN (MASK_CALL, ECF_NOVOPS, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_FLT_FLOATN_FN
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 21b1ce43df6..ced92c041bb 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -244,6 +244,7 @@ extern void expand_SHUFFLEVECTOR (internal_fn, gcall *);
 extern void expand_SPACESHIP (internal_fn, gcall *);
 extern void expand_TRAP (internal_fn, gcall *);
 extern void expand_ASSUME (internal_fn, gcall *);
+extern void expand_MASK_CALL (internal_fn, gcall *);
 
 extern bool vectorized_internal_fn_supported_p (internal_fn, tree);
 
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 21d69aa8747..afb7d99747b 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -937,6 +937,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	}
       sc->args[i].orig_type = base_type;
       sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
+      sc->args[i].vector_type = adj.type;
     }
 
   if (node->definition)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
new file mode 100644
index 00000000000..ffaabb30d1e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd
+TYPE __attribute__((noinline))
+foo (TYPE a)
+{
+  return a + 1;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noinline))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+    b[i] = a[i]<1 ? foo(a[i]) : a[i];
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noinline))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+    b[i] = a[i]<1 ? foo(a[i]) : a[i];
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+    if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1))
+	|| ((TYPE)i >= 1 && b[i] != (TYPE)i))
+      {
+	__builtin_printf ("error at %d\n", i);
+	__builtin_exit (1);
+      }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+    a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+     factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size);
+      check_masked (b, size);
+    }
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size-4);
+      check_masked (b, size-4);
+    }
+
+  return 0;
+}
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
new file mode 100644
index 00000000000..a503ef85238
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE float
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
new file mode 100644
index 00000000000..6563879df71
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE short
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=short.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
new file mode 100644
index 00000000000..6c5e69482e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE char
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=char.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
new file mode 100644
index 00000000000..6690844deae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE double
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
new file mode 100644
index 00000000000..e7b35a6a2dc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE __INT64_TYPE__
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=int64.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
new file mode 100644
index 00000000000..6f5d374a417
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd uniform(b)
+TYPE __attribute__((noinline))
+foo (TYPE a, TYPE b)
+{
+  return a + b;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noinline))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+    b[i] = a[i]<1 ? foo(a[i], 1) : a[i];
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noinline))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+    b[i] = a[i]<1 ? foo(a[i], 1) : a[i];
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+    if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1))
+	|| ((TYPE)i >= 1 && b[i] != (TYPE)i))
+      {
+	__builtin_printf ("error at %d\n", i);
+	__builtin_exit (1);
+      }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+    a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+     factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size);
+      check_masked (b, size);
+    }
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size-4);
+      check_masked (b, size-4);
+    }
+
+  return 0;
+}
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
new file mode 100644
index 00000000000..1e2c3ab11b3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE float
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
new file mode 100644
index 00000000000..007001de669
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE short
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=short.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
new file mode 100644
index 00000000000..abb85a4ceee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE char
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=char.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
new file mode 100644
index 00000000000..2c1d8a659bd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE double
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
new file mode 100644
index 00000000000..582e690304f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE __INT64_TYPE__
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=int64.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
new file mode 100644
index 00000000000..750a3f92b62
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd uniform(b)
+TYPE __attribute__((noinline))
+foo (TYPE b, TYPE a)
+{
+  return a + b;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noinline))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+    b[i] = a[i]<1 ? foo(1, a[i]) : a[i];
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noinline))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+    b[i] = a[i]<1 ? foo(1, a[i]) : a[i];
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+    if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1))
+	|| ((TYPE)i >= 1 && b[i] != (TYPE)i))
+      {
+	__builtin_printf ("error at %d\n", i);
+	__builtin_exit (1);
+      }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+    a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+     factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size);
+      check_masked (b, size);
+    }
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size-4);
+      check_masked (b, size-4);
+    }
+
+  return 0;
+}
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
new file mode 100644
index 00000000000..a77ccf3bfcc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE float
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
new file mode 100644
index 00000000000..bee5f338abe
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE short
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=short.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
new file mode 100644
index 00000000000..a749edefdd7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE char
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=char.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
new file mode 100644
index 00000000000..061e0dc2621
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE double
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
new file mode 100644
index 00000000000..a3037f5809a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE __INT64_TYPE__
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support
+   them.  These counts include all calls and definitions.  */
+
+/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */
+/* TODO: aarch64 */
+
+/* Fails to use in-branch clones for TYPE=int64.  */
+/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
+/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 64b20b4a9e1..127ee873d25 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -123,6 +123,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dse.h"
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
+#include "cgraph.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -1063,7 +1064,8 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
    - it is a GIMPLE_LABEL or a GIMPLE_COND,
-   - it is builtins call.  */
+   - it is a builtin call,
+   - it is a call to a function with a SIMD clone.  */
 
 static bool
 if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
@@ -1083,13 +1085,23 @@ if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
 	tree fndecl = gimple_call_fndecl (stmt);
 	if (fndecl)
 	  {
+	    /* We can vectorize some builtins and functions with SIMD
+	       "inbranch" clones.  */
 	    int flags = gimple_call_flags (stmt);
+	    struct cgraph_node *node = cgraph_node::get (fndecl);
 	    if ((flags & ECF_CONST)
 		&& !(flags & ECF_LOOPING_CONST_OR_PURE)
-		/* We can only vectorize some builtins at the moment,
-		   so restrict if-conversion to those.  */
 		&& fndecl_built_in_p (fndecl))
 	      return true;
+	    else if (node && node->simd_clones != NULL)
+	      /* Ensure that at least one clone can be "inbranch".  */
+	      for (struct cgraph_node *n = node->simd_clones; n != NULL;
+		   n = n->simdclone->next_clone)
+		if (n->simdclone->inbranch)
+		  {
+		    need_to_predicate = true;
+		    return true;
+		  }
 	  }
 	return false;
       }
@@ -2603,6 +2615,29 @@ predicate_statements (loop_p loop)
 	      gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
 	      update_stmt (stmt);
 	    }
+
+	  /* Convert functions that have a SIMD clone to IFN_MASK_CALL.  This
+	     will cause the vectorizer to match the "in branch" clone variants,
+	     and serves to build the mask vector in a natural way.  */
+	  gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
+	  if (call && !gimple_call_internal_p (call))
+	    {
+	      tree orig_fn = gimple_call_fn (call);
+	      int orig_nargs = gimple_call_num_args (call);
+	      auto_vec<tree> args;
+	      args.safe_push (orig_fn);
+	      for (int i = 0; i < orig_nargs; i++)
+		args.safe_push (gimple_call_arg (call, i));
+	      args.safe_push (cond);
+
+	      /* Replace the call with an IFN_MASK_CALL that has the extra
+		 condition parameter.  */
+	      gcall *new_call = gimple_build_call_internal_vec (IFN_MASK_CALL,
+								args);
+	      gimple_call_set_lhs (new_call, gimple_call_lhs (call));
+	      gsi_replace (&gsi, new_call, true);
+	    }
+
 	  lhs = gimple_get_lhs (gsi_stmt (gsi));
 	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
 	    ssa_names.add (lhs);
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index aacbb12580c..60efd4d7525 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2121,6 +2121,15 @@ vect_get_datarefs_in_loop (loop_p loop, basic_block *bbs,
 	    if (is_gimple_call (stmt) && loop->safelen)
 	      {
 		tree fndecl = gimple_call_fndecl (stmt), op;
+		if (fndecl == NULL_TREE
+		    && gimple_call_internal_p (stmt)
+		    && gimple_call_internal_fn (stmt) == IFN_MASK_CALL)
+		  {
+		    fndecl = gimple_call_arg (stmt, 0);
+		    gcc_checking_assert (TREE_CODE (fndecl) == ADDR_EXPR);
+		    fndecl = TREE_OPERAND (fndecl, 0);
+		    gcc_checking_assert (TREE_CODE (fndecl) == FUNCTION_DECL);
+		  }
 		if (fndecl != NULL_TREE)
 		  {
 		    cgraph_node *node = cgraph_node::get (fndecl);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 5485da58b38..0310c80c79d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3987,6 +3987,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
   size_t i, nargs;
   tree lhs, rtype, ratype;
   vec<constructor_elt, va_gc> *ret_ctor_elts = NULL;
+  int arg_offset = 0;
 
   /* Is STMT a vectorizable call?   */
   gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt);
@@ -3994,6 +3995,16 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     return false;
 
   fndecl = gimple_call_fndecl (stmt);
+  if (fndecl == NULL_TREE
+      && gimple_call_internal_p (stmt)
+      && gimple_call_internal_fn (stmt) == IFN_MASK_CALL)
+    {
+      fndecl = gimple_call_arg (stmt, 0);
+      gcc_checking_assert (TREE_CODE (fndecl) == ADDR_EXPR);
+      fndecl = TREE_OPERAND (fndecl, 0);
+      gcc_checking_assert (TREE_CODE (fndecl) == FUNCTION_DECL);
+      arg_offset = 1;
+    }
   if (fndecl == NULL_TREE)
     return false;
 
@@ -4024,7 +4035,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     return false;
 
   /* Process function arguments.  */
-  nargs = gimple_call_num_args (stmt);
+  nargs = gimple_call_num_args (stmt) - arg_offset;
 
   /* Bail out if the function has zero arguments.  */
   if (nargs == 0)
@@ -4042,7 +4053,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       thisarginfo.op = NULL_TREE;
       thisarginfo.simd_lane_linear = false;
 
-      op = gimple_call_arg (stmt, i);
+      op = gimple_call_arg (stmt, i + arg_offset);
       if (!vect_is_simple_use (op, vinfo, &thisarginfo.dt,
 			       &thisarginfo.vectype)
 	  || thisarginfo.dt == vect_uninitialized_def)
@@ -4057,16 +4068,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	  || thisarginfo.dt == vect_external_def)
 	gcc_assert (thisarginfo.vectype == NULL_TREE);
       else
-	{
-	  gcc_assert (thisarginfo.vectype != NULL_TREE);
-	  if (VECTOR_BOOLEAN_TYPE_P (thisarginfo.vectype))
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "vector mask arguments are not supported\n");
-	      return false;
-	    }
-	}
+	gcc_assert (thisarginfo.vectype != NULL_TREE);
 
       /* For linear arguments, the analyze phase should have saved
 	 the base and step in STMT_VINFO_SIMD_CLONE_INFO.  */
@@ -4159,9 +4161,6 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	if (target_badness < 0)
 	  continue;
 	this_badness += target_badness * 512;
-	/* FORNOW: Have to add code to add the mask argument.  */
-	if (n->simdclone->inbranch)
-	  continue;
 	for (i = 0; i < nargs; i++)
 	  {
 	    switch (n->simdclone->args[i].arg_type)
@@ -4169,7 +4168,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	      case SIMD_CLONE_ARG_TYPE_VECTOR:
 		if (!useless_type_conversion_p
 			(n->simdclone->args[i].orig_type,
-			 TREE_TYPE (gimple_call_arg (stmt, i))))
+			 TREE_TYPE (gimple_call_arg (stmt, i + arg_offset))))
 		  i = -1;
 		else if (arginfo[i].dt == vect_constant_def
 			 || arginfo[i].dt == vect_external_def
@@ -4199,7 +4198,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		i = -1;
 		break;
 	      case SIMD_CLONE_ARG_TYPE_MASK:
-		gcc_unreachable ();
+		break;
 	      }
 	    if (i == (size_t) -1)
 	      break;
@@ -4225,18 +4224,55 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     return false;
 
   for (i = 0; i < nargs; i++)
-    if ((arginfo[i].dt == vect_constant_def
-	 || arginfo[i].dt == vect_external_def)
-	&& bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
-      {
-	tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i));
-	arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
-							  slp_node);
-	if (arginfo[i].vectype == NULL
-	    || !constant_multiple_p (bestn->simdclone->simdlen,
-				     simd_clone_subparts (arginfo[i].vectype)))
+    {
+      if ((arginfo[i].dt == vect_constant_def
+	   || arginfo[i].dt == vect_external_def)
+	  && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
+	{
+	  tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i + arg_offset));
+	  arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
+							    slp_node);
+	  if (arginfo[i].vectype == NULL
+	      || !constant_multiple_p (bestn->simdclone->simdlen,
+				       simd_clone_subparts (arginfo[i].vectype)))
+	    return false;
+	}
+
+      if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR
+	  && VECTOR_BOOLEAN_TYPE_P (bestn->simdclone->args[i].vector_type))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "vector mask arguments are not supported.\n");
 	  return false;
-      }
+	}
+
+      if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK
+	  && bestn->simdclone->mask_mode == VOIDmode
+	  && (simd_clone_subparts (bestn->simdclone->args[i].vector_type)
+	      != simd_clone_subparts (arginfo[i].vectype)))
+	{
+	  /* FORNOW we only have partial support for vector-type masks that
+	     can't hold all of simdlen.  */
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+			     vect_location,
+			     "in-branch vector clones are not yet"
+			     " supported for mismatched vector sizes.\n");
+	  return false;
+	}
+      if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK
+	  && bestn->simdclone->mask_mode != VOIDmode)
+	{
+	  /* FORNOW don't support integer-type masks.  */
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+			     vect_location,
+			     "in-branch vector clones are not yet"
+			     " supported for integer mask modes.\n");
+	  return false;
+	}
+    }
 
   fndecl = bestn->decl;
   nunits = bestn->simdclone->simdlen;
@@ -4326,7 +4362,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	{
 	  unsigned int k, l, m, o;
 	  tree atype;
-	  op = gimple_call_arg (stmt, i);
+	  op = gimple_call_arg (stmt, i + arg_offset);
 	  switch (bestn->simdclone->args[i].arg_type)
 	    {
 	    case SIMD_CLONE_ARG_TYPE_VECTOR:
@@ -4425,6 +4461,65 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		    }
 		}
 	      break;
+	    case SIMD_CLONE_ARG_TYPE_MASK:
+	      atype = bestn->simdclone->args[i].vector_type;
+	      if (bestn->simdclone->mask_mode != VOIDmode)
+		{
+		  /* FORNOW: this is disabled above.  */
+		  gcc_unreachable ();
+		}
+	      else
+		{
+		  tree elt_type = TREE_TYPE (atype);
+		  tree one = fold_convert (elt_type, integer_one_node);
+		  tree zero = fold_convert (elt_type, integer_zero_node);
+		  o = vector_unroll_factor (nunits,
+					    simd_clone_subparts (atype));
+		  for (m = j * o; m < (j + 1) * o; m++)
+		    {
+		      if (simd_clone_subparts (atype)
+			  < simd_clone_subparts (arginfo[i].vectype))
+			{
+			  /* The mask type has fewer elements than simdlen.  */
+
+			  /* FORNOW */
+			  gcc_unreachable ();
+			}
+		      else if (simd_clone_subparts (atype)
+			       == simd_clone_subparts (arginfo[i].vectype))
+			{
+			  /* The SIMD clone function has the same number of
+			     elements as the current function.  */
+			  if (m == 0)
+			    {
+			      vect_get_vec_defs_for_operand (vinfo, stmt_info,
+							     o * ncopies,
+							     op,
+							     &vec_oprnds[i]);
+			      vec_oprnds_i[i] = 0;
+			    }
+			  vec_oprnd0 = vec_oprnds[i][vec_oprnds_i[i]++];
+			  vec_oprnd0
+			    = build3 (VEC_COND_EXPR, atype, vec_oprnd0,
+				      build_vector_from_val (atype, one),
+				      build_vector_from_val (atype, zero));
+			  gassign *new_stmt
+			    = gimple_build_assign (make_ssa_name (atype),
+						   vec_oprnd0);
+			  vect_finish_stmt_generation (vinfo, stmt_info,
+						       new_stmt, gsi);
+			  vargs.safe_push (gimple_assign_lhs (new_stmt));
+			}
+		      else
+			{
+			  /* The mask type has more elements than simdlen.  */
+
+			  /* FORNOW */
+			  gcc_unreachable ();
+			}
+		    }
+		}
+	      break;
 	    case SIMD_CLONE_ARG_TYPE_UNIFORM:
 	      vargs.safe_push (op);
 	      break;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] vect: inbranch SIMD clones
  2022-12-01 13:35         ` Andrew Stubbs
@ 2022-12-01 14:16           ` Jakub Jelinek
  2023-01-06 12:20             ` Andrew Stubbs
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Jelinek @ 2022-12-01 14:16 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Richard Biener, gcc-patches

On Thu, Dec 01, 2022 at 01:35:38PM +0000, Andrew Stubbs wrote:
> > Maybe better add -ffat-lto-objects to dg-additional-options and drop
> > the dg-skip-if (if it works with that, for all similar tests)?
> 
> The tests are already run with -ffat-lto-objects and the test still fails
> (pattern found zero times). I don't know why.
> 
> Aside from that, I've made all the other changes you requested.

Ah, I see what's going on.  You match simdclone, which isn't matched just in
the calls (I bet that is what you actually should/want to count), but also
twice per simd clone definition (and if somebody has, say, a path to a gcc
tree with "simdclone" in the name it could match even more times).

Thus, I think:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
> @@ -0,0 +1,89 @@
> +/* { dg-require-effective-target vect_simd_clones } */
> +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +/* Test that simd inbranch clones work correctly.  */
> +
> +#ifndef TYPE
> +#define TYPE int
> +#endif
> +
> +/* A simple function that will be cloned.  */
> +#pragma omp declare simd
> +TYPE __attribute__((noinline))
> +foo (TYPE a)
> +{
> +  return a + 1;
> +}
> +
> +/* Check that "inbranch" clones are called correctly.  */
> +
> +void __attribute__((noinline))

You should use the noipa attribute instead of noinline on callers
that aren't declare simd (on declare simd it would prevent the cloning
that is essential for the declare simd behavior), so that you don't
get surprises, e.g. from extra IPA-CP etc.

> +masked (TYPE * __restrict a, TYPE * __restrict b, int size)
> +{
> +  #pragma omp simd
> +  for (int i = 0; i < size; i++)
> +    b[i] = a[i]<1 ? foo(a[i]) : a[i];
> +}
> +
> +/* Check that "inbranch" works when there might be unrolling.  */
> +
> +void __attribute__((noinline))

So here too.
> +masked_fixed (TYPE * __restrict a, TYPE * __restrict b)

> +/* Ensure that the in-branch simd clones are used on targets that support
> +   them.  These counts include all calls and definitions.  */
> +
> +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */

Drop lines like the above.

> +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */

And scan-tree-dump-times " = foo.simdclone" 2 "optimized"; I'd think that
should be the right number for all of x86_64, amdgcn and aarch64.  And
please don't forget about i?86-*-* too.

> +/* TODO: aarch64 */

For aarch64, one would need to include it in check_effective_target_vect_simd_clones
first...

Otherwise LGTM.

	Jakub


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] vect: inbranch SIMD clones
  2022-12-01 14:16           ` Jakub Jelinek
@ 2023-01-06 12:20             ` Andrew Stubbs
  2023-02-10  9:11               ` Jakub Jelinek
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Stubbs @ 2023-01-06 12:20 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2115 bytes --]

Here's a new version of the patch.

On 01/12/2022 14:16, Jakub Jelinek wrote:
>> +void __attribute__((noinline))
> 
> You should use the noipa attribute instead of noinline on callers
> that aren't declare simd (on declare simd it would prevent the cloning
> that is essential for the declare simd behavior), so that you don't
> get surprises, e.g. from extra IPA-CP etc.

Fixed.

>> +/* Ensure that the in-branch simd clones are used on targets that support
>> +   them.  These counts include all calls and definitions.  */
>> +
>> +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
> 
> Drop lines like the above.

I don't want to drop the comment because I get so frustrated by 
testcases that fail when something changes and it's not obvious what the 
original author was actually trying to test.

I've tried to fix the -flto thing and I can't figure out how. The 
problem seems to be that there are two dump files from the two compiler 
invocations and it scans the wrong one. Aarch64 has the same problem.

>> +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
>> +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
> 
> And scan-tree-dump-times " = foo.simdclone" 2 "optimized"; I'd think that
> should be the right number for all of x86_64, amdgcn and aarch64.  And
> please don't forget about i?86-*-* too.

I've switched the pattern and changed to using the "vect" dump (instead 
of "optimized") so that the later transformations don't mess up the 
counts. However there are still other reasons why the count varies. It 
might be that those can be turned off by options somehow, but probably 
testing those cases is valuable too. The values are 2, 3, or 4, now, 
instead of 18, so that's an improvement.

> 
>> +/* TODO: aarch64 */
> 
> For aarch64, one would need to include it in check_effective_target_vect_simd_clones
> first...

I've done so and tested it, but that's not included in the patch because 
there were other testcases that started reporting fails. None of the new 
testcases fail for Aarch64.

OK now?

Andrew

[-- Attachment #2: 230106-inbranch-simd-clones.patch --]
[-- Type: text/plain, Size: 40299 bytes --]

vect: inbranch SIMD clones

There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).

This patch adds support for a subset of the possible cases (those using
mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
so there should be no regressions.

This subset should cover all cases needed by amdgcn, at present.

gcc/ChangeLog:

	* internal-fn.cc (expand_MASK_CALL): New.
	* internal-fn.def (MASK_CALL): New.
	* internal-fn.h (expand_MASK_CALL): New prototype.
	* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
	for mask arguments also.
	* tree-if-conv.cc: Include cgraph.h.
	(if_convertible_stmt_p): Allow if-conversion of calls to functions
	with SIMD clones.
	(predicate_statements): Convert functions to IFN_MASK_CALL.
	* tree-vect-loop.cc (vect_get_datarefs_in_loop): Recognise
	IFN_MASK_CALL as a SIMD function call.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
	IFN_MASK_CALL as an inbranch SIMD function call.
	Generate the mask vector arguments.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-simd-clone-16.c: New test.
	* gcc.dg/vect/vect-simd-clone-16b.c: New test.
	* gcc.dg/vect/vect-simd-clone-16c.c: New test.
	* gcc.dg/vect/vect-simd-clone-16d.c: New test.
	* gcc.dg/vect/vect-simd-clone-16e.c: New test.
	* gcc.dg/vect/vect-simd-clone-16f.c: New test.
	* gcc.dg/vect/vect-simd-clone-17.c: New test.
	* gcc.dg/vect/vect-simd-clone-17b.c: New test.
	* gcc.dg/vect/vect-simd-clone-17c.c: New test.
	* gcc.dg/vect/vect-simd-clone-17d.c: New test.
	* gcc.dg/vect/vect-simd-clone-17e.c: New test.
	* gcc.dg/vect/vect-simd-clone-17f.c: New test.
	* gcc.dg/vect/vect-simd-clone-18.c: New test.
	* gcc.dg/vect/vect-simd-clone-18b.c: New test.
	* gcc.dg/vect/vect-simd-clone-18c.c: New test.
	* gcc.dg/vect/vect-simd-clone-18d.c: New test.
	* gcc.dg/vect/vect-simd-clone-18e.c: New test.
	* gcc.dg/vect/vect-simd-clone-18f.c: New test.

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9471f543191..d9e11bfc62a 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4527,3 +4527,10 @@ void
 expand_ASSUME (internal_fn, gcall *)
 {
 }
+
+void
+expand_MASK_CALL (internal_fn, gcall *)
+{
+  /* This IFN should only exist between ifcvt and vect passes.  */
+  gcc_unreachable ();
+}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 61516dab66d..301c3780659 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -466,6 +466,9 @@ DEF_INTERNAL_FN (TRAP, ECF_CONST | ECF_LEAF | ECF_NORETURN
 DEF_INTERNAL_FN (ASSUME, ECF_CONST | ECF_LEAF | ECF_NOTHROW
 			 | ECF_LOOPING_CONST_OR_PURE, NULL)
 
+/* For if-conversion of inbranch SIMD clones.  */
+DEF_INTERNAL_FN (MASK_CALL, ECF_NOVOPS, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_FLT_FLOATN_FN
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 21b1ce43df6..ced92c041bb 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -244,6 +244,7 @@ extern void expand_SHUFFLEVECTOR (internal_fn, gcall *);
 extern void expand_SPACESHIP (internal_fn, gcall *);
 extern void expand_TRAP (internal_fn, gcall *);
 extern void expand_ASSUME (internal_fn, gcall *);
+extern void expand_MASK_CALL (internal_fn, gcall *);
 
 extern bool vectorized_internal_fn_supported_p (internal_fn, tree);
 
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 21d69aa8747..afb7d99747b 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -937,6 +937,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	}
       sc->args[i].orig_type = base_type;
       sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
+      sc->args[i].vector_type = adj.type;
     }
 
   if (node->definition)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
new file mode 100644
index 00000000000..ce9a6dad1b7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd
+TYPE __attribute__((noinline))
+foo (TYPE a)
+{
+  return a + 1;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noipa))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+    b[i] = a[i]<1 ? foo(a[i]) : a[i];
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noipa))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+    b[i] = a[i]<1 ? foo(a[i]) : a[i];
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+    if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1))
+	|| ((TYPE)i >= 1 && b[i] != (TYPE)i))
+      {
+	__builtin_printf ("error at %d\n", i);
+	__builtin_exit (1);
+      }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+    a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+     factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size);
+      check_masked (b, size);
+    }
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size-4);
+      check_masked (b, size-4);
+    }
+
+  return 0;
+}
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target aarch64*-*-* } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
new file mode 100644
index 00000000000..af543b6573d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE float
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { avx_runtime || aarch64*-*-* } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { avx_runtime || aarch64*-*-* } } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
new file mode 100644
index 00000000000..677548a9439
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
@@ -0,0 +1,17 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE short
+#include "vect-simd-clone-16.c"
+
+/* Ensure the the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { x86_64*-*-* || { i686*-*-* || aarch64*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { aarch64*-*-* } } } } */
+
+/* x86_64 fails to use in-branch clones for TYPE=short.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 0 "vect" { target x86_64*-*-* i686*-*-* } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
new file mode 100644
index 00000000000..a9ae9932b30
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
@@ -0,0 +1,17 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE char
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { x86_64*-*-* || { i686*-*-* || aarch64*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { aarch64*-*-* } } } } */
+
+/* x86_64 fails to use in-branch clones for TYPE=char.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 0 "vect" { target x86_64*-*-* i686*-*-* } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
new file mode 100644
index 00000000000..c8b482bf2e7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE double
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.
+   Some targets use pairs of vectors and do twice the calls.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { { ! avx_runtime } && { ! { i686*-*-* && { ! lp64 } } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { avx_runtime && { ! { i686*-*-* && { ! lp64 } } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { i686*-*-* && { ! lp64 } } } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
new file mode 100644
index 00000000000..f42ac082678
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE __INT64_TYPE__
+#include "vect-simd-clone-16.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use pairs of vectors and do twice the calls.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { i686*-*-* && { ! lp64 } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { i686*-*-* && { ! lp64 } } } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
new file mode 100644
index 00000000000..756225e4306
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd uniform(b)
+TYPE __attribute__((noinline))
+foo (TYPE a, TYPE b)
+{
+  return a + b;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noipa))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+    b[i] = a[i]<1 ? foo(a[i], 1) : a[i];
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noipa))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+    b[i] = a[i]<1 ? foo(a[i], 1) : a[i];
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+    if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1))
+	|| ((TYPE)i >= 1 && b[i] != (TYPE)i))
+      {
+	__builtin_printf ("error at %d\n", i);
+	__builtin_exit (1);
+      }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+    a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+     factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size);
+      check_masked (b, size);
+    }
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size-4);
+      check_masked (b, size-4);
+    }
+
+  return 0;
+}
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target aarch64*-*-* } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
new file mode 100644
index 00000000000..8731c268644
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE float
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { avx_runtime || aarch64*-*-* } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { avx_runtime || aarch64*-*-* } } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
new file mode 100644
index 00000000000..6683d1a9cbf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
@@ -0,0 +1,17 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE short
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { x86_64*-*-* || { i686*-*-* || aarch64*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { aarch64*-*-* } } } } */
+
+/* x86_64 fails to use in-branch clones for TYPE=short.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 0 "vect" { target x86_64*-*-* i686*-*-* } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
new file mode 100644
index 00000000000..d38bde6d85e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
@@ -0,0 +1,17 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE char
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { x86_64*-*-* || { i686*-*-* || aarch64*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { aarch64*-*-* } } } } */
+
+/* x86_64 fails to use in-branch clones for TYPE=char.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 0 "vect" { target x86_64*-*-* i686*-*-* } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
new file mode 100644
index 00000000000..f2a428c62c1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE double
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.
+   Some targets use pairs of vectors and do twice the calls.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { { ! avx_runtime } && { ! { i686*-*-* && { ! lp64 } } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { avx_runtime && { ! { i686*-*-* && { ! lp64 } } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { i686*-*-* && { ! lp64 } } } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
new file mode 100644
index 00000000000..cd05dec9632
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE __INT64_TYPE__
+#include "vect-simd-clone-17.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use pairs of vectors and do twice the calls.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { i686*-*-* && { ! lp64 } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { i686*-*-* && { ! lp64 } } } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
new file mode 100644
index 00000000000..febf9fdf85e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd uniform(b)
+TYPE __attribute__((noinline))
+foo (TYPE b, TYPE a)
+{
+  return a + b;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noipa))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+    b[i] = a[i]<1 ? foo(1, a[i]) : a[i];
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noipa))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+    b[i] = a[i]<1 ? foo(1, a[i]) : a[i];
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+    if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1))
+	|| ((TYPE)i >= 1 && b[i] != (TYPE)i))
+      {
+	__builtin_printf ("error at %d\n", i);
+	__builtin_exit (1);
+      }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+    a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+     factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size);
+      check_masked (b, size);
+    }
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+    {
+      masked (a, b, size-4);
+      check_masked (b, size-4);
+    }
+
+  return 0;
+}
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target aarch64*-*-* } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
new file mode 100644
index 00000000000..120993e517a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE float
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { avx_runtime || aarch64*-*-* } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { avx_runtime || aarch64*-*-* } } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
new file mode 100644
index 00000000000..0d1fc6de4e4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
@@ -0,0 +1,17 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE short
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { x86_64*-*-* || { i686*-*-* || aarch64*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { aarch64*-*-* } } } } */
+
+/* x86_64 fails to use in-branch clones for TYPE=short.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 0 "vect" { target x86_64*-*-* i686*-*-* } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
new file mode 100644
index 00000000000..1e6c028fc47
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
@@ -0,0 +1,17 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE char
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { x86_64*-*-* || { i686*-*-* || aarch64*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { aarch64*-*-* } } } } */
+
+/* x86_64 fails to use in-branch clones for TYPE=char.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 0 "vect" { target x86_64*-*-* i686*-*-* } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
new file mode 100644
index 00000000000..9d20e52cb9a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE double
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use another call for the epilogue loops.
+   Some targets use pairs of vectors and do twice the calls.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { { ! avx_runtime } && { ! { i686*-*-* && { ! lp64 } } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 3 "vect" { target { avx_runtime && { ! { i686*-*-* && { ! lp64 } } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { i686*-*-* && { ! lp64 } } } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
new file mode 100644
index 00000000000..09ee7ff60fd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#define TYPE __INT64_TYPE__
+#include "vect-simd-clone-18.c"
+
+/* Ensure that the in-branch simd clones are used on targets that support them.
+   Some targets use pairs of vectors and do twice the calls.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { ! { i686*-*-* && { ! lp64 } } } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { i686*-*-* && { ! lp64 } } } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 64b20b4a9e1..54f687b1172 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -123,6 +123,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dse.h"
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
+#include "cgraph.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -1063,7 +1064,8 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
    - it is a GIMPLE_LABEL or a GIMPLE_COND,
-   - it is builtins call.  */
+   - it is builtins call,
+   - it is a call to a function with a SIMD clone.  */
 
 static bool
 if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
@@ -1083,13 +1085,23 @@ if_convertible_stmt_p (gimple *stmt, vec<data_reference_p> refs)
 	tree fndecl = gimple_call_fndecl (stmt);
 	if (fndecl)
 	  {
+	    /* We can vectorize some builtins and functions with SIMD
+	       "inbranch" clones.  */
 	    int flags = gimple_call_flags (stmt);
+	    struct cgraph_node *node = cgraph_node::get (fndecl);
 	    if ((flags & ECF_CONST)
 		&& !(flags & ECF_LOOPING_CONST_OR_PURE)
-		/* We can only vectorize some builtins at the moment,
-		   so restrict if-conversion to those.  */
 		&& fndecl_built_in_p (fndecl))
 	      return true;
+	    if (node && node->simd_clones != NULL)
+	      /* Ensure that at least one clone can be "inbranch".  */
+	      for (struct cgraph_node *n = node->simd_clones; n != NULL;
+		   n = n->simdclone->next_clone)
+		if (n->simdclone->inbranch)
+		  {
+		    need_to_predicate = true;
+		    return true;
+		  }
 	  }
 	return false;
       }
@@ -2603,6 +2615,29 @@ predicate_statements (loop_p loop)
 	      gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
 	      update_stmt (stmt);
 	    }
+
+	  /* Convert functions that have a SIMD clone to IFN_MASK_CALL.  This
+	     will cause the vectorizer to match the "in branch" clone variants,
+	     and serves to build the mask vector in a natural way.  */
+	  gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
+	  if (call && !gimple_call_internal_p (call))
+	    {
+	      tree orig_fn = gimple_call_fn (call);
+	      int orig_nargs = gimple_call_num_args (call);
+	      auto_vec<tree> args;
+	      args.safe_push (orig_fn);
+	      for (int i = 0; i < orig_nargs; i++)
+		args.safe_push (gimple_call_arg (call, i));
+	      args.safe_push (cond);
+
+	      /* Replace the call with an IFN_MASK_CALL that has the extra
+		 condition parameter.  */
+	      gcall *new_call = gimple_build_call_internal_vec (IFN_MASK_CALL,
+								args);
+	      gimple_call_set_lhs (new_call, gimple_call_lhs (call));
+	      gsi_replace (&gsi, new_call, true);
+	    }
+
 	  lhs = gimple_get_lhs (gsi_stmt (gsi));
 	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
 	    ssa_names.add (lhs);
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index aacbb12580c..fce5738aa58 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2121,6 +2121,14 @@ vect_get_datarefs_in_loop (loop_p loop, basic_block *bbs,
 	    if (is_gimple_call (stmt) && loop->safelen)
 	      {
 		tree fndecl = gimple_call_fndecl (stmt), op;
+		if (fndecl == NULL_TREE
+		    && gimple_call_internal_p (stmt, IFN_MASK_CALL))
+		  {
+		    fndecl = gimple_call_arg (stmt, 0);
+		    gcc_checking_assert (TREE_CODE (fndecl) == ADDR_EXPR);
+		    fndecl = TREE_OPERAND (fndecl, 0);
+		    gcc_checking_assert (TREE_CODE (fndecl) == FUNCTION_DECL);
+		  }
 		if (fndecl != NULL_TREE)
 		  {
 		    cgraph_node *node = cgraph_node::get (fndecl);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 5485da58b38..48048938291 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3987,6 +3987,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
   size_t i, nargs;
   tree lhs, rtype, ratype;
   vec<constructor_elt, va_gc> *ret_ctor_elts = NULL;
+  int arg_offset = 0;
 
   /* Is STMT a vectorizable call?   */
   gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt);
@@ -3994,6 +3995,15 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     return false;
 
   fndecl = gimple_call_fndecl (stmt);
+  if (fndecl == NULL_TREE
+      && gimple_call_internal_p (stmt, IFN_MASK_CALL))
+    {
+      fndecl = gimple_call_arg (stmt, 0);
+      gcc_checking_assert (TREE_CODE (fndecl) == ADDR_EXPR);
+      fndecl = TREE_OPERAND (fndecl, 0);
+      gcc_checking_assert (TREE_CODE (fndecl) == FUNCTION_DECL);
+      arg_offset = 1;
+    }
   if (fndecl == NULL_TREE)
     return false;
 
@@ -4024,7 +4034,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     return false;
 
   /* Process function arguments.  */
-  nargs = gimple_call_num_args (stmt);
+  nargs = gimple_call_num_args (stmt) - arg_offset;
 
   /* Bail out if the function has zero arguments.  */
   if (nargs == 0)
@@ -4042,7 +4052,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       thisarginfo.op = NULL_TREE;
       thisarginfo.simd_lane_linear = false;
 
-      op = gimple_call_arg (stmt, i);
+      op = gimple_call_arg (stmt, i + arg_offset);
       if (!vect_is_simple_use (op, vinfo, &thisarginfo.dt,
 			       &thisarginfo.vectype)
 	  || thisarginfo.dt == vect_uninitialized_def)
@@ -4057,16 +4067,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	  || thisarginfo.dt == vect_external_def)
 	gcc_assert (thisarginfo.vectype == NULL_TREE);
       else
-	{
-	  gcc_assert (thisarginfo.vectype != NULL_TREE);
-	  if (VECTOR_BOOLEAN_TYPE_P (thisarginfo.vectype))
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "vector mask arguments are not supported\n");
-	      return false;
-	    }
-	}
+	gcc_assert (thisarginfo.vectype != NULL_TREE);
 
       /* For linear arguments, the analyze phase should have saved
 	 the base and step in STMT_VINFO_SIMD_CLONE_INFO.  */
@@ -4159,9 +4160,6 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	if (target_badness < 0)
 	  continue;
 	this_badness += target_badness * 512;
-	/* FORNOW: Have to add code to add the mask argument.  */
-	if (n->simdclone->inbranch)
-	  continue;
 	for (i = 0; i < nargs; i++)
 	  {
 	    switch (n->simdclone->args[i].arg_type)
@@ -4169,7 +4167,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	      case SIMD_CLONE_ARG_TYPE_VECTOR:
 		if (!useless_type_conversion_p
 			(n->simdclone->args[i].orig_type,
-			 TREE_TYPE (gimple_call_arg (stmt, i))))
+			 TREE_TYPE (gimple_call_arg (stmt, i + arg_offset))))
 		  i = -1;
 		else if (arginfo[i].dt == vect_constant_def
 			 || arginfo[i].dt == vect_external_def
@@ -4199,7 +4197,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		i = -1;
 		break;
 	      case SIMD_CLONE_ARG_TYPE_MASK:
-		gcc_unreachable ();
+		break;
 	      }
 	    if (i == (size_t) -1)
 	      break;
@@ -4225,18 +4223,55 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     return false;
 
   for (i = 0; i < nargs; i++)
-    if ((arginfo[i].dt == vect_constant_def
-	 || arginfo[i].dt == vect_external_def)
-	&& bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
-      {
-	tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i));
-	arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
-							  slp_node);
-	if (arginfo[i].vectype == NULL
-	    || !constant_multiple_p (bestn->simdclone->simdlen,
-				     simd_clone_subparts (arginfo[i].vectype)))
+    {
+      if ((arginfo[i].dt == vect_constant_def
+	   || arginfo[i].dt == vect_external_def)
+	  && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
+	{
+	  tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i + arg_offset));
+	  arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
+							    slp_node);
+	  if (arginfo[i].vectype == NULL
+	      || !constant_multiple_p (bestn->simdclone->simdlen,
+				       simd_clone_subparts (arginfo[i].vectype)))
+	    return false;
+	}
+
+      if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR
+	  && VECTOR_BOOLEAN_TYPE_P (bestn->simdclone->args[i].vector_type))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "vector mask arguments are not supported.\n");
 	  return false;
-      }
+	}
+
+      if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK
+	  && bestn->simdclone->mask_mode == VOIDmode
+	  && (simd_clone_subparts (bestn->simdclone->args[i].vector_type)
+	      != simd_clone_subparts (arginfo[i].vectype)))
+	{
+	  /* FORNOW we only have partial support for vector-type masks that
+	     can't hold all of simdlen.  */
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+			     vect_location,
+			     "in-branch vector clones are not yet"
+			     " supported for mismatched vector sizes.\n");
+	  return false;
+	}
+      if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK
+	  && bestn->simdclone->mask_mode != VOIDmode)
+	{
+	  /* FORNOW don't support integer-type masks.  */
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+			     vect_location,
+			     "in-branch vector clones are not yet"
+			     " supported for integer mask modes.\n");
+	  return false;
+	}
+    }
 
   fndecl = bestn->decl;
   nunits = bestn->simdclone->simdlen;
@@ -4326,7 +4361,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	{
 	  unsigned int k, l, m, o;
 	  tree atype;
-	  op = gimple_call_arg (stmt, i);
+	  op = gimple_call_arg (stmt, i + arg_offset);
 	  switch (bestn->simdclone->args[i].arg_type)
 	    {
 	    case SIMD_CLONE_ARG_TYPE_VECTOR:
@@ -4425,6 +4460,65 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 		    }
 		}
 	      break;
+	    case SIMD_CLONE_ARG_TYPE_MASK:
+	      atype = bestn->simdclone->args[i].vector_type;
+	      if (bestn->simdclone->mask_mode != VOIDmode)
+		{
+		  /* FORNOW: this is disabled above.  */
+		  gcc_unreachable ();
+		}
+	      else
+		{
+		  tree elt_type = TREE_TYPE (atype);
+		  tree one = fold_convert (elt_type, integer_one_node);
+		  tree zero = fold_convert (elt_type, integer_zero_node);
+		  o = vector_unroll_factor (nunits,
+					    simd_clone_subparts (atype));
+		  for (m = j * o; m < (j + 1) * o; m++)
+		    {
+		      if (simd_clone_subparts (atype)
+			  < simd_clone_subparts (arginfo[i].vectype))
+			{
+			  /* The mask type has fewer elements than simdlen.  */
+
+			  /* FORNOW */
+			  gcc_unreachable ();
+			}
+		      else if (simd_clone_subparts (atype)
+			       == simd_clone_subparts (arginfo[i].vectype))
+			{
+			  /* The SIMD clone function has the same number of
+			     elements as the current function.  */
+			  if (m == 0)
+			    {
+			      vect_get_vec_defs_for_operand (vinfo, stmt_info,
+							     o * ncopies,
+							     op,
+							     &vec_oprnds[i]);
+			      vec_oprnds_i[i] = 0;
+			    }
+			  vec_oprnd0 = vec_oprnds[i][vec_oprnds_i[i]++];
+			  vec_oprnd0
+			    = build3 (VEC_COND_EXPR, atype, vec_oprnd0,
+				      build_vector_from_val (atype, one),
+				      build_vector_from_val (atype, zero));
+			  gassign *new_stmt
+			    = gimple_build_assign (make_ssa_name (atype),
+						   vec_oprnd0);
+			  vect_finish_stmt_generation (vinfo, stmt_info,
+						       new_stmt, gsi);
+			  vargs.safe_push (gimple_assign_lhs (new_stmt));
+			}
+		      else
+			{
+			  /* The mask type has more elements than simdlen.  */
+
+			  /* FORNOW */
+			  gcc_unreachable ();
+			}
+		    }
+		}
+	      break;
 	    case SIMD_CLONE_ARG_TYPE_UNIFORM:
 	      vargs.safe_push (op);
 	      break;

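
As an illustrative aside for readers of the SIMD_CLONE_ARG_TYPE_MASK hunk
above (not part of the patch): the VEC_COND_EXPR built there widens each
boolean lane of the truth vector into the clone's mask element type as 1
(active) or 0 (inactive).  A scalar C model of that per-lane widening, with
long long standing in for the vector element type, might look like this:

```c
/* Scalar model (illustration only) of the VEC_COND_EXPR mask widening
   performed for SIMD_CLONE_ARG_TYPE_MASK: an active lane yields 1 and
   an inactive lane yields 0 in the clone's mask element type.  */
static long long
mask_lane (int lane_active)
{
  return lane_active ? 1LL : 0LL;
}

/* Widen a whole truth "vector", lane by lane, as the generated
   VEC_COND_EXPR does for the real vector operand.  */
static void
widen_mask (const int *cond, long long *mask, int n)
{
  for (int i = 0; i < n; i++)
    mask[i] = mask_lane (cond[i]);
}
```

In the patch the same effect is achieved in one statement per vector, via
build_vector_from_val of one and zero as the two VEC_COND_EXPR arms.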
^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] vect: inbranch SIMD clones
  2023-01-06 12:20             ` Andrew Stubbs
@ 2023-02-10  9:11               ` Jakub Jelinek
  2023-02-23 10:02                 ` Andrew Stubbs
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Jelinek @ 2023-02-10  9:11 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Richard Biener, gcc-patches

On Fri, Jan 06, 2023 at 12:20:33PM +0000, Andrew Stubbs wrote:
> > > +/* Ensure that the in-branch simd clones are used on targets that support
> > > +   them.  These counts include all call and definitions.  */
> > > +
> > > +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
> > 
> > Drop lines line above.
> 
> I don't want to drop the comment because I get so frustrated by testcases
> that fail when something changes and it's not obvious what the original
> author was actually trying to test.
> 
> I've tried to fix the -flto thing and I can't figure out how. The problem
> seems to be that there are two dump files from the two compiler invocations
> and it scans the wrong one. Aarch64 has the same problem.

There are two dumps because it is a dg-do run test.
I think it would be better to separate it, have for all cases one
test with defaulted dg-do (in vect.exp that is either dg-do run or dg-do
compile:
# If the target system supports vector instructions, the default action
# for a test is 'run', otherwise it's 'compile'.
) without the dg-final and then another one with the same TYPE which would
be forcibly dg-do compile with dg-final and
dg-additional-options "-ffat-lto-objects", then you get a single dump only.

> > > +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
> > 
> > And scan-tree-dump-times " = foo.simdclone" 2 "optimized"; I'd think that
> > should be the right number for all of x86_64, amdgcn and aarch64.  And
> > please don't forget about i?86-*-* too.
> 
> I've switched the pattern and changed to using the "vect" dump (instead of
> "optimized") so that the later transformations don't mess up the counts.
> However there are still other reasons why the count varies. It might be that
> those can be turned off by options somehow, but probably testing those cases
> is valuable too. The values are 2, 3, or 4, now, instead of 18, so that's an
> improvement.

But it still varies between the architectures, so it is an extra maintenance
nightmare.

> > > +/* TODO: aarch64 */
> > 
> > For aarch64, one would need to include it in check_effective_target_vect_simd_clones
> > first...
> 
> I've done so and tested it, but that's not included in the patch because
> there were other testcases that started reporting fails. None of the new
> testcases fail for Aarch64.

Sure, that would be for a separate patch.

Anyway, if you want, commit the patch as is and tweak the testcases if
possible incrementally.

	Jakub


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] vect: inbranch SIMD clones
  2023-02-10  9:11               ` Jakub Jelinek
@ 2023-02-23 10:02                 ` Andrew Stubbs
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Stubbs @ 2023-02-23 10:02 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, gcc-patches

On 10/02/2023 09:11, Jakub Jelinek wrote:
>> I've tried to fix the -flto thing and I can't figure out how. The problem
>> seems to be that there are two dump files from the two compiler invocations
>> and it scans the wrong one. Aarch64 has the same problem.
> 
> There are two dumps because it is a dg-do run test.
> I think it would be better to separate it, have for all cases one
> test with defaulted dg-do (in vect.exp that is either dg-do run or dg-do
> compile:
> # If the target system supports vector instructions, the default action
> # for a test is 'run', otherwise it's 'compile'.
> ) without the dg-final and then another one with the same TYPE which would
> be forcibly dg-do compile with dg-final and
> dg-additional-options "-ffat-lto-objects", then you get a single dump only.

If I change the testcase to "dg-do compile" then it does indeed only 
produce one dump, but it's still the wrong one.
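For reference, Jakub's suggested split would look something like the sketch
below. This is only an illustration: the twin file's name and the use of
#include to share the test body are assumptions (though #include'ing a base
test is a common testsuite idiom), and the expected count still varies by
target, as discussed above.

```c
/* vect-simd-clone-16.c -- defaulted dg-do: vect.exp runs it where the
   target supports vector instructions, otherwise compiles it.  No
   dg-final scan here; this copy only checks correctness at runtime.  */

/* vect-simd-clone-16a.c -- hypothetical compile-only twin that carries
   the dump scan; sharing the body via #include is assumed.  */
/* { dg-do compile } */
/* { dg-additional-options "-ffat-lto-objects" } */
#include "vect-simd-clone-16.c"
/* { dg-final { scan-tree-dump-times " = foo.simdclone" 2 "vect" } } */
```

With dg-do compile plus -ffat-lto-objects only one compiler invocation
produces a vect dump, so there is a single candidate file for the scan.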

The command it runs is this (I removed some noise):

   x86_64-none-linux-gnu-gcc vect-simd-clone-16.c -flto -ffat-lto-objects \
               -msse2 -ftree-vectorize -fno-tree-loop-distribute-patterns \
               -fno-vect-cost-model -fno-common -O2 \
               -fdump-tree-vect-details -fopenmp-simd -mavx

With "-S" (dg-do compile) I get

   vect-simd-clone-16.c.172t.vect

Otherwise (dg-do run) I get

   a-vect-simd-clone-16.c.172t.vect
   a.ltrans0.ltrans.172t.vect

The "ltrans0" dump has the "foo.simdclone" output that we're looking 
for, but dejagnu appears to be scanning the other, which does not. The 
filenames vary between the two commands, but the contents are identical.

>>>> +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */
>>>> +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */
>>>
>>> And scan-tree-dump-times " = foo.simdclone" 2 "optimized"; I'd think that
>>> should be the right number for all of x86_64, amdgcn and aarch64.  And
>>> please don't forget about i?86-*-* too.
>>
>> I've switched the pattern and changed to using the "vect" dump (instead of
>> "optimized") so that the later transformations don't mess up the counts.
>> However there are still other reasons why the count varies. It might be that
>> those can be turned off by options somehow, but probably testing those cases
>> is valuable too. The values are now 2, 3, or 4 instead of 18, so that's an
>> improvement.
> 
> But it still varies between the architectures, so it is an extra maintenance
> nightmare.
> 
>>>> +/* TODO: aarch64 */
>>>
>>> For aarch64, one would need to include it in check_effective_target_vect_simd_clones
>>> first...
>>
>> I've done so and tested it, but that's not included in the patch because
>> there were other testcases that started reporting fails. None of the new
>> testcases fail for Aarch64.
> 
> Sure, that would be for a separate patch.
> 
> Anyway, if you want, commit the patch as is and tweak the testcases if
> possible incrementally.

I will do so now. It would be nice to fix the testcase oddities, but I 
don't know how.

I wrote the above yesterday, but apparently the email didn't send ... 
since then some bugs have been reported. I'll try to investigate today, 
although I think Richi has a fix already.

Thanks

Andrew


end of thread, other threads:[~2023-02-23 10:03 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-09 13:23 [PATCH 0/3] OpenMP SIMD routines Andrew Stubbs
2022-08-09 13:23 ` [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors Andrew Stubbs
2022-08-26 11:04   ` Jakub Jelinek
2022-08-30 14:52     ` Andrew Stubbs
2022-08-30 16:54       ` Rainer Orth
2022-08-31  7:11         ` Martin Liška
2022-08-31  8:29         ` Jakub Jelinek
2022-08-31  8:35           ` Andrew Stubbs
2022-08-09 13:23 ` [PATCH 2/3] amdgcn: OpenMP SIMD routine support Andrew Stubbs
2022-08-30 14:53   ` Andrew Stubbs
2022-08-09 13:23 ` [PATCH 3/3] vect: inbranch SIMD clones Andrew Stubbs
2022-09-09 14:31   ` Jakub Jelinek
2022-09-14  8:09     ` Richard Biener
2022-09-14  8:34       ` Jakub Jelinek
2022-11-30 15:17     ` Andrew Stubbs
2022-11-30 15:37       ` Jakub Jelinek
2022-12-01 13:35         ` Andrew Stubbs
2022-12-01 14:16           ` Jakub Jelinek
2023-01-06 12:20             ` Andrew Stubbs
2023-02-10  9:11               ` Jakub Jelinek
2023-02-23 10:02                 ` Andrew Stubbs
