public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 0/2] Allow vec_duplicate_optab to fail
@ 2021-06-05 15:18 H.J. Lu
  2021-06-05 15:18 ` [PATCH v2 1/2] " H.J. Lu
  2021-06-05 15:18 ` [PATCH v2 2/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast H.J. Lu
  0 siblings, 2 replies; 11+ messages in thread
From: H.J. Lu @ 2021-06-05 15:18 UTC
  To: gcc-patches; +Cc: Uros Bizjak, Jakub Jelinek, Richard Sandiford, Richard Biener

We'd like to add vec_duplicate_optab support to the x86 backend.  There are
three ways to broadcast an integer constant:

1. Load the full size from constant pool directly.
2. Use AVX2/AVX512 broadcast instruction.
3. Emulate broadcast with SSE2 unpack and shuffle instructions.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

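shows that broadcast is the fastest of the three on an Intel Core i7-8559U
(lower is better; about 18% below the constant-pool load and about 29%
below the SSE2 emulation):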

$ make
gcc -g -I. -O2   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory      : 147215
broadcast   : 121213
vec_dup_sse2: 171366
$

broadcast is also smaller:

$ size memory.o broadcast.o
   text	   data	    bss	    dec	    hex	filename
    132	      0	      0	    132	     84	memory.o
    122	      0	      0	    122	     7a	broadcast.o
$

The preferred order is therefore:

1. Use AVX2/AVX512 broadcast instruction.
2. Load the full size from constant pool directly.
3. Emulate broadcast with SSE2 unpack and shuffle instructions.

The first patch updates the vec_duplicate_optab usage to allow it to fail,
so that the x86 backend can opt out of SSE2 broadcast emulation for integer
constants.

The second patch adds a vec_duplicate<mode> expander and updates the move
expanders to convert CONST_WIDE_INT and CONST_VECTOR operands to a vector
broadcast from an integer with AVX2.

H.J. Lu (2):
  Allow vec_duplicate_optab to fail
  x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

 gcc/config/i386/i386-expand.c                 | 216 +++++++++++++++++-
 gcc/config/i386/i386-protos.h                 |   3 +
 gcc/config/i386/i386.c                        |  31 +++
 gcc/config/i386/sse.md                        |  19 ++
 gcc/doc/md.texi                               |   2 -
 gcc/expr.c                                    |  10 +-
 .../i386/avx512f-broadcast-pr87767-1.c        |   7 +-
 .../i386/avx512f-broadcast-pr87767-5.c        |   5 +-
 .../gcc.target/i386/avx512f_cond_move.c       |   4 +-
 .../i386/avx512vl-broadcast-pr87767-1.c       |  12 +-
 .../i386/avx512vl-broadcast-pr87767-5.c       |   9 +-
 gcc/testsuite/gcc.target/i386/pr100865-1.c    |  13 ++
 gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
 gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-2.c    |  14 ++
 gcc/testsuite/gcc.target/i386/pr100865-3.c    |  15 ++
 gcc/testsuite/gcc.target/i386/pr100865-4a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-4b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-5a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-5b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-6a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-6b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-7a.c   |  17 ++
 gcc/testsuite/gcc.target/i386/pr100865-7b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-8a.c   |  24 ++
 gcc/testsuite/gcc.target/i386/pr100865-8b.c   |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-9a.c   |  25 ++
 gcc/testsuite/gcc.target/i386/pr100865-9b.c   |   7 +
 28 files changed, 534 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9b.c

-- 
2.31.1



* [PATCH v2 1/2] Allow vec_duplicate_optab to fail
  2021-06-05 15:18 [PATCH v2 0/2] Allow vec_duplicate_optab to fail H.J. Lu
@ 2021-06-05 15:18 ` H.J. Lu
  2021-06-07  7:12   ` Richard Sandiford
  2021-06-05 15:18 ` [PATCH v2 2/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast H.J. Lu
  1 sibling, 1 reply; 11+ messages in thread
From: H.J. Lu @ 2021-06-05 15:18 UTC
  To: gcc-patches; +Cc: Uros Bizjak, Jakub Jelinek, Richard Sandiford, Richard Biener

Update vec_duplicate to allow it to fail so that a backend can restrict
broadcasting an integer constant to a vector to the cases where a
broadcast instruction is available.
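
The control flow this change enables can be sketched outside GCC.  The
following is only an illustration, not GCC code: try_broadcast,
store_uniform and Path are hypothetical names, and a std::vector stands in
for the vector register.  The point is the shape of the caller: finish
early when the expander succeeds, otherwise fall through to the generic
path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Which path filled the vector; purely illustrative.
enum class Path { Broadcast, ElementWise };

// Hypothetical stand-in for a target's vec_duplicate expander.  Returning
// false plays the role of FAIL: nothing was emitted and the caller must
// handle the operation some other way.
bool try_broadcast (bool target_has_broadcast, int value,
                    std::vector<int> &out)
{
  if (!target_has_broadcast)
    return false;
  out.assign (out.size (), value);
  return true;
}

// Hypothetical stand-in for the caller.  It mirrors the shape of the
// store_constructor change: stop when the expander succeeds, otherwise
// fall through to the generic element-by-element path.
Path store_uniform (bool target_has_broadcast, int value,
                    std::vector<int> &out)
{
  assert (!out.empty ());
  if (try_broadcast (target_has_broadcast, value, out))
    return Path::Broadcast;
  for (std::size_t i = 0; i < out.size (); ++i)
    out[i] = value;
  return Path::ElementWise;
}
```

Either way the vector ends up filled with the same value; only the path
taken differs, which is what lets a backend decline without breaking the
caller.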

	* expr.c (store_constructor): Replace expand_insn with
	maybe_expand_insn for vec_duplicate_optab.
	* doc/md.texi: Update vec_duplicate.
---
 gcc/doc/md.texi |  2 --
 gcc/expr.c      | 10 ++++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844cc..e66c41c4779 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5077,8 +5077,6 @@ the mode appropriate for one element of @var{m}.
 This pattern only handles duplicates of non-constant inputs.  Constant
 vectors go through the @code{mov@var{m}} pattern instead.
 
-This pattern is not allowed to @code{FAIL}.
-
 @cindex @code{vec_series@var{m}} instruction pattern
 @item @samp{vec_series@var{m}}
 Initialize vector output operand 0 so that element @var{i} is equal to
diff --git a/gcc/expr.c b/gcc/expr.c
index e4660f0e90a..3107c32f259 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7075,10 +7075,12 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
 	    class expand_operand ops[2];
 	    create_output_operand (&ops[0], target, mode);
 	    create_input_operand (&ops[1], expand_normal (elt), eltmode);
-	    expand_insn (icode, 2, ops);
-	    if (!rtx_equal_p (target, ops[0].value))
-	      emit_move_insn (target, ops[0].value);
-	    break;
+	    if (maybe_expand_insn (icode, 2, ops))
+	      {
+		if (!rtx_equal_p (target, ops[0].value))
+		  emit_move_insn (target, ops[0].value);
+		break;
+	      }
 	  }
 
 	n_elts = TYPE_VECTOR_SUBPARTS (type);
-- 
2.31.1



* [PATCH v2 2/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast
  2021-06-05 15:18 [PATCH v2 0/2] Allow vec_duplicate_optab to fail H.J. Lu
  2021-06-05 15:18 ` [PATCH v2 1/2] " H.J. Lu
@ 2021-06-05 15:18 ` H.J. Lu
  1 sibling, 0 replies; 11+ messages in thread
From: H.J. Lu @ 2021-06-05 15:18 UTC
  To: gcc-patches; +Cc: Uros Bizjak, Jakub Jelinek, Richard Sandiford, Richard Biener

1. Update the move expanders to convert CONST_WIDE_INT and CONST_VECTOR
operands to a vector broadcast from an integer with AVX2.
2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
won't increase the stack alignment requirement and which blocks
transformation by the combine pass.
3. Add a vec_duplicate<mode> expander.
4. Update the PR 87767 tests to expect an integer broadcast instead of a
broadcast from memory.
5. Update avx512f_cond_move.c to expect an integer broadcast.
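
The constant test behind item 1 can be sketched in standalone C++.  This
is a simplified model rather than the patch's code: repeated_chunk and
narrowest_broadcast_width are hypothetical names, plain 64-bit integers
replace wide_int, and a 64-bit target is assumed.  It checks whether one
64-bit element is a narrower chunk repeated, trying 8, 16, 32 and 64 bits
in the order the patch tries QImode, HImode, SImode and DImode.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper modelled on the patch's ix86_broadcast.  Return true
// if the 64-bit value V is one WIDTH-bit chunk repeated, and store that
// chunk, sign-extended to 64 bits, in CHUNK.
bool repeated_chunk (std::uint64_t v, unsigned width, std::int64_t &chunk)
{
  assert (width == 8 || width == 16 || width == 32 || width == 64);
  const std::uint64_t mask
    = width == 64 ? ~std::uint64_t (0) : (std::uint64_t (1) << width) - 1;
  const std::uint64_t first = v & mask;
  for (unsigned i = width; i < 64; i += width)
    if (((v >> i) & mask) != first)
      return false;
  // Sign-extend the WIDTH-bit chunk; the final conversion relies on the
  // usual two's-complement behaviour.
  const std::uint64_t sign = std::uint64_t (1) << (width - 1);
  chunk = static_cast<std::int64_t> ((first ^ sign) - sign);
  return true;
}

// Hypothetical helper: the narrowest element width, in bits, whose
// repetition yields V.  Returns 0 when only a 64-bit chunk matches and it
// does not fit a sign-extended 32-bit immediate, the case the patch
// rejects.
unsigned narrowest_broadcast_width (std::uint64_t v, std::int64_t &chunk)
{
  static const unsigned widths[] = { 8, 16, 32 };
  for (unsigned w : widths)
    if (repeated_chunk (v, w, chunk))
      return w;
  repeated_chunk (v, 64, chunk);  // A single 64-bit chunk always matches.
  if (chunk < INT32_MIN || chunk > INT32_MAX)
    return 0;
  return 64;
}
```

For a CONST_WIDE_INT wider than 64 bits, the patch additionally requires
every remaining 64-bit element to equal the first; the sketch omits that
step.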

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

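shows that broadcast is the fastest of the three on an Intel Core i7-8559U
(lower is better; about 18% below the constant-pool load and about 29%
below the SSE2 emulation):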

$ make
gcc -g -I. -O2   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory      : 147215
broadcast   : 121213
vec_dup_sse2: 171366
$

broadcast is also smaller:

$ size memory.o broadcast.o
   text	   data	    bss	    dec	    hex	filename
    132	      0	      0	    132	     84	memory.o
    122	      0	      0	    122	     7a	broadcast.o
$

gcc/

	PR target/100865
	* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
	New prototype.
	(ix86_broadcast): New function.
	(ix86_convert_const_wide_int_to_broadcast): Likewise.
	(ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode
	size is 16 bytes or bigger.
	(ix86_broadcast_from_integer_constant): New function.
	(ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR
	to broadcast if mode size is 16 bytes or bigger.
	(ix86_expand_integer_vec_duplicate): New function.
	* config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
	prototype.
	(ix86_expand_integer_vec_duplicate): Likewise.
	* config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function.
	* config/i386/sse.md (INT_BROADCAST_MODE): New mode iterator.
	(vec_duplicate<mode>): New expander.

gcc/testsuite/

	PR target/100865
	* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
	broadcast.
	* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
	* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
	* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
	* gcc.target/i386/avx512f_cond_move.c: Also pass
	-mprefer-vector-width=512 and expect integer broadcast.
	* gcc.target/i386/pr100865-1.c: New test.
	* gcc.target/i386/pr100865-2.c: Likewise.
	* gcc.target/i386/pr100865-3.c: Likewise.
	* gcc.target/i386/pr100865-4a.c: Likewise.
	* gcc.target/i386/pr100865-4b.c: Likewise.
	* gcc.target/i386/pr100865-5a.c: Likewise.
	* gcc.target/i386/pr100865-5b.c: Likewise.
	* gcc.target/i386/pr100865-6a.c: Likewise.
	* gcc.target/i386/pr100865-6b.c: Likewise.
	* gcc.target/i386/pr100865-7a.c: Likewise.
	* gcc.target/i386/pr100865-7b.c: Likewise.
	* gcc.target/i386/pr100865-8a.c: Likewise.
	* gcc.target/i386/pr100865-8b.c: Likewise.
	* gcc.target/i386/pr100865-9a.c: Likewise.
	* gcc.target/i386/pr100865-9b.c: Likewise.
	* gcc.target/i386/pr100865-10a.c: Likewise.
	* gcc.target/i386/pr100865-10b.c: Likewise.
---
 gcc/config/i386/i386-expand.c                 | 216 +++++++++++++++++-
 gcc/config/i386/i386-protos.h                 |   3 +
 gcc/config/i386/i386.c                        |  31 +++
 gcc/config/i386/sse.md                        |  19 ++
 .../i386/avx512f-broadcast-pr87767-1.c        |   7 +-
 .../i386/avx512f-broadcast-pr87767-5.c        |   5 +-
 .../gcc.target/i386/avx512f_cond_move.c       |   4 +-
 .../i386/avx512vl-broadcast-pr87767-1.c       |  12 +-
 .../i386/avx512vl-broadcast-pr87767-5.c       |   9 +-
 gcc/testsuite/gcc.target/i386/pr100865-1.c    |  13 ++
 gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
 gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-2.c    |  14 ++
 gcc/testsuite/gcc.target/i386/pr100865-3.c    |  15 ++
 gcc/testsuite/gcc.target/i386/pr100865-4a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-4b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-5a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-5b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-6a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-6b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-7a.c   |  17 ++
 gcc/testsuite/gcc.target/i386/pr100865-7b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-8a.c   |  24 ++
 gcc/testsuite/gcc.target/i386/pr100865-8b.c   |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-9a.c   |  25 ++
 gcc/testsuite/gcc.target/i386/pr100865-9b.c   |   7 +
 26 files changed, 528 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9b.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 804cb596867..04361cb331e 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -93,6 +93,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "i386-builtins.h"
 #include "i386-expand.h"
 
+static bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx,
+					       rtx);
+
 /* Split one or more double-mode RTL references into pairs of half-mode
    references.  The RTL can be REG, offsettable MEM, integer constant, or
    CONST_DOUBLE.  "operands" is a pointer to an array of double-mode RTLs to
@@ -190,6 +193,88 @@ ix86_expand_clear (rtx dest)
   emit_insn (tmp);
 }
 
+/* Return true if V can be broadcasted from an integer of WIDTH bits
+   which is returned in VAL_BROADCAST.  Otherwise, return false.  */
+
+static bool
+ix86_broadcast (HOST_WIDE_INT v, unsigned int width,
+		HOST_WIDE_INT &val_broadcast)
+{
+  wide_int val = wi::uhwi (v, HOST_BITS_PER_WIDE_INT);
+  val_broadcast = wi::extract_uhwi (val, 0, width);
+  for (unsigned int i = width; i < HOST_BITS_PER_WIDE_INT; i += width)
+    {
+      HOST_WIDE_INT each = wi::extract_uhwi (val, i, width);
+      if (val_broadcast != each)
+	return false;
+    }
+  val_broadcast = sext_hwi (val_broadcast, width);
+  return true;
+}
+
+/* Convert the CONST_WIDE_INT operand OP to broadcast in MODE.  */
+
+static rtx
+ix86_convert_const_wide_int_to_broadcast (machine_mode mode, rtx op)
+{
+  /* Don't use integer vector broadcast if we can't move from GPR to SSE
+     register directly.  */
+  if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
+    return nullptr;
+
+  /* Convert CONST_WIDE_INT to a non-standard SSE constant integer
+     broadcast only if vector broadcast is available.  */
+  if (!TARGET_AVX2
+      || !CONST_WIDE_INT_P (op)
+      || standard_sse_constant_p (op, mode))
+    return nullptr;
+
+  HOST_WIDE_INT val = CONST_WIDE_INT_ELT (op, 0);
+  HOST_WIDE_INT val_broadcast;
+  scalar_int_mode broadcast_mode;
+  if (ix86_broadcast (val, GET_MODE_BITSIZE (QImode),
+		      val_broadcast))
+    broadcast_mode = QImode;
+  else if (ix86_broadcast (val, GET_MODE_BITSIZE (HImode),
+			   val_broadcast))
+    broadcast_mode = HImode;
+  else if (ix86_broadcast (val, GET_MODE_BITSIZE (SImode),
+			   val_broadcast))
+    broadcast_mode = SImode;
+  else if (TARGET_64BIT
+	   && ix86_broadcast (val, GET_MODE_BITSIZE (DImode),
+			      val_broadcast))
+    {
+      /* NB: MOVQ takes a 32-bit signed immediate operand.  */
+      if (trunc_int_for_mode (val_broadcast, SImode) != val_broadcast)
+	return nullptr;
+      broadcast_mode = DImode;
+    }
+  else
+    return nullptr;
+
+  /* Check if OP can be broadcasted from VAL.  */
+  for (int i = 1; i < CONST_WIDE_INT_NUNITS (op); i++)
+    if (val != CONST_WIDE_INT_ELT (op, i))
+      return nullptr;
+
+  unsigned int nunits = (GET_MODE_SIZE (mode)
+			 / GET_MODE_SIZE (broadcast_mode));
+  machine_mode vector_mode;
+  if (!mode_for_vector (broadcast_mode, nunits).exists (&vector_mode))
+    gcc_unreachable ();
+  rtx target = ix86_gen_scratch_sse_rtx (vector_mode, true);
+  if (!ix86_expand_vector_init_duplicate (false, vector_mode, target,
+					  GEN_INT (val_broadcast)))
+    gcc_unreachable ();
+  if (REGNO (target) < FIRST_PSEUDO_REGISTER)
+    target = gen_rtx_REG (mode, REGNO (target));
+  else
+    target = convert_to_mode (mode, target, 1);
+
+  return target;
+}
+
 void
 ix86_expand_move (machine_mode mode, rtx operands[])
 {
@@ -347,20 +432,29 @@ ix86_expand_move (machine_mode mode, rtx operands[])
 	  && optimize)
 	op1 = copy_to_mode_reg (mode, op1);
 
-      if (can_create_pseudo_p ()
-	  && CONST_DOUBLE_P (op1))
+      if (can_create_pseudo_p ())
 	{
-	  /* If we are loading a floating point constant to a register,
-	     force the value to memory now, since we'll get better code
-	     out the back end.  */
+	  if (CONST_DOUBLE_P (op1))
+	    {
+	      /* If we are loading a floating point constant to a
+		 register, force the value to memory now, since we'll
+		 get better code out the back end.  */
 
-	  op1 = validize_mem (force_const_mem (mode, op1));
-	  if (!register_operand (op0, mode))
+	      op1 = validize_mem (force_const_mem (mode, op1));
+	      if (!register_operand (op0, mode))
+		{
+		  rtx temp = gen_reg_rtx (mode);
+		  emit_insn (gen_rtx_SET (temp, op1));
+		  emit_move_insn (op0, temp);
+		  return;
+		}
+	    }
+	  else if (GET_MODE_SIZE (mode) >= 16)
 	    {
-	      rtx temp = gen_reg_rtx (mode);
-	      emit_insn (gen_rtx_SET (temp, op1));
-	      emit_move_insn (op0, temp);
-	      return;
+	      rtx tmp = ix86_convert_const_wide_int_to_broadcast
+		(GET_MODE (op0), op1);
+	      if (tmp != nullptr)
+		op1 = tmp;
 	    }
 	}
     }
@@ -368,6 +462,54 @@ ix86_expand_move (machine_mode mode, rtx operands[])
   emit_insn (gen_rtx_SET (op0, op1));
 }
 
+static rtx
+ix86_broadcast_from_integer_constant (machine_mode mode, rtx op)
+{
+  int nunits = GET_MODE_NUNITS (mode);
+  if (nunits < 2)
+    return nullptr;
+
+  /* Don't use integer vector broadcast if we can't move from GPR to SSE
+     register directly.  */
+  if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
+    return nullptr;
+
+  /* Don't broadcast from a standard SSE constant integer.  */
+  if (standard_sse_constant_p (op, mode))
+    return nullptr;
+
+  /* Don't broadcast from a 64-bit integer constant in 32-bit mode.  */
+  if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT)
+    return nullptr;
+
+  rtx constant = get_pool_constant (XEXP (op, 0));
+  if (GET_CODE (constant) != CONST_VECTOR)
+    return nullptr;
+
+  /* There could be some rtx like
+     (mem/u/c:V16QI (symbol_ref/u:DI ("*.LC1")))
+     but with "*.LC1" refer to V2DI constant vector.  */
+  if (GET_MODE (constant) != mode)
+    {
+      constant = simplify_subreg (mode, constant, GET_MODE (constant),
+				  0);
+      if (constant == nullptr || GET_CODE (constant) != CONST_VECTOR)
+	return nullptr;
+    }
+
+  rtx first = XVECEXP (constant, 0, 0);
+
+  for (int i = 1; i < nunits; ++i)
+    {
+      rtx tmp = XVECEXP (constant, 0, i);
+      /* Vector duplicate value.  */
+      if (!rtx_equal_p (tmp, first))
+	return nullptr;
+    }
+
+  return first;
+}
+
 void
 ix86_expand_vector_move (machine_mode mode, rtx operands[])
 {
@@ -407,7 +549,33 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
 	  op1 = simplify_gen_subreg (mode, r, imode, SUBREG_BYTE (op1));
 	}
       else
-	op1 = validize_mem (force_const_mem (mode, op1));
+	{
+	  machine_mode mode = GET_MODE (op0);
+	  rtx tmp = ix86_convert_const_wide_int_to_broadcast
+	    (mode, op1);
+	  if (tmp == nullptr)
+	    op1 = validize_mem (force_const_mem (mode, op1));
+	  else
+	    op1 = tmp;
+	}
+    }
+
+  rtx first;
+
+  if (can_create_pseudo_p ()
+      && GET_MODE_SIZE (mode) >= 16
+      && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+      && (MEM_P (op1)
+	  && SYMBOL_REF_P (XEXP (op1, 0))
+	  && CONSTANT_POOL_ADDRESS_P (XEXP (op1, 0)))
+      && (first = ix86_broadcast_from_integer_constant (mode, op1)))
+    {
+      /* Broadcast to XMM/YMM/ZMM register from an integer constant.  */
+      op1 = ix86_gen_scratch_sse_rtx (mode, false);
+      if (!ix86_expand_vector_init_duplicate (false, mode, op1, first))
+	gcc_unreachable ();
+      emit_move_insn (op0, op1);
+      return;
     }
 
   /* We need to check memory alignment for SSE mode since attribute
@@ -15496,6 +15664,30 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
     }
 }
 
+bool
+ix86_expand_integer_vec_duplicate (rtx *operands)
+{
+  /* Don't use integer vector broadcast if we can't move from GPR to SSE
+     register directly.  */
+  if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
+    return false;
+
+  /* Enable VEC_DUPLICATE from a non-standard SSE constant integer only
+     if vector broadcast is available.  */
+  if (CONST_INT_P (operands[1])
+      && (!TARGET_AVX2
+	  || standard_sse_constant_p (operands[1],
+				      GET_MODE (operands[0]))))
+    return false;
+
+  if (!ix86_expand_vector_init_duplicate (false,
+					  GET_MODE (operands[0]),
+					  operands[0], operands[1]))
+    gcc_unreachable ();
+
+  return true;
+}
+
 /* Generate code to copy vector bits i / 2 ... i - 1 from vector SRC
    to bits 0 ... i / 2 - 1 of vector DEST, which has the same mode.
    The upper bits of DEST are undefined, though they shouldn't cause
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 7782cf1163f..f68617e77fd 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -50,6 +50,8 @@ extern void ix86_reset_previous_fndecl (void);
 
 extern bool ix86_using_red_zone (void);
 
+extern rtx ix86_gen_scratch_sse_rtx (machine_mode, bool);
+
 extern unsigned int ix86_regmode_natural_size (machine_mode);
 #ifdef RTX_CODE
 extern int standard_80387_constant_p (rtx);
@@ -257,6 +259,7 @@ extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_abs (rtx, rtx);
+extern bool ix86_expand_integer_vec_duplicate (rtx *);
 
 /* In i386-c.c  */
 extern void ix86_target_macros (void);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 04649b42122..795a7320f94 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23061,6 +23061,37 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode,
     }
 }
 
+/* Return a scratch register in MODE for vector load and store.  If
+   CONSTANT_INT_BROADCAST is true, it is used to hold constant integer
+   broadcast result.  */
+
+rtx
+ix86_gen_scratch_sse_rtx (machine_mode mode,
+			  bool constant_int_broadcast)
+{
+  rtx target;
+
+  /* NB: Choose a hard scratch SSE register:
+     1. Avoid increasing stack alignment requirement.
+     2. For integer constant broadcast in 64-bit mode, avoid
+	transformation by the combine pass.
+   */
+  if (GET_MODE_SIZE (mode) >= 16
+      && !COMPLEX_MODE_P (mode)
+      && (SCALAR_INT_MODE_P (mode)
+	  || GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+      && ((constant_int_broadcast
+	   && TARGET_64BIT
+	   && GET_MODE_SIZE (mode) == 16)
+	  || GET_MODE_ALIGNMENT (mode) > crtl->stack_alignment_estimated))
+    target = gen_rtx_REG (mode, (TARGET_64BIT
+				 ? LAST_REX_SSE_REG
+				 : LAST_SSE_REG));
+  else
+    target = gen_reg_rtx (mode);
+  return target;
+}
+
 /* Address space support.
 
    This is not "far pointers" in the 16-bit sense, but an easy way
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e4248e554eb..73d6d49a426 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -24601,3 +24601,22 @@
   "TARGET_WIDEKL"
   "aes<aeswideklvariant>\t{%0}"
   [(set_attr "type" "other")])
+
+;; Modes handled by broadcast patterns.
+(define_mode_iterator INT_BROADCAST_MODE
+  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
+   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F") (V4DI "TARGET_64BIT") V2DI])
+
+;; Broadcast from an integer.
+(define_expand "vec_duplicate<mode>"
+  [(set (match_operand:INT_BROADCAST_MODE 0 "register_operand")
+	(vec_duplicate:INT_BROADCAST_MODE
+	  (match_operand:<ssescalarmode> 1 "general_operand")))]
+  "TARGET_SSE2"
+{
+  if (!ix86_expand_integer_vec_duplicate (operands))
+    FAIL;
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
index 0563e696316..a2664d87f29 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
@@ -2,8 +2,11 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f -mavx512dq" } */
 /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } }
-/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 5 } }  */
-/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 5 } }  */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 { target { ! ia32 } } } }  */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 5 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 3 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 3 { target { ! ia32 } } } } */
 
 typedef int v16si  __attribute__ ((vector_size (64)));
 typedef long long v8di  __attribute__ ((vector_size (64)));
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
index ffbe95980ca..477f9ca1282 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
@@ -2,8 +2,9 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f" } */
 /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } }
-/* { dg-final { scan-assembler-times "\[^n\n\]*\\\{1to8\\\}" 4 } }  */
-/* { dg-final { scan-assembler-times "\[^n\n\]*\\\{1to16\\\}" 4 } }  */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 4 { target { ! ia32 } } } } */
 
 typedef int v16si  __attribute__ ((vector_size (64)));
 typedef long long v8di  __attribute__ ((vector_size (64)));
diff --git a/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c b/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c
index 99a89f51202..ca49a585232 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mavx512f" } */
-/* { dg-final { scan-assembler-times "(?:vpblendmd|vmovdqa32)\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 8 } } */
+/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512" } */
+/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vmovdqa32)\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 8 } } */
 
 unsigned int x[128];
 int y[128];
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
index c06369d93fd..f8eb99f0b5f 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
@@ -2,9 +2,15 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f -mavx512vl -mavx512dq" } */
 /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } }
-/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 5 } }  */
-/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 10 } }  */
-/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 5 } }  */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 2 { target { ! ia32 } } } }  */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 { target { ! ia32 } } } }  */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 5 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 7 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 3 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 3 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %xmm\[0-9\]+" 3 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %ymm\[0-9\]+" 3 { target { ! ia32 } } } } */
 
 typedef int v4si  __attribute__ ((vector_size (16)));
 typedef int v8si  __attribute__ ((vector_size (32)));
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c
index 4998a9b8d51..32f6ac81841 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c
@@ -2,9 +2,12 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f -mavx512vl" } */
 /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } }
-/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 4 } }  */
-/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 8 } }  */
-/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 4 } }  */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %xmm\[0-9\]+" 4 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %ymm\[0-9\]+" 4 { target { ! ia32 } } } } */
 
 typedef int v4si  __attribute__ ((vector_size (16)));
 typedef int v8si  __attribute__ ((vector_size (32)));
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-1.c b/gcc/testsuite/gcc.target/i386/pr100865-1.c
new file mode 100644
index 00000000000..6c3097fb2a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 3, 16);
+}
+
+/* { dg-final { scan-assembler-times "movdqa\[ \\t\]+\[^\n\]*%xmm" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10a.c b/gcc/testsuite/gcc.target/i386/pr100865-10a.c
new file mode 100644
index 00000000000..7ffc19e56a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-10a.c
@@ -0,0 +1,33 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O3 -march=skylake" } */
+
+extern __int128 array[16];
+
+#define MK_CONST128_BROADCAST(A) \
+  ((((unsigned __int128) (unsigned char) A) << 120) \
+   | (((unsigned __int128) (unsigned char) A) << 112) \
+   | (((unsigned __int128) (unsigned char) A) << 104) \
+   | (((unsigned __int128) (unsigned char) A) << 96) \
+   | (((unsigned __int128) (unsigned char) A) << 88) \
+   | (((unsigned __int128) (unsigned char) A) << 80) \
+   | (((unsigned __int128) (unsigned char) A) << 72) \
+   | (((unsigned __int128) (unsigned char) A) << 64) \
+   | (((unsigned __int128) (unsigned char) A) << 56) \
+   | (((unsigned __int128) (unsigned char) A) << 48) \
+   | (((unsigned __int128) (unsigned char) A) << 40) \
+   | (((unsigned __int128) (unsigned char) A) << 32) \
+   | (((unsigned __int128) (unsigned char) A) << 24) \
+   | (((unsigned __int128) (unsigned char) A) << 16) \
+   | (((unsigned __int128) (unsigned char) A) << 8) \
+   | ((unsigned __int128) (unsigned char) A) )
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = MK_CONST128_BROADCAST (0x1f);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10b.c b/gcc/testsuite/gcc.target/i386/pr100865-10b.c
new file mode 100644
index 00000000000..edf52765c60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-10b.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O3 -march=skylake-avx512" } */
+
+#include "pr100865-10a.c"
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-2.c b/gcc/testsuite/gcc.target/i386/pr100865-2.c
new file mode 100644
index 00000000000..17efe2d72a3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 3, 16);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-3.c b/gcc/testsuite/gcc.target/i386/pr100865-3.c
new file mode 100644
index 00000000000..b6dbcf7809b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 3, 16);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4a.c b/gcc/testsuite/gcc.target/i386/pr100865-4a.c
new file mode 100644
index 00000000000..f55883598f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-4a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char array[64];
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = -45;
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, " 4 } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4b.c b/gcc/testsuite/gcc.target/i386/pr100865-4b.c
new file mode 100644
index 00000000000..f41e6147b4c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-4b.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=skylake-avx512" } */
+
+#include "pr100865-4a.c"
+
+/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, " 4 } } */
+/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-5a.c b/gcc/testsuite/gcc.target/i386/pr100865-5a.c
new file mode 100644
index 00000000000..4149797fe81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-5a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake" } */
+
+extern short array[64];
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = -45;
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 4 } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-5b.c b/gcc/testsuite/gcc.target/i386/pr100865-5b.c
new file mode 100644
index 00000000000..ded41b680d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-5b.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake-avx512" } */
+
+#include "pr100865-5a.c"
+
+/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu16\[\\t \]%ymm\[0-9\]+, " 4 } } */
+/* { dg-final { scan-assembler-not "vpbroadcastw\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-6a.c b/gcc/testsuite/gcc.target/i386/pr100865-6a.c
new file mode 100644
index 00000000000..3fde549a10d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-6a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake" } */
+
+extern int array[64];
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = -45;
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-6b.c b/gcc/testsuite/gcc.target/i386/pr100865-6b.c
new file mode 100644
index 00000000000..44e74c64e55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-6b.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake-avx512" } */
+
+#include "pr100865-6a.c"
+
+/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */
+/* { dg-final { scan-assembler-not "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-7a.c b/gcc/testsuite/gcc.target/i386/pr100865-7a.c
new file mode 100644
index 00000000000..f6f2be91120
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-7a.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake" } */
+
+extern long long int array[64];
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = -45;
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 16 } } */
+/* { dg-final { scan-assembler-not "vpbroadcastq" { target ia32 } } } */
+/* { dg-final { scan-assembler-not "vmovdqa" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-7b.c b/gcc/testsuite/gcc.target/i386/pr100865-7b.c
new file mode 100644
index 00000000000..0a68820aa32
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-7b.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake-avx512" } */
+
+#include "pr100865-7a.c"
+
+/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %ymm\[0-9\]+" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 16 } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8a.c b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
new file mode 100644
index 00000000000..96e9f13204c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O3 -march=skylake" } */
+
+extern __int128 array[16];
+
+#define MK_CONST128_BROADCAST(A) \
+  ((((unsigned __int128) (unsigned int) A) << 96) \
+   | (((unsigned __int128) (unsigned int) A) << 64) \
+   | (((unsigned __int128) (unsigned int) A) << 32) \
+   | ((unsigned __int128) (unsigned int) A) )
+
+#define MK_CONST128_BROADCAST_SIGNED(A) \
+  ((__int128) MK_CONST128_BROADCAST (A))
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = MK_CONST128_BROADCAST_SIGNED (-45);
+}
+
+/* { dg-final { scan-assembler-times "(?:vpbroadcastq|vpshufd)\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8b.c b/gcc/testsuite/gcc.target/i386/pr100865-8b.c
new file mode 100644
index 00000000000..99a10ad83bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-8b.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O3 -march=skylake-avx512" } */
+
+#include "pr100865-8a.c"
+
+/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9a.c b/gcc/testsuite/gcc.target/i386/pr100865-9a.c
new file mode 100644
index 00000000000..45d0e0d0e2e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-9a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O3 -march=skylake" } */
+
+extern __int128 array[16];
+
+#define MK_CONST128_BROADCAST(A) \
+  ((((unsigned __int128) (unsigned short) A) << 112) \
+   | (((unsigned __int128) (unsigned short) A) << 96) \
+   | (((unsigned __int128) (unsigned short) A) << 80) \
+   | (((unsigned __int128) (unsigned short) A) << 64) \
+   | (((unsigned __int128) (unsigned short) A) << 48) \
+   | (((unsigned __int128) (unsigned short) A) << 32) \
+   | (((unsigned __int128) (unsigned short) A) << 16) \
+   | ((unsigned __int128) (unsigned short) A) )
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = MK_CONST128_BROADCAST (0x1fff);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9b.c b/gcc/testsuite/gcc.target/i386/pr100865-9b.c
new file mode 100644
index 00000000000..14696248525
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-9b.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O3 -march=skylake-avx512" } */
+
+#include "pr100865-9a.c"
+
+/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
-- 
2.31.1
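
The MK_CONST128_BROADCAST macros in the pr100865-8a.c, pr100865-9a.c and
pr100865-10a.c tests above build a 128-bit constant by repeating one
32-bit, 16-bit or 8-bit element.  As a rough illustration only, the
sketch below does the same with a loop.  broadcast128 and
uniform_lanes_p are hypothetical helpers, not part of the patch, and the
code assumes a compiler that provides unsigned __int128 (for example GCC
or Clang on a 64-bit target).

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: replicate the low WIDTH
   bits of VALUE across a 128-bit integer.  WIDTH must be 8, 16, 32
   or 64.  */
unsigned __int128
broadcast128 (unsigned long long value, unsigned width)
{
  assert (width == 8 || width == 16 || width == 32 || width == 64);

  /* Keep only the low WIDTH bits of the element.  */
  unsigned __int128 elt = value;
  if (width < 64)
    elt &= (((unsigned __int128) 1) << width) - 1;

  /* OR the element into every WIDTH-bit lane.  */
  unsigned __int128 result = 0;
  for (unsigned shift = 0; shift < 128; shift += width)
    result |= elt << shift;
  return result;
}

/* Hypothetical helper: return 1 if every WIDTH-bit lane of V equals
   the lowest lane, i.e. V could be produced by a broadcast.  */
int
uniform_lanes_p (unsigned __int128 v, unsigned width)
{
  assert (width == 8 || width == 16 || width == 32 || width == 64);

  unsigned __int128 mask = (((unsigned __int128) 1) << width) - 1;
  unsigned __int128 first = v & mask;
  for (unsigned shift = width; shift < 128; shift += width)
    if (((v >> shift) & mask) != first)
      return 0;
  return 1;
}
```

Under these assumptions, broadcast128 (0x1f, 8) has the same value as
MK_CONST128_BROADCAST (0x1f) in pr100865-10a.c, and
broadcast128 ((unsigned int) -45, 32) has the same unsigned value as the
macro in pr100865-8a.c applied to -45.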


* Re: [PATCH v2 1/2] Allow vec_duplicate_optab to fail
  2021-06-05 15:18 ` [PATCH v2 1/2] " H.J. Lu
@ 2021-06-07  7:12   ` Richard Sandiford
  2021-06-07 14:18     ` H.J. Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Sandiford @ 2021-06-07  7:12 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Uros Bizjak, Jakub Jelinek, Richard Biener

"H.J. Lu" <hjl.tools@gmail.com> writes:
> Update vec_duplicate to allow to fail so that backend can only allow
> broadcasting an integer constant to a vector when broadcast instruction
> is available.

I'm not sure why we need this to fail though.  Once the optab is defined
for target X, the optab should handle all duplicates for target X,
even if there are different strategies it can use.

AIUI the case you want to make conditional is the constant case.
I guess the first question is: why don't we simplify those CONSTRUCTORs
to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
as a constructor here.

If we can't rely on that happening, then would it work to change:

	/* Try using vec_duplicate_optab for uniform vectors.  */
	if (!TREE_SIDE_EFFECTS (exp)
	    && VECTOR_MODE_P (mode)
	    && eltmode == GET_MODE_INNER (mode)
	    && ((icode = optab_handler (vec_duplicate_optab, mode))
		!= CODE_FOR_nothing)
	    && (elt = uniform_vector_p (exp)))

to something like:

	/* Try using vec_duplicate_optab for uniform vectors.  */
	if (!TREE_SIDE_EFFECTS (exp)
	    && VECTOR_MODE_P (mode)
	    && eltmode == GET_MODE_INNER (mode)
	    && (elt = uniform_vector_p (exp)))
	  {
	    if (TREE_CODE (elt) == INTEGER_CST
		|| TREE_CODE (elt) == POLY_INT_CST
		|| TREE_CODE (elt) == REAL_CST
		|| TREE_CODE (elt) == FIXED_CST)
	      {
		rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
		emit_move_insn (target, src);
		break;
	      }
	    …
	  }

Thanks,
Richard

* Re: [PATCH v2 1/2] Allow vec_duplicate_optab to fail
  2021-06-07  7:12   ` Richard Sandiford
@ 2021-06-07 14:18     ` H.J. Lu
  2021-06-07 17:59       ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: H.J. Lu @ 2021-06-07 14:18 UTC (permalink / raw)
  To: H.J. Lu, GCC Patches, Uros Bizjak, Jakub Jelinek, Richard Biener,
	Richard Sandiford

On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> "H.J. Lu" <hjl.tools@gmail.com> writes:
> > Update vec_duplicate to allow to fail so that backend can only allow
> > broadcasting an integer constant to a vector when broadcast instruction
> > is available.
>
> I'm not sure why we need this to fail though.  Once the optab is defined
> for target X, the optab should handle all duplicates for target X,
> even if there are different strategies it can use.
>
> AIUI the case you want to make conditional is the constant case.
> I guess the first question is: why don't we simplify those CONSTRUCTORs
> to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> as a constructor here.

The particular testcase for vec_duplicate is gcc.dg/pr100239.c.

> If we can't rely on that happening, then would it work to change:
>
>         /* Try using vec_duplicate_optab for uniform vectors.  */
>         if (!TREE_SIDE_EFFECTS (exp)
>             && VECTOR_MODE_P (mode)
>             && eltmode == GET_MODE_INNER (mode)
>             && ((icode = optab_handler (vec_duplicate_optab, mode))
>                 != CODE_FOR_nothing)
>             && (elt = uniform_vector_p (exp)))
>
> to something like:
>
>         /* Try using vec_duplicate_optab for uniform vectors.  */
>         if (!TREE_SIDE_EFFECTS (exp)
>             && VECTOR_MODE_P (mode)
>             && eltmode == GET_MODE_INNER (mode)
>             && (elt = uniform_vector_p (exp)))
>           {
>             if (TREE_CODE (elt) == INTEGER_CST
>                 || TREE_CODE (elt) == POLY_INT_CST
>                 || TREE_CODE (elt) == REAL_CST
>                 || TREE_CODE (elt) == FIXED_CST)
>               {
>                 rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
>                 emit_move_insn (target, src);
>                 break;
>               }
>             …
>           }

I will give it a try.

Thanks.

-- 
H.J.

* Re: [PATCH v2 1/2] Allow vec_duplicate_optab to fail
  2021-06-07 14:18     ` H.J. Lu
@ 2021-06-07 17:59       ` Richard Biener
  2021-06-07 18:10         ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2021-06-07 17:59 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek, Richard Sandiford

On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > "H.J. Lu" <hjl.tools@gmail.com> writes:
> > > Update vec_duplicate to allow to fail so that backend can only allow
> > > broadcasting an integer constant to a vector when broadcast instruction
> > > is available.
> >
> > I'm not sure why we need this to fail though.  Once the optab is defined
> > for target X, the optab should handle all duplicates for target X,
> > even if there are different strategies it can use.
> >
> > AIUI the case you want to make conditional is the constant case.
> > I guess the first question is: why don't we simplify those CONSTRUCTORs
> > to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> > as a constructor here.
>
> The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
>
> > If we can't rely on that happening, then would it work to change:
> >
> >         /* Try using vec_duplicate_optab for uniform vectors.  */
> >         if (!TREE_SIDE_EFFECTS (exp)
> >             && VECTOR_MODE_P (mode)
> >             && eltmode == GET_MODE_INNER (mode)
> >             && ((icode = optab_handler (vec_duplicate_optab, mode))
> >                 != CODE_FOR_nothing)
> >             && (elt = uniform_vector_p (exp)))
> >
> > to something like:
> >
> >         /* Try using vec_duplicate_optab for uniform vectors.  */
> >         if (!TREE_SIDE_EFFECTS (exp)
> >             && VECTOR_MODE_P (mode)
> >             && eltmode == GET_MODE_INNER (mode)
> >             && (elt = uniform_vector_p (exp)))
> >           {
> >             if (TREE_CODE (elt) == INTEGER_CST
> >                 || TREE_CODE (elt) == POLY_INT_CST
> >                 || TREE_CODE (elt) == REAL_CST
> >                 || TREE_CODE (elt) == FIXED_CST)
> >               {
> >                 rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
> >                 emit_move_insn (target, src);
> >                 break;
> >               }
> >             …
> >           }
>
> I will give it a try.

I can confirm that veclower leaves us with an unfolded constant CTOR.
If you file a PR to remind me I'll fix that.

Richard.

> Thanks.
>
> --
> H.J.

* Re: [PATCH v2 1/2] Allow vec_duplicate_optab to fail
  2021-06-07 17:59       ` Richard Biener
@ 2021-06-07 18:10         ` Richard Biener
  2021-06-07 20:33           ` [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering H.J. Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2021-06-07 18:10 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 2616 bytes --]

On Mon, Jun 7, 2021 at 7:59 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> > >
> > > "H.J. Lu" <hjl.tools@gmail.com> writes:
> > > > Update vec_duplicate to allow to fail so that backend can only allow
> > > > broadcasting an integer constant to a vector when broadcast instruction
> > > > is available.
> > >
> > > I'm not sure why we need this to fail though.  Once the optab is defined
> > > for target X, the optab should handle all duplicates for target X,
> > > even if there are different strategies it can use.
> > >
> > > AIUI the case you want to make conditional is the constant case.
> > > I guess the first question is: why don't we simplify those CONSTRUCTORs
> > > to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> > > as a constructor here.
> >
> > The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
> >
> > > If we can't rely on that happening, then would it work to change:
> > >
> > >         /* Try using vec_duplicate_optab for uniform vectors.  */
> > >         if (!TREE_SIDE_EFFECTS (exp)
> > >             && VECTOR_MODE_P (mode)
> > >             && eltmode == GET_MODE_INNER (mode)
> > >             && ((icode = optab_handler (vec_duplicate_optab, mode))
> > >                 != CODE_FOR_nothing)
> > >             && (elt = uniform_vector_p (exp)))
> > >
> > > to something like:
> > >
> > >         /* Try using vec_duplicate_optab for uniform vectors.  */
> > >         if (!TREE_SIDE_EFFECTS (exp)
> > >             && VECTOR_MODE_P (mode)
> > >             && eltmode == GET_MODE_INNER (mode)
> > >             && (elt = uniform_vector_p (exp)))
> > >           {
> > >             if (TREE_CODE (elt) == INTEGER_CST
> > >                 || TREE_CODE (elt) == POLY_INT_CST
> > >                 || TREE_CODE (elt) == REAL_CST
> > >                 || TREE_CODE (elt) == FIXED_CST)
> > >               {
> > >                 rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
> > >                 emit_move_insn (target, src);
> > >                 break;
> > >               }
> > >             …
> > >           }
> >
> > I will give it a try.
>
> I can confirm that veclower leaves us with an unfolded constant CTOR.
> If you file a PR to remind me I'll fix that.

The attached untested patch fixes this for the testcase.

Richard.

> Richard.
>
> > Thanks.
> >
> > --
> > H.J.

[-- Attachment #2: p --]
[-- Type: application/octet-stream, Size: 4438 bytes --]

From 3c89ebfcbeaafdd9bbf31a300593365eb92906c4 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Mon, 7 Jun 2021 20:08:13 +0200
Subject: [PATCH] middle-end/ - make sure to generate VECTOR_CST in lowering

When vector lowering creates piecewise ops make sure to create
VECTOR_CSTs instead of CONSTRUCTORs when possible.

2021-06-07  Richard Biener  <rguenther@suse.de>

	* tree-vect-generic.c (): Build a VECTOR_CST if all
	elements are constant.
---
 gcc/tree-vect-generic.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d9c0ac9de7e..5f3f9fa005e 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -328,16 +328,22 @@ expand_vector_piecewise (gimple_stmt_iterator *gsi, elem_op_func f,
   if (!ret_type)
     ret_type = type;
   vec_alloc (v, (nunits + delta - 1) / delta);
+  bool constant_p = true;
   for (i = 0; i < nunits;
        i += delta, index = int_const_binop (PLUS_EXPR, index, part_width))
     {
       tree result = f (gsi, inner_type, a, b, index, part_width, code,
 		       ret_type);
+      if (!CONSTANT_CLASS_P (result))
+	constant_p = false;
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
     }
 
-  return build_constructor (ret_type, v);
+  if (constant_p)
+    return build_vector_from_ctor (ret_type, v);
+  else
+    return build_constructor (ret_type, v);
 }
 
 /* Expand a vector operation to scalars with the freedom to use
@@ -1105,6 +1111,7 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
 
   int nunits = nunits_for_known_piecewise_op (type);
   vec_alloc (v, nunits);
+  bool constant_p = true;
   for (int i = 0; i < nunits; i++)
     {
       tree aa, result;
@@ -1129,6 +1136,8 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
       else
 	aa = tree_vec_extract (gsi, cond_type, a, width, index);
       result = gimplify_build3 (gsi, COND_EXPR, inner_type, aa, bb, cc);
+      if (!CONSTANT_CLASS_P (result))
+	constant_p = false;
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
       index = int_const_binop (PLUS_EXPR, index, width);
@@ -1138,7 +1147,10 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
 	comp_index = int_const_binop (PLUS_EXPR, comp_index, comp_width);
     }
 
-  constr = build_constructor (type, v);
+  if (constant_p)
+    constr = build_vector_from_ctor (type, v);
+  else
+    constr = build_constructor (type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 
@@ -1578,6 +1590,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
               "vector shuffling operation will be expanded piecewise");
 
   vec_alloc (v, elements);
+  bool constant_p = true;
   for (i = 0; i < elements; i++)
     {
       si = size_int (i);
@@ -1639,10 +1652,15 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
 	    t = v0_val;
         }
 
+      if (!CONSTANT_CLASS_P (t))
+	constant_p = false;
       CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, t);
     }
 
-  constr = build_constructor (vect_type, v);
+  if (constant_p)
+    constr = build_vector_from_ctor (vect_type, v);
+  else
+    constr = build_constructor (vect_type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 }
@@ -2014,6 +2032,7 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
 		}
 
 	      vec_alloc (v, (nunits + delta - 1) / delta * 2);
+	      bool constant_p = true;
 	      for (i = 0; i < nunits;
 		   i += delta, index = int_const_binop (PLUS_EXPR, index,
 							part_width))
@@ -2024,12 +2043,19 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
 					  index);
 		  tree result = gimplify_build1 (gsi, code1, cretd_type, a);
 		  constructor_elt ce = { NULL_TREE, result };
+		  if (!CONSTANT_CLASS_P (ce.value))
+		    constant_p = false;
 		  v->quick_push (ce);
 		  ce.value = gimplify_build1 (gsi, code2, cretd_type, a);
+		  if (!CONSTANT_CLASS_P (ce.value))
+		    constant_p = false;
 		  v->quick_push (ce);
 		}
 
-	      new_rhs = build_constructor (ret_type, v);
+	      if (constant_p)
+		new_rhs = build_vector_from_ctor (ret_type, v);
+	      else
+		new_rhs = build_constructor (ret_type, v);
 	      g = gimple_build_assign (lhs, new_rhs);
 	      gsi_replace (gsi, g, false);
 	      return;
-- 
2.17.1
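
The patch above repeats one pattern in each lowering routine: track
whether every generated element is constant, then pick the vector
constant or the CONSTRUCTOR at the end.  The following is a toy sketch
of that pattern only.  toy_elt, toy_kind, toy_result and
toy_build_vector are made-up names standing in for trees,
CONSTANT_CLASS_P, and the choice between build_vector_from_ctor and
build_constructor; it is not GCC code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for GCC trees; all names here are hypothetical.  */
enum toy_kind { TOY_CONSTANT, TOY_SSA_NAME };

struct toy_elt
{
  enum toy_kind kind;
  long value;
};

/* What the lowering would build for the whole vector.  */
enum toy_result { TOY_VECTOR_CST, TOY_CONSTRUCTOR };

/* Mirror the shape of the patched loops: start from constant_p = true,
   clear it as soon as one element is not constant, and decide which
   node to build only after all elements have been visited.  */
enum toy_result
toy_build_vector (const struct toy_elt *elts, size_t n)
{
  assert (elts != NULL || n == 0);

  int constant_p = 1;
  for (size_t i = 0; i < n; i++)
    if (elts[i].kind != TOY_CONSTANT)
      constant_p = 0;

  return constant_p ? TOY_VECTOR_CST : TOY_CONSTRUCTOR;
}
```

In this model a vector whose elements are all TOY_CONSTANT yields
TOY_VECTOR_CST, and a single TOY_SSA_NAME element is enough to fall back
to TOY_CONSTRUCTOR.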


* [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering
  2021-06-07 18:10         ` Richard Biener
@ 2021-06-07 20:33           ` H.J. Lu
  2021-06-09 21:03             ` Jeff Law
  0 siblings, 1 reply; 11+ messages in thread
From: H.J. Lu @ 2021-06-07 20:33 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 2811 bytes --]

On Mon, Jun 7, 2021 at 11:10 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Jun 7, 2021 at 7:59 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
> > > <richard.sandiford@arm.com> wrote:
> > > >
> > > > "H.J. Lu" <hjl.tools@gmail.com> writes:
> > > > > Update vec_duplicate to allow to fail so that backend can only allow
> > > > > broadcasting an integer constant to a vector when broadcast instruction
> > > > > is available.
> > > >
> > > > I'm not sure why we need this to fail though.  Once the optab is defined
> > > > for target X, the optab should handle all duplicates for target X,
> > > > even if there are different strategies it can use.
> > > >
> > > > AIUI the case you want to make conditional is the constant case.
> > > > I guess the first question is: why don't we simplify those CONSTRUCTORs
> > > > to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> > > > as a constructor here.
> > >
> > > The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
> > >
> > > > If we can't rely on that happening, then would it work to change:
> > > >
> > > >         /* Try using vec_duplicate_optab for uniform vectors.  */
> > > >         if (!TREE_SIDE_EFFECTS (exp)
> > > >             && VECTOR_MODE_P (mode)
> > > >             && eltmode == GET_MODE_INNER (mode)
> > > >             && ((icode = optab_handler (vec_duplicate_optab, mode))
> > > >                 != CODE_FOR_nothing)
> > > >             && (elt = uniform_vector_p (exp)))
> > > >
> > > > to something like:
> > > >
> > > >         /* Try using vec_duplicate_optab for uniform vectors.  */
> > > >         if (!TREE_SIDE_EFFECTS (exp)
> > > >             && VECTOR_MODE_P (mode)
> > > >             && eltmode == GET_MODE_INNER (mode)
> > > >             && (elt = uniform_vector_p (exp)))
> > > >           {
> > > >             if (TREE_CODE (elt) == INTEGER_CST
> > > >                 || TREE_CODE (elt) == POLY_INT_CST
> > > >                 || TREE_CODE (elt) == REAL_CST
> > > >                 || TREE_CODE (elt) == FIXED_CST)
> > > >               {
> > > >                 rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
> > > >                 emit_move_insn (target, src);
> > > >                 break;
> > > >               }
> > > >             …
> > > >           }
> > >
> > > I will give it a try.
> >
> > I can confirm that veclower leaves us with an unfolded constant CTOR.
> > If you file a PR to remind me I'll fix that.
>
> The attached untested patch fixes this for the testcase.
>

Here is the patch + the testcase.

-- 
H.J.

[-- Attachment #2: 0001-middle-end-100951-make-sure-to-generate-VECTOR_CST-i.patch --]
[-- Type: text/x-patch, Size: 5388 bytes --]

From aac56894719b59e552b493c970946225ed8c27f6 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Mon, 7 Jun 2021 20:08:13 +0200
Subject: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in
 lowering

When vector lowering creates piecewise ops make sure to create
VECTOR_CSTs instead of CONSTRUCTORs when possible.

gcc/

2021-06-07  Richard Biener  <rguenther@suse.de>

	PR middle-end/100951
	* tree-vect-generic.c (): Build a VECTOR_CST if all
	elements are constant.

gcc/testsuite/

2021-06-07  H.J. Lu  <hjl.tools@gmail.com>

	PR middle-end/100951
	* gcc.target/i386/pr100951.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr100951.c | 15 +++++++++++
 gcc/tree-vect-generic.c                  | 34 +++++++++++++++++++++---
 2 files changed, 45 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100951.c

diff --git a/gcc/testsuite/gcc.target/i386/pr100951.c b/gcc/testsuite/gcc.target/i386/pr100951.c
new file mode 100644
index 00000000000..16d8bafa663
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100951.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -march=x86-64" } */
+
+typedef short __attribute__((__vector_size__ (8 * sizeof (short)))) V;
+V v, w;
+
+void
+foo (void)
+{
+  w = __builtin_shuffle (v != v, 0 < (V) {}, (V) {192} >> 5);
+}
+
+/* { dg-final { scan-assembler-not "punpcklwd" } } */
+/* { dg-final { scan-assembler-not "pshufd" } } */
+/* { dg-final { scan-assembler-times "pxor\[\\t \]%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d9c0ac9de7e..5f3f9fa005e 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -328,16 +328,22 @@ expand_vector_piecewise (gimple_stmt_iterator *gsi, elem_op_func f,
   if (!ret_type)
     ret_type = type;
   vec_alloc (v, (nunits + delta - 1) / delta);
+  bool constant_p = true;
   for (i = 0; i < nunits;
        i += delta, index = int_const_binop (PLUS_EXPR, index, part_width))
     {
       tree result = f (gsi, inner_type, a, b, index, part_width, code,
 		       ret_type);
+      if (!CONSTANT_CLASS_P (result))
+	constant_p = false;
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
     }
 
-  return build_constructor (ret_type, v);
+  if (constant_p)
+    return build_vector_from_ctor (ret_type, v);
+  else
+    return build_constructor (ret_type, v);
 }
 
 /* Expand a vector operation to scalars with the freedom to use
@@ -1105,6 +1111,7 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
 
   int nunits = nunits_for_known_piecewise_op (type);
   vec_alloc (v, nunits);
+  bool constant_p = true;
   for (int i = 0; i < nunits; i++)
     {
       tree aa, result;
@@ -1129,6 +1136,8 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
       else
 	aa = tree_vec_extract (gsi, cond_type, a, width, index);
       result = gimplify_build3 (gsi, COND_EXPR, inner_type, aa, bb, cc);
+      if (!CONSTANT_CLASS_P (result))
+	constant_p = false;
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
       index = int_const_binop (PLUS_EXPR, index, width);
@@ -1138,7 +1147,10 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
 	comp_index = int_const_binop (PLUS_EXPR, comp_index, comp_width);
     }
 
-  constr = build_constructor (type, v);
+  if (constant_p)
+    constr = build_vector_from_ctor (type, v);
+  else
+    constr = build_constructor (type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 
@@ -1578,6 +1590,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
               "vector shuffling operation will be expanded piecewise");
 
   vec_alloc (v, elements);
+  bool constant_p = true;
   for (i = 0; i < elements; i++)
     {
       si = size_int (i);
@@ -1639,10 +1652,15 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
 	    t = v0_val;
         }
 
+      if (!CONSTANT_CLASS_P (t))
+	constant_p = false;
       CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, t);
     }
 
-  constr = build_constructor (vect_type, v);
+  if (constant_p)
+    constr = build_vector_from_ctor (vect_type, v);
+  else
+    constr = build_constructor (vect_type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 }
@@ -2014,6 +2032,7 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
 		}
 
 	      vec_alloc (v, (nunits + delta - 1) / delta * 2);
+	      bool constant_p = true;
 	      for (i = 0; i < nunits;
 		   i += delta, index = int_const_binop (PLUS_EXPR, index,
 							part_width))
@@ -2024,12 +2043,19 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
 					  index);
 		  tree result = gimplify_build1 (gsi, code1, cretd_type, a);
 		  constructor_elt ce = { NULL_TREE, result };
+		  if (!CONSTANT_CLASS_P (ce.value))
+		    constant_p = false;
 		  v->quick_push (ce);
 		  ce.value = gimplify_build1 (gsi, code2, cretd_type, a);
+		  if (!CONSTANT_CLASS_P (ce.value))
+		    constant_p = false;
 		  v->quick_push (ce);
 		}
 
-	      new_rhs = build_constructor (ret_type, v);
+	      if (constant_p)
+		new_rhs = build_vector_from_ctor (ret_type, v);
+	      else
+		new_rhs = build_constructor (ret_type, v);
 	      g = gimple_build_assign (lhs, new_rhs);
 	      gsi_replace (gsi, g, false);
 	      return;
-- 
2.31.1



* Re: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering
  2021-06-07 20:33           ` [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering H.J. Lu
@ 2021-06-09 21:03             ` Jeff Law
  2021-06-09 21:31               ` H.J. Lu
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Law @ 2021-06-09 21:03 UTC (permalink / raw)
  To: H.J. Lu, Richard Biener; +Cc: Jakub Jelinek, GCC Patches, Richard Sandiford



On 6/7/2021 2:33 PM, H.J. Lu via Gcc-patches wrote:
> On Mon, Jun 7, 2021 at 11:10 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jun 7, 2021 at 7:59 PM Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
>>>> <richard.sandiford@arm.com> wrote:
>>>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>>>> Update vec_duplicate to allow to fail so that backend can only allow
>>>>>> broadcasting an integer constant to a vector when broadcast instruction
>>>>>> is available.
>>>>> I'm not sure why we need this to fail though.  Once the optab is defined
>>>>> for target X, the optab should handle all duplicates for target X,
>>>>> even if there are different strategies it can use.
>>>>>
>>>>> AIUI the case you want to make conditional is the constant case.
>>>>> I guess the first question is: why don't we simplify those CONSTRUCTORs
>>>>> to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
>>>>> as a constructor here.
>>>> The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
>>>>
>>>>> If we can't rely on that happening, then would it work to change:
>>>>>
>>>>>          /* Try using vec_duplicate_optab for uniform vectors.  */
>>>>>          if (!TREE_SIDE_EFFECTS (exp)
>>>>>              && VECTOR_MODE_P (mode)
>>>>>              && eltmode == GET_MODE_INNER (mode)
>>>>>              && ((icode = optab_handler (vec_duplicate_optab, mode))
>>>>>                  != CODE_FOR_nothing)
>>>>>              && (elt = uniform_vector_p (exp)))
>>>>>
>>>>> to something like:
>>>>>
>>>>>          /* Try using vec_duplicate_optab for uniform vectors.  */
>>>>>          if (!TREE_SIDE_EFFECTS (exp)
>>>>>              && VECTOR_MODE_P (mode)
>>>>>              && eltmode == GET_MODE_INNER (mode)
>>>>>              && (elt = uniform_vector_p (exp)))
>>>>>            {
>>>>>              if (TREE_CODE (elt) == INTEGER_CST
>>>>>                  || TREE_CODE (elt) == POLY_INT_CST
>>>>>                  || TREE_CODE (elt) == REAL_CST
>>>>>                  || TREE_CODE (elt) == FIXED_CST)
>>>>>                {
>>>>>                  rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
>>>>>                  emit_move_insn (target, src);
>>>>>                  break;
>>>>>                }
>>>>>              …
>>>>>            }
>>>> I will give it a try.
>>> I can confirm that veclower leaves us with an unfolded constant CTOR.
>>> If you file a PR to remind me I'll fix that.
>> The attached untested patch fixes this for the testcase.
>>
> Here is the patch + the testcase.
>
>
> 0001-middle-end-100951-make-sure-to-generate-VECTOR_CST-i.patch
>
>  From aac56894719b59e552b493c970946225ed8c27f6 Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguenther@suse.de>
> Date: Mon, 7 Jun 2021 20:08:13 +0200
> Subject: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in
>   lowering
>
> When vector lowering creates piecewise ops make sure to create
> VECTOR_CSTs instead of CONSTRUCTORs when possible.
>
> gcc/
>
> 2021-06-07  Richard Biener  <rguenther@suse.de>
>
> 	PR middle-end/100951
> 	* tree-vect-generic.c (): Build a VECTOR_CST if all
> 	elements are constant.
>
> gcc/testsuite/
>
> 2021-06-07  H.J. Lu  <hjl.tools@gmail.com>
>
> 	PR middle-end/100951
> 	* gcc.target/i386/pr100951.c: New test.
Assuming this passed testing it is OK.
jeff


* Re: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering
  2021-06-09 21:03             ` Jeff Law
@ 2021-06-09 21:31               ` H.J. Lu
  0 siblings, 0 replies; 11+ messages in thread
From: H.J. Lu @ 2021-06-09 21:31 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, Jakub Jelinek, GCC Patches, Richard Sandiford

On Wed, Jun 9, 2021 at 2:03 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
> On 6/7/2021 2:33 PM, H.J. Lu via Gcc-patches wrote:
>> On Mon, Jun 7, 2021 at 11:10 AM Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Jun 7, 2021 at 7:59 PM Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
>>>>> <richard.sandiford@arm.com> wrote:
>>>>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>>>>> Update vec_duplicate to allow to fail so that backend can only allow
>>>>>>> broadcasting an integer constant to a vector when broadcast instruction
>>>>>>> is available.
>>>>>> I'm not sure why we need this to fail though.  Once the optab is defined
>>>>>> for target X, the optab should handle all duplicates for target X,
>>>>>> even if there are different strategies it can use.
>>>>>>
>>>>>> AIUI the case you want to make conditional is the constant case.
>>>>>> I guess the first question is: why don't we simplify those CONSTRUCTORs
>>>>>> to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
>>>>>> as a constructor here.
>>>>> The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
>>>>>
>>>>>> If we can't rely on that happening, then would it work to change:
>>>>>>
>>>>>>          /* Try using vec_duplicate_optab for uniform vectors.  */
>>>>>>          if (!TREE_SIDE_EFFECTS (exp)
>>>>>>              && VECTOR_MODE_P (mode)
>>>>>>              && eltmode == GET_MODE_INNER (mode)
>>>>>>              && ((icode = optab_handler (vec_duplicate_optab, mode))
>>>>>>                  != CODE_FOR_nothing)
>>>>>>              && (elt = uniform_vector_p (exp)))
>>>>>>
>>>>>> to something like:
>>>>>>
>>>>>>          /* Try using vec_duplicate_optab for uniform vectors.  */
>>>>>>          if (!TREE_SIDE_EFFECTS (exp)
>>>>>>              && VECTOR_MODE_P (mode)
>>>>>>              && eltmode == GET_MODE_INNER (mode)
>>>>>>              && (elt = uniform_vector_p (exp)))
>>>>>>            {
>>>>>>              if (TREE_CODE (elt) == INTEGER_CST
>>>>>>                  || TREE_CODE (elt) == POLY_INT_CST
>>>>>>                  || TREE_CODE (elt) == REAL_CST
>>>>>>                  || TREE_CODE (elt) == FIXED_CST)
>>>>>>                {
>>>>>>                  rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
>>>>>>                  emit_move_insn (target, src);
>>>>>>                  break;
>>>>>>                }
>>>>>>              …
>>>>>>            }
>>>>> I will give it a try.
>>>> I can confirm that veclower leaves us with an unfolded constant CTOR.
>>>> If you file a PR to remind me I'll fix that.
>>> The attached untested patch fixes this for the testcase.
>>>
>> Here is the patch + the testcase.
>>
>>
>> 0001-middle-end-100951-make-sure-to-generate-VECTOR_CST-i.patch
>>
>>  From aac56894719b59e552b493c970946225ed8c27f6 Mon Sep 17 00:00:00 2001
>> From: Richard Biener <rguenther@suse.de>
>> Date: Mon, 7 Jun 2021 20:08:13 +0200
>> Subject: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in
>>   lowering
>>
>> When vector lowering creates piecewise ops make sure to create
>> VECTOR_CSTs instead of CONSTRUCTORs when possible.
>>
>> gcc/
>>
>> 2021-06-07  Richard Biener  <rguenther@suse.de>
>>
>> 	PR middle-end/100951
>> 	* tree-vect-generic.c (): Build a VECTOR_CST if all
>> 	elements are constant.
>>
>> gcc/testsuite/
>>
>> 2021-06-07  H.J. Lu  <hjl.tools@gmail.com>
>>
>> 	PR middle-end/100951
>> 	* gcc.target/i386/pr100951.c: New test.
> Assuming this passed testing it is OK.
> jeff

Richard has committed:

commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Jun 7 20:08:13 2021 +0200

    middle-end/100951 - make sure to generate VECTOR_CST in lowering

    When vector lowering creates piecewise ops make sure to create
    VECTOR_CSTs instead of CONSTRUCTORs when possible.


-- 
H.J.


* [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering
@ 2021-06-08  8:47 Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2021-06-08  8:47 UTC (permalink / raw)
  To: gcc-patches

When vector lowering creates piecewise ops make sure to create
VECTOR_CSTs instead of CONSTRUCTORs when possible.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

gcc/

2021-06-07  Richard Biener  <rguenther@suse.de>

	PR middle-end/100951
	* tree-vect-generic.c (expand_vector_piecewise): Build a
	VECTOR_CST if all elements are constant.
	(expand_vector_condition): Likewise.
	(lower_vec_perm): Likewise.
	(expand_vector_conversion): Likewise.

gcc/testsuite/

2021-06-07  H.J. Lu  <hjl.tools@gmail.com>

	PR middle-end/100951
	* gcc.target/i386/pr100951.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr100951.c | 15 +++++++++++
 gcc/tree-vect-generic.c                  | 34 +++++++++++++++++++++---
 2 files changed, 45 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100951.c

diff --git a/gcc/testsuite/gcc.target/i386/pr100951.c b/gcc/testsuite/gcc.target/i386/pr100951.c
new file mode 100644
index 00000000000..16d8bafa663
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100951.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -march=x86-64" } */
+
+typedef short __attribute__((__vector_size__ (8 * sizeof (short)))) V;
+V v, w;
+
+void
+foo (void)
+{
+  w = __builtin_shuffle (v != v, 0 < (V) {}, (V) {192} >> 5);
+}
+
+/* { dg-final { scan-assembler-not "punpcklwd" } } */
+/* { dg-final { scan-assembler-not "pshufd" } } */
+/* { dg-final { scan-assembler-times "pxor\[\\t \]%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d9c0ac9de7e..5f3f9fa005e 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -328,16 +328,22 @@ expand_vector_piecewise (gimple_stmt_iterator *gsi, elem_op_func f,
   if (!ret_type)
     ret_type = type;
   vec_alloc (v, (nunits + delta - 1) / delta);
+  bool constant_p = true;
   for (i = 0; i < nunits;
        i += delta, index = int_const_binop (PLUS_EXPR, index, part_width))
     {
       tree result = f (gsi, inner_type, a, b, index, part_width, code,
 		       ret_type);
+      if (!CONSTANT_CLASS_P (result))
+	constant_p = false;
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
     }
 
-  return build_constructor (ret_type, v);
+  if (constant_p)
+    return build_vector_from_ctor (ret_type, v);
+  else
+    return build_constructor (ret_type, v);
 }
 
 /* Expand a vector operation to scalars with the freedom to use
@@ -1105,6 +1111,7 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
 
   int nunits = nunits_for_known_piecewise_op (type);
   vec_alloc (v, nunits);
+  bool constant_p = true;
   for (int i = 0; i < nunits; i++)
     {
       tree aa, result;
@@ -1129,6 +1136,8 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
       else
 	aa = tree_vec_extract (gsi, cond_type, a, width, index);
       result = gimplify_build3 (gsi, COND_EXPR, inner_type, aa, bb, cc);
+      if (!CONSTANT_CLASS_P (result))
+	constant_p = false;
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
       index = int_const_binop (PLUS_EXPR, index, width);
@@ -1138,7 +1147,10 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
 	comp_index = int_const_binop (PLUS_EXPR, comp_index, comp_width);
     }
 
-  constr = build_constructor (type, v);
+  if (constant_p)
+    constr = build_vector_from_ctor (type, v);
+  else
+    constr = build_constructor (type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 
@@ -1578,6 +1590,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
               "vector shuffling operation will be expanded piecewise");
 
   vec_alloc (v, elements);
+  bool constant_p = true;
   for (i = 0; i < elements; i++)
     {
       si = size_int (i);
@@ -1639,10 +1652,15 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
 	    t = v0_val;
         }
 
+      if (!CONSTANT_CLASS_P (t))
+	constant_p = false;
       CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, t);
     }
 
-  constr = build_constructor (vect_type, v);
+  if (constant_p)
+    constr = build_vector_from_ctor (vect_type, v);
+  else
+    constr = build_constructor (vect_type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 }
@@ -2014,6 +2032,7 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
 		}
 
 	      vec_alloc (v, (nunits + delta - 1) / delta * 2);
+	      bool constant_p = true;
 	      for (i = 0; i < nunits;
 		   i += delta, index = int_const_binop (PLUS_EXPR, index,
 							part_width))
@@ -2024,12 +2043,19 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
 					  index);
 		  tree result = gimplify_build1 (gsi, code1, cretd_type, a);
 		  constructor_elt ce = { NULL_TREE, result };
+		  if (!CONSTANT_CLASS_P (ce.value))
+		    constant_p = false;
 		  v->quick_push (ce);
 		  ce.value = gimplify_build1 (gsi, code2, cretd_type, a);
+		  if (!CONSTANT_CLASS_P (ce.value))
+		    constant_p = false;
 		  v->quick_push (ce);
 		}
 
-	      new_rhs = build_constructor (ret_type, v);
+	      if (constant_p)
+		new_rhs = build_vector_from_ctor (ret_type, v);
+	      else
+		new_rhs = build_constructor (ret_type, v);
 	      g = gimple_build_assign (lhs, new_rhs);
 	      gsi_replace (gsi, g, false);
 	      return;
-- 
2.26.2
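
For readers wondering why the new test expects exactly one pxor: every
operand of the shuffle folds to a constant, so the stored value is the
all-zero vector.  The sketch below evaluates the same expression at run
time.  It assumes GCC's generic vector extensions (__builtin_shuffle is
GCC-specific), and shuffle_result, all_zero and check_shuffle_is_zero
are hypothetical helper names, not part of the patch.

```c
#include <assert.h>

typedef short __attribute__((__vector_size__ (8 * sizeof (short)))) V;

/* Same expression as in pr100951.c, with v passed as a parameter.
   v != v is all zeros, 0 < (V) {} is all zeros, and the mask
   (V) {192} >> 5 is {6, 0, 0, 0, 0, 0, 0, 0}, so every selected
   element comes from a zero vector.  */
static V
shuffle_result (V v)
{
  return __builtin_shuffle (v != v, 0 < (V) {}, (V) {192} >> 5);
}

/* Return 1 if all eight elements of X are zero, 0 otherwise.  */
static int
all_zero (V x)
{
  for (int i = 0; i < 8; i++)
    if (x[i] != 0)
      return 0;
  return 1;
}

/* Abort if the shuffle of V does not produce the zero vector.  */
static void
check_shuffle_is_zero (V v)
{
  assert (all_zero (shuffle_result (v)));
}
```

Calling check_shuffle_is_zero with any input, for example
(V) {1, 2, 3, 4, 5, 6, 7, 8}, should pass, since the result does not
depend on v.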


Thread overview: 11+ messages
2021-06-05 15:18 [PATCH v2 0/2] Allow vec_duplicate_optab to fail H.J. Lu
2021-06-05 15:18 ` [PATCH v2 1/2] " H.J. Lu
2021-06-07  7:12   ` Richard Sandiford
2021-06-07 14:18     ` H.J. Lu
2021-06-07 17:59       ` Richard Biener
2021-06-07 18:10         ` Richard Biener
2021-06-07 20:33           ` [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering H.J. Lu
2021-06-09 21:03             ` Jeff Law
2021-06-09 21:31               ` H.J. Lu
2021-06-05 15:18 ` [PATCH v2 2/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast H.J. Lu
2021-06-08  8:47 [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering Richard Biener
