public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 0/2] Allow vec_duplicate_optab to fail
@ 2021-06-05 15:18 H.J. Lu
  2021-06-05 15:18 ` [PATCH v2 1/2] " H.J. Lu
  2021-06-05 15:18 ` [PATCH v2 2/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast H.J. Lu
  0 siblings, 2 replies; 11+ messages in thread
From: H.J. Lu @ 2021-06-05 15:18 UTC
  To: gcc-patches; +Cc: Uros Bizjak, Jakub Jelinek, Richard Sandiford, Richard Biener

We'd like to add vec_duplicate_optab to the x86 backend.  There are three
ways to broadcast an integer constant:

1. Load the full size from constant pool directly.
2. Use AVX2/AVX512 broadcast instruction.
3. Emulate broadcast with SSE2 unpack and shuffle instructions.
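
As a rough, portable illustration of the difference between way 1 and ways
2/3, the sketch below uses GCC/Clang generic vector extensions rather than
the assembly from the benchmark.  The names v4si, pool_constant,
load_from_pool, broadcast_scalar and lanes_equal are made up for this
example and do not appear in the patches.  Which instructions the compiler
actually emits for broadcast_scalar depends on the target flags; this only
shows that both forms produce the same vector value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Four 32-bit lanes in a 16-byte vector (GCC/Clang vector extension).  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Way 1: the full-width constant sits in memory, standing in for a
   constant-pool entry, and is loaded as a whole.  */
static const int32_t pool_constant[4] = {
  0x12345678, 0x12345678, 0x12345678, 0x12345678
};

v4si
load_from_pool (void)
{
  v4si r;
  assert (sizeof r == sizeof pool_constant);
  memcpy (&r, pool_constant, sizeof r);
  return r;
}

/* Ways 2 and 3: start from a single scalar and duplicate it into every
   lane.  The compiler may use a broadcast instruction or an
   unpack/shuffle sequence, depending on the enabled ISA.  */
v4si
broadcast_scalar (int32_t x)
{
  return (v4si) { x, x, x, x };
}

/* Return 1 if all four lanes of A and B match, 0 otherwise.  */
int
lanes_equal (v4si a, v4si b)
{
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Here lanes_equal (load_from_pool (), broadcast_scalar (0x12345678)) holds, so
the choice between the three ways is purely a code-generation decision.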

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

shows that broadcast is the fastest of the three on Intel Core i7-8559U,
roughly 18% below the constant-pool load:

$ make
gcc -g -I. -O2   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory      : 147215
broadcast   : 121213
vec_dup_sse2: 171366
$

broadcast is also smaller:

$ size memory.o broadcast.o
   text	   data	    bss	    dec	    hex	filename
    132	      0	      0	    132	     84	memory.o
    122	      0	      0	    122	     7a	broadcast.o
$

The preferred order is therefore:

1. Use AVX2/AVX512 broadcast instruction.
2. Load the full size from constant pool directly.
3. Emulate broadcast with SSE2 unpack and shuffle instructions.

The first patch updates vec_duplicate_optab usage to allow it to fail so
that the x86 backend can opt out of SSE2 broadcast emulation for an integer
constant.
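
The idea of an expander that is allowed to fail can be sketched in plain C
as below.  This is not GCC code: target_features,
try_expand_vec_duplicate and choose_constant_broadcast are hypothetical
names, and the single has_avx2 flag is a simplification of the real target
checks.  It only models the intended fallback: if the expander declines,
the caller loads the constant from the constant pool instead of emulating
the broadcast with SSE2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* How an integer-constant broadcast ends up being materialized.  */
enum broadcast_strategy
{
  USE_BROADCAST_INSN,   /* AVX2/AVX512 broadcast instruction.  */
  USE_CONSTANT_POOL     /* Full-size load from the constant pool.  */
};

/* Hypothetical, simplified description of the target.  */
struct target_features
{
  bool has_avx2;
};

/* Stand-in for an expander that may fail: returns true when it can
   emit a broadcast, false when it declines.  */
bool
try_expand_vec_duplicate (const struct target_features *t)
{
  assert (t != NULL);
  return t->has_avx2;
}

/* Caller side: try the expander first and fall back when it fails,
   rather than assuming it always succeeds.  */
enum broadcast_strategy
choose_constant_broadcast (const struct target_features *t)
{
  if (try_expand_vec_duplicate (t))
    return USE_BROADCAST_INSN;
  return USE_CONSTANT_POOL;
}
```

With has_avx2 set, the sketch picks the broadcast instruction; without it,
the constant-pool load, which matches the preferred order given above.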

The second patch adds a vec_duplicate<mode> expander and updates the move
expanders to convert CONST_WIDE_INT and CONST_VECTOR operands to a
vector broadcast from an integer with AVX2.

H.J. Lu (2):
  Allow vec_duplicate_optab to fail
  x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

 gcc/config/i386/i386-expand.c                 | 216 +++++++++++++++++-
 gcc/config/i386/i386-protos.h                 |   3 +
 gcc/config/i386/i386.c                        |  31 +++
 gcc/config/i386/sse.md                        |  19 ++
 gcc/doc/md.texi                               |   2 -
 gcc/expr.c                                    |  10 +-
 .../i386/avx512f-broadcast-pr87767-1.c        |   7 +-
 .../i386/avx512f-broadcast-pr87767-5.c        |   5 +-
 .../gcc.target/i386/avx512f_cond_move.c       |   4 +-
 .../i386/avx512vl-broadcast-pr87767-1.c       |  12 +-
 .../i386/avx512vl-broadcast-pr87767-5.c       |   9 +-
 gcc/testsuite/gcc.target/i386/pr100865-1.c    |  13 ++
 gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
 gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-2.c    |  14 ++
 gcc/testsuite/gcc.target/i386/pr100865-3.c    |  15 ++
 gcc/testsuite/gcc.target/i386/pr100865-4a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-4b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-5a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-5b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-6a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-6b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-7a.c   |  17 ++
 gcc/testsuite/gcc.target/i386/pr100865-7b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-8a.c   |  24 ++
 gcc/testsuite/gcc.target/i386/pr100865-8b.c   |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-9a.c   |  25 ++
 gcc/testsuite/gcc.target/i386/pr100865-9b.c   |   7 +
 28 files changed, 534 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9b.c

-- 
2.31.1


* [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering
@ 2021-06-08  8:47 Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2021-06-08  8:47 UTC
  To: gcc-patches

When vector lowering creates piecewise ops, make sure to create
VECTOR_CSTs instead of CONSTRUCTORs when all elements are constant.
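
The pattern repeated in each of the four functions below can be modelled
outside GCC as follows.  This is a standalone sketch, not the GCC tree API:
struct elem, enum vec_kind and classify_lowered_vector are hypothetical
names, and the is_constant field stands in for the CONSTANT_CLASS_P check
on each element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lowered element: a known constant or a runtime value.  */
struct elem
{
  bool is_constant;
  long value;
};

/* Which kind of node the lowering would build.  */
enum vec_kind
{
  KIND_VECTOR_CST,   /* All elements constant.  */
  KIND_CONSTRUCTOR   /* At least one non-constant element.  */
};

/* Track a constant_p flag while walking the elements, then pick the
   node kind once at the end, mirroring the shape of the patch.  */
enum vec_kind
classify_lowered_vector (const struct elem *elts, size_t n)
{
  assert (elts != NULL || n == 0);
  bool constant_p = true;
  for (size_t i = 0; i < n; i++)
    if (!elts[i].is_constant)
      constant_p = false;
  return constant_p ? KIND_VECTOR_CST : KIND_CONSTRUCTOR;
}
```

A single non-constant element is enough to fall back to the CONSTRUCTOR
form; only a fully constant element list yields a VECTOR_CST.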

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

gcc/

2021-06-07  Richard Biener  <rguenther@suse.de>

	PR middle-end/100951
	* tree-vect-generic.c (expand_vector_piecewise): Build a
	VECTOR_CST if all elements are constant.
	(expand_vector_condition): Likewise.
	(lower_vec_perm): Likewise.
	(expand_vector_conversion): Likewise.

gcc/testsuite/

2021-06-07  H.J. Lu  <hjl.tools@gmail.com>

	PR middle-end/100951
	* gcc.target/i386/pr100951.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr100951.c | 15 +++++++++++
 gcc/tree-vect-generic.c                  | 34 +++++++++++++++++++++---
 2 files changed, 45 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100951.c

diff --git a/gcc/testsuite/gcc.target/i386/pr100951.c b/gcc/testsuite/gcc.target/i386/pr100951.c
new file mode 100644
index 00000000000..16d8bafa663
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100951.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -march=x86-64" } */
+
+typedef short __attribute__((__vector_size__ (8 * sizeof (short)))) V;
+V v, w;
+
+void
+foo (void)
+{
+  w = __builtin_shuffle (v != v, 0 < (V) {}, (V) {192} >> 5);
+}
+
+/* { dg-final { scan-assembler-not "punpcklwd" } } */
+/* { dg-final { scan-assembler-not "pshufd" } } */
+/* { dg-final { scan-assembler-times "pxor\[\\t \]%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d9c0ac9de7e..5f3f9fa005e 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -328,16 +328,22 @@ expand_vector_piecewise (gimple_stmt_iterator *gsi, elem_op_func f,
   if (!ret_type)
     ret_type = type;
   vec_alloc (v, (nunits + delta - 1) / delta);
+  bool constant_p = true;
   for (i = 0; i < nunits;
        i += delta, index = int_const_binop (PLUS_EXPR, index, part_width))
     {
       tree result = f (gsi, inner_type, a, b, index, part_width, code,
 		       ret_type);
+      if (!CONSTANT_CLASS_P (result))
+	constant_p = false;
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
     }
 
-  return build_constructor (ret_type, v);
+  if (constant_p)
+    return build_vector_from_ctor (ret_type, v);
+  else
+    return build_constructor (ret_type, v);
 }
 
 /* Expand a vector operation to scalars with the freedom to use
@@ -1105,6 +1111,7 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
 
   int nunits = nunits_for_known_piecewise_op (type);
   vec_alloc (v, nunits);
+  bool constant_p = true;
   for (int i = 0; i < nunits; i++)
     {
       tree aa, result;
@@ -1129,6 +1136,8 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
       else
 	aa = tree_vec_extract (gsi, cond_type, a, width, index);
       result = gimplify_build3 (gsi, COND_EXPR, inner_type, aa, bb, cc);
+      if (!CONSTANT_CLASS_P (result))
+	constant_p = false;
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
       index = int_const_binop (PLUS_EXPR, index, width);
@@ -1138,7 +1147,10 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
 	comp_index = int_const_binop (PLUS_EXPR, comp_index, comp_width);
     }
 
-  constr = build_constructor (type, v);
+  if (constant_p)
+    constr = build_vector_from_ctor (type, v);
+  else
+    constr = build_constructor (type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 
@@ -1578,6 +1590,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
               "vector shuffling operation will be expanded piecewise");
 
   vec_alloc (v, elements);
+  bool constant_p = true;
   for (i = 0; i < elements; i++)
     {
       si = size_int (i);
@@ -1639,10 +1652,15 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
 	    t = v0_val;
         }
 
+      if (!CONSTANT_CLASS_P (t))
+	constant_p = false;
       CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, t);
     }
 
-  constr = build_constructor (vect_type, v);
+  if (constant_p)
+    constr = build_vector_from_ctor (vect_type, v);
+  else
+    constr = build_constructor (vect_type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 }
@@ -2014,6 +2032,7 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
 		}
 
 	      vec_alloc (v, (nunits + delta - 1) / delta * 2);
+	      bool constant_p = true;
 	      for (i = 0; i < nunits;
 		   i += delta, index = int_const_binop (PLUS_EXPR, index,
 							part_width))
@@ -2024,12 +2043,19 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
 					  index);
 		  tree result = gimplify_build1 (gsi, code1, cretd_type, a);
 		  constructor_elt ce = { NULL_TREE, result };
+		  if (!CONSTANT_CLASS_P (ce.value))
+		    constant_p = false;
 		  v->quick_push (ce);
 		  ce.value = gimplify_build1 (gsi, code2, cretd_type, a);
+		  if (!CONSTANT_CLASS_P (ce.value))
+		    constant_p = false;
 		  v->quick_push (ce);
 		}
 
-	      new_rhs = build_constructor (ret_type, v);
+	      if (constant_p)
+		new_rhs = build_vector_from_ctor (ret_type, v);
+	      else
+		new_rhs = build_constructor (ret_type, v);
 	      g = gimple_build_assign (lhs, new_rhs);
 	      gsi_replace (gsi, g, false);
 	      return;
-- 
2.26.2

Thread overview: 11+ messages
2021-06-05 15:18 [PATCH v2 0/2] Allow vec_duplicate_optab to fail H.J. Lu
2021-06-05 15:18 ` [PATCH v2 1/2] " H.J. Lu
2021-06-07  7:12   ` Richard Sandiford
2021-06-07 14:18     ` H.J. Lu
2021-06-07 17:59       ` Richard Biener
2021-06-07 18:10         ` Richard Biener
2021-06-07 20:33           ` [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering H.J. Lu
2021-06-09 21:03             ` Jeff Law
2021-06-09 21:31               ` H.J. Lu
2021-06-05 15:18 ` [PATCH v2 2/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast H.J. Lu
2021-06-08  8:47 [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering Richard Biener
