* [PATCH v2 0/2] Allow vec_duplicate_optab to fail
@ 2021-06-05 15:18 H.J. Lu
  2021-06-05 15:18 ` [PATCH v2 1/2] " H.J. Lu
  2021-06-05 15:18 ` [PATCH v2 2/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast H.J. Lu
  0 siblings, 2 replies; 10+ messages in thread
From: H.J. Lu @ 2021-06-05 15:18 UTC
To: gcc-patches; +Cc: Uros Bizjak, Jakub Jelinek, Richard Sandiford, Richard Biener

We'd like to add vec_duplicate_optab to the x86 backend.  There are 3 ways
to broadcast an integer constant:

1. Load the full size from constant pool directly.
2. Use AVX2/AVX512 broadcast instruction.
3. Emulate broadcast with SSE2 unpack and shuffle instructions.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

shows that broadcast is a little bit faster on Intel Core i7-8559U:

$ make
gcc -g -I. -O2 -c -o test.o test.c
gcc -g -c -o memory.o memory.S
gcc -g -c -o broadcast.o broadcast.S
gcc -g -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory      : 147215
broadcast   : 121213
vec_dup_sse2: 171366
$

broadcast is also smaller:

$ size memory.o broadcast.o
   text    data     bss     dec     hex filename
    132       0       0     132      84 memory.o
    122       0       0     122      7a broadcast.o
$

The preferred choices, in order, are

1. Use AVX2/AVX512 broadcast instruction.
2. Load the full size from constant pool directly.
3. Emulate broadcast with SSE2 unpack and shuffle instructions.

The first patch updates vec_duplicate_optab usage to allow it to fail so
that the x86 backend can opt out of SSE2 broadcast emulation from an
integer constant.

The second patch adds the vec_duplicate<mode> expander and updates move
expanders to convert the CONST_WIDE_INT and CONST_VECTOR operands to
vector broadcast from an integer with AVX2.

H.J. Lu (2):
  Allow vec_duplicate_optab to fail
  x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

 gcc/config/i386/i386-expand.c                 | 216 +++++++++++++++++-
 gcc/config/i386/i386-protos.h                 |   3 +
 gcc/config/i386/i386.c                        |  31 +++
 gcc/config/i386/sse.md                        |  19 ++
 gcc/doc/md.texi                               |   2 -
 gcc/expr.c                                    |  10 +-
 .../i386/avx512f-broadcast-pr87767-1.c        |   7 +-
 .../i386/avx512f-broadcast-pr87767-5.c        |   5 +-
 .../gcc.target/i386/avx512f_cond_move.c       |   4 +-
 .../i386/avx512vl-broadcast-pr87767-1.c       |  12 +-
 .../i386/avx512vl-broadcast-pr87767-5.c       |   9 +-
 gcc/testsuite/gcc.target/i386/pr100865-1.c    |  13 ++
 gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
 gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-2.c    |  14 ++
 gcc/testsuite/gcc.target/i386/pr100865-3.c    |  15 ++
 gcc/testsuite/gcc.target/i386/pr100865-4a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-4b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-5a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-5b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-6a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-6b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-7a.c   |  17 ++
 gcc/testsuite/gcc.target/i386/pr100865-7b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-8a.c   |  24 ++
 gcc/testsuite/gcc.target/i386/pr100865-8b.c   |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-9a.c   |  25 ++
 gcc/testsuite/gcc.target/i386/pr100865-9b.c   |   7 +
 28 files changed, 534 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9b.c

-- 
2.31.1
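For readers who want the benchmark figures from the cover letter in relative terms, here is a minimal C sketch that turns them into percentages. The post does not state the unit of the ./test output, so only ratios are computed; the function names (percent_saved, print_broadcast_summary) are illustrative and not part of the patch series or the benchmark.

```c
#include <assert.h>
#include <stdio.h>

/* Relative saving of CANDIDATE versus BASELINE, in percent.
   Both values must be in the same (here unspecified) unit.  */
double
percent_saved (double baseline, double candidate)
{
  assert (baseline > 0.0);
  return (baseline - candidate) * 100.0 / baseline;
}

/* Print the comparisons for the figures quoted in the cover letter.  */
void
print_broadcast_summary (void)
{
  /* Copied from the ./test and size output in the cover letter.  */
  const double memory = 147215, broadcast = 121213, vec_dup_sse2 = 171366;
  const double memory_text = 132, broadcast_text = 122;

  printf ("broadcast vs memory, ./test figure:       %.1f%% lower\n",
          percent_saved (memory, broadcast));
  printf ("broadcast vs vec_dup_sse2, ./test figure: %.1f%% lower\n",
          percent_saved (vec_dup_sse2, broadcast));
  printf ("broadcast.o vs memory.o, text size:       %.1f%% smaller\n",
          percent_saved (memory_text, broadcast_text));
}
```

Calling print_broadcast_summary () from a main function prints the three comparisons. The figures come from a single machine (Intel Core i7-8559U), so the percentages should be read as indicative only.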
* [PATCH v2 1/2] Allow vec_duplicate_optab to fail
  2021-06-05 15:18 [PATCH v2 0/2] Allow vec_duplicate_optab to fail H.J. Lu
@ 2021-06-05 15:18 ` H.J. Lu
  2021-06-07  7:12   ` Richard Sandiford
  2021-06-05 15:18 ` [PATCH v2 2/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast H.J. Lu
  1 sibling, 1 reply; 10+ messages in thread
From: H.J. Lu @ 2021-06-05 15:18 UTC
To: gcc-patches; +Cc: Uros Bizjak, Jakub Jelinek, Richard Sandiford, Richard Biener

Update vec_duplicate to allow to fail so that backend can only allow
broadcasting an integer constant to a vector when broadcast instruction
is available.

	* expr.c (store_constructor): Replace expand_insn with
	maybe_expand_insn for vec_duplicate_optab.
	* doc/md.texi: Update vec_duplicate.
---
 gcc/doc/md.texi |  2 --
 gcc/expr.c      | 10 ++++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844cc..e66c41c4779 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5077,8 +5077,6 @@ the mode appropriate for one element of @var{m}.
 This pattern only handles duplicates of non-constant inputs.  Constant
 vectors go through the @code{mov@var{m}} pattern instead.
 
-This pattern is not allowed to @code{FAIL}.
-
 @cindex @code{vec_series@var{m}} instruction pattern
 @item @samp{vec_series@var{m}}
 Initialize vector output operand 0 so that element @var{i} is equal to
diff --git a/gcc/expr.c b/gcc/expr.c
index e4660f0e90a..3107c32f259 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7075,10 +7075,12 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
             class expand_operand ops[2];
             create_output_operand (&ops[0], target, mode);
             create_input_operand (&ops[1], expand_normal (elt), eltmode);
-            expand_insn (icode, 2, ops);
-            if (!rtx_equal_p (target, ops[0].value))
-              emit_move_insn (target, ops[0].value);
-            break;
+            if (maybe_expand_insn (icode, 2, ops))
+              {
+                if (!rtx_equal_p (target, ops[0].value))
+                  emit_move_insn (target, ops[0].value);
+                break;
+              }
           }
 
         n_elts = TYPE_VECTOR_SUBPARTS (type);
-- 
2.31.1
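The control-flow change in the expr.c hunk of patch 1/2 (try the target expander, and fall through to the generic path only when it declines) can be sketched in plain C. This is a toy model under stated assumptions, not GCC code: broadcast_fn, target_with_broadcast, target_without_broadcast, store_uniform_vector and the store_path enum are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a target expander that is allowed to decline.  */
typedef bool (*broadcast_fn) (int *dst, size_t n, int value);

enum store_path { PATH_BROADCAST, PATH_ELEMENTWISE };

/* A "target" that has a broadcast instruction: it fills DST and succeeds.  */
bool
target_with_broadcast (int *dst, size_t n, int value)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = value;
  return true;
}

/* A "target" that declines, playing the role of an expander that FAILs.  */
bool
target_without_broadcast (int *dst, size_t n, int value)
{
  (void) dst;
  (void) n;
  (void) value;
  return false;
}

/* Same shape as the patched hunk: use the expander's result only if it
   succeeded; otherwise continue with the element-by-element path instead
   of treating the failure as fatal.  */
enum store_path
store_uniform_vector (int *dst, size_t n, int value, broadcast_fn expander)
{
  assert (dst != NULL);
  if (expander != NULL && expander (dst, n, value))
    return PATH_BROADCAST;

  for (size_t i = 0; i < n; i++)
    dst[i] = value;
  return PATH_ELEMENTWISE;
}
```

Either way the destination ends up holding the same values; only the path taken differs, which is the property the patch relies on when it swaps expand_insn for maybe_expand_insn.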
* Re: [PATCH v2 1/2] Allow vec_duplicate_optab to fail
  2021-06-05 15:18 ` [PATCH v2 1/2] " H.J. Lu
@ 2021-06-07  7:12   ` Richard Sandiford
  2021-06-07 14:18     ` H.J. Lu
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2021-06-07  7:12 UTC
To: H.J. Lu; +Cc: gcc-patches, Uros Bizjak, Jakub Jelinek, Richard Biener

"H.J. Lu" <hjl.tools@gmail.com> writes:
> Update vec_duplicate to allow to fail so that backend can only allow
> broadcasting an integer constant to a vector when broadcast instruction
> is available.

I'm not sure why we need this to fail though.  Once the optab is defined
for target X, the optab should handle all duplicates for target X,
even if there are different strategies it can use.

AIUI the case you want to make conditional is the constant case.
I guess the first question is: why don't we simplify those CONSTRUCTORs
to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
as a constructor here.

If we can't rely on that happening, then would it work to change:

  /* Try using vec_duplicate_optab for uniform vectors.  */
  if (!TREE_SIDE_EFFECTS (exp)
      && VECTOR_MODE_P (mode)
      && eltmode == GET_MODE_INNER (mode)
      && ((icode = optab_handler (vec_duplicate_optab, mode))
          != CODE_FOR_nothing)
      && (elt = uniform_vector_p (exp)))

to something like:

  /* Try using vec_duplicate_optab for uniform vectors.  */
  if (!TREE_SIDE_EFFECTS (exp)
      && VECTOR_MODE_P (mode)
      && eltmode == GET_MODE_INNER (mode)
      && (elt = uniform_vector_p (exp)))
    {
      if (TREE_CODE (elt) == INTEGER_CST
          || TREE_CODE (elt) == POLY_INT_CST
          || TREE_CODE (elt) == REAL_CST
          || TREE_CODE (elt) == FIXED_CST)
        {
          rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
          emit_move_insn (target, src);
          break;
        }
      …
    }

Thanks,
Richard
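The decision proposed in the reply above (a uniform vector whose element is a constant becomes a constant-vector move; the duplicate pattern is left for non-constant elements) can be modelled in plain C. This is a rough sketch and not GCC code: struct elt, uniform_element, choose_expansion and the expansion enum are hypothetical, and the part elided with "…" in the suggestion is reduced here to a single branch on the assumption that it goes on to use the optab.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy element: either a constant with value ID, or variable number ID.  */
struct elt
{
  bool is_constant;
  int id;
};

enum expansion { EXPAND_CONST_MOVE, EXPAND_VEC_DUPLICATE, EXPAND_PIECEWISE };

/* Return the common element if all N elements of V are identical, else
   NULL.  This plays the role uniform_vector_p has in the suggestion.  */
const struct elt *
uniform_element (const struct elt *v, size_t n)
{
  assert (v != NULL && n > 0);
  for (size_t i = 1; i < n; i++)
    if (v[i].is_constant != v[0].is_constant || v[i].id != v[0].id)
      return NULL;
  return &v[0];
}

/* Constant uniform vectors take the move path; non-constant uniform
   vectors are left for the duplicate pattern; anything else is built
   element by element.  */
enum expansion
choose_expansion (const struct elt *v, size_t n)
{
  const struct elt *e = uniform_element (v, n);
  if (e == NULL)
    return EXPAND_PIECEWISE;
  return e->is_constant ? EXPAND_CONST_MOVE : EXPAND_VEC_DUPLICATE;
}
```

With this split the duplicate pattern never sees a constant input, which matches the md.texi wording that constant vectors go through the mov pattern.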
* Re: [PATCH v2 1/2] Allow vec_duplicate_optab to fail
  2021-06-07  7:12   ` Richard Sandiford
@ 2021-06-07 14:18     ` H.J. Lu
  2021-06-07 17:59       ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: H.J. Lu @ 2021-06-07 14:18 UTC
To: H.J. Lu, GCC Patches, Uros Bizjak, Jakub Jelinek, Richard Biener, Richard Sandiford

On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> "H.J. Lu" <hjl.tools@gmail.com> writes:
> > Update vec_duplicate to allow to fail so that backend can only allow
> > broadcasting an integer constant to a vector when broadcast instruction
> > is available.
>
> I'm not sure why we need this to fail though.  Once the optab is defined
> for target X, the optab should handle all duplicates for target X,
> even if there are different strategies it can use.
>
> AIUI the case you want to make conditional is the constant case.
> I guess the first question is: why don't we simplify those CONSTRUCTORs
> to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> as a constructor here.

The particular testcase for vec_duplicate is gcc.dg/pr100239.c.

> If we can't rely on that happening, then would it work to change:
>
>   /* Try using vec_duplicate_optab for uniform vectors.  */
>   if (!TREE_SIDE_EFFECTS (exp)
>       && VECTOR_MODE_P (mode)
>       && eltmode == GET_MODE_INNER (mode)
>       && ((icode = optab_handler (vec_duplicate_optab, mode))
>           != CODE_FOR_nothing)
>       && (elt = uniform_vector_p (exp)))
>
> to something like:
>
>   /* Try using vec_duplicate_optab for uniform vectors.  */
>   if (!TREE_SIDE_EFFECTS (exp)
>       && VECTOR_MODE_P (mode)
>       && eltmode == GET_MODE_INNER (mode)
>       && (elt = uniform_vector_p (exp)))
>     {
>       if (TREE_CODE (elt) == INTEGER_CST
>           || TREE_CODE (elt) == POLY_INT_CST
>           || TREE_CODE (elt) == REAL_CST
>           || TREE_CODE (elt) == FIXED_CST)
>         {
>           rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
>           emit_move_insn (target, src);
>           break;
>         }
>       …
>     }

I will give it a try.

Thanks.

-- 
H.J.
* Re: [PATCH v2 1/2] Allow vec_duplicate_optab to fail
  2021-06-07 14:18     ` H.J. Lu
@ 2021-06-07 17:59       ` Richard Biener
  2021-06-07 18:10         ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2021-06-07 17:59 UTC
To: H.J. Lu; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek, Richard Sandiford

On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > "H.J. Lu" <hjl.tools@gmail.com> writes:
> > > Update vec_duplicate to allow to fail so that backend can only allow
> > > broadcasting an integer constant to a vector when broadcast instruction
> > > is available.
> >
> > I'm not sure why we need this to fail though.  Once the optab is defined
> > for target X, the optab should handle all duplicates for target X,
> > even if there are different strategies it can use.
> >
> > AIUI the case you want to make conditional is the constant case.
> > I guess the first question is: why don't we simplify those CONSTRUCTORs
> > to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> > as a constructor here.
>
> The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
>
> > If we can't rely on that happening, then would it work to change:
> >
> >   /* Try using vec_duplicate_optab for uniform vectors.  */
> >   if (!TREE_SIDE_EFFECTS (exp)
> >       && VECTOR_MODE_P (mode)
> >       && eltmode == GET_MODE_INNER (mode)
> >       && ((icode = optab_handler (vec_duplicate_optab, mode))
> >           != CODE_FOR_nothing)
> >       && (elt = uniform_vector_p (exp)))
> >
> > to something like:
> >
> >   /* Try using vec_duplicate_optab for uniform vectors.  */
> >   if (!TREE_SIDE_EFFECTS (exp)
> >       && VECTOR_MODE_P (mode)
> >       && eltmode == GET_MODE_INNER (mode)
> >       && (elt = uniform_vector_p (exp)))
> >     {
> >       if (TREE_CODE (elt) == INTEGER_CST
> >           || TREE_CODE (elt) == POLY_INT_CST
> >           || TREE_CODE (elt) == REAL_CST
> >           || TREE_CODE (elt) == FIXED_CST)
> >         {
> >           rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
> >           emit_move_insn (target, src);
> >           break;
> >         }
> >       …
> >     }
>
> I will give it a try.

I can confirm that veclower leaves us with an unfolded constant CTOR.
If you file a PR to remind me I'll fix that.

Richard.

> Thanks.
>
> --
> H.J.
* Re: [PATCH v2 1/2] Allow vec_duplicate_optab to fail
  2021-06-07 17:59       ` Richard Biener
@ 2021-06-07 18:10         ` Richard Biener
  2021-06-07 20:33           ` [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering H.J. Lu
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2021-06-07 18:10 UTC
To: H.J. Lu; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 2616 bytes --]

On Mon, Jun 7, 2021 at 7:59 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> > >
> > > "H.J. Lu" <hjl.tools@gmail.com> writes:
> > > > Update vec_duplicate to allow to fail so that backend can only allow
> > > > broadcasting an integer constant to a vector when broadcast instruction
> > > > is available.
> > >
> > > I'm not sure why we need this to fail though.  Once the optab is defined
> > > for target X, the optab should handle all duplicates for target X,
> > > even if there are different strategies it can use.
> > >
> > > AIUI the case you want to make conditional is the constant case.
> > > I guess the first question is: why don't we simplify those CONSTRUCTORs
> > > to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> > > as a constructor here.
> >
> > The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
> >
> > > If we can't rely on that happening, then would it work to change:
> > >
> > >   /* Try using vec_duplicate_optab for uniform vectors.  */
> > >   if (!TREE_SIDE_EFFECTS (exp)
> > >       && VECTOR_MODE_P (mode)
> > >       && eltmode == GET_MODE_INNER (mode)
> > >       && ((icode = optab_handler (vec_duplicate_optab, mode))
> > >           != CODE_FOR_nothing)
> > >       && (elt = uniform_vector_p (exp)))
> > >
> > > to something like:
> > >
> > >   /* Try using vec_duplicate_optab for uniform vectors.  */
> > >   if (!TREE_SIDE_EFFECTS (exp)
> > >       && VECTOR_MODE_P (mode)
> > >       && eltmode == GET_MODE_INNER (mode)
> > >       && (elt = uniform_vector_p (exp)))
> > >     {
> > >       if (TREE_CODE (elt) == INTEGER_CST
> > >           || TREE_CODE (elt) == POLY_INT_CST
> > >           || TREE_CODE (elt) == REAL_CST
> > >           || TREE_CODE (elt) == FIXED_CST)
> > >         {
> > >           rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
> > >           emit_move_insn (target, src);
> > >           break;
> > >         }
> > >       …
> > >     }
> >
> > I will give it a try.
>
> I can confirm that veclower leaves us with an unfolded constant CTOR.
> If you file a PR to remind me I'll fix that.

The attached untested patch fixes this for the testcase.

Richard.

> Richard.
>
> > Thanks.
> >
> > --
> > H.J.

[-- Attachment #2: p --]
[-- Type: application/octet-stream, Size: 4438 bytes --]

From 3c89ebfcbeaafdd9bbf31a300593365eb92906c4 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Mon, 7 Jun 2021 20:08:13 +0200
Subject: [PATCH] middle-end/ - make sure to generate VECTOR_CST in lowering

When vector lowering creates piecewise ops make sure to create
VECTOR_CSTs instead of CONSTRUCTORs when possible.

2021-06-07  Richard Biener  <rguenther@suse.de>

	* tree-vect-generic.c (): Build a VECTOR_CST if all
	elements are constant.
---
 gcc/tree-vect-generic.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d9c0ac9de7e..5f3f9fa005e 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -328,16 +328,22 @@ expand_vector_piecewise (gimple_stmt_iterator *gsi, elem_op_func f,
   if (!ret_type)
     ret_type = type;
   vec_alloc (v, (nunits + delta - 1) / delta);
+  bool constant_p = true;
   for (i = 0; i < nunits;
        i += delta, index = int_const_binop (PLUS_EXPR, index, part_width))
     {
       tree result = f (gsi, inner_type, a, b, index, part_width, code,
                        ret_type);
+      if (!CONSTANT_CLASS_P (result))
+        constant_p = false;
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
     }
 
-  return build_constructor (ret_type, v);
+  if (constant_p)
+    return build_vector_from_ctor (ret_type, v);
+  else
+    return build_constructor (ret_type, v);
 }
 
 /* Expand a vector operation to scalars with the freedom to use
@@ -1105,6 +1111,7 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
 
   int nunits = nunits_for_known_piecewise_op (type);
   vec_alloc (v, nunits);
+  bool constant_p = true;
   for (int i = 0; i < nunits; i++)
     {
       tree aa, result;
@@ -1129,6 +1136,8 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
       else
         aa = tree_vec_extract (gsi, cond_type, a, width, index);
       result = gimplify_build3 (gsi, COND_EXPR, inner_type, aa, bb, cc);
+      if (!CONSTANT_CLASS_P (result))
+        constant_p = false;
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
       index = int_const_binop (PLUS_EXPR, index, width);
@@ -1138,7 +1147,10 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
         comp_index = int_const_binop (PLUS_EXPR, comp_index, comp_width);
     }
 
-  constr = build_constructor (type, v);
+  if (constant_p)
+    constr = build_vector_from_ctor (type, v);
+  else
+    constr = build_constructor (type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 
@@ -1578,6 +1590,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
                 "vector shuffling operation will be expanded piecewise");
 
   vec_alloc (v, elements);
+  bool constant_p = true;
   for (i = 0; i < elements; i++)
     {
       si = size_int (i);
@@ -1639,10 +1652,15 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
           t = v0_val;
         }
 
+      if (!CONSTANT_CLASS_P (t))
+        constant_p = false;
       CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, t);
     }
 
-  constr = build_constructor (vect_type, v);
+  if (constant_p)
+    constr = build_vector_from_ctor (vect_type, v);
+  else
+    constr = build_constructor (vect_type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 }
@@ -2014,6 +2032,7 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
         }
 
       vec_alloc (v, (nunits + delta - 1) / delta * 2);
+      bool constant_p = true;
       for (i = 0; i < nunits;
            i += delta, index = int_const_binop (PLUS_EXPR, index,
                                                 part_width))
@@ -2024,12 +2043,19 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
                                          index);
           tree result = gimplify_build1 (gsi, code1, cretd_type, a);
           constructor_elt ce = { NULL_TREE, result };
+          if (!CONSTANT_CLASS_P (ce.value))
+            constant_p = false;
           v->quick_push (ce);
           ce.value = gimplify_build1 (gsi, code2, cretd_type, a);
+          if (!CONSTANT_CLASS_P (ce.value))
+            constant_p = false;
           v->quick_push (ce);
         }
 
-      new_rhs = build_constructor (ret_type, v);
+      if (constant_p)
+        new_rhs = build_vector_from_ctor (ret_type, v);
+      else
+        new_rhs = build_constructor (ret_type, v);
       g = gimple_build_assign (lhs, new_rhs);
       gsi_replace (gsi, g, false);
       return;
-- 
2.17.1
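The pattern repeated in each hunk of the patch above (collect the lowered elements, remember whether every one of them is a constant, then pick the node kind) can be sketched outside GCC as follows. This is a toy model under stated assumptions: struct piece, lower_fn, build_lowered_vector and the two sample callbacks are hypothetical and only mimic the shape of the loop, not the tree API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which representation the lowered result gets in this toy model.  */
enum node_kind { KIND_VECTOR_CST, KIND_CONSTRUCTOR };

/* One lowered element: a constant or something computed at run time.  */
struct piece
{
  bool is_constant;
  long value;
};

/* Hypothetical per-element lowering callback.  */
typedef struct piece (*lower_fn) (size_t index);

/* Lower N elements into OUT and track, as the patch does with constant_p,
   whether all of them turned out to be constants.  */
enum node_kind
build_lowered_vector (struct piece *out, size_t n, lower_fn f)
{
  assert (out != NULL && f != NULL);
  bool constant_p = true;
  for (size_t i = 0; i < n; i++)
    {
      out[i] = f (i);
      if (!out[i].is_constant)
        constant_p = false;
    }
  return constant_p ? KIND_VECTOR_CST : KIND_CONSTRUCTOR;
}

/* Sample callback: every element folds to the constant 0.  */
struct piece
all_zero (size_t index)
{
  (void) index;
  return (struct piece) { true, 0 };
}

/* Sample callback: element 3 is not a constant, the others are.  */
struct piece
last_is_variable (size_t index)
{
  return (struct piece) { index != 3, (long) index };
}
```

A single non-constant element is enough to force the constructor form, so the flag only needs to be cleared, never set again, inside the loop.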
* [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering
  2021-06-07 18:10         ` Richard Biener
@ 2021-06-07 20:33           ` H.J. Lu
  2021-06-09 21:03             ` Jeff Law
  0 siblings, 1 reply; 10+ messages in thread
From: H.J. Lu @ 2021-06-07 20:33 UTC
To: Richard Biener; +Cc: GCC Patches, Uros Bizjak, Jakub Jelinek, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 2811 bytes --]

On Mon, Jun 7, 2021 at 11:10 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Jun 7, 2021 at 7:59 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
> > > <richard.sandiford@arm.com> wrote:
> > > >
> > > > "H.J. Lu" <hjl.tools@gmail.com> writes:
> > > > > Update vec_duplicate to allow to fail so that backend can only allow
> > > > > broadcasting an integer constant to a vector when broadcast instruction
> > > > > is available.
> > > >
> > > > I'm not sure why we need this to fail though.  Once the optab is defined
> > > > for target X, the optab should handle all duplicates for target X,
> > > > even if there are different strategies it can use.
> > > >
> > > > AIUI the case you want to make conditional is the constant case.
> > > > I guess the first question is: why don't we simplify those CONSTRUCTORs
> > > > to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> > > > as a constructor here.
> > >
> > > The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
> > >
> > > > If we can't rely on that happening, then would it work to change:
> > > >
> > > >   /* Try using vec_duplicate_optab for uniform vectors.  */
> > > >   if (!TREE_SIDE_EFFECTS (exp)
> > > >       && VECTOR_MODE_P (mode)
> > > >       && eltmode == GET_MODE_INNER (mode)
> > > >       && ((icode = optab_handler (vec_duplicate_optab, mode))
> > > >           != CODE_FOR_nothing)
> > > >       && (elt = uniform_vector_p (exp)))
> > > >
> > > > to something like:
> > > >
> > > >   /* Try using vec_duplicate_optab for uniform vectors.  */
> > > >   if (!TREE_SIDE_EFFECTS (exp)
> > > >       && VECTOR_MODE_P (mode)
> > > >       && eltmode == GET_MODE_INNER (mode)
> > > >       && (elt = uniform_vector_p (exp)))
> > > >     {
> > > >       if (TREE_CODE (elt) == INTEGER_CST
> > > >           || TREE_CODE (elt) == POLY_INT_CST
> > > >           || TREE_CODE (elt) == REAL_CST
> > > >           || TREE_CODE (elt) == FIXED_CST)
> > > >         {
> > > >           rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
> > > >           emit_move_insn (target, src);
> > > >           break;
> > > >         }
> > > >       …
> > > >     }
> > >
> > > I will give it a try.
> >
> > I can confirm that veclower leaves us with an unfolded constant CTOR.
> > If you file a PR to remind me I'll fix that.
>
> The attached untested patch fixes this for the testcase.
>

Here is the patch + the testcase.

-- 
H.J.

[-- Attachment #2: 0001-middle-end-100951-make-sure-to-generate-VECTOR_CST-i.patch --]
[-- Type: text/x-patch, Size: 5388 bytes --]

From aac56894719b59e552b493c970946225ed8c27f6 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Mon, 7 Jun 2021 20:08:13 +0200
Subject: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in
 lowering

When vector lowering creates piecewise ops make sure to create
VECTOR_CSTs instead of CONSTRUCTORs when possible.

gcc/

2021-06-07  Richard Biener  <rguenther@suse.de>

	PR middle-end/100951
	* tree-vect-generic.c (): Build a VECTOR_CST if all
	elements are constant.

gcc/testsuite/

2021-06-07  H.J. Lu  <hjl.tools@gmail.com>

	PR middle-end/100951
	* gcc.target/i386/pr100951.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr100951.c | 15 +++++++++++
 gcc/tree-vect-generic.c                  | 34 +++++++++++++++++++++---
 2 files changed, 45 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100951.c

diff --git a/gcc/testsuite/gcc.target/i386/pr100951.c b/gcc/testsuite/gcc.target/i386/pr100951.c
new file mode 100644
index 00000000000..16d8bafa663
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100951.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -march=x86-64" } */
+
+typedef short __attribute__((__vector_size__ (8 * sizeof (short)))) V;
+V v, w;
+
+void
+foo (void)
+{
+  w = __builtin_shuffle (v != v, 0 < (V) {}, (V) {192} >> 5);
+}
+
+/* { dg-final { scan-assembler-not "punpcklwd" } } */
+/* { dg-final { scan-assembler-not "pshufd" } } */
+/* { dg-final { scan-assembler-times "pxor\[\\t \]%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d9c0ac9de7e..5f3f9fa005e 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -328,16 +328,22 @@ expand_vector_piecewise (gimple_stmt_iterator *gsi, elem_op_func f,
   if (!ret_type)
     ret_type = type;
   vec_alloc (v, (nunits + delta - 1) / delta);
+  bool constant_p = true;
   for (i = 0; i < nunits;
        i += delta, index = int_const_binop (PLUS_EXPR, index, part_width))
     {
       tree result = f (gsi, inner_type, a, b, index, part_width, code,
                        ret_type);
+      if (!CONSTANT_CLASS_P (result))
+        constant_p = false;
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
     }
 
-  return build_constructor (ret_type, v);
+  if (constant_p)
+    return build_vector_from_ctor (ret_type, v);
+  else
+    return build_constructor (ret_type, v);
 }
 
 /* Expand a vector operation to scalars with the freedom to use
@@ -1105,6 +1111,7 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
 
   int nunits = nunits_for_known_piecewise_op (type);
   vec_alloc (v, nunits);
+  bool constant_p = true;
   for (int i = 0; i < nunits; i++)
     {
       tree aa, result;
@@ -1129,6 +1136,8 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
       else
         aa = tree_vec_extract (gsi, cond_type, a, width, index);
       result = gimplify_build3 (gsi, COND_EXPR, inner_type, aa, bb, cc);
+      if (!CONSTANT_CLASS_P (result))
+        constant_p = false;
       constructor_elt ce = {NULL_TREE, result};
       v->quick_push (ce);
       index = int_const_binop (PLUS_EXPR, index, width);
@@ -1138,7 +1147,10 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names)
         comp_index = int_const_binop (PLUS_EXPR, comp_index, comp_width);
     }
 
-  constr = build_constructor (type, v);
+  if (constant_p)
+    constr = build_vector_from_ctor (type, v);
+  else
+    constr = build_constructor (type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 
@@ -1578,6 +1590,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
                 "vector shuffling operation will be expanded piecewise");
 
   vec_alloc (v, elements);
+  bool constant_p = true;
   for (i = 0; i < elements; i++)
     {
       si = size_int (i);
@@ -1639,10 +1652,15 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
           t = v0_val;
         }
 
+      if (!CONSTANT_CLASS_P (t))
+        constant_p = false;
       CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, t);
     }
 
-  constr = build_constructor (vect_type, v);
+  if (constant_p)
+    constr = build_vector_from_ctor (vect_type, v);
+  else
+    constr = build_constructor (vect_type, v);
   gimple_assign_set_rhs_from_tree (gsi, constr);
   update_stmt (gsi_stmt (*gsi));
 }
@@ -2014,6 +2032,7 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
         }
 
       vec_alloc (v, (nunits + delta - 1) / delta * 2);
+      bool constant_p = true;
       for (i = 0; i < nunits;
            i += delta, index = int_const_binop (PLUS_EXPR, index,
                                                 part_width))
@@ -2024,12 +2043,19 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
                                          index);
           tree result = gimplify_build1 (gsi, code1, cretd_type, a);
           constructor_elt ce = { NULL_TREE, result };
+          if (!CONSTANT_CLASS_P (ce.value))
+            constant_p = false;
           v->quick_push (ce);
           ce.value = gimplify_build1 (gsi, code2, cretd_type, a);
+          if (!CONSTANT_CLASS_P (ce.value))
+            constant_p = false;
           v->quick_push (ce);
         }
 
-      new_rhs = build_constructor (ret_type, v);
+      if (constant_p)
+        new_rhs = build_vector_from_ctor (ret_type, v);
+      else
+        new_rhs = build_constructor (ret_type, v);
       g = gimple_build_assign (lhs, new_rhs);
       gsi_replace (gsi, g, false);
       return;
-- 
2.31.1
* Re: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering
  2021-06-07 20:33           ` [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering H.J. Lu
@ 2021-06-09 21:03             ` Jeff Law
  2021-06-09 21:31               ` H.J. Lu
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Law @ 2021-06-09 21:03 UTC
To: H.J. Lu, Richard Biener; +Cc: Jakub Jelinek, GCC Patches, Richard Sandiford

On 6/7/2021 2:33 PM, H.J. Lu via Gcc-patches wrote:
> On Mon, Jun 7, 2021 at 11:10 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jun 7, 2021 at 7:59 PM Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
>>>> <richard.sandiford@arm.com> wrote:
>>>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>>>> Update vec_duplicate to allow to fail so that backend can only allow
>>>>>> broadcasting an integer constant to a vector when broadcast instruction
>>>>>> is available.
>>>>> I'm not sure why we need this to fail though.  Once the optab is defined
>>>>> for target X, the optab should handle all duplicates for target X,
>>>>> even if there are different strategies it can use.
>>>>>
>>>>> AIUI the case you want to make conditional is the constant case.
>>>>> I guess the first question is: why don't we simplify those CONSTRUCTORs
>>>>> to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
>>>>> as a constructor here.
>>>> The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
>>>>
>>>>> If we can't rely on that happening, then would it work to change:
>>>>>
>>>>>   /* Try using vec_duplicate_optab for uniform vectors.  */
>>>>>   if (!TREE_SIDE_EFFECTS (exp)
>>>>>       && VECTOR_MODE_P (mode)
>>>>>       && eltmode == GET_MODE_INNER (mode)
>>>>>       && ((icode = optab_handler (vec_duplicate_optab, mode))
>>>>>           != CODE_FOR_nothing)
>>>>>       && (elt = uniform_vector_p (exp)))
>>>>>
>>>>> to something like:
>>>>>
>>>>>   /* Try using vec_duplicate_optab for uniform vectors.  */
>>>>>   if (!TREE_SIDE_EFFECTS (exp)
>>>>>       && VECTOR_MODE_P (mode)
>>>>>       && eltmode == GET_MODE_INNER (mode)
>>>>>       && (elt = uniform_vector_p (exp)))
>>>>>     {
>>>>>       if (TREE_CODE (elt) == INTEGER_CST
>>>>>           || TREE_CODE (elt) == POLY_INT_CST
>>>>>           || TREE_CODE (elt) == REAL_CST
>>>>>           || TREE_CODE (elt) == FIXED_CST)
>>>>>         {
>>>>>           rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
>>>>>           emit_move_insn (target, src);
>>>>>           break;
>>>>>         }
>>>>>       …
>>>>>     }
>>>> I will give it a try.
>>> I can confirm that veclower leaves us with an unfolded constant CTOR.
>>> If you file a PR to remind me I'll fix that.
>> The attached untested patch fixes this for the testcase.
>>
> Here is the patch + the testcase.
>
>
> 0001-middle-end-100951-make-sure-to-generate-VECTOR_CST-i.patch
>
>  From aac56894719b59e552b493c970946225ed8c27f6 Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguenther@suse.de>
> Date: Mon, 7 Jun 2021 20:08:13 +0200
> Subject: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in
>   lowering
>
> When vector lowering creates piecewise ops make sure to create
> VECTOR_CSTs instead of CONSTRUCTORs when possible.
>
> gcc/
>
> 2021-06-07  Richard Biener  <rguenther@suse.de>
>
> 	PR middle-end/100951
> 	* tree-vect-generic.c (): Build a VECTOR_CST if all
> 	elements are constant.
>
> gcc/testsuite/
>
> 2021-06-07  H.J. Lu  <hjl.tools@gmail.com>
>
> 	PR middle-end/100951
> 	* gcc.target/i386/pr100951.c: New test.

Assuming this passed testing it is OK.

jeff
* Re: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering 2021-06-09 21:03 ` Jeff Law @ 2021-06-09 21:31 ` H.J. Lu 0 siblings, 0 replies; 10+ messages in thread From: H.J. Lu @ 2021-06-09 21:31 UTC (permalink / raw) To: Jeff Law; +Cc: Richard Biener, Jakub Jelinek, GCC Patches, Richard Sandiford On Wed, Jun 9, 2021 at 2:03 PM Jeff Law <jeffreyalaw@gmail.com> wrote: > > > > On 6/7/2021 2:33 PM, H.J. Lu via Gcc-patches wrote: > > On Mon, Jun 7, 2021 at 11:10 AM Richard Biener > <richard.guenther@gmail.com> wrote: > > On Mon, Jun 7, 2021 at 7:59 PM Richard Biener > <richard.guenther@gmail.com> wrote: > > On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu <hjl.tools@gmail.com> wrote: > > On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford > <richard.sandiford@arm.com> wrote: > > "H.J. Lu" <hjl.tools@gmail.com> writes: > > Update vec_duplicate to allow to fail so that backend can only allow > broadcasting an integer constant to a vector when broadcast instruction > is available. > > I'm not sure why we need this to fail though. Once the optab is defined > for target X, the optab should handle all duplicates for target X, > even if there are different strategies it can use. > > AIUI the case you want to make conditional is the constant case. > I guess the first question is: why don't we simplify those CONSTRUCTORs > to VECTOR_CSTs in gimple? I'm surprised we still see the constant case > as a constructor here. > > The particular testcase for vec_duplicate is gcc.dg/pr100239.c. > > If we can't rely on that happening, then would it work to change: > > /* Try using vec_duplicate_optab for uniform vectors. */ > if (!TREE_SIDE_EFFECTS (exp) > && VECTOR_MODE_P (mode) > && eltmode == GET_MODE_INNER (mode) > && ((icode = optab_handler (vec_duplicate_optab, mode)) > != CODE_FOR_nothing) > && (elt = uniform_vector_p (exp))) > > to something like: > > /* Try using vec_duplicate_optab for uniform vectors. 
*/ > if (!TREE_SIDE_EFFECTS (exp) > && VECTOR_MODE_P (mode) > && eltmode == GET_MODE_INNER (mode) > && (elt = uniform_vector_p (exp))) > { > if (TREE_CODE (elt) == INTEGER_CST > || TREE_CODE (elt) == POLY_INT_CST > || TREE_CODE (elt) == REAL_CST > || TREE_CODE (elt) == FIXED_CST) > { > rtx src = gen_const_vec_duplicate (mode, expand_normal (node)); > emit_move_insn (target, src); > break; > } > … > } > > I will give it a try. > > I can confirm that veclower leaves us with an unfolded constant CTOR. > If you file a PR to remind me I'll fix that. > > The attached untested patch fixes this for the testcase. > > Here is the patch + the testcase. > > > 0001-middle-end-100951-make-sure-to-generate-VECTOR_CST-i.patch > > From aac56894719b59e552b493c970946225ed8c27f6 Mon Sep 17 00:00:00 2001 > From: Richard Biener <rguenther@suse.de> > Date: Mon, 7 Jun 2021 20:08:13 +0200 > Subject: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in > lowering > > When vector lowering creates piecewise ops make sure to create > VECTOR_CSTs instead of CONSTRUCTORs when possible. > > gcc/ > > 2021-06-07 Richard Biener <rguenther@suse.de> > > PR middle-end/100951 > * tree-vect-generic.c (): Build a VECTOR_CST if all > elements are constant. > > gcc/testsuite/ > > 2021-06-07 H.J. Lu <hjl.tools@gmail.com> > > PR middle-end/100951 > * gcc.target/i386/pr100951.c: New test. > > Assuming this passed testing it is OK. > jeff Richard has committed: commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515 Author: Richard Biener <rguenther@suse.de> Date: Mon Jun 7 20:08:13 2021 +0200 middle-end/100951 - make sure to generate VECTOR_CST in lowering When vector lowering creates piecewise ops make sure to create VECTOR_CSTs instead of CONSTRUCTORs when possible. -- H.J. ^ permalink raw reply [flat|nested] 10+ messages in thread
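A minimal sketch, not taken from the patches in this thread: the constant case discussed above comes down to recognising that a wide constant is one narrow integer repeated. Patch 2/2 in this series does this in ix86_broadcast with GCC's wide_int helpers; the plain-C version below is only an assumption of how the same check can be written with masks and shifts, and the names repeats_every and narrowest_splat_width are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: return true if the 64-bit value V consists of
   the same WIDTH-bit chunk repeated, and store that chunk,
   sign-extended to 64 bits, in *ELT.  WIDTH must divide 64.  */
static bool
repeats_every (uint64_t v, unsigned int width, int64_t *elt)
{
  assert (elt);
  if (width == 0 || width > 64 || 64 % width != 0)
    return false;

  uint64_t mask = width == 64 ? UINT64_MAX : (UINT64_C (1) << width) - 1;
  uint64_t low = v & mask;

  /* Every higher chunk must equal the lowest one.  */
  for (unsigned int i = width; i < 64; i += width)
    if (((v >> i) & mask) != low)
      return false;

  /* Sign-extend the WIDTH-bit chunk to 64 bits.  */
  uint64_t sign = UINT64_C (1) << (width - 1);
  *elt = (int64_t) ((low ^ sign) - sign);
  return true;
}

/* Hypothetical helper: return the narrowest of 8, 16, 32 or 64 bits
   whose repetition reproduces V.  Width 64 always matches.  */
static unsigned int
narrowest_splat_width (uint64_t v, int64_t *elt)
{
  static const unsigned int widths[] = { 8, 16, 32, 64 };
  for (unsigned int i = 0; i < 4; i++)
    if (repeats_every (v, widths[i], elt))
      return widths[i];
  return 0; /* Not reached.  */
}
```

With this sketch, 0x0303030303030303 reduces to the byte 3, and 0xffffffd3ffffffd3 reduces to the 32-bit value -45. Trying the narrowest width first mirrors the QImode, HImode, SImode, DImode order used in the patch.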
* [PATCH v2 2/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast
  2021-06-05 15:18 [PATCH v2 0/2] Allow vec_duplicate_optab to fail H.J. Lu
  2021-06-05 15:18 ` [PATCH v2 1/2] " H.J. Lu
@ 2021-06-05 15:18 ` H.J. Lu
  1 sibling, 0 replies; 10+ messages in thread
From: H.J. Lu @ 2021-06-05 15:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak, Jakub Jelinek, Richard Sandiford, Richard Biener

1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
operands to vector broadcast from an integer with AVX2.
2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
won't increase the stack alignment requirement and which blocks
transformation by the combine pass.
3. Add vec_duplicate<mode> expander.
4. Update PR 87767 tests to expect integer broadcast instead of broadcast
from memory.
5. Update avx512f_cond_move.c to expect integer broadcast.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

shows that broadcast is a little bit faster on Intel Core i7-8559U:

$ make
gcc -g -I. -O2 -c -o test.o test.c
gcc -g -c -o memory.o memory.S
gcc -g -c -o broadcast.o broadcast.S
gcc -g -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory      : 147215
broadcast   : 121213
vec_dup_sse2: 171366
$

broadcast is also smaller:

$ size memory.o broadcast.o
   text    data     bss     dec     hex filename
    132       0       0     132      84 memory.o
    122       0       0     122      7a broadcast.o
$

gcc/

	PR target/100865
	* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
	New prototype.
	(ix86_broadcast): New function.
	(ix86_convert_const_wide_int_to_broadcast): Likewise.
	(ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode
	size is 16 bytes or bigger.
	(ix86_broadcast_from_integer_constant): New function.
	(ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR
	to broadcast if mode size is 16 bytes or bigger.
	(ix86_expand_integer_vec_duplicate): New function.
	* config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
	prototype.
	(ix86_expand_integer_vec_duplicate): Likewise.
	* config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function.
	* config/i386/sse.md (INT_BROADCAST_MODE): New mode iterator.
	(vec_duplicate<mode>): New expander.

gcc/testsuite/

	PR target/100865
	* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
	broadcast.
	* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
	* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
	* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
	* gcc.target/i386/avx512f_cond_move.c: Also pass
	-mprefer-vector-width=512 and expect integer broadcast.
	* gcc.target/i386/pr100865-1.c: New test.
	* gcc.target/i386/pr100865-2.c: Likewise.
	* gcc.target/i386/pr100865-3.c: Likewise.
	* gcc.target/i386/pr100865-4a.c: Likewise.
	* gcc.target/i386/pr100865-4b.c: Likewise.
	* gcc.target/i386/pr100865-5a.c: Likewise.
	* gcc.target/i386/pr100865-5b.c: Likewise.
	* gcc.target/i386/pr100865-6a.c: Likewise.
	* gcc.target/i386/pr100865-6b.c: Likewise.
	* gcc.target/i386/pr100865-7a.c: Likewise.
	* gcc.target/i386/pr100865-7b.c: Likewise.
	* gcc.target/i386/pr100865-8a.c: Likewise.
	* gcc.target/i386/pr100865-8b.c: Likewise.
	* gcc.target/i386/pr100865-9a.c: Likewise.
	* gcc.target/i386/pr100865-9b.c: Likewise.
	* gcc.target/i386/pr100865-10a.c: Likewise.
	* gcc.target/i386/pr100865-10b.c: Likewise.
---
 gcc/config/i386/i386-expand.c                 | 216 +++++++++++++++++-
 gcc/config/i386/i386-protos.h                 |   3 +
 gcc/config/i386/i386.c                        |  31 +++
 gcc/config/i386/sse.md                        |  19 ++
 .../i386/avx512f-broadcast-pr87767-1.c        |   7 +-
 .../i386/avx512f-broadcast-pr87767-5.c        |   5 +-
 .../gcc.target/i386/avx512f_cond_move.c       |   4 +-
 .../i386/avx512vl-broadcast-pr87767-1.c       |  12 +-
 .../i386/avx512vl-broadcast-pr87767-5.c       |   9 +-
 gcc/testsuite/gcc.target/i386/pr100865-1.c    |  13 ++
 gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
 gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-2.c    |  14 ++
 gcc/testsuite/gcc.target/i386/pr100865-3.c    |  15 ++
 gcc/testsuite/gcc.target/i386/pr100865-4a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-4b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-5a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-5b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-6a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-6b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-7a.c   |  17 ++
 gcc/testsuite/gcc.target/i386/pr100865-7b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-8a.c   |  24 ++
 gcc/testsuite/gcc.target/i386/pr100865-8b.c   |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-9a.c   |  25 ++
 gcc/testsuite/gcc.target/i386/pr100865-9b.c   |   7 +
 26 files changed, 528 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6a.c
 create mode 100644
gcc/testsuite/gcc.target/i386/pr100865-6b.c create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7b.c create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8b.c create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-9b.c diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 804cb596867..04361cb331e 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -93,6 +93,9 @@ along with GCC; see the file COPYING3. If not see #include "i386-builtins.h" #include "i386-expand.h" +static bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx, + rtx); + /* Split one or more double-mode RTL references into pairs of half-mode references. The RTL can be REG, offsettable MEM, integer constant, or CONST_DOUBLE. "operands" is a pointer to an array of double-mode RTLs to @@ -190,6 +193,88 @@ ix86_expand_clear (rtx dest) emit_insn (tmp); } +/* Return true if V can be broadcasted from an integer of WIDTH bits + which is returned in VAL_BROADCAST. Otherwise, return false. */ + +static bool +ix86_broadcast (HOST_WIDE_INT v, unsigned int width, + HOST_WIDE_INT &val_broadcast) +{ + wide_int val = wi::uhwi (v, HOST_BITS_PER_WIDE_INT); + val_broadcast = wi::extract_uhwi (val, 0, width); + for (unsigned int i = width; i < HOST_BITS_PER_WIDE_INT; i += width) + { + HOST_WIDE_INT each = wi::extract_uhwi (val, i, width); + if (val_broadcast != each) + return false; + } + val_broadcast = sext_hwi (val_broadcast, width); + return true; +} + +/* Convert the CONST_WIDE_INT operand OP to broadcast in MODE. */ + +static rtx +ix86_convert_const_wide_int_to_broadcast (machine_mode mode, rtx op) +{ + /* Don't use integer vector broadcast if we can't move from GPR to SSE + register directly. 
*/ + if (!TARGET_INTER_UNIT_MOVES_TO_VEC) + return nullptr; + + /* Convert CONST_WIDE_INT to a non-standard SSE constant integer + broadcast only if vector broadcast is available. */ + if (!TARGET_AVX2 + || !CONST_WIDE_INT_P (op) + || standard_sse_constant_p (op, mode)) + return nullptr; + + HOST_WIDE_INT val = CONST_WIDE_INT_ELT (op, 0); + HOST_WIDE_INT val_broadcast; + scalar_int_mode broadcast_mode; + if (ix86_broadcast (val, GET_MODE_BITSIZE (QImode), + val_broadcast)) + broadcast_mode = QImode; + else if (ix86_broadcast (val, GET_MODE_BITSIZE (HImode), + val_broadcast)) + broadcast_mode = HImode; + else if (ix86_broadcast (val, GET_MODE_BITSIZE (SImode), + val_broadcast)) + broadcast_mode = SImode; + else if (TARGET_64BIT + && ix86_broadcast (val, GET_MODE_BITSIZE (DImode), + val_broadcast)) + { + /* NB: MOVQ takes a 32-bit signed immediate operand. */ + if (trunc_int_for_mode (val_broadcast, SImode) != val_broadcast) + return nullptr; + broadcast_mode = DImode; + } + else + return nullptr; + + /* Check if OP can be broadcasted from VAL. 
*/ + for (int i = 1; i < CONST_WIDE_INT_NUNITS (op); i++) + if (val != CONST_WIDE_INT_ELT (op, i)) + return nullptr; + + unsigned int nunits = (GET_MODE_SIZE (mode) + / GET_MODE_SIZE (broadcast_mode)); + machine_mode vector_mode; + if (!mode_for_vector (broadcast_mode, nunits).exists (&vector_mode)) + gcc_unreachable (); + rtx target = ix86_gen_scratch_sse_rtx (vector_mode, true); + if (!ix86_expand_vector_init_duplicate (false, vector_mode, target, + GEN_INT (val_broadcast))) + gcc_unreachable (); + if (REGNO (target) < FIRST_PSEUDO_REGISTER) + target = gen_rtx_REG (mode, REGNO (target)); + else + target = convert_to_mode (mode, target, 1); + + return target; +} + void ix86_expand_move (machine_mode mode, rtx operands[]) { @@ -347,20 +432,29 @@ ix86_expand_move (machine_mode mode, rtx operands[]) && optimize) op1 = copy_to_mode_reg (mode, op1); - if (can_create_pseudo_p () - && CONST_DOUBLE_P (op1)) + if (can_create_pseudo_p ()) { - /* If we are loading a floating point constant to a register, - force the value to memory now, since we'll get better code - out the back end. */ + if (CONST_DOUBLE_P (op1)) + { + /* If we are loading a floating point constant to a + register, force the value to memory now, since we'll + get better code out the back end. 
*/ - op1 = validize_mem (force_const_mem (mode, op1)); - if (!register_operand (op0, mode)) + op1 = validize_mem (force_const_mem (mode, op1)); + if (!register_operand (op0, mode)) + { + rtx temp = gen_reg_rtx (mode); + emit_insn (gen_rtx_SET (temp, op1)); + emit_move_insn (op0, temp); + return; + } + } + else if (GET_MODE_SIZE (mode) >= 16) { - rtx temp = gen_reg_rtx (mode); - emit_insn (gen_rtx_SET (temp, op1)); - emit_move_insn (op0, temp); - return; + rtx tmp = ix86_convert_const_wide_int_to_broadcast + (GET_MODE (op0), op1); + if (tmp != nullptr) + op1 = tmp; } } } @@ -368,6 +462,54 @@ ix86_expand_move (machine_mode mode, rtx operands[]) emit_insn (gen_rtx_SET (op0, op1)); } +static rtx +ix86_broadcast_from_integer_constant (machine_mode mode, rtx op) +{ + int nunits = GET_MODE_NUNITS (mode); + if (nunits < 2) + return nullptr; + + /* Don't use integer vector broadcast if we can't move from GPR to SSE + register directly. */ + if (!TARGET_INTER_UNIT_MOVES_TO_VEC) + return nullptr; + + /* Don't broadcast from a standard SSE constant integer. */ + if (standard_sse_constant_p (op, mode)) + return nullptr; + + /* Don't broadcast from a 64-bit integer constant in 32-bit mode. */ + if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT) + return nullptr; + + rtx constant = get_pool_constant (XEXP (op, 0)); + if (GET_CODE (constant) != CONST_VECTOR) + return nullptr; + + /* There could be some rtx like + (mem/u/c:V16QI (symbol_ref/u:DI ("*.LC1"))) + but with "*.LC1" refer to V2DI constant vector. */ + if (GET_MODE (constant) != mode) + { + constant = simplify_subreg (mode, constant, GET_MODE (constant), + 0); + if (constant == nullptr || GET_CODE (constant) != CONST_VECTOR) + return nullptr; + } + + rtx first = XVECEXP (constant, 0, 0); + + for (int i = 1; i < nunits; ++i) + { + rtx tmp = XVECEXP (constant, 0, i); + /* Vector duplicate value. 
*/ + if (!rtx_equal_p (tmp, first)) + return nullptr; + } + + return first; +} + void ix86_expand_vector_move (machine_mode mode, rtx operands[]) { @@ -407,7 +549,33 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[]) op1 = simplify_gen_subreg (mode, r, imode, SUBREG_BYTE (op1)); } else - op1 = validize_mem (force_const_mem (mode, op1)); + { + machine_mode mode = GET_MODE (op0); + rtx tmp = ix86_convert_const_wide_int_to_broadcast + (mode, op1); + if (tmp == nullptr) + op1 = validize_mem (force_const_mem (mode, op1)); + else + op1 = tmp; + } + } + + rtx first; + + if (can_create_pseudo_p () + && GET_MODE_SIZE (mode) >= 16 + && GET_MODE_CLASS (mode) == MODE_VECTOR_INT + && (MEM_P (op1) + && SYMBOL_REF_P (XEXP (op1, 0)) + && CONSTANT_POOL_ADDRESS_P (XEXP (op1, 0))) + && (first = ix86_broadcast_from_integer_constant (mode, op1))) + { + /* Broadcast to XMM/YMM/ZMM register from an integer constant. */ + op1 = ix86_gen_scratch_sse_rtx (mode, false); + if (!ix86_expand_vector_init_duplicate (false, mode, op1, first)) + gcc_unreachable (); + emit_move_insn (op0, op1); + return; } /* We need to check memory alignment for SSE mode since attribute @@ -15496,6 +15664,30 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt) } } +bool +ix86_expand_integer_vec_duplicate (rtx *operands) +{ + /* Don't use integer vector broadcast if we can't move from GPR to SSE + register directly. */ + if (!TARGET_INTER_UNIT_MOVES_TO_VEC) + return false; + + /* Enable VEC_DUPLICATE from a non-standard SSE constant integer only + if vector broadcast is available. */ + if (CONST_INT_P (operands[1]) + && (!TARGET_AVX2 + || standard_sse_constant_p (operands[1], + GET_MODE (operands[0])))) + return false; + + if (!ix86_expand_vector_init_duplicate (false, + GET_MODE (operands[0]), + operands[0], operands[1])) + gcc_unreachable (); + + return true; +} + /* Generate code to copy vector bits i / 2 ... i - 1 from vector SRC to bits 0 ... 
i / 2 - 1 of vector DEST, which has the same mode. The upper bits of DEST are undefined, though they shouldn't cause diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 7782cf1163f..f68617e77fd 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -50,6 +50,8 @@ extern void ix86_reset_previous_fndecl (void); extern bool ix86_using_red_zone (void); +extern rtx ix86_gen_scratch_sse_rtx (machine_mode, bool); + extern unsigned int ix86_regmode_natural_size (machine_mode); #ifdef RTX_CODE extern int standard_80387_constant_p (rtx); @@ -257,6 +259,7 @@ extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool); extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx); extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx); extern void ix86_expand_sse2_abs (rtx, rtx); +extern bool ix86_expand_integer_vec_duplicate (rtx *); /* In i386-c.c */ extern void ix86_target_macros (void); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 04649b42122..795a7320f94 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -23061,6 +23061,37 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode, } } +/* Return a scratch register in MODE for vector load and store. If + CONSTANT_INT_BROADCAST is true, it is used to hold constant integer + broadcast result. */ + +rtx +ix86_gen_scratch_sse_rtx (machine_mode mode, + bool constant_int_broadcast) +{ + rtx target; + + /* NB: Choose a hard scratch SSE register: + 1. Avoid increasing stack alignment requirement. + 2. For integer constant broadcast in 64-bit mode, avoid + transformation by the combine pass. + */ + if (GET_MODE_SIZE (mode) >= 16 + && !COMPLEX_MODE_P (mode) + && (SCALAR_INT_MODE_P (mode) + || GET_MODE_CLASS (mode) == MODE_VECTOR_INT) + && ((constant_int_broadcast + && TARGET_64BIT + && GET_MODE_SIZE (mode) == 16) + || GET_MODE_ALIGNMENT (mode) > crtl->stack_alignment_estimated)) + target = gen_rtx_REG (mode, (TARGET_64BIT + ? 
LAST_REX_SSE_REG + : LAST_SSE_REG)); + else + target = gen_reg_rtx (mode); + return target; +} + /* Address space support. This is not "far pointers" in the 16-bit sense, but an easy way diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index e4248e554eb..73d6d49a426 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -24601,3 +24601,22 @@ "TARGET_WIDEKL" "aes<aeswideklvariant>\t{%0}" [(set_attr "type" "other")]) + +;; Modes handled by broadcast patterns. +(define_mode_iterator INT_BROADCAST_MODE + [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI + (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI + (V8DI "TARGET_AVX512F") (V4DI "TARGET_64BIT") V2DI]) + +;; Broadcast from an integer. +(define_expand "vec_duplicate<mode>" + [(set (match_operand:INT_BROADCAST_MODE 0 "register_operand") + (vec_duplicate:INT_BROADCAST_MODE + (match_operand:<ssescalarmode> 1 "general_operand")))] + "TARGET_SSE2" +{ + if (!ix86_expand_integer_vec_duplicate (operands)) + FAIL; + DONE; +}) diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c index 0563e696316..a2664d87f29 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c @@ -2,8 +2,11 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mavx512f -mavx512dq" } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 5 } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 5 } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 5 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 2 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 3 { target { ! ia32 } } } } */ typedef int v16si __attribute__ ((vector_size (64))); typedef long long v8di __attribute__ ((vector_size (64))); diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c index ffbe95980ca..477f9ca1282 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c @@ -2,8 +2,9 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mavx512f" } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^n\n\]*\\\{1to8\\\}" 4 } } */ -/* { dg-final { scan-assembler-times "\[^n\n\]*\\\{1to16\\\}" 4 } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 4 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 4 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 4 { target { ! 
ia32 } } } } */ typedef int v16si __attribute__ ((vector_size (64))); typedef long long v8di __attribute__ ((vector_size (64))); diff --git a/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c b/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c index 99a89f51202..ca49a585232 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c +++ b/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -mavx512f" } */ -/* { dg-final { scan-assembler-times "(?:vpblendmd|vmovdqa32)\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 8 } } */ +/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512" } */ +/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vmovdqa32)\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 8 } } */ unsigned int x[128]; int y[128]; diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c index c06369d93fd..f8eb99f0b5f 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c @@ -2,9 +2,15 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mavx512f -mavx512vl -mavx512dq" } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 5 } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 10 } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 5 } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 2 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 5 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 7 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %xmm\[0-9\]+" 3 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %ymm\[0-9\]+" 3 { target { ! ia32 } } } } */ typedef int v4si __attribute__ ((vector_size (16))); typedef int v8si __attribute__ ((vector_size (32))); diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c index 4998a9b8d51..32f6ac81841 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c @@ -2,9 +2,12 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mavx512f -mavx512vl" } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 4 } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 8 } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 4 } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 4 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 4 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 4 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %xmm\[0-9\]+" 4 { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %ymm\[0-9\]+" 4 { target { ! ia32 } } } } */ typedef int v4si __attribute__ ((vector_size (16))); typedef int v8si __attribute__ ((vector_size (32))); diff --git a/gcc/testsuite/gcc.target/i386/pr100865-1.c b/gcc/testsuite/gcc.target/i386/pr100865-1.c new file mode 100644 index 00000000000..6c3097fb2a6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-1.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -march=x86-64" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 16); +} + +/* { dg-final { scan-assembler-times "movdqa\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10a.c b/gcc/testsuite/gcc.target/i386/pr100865-10a.c new file mode 100644 index 00000000000..7ffc19e56a8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-10a.c @@ -0,0 +1,33 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake" } */ + +extern __int128 array[16]; + +#define MK_CONST128_BROADCAST(A) \ + ((((unsigned __int128) (unsigned char) A) << 120) \ + | (((unsigned __int128) (unsigned char) A) << 112) \ + | (((unsigned __int128) (unsigned char) A) << 104) \ + | (((unsigned __int128) (unsigned char) A) << 96) \ + | (((unsigned __int128) (unsigned char) A) << 88) \ + | (((unsigned __int128) (unsigned char) A) << 80) \ + | (((unsigned __int128) (unsigned char) A) << 72) \ + | (((unsigned __int128) (unsigned char) A) << 64) \ + | (((unsigned __int128) (unsigned char) A) << 56) \ + | (((unsigned __int128) (unsigned char) A) << 48) \ + | (((unsigned __int128) (unsigned char) A) << 40) \ + | (((unsigned __int128) (unsigned char) A) << 32) \ + | (((unsigned __int128) (unsigned char) A) << 24) \ + | (((unsigned __int128) (unsigned char) A) << 16) \ + | (((unsigned __int128) (unsigned 
char) A) << 8) \ + | ((unsigned __int128) (unsigned char) A) ) + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = MK_CONST128_BROADCAST (0x1f); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10b.c b/gcc/testsuite/gcc.target/i386/pr100865-10b.c new file mode 100644 index 00000000000..edf52765c60 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-10b.c @@ -0,0 +1,7 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake-avx512" } */ + +#include "pr100865-10a.c" + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-2.c b/gcc/testsuite/gcc.target/i386/pr100865-2.c new file mode 100644 index 00000000000..17efe2d72a3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-2.c @@ -0,0 +1,14 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -march=skylake" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 16); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-3.c b/gcc/testsuite/gcc.target/i386/pr100865-3.c new file mode 100644 index 00000000000..b6dbcf7809b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-3.c @@ -0,0 +1,15 @@ +/* { dg-do compile { target { ! 
ia32 } } } */ +/* { dg-options "-O2 -march=skylake-avx512" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 16); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4a.c b/gcc/testsuite/gcc.target/i386/pr100865-4a.c new file mode 100644 index 00000000000..f55883598f9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-4a.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -march=skylake" } */ + +extern char array[64]; + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = -45; +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, " 4 } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4b.c b/gcc/testsuite/gcc.target/i386/pr100865-4b.c new file mode 100644 index 00000000000..f41e6147b4c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-4b.c @@ -0,0 +1,9 @@ +/* { dg-do compile { target { ! 
ia32 } } } */ +/* { dg-options "-O2 -march=skylake-avx512" } */ + +#include "pr100865-4a.c" + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, " 4 } } */ +/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-5a.c b/gcc/testsuite/gcc.target/i386/pr100865-5a.c new file mode 100644 index 00000000000..4149797fe81 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-5a.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=skylake" } */ + +extern short array[64]; + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = -45; +} + +/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 4 } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-5b.c b/gcc/testsuite/gcc.target/i386/pr100865-5b.c new file mode 100644 index 00000000000..ded41b680d3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-5b.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=skylake-avx512" } */ + +#include "pr100865-5a.c" + +/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu16\[\\t \]%ymm\[0-9\]+, " 4 } } */ +/* { dg-final { scan-assembler-not "vpbroadcastw\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-6a.c b/gcc/testsuite/gcc.target/i386/pr100865-6a.c new file mode 100644 index 00000000000..3fde549a10d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-6a.c @@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake" } */
+
+extern int array[64];
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = -45;
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-6b.c b/gcc/testsuite/gcc.target/i386/pr100865-6b.c
new file mode 100644
index 00000000000..44e74c64e55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-6b.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake-avx512" } */
+
+#include "pr100865-6a.c"
+
+/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */
+/* { dg-final { scan-assembler-not "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-7a.c b/gcc/testsuite/gcc.target/i386/pr100865-7a.c
new file mode 100644
index 00000000000..f6f2be91120
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-7a.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake" } */
+
+extern long long int array[64];
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = -45;
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 16 } } */
+/* { dg-final { scan-assembler-not "vpbroadcastq" { target ia32 } } } */
+/* { dg-final { scan-assembler-not "vmovdqa" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-7b.c b/gcc/testsuite/gcc.target/i386/pr100865-7b.c
new file mode 100644
index 00000000000..0a68820aa32
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-7b.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=skylake-avx512" } */
+
+#include "pr100865-7a.c"
+
+/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %ymm\[0-9\]+" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 16 } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8a.c b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
new file mode 100644
index 00000000000..96e9f13204c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O3 -march=skylake" } */
+
+extern __int128 array[16];
+
+#define MK_CONST128_BROADCAST(A) \
+  ((((unsigned __int128) (unsigned int) A) << 96) \
+   | (((unsigned __int128) (unsigned int) A) << 64) \
+   | (((unsigned __int128) (unsigned int) A) << 32) \
+   | ((unsigned __int128) (unsigned int) A) )
+
+#define MK_CONST128_BROADCAST_SIGNED(A) \
+  ((__int128) MK_CONST128_BROADCAST (A))
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = MK_CONST128_BROADCAST_SIGNED (-45);
+}
+
+/* { dg-final { scan-assembler-times "(?:vpbroadcastq|vpshufd)\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8b.c b/gcc/testsuite/gcc.target/i386/pr100865-8b.c
new file mode 100644
index 00000000000..99a10ad83bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-8b.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O3 -march=skylake-avx512" } */
+
+#include "pr100865-8a.c"
+
+/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9a.c b/gcc/testsuite/gcc.target/i386/pr100865-9a.c
new file mode 100644
index 00000000000..45d0e0d0e2e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-9a.c
@@ -0,0 +1,25 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O3 -march=skylake" } */
+
+extern __int128 array[16];
+
+#define MK_CONST128_BROADCAST(A) \
+  ((((unsigned __int128) (unsigned short) A) << 112) \
+   | (((unsigned __int128) (unsigned short) A) << 96) \
+   | (((unsigned __int128) (unsigned short) A) << 80) \
+   | (((unsigned __int128) (unsigned short) A) << 64) \
+   | (((unsigned __int128) (unsigned short) A) << 48) \
+   | (((unsigned __int128) (unsigned short) A) << 32) \
+   | (((unsigned __int128) (unsigned short) A) << 16) \
+   | ((unsigned __int128) (unsigned short) A) )
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
+    array[i] = MK_CONST128_BROADCAST (0x1fff);
+}
+
+/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9b.c b/gcc/testsuite/gcc.target/i386/pr100865-9b.c
new file mode 100644
index 00000000000..14696248525
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100865-9b.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O3 -march=skylake-avx512" } */
+
+#include "pr100865-9a.c"
+
+/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
-- 
2.31.1
end of thread, other threads:[~2021-06-09 21:31 UTC | newest]

Thread overview: 10+ messages
2021-06-05 15:18 [PATCH v2 0/2] Allow vec_duplicate_optab to fail H.J. Lu
2021-06-05 15:18 ` [PATCH v2 1/2] " H.J. Lu
2021-06-07  7:12 ` Richard Sandiford
2021-06-07 14:18 ` H.J. Lu
2021-06-07 17:59 ` Richard Biener
2021-06-07 18:10 ` Richard Biener
2021-06-07 20:33 ` [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering H.J. Lu
2021-06-09 21:03 ` Jeff Law
2021-06-09 21:31 ` H.J. Lu
2021-06-05 15:18 ` [PATCH v2 2/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast H.J. Lu