public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Hongtao Liu <crazylht@gmail.com>
To: Jan Hubicka <hubicka@ucw.cz>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512
Date: Tue, 4 Aug 2020 14:05:58 +0800	[thread overview]
Message-ID: <CAMZc-bx2yWXv7d1D9FNOBo_qbaCUk07PLRkfy6HuNyZG9RhqYA@mail.gmail.com> (raw)
In-Reply-To: <CAMZc-bySuDdFimV6_=HnyptoctvZsLk8vEkVe9yvsZ90mNUOng@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 21769 bytes --]

Update patch.

There are a lot of avx512 define_insns which lack corresponding memory
broadcast version, i only add *avx512f_mul<mode>3<mask_name>_bcst and
*avx512dq_mul<mode>3<mask_name>_bcst in this patch.

On Fri, Jul 24, 2020 at 10:37 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Jul 23, 2020 at 9:53 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Jul 23, 2020 at 4:39 PM Jan Hubicka <hubicka@ucw.cz> wrote:
> > >
> > > Hello,
> > > sorry for taking so long to get to this.
> > > > diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> > > > index 535fc7e981d..8f81d101382 100644
> > > > --- a/gcc/config/i386/i386-features.c
> > > > +++ b/gcc/config/i386/i386-features.c
> > > > @@ -2379,6 +2379,152 @@ make_pass_remove_partial_avx_dependency (gcc::context *ctxt)
> > > >    return new pass_remove_partial_avx_dependency (ctxt);
> > > >  }
> > > >
> > > > +/* Replace all one-value const vector that are referenced by SYMBOL_REFs in x
> > > > +   with embedded broadcast. i.e.transform
> > > > +
> > > > +     vpaddq .LC0(%rip), %zmm0, %zmm0
> > > > +     ret
> > > > +  .LC0:
> > > > +    .quad 3
> > > > +    .quad 3
> > > > +    .quad 3
> > > > +    .quad 3
> > > > +    .quad 3
> > > > +    .quad 3
> > > > +    .quad 3
> > > > +    .quad 3
> > > > +
> > > > +    to
> > > > +
> > > > +     vpaddq .LC0(%rip){1to8}, %zmm0, %zmm0
> > >
> > > It seems to me that having a special purpose pass for this is bit
> > > overzelaous.  It seems to me that you can do same pattern matching via
> > > splitter and fit it into the usual insn splitting pass?
> > >
> >
> > From an implementation perspective, there could be lots of work, since
> > memory embedding broadcast is available for nearly every instruction
> > in AVX512. And for new added AVX512 instructions, we also need to add
> > a define_split for them.
> >
>
> I'll add more tests to show my point.
>
> > > Honza
> > > > +     ret
> > > > +  .LC0:
> > > > +    .quad 3  */
> > > > +static void
> > > > +replace_constant_pool_with_broadcast (rtx_insn* insn)
> > > > +{
> > > > +  subrtx_ptr_iterator::array_type array;
> > > > +  FOR_EACH_SUBRTX_PTR (iter, array, &PATTERN (insn), ALL)
> > > > +    {
> > > > +      rtx *loc = *iter;
> > > > +      rtx x = *loc;
> > > > +      rtx broadcast_mem, vec_dup, constant, first;
> > > > +      machine_mode mode;
> > > > +      if (GET_CODE (x) != MEM
> > > > +       || GET_CODE (XEXP (x, 0)) != SYMBOL_REF
> > > > +       || !CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)))
> > > > +     continue;
> > > > +
> > > > +      mode = GET_MODE (x);
> > > > +      if (!VECTOR_MODE_P (mode))
> > > > +     return;
> > > > +
> > > > +      constant = get_pool_constant (XEXP (x, 0));
> > > > +      first = XVECEXP (constant, 0, 0);
> > > > +      /* There could be some rtx like
> > > > +      (mem/u/c:V16QI (symbol_ref/u:DI ("*.LC1")))
> > > > +      but with "*.LC1" refer to V2DI constant vector.  */
> > > > +      if (GET_MODE (constant) != mode)
> > > > +     return;
> > > > +
> > > > +      for (int i = 1; i < GET_MODE_NUNITS (mode); ++i)
> > > > +     {
> > > > +       rtx tmp = XVECEXP (constant, 0, i);
> > > > +       /* Only handle one-value const vector.  */
> > > > +       if (!rtx_equal_p (tmp, first))
> > > > +         return;
> > > > +     }
> > > > +
> > > > +      broadcast_mem = force_const_mem (GET_MODE_INNER (mode), first);
> > > > +      vec_dup = gen_rtx_VEC_DUPLICATE (mode, broadcast_mem);
> > > > +      *loc = vec_dup;
> > > > +      INSN_CODE (insn) = -1;
> > > > +      /* Revert change if there's no corresponding pattern.  */
> > > > +      if (recog_memoized (insn) < 0)
> > > > +             {
> > > > +               *loc = x;
> > > > +               recog_memoized (insn);
> > > > +             }
> > > > +      /* At most 1 memory_operand in an insn.  */
> > > > +      return;
> > > > +    }
> > > > +}
> > > > +
> > > > +/* For const vector having one duplicated value, there's no need to put
> > > > +   whole vector in the constant pool when target supports embedded broadcast. */
> > > > +static unsigned int
> > > > +constant_pool_broadcast (void)
> > > > +{
> > > > +  timevar_push (TV_MACH_DEP);
> > > > +  rtx_insn *insn;
> > > > +
> > > > +  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
> > > > +    {
> > > > +      if (!INSN_P (insn))
> > > > +     continue;
> > > > +
> > > > +      /* Insns may appear inside a SEQUENCE.  Only check the patterns of
> > > > +      insns, not any notes that may be attached.  We don't want to mark
> > > > +      a constant just because it happens to appear in a REG_EQUIV note.  */
> > > > +      if (rtx_sequence *seq = dyn_cast <rtx_sequence *> (PATTERN (insn)))
> > > > +     {
> > > > +       int i, n = seq->len ();
> > > > +       for (i = 0; i < n; ++i)
> > > > +         {
> > > > +           rtx subinsn = seq->element (i);
> > > > +           if (INSN_P (subinsn))
> > > > +             replace_constant_pool_with_broadcast (dyn_cast <rtx_insn *> (subinsn));
> > > > +         }
> > > > +     }
> > > > +      else
> > > > +     replace_constant_pool_with_broadcast (insn);
> > > > +    }
> > > > +  timevar_pop (TV_MACH_DEP);
> > > > +  return 0;
> > > > +}
> > > > +
> > > > +namespace {
> > > > +
> > > > +const pass_data pass_data_constant_pool_broadcast =
> > > > +{
> > > > +  RTL_PASS, /* type */
> > > > +  "cpb", /* name */
> > > > +  OPTGROUP_NONE, /* optinfo_flags */
> > > > +  TV_MACH_DEP, /* tv_id */
> > > > +  0, /* properties_required */
> > > > +  0, /* properties_provided */
> > > > +  0, /* properties_destroyed */
> > > > +  0, /* todo_flags_start */
> > > > +  TODO_df_finish, /* todo_flags_finish */
> > > > +};
> > > > +
> > > > +class pass_constant_pool_broadcast : public rtl_opt_pass
> > > > +{
> > > > +public:
> > > > +  pass_constant_pool_broadcast (gcc::context *ctxt)
> > > > +    : rtl_opt_pass (pass_data_constant_pool_broadcast, ctxt)
> > > > +  {}
> > > > +
> > > > +  /* opt_pass methods: */
> > > > +  virtual bool gate (function *)
> > > > +    {
> > > > +      return TARGET_AVX512F;
> > > > +    }
> > > > +
> > > > +  virtual unsigned int execute (function *)
> > > > +    {
> > > > +      return constant_pool_broadcast ();
> > > > +    }
> > > > +}; // class pass_cpb
> > > > +
> > > > +} // anon namespace
> > > > +
> > > > +rtl_opt_pass *
> > > > +make_pass_constant_pool_broadcast (gcc::context *ctxt)
> > > > +{
> > > > +  return new pass_constant_pool_broadcast (ctxt);
> > > > +}
> > > > +
> > > >  /* This compares the priority of target features in function DECL1
> > > >     and DECL2.  It returns positive value if DECL1 is higher priority,
> > > >     negative value if DECL2 is higher priority and 0 if they are the
> > > > diff --git a/gcc/config/i386/i386-passes.def b/gcc/config/i386/i386-passes.def
> > > > index d83c7b956b1..07ecf8e790f 100644
> > > > --- a/gcc/config/i386/i386-passes.def
> > > > +++ b/gcc/config/i386/i386-passes.def
> > > > @@ -33,3 +33,4 @@ along with GCC; see the file COPYING3.  If not see
> > > >    INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_endbr_and_patchable_area);
> > > >
> > > >    INSERT_PASS_AFTER (pass_combine, 1, pass_remove_partial_avx_dependency);
> > > > +  INSERT_PASS_AFTER (pass_combine, 1, pass_constant_pool_broadcast);
> > > > diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> > > > index 7c2ce618f3f..6c6909b41dd 100644
> > > > --- a/gcc/config/i386/i386-protos.h
> > > > +++ b/gcc/config/i386/i386-protos.h
> > > > @@ -386,3 +386,4 @@ extern rtl_opt_pass *make_pass_insert_endbr_and_patchable_area
> > > >    (gcc::context *);
> > > >  extern rtl_opt_pass *make_pass_remove_partial_avx_dependency
> > > >    (gcc::context *);
> > > > +extern rtl_opt_pass *make_pass_constant_pool_broadcast (gcc::context *);
> > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > index 431571a4bc1..fbfb459c5bf 100644
> > > > --- a/gcc/config/i386/sse.md
> > > > +++ b/gcc/config/i386/sse.md
> > > > @@ -12127,6 +12127,19 @@
> > > >     (set_attr "prefix" "evex")
> > > >     (set_attr "mode" "<sseinsnmode>")])
> > > >
> > > > +(define_insn "*avx512dq_mul<mode>3<mask_name>_bcst"
> > > > +  [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v")
> > > > +     (mult:VI8_AVX512VL
> > > > +       (vec_duplicate:VI8_AVX512VL
> > > > +         (match_operand:<ssescalarmode> 1 "memory_operand" "m"))
> > > > +       (match_operand:VI8_AVX512VL 2 "register_operand" "v")
> > > > +))]
> > > > +  "TARGET_AVX512DQ"
> > > > +  "vpmullq\t{%1<avx512bcst>, %2, %0<mask_operand3>|%0<mask_operand3>, %2, %1<avx512bcst>}"
> > > > +  [(set_attr "type" "sseimul")
> > > > +   (set_attr "prefix" "evex")
> > > > +   (set_attr "mode" "<sseinsnmode>")])
> > > > +
> > > >  (define_expand "mul<mode>3<mask_name>"
> > > >    [(set (match_operand:VI4_AVX512F 0 "register_operand")
> > > >       (mult:VI4_AVX512F
> > > > @@ -12167,6 +12180,18 @@
> > > >     (set_attr "btver2_decode" "vector,vector,vector")
> > > >     (set_attr "mode" "<sseinsnmode>")])
> > > >
> > > > +(define_insn "*avx512f_mul<mode>3<mask_name>_bcst"
> > > > +  [(set (match_operand:VI4_AVX512VL 0 "register_operand" "=v")
> > > > +     (mult:VI4_AVX512VL
> > > > +       (vec_duplicate:VI4_AVX512VL
> > > > +         (match_operand:<ssescalarmode> 1 "memory_operand" "m"))
> > > > +       (match_operand:VI4_AVX512VL 2 "register_operand" "v")))]
> > > > +  "TARGET_AVX512F"
> > > > +   "vpmulld\t{%1<avx512bcst>, %2, %0<mask_operand3>|%0<mask_operand3>, %2, %1<avx512bcst>}"
> > > > +  [(set_attr "type" "sseimul")
> > > > +   (set_attr "prefix" "evex")
> > > > +   (set_attr "mode" "<sseinsnmode>")])
> > > > +
> > > >  (define_expand "mul<mode>3"
> > > >    [(set (match_operand:VI8_AVX2_AVX512F 0 "register_operand")
> > > >       (mult:VI8_AVX2_AVX512F
> > > > diff --git a/gcc/testsuite/gcc.target/i386/avx2-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx2-broadcast-pr87767-1.c
> > > > new file mode 100644
> > > > index 00000000000..800ef1f957e
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/avx2-broadcast-pr87767-1.c
> > > > @@ -0,0 +1,40 @@
> > > > +/* PR target/87767 */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -mavx2" } */
> > > > +/* { dg-final { scan-assembler-not "\\\{1to\[248\]\\\}" } }  */
> > > > +/* { dg-final { scan-assembler-not "\\\{1to16\\\}" } }  */
> > > > +
> > > > +typedef int v4si  __attribute__ ((vector_size (16)));
> > > > +typedef int v8si  __attribute__ ((vector_size (32)));
> > > > +typedef long long v2di  __attribute__ ((vector_size (16)));
> > > > +typedef long long v4di  __attribute__ ((vector_size (32)));
> > > > +typedef float v4sf  __attribute__ ((vector_size (16)));
> > > > +typedef float v8sf  __attribute__ ((vector_size (32)));
> > > > +typedef double v2df  __attribute__ ((vector_size (16)));
> > > > +typedef double v4df  __attribute__ ((vector_size (32)));
> > > > +
> > > > +#define FOO(VTYPE, OP_NAME, OP)                      \
> > > > +VTYPE                                                \
> > > > + __attribute__ ((noipa))                     \
> > > > +foo_##OP_NAME##_##VTYPE (VTYPE a)            \
> > > > +{                                            \
> > > > +  return a OP 101;                           \
> > > > +}                                            \
> > > > +
> > > > +FOO (v4si, add, +);
> > > > +FOO (v8si, add, +);
> > > > +FOO (v2di, add, +);
> > > > +FOO (v4di, add, +);
> > > > +FOO (v4sf, add, +);
> > > > +FOO (v8sf, add, +);
> > > > +FOO (v2df, add, +);
> > > > +FOO (v4df, add, +);
> > > > +
> > > > +FOO (v4si, mul, *);
> > > > +FOO (v8si, mul, *);
> > > > +FOO (v2di, mul, *);
> > > > +FOO (v4di, mul, *);
> > > > +FOO (v4sf, mul, *);
> > > > +FOO (v8sf, mul, *);
> > > > +FOO (v2df, mul, *);
> > > > +FOO (v4df, mul, *);
> > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
> > > > new file mode 100644
> > > > index 00000000000..21249bc0cf9
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
> > > > @@ -0,0 +1,66 @@
> > > > +/* PR target/87767 */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -mavx512f -mavx512vl -mavx512dq" } */
> > > > +/* { dg-final { scan-assembler-times "vpadd\[^\n\]*\\\{1to2\\\}" 1 } }  */
> > > > +/* { dg-final { scan-assembler-times "vpadd\[^\n\]*\\\{1to4\\\}" 2 } }  */
> > > > +/* { dg-final { scan-assembler-times "vpadd\[^\n\]*\\\{1to8\\\}" 2 } }  */
> > > > +/* { dg-final { scan-assembler-times "vpadd\[^\n\]*\\\{1to16\\\}" 1 } }  */
> > > > +/* { dg-final { scan-assembler-times "vpmul\[^\n\]*\\\{1to2\\\}" 1 } }  */
> > > > +/* { dg-final { scan-assembler-times "vpmul\[^\n\]*\\\{1to4\\\}" 2 } }  */
> > > > +/* { dg-final { scan-assembler-times "vpmul\[^\n\]*\\\{1to8\\\}" 2 } }  */
> > > > +/* { dg-final { scan-assembler-times "vpmul\[^\n\]*\\\{1to16\\\}" 1 } }  */
> > > > +/* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to2\\\}" 1 } }  */
> > > > +/* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to4\\\}" 2 } }  */
> > > > +/* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to8\\\}" 2 } }  */
> > > > +/* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to16\\\}" 1 } }  */
> > > > +/* { dg-final { scan-assembler-times "vmul\[^\n\]*\\\{1to2\\\}" 1 } }  */
> > > > +/* { dg-final { scan-assembler-times "vmul\[^\n\]*\\\{1to4\\\}" 2 } }  */
> > > > +/* { dg-final { scan-assembler-times "vmul\[^\n\]*\\\{1to8\\\}" 2 } }  */
> > > > +/* { dg-final { scan-assembler-times "vmul\[^\n\]*\\\{1to16\\\}" 1 } }  */
> > > > +
> > > > +typedef int v4si  __attribute__ ((vector_size (16)));
> > > > +typedef int v8si  __attribute__ ((vector_size (32)));
> > > > +typedef int v16si  __attribute__ ((vector_size (64)));
> > > > +typedef long long v2di  __attribute__ ((vector_size (16)));
> > > > +typedef long long v4di  __attribute__ ((vector_size (32)));
> > > > +typedef long long v8di  __attribute__ ((vector_size (64)));
> > > > +typedef float v4sf  __attribute__ ((vector_size (16)));
> > > > +typedef float v8sf  __attribute__ ((vector_size (32)));
> > > > +typedef float v16sf  __attribute__ ((vector_size (64)));
> > > > +typedef double v2df  __attribute__ ((vector_size (16)));
> > > > +typedef double v4df  __attribute__ ((vector_size (32)));
> > > > +typedef double v8df  __attribute__ ((vector_size (64)));
> > > > +
> > > > +#define FOO(VTYPE, OP_NAME, OP)                      \
> > > > +VTYPE                                                \
> > > > + __attribute__ ((noipa))                     \
> > > > +foo_##OP_NAME##_##VTYPE (VTYPE a)            \
> > > > +{                                            \
> > > > +  return a OP 101;                           \
> > > > +}                                            \
> > > > +
> > > > +FOO (v4si, add, +);
> > > > +FOO (v8si, add, +);
> > > > +FOO (v16si, add, +);
> > > > +FOO (v2di, add, +);
> > > > +FOO (v4di, add, +);
> > > > +FOO (v8di, add, +);
> > > > +FOO (v4sf, add, +);
> > > > +FOO (v8sf, add, +);
> > > > +FOO (v16sf, add, +);
> > > > +FOO (v2df, add, +);
> > > > +FOO (v4df, add, +);
> > > > +FOO (v8df, add, +);
> > > > +
> > > > +FOO (v4si, mul, *);
> > > > +FOO (v8si, mul, *);
> > > > +FOO (v16si, mul, *);
> > > > +FOO (v2di, mul, *);
> > > > +FOO (v4di, mul, *);
> > > > +FOO (v8di, mul, *);
> > > > +FOO (v4sf, mul, *);
> > > > +FOO (v8sf, mul, *);
> > > > +FOO (v16sf, mul, *);
> > > > +FOO (v2df, mul, *);
> > > > +FOO (v4df, mul, *);
> > > > +FOO (v8df, mul, *);
> > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-2.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-2.c
> > > > new file mode 100644
> > > > index 00000000000..938346743c2
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-2.c
> > > > @@ -0,0 +1,54 @@
> > > > +/* PR target/87767 */
> > > > +/* { dg-do run } */
> > > > +/* { dg-options "-O2 -mavx512f -mavx512vl -mavx512dq" } */
> > > > +
> > > > +#include<stdlib.h>
> > > > +#include<stdio.h>
> > > > +#include "avx512f-broadcast-pr87767-1.c"
> > > > +
> > > > +#define TEST(VTYPE, TYPE, N, OP_NAME, OP)            \
> > > > +  do                                                 \
> > > > +    {                                                        \
> > > > +      TYPE exp[N], src[N];                           \
> > > > +      VTYPE res;                                     \
> > > > +      for (int i = 0; i < N; i++)                    \
> > > > +     src[i] = i * i * 107;                           \
> > > > +      res = foo_##OP_NAME##_##VTYPE (*(VTYPE*)&src[0]);      \
> > > > +      for (int i = 0; i < N; i ++)                   \
> > > > +     exp[i] = src[i] OP 101;                         \
> > > > +      for (int j = 0; j < N; j++)                    \
> > > > +     {                                               \
> > > > +       if (res[j] != exp[j])                         \
> > > > +         abort();                                    \
> > > > +     }                                               \
> > > > +    }                                                        \
> > > > +  while (0)
> > > > +
> > > > +int main()
> > > > +{
> > > > +  TEST (v4si, int, 4, add, +);
> > > > +  TEST (v8si, int, 8, add, +);
> > > > +  TEST (v16si, int, 16, add, +);
> > > > +  TEST (v2di, long long, 2, add, +);
> > > > +  TEST (v4di, long long, 4, add, +);
> > > > +  TEST (v8di, long long, 8, add, +);
> > > > +  TEST (v4sf, float, 4, add, +);
> > > > +  TEST (v8sf, float, 8, add, +);
> > > > +  TEST (v16sf, float, 16, add, +);
> > > > +  TEST (v2df, double, 2, add, +);
> > > > +  TEST (v4df, double, 4, add, +);
> > > > +  TEST (v8df, double, 8, add, +);
> > > > +
> > > > +  TEST (v4si, int, 4, mul, *);
> > > > +  TEST (v8si, int, 8, mul, *);
> > > > +  TEST (v16si, int, 16, mul, *);
> > > > +  TEST (v2di, long long, 2, mul, *);
> > > > +  TEST (v4di, long long, 4, mul, *);
> > > > +  TEST (v8di, long long, 8, mul, *);
> > > > +  TEST (v4sf, float, 4, mul, *);
> > > > +  TEST (v8sf, float, 8, mul, *);
> > > > +  TEST (v16sf, float, 16, mul, *);
> > > > +  TEST (v2df, double, 2, mul, *);
> > > > +  TEST (v4df, double, 4, mul, *);
> > > > +  TEST (v8df, double, 8, mul, *);
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
> > > > new file mode 100644
> > > > index 00000000000..ec159a68158
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
> > > > @@ -0,0 +1,40 @@
> > > > +/* PR target/87767 */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -mavx512f" } */
> > > > +/* { dg-final { scan-assembler-not "\\\{1to\[248\]\\\}" } }  */
> > > > +/* { dg-final { scan-assembler-not "\\\{1to16\\\}" } }  */
> > > > +
> > > > +typedef int v4si  __attribute__ ((vector_size (16)));
> > > > +typedef int v8si  __attribute__ ((vector_size (32)));
> > > > +typedef long long v2di  __attribute__ ((vector_size (16)));
> > > > +typedef long long v4di  __attribute__ ((vector_size (32)));
> > > > +typedef float v4sf  __attribute__ ((vector_size (16)));
> > > > +typedef float v8sf  __attribute__ ((vector_size (32)));
> > > > +typedef double v2df  __attribute__ ((vector_size (16)));
> > > > +typedef double v4df  __attribute__ ((vector_size (32)));
> > > > +
> > > > +#define FOO(VTYPE, OP_NAME, OP)                      \
> > > > +VTYPE                                                \
> > > > + __attribute__ ((noipa))                     \
> > > > +foo_##OP_NAME##_##VTYPE (VTYPE a)            \
> > > > +{                                            \
> > > > +  return a OP 101;                           \
> > > > +}                                            \
> > > > +
> > > > +FOO (v4si, add, +);
> > > > +FOO (v8si, add, +);
> > > > +FOO (v2di, add, +);
> > > > +FOO (v4di, add, +);
> > > > +FOO (v4sf, add, +);
> > > > +FOO (v8sf, add, +);
> > > > +FOO (v2df, add, +);
> > > > +FOO (v4df, add, +);
> > > > +
> > > > +FOO (v4si, mul, *);
> > > > +FOO (v8si, mul, *);
> > > > +FOO (v2di, mul, *);
> > > > +FOO (v4di, mul, *);
> > > > +FOO (v4sf, mul, *);
> > > > +FOO (v8sf, mul, *);
> > > > +FOO (v2df, mul, *);
> > > > +FOO (v4df, mul, *);
> > > > diff --git a/gcc/testsuite/gcc.target/i386/pr92865-1.c b/gcc/testsuite/gcc.target/i386/pr92865-1.c
> > > > index 49b5778a067..a37487d9af7 100644
> > > > --- a/gcc/testsuite/gcc.target/i386/pr92865-1.c
> > > > +++ b/gcc/testsuite/gcc.target/i386/pr92865-1.c
> > > > @@ -3,10 +3,11 @@
> > > >  /* { dg-options "-Ofast -mavx512f -mavx512bw -mxop" } */
> > > >  /* { dg-final { scan-assembler-times "vpcmp\[bwdq\]\[\t ]" 4 } } */
> > > >  /* { dg-final { scan-assembler-times "vpcmpu\[bwdq\]\[\t ]" 4 } } */
> > > > -/* { dg-final { scan-assembler-times "vmovdq\[au\]8\[\t ]" 4 } } */
> > > > -/* { dg-final { scan-assembler-times "vmovdq\[au\]16\[\t ]" 4 } } *
> > > > -/* { dg-final { scan-assembler-times "vmovdq\[au\]32\[\t ]" 4 } } */
> > > > -/* { dg-final { scan-assembler-times "vmovdq\[au\]64\[\t ]" 4 } } */
> > > > +/* { dg-final { scan-assembler-times "vmovdq\[au\]8\[\t ]" 2 } } */
> > > > +/* { dg-final { scan-assembler-times "vmovdq\[au\]16\[\t ]" 2 } } *
> > > > +/* { dg-final { scan-assembler-times "vmovdq\[au\]32\[\t ]" 2 } } */
> > > > +/* { dg-final { scan-assembler-times "vmovdq\[au\]64\[\t ]" 2 } } */
> > > > +/* { dg-final { scan-assembler-times "vpbroadcast\[bwqd\]\[\t ]" 16 } } */
> > > >
> > > >  extern char arraysb[64];
> > > >  extern short arraysw[32];
> > > > --
> > > > 2.18.1
> > > >
> > >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

[-- Attachment #2: 0001-Optimize-memory-broadcast-for-constant-vector_V2.patch --]
[-- Type: text/x-patch, Size: 36139 bytes --]

From 8cd1b1ad4f7fecec3e7a22be09d8f282115a1c7d Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 8 Jul 2020 17:14:36 +0800
Subject: [PATCH] Optimize memory broadcast for constant vector under AVX512.

For constant vector having one duplicated value, there's no need to put
whole vector in the constant pool, using embedded broadcast instead.

2020-07-09  Hongtao Liu  <hongtao.liu@intel.com>

gcc/ChangeLog:

	PR target/87767
	* config/i386/i386-features.c
	(replace_constant_pool_with_broadcast): New function.
	(constant_pool_broadcast): Ditto.
	(class pass_constant_pool_broadcast): New pass.
	(make_pass_constant_pool_broadcast): Ditto.
	* config/i386/i386-passes.def: Insert new pass after combine.
	* config/i386/i386-protos.h
	(make_pass_constant_pool_broadcast): Declare.
	* config/i386/sse.md (*avx512dq_mul<mode>3<mask_name>_bcst,
	*avx512f_mul<mode>3<mask_name>_bcst): New define_insn.
	* config/i386/avx512fintrin.h (_mm512_set1_ps,
	_mm512_set1_pd,_mm512_set1_epi32, _mm512_set1_epi64): Adjusted.

gcc/testsuite/ChangeLog:

	PR target/87767
	* gcc.target/i386/avx2-broadcast-pr87767-1.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-1.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-2.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-3.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-4.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-5.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-6.c: New test.
	* gcc.target/i386/avx512f-broadcast-pr87767-7.c: New test.
	* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: New test.
	* gcc.target/i386/pr92865-1.c: Adjust testcase.
---
 gcc/config/i386/avx512fintrin.h               |  27 ++--
 gcc/config/i386/i386-features.c               | 146 ++++++++++++++++++
 gcc/config/i386/i386-passes.def               |   1 +
 gcc/config/i386/i386-protos.h                 |   1 +
 gcc/config/i386/sse.md                        |  25 +++
 .../i386/avx2-broadcast-pr87767-1.c           |  40 +++++
 .../i386/avx512f-broadcast-pr87767-1.c        |  78 ++++++++++
 .../i386/avx512f-broadcast-pr87767-2.c        |  61 ++++++++
 .../i386/avx512f-broadcast-pr87767-3.c        |  55 +++++++
 .../i386/avx512f-broadcast-pr87767-4.c        |  59 +++++++
 .../i386/avx512f-broadcast-pr87767-5.c        |  63 ++++++++
 .../i386/avx512f-broadcast-pr87767-6.c        |  58 +++++++
 .../i386/avx512f-broadcast-pr87767-7.c        | 121 +++++++++++++++
 .../i386/avx512vl-broadcast-pr87767-1.c       |  41 +++++
 gcc/testsuite/gcc.target/i386/pr92865-1.c     |   9 +-
 15 files changed, 763 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-broadcast-pr87767-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 0d53dda3a27..729d5686d68 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -239,22 +239,17 @@ extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_set1_pd (double __A)
 {
-  return (__m512d) __builtin_ia32_broadcastsd512 (__extension__
-						  (__v2df) { __A, },
-						  (__v8df)
-						  _mm512_undefined_pd (),
-						  (__mmask8) -1);
+  return __extension__ (__m512d)(__v8df)
+    { __A, __A, __A, __A, __A, __A, __A, __A };
 }
 
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_set1_ps (float __A)
 {
-  return (__m512) __builtin_ia32_broadcastss512 (__extension__
-						 (__v4sf) { __A, },
-						 (__v16sf)
-						 _mm512_undefined_ps (),
-						 (__mmask16) -1);
+  return __extension__ (__m512)(__v16sf)
+    { __A, __A, __A, __A, __A, __A, __A, __A,
+      __A, __A, __A, __A, __A, __A, __A, __A };
 }
 
 /* Create the vector [A B C D A B C D A B C D A B C D].  */
@@ -4072,10 +4067,9 @@ extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_set1_epi32 (int __A)
 {
-  return (__m512i) __builtin_ia32_pbroadcastd512_gpr_mask (__A,
-							   (__v16si)
-							   _mm512_undefined_epi32 (),
-							   (__mmask16)(-1));
+  return (__m512i)(__v16si)
+    { __A, __A, __A, __A, __A, __A, __A, __A,
+      __A, __A, __A, __A, __A, __A, __A, __A };
 }
 
 extern __inline __m512i
@@ -4128,10 +4122,7 @@ extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_set1_epi64 (long long __A)
 {
-  return (__m512i) __builtin_ia32_pbroadcastq512_gpr_mask (__A,
-							   (__v8di)
-							   _mm512_undefined_epi32 (),
-							   (__mmask8)(-1));
+  return (__m512i)(__v8di) { __A, __A, __A, __A, __A, __A, __A, __A };
 }
 
 extern __inline __m512i
diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index 535fc7e981d..8f81d101382 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -2379,6 +2379,152 @@ make_pass_remove_partial_avx_dependency (gcc::context *ctxt)
   return new pass_remove_partial_avx_dependency (ctxt);
 }
 
+/* Replace all one-value const vector that are referenced by SYMBOL_REFs in x
+   with embedded broadcast. i.e.transform
+
+     vpaddq .LC0(%rip), %zmm0, %zmm0
+     ret
+  .LC0:
+    .quad 3
+    .quad 3
+    .quad 3
+    .quad 3
+    .quad 3
+    .quad 3
+    .quad 3
+    .quad 3
+
+    to
+
+     vpaddq .LC0(%rip){1to8}, %zmm0, %zmm0
+     ret
+  .LC0:
+    .quad 3  */
+static void
+replace_constant_pool_with_broadcast (rtx_insn* insn)
+{
+  subrtx_ptr_iterator::array_type array;
+  FOR_EACH_SUBRTX_PTR (iter, array, &PATTERN (insn), ALL)
+    {
+      rtx *loc = *iter;
+      rtx x = *loc;
+      rtx broadcast_mem, vec_dup, constant, first;
+      machine_mode mode;
+      if (GET_CODE (x) != MEM
+	  || GET_CODE (XEXP (x, 0)) != SYMBOL_REF
+	  || !CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)))
+	continue;
+
+      mode = GET_MODE (x);
+      if (!VECTOR_MODE_P (mode))
+	return;
+
+      constant = get_pool_constant (XEXP (x, 0));
+      first = XVECEXP (constant, 0, 0);
+      /* There could be some rtx like
+	 (mem/u/c:V16QI (symbol_ref/u:DI ("*.LC1")))
+	 but with "*.LC1" refer to V2DI constant vector.  */
+      if (GET_MODE (constant) != mode)
+	return;
+
+      for (int i = 1; i < GET_MODE_NUNITS (mode); ++i)
+	{
+	  rtx tmp = XVECEXP (constant, 0, i);
+	  /* Only handle one-value const vector.  */
+	  if (!rtx_equal_p (tmp, first))
+	    return;
+	}
+
+      broadcast_mem = force_const_mem (GET_MODE_INNER (mode), first);
+      vec_dup = gen_rtx_VEC_DUPLICATE (mode, broadcast_mem);
+      *loc = vec_dup;
+      INSN_CODE (insn) = -1;
+      /* Revert change if there's no corresponding pattern.  */
+      if (recog_memoized (insn) < 0)
+      	{
+      	  *loc = x;
+      	  recog_memoized (insn);
+      	}
+      /* At most 1 memory_operand in an insn.  */
+      return;
+    }
+}
+
+/* For const vector having one duplicated value, there's no need to put
+   whole vector in the constant pool when target supports embedded broadcast. */
+static unsigned int
+constant_pool_broadcast (void)
+{
+  timevar_push (TV_MACH_DEP);
+  rtx_insn *insn;
+
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+    {
+      if (!INSN_P (insn))
+	continue;
+
+      /* Insns may appear inside a SEQUENCE.  Only check the patterns of
+	 insns, not any notes that may be attached.  We don't want to mark
+	 a constant just because it happens to appear in a REG_EQUIV note.  */
+      if (rtx_sequence *seq = dyn_cast <rtx_sequence *> (PATTERN (insn)))
+	{
+	  int i, n = seq->len ();
+	  for (i = 0; i < n; ++i)
+	    {
+	      rtx subinsn = seq->element (i);
+	      if (INSN_P (subinsn))
+		replace_constant_pool_with_broadcast (dyn_cast <rtx_insn *> (subinsn));
+	    }
+	}
+      else
+	replace_constant_pool_with_broadcast (insn);
+    }
+  timevar_pop (TV_MACH_DEP);
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_constant_pool_broadcast =
+{
+  RTL_PASS, /* type */
+  "cpb", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_MACH_DEP, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_df_finish, /* todo_flags_finish */
+};
+
+class pass_constant_pool_broadcast : public rtl_opt_pass
+{
+public:
+  pass_constant_pool_broadcast (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_constant_pool_broadcast, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return TARGET_AVX512F;
+    }
+
+  virtual unsigned int execute (function *)
+    {
+      return constant_pool_broadcast ();
+    }
+}; // class pass_cpb
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_constant_pool_broadcast (gcc::context *ctxt)
+{
+  return new pass_constant_pool_broadcast (ctxt);
+}
+
 /* This compares the priority of target features in function DECL1
    and DECL2.  It returns positive value if DECL1 is higher priority,
    negative value if DECL2 is higher priority and 0 if they are the
diff --git a/gcc/config/i386/i386-passes.def b/gcc/config/i386/i386-passes.def
index d83c7b956b1..07ecf8e790f 100644
--- a/gcc/config/i386/i386-passes.def
+++ b/gcc/config/i386/i386-passes.def
@@ -33,3 +33,4 @@ along with GCC; see the file COPYING3.  If not see
   INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_endbr_and_patchable_area);
 
   INSERT_PASS_AFTER (pass_combine, 1, pass_remove_partial_avx_dependency);
+  INSERT_PASS_AFTER (pass_combine, 1, pass_constant_pool_broadcast);
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 0b95c57b1a0..812d3e06cf1 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -386,3 +386,4 @@ extern rtl_opt_pass *make_pass_insert_endbr_and_patchable_area
   (gcc::context *);
 extern rtl_opt_pass *make_pass_remove_partial_avx_dependency
   (gcc::context *);
+extern rtl_opt_pass *make_pass_constant_pool_broadcast (gcc::context *);
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b6348de67cb..6bd780caff6 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -12121,6 +12121,19 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "*avx512dq_mul<mode>3<mask_name>_bcst"
+  [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v")
+	(mult:VI8_AVX512VL
+	  (vec_duplicate:VI8_AVX512VL
+	    (match_operand:<ssescalarmode> 1 "memory_operand" "m"))
+	  (match_operand:VI8_AVX512VL 2 "register_operand" "v")
+))]
+  "TARGET_AVX512DQ"
+  "vpmullq\t{%1<avx512bcst>, %2, %0<mask_operand3>|%0<mask_operand3>, %2, %1<avx512bcst>}"
+  [(set_attr "type" "sseimul")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_expand "mul<mode>3<mask_name>"
   [(set (match_operand:VI4_AVX512F 0 "register_operand")
 	(mult:VI4_AVX512F
@@ -12161,6 +12174,18 @@
    (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "*avx512f_mul<mode>3<mask_name>_bcst"
+  [(set (match_operand:VI4_AVX512VL 0 "register_operand" "=v")
+	(mult:VI4_AVX512VL
+	  (vec_duplicate:VI4_AVX512VL
+	    (match_operand:<ssescalarmode> 1 "memory_operand" "m"))
+	  (match_operand:VI4_AVX512VL 2 "register_operand" "v")))]
+  "TARGET_AVX512F"
+   "vpmulld\t{%1<avx512bcst>, %2, %0<mask_operand3>|%0<mask_operand3>, %2, %1<avx512bcst>}"
+  [(set_attr "type" "sseimul")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "<sseinsnmode>")])
+
 (define_expand "mul<mode>3"
   [(set (match_operand:VI8_AVX2_AVX512F 0 "register_operand")
 	(mult:VI8_AVX2_AVX512F
diff --git a/gcc/testsuite/gcc.target/i386/avx2-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx2-broadcast-pr87767-1.c
new file mode 100644
index 00000000000..800ef1f957e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-broadcast-pr87767-1.c
@@ -0,0 +1,40 @@
+/* PR target/87767 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+/* { dg-final { scan-assembler-not "\\\{1to\[248\]\\\}" } }  */
+/* { dg-final { scan-assembler-not "\\\{1to16\\\}" } }  */
+
+typedef int v4si  __attribute__ ((vector_size (16)));
+typedef int v8si  __attribute__ ((vector_size (32)));
+typedef long long v2di  __attribute__ ((vector_size (16)));
+typedef long long v4di  __attribute__ ((vector_size (32)));
+typedef float v4sf  __attribute__ ((vector_size (16)));
+typedef float v8sf  __attribute__ ((vector_size (32)));
+typedef double v2df  __attribute__ ((vector_size (16)));
+typedef double v4df  __attribute__ ((vector_size (32)));
+
+#define FOO(VTYPE, OP_NAME, OP)			\
+VTYPE						\
+ __attribute__ ((noipa))			\
+foo_##OP_NAME##_##VTYPE (VTYPE a)		\
+{						\
+  return a OP 101;				\
+}						\
+
+FOO (v4si, add, +);
+FOO (v8si, add, +);
+FOO (v2di, add, +);
+FOO (v4di, add, +);
+FOO (v4sf, add, +);
+FOO (v8sf, add, +);
+FOO (v2df, add, +);
+FOO (v4df, add, +);
+
+FOO (v4si, mul, *);
+FOO (v8si, mul, *);
+FOO (v2di, mul, *);
+FOO (v4di, mul, *);
+FOO (v4sf, mul, *);
+FOO (v8sf, mul, *);
+FOO (v2df, mul, *);
+FOO (v4df, mul, *);
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
new file mode 100644
index 00000000000..610d9c6e2dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
@@ -0,0 +1,78 @@
+/* PR target/87767 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mavx512vl -mavx512dq" } */
+/* { dg-final { scan-assembler-times "vpadd\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vpadd\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpadd\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpadd\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vpsub\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vpsub\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpsub\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpsub\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vpmul\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vpmul\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpmul\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpmul\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vmul\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vmul\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vmul\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vmul\[^\n\]*\\\{1to16\\\}" 1 } }  */
+
+typedef int v4si  __attribute__ ((vector_size (16)));
+typedef int v8si  __attribute__ ((vector_size (32)));
+typedef int v16si  __attribute__ ((vector_size (64)));
+typedef long long v2di  __attribute__ ((vector_size (16)));
+typedef long long v4di  __attribute__ ((vector_size (32)));
+typedef long long v8di  __attribute__ ((vector_size (64)));
+typedef float v4sf  __attribute__ ((vector_size (16)));
+typedef float v8sf  __attribute__ ((vector_size (32)));
+typedef float v16sf  __attribute__ ((vector_size (64)));
+typedef double v2df  __attribute__ ((vector_size (16)));
+typedef double v4df  __attribute__ ((vector_size (32)));
+typedef double v8df  __attribute__ ((vector_size (64)));
+
+#define CONSTANT 101;
+#define FOO(VTYPE, OP_NAME, OP)			\
+VTYPE						\
+ __attribute__ ((noipa))			\
+foo_##OP_NAME##_##VTYPE (VTYPE a)		\
+{						\
+  return a OP CONSTANT;				\
+}						\
+
+FOO (v4si, add, +);
+FOO (v8si, add, +);
+FOO (v16si, add, +);
+FOO (v2di, add, +);
+FOO (v4di, add, +);
+FOO (v8di, add, +);
+FOO (v4sf, add, +);
+FOO (v8sf, add, +);
+FOO (v16sf, add, +);
+FOO (v2df, add, +);
+FOO (v4df, add, +);
+FOO (v8df, add, +);
+
+FOO (v4si, sub, -);
+FOO (v8si, sub, -);
+FOO (v16si, sub, -);
+FOO (v2di, sub, -);
+FOO (v4di, sub, -);
+FOO (v8di, sub, -);
+
+FOO (v4si, mul, *);
+FOO (v8si, mul, *);
+FOO (v16si, mul, *);
+FOO (v2di, mul, *);
+FOO (v4di, mul, *);
+FOO (v8di, mul, *);
+FOO (v4sf, mul, *);
+FOO (v8sf, mul, *);
+FOO (v16sf, mul, *);
+FOO (v2df, mul, *);
+FOO (v4df, mul, *);
+FOO (v8df, mul, *);
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-2.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-2.c
new file mode 100644
index 00000000000..88b6fe77b18
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-2.c
@@ -0,0 +1,61 @@
+/* PR target/87767 */
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mavx512vl -mavx512dq" } */
+
+#include<stdlib.h>
+#include<stdio.h>
+#include "avx512f-broadcast-pr87767-1.c"
+
+#define TEST(VTYPE, TYPE, N, OP_NAME, OP)		\
+  do							\
+    {							\
+      TYPE exp[N], src[N];				\
+      VTYPE res;					\
+      for (int i = 0; i < N; i++)			\
+	src[i] = i * i * 107;				\
+      res = foo_##OP_NAME##_##VTYPE (*(VTYPE*)&src[0]);	\
+      for (int i = 0; i < N; i ++)			\
+	exp[i] = src[i] OP CONSTANT;			\
+      for (int j = 0; j < N; j++)			\
+	{						\
+	  if (res[j] != exp[j])				\
+	    abort();					\
+	}						\
+    }							\
+  while (0)
+
+int main()
+{
+  TEST (v4si, int, 4, add, +);
+  TEST (v8si, int, 8, add, +);
+  TEST (v16si, int, 16, add, +);
+  TEST (v2di, long long, 2, add, +);
+  TEST (v4di, long long, 4, add, +);
+  TEST (v8di, long long, 8, add, +);
+  TEST (v4sf, float, 4, add, +);
+  TEST (v8sf, float, 8, add, +);
+  TEST (v16sf, float, 16, add, +);
+  TEST (v2df, double, 2, add, +);
+  TEST (v4df, double, 4, add, +);
+  TEST (v8df, double, 8, add, +);
+
+  TEST (v4si, int, 4, sub, -);
+  TEST (v8si, int, 8, sub, -);
+  TEST (v16si, int, 16, sub, -);
+  TEST (v2di, long long, 2, sub, -);
+  TEST (v4di, long long, 4, sub, -);
+  TEST (v8di, long long, 8, sub, -);
+
+  TEST (v4si, int, 4, mul, *);
+  TEST (v8si, int, 8, mul, *);
+  TEST (v16si, int, 16, mul, *);
+  TEST (v2di, long long, 2, mul, *);
+  TEST (v4di, long long, 4, mul, *);
+  TEST (v8di, long long, 8, mul, *);
+  TEST (v4sf, float, 4, mul, *);
+  TEST (v8sf, float, 8, mul, *);
+  TEST (v16sf, float, 16, mul, *);
+  TEST (v2df, double, 2, mul, *);
+  TEST (v4df, double, 4, mul, *);
+  TEST (v8df, double, 8, mul, *);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-3.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-3.c
new file mode 100644
index 00000000000..94c183b7a5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-3.c
@@ -0,0 +1,55 @@
+/* PR target/87767 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mavx512vl" } */
+/* { dg-final { scan-assembler-times "(?:vfmadd|vfmsub)\[^\n\]*\\\{1to2\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "(?:vfmadd|vfmsub)\[^\n\]*\\\{1to4\\\}" 4 } }  */
+/* { dg-final { scan-assembler-times "(?:vfmadd|vfmsub)\[^\n\]*\\\{1to8\\\}" 4 } }  */
+/* { dg-final { scan-assembler-times "(?:vfmadd|vfmsub)\[^\n\]*\\\{1to16\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "(?:vfnmadd|vfnmsub)\[^\n\]*\\\{1to2\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "(?:vfnmadd|vfnmsub)\[^\n\]*\\\{1to4\\\}" 4 } }  */
+/* { dg-final { scan-assembler-times "(?:vfnmadd|vfnmsub)\[^\n\]*\\\{1to8\\\}" 4 } }  */
+/* { dg-final { scan-assembler-times "(?:vfnmadd|vfnmsub)\[^\n\]*\\\{1to16\\\}" 2 } }  */
+
+typedef float v4sf  __attribute__ ((vector_size (16)));
+typedef float v8sf  __attribute__ ((vector_size (32)));
+typedef float v16sf  __attribute__ ((vector_size (64)));
+typedef double v2df  __attribute__ ((vector_size (16)));
+typedef double v4df  __attribute__ ((vector_size (32)));
+typedef double v8df  __attribute__ ((vector_size (64)));
+
+#define CONSTANT 101;
+#define FOO(VTYPE, OP_NAME, OP1, OP2)		\
+VTYPE						\
+ __attribute__ ((noipa))			\
+ foo_##OP_NAME##_##VTYPE (VTYPE a, VTYPE b)	\
+{						\
+  return (OP1 a * b) OP2 CONSTANT;		\
+}						\
+
+FOO (v4sf, fma,, +);
+FOO (v8sf, fma,, +);
+FOO (v16sf, fma,, +);
+FOO (v2df, fma,, +);
+FOO (v4df, fma,, +);
+FOO (v8df, fma,, +);
+
+FOO (v4sf, fms,, -);
+FOO (v8sf, fms,, -);
+FOO (v16sf, fms,, -);
+FOO (v2df, fms,, -);
+FOO (v4df, fms,, -);
+FOO (v8df, fms,, -);
+
+FOO (v4sf, fnma, -, +);
+FOO (v8sf, fnma, -, +);
+FOO (v16sf, fnma, -, +);
+FOO (v2df, fnma, -, +);
+FOO (v4df, fnma, -, +);
+FOO (v8df, fnma, -, +);
+
+FOO (v4sf, fnms, -, -);
+FOO (v8sf, fnms, -, -);
+FOO (v16sf, fnms, -, -);
+FOO (v2df, fnms, -, -);
+FOO (v4df, fnms, -, -);
+FOO (v8df, fnms, -, -);
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-4.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-4.c
new file mode 100644
index 00000000000..17268968f9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-4.c
@@ -0,0 +1,59 @@
+/* PR target/87767 */
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mavx512vl" } */
+
+#include<stdlib.h>
+#include<stdio.h>
+#include "avx512f-broadcast-pr87767-3.c"
+
+#define TEST(VTYPE, TYPE, N, OP_NAME, OP1, OP2)				\
+  do									\
+    {									\
+      TYPE exp[N], src1[N], src2[N];					\
+      VTYPE res;							\
+      for (int i = 0; i < N; i++)					\
+	{								\
+	  src1[i] = i * i * 107.2f;					\
+	  src2[i] = i * 2.f - 404.f;					\
+	}								\
+      res = foo_##OP_NAME##_##VTYPE (*(VTYPE*)&src1[0], *(VTYPE*)&src2[0]); \
+      for (int i = 0; i < N; i ++)					\
+	exp[i] = (OP1 src1[i] * src2[i]) OP2 CONSTANT;			\
+      for (int j = 0; j < N; j++)					\
+	{								\
+	  if (res[j] != exp[j])						\
+	    abort();							\
+	}								\
+    }									\
+  while (0)
+
+int main()
+{
+  TEST (v4sf, float, 4, fma,, +);
+  TEST (v8sf, float, 8, fma,, +);
+  TEST (v16sf, float, 16, fma,, +);
+  TEST (v2df, double, 2, fma,, +);
+  TEST (v4df, double, 4, fma,, +);
+  TEST (v8df, double, 8, fma,, +);
+
+  TEST (v4sf, float, 4, fms,, -);
+  TEST (v8sf, float, 8, fms,, -);
+  TEST (v16sf, float, 16, fms,, -);
+  TEST (v2df, double, 2, fms,, -);
+  TEST (v4df, double, 4, fms,, -);
+  TEST (v8df, double, 8, fms,, -);
+
+  TEST (v4sf, float, 4, fnma,-, +);
+  TEST (v8sf, float, 8, fnma,-, +);
+  TEST (v16sf, float, 16, fnma,-, +);
+  TEST (v2df, double, 2, fnma,-, +);
+  TEST (v4df, double, 4, fnma,-, +);
+  TEST (v8df, double, 8, fnma,-, +);
+
+  TEST (v4sf, float, 4, fnms,-, -);
+  TEST (v8sf, float, 8, fnms,-, -);
+  TEST (v16sf, float, 16, fnms,-, -);
+  TEST (v2df, double, 2, fnms,-, -);
+  TEST (v4df, double, 4, fnms,-, -);
+  TEST (v8df, double, 8, fnms,-, -);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
new file mode 100644
index 00000000000..139bba75686
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
@@ -0,0 +1,63 @@
+/* PR target/87767 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mavx512vl" } */
+/* { dg-final { scan-assembler-times "vpandn\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vpandn\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpandn\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpandn\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vpand\[^n\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vpand\[^n\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpand\[^n\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpand\[^n\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vpor\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vpor\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpor\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpor\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vpxor\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vpxor\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpxor\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vpxor\[^\n\]*\\\{1to16\\\}" 1 } }  */
+
+typedef int v4si  __attribute__ ((vector_size (16)));
+typedef int v8si  __attribute__ ((vector_size (32)));
+typedef int v16si  __attribute__ ((vector_size (64)));
+typedef long long v2di  __attribute__ ((vector_size (16)));
+typedef long long v4di  __attribute__ ((vector_size (32)));
+typedef long long v8di  __attribute__ ((vector_size (64)));
+
+#define CONSTANT 101;
+#define FOO(VTYPE, OP_NAME, OP1, OP2)		\
+VTYPE						\
+ __attribute__ ((noipa))			\
+ foo_##OP_NAME##_##VTYPE (VTYPE a)		\
+{						\
+  return (OP1 a) OP2 CONSTANT;			\
+}						\
+
+FOO (v4si, andnot, ~, &);
+FOO (v8si, andnot, ~, &);
+FOO (v16si, andnot, ~, &);
+FOO (v2di, andnot, ~, &);
+FOO (v4di, andnot, ~, &);
+FOO (v8di, andnot, ~, &);
+
+FOO (v4si, and,, &);
+FOO (v8si, and,, &);
+FOO (v16si, and,, &);
+FOO (v2di, and,, &);
+FOO (v4di, and,, &);
+FOO (v8di, and,, &);
+
+FOO (v4si, or,, |);
+FOO (v8si, or,, |);
+FOO (v16si, or,, |);
+FOO (v2di, or,, |);
+FOO (v4di, or,, |);
+FOO (v8di, or,, |);
+
+FOO (v4si, xor,, ^);
+FOO (v8si, xor,, ^);
+FOO (v16si, xor,, ^);
+FOO (v2di, xor,, ^);
+FOO (v4di, xor,, ^);
+FOO (v8di, xor,, ^);
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-6.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-6.c
new file mode 100644
index 00000000000..892b52858da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-6.c
@@ -0,0 +1,58 @@
+/* PR target/87767 */
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mavx512vl" } */
+
+#include<stdlib.h>
+#include<stdio.h>
+#include "avx512f-broadcast-pr87767-5.c"
+
+#define TEST(VTYPE, TYPE, N, OP_NAME, OP1, OP2)				\
+  do									\
+    {									\
+      TYPE exp[N], src[N];						\
+      VTYPE res;							\
+      for (int i = 0; i < N; i++)					\
+	{								\
+	  src[i] = i * i * 107;						\
+	}								\
+      res = foo_##OP_NAME##_##VTYPE (*(VTYPE*)&src[0]);			\
+      for (int i = 0; i < N; i ++)					\
+	exp[i] = (OP1 src[i]) OP2 CONSTANT;				\
+      for (int j = 0; j < N; j++)					\
+	{								\
+	  if (res[j] != exp[j])						\
+	    abort();							\
+	}								\
+    }									\
+  while (0)
+
+int main()
+{
+  TEST (v4si, int, 4, andnot, ~, &);
+  TEST (v8si, int, 8, andnot, ~, &);
+  TEST (v16si, int, 16, andnot, ~, &);
+  TEST (v2di, long long, 2, andnot, ~, &);
+  TEST (v4di, long long, 4, andnot, ~, &);
+  TEST (v8di, long long, 8, andnot, ~, &);
+
+  TEST (v4si, int, 4, and,, &);
+  TEST (v8si, int, 8, and,, &);
+  TEST (v16si, int, 16, and,, &);
+  TEST (v2di, long long, 2, and,, &);
+  TEST (v4di, long long, 4, and,, &);
+  TEST (v8di, long long, 8, and,, &);
+
+  TEST (v4si, int, 4, or,, |);
+  TEST (v8si, int, 8, or,, |);
+  TEST (v16si, int, 16, or,, |);
+  TEST (v2di, long long, 2, or,, |);
+  TEST (v4di, long long, 4, or,, |);
+  TEST (v8di, long long, 8, or,, |);
+
+  TEST (v4si, int, 4, xor,, ^);
+  TEST (v8si, int, 8, xor,, ^);
+  TEST (v16si, int, 16, xor,, ^);
+  TEST (v2di, long long, 2, xor,, ^);
+  TEST (v4di, long long, 4, xor,, ^);
+  TEST (v8di, long long, 8, xor,, ^);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-7.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-7.c
new file mode 100644
index 00000000000..a8f145d8589
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-7.c
@@ -0,0 +1,121 @@
+/* PR target/87767 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mavx512vl" } */
+/* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vadd\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vsub\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vsub\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vsub\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vsub\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vmul\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vmul\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vmul\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vmul\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vdiv\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vdiv\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vdiv\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vdiv\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vfmadd\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vfmadd\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vfmadd\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vfmadd\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vfmsub\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vfmsub\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vfmsub\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vfmsub\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vfnmadd\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vfnmadd\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vfnmadd\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vfnmadd\[^\n\]*\\\{1to16\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vfnmsub\[^\n\]*\\\{1to2\\\}" 1 } }  */
+/* { dg-final { scan-assembler-times "vfnmsub\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vfnmsub\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "vfnmsub\[^\n\]*\\\{1to16\\\}" 1 } }  */
+
+#include<immintrin.h>
+
+#define CONSTANT 101
+
+#define FOO(VTYPE, OP_NAME, LEN, SUFFIX, MTYPE)			\
+  VTYPE									\
+  __attribute__ ((noipa))						\
+  _mm##LEN##_foo_##OP_NAME##_##SUFFIX (VTYPE dst, VTYPE src, MTYPE m)	\
+  {									\
+    return  _mm##LEN##_mask_##OP_NAME##_##SUFFIX (dst, m, src,		\
+						  _mm##LEN##_set1_##SUFFIX (CONSTANT)); \
+  }									\
+
+#define FOOZ(VTYPE, OP_NAME, LEN, SUFFIX, MTYPE)			\
+  VTYPE									\
+  __attribute__ ((noipa))						\
+  _mm##LEN##_fooz_##OP_NAME##_##SUFFIX (VTYPE dst, VTYPE src, MTYPE m)	\
+  {									\
+    return  _mm##LEN##_maskz_##OP_NAME##_##SUFFIX (m, dst, src,		\
+						  _mm##LEN##_set1_##SUFFIX (CONSTANT)); \
+  }									\
+
+FOO (__m512, add, 512, ps, __mmask16);
+FOO (__m256, add, 256, ps, __mmask8);
+FOO (__m128, add,, ps, __mmask8);
+
+FOO (__m512, sub, 512, ps, __mmask16);
+FOO (__m256, sub, 256, ps, __mmask8);
+FOO (__m128, sub,, ps, __mmask8);
+
+FOO (__m512, mul, 512, ps, __mmask16);
+FOO (__m256, mul, 256, ps, __mmask8);
+FOO (__m128, mul,, ps, __mmask8);
+
+FOO (__m512, div, 512, ps, __mmask16);
+FOO (__m256, div, 256, ps, __mmask8);
+FOO (__m128, div,, ps, __mmask8);
+
+FOOZ (__m512, fmadd, 512, ps, __mmask16);
+FOOZ (__m256, fmadd, 256, ps, __mmask8);
+FOOZ (__m128, fmadd,, ps, __mmask8);
+
+FOOZ (__m512, fmsub, 512, ps, __mmask16);
+FOOZ (__m256, fmsub, 256, ps, __mmask8);
+FOOZ (__m128, fmsub,, ps, __mmask8);
+
+FOOZ (__m512, fnmadd, 512, ps, __mmask16);
+FOOZ (__m256, fnmadd, 256, ps, __mmask8);
+FOOZ (__m128, fnmadd,, ps, __mmask8);
+
+FOOZ (__m512, fnmsub, 512, ps, __mmask16);
+FOOZ (__m256, fnmsub, 256, ps, __mmask8);
+FOOZ (__m128, fnmsub,, ps, __mmask8);
+
+FOO (__m512d, add, 512, pd, __mmask8);
+FOO (__m256d, add, 256, pd, __mmask8);
+FOO (__m128d, add,, pd, __mmask8);
+
+FOO (__m512d, sub, 512, pd, __mmask8);
+FOO (__m256d, sub, 256, pd, __mmask8);
+FOO (__m128d, sub,, pd, __mmask8);
+
+FOO (__m512d, mul, 512, pd, __mmask8);
+FOO (__m256d, mul, 256, pd, __mmask8);
+FOO (__m128d, mul,, pd, __mmask8);
+
+FOO (__m512d, div, 512, pd, __mmask8);
+FOO (__m256d, div, 256, pd, __mmask8);
+FOO (__m128d, div,, pd, __mmask8);
+
+FOOZ (__m512d, fmadd, 512, pd, __mmask8);
+FOOZ (__m256d, fmadd, 256, pd, __mmask8);
+FOOZ (__m128d, fmadd,, pd, __mmask8);
+
+FOOZ (__m512d, fmsub, 512, pd, __mmask8);
+FOOZ (__m256d, fmsub, 256, pd, __mmask8);
+FOOZ (__m128d, fmsub,, pd, __mmask8);
+
+FOOZ (__m512d, fnmadd, 512, pd, __mmask8);
+FOOZ (__m256d, fnmadd, 256, pd, __mmask8);
+FOOZ (__m128d, fnmadd,, pd, __mmask8);
+
+FOOZ (__m512d, fnmsub, 512, pd, __mmask8);
+FOOZ (__m256d, fnmsub, 256, pd, __mmask8);
+FOOZ (__m128d, fnmsub,, pd, __mmask8);
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
new file mode 100644
index 00000000000..00d2fb1b955
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
@@ -0,0 +1,41 @@
+/* PR target/87767 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-final { scan-assembler-not "\\\{1to\[248\]\\\}" } }  */
+/* { dg-final { scan-assembler-not "\\\{1to16\\\}" } }  */
+
+typedef int v4si  __attribute__ ((vector_size (16)));
+typedef int v8si  __attribute__ ((vector_size (32)));
+typedef long long v2di  __attribute__ ((vector_size (16)));
+typedef long long v4di  __attribute__ ((vector_size (32)));
+typedef float v4sf  __attribute__ ((vector_size (16)));
+typedef float v8sf  __attribute__ ((vector_size (32)));
+typedef double v2df  __attribute__ ((vector_size (16)));
+typedef double v4df  __attribute__ ((vector_size (32)));
+
+#define CONSTANT 101;
+#define FOO(VTYPE, OP_NAME, OP)			\
+VTYPE						\
+ __attribute__ ((noipa))			\
+foo_##OP_NAME##_##VTYPE (VTYPE a)		\
+{						\
+  return a OP CONSTANT;				\
+}						\
+
+FOO (v4si, add, +);
+FOO (v8si, add, +);
+FOO (v2di, add, +);
+FOO (v4di, add, +);
+FOO (v4sf, add, +);
+FOO (v8sf, add, +);
+FOO (v2df, add, +);
+FOO (v4df, add, +);
+
+FOO (v4si, mul, *);
+FOO (v8si, mul, *);
+FOO (v2di, mul, *);
+FOO (v4di, mul, *);
+FOO (v4sf, mul, *);
+FOO (v8sf, mul, *);
+FOO (v2df, mul, *);
+FOO (v4df, mul, *);
diff --git a/gcc/testsuite/gcc.target/i386/pr92865-1.c b/gcc/testsuite/gcc.target/i386/pr92865-1.c
index 49b5778a067..a37487d9af7 100644
--- a/gcc/testsuite/gcc.target/i386/pr92865-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr92865-1.c
@@ -3,10 +3,11 @@
 /* { dg-options "-Ofast -mavx512f -mavx512bw -mxop" } */
 /* { dg-final { scan-assembler-times "vpcmp\[bwdq\]\[\t ]" 4 } } */
 /* { dg-final { scan-assembler-times "vpcmpu\[bwdq\]\[\t ]" 4 } } */
-/* { dg-final { scan-assembler-times "vmovdq\[au\]8\[\t ]" 4 } } */
-/* { dg-final { scan-assembler-times "vmovdq\[au\]16\[\t ]" 4 } } *
-/* { dg-final { scan-assembler-times "vmovdq\[au\]32\[\t ]" 4 } } */
-/* { dg-final { scan-assembler-times "vmovdq\[au\]64\[\t ]" 4 } } */
+/* { dg-final { scan-assembler-times "vmovdq\[au\]8\[\t ]" 2 } } */
+/* { dg-final { scan-assembler-times "vmovdq\[au\]16\[\t ]" 2 } } *
+/* { dg-final { scan-assembler-times "vmovdq\[au\]32\[\t ]" 2 } } */
+/* { dg-final { scan-assembler-times "vmovdq\[au\]64\[\t ]" 2 } } */
+/* { dg-final { scan-assembler-times "vpbroadcast\[bwqd\]\[\t ]" 16 } } */
 
 extern char arraysb[64];
 extern short arraysw[32];
-- 
2.18.1


  reply	other threads:[~2020-08-04  6:05 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-09  8:33 Hongtao Liu
2020-07-10  9:24 ` Hongtao Liu
2020-07-17  7:24   ` Hongtao Liu
2020-07-23  8:39 ` Jan Hubicka
2020-07-23 13:53   ` Hongtao Liu
2020-07-24  2:37     ` Hongtao Liu
2020-08-04  6:05       ` Hongtao Liu [this message]
2020-08-26 21:23         ` Jeff Law
2020-08-27 11:09           ` Jan Hubicka
2020-08-27 12:24 ` Jakub Jelinek
2020-08-27 13:07   ` Richard Biener
2020-08-27 13:20     ` Jakub Jelinek
2020-08-28  6:47       ` Richard Biener
2020-08-28  8:52         ` Jakub Jelinek
2020-08-28 10:36           ` Richard Biener
2020-08-28 10:47             ` Jakub Jelinek
2020-08-28 11:06               ` Richard Biener
2020-08-28 11:26                 ` Jakub Jelinek
2020-08-28 14:53                 ` Jakub Jelinek
2020-08-28 16:07                   ` Richard Sandiford
2020-08-28 16:25                     ` Jakub Jelinek
2020-08-30  9:24                       ` Jakub Jelinek
2020-08-31  8:18                         ` Richard Biener
2020-08-28 17:18   ` Hongtao Liu
2020-09-01  9:55   ` Hongtao Liu
2020-09-01 10:11     ` Jakub Jelinek
2020-09-02  1:57       ` Hongtao Liu
2020-09-02  9:58         ` Jakub Jelinek
2020-09-03  2:11           ` Hongtao Liu
2020-09-03  7:27             ` Jakub Jelinek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMZc-bx2yWXv7d1D9FNOBo_qbaCUk07PLRkfy6HuNyZG9RhqYA@mail.gmail.com \
    --to=crazylht@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hubicka@ucw.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).