public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
@ 2020-11-11  8:45 Uros Bizjak
  2020-11-12  2:06 ` Hongtao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Uros Bizjak @ 2020-11-11  8:45 UTC (permalink / raw)
  To: gcc-patches

> gcc/ChangeLog:
>
> PR target/97194
> * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> * config/i386/predicates.md (vec_setm_operand): New predicate,
> true for const_int_operand or register_operand under TARGET_AVX2.
> * config/i386/sse.md (vec_set<mode>): Support both constant
> and variable index vec_set.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx2-vec-set-1.c: New test.
> * gcc.target/i386/avx2-vec-set-2.c: New test.
> * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> * gcc.target/i386/avx512f-vec-set-2.c: New test.
> * gcc.target/i386/avx512vl-vec-set-2.c: New test.

+;; True for registers, or const_int_operand, used to vec_setm expander.
+(define_predicate "vec_setm_operand"
+  (ior (and (match_operand 0 "register_operand")
+    (match_test "TARGET_AVX2"))
+       (match_code "const_int")))
+
 ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
 (define_predicate "reg_or_pm1_operand"
   (ior (match_operand 0 "register_operand")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b153a87fb98..1798e5dea75 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
 (define_expand "vec_set<mode>"
   [(match_operand:V 0 "register_operand")
    (match_operand:<ssescalarmode> 1 "register_operand")
-   (match_operand 2 "const_int_operand")]
+   (match_operand 2 "vec_setm_operand")]

You need to specify a mode, otherwise a register of any mode can pass here.

Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-11  8:45 [PATCH] [PR target/97194] [AVX2] Support variable index vec_set Uros Bizjak
@ 2020-11-12  2:06 ` Hongtao Liu
  2020-11-12  8:20   ` Uros Bizjak
  0 siblings, 1 reply; 23+ messages in thread
From: Hongtao Liu @ 2020-11-12  2:06 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Biener

On Wed, Nov 11, 2020 at 4:45 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > gcc/ChangeLog:
> >
> > PR target/97194
> > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > true for const_int_operand or register_operand under TARGET_AVX2.
> > * config/i386/sse.md (vec_set<mode>): Support both constant
> > and variable index vec_set.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
>
> +;; True for registers, or const_int_operand, used to vec_setm expander.
> +(define_predicate "vec_setm_operand"
> +  (ior (and (match_operand 0 "register_operand")
> +    (match_test "TARGET_AVX2"))
> +       (match_code "const_int")))
> +
>  ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
>  (define_predicate "reg_or_pm1_operand"
>    (ior (match_operand 0 "register_operand")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index b153a87fb98..1798e5dea75 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
>  (define_expand "vec_set<mode>"
>    [(match_operand:V 0 "register_operand")
>     (match_operand:<ssescalarmode> 1 "register_operand")
> -   (match_operand 2 "const_int_operand")]
> +   (match_operand 2 "vec_setm_operand")]
>
> You need to specify a mode, otherwise a register of any mode can pass here.
>
Yes, theoretically, we only accept integer types. But in can_vec_set_var_idx_p
cut
---
bool
can_vec_set_var_idx_p (machine_mode vec_mode)
{
  if (!VECTOR_MODE_P (vec_mode))
    return false;

  machine_mode inner_mode = GET_MODE_INNER (vec_mode);
  rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
  rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
  rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);

  enum insn_code icode = optab_handler (vec_set_optab, vec_mode);

  return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
         && insn_operand_matches (icode, 1, reg2)
         && insn_operand_matches (icode, 2, reg3);
}
---

reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
fail insn_operand_matches (icode, 2, reg3)
---
(gdb) p insn_operand_matches(icode,2,reg3)
$5 = false
(gdb)
---

Maybe we need to change

rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);

to

rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);

cc Richard Biener, any thoughts?

> Uros.



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-12  2:06 ` Hongtao Liu
@ 2020-11-12  8:20   ` Uros Bizjak
  2020-11-12  9:12     ` Hongtao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Uros Bizjak @ 2020-11-12  8:20 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: gcc-patches, Richard Biener

On Thu, Nov 12, 2020 at 3:04 AM Hongtao Liu <crazylht@gmail.com> wrote:

> > > gcc/ChangeLog:
> > >
> > > PR target/97194
> > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > true for const_int_operand or register_operand under TARGET_AVX2.
> > > * config/i386/sse.md (vec_set<mode>): Support both constant
> > > and variable index vec_set.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> >
> > +;; True for registers, or const_int_operand, used to vec_setm expander.
> > +(define_predicate "vec_setm_operand"
> > +  (ior (and (match_operand 0 "register_operand")
> > +    (match_test "TARGET_AVX2"))
> > +       (match_code "const_int")))
> > +
> >  ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
> >  (define_predicate "reg_or_pm1_operand"
> >    (ior (match_operand 0 "register_operand")
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index b153a87fb98..1798e5dea75 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> >  (define_expand "vec_set<mode>"
> >    [(match_operand:V 0 "register_operand")
> >     (match_operand:<ssescalarmode> 1 "register_operand")
> > -   (match_operand 2 "const_int_operand")]
> > +   (match_operand 2 "vec_setm_operand")]
> >
> > You need to specify a mode, otherwise a register of any mode can pass here.
> >
> Yes, theoretically, we only accept integer types. But in can_vec_set_var_idx_p
> cut
> ---
> bool
> can_vec_set_var_idx_p (machine_mode vec_mode)
> {
>   if (!VECTOR_MODE_P (vec_mode))
>     return false;
>
>   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
>   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
>   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
>   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
>
>   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
>
>   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
>          && insn_operand_matches (icode, 1, reg2)
>          && insn_operand_matches (icode, 2, reg3);
> }
> ---
>
> reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
> fail insn_operand_matches (icode, 2, reg3)
> ---
> (gdb) p insn_operand_matches(icode,2,reg3)
> $5 = false
> (gdb)
> ---
>
> Maybe we need to change
>
> rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
>
> to
>
> rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
>
> cc Richard Biener, any thoughts?

There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
specify SImode for operand 2 in vec_setM pattern and allow register
operands. I wonder if and how they manage to generate the pattern.

Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-12  8:20   ` Uros Bizjak
@ 2020-11-12  9:12     ` Hongtao Liu
  2020-11-12  9:15       ` Hongtao Liu
  2020-11-16 10:16       ` Uros Bizjak
  0 siblings, 2 replies; 23+ messages in thread
From: Hongtao Liu @ 2020-11-12  9:12 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Biener

On Thu, Nov 12, 2020 at 4:21 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Nov 12, 2020 at 3:04 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> > > > gcc/ChangeLog:
> > > >
> > > > PR target/97194
> > > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > true for const_int_operand or register_operand under TARGET_AVX2.
> > > > * config/i386/sse.md (vec_set<mode>): Support both constant
> > > > and variable index vec_set.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > >
> > > +;; True for registers, or const_int_operand, used to vec_setm expander.
> > > +(define_predicate "vec_setm_operand"
> > > +  (ior (and (match_operand 0 "register_operand")
> > > +    (match_test "TARGET_AVX2"))
> > > +       (match_code "const_int")))
> > > +
> > >  ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
> > >  (define_predicate "reg_or_pm1_operand"
> > >    (ior (match_operand 0 "register_operand")
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index b153a87fb98..1798e5dea75 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> > >  (define_expand "vec_set<mode>"
> > >    [(match_operand:V 0 "register_operand")
> > >     (match_operand:<ssescalarmode> 1 "register_operand")
> > > -   (match_operand 2 "const_int_operand")]
> > > +   (match_operand 2 "vec_setm_operand")]
> > >
> > > You need to specify a mode, otherwise a register of any mode can pass here.
> > >
> > Yes, theoretically, we only accept integer types. But in can_vec_set_var_idx_p
> > cut
> > ---
> > bool
> > can_vec_set_var_idx_p (machine_mode vec_mode)
> > {
> >   if (!VECTOR_MODE_P (vec_mode))
> >     return false;
> >
> >   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> >   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
> >   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> >   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> >
> >   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
> >
> >   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
> >          && insn_operand_matches (icode, 1, reg2)
> >          && insn_operand_matches (icode, 2, reg3);
> > }
> > ---
> >
> > reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
> > fail insn_operand_matches (icode, 2, reg3)
> > ---
> > (gdb) p insn_operand_matches(icode,2,reg3)
> > $5 = false
> > (gdb)
> > ---
> >
> > Maybe we need to change
> >
> > rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> >
> > to
> >
> > rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
> >
> > cc Richard Biener, any thoughts?
>
> There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
> specify SImode for operand 2 in vec_setM pattern and allow register
> operands. I wonder if and how they manage to generate the pattern.
>
> Uros.

Variable index vec_set is enabled by r11-3486, about two months ago in
[1]. But for the upper two targets, the codes are already there since
GCC10(maybe earlier, i just looked at gcc10 branch), I don't think
those codes are for [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-12  9:12     ` Hongtao Liu
@ 2020-11-12  9:15       ` Hongtao Liu
  2020-11-12  9:25         ` Hongtao Liu
  2020-11-16 10:16       ` Uros Bizjak
  1 sibling, 1 reply; 23+ messages in thread
From: Hongtao Liu @ 2020-11-12  9:15 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Biener

On Thu, Nov 12, 2020 at 5:12 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Nov 12, 2020 at 4:21 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Nov 12, 2020 at 3:04 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR target/97194
> > > > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > > true for const_int_operand or register_operand under TARGET_AVX2.
> > > > > * config/i386/sse.md (vec_set<mode>): Support both constant
> > > > > and variable index vec_set.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > >
> > > > +;; True for registers, or const_int_operand, used to vec_setm expander.
> > > > +(define_predicate "vec_setm_operand"
> > > > +  (ior (and (match_operand 0 "register_operand")
> > > > +    (match_test "TARGET_AVX2"))
> > > > +       (match_code "const_int")))
> > > > +
> > > >  ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
> > > >  (define_predicate "reg_or_pm1_operand"
> > > >    (ior (match_operand 0 "register_operand")
> > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > index b153a87fb98..1798e5dea75 100644
> > > > --- a/gcc/config/i386/sse.md
> > > > +++ b/gcc/config/i386/sse.md
> > > > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> > > >  (define_expand "vec_set<mode>"
> > > >    [(match_operand:V 0 "register_operand")
> > > >     (match_operand:<ssescalarmode> 1 "register_operand")
> > > > -   (match_operand 2 "const_int_operand")]
> > > > +   (match_operand 2 "vec_setm_operand")]
> > > >
> > > > You need to specify a mode, otherwise a register of any mode can pass here.
> > > >
> > > Yes, theoretically, we only accept integer types. But in can_vec_set_var_idx_p
> > > cut
> > > ---
> > > bool
> > > can_vec_set_var_idx_p (machine_mode vec_mode)
> > > {
> > >   if (!VECTOR_MODE_P (vec_mode))
> > >     return false;
> > >
> > >   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> > >   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
> > >   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> > >   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > >
> > >   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
> > >
> > >   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
> > >          && insn_operand_matches (icode, 1, reg2)
> > >          && insn_operand_matches (icode, 2, reg3);
> > > }
> > > ---
> > >
> > > reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
> > > fail insn_operand_matches (icode, 2, reg3)
> > > ---
> > > (gdb) p insn_operand_matches(icode,2,reg3)
> > > $5 = false
> > > (gdb)
> > > ---
> > >
> > > Maybe we need to change
> > >
> > > rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > >
> > > to
> > >
> > > rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
> > >
> > > cc Richard Biener, any thoughts?
> >
> > There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
> > specify SImode for operand 2 in vec_setM pattern and allow register
> > operands. I wonder if and how they manage to generate the pattern.
> >
> > Uros.
>
> Variable index vec_set is enabled by r11-3486, about two months ago in
> [1]. But for the upper two targets, the codes are already there since
> GCC10(maybe earlier, i just looked at gcc10 branch), I don't think
> those codes are for [1].
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html
>
>
> --
> BR,
> Hongtao

Correct [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html

-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-12  9:15       ` Hongtao Liu
@ 2020-11-12  9:25         ` Hongtao Liu
  2020-11-12 13:59           ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Hongtao Liu @ 2020-11-12  9:25 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Richard Biener

On Thu, Nov 12, 2020 at 5:15 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Nov 12, 2020 at 5:12 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Nov 12, 2020 at 4:21 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Thu, Nov 12, 2020 at 3:04 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > PR target/97194
> > > > > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > > > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > > > true for const_int_operand or register_operand under TARGET_AVX2.
> > > > > > * config/i386/sse.md (vec_set<mode>): Support both constant
> > > > > > and variable index vec_set.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > > >
> > > > > +;; True for registers, or const_int_operand, used to vec_setm expander.
> > > > > +(define_predicate "vec_setm_operand"
> > > > > +  (ior (and (match_operand 0 "register_operand")
> > > > > +    (match_test "TARGET_AVX2"))
> > > > > +       (match_code "const_int")))
> > > > > +
> > > > >  ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
> > > > >  (define_predicate "reg_or_pm1_operand"
> > > > >    (ior (match_operand 0 "register_operand")
> > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > index b153a87fb98..1798e5dea75 100644
> > > > > --- a/gcc/config/i386/sse.md
> > > > > +++ b/gcc/config/i386/sse.md
> > > > > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> > > > >  (define_expand "vec_set<mode>"
> > > > >    [(match_operand:V 0 "register_operand")
> > > > >     (match_operand:<ssescalarmode> 1 "register_operand")
> > > > > -   (match_operand 2 "const_int_operand")]
> > > > > +   (match_operand 2 "vec_setm_operand")]
> > > > >
> > > > > You need to specify a mode, otherwise a register of any mode can pass here.
> > > > >
> > > > Yes, theoretically, we only accept integer types. But in can_vec_set_var_idx_p
> > > > cut
> > > > ---
> > > > bool
> > > > can_vec_set_var_idx_p (machine_mode vec_mode)
> > > > {
> > > >   if (!VECTOR_MODE_P (vec_mode))
> > > >     return false;
> > > >
> > > >   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> > > >   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
> > > >   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> > > >   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > > >
> > > >   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
> > > >
> > > >   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
> > > >          && insn_operand_matches (icode, 1, reg2)
> > > >          && insn_operand_matches (icode, 2, reg3);
> > > > }
> > > > ---
> > > >
> > > > reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
> > > > fail insn_operand_matches (icode, 2, reg3)
> > > > ---
> > > > (gdb) p insn_operand_matches(icode,2,reg3)
> > > > $5 = false
> > > > (gdb)
> > > > ---
> > > >
> > > > Maybe we need to change
> > > >
> > > > rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > > >
> > > > to
> > > >
> > > > rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
> > > >
> > > > cc Richard Biener, any thoughts?
> > >
> > > There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
> > > specify SImode for operand 2 in vec_setM pattern and allow register
> > > operands. I wonder if and how they manage to generate the pattern.
> > >
> > > Uros.
> >
> > Variable index vec_set is enabled by r11-3486, about two months ago in
> > [1]. But for the upper two targets, the codes are already there since
> > GCC10(maybe earlier, i just looked at gcc10 branch), I don't think
> > those codes are for [1].
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html
> >
> >
> > --
> > BR,
> > Hongtao
>
> Correct [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html
>
> --
> BR,
> Hongtao

in https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554592.html

It says

> >> +can_vec_set_var_idx_p (enum tree_code code, machine_mode vec_mode,
> >> +                      machine_mode value_mode, machine_mode idx_mode)
> >
> > toplevel comment missing
> >
> >> +{
> >> +  gcc_assert (code == VECTOR_TYPE);
> >
> > what's the point of pasing 'code' here then?  Since the optab only has a single
> > mode, the vector mode, the value_mode is redundant as well.  And I guess
> > we might want to handle "arbitrary" index modes?  That is, the .md expanders
> > should not restrict its mode - I guess it simply uses VOIDmode at the moment
> > (for integer constants).  Not sure how to best do this without an explicit mode
> > in the optab ...
>
> Yes, removed 'code' and value_mode by checking VECTOR_MODE_P and use GET_MODE_INNER
> for value_mode.  ".md expanders" shall support for integer constants index mode, but
> I guess they shouldn't be expanded by IFN as this function is for variable index
> insert only?  Anyway, the v3 patch used VOIDmode check...


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-12  9:25         ` Hongtao Liu
@ 2020-11-12 13:59           ` Richard Biener
  2020-11-12 17:51             ` Uros Bizjak
  2020-11-16 11:57             ` Uros Bizjak
  0 siblings, 2 replies; 23+ messages in thread
From: Richard Biener @ 2020-11-12 13:59 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Uros Bizjak, gcc-patches

On Thu, Nov 12, 2020 at 10:23 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Nov 12, 2020 at 5:15 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Thu, Nov 12, 2020 at 5:12 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Thu, Nov 12, 2020 at 4:21 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Thu, Nov 12, 2020 at 3:04 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > PR target/97194
> > > > > > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > > > > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > > > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > > > > true for const_int_operand or register_operand under TARGET_AVX2.
> > > > > > > * config/i386/sse.md (vec_set<mode>): Support both constant
> > > > > > > and variable index vec_set.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > > > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > > > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > > > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > > > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > > > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > > > >
> > > > > > +;; True for registers, or const_int_operand, used to vec_setm expander.
> > > > > > +(define_predicate "vec_setm_operand"
> > > > > > +  (ior (and (match_operand 0 "register_operand")
> > > > > > +    (match_test "TARGET_AVX2"))
> > > > > > +       (match_code "const_int")))
> > > > > > +
> > > > > >  ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
> > > > > >  (define_predicate "reg_or_pm1_operand"
> > > > > >    (ior (match_operand 0 "register_operand")
> > > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > > index b153a87fb98..1798e5dea75 100644
> > > > > > --- a/gcc/config/i386/sse.md
> > > > > > +++ b/gcc/config/i386/sse.md
> > > > > > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> > > > > >  (define_expand "vec_set<mode>"
> > > > > >    [(match_operand:V 0 "register_operand")
> > > > > >     (match_operand:<ssescalarmode> 1 "register_operand")
> > > > > > -   (match_operand 2 "const_int_operand")]
> > > > > > +   (match_operand 2 "vec_setm_operand")]
> > > > > >
> > > > > > You need to specify a mode, otherwise a register of any mode can pass here.
> > > > > >
> > > > > Yes, theoretically, we only accept integer types. But in can_vec_set_var_idx_p
> > > > > cut
> > > > > ---
> > > > > bool
> > > > > can_vec_set_var_idx_p (machine_mode vec_mode)
> > > > > {
> > > > >   if (!VECTOR_MODE_P (vec_mode))
> > > > >     return false;
> > > > >
> > > > >   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> > > > >   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
> > > > >   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> > > > >   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > > > >
> > > > >   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
> > > > >
> > > > >   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
> > > > >          && insn_operand_matches (icode, 1, reg2)
> > > > >          && insn_operand_matches (icode, 2, reg3);
> > > > > }
> > > > > ---
> > > > >
> > > > > reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
> > > > > fail insn_operand_matches (icode, 2, reg3)
> > > > > ---
> > > > > (gdb) p insn_operand_matches(icode,2,reg3)
> > > > > $5 = false
> > > > > (gdb)
> > > > > ---
> > > > >
> > > > > Maybe we need to change
> > > > >
> > > > > rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > > > >
> > > > > to
> > > > >
> > > > > rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
> > > > >
> > > > > cc Richard Biener, any thoughts?
> > > >
> > > > There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
> > > > specify SImode for operand 2 in vec_setM pattern and allow register
> > > > operands. I wonder if and how they manage to generate the pattern.
> > > >
> > > > Uros.
> > >
> > > Variable index vec_set is enabled by r11-3486, about two months ago in
> > > [1]. But for the upper two targets, the codes are already there since
> > > GCC10(maybe earlier, i just looked at gcc10 branch), I don't think
> > > those codes are for [1].
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> > Correct [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html
> >
> > --
> > BR,
> > Hongtao
>
> in https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554592.html
>
> It says
>
> > >> +can_vec_set_var_idx_p (enum tree_code code, machine_mode vec_mode,
> > >> +                      machine_mode value_mode, machine_mode idx_mode)
> > >
> > > toplevel comment missing
> > >
> > >> +{
> > >> +  gcc_assert (code == VECTOR_TYPE);
> > >
> > > what's the point of pasing 'code' here then?  Since the optab only has a single
> > > mode, the vector mode, the value_mode is redundant as well.  And I guess
> > > we might want to handle "arbitrary" index modes?  That is, the .md expanders
> > > should not restrict its mode - I guess it simply uses VOIDmode at the moment
> > > (for integer constants).  Not sure how to best do this without an explicit mode
> > > in the optab ...
> >
> > Yes, removed 'code' and value_mode by checking VECTOR_MODE_P and use GET_MODE_INNER
> > for value_mode.  ".md expanders" shall support for integer constants index mode, but
> > I guess they shouldn't be expanded by IFN as this function is for variable index
> > insert only?  Anyway, the v3 patch used VOIDmode check...

I'm not sure what best to do here, as said accepting "any" (integer) mode as
input is desirable (SImode, DImode but eventually also smaller modes).  How
that can be best achieved I don't know.

Why's not specifying any mode in the patter no good?  Just make sure you
appropriately extend/subreg it?  We can make sure it will be an integer
mode in the expander itself.

Richard.

>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-12 13:59           ` Richard Biener
@ 2020-11-12 17:51             ` Uros Bizjak
  2020-11-12 18:26               ` Uros Bizjak
  2020-11-16 11:57             ` Uros Bizjak
  1 sibling, 1 reply; 23+ messages in thread
From: Uros Bizjak @ 2020-11-12 17:51 UTC (permalink / raw)
  To: gcc-patches; +Cc: Hongtao Liu, Richard Biener, Richard Sandiford

On Thu, Nov 12, 2020 at 2:59 PM Richard Biener
<richard.guenther@gmail.com> wrote:
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > PR target/97194
> > > > > > > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > > > > > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > > > > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > > > > > true for const_int_operand or register_operand under TARGET_AVX2.
> > > > > > > > * config/i386/sse.md (vec_set<mode>): Support both constant
> > > > > > > > and variable index vec_set.
> > > > > > > >
> > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > >
> > > > > > > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > > > > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > > > > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > > > > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > > > > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > > > > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > > > > >
> > > > > > > +;; True for registers, or const_int_operand, used to vec_setm expander.
> > > > > > > +(define_predicate "vec_setm_operand"
> > > > > > > +  (ior (and (match_operand 0 "register_operand")
> > > > > > > +    (match_test "TARGET_AVX2"))
> > > > > > > +       (match_code "const_int")))
> > > > > > > +
> > > > > > >  ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
> > > > > > >  (define_predicate "reg_or_pm1_operand"
> > > > > > >    (ior (match_operand 0 "register_operand")
> > > > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > > > index b153a87fb98..1798e5dea75 100644
> > > > > > > --- a/gcc/config/i386/sse.md
> > > > > > > +++ b/gcc/config/i386/sse.md
> > > > > > > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> > > > > > >  (define_expand "vec_set<mode>"
> > > > > > >    [(match_operand:V 0 "register_operand")
> > > > > > >     (match_operand:<ssescalarmode> 1 "register_operand")
> > > > > > > -   (match_operand 2 "const_int_operand")]
> > > > > > > +   (match_operand 2 "vec_setm_operand")]
> > > > > > >
> > > > > > > You need to specify a mode, otherwise a register of any mode can pass here.
> > > > > > >
> > > > > > Yes, theoretically, we only accept integer types. But in can_vec_set_var_idx_p
> > > > > > cut
> > > > > > ---
> > > > > > bool
> > > > > > can_vec_set_var_idx_p (machine_mode vec_mode)
> > > > > > {
> > > > > >   if (!VECTOR_MODE_P (vec_mode))
> > > > > >     return false;
> > > > > >
> > > > > >   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> > > > > >   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
> > > > > >   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> > > > > >   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > > > > >
> > > > > >   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
> > > > > >
> > > > > >   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
> > > > > >          && insn_operand_matches (icode, 1, reg2)
> > > > > >          && insn_operand_matches (icode, 2, reg3);
> > > > > > }
> > > > > > ---
> > > > > >
> > > > > > reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
> > > > > > fail insn_operand_matches (icode, 2, reg3)
> > > > > > ---
> > > > > > (gdb) p insn_operand_matches(icode,2,reg3)
> > > > > > $5 = false
> > > > > > (gdb)
> > > > > > ---
> > > > > >
> > > > > > Maybe we need to change
> > > > > >
> > > > > > rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > > > > >
> > > > > > to
> > > > > >
> > > > > > rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
> > > > > >
> > > > > > cc Richard Biener, any thoughts?
> > > > >
> > > > > There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
> > > > > specify SImode for operand 2 in vec_setM pattern and allow register
> > > > > operands. I wonder if and how they manage to generate the pattern.
> > > > >
> > > > > Uros.
> > > >
> > > > Variable index vec_set is enabled by r11-3486, about two months ago in
> > > > [1]. But for the upper two targets, the codes are already there since
> > > > GCC10(maybe earlier, i just looked at gcc10 branch), I don't think
> > > > those codes are for [1].
> > > >
> > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > > Correct [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> > in https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554592.html
> >
> > It says
> >
> > > >> +can_vec_set_var_idx_p (enum tree_code code, machine_mode vec_mode,
> > > >> +                      machine_mode value_mode, machine_mode idx_mode)
> > > >
> > > > toplevel comment missing
> > > >
> > > >> +{
> > > >> +  gcc_assert (code == VECTOR_TYPE);
> > > >
> > > > what's the point of pasing 'code' here then?  Since the optab only has a single
> > > > mode, the vector mode, the value_mode is redundant as well.  And I guess
> > > > we might want to handle "arbitrary" index modes?  That is, the .md expanders
> > > > should not restrict its mode - I guess it simply uses VOIDmode at the moment
> > > > (for integer constants).  Not sure how to best do this without an explicit mode
> > > > in the optab ...
> > >
> > > Yes, removed 'code' and value_mode by checking VECTOR_MODE_P and use GET_MODE_INNER
> > > for value_mode.  ".md expanders" shall support for integer constants index mode, but
> > > I guess they shouldn't be expanded by IFN as this function is for variable index
> > > insert only?  Anyway, the v3 patch used VOIDmode check...
>
> I'm not sure what best to do here, as said accepting "any" (integer) mode as
> input is desirable (SImode, DImode but eventually also smaller modes).  How
> that can be best achieved I don't know.

I was expecting something similar to how extvM/extzvM operands are
handled here. We have:

    Operands 0 and 1 both have mode M.  Operands 2 and 3 have a
    target-specific mode.

Please note operands 2 and 3 having a "target-specific" mode, handled
in optabs-query.c as:

  machine_mode struct_mode = data->operand[struct_op].mode;
  if (struct_mode == VOIDmode)
    struct_mode = word_mode;
  if (mode != struct_mode)
    return false;

> Why's not specifying any mode in the patter no good?  Just make sure you
> appropriately extend/subreg it?  We can make sure it will be an integer
> mode in the expander itself.

IIRC, having known mode, expanders can use create_convert_operand_to,
and the middle-end will do the above by itself. Also note that at
least two targets specify SImode, so register operands are currently
ineffective there.

Adding RichardS to CC for the expand part.

Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-12 17:51             ` Uros Bizjak
@ 2020-11-12 18:26               ` Uros Bizjak
  2020-11-12 18:34                 ` Uros Bizjak
  0 siblings, 1 reply; 23+ messages in thread
From: Uros Bizjak @ 2020-11-12 18:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Hongtao Liu, Richard Biener, Richard Sandiford

On Thu, Nov 12, 2020 at 6:51 PM Uros Bizjak <ubizjak@gmail.com> wrote:

> > > > Yes, removed 'code' and value_mode by checking VECTOR_MODE_P and use GET_MODE_INNER
> > > > for value_mode.  ".md expanders" shall support for integer constants index mode, but
> > > > I guess they shouldn't be expanded by IFN as this function is for variable index
> > > > insert only?  Anyway, the v3 patch used VOIDmode check...
> >
> > I'm not sure what best to do here, as said accepting "any" (integer) mode as
> > input is desirable (SImode, DImode but eventually also smaller modes).  How
> > that can be best achieved I don't know.
>
> I was expecting something similar to how extvM/extzvM operands are
> handled here. We have:
>
>     Operands 0 and 1 both have mode M.  Operands 2 and 3 have a
>     target-specific mode.
>
> Please note operands 2 and 3 having a "target-specific" mode, handled
> in optabs-query.c as:
>
>   machine_mode struct_mode = data->operand[struct_op].mode;
>   if (struct_mode == VOIDmode)
>     struct_mode = word_mode;
>   if (mode != struct_mode)
>     return false;
>
> > Why's not specifying any mode in the patter no good?  Just make sure you
> > appropriately extend/subreg it?  We can make sure it will be an integer
> > mode in the expander itself.
>
> IIRC, having known mode, expanders can use create_convert_operand_to,
> and the middle-end will do the above by itself. Also note that at
> least two targets specify SImode, so register operands are currently
> ineffective there.

On a related note, the pattern is currently expanded as (see
store_bit_field_1 in expmed.c):

      create_fixed_operand (&ops[0], op0);
      create_input_operand (&ops[1], value, innermode);
      create_integer_operand (&ops[2], pos);

I don't think calling create_integer_operand on register operand is
correct. The function comment says:

/* Make OP describe an input operand that has value INTVAL and that has
   no inherent mode.  This function should only be used for operands that
   are always expand-time constants.  The backend may request that INTVAL
   be copied into a different kind of rtx, but it must specify the mode
   of that rtx if so.  */

Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-12 18:26               ` Uros Bizjak
@ 2020-11-12 18:34                 ` Uros Bizjak
  0 siblings, 0 replies; 23+ messages in thread
From: Uros Bizjak @ 2020-11-12 18:34 UTC (permalink / raw)
  To: gcc-patches; +Cc: Hongtao Liu, Richard Biener, Richard Sandiford

On Thu, Nov 12, 2020 at 7:26 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Nov 12, 2020 at 6:51 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > > > > Yes, removed 'code' and value_mode by checking VECTOR_MODE_P and use GET_MODE_INNER
> > > > > for value_mode.  ".md expanders" shall support for integer constants index mode, but
> > > > > I guess they shouldn't be expanded by IFN as this function is for variable index
> > > > > insert only?  Anyway, the v3 patch used VOIDmode check...
> > >
> > > I'm not sure what best to do here, as said accepting "any" (integer) mode as
> > > input is desirable (SImode, DImode but eventually also smaller modes).  How
> > > that can be best achieved I don't know.
> >
> > I was expecting something similar to how extvM/extzvM operands are
> > handled here. We have:
> >
> >     Operands 0 and 1 both have mode M.  Operands 2 and 3 have a
> >     target-specific mode.
> >
> > Please note operands 2 and 3 having a "target-specific" mode, handled
> > in optabs-query.c as:
> >
> >   machine_mode struct_mode = data->operand[struct_op].mode;
> >   if (struct_mode == VOIDmode)
> >     struct_mode = word_mode;
> >   if (mode != struct_mode)
> >     return false;
> >
> > > Why's not specifying any mode in the patter no good?  Just make sure you
> > > appropriately extend/subreg it?  We can make sure it will be an integer
> > > mode in the expander itself.
> >
> > IIRC, having known mode, expanders can use create_convert_operand_to,
> > and the middle-end will do the above by itself. Also note that at
> > least two targets specify SImode, so register operands are currently
> > ineffective there.
>
> On a related note, the pattern is currently expanded as (see
> store_bit_field_1 in expmed.c):
>
>       create_fixed_operand (&ops[0], op0);
>       create_input_operand (&ops[1], value, innermode);
>       create_integer_operand (&ops[2], pos);
>
> I don't think calling create_integer_operand on register operand is
> correct. The function comment says:
>
> /* Make OP describe an input operand that has value INTVAL and that has
>    no inherent mode.  This function should only be used for operands that
>    are always expand-time constants.  The backend may request that INTVAL
>    be copied into a different kind of rtx, but it must specify the mode
>    of that rtx if so.  */

Ah, sorry - variable vec_set takes a different path, please disregard
my last message.

Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-12  9:12     ` Hongtao Liu
  2020-11-12  9:15       ` Hongtao Liu
@ 2020-11-16 10:16       ` Uros Bizjak
  1 sibling, 0 replies; 23+ messages in thread
From: Uros Bizjak @ 2020-11-16 10:16 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: gcc-patches, Richard Biener

On Thu, Nov 12, 2020 at 10:10 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Nov 12, 2020 at 4:21 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Nov 12, 2020 at 3:04 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR target/97194
> > > > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > > true for const_int_operand or register_operand under TARGET_AVX2.
> > > > > * config/i386/sse.md (vec_set<mode>): Support both constant
> > > > > and variable index vec_set.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > >
> > > > +;; True for registers, or const_int_operand, used to vec_setm expander.
> > > > +(define_predicate "vec_setm_operand"
> > > > +  (ior (and (match_operand 0 "register_operand")
> > > > +    (match_test "TARGET_AVX2"))
> > > > +       (match_code "const_int")))
> > > > +
> > > >  ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
> > > >  (define_predicate "reg_or_pm1_operand"
> > > >    (ior (match_operand 0 "register_operand")
> > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > index b153a87fb98..1798e5dea75 100644
> > > > --- a/gcc/config/i386/sse.md
> > > > +++ b/gcc/config/i386/sse.md
> > > > @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
> > > >  (define_expand "vec_set<mode>"
> > > >    [(match_operand:V 0 "register_operand")
> > > >     (match_operand:<ssescalarmode> 1 "register_operand")
> > > > -   (match_operand 2 "const_int_operand")]
> > > > +   (match_operand 2 "vec_setm_operand")]
> > > >
> > > > You need to specify a mode, otherwise a register of any mode can pass here.
> > > >
> > > Yes, theoretically, we only accept integer types. But in can_vec_set_var_idx_p
> > > cut
> > > ---
> > > bool
> > > can_vec_set_var_idx_p (machine_mode vec_mode)
> > > {
> > >   if (!VECTOR_MODE_P (vec_mode))
> > >     return false;
> > >
> > >   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> > >   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
> > >   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> > >   rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > >
> > >   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
> > >
> > >   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
> > >          && insn_operand_matches (icode, 1, reg2)
> > >          && insn_operand_matches (icode, 2, reg3);
> > > }
> > > ---
> > >
> > > reg3 is assumed to be VOIDmode, set anymode in match_operand 2 will
> > > fail insn_operand_matches (icode, 2, reg3)
> > > ---
> > > (gdb) p insn_operand_matches(icode,2,reg3)
> > > $5 = false
> > > (gdb)
> > > ---
> > >
> > > Maybe we need to change
> > >
> > > rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
> > >
> > > to
> > >
> > > rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);
> > >
> > > cc Richard Biener, any thoughts?
> >
> > There are two targets (gcn in gcn-valu.md and s390 in vector.md) that
> > specify SImode for operand 2 in vec_setM pattern and allow register
> > operands. I wonder if and how they manage to generate the pattern.
> >
> > Uros.
>
> Variable index vec_set is enabled by r11-3486, about two months ago in
> [1]. But for the upper two targets, the codes are already there since
> GCC10(maybe earlier, i just looked at gcc10 branch), I don't think
> those codes are for [1].

OK, let's proceed with the modeless operand.

The patch is OK for mainline.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-12 13:59           ` Richard Biener
  2020-11-12 17:51             ` Uros Bizjak
@ 2020-11-16 11:57             ` Uros Bizjak
  2020-11-19 10:54               ` Richard Sandiford
  1 sibling, 1 reply; 23+ messages in thread
From: Uros Bizjak @ 2020-11-16 11:57 UTC (permalink / raw)
  To: gcc-patches; +Cc: Hongtao Liu, Richard Biener, krebbel, ams

[-- Attachment #1: Type: text/plain, Size: 442 bytes --]

On Thu, Nov 12, 2020 at 2:59 PM Richard Biener
<richard.guenther@gmail.com> wrote:

> I'm not sure what best to do here, as said accepting "any" (integer) mode as
> input is desirable (SImode, DImode but eventually also smaller modes).  How
> that can be best achieved I don't know.

FTR, attached patch *should* allow s390 and amdgcn to emit vec_set
with SImode variable index operand, but I was not able to test the
patch by myself.

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 886 bytes --]

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 1820b91877a..02ba599c373 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -3863,12 +3863,17 @@ can_vec_set_var_idx_p (machine_mode vec_mode)
     return false;
 
   machine_mode inner_mode = GET_MODE_INNER (vec_mode);
+
   rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
   rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
-  rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
 
   enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
 
+  const struct insn_data_d *data = &insn_data[icode];
+  machine_mode idx_mode = data->operand[2].mode;
+
+  rtx reg3 = alloca_raw_REG (idx_mode, LAST_VIRTUAL_REGISTER + 3);
+
   return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
 	 && insn_operand_matches (icode, 1, reg2)
 	 && insn_operand_matches (icode, 2, reg3);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-16 11:57             ` Uros Bizjak
@ 2020-11-19 10:54               ` Richard Sandiford
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Sandiford @ 2020-11-19 10:54 UTC (permalink / raw)
  To: Uros Bizjak via Gcc-patches; +Cc: Uros Bizjak, ams

Sorry for the late reply.  I somehow managed to miss this thread until now
despite being cc:ed.

> > I'm not sure what best to do here, as said accepting "any" (integer) mode as
> > input is desirable (SImode, DImode but eventually also smaller modes).  How
> > that can be best achieved I don't know.
>
> I was expecting something similar to how extvM/extzvM operands are
> handled here. We have:
>
>     Operands 0 and 1 both have mode M.  Operands 2 and 3 have a
>     target-specific mode.
>
> Please note operands 2 and 3 having a "target-specific" mode, handled
> in optabs-query.c as:

>   machine_mode struct_mode = data->operand[struct_op].mode;
>   if (struct_mode == VOIDmode)
>     struct_mode = word_mode;
>   if (mode != struct_mode)
>     return false;
>
> > Why's not specifying any mode in the patter no good?  Just make sure you
> > appropriately extend/subreg it?  We can make sure it will be an integer
> > mode in the expander itself.
>
> IIRC, having known mode, expanders can use create_convert_operand_to,
> and the middle-end will do the above by itself. Also note that at
> least two targets specify SImode, so register operands are currently
> ineffective there.

Yeah, I agree create_convert_operand_to is the right way to handle
this kind of situation.

Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Thu, Nov 12, 2020 at 2:59 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
>
>> I'm not sure what best to do here, as said accepting "any" (integer) mode as
>> input is desirable (SImode, DImode but eventually also smaller modes).  How
>> that can be best achieved I don't know.
>
> FTR, attached patch *should* allow s390 and amdgcn to emit vec_set
> with SImode variable index operand, but I was not able to test the
> patch by myself.

LGTM FWIW.  I think this is preferable to a modeless operand,
which IMO should always be a last resort.

Thanks,
Richard

>
> Uros.
>
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 1820b91877a..02ba599c373 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -3863,12 +3863,17 @@ can_vec_set_var_idx_p (machine_mode vec_mode)
>      return false;
>  
>    machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> +
>    rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
>    rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> -  rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
>  
>    enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
>  
> +  const struct insn_data_d *data = &insn_data[icode];
> +  machine_mode idx_mode = data->operand[2].mode;
> +
> +  rtx reg3 = alloca_raw_REG (idx_mode, LAST_VIRTUAL_REGISTER + 3);
> +
>    return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
>  	 && insn_operand_matches (icode, 1, reg2)
>  	 && insn_operand_matches (icode, 2, reg3);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-25 19:15               ` Jeff Law
@ 2020-11-26  1:45                 ` Hongtao Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Hongtao Liu @ 2020-11-26  1:45 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches, Kirill Yukhin

Thanks for the review.
BTW, the patch is already installed because uros helped to review this
patch in another thread
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558682.html

On Thu, Nov 26, 2020 at 3:15 AM Jeff Law <law@redhat.com> wrote:
>
>
>
> On 11/11/20 1:03 AM, Hongtao Liu via Gcc-patches wrote:
>
> >
> >
> >
> > vec_set_rebaserebase_onr11-4901.patch
> >
> > From c9d684c37b5f79f68f938f39eeb9e7989b10302d Mon Sep 17 00:00:00 2001
> > From: liuhongt <hongtao.liu@intel.com>
> > Date: Mon, 19 Oct 2020 16:04:39 +0800
> > Subject: [PATCH] Support variable index vec_set.
> >
> > gcc/ChangeLog:
> >
> >       PR target/97194
> >       * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> >       * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> >       * config/i386/predicates.md (vec_setm_operand): New predicate,
> >       true for const_int_operand or register_operand under TARGET_AVX2.
> >       * config/i386/sse.md (vec_set<mode>): Support both constant
> >       and variable index vec_set.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/i386/avx2-vec-set-1.c: New test.
> >       * gcc.target/i386/avx2-vec-set-2.c: New test.
> >       * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> >       * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> >       * gcc.target/i386/avx512f-vec-set-2.c: New test.
> >       * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> This is OK.  Sorry for the delays.
>
> jeff
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-11-11  8:03             ` Hongtao Liu
@ 2020-11-25 19:15               ` Jeff Law
  2020-11-26  1:45                 ` Hongtao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff Law @ 2020-11-25 19:15 UTC (permalink / raw)
  To: Hongtao Liu, GCC Patches, Kirill Yukhin



On 11/11/20 1:03 AM, Hongtao Liu via Gcc-patches wrote:

>
>
>
> vec_set_rebaserebase_onr11-4901.patch
>
> From c9d684c37b5f79f68f938f39eeb9e7989b10302d Mon Sep 17 00:00:00 2001
> From: liuhongt <hongtao.liu@intel.com>
> Date: Mon, 19 Oct 2020 16:04:39 +0800
> Subject: [PATCH] Support variable index vec_set.
>
> gcc/ChangeLog:
>
> 	PR target/97194
> 	* config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> 	* config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> 	* config/i386/predicates.md (vec_setm_operand): New predicate,
> 	true for const_int_operand or register_operand under TARGET_AVX2.
> 	* config/i386/sse.md (vec_set<mode>): Support both constant
> 	and variable index vec_set.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/i386/avx2-vec-set-1.c: New test.
> 	* gcc.target/i386/avx2-vec-set-2.c: New test.
> 	* gcc.target/i386/avx512bw-vec-set-1.c: New test.
> 	* gcc.target/i386/avx512bw-vec-set-2.c: New test.
> 	* gcc.target/i386/avx512f-vec-set-2.c: New test.
> 	* gcc.target/i386/avx512vl-vec-set-2.c: New test.
This is OK.  Sorry for the delays.

jeff


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-10-27  7:51           ` Hongtao Liu
@ 2020-11-11  8:03             ` Hongtao Liu
  2020-11-25 19:15               ` Jeff Law
  0 siblings, 1 reply; 23+ messages in thread
From: Hongtao Liu @ 2020-11-11  8:03 UTC (permalink / raw)
  To: GCC Patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 5888 bytes --]

ping ^3

Rebase patch on latest trunk.

On Tue, Oct 27, 2020 at 3:51 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> ping^1
>
> On Tue, Oct 20, 2020 at 3:36 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Oct 20, 2020 at 4:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Mon, Oct 19, 2020 at 5:55 PM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Mon, Oct 19, 2020 at 11:37 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > >
> > > > > On Mon, Oct 19, 2020 at 5:07 PM Richard Biener
> > > > > <richard.guenther@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Oct 19, 2020 at 10:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi:
> > > > > > >   It's implemented as below:
> > > > > > > V setg (V v, int idx, T val)
> > > > > > >
> > > > > > > {
> > > > > > >   V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
> > > > > > >   V valv = (V){val, val, val, val, val, val, val, val};
> > > > > > >   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
> > > > > > >   v = (v & ~mask) | (valv & mask);
> > > > > > >   return v;
> > > > > > > }
> > > > > > >
> > > > > > > Bootstrap is fine, regression test for i386/x86-64 backend is ok.
> > > > > > > Ok for trunk?
> > > > > >
> > > > > > Hmm, I guess you're trying to keep the code for !AVX512BW simple
> > > > > > but isn't just splitting the compare into
> > > > > >
> > > > > >  clow = {0, 1, 2, 3 ... } == idxv
> > > > > >  chigh = {16, 17, 18, ... } == idxv;
> > > > > >  cmp = {clow, chigh}
> > > > > >
> > > > >
> > > > > We also don't have 512-bits byte/word blend instructions without
> > > > > TARGET_AVX512W, so how to use 512-bits cmp?
> > > >
> > > > Oh, I see.  Guess two back-to-back vpternlog could emulate
> > >
> > > Yes, we can have something like vpternlogd %zmm0, %zmm1, %zmm2, 0xD8,
> > > but since we don't have 512-bits bytes/word broadcast instruction,
> > > It would need 2 broadcast and 1 vec_concat to get 1 512-bits vector.
> > > it wouldn't save many instructions compared to my version(as below).
> > >
> > > ---
> > >         leal    -16(%rsi), %eax
> > >         vmovd   %edi, %xmm2
> > >         vmovdqa .LC0(%rip), %ymm4
> > >         vextracti64x4   $0x1, %zmm0, %ymm3
> > >         vmovd   %eax, %xmm1
> > >         vpbroadcastw    %xmm2, %ymm2
> > >         vpbroadcastw    %xmm1, %ymm1
> > >         vpcmpeqw        %ymm4, %ymm1, %ymm1
> > >         vpblendvb       %ymm1, %ymm2, %ymm3, %ymm3
> > >         vmovd   %esi, %xmm1
> > >         vpbroadcastw    %xmm1, %ymm1
> > >         vpcmpeqw        %ymm4, %ymm1, %ymm1
> > >         vpblendvb       %ymm1, %ymm2, %ymm0, %ymm0
> > >         vinserti64x4    $0x1, %ymm3, %zmm0, %zmm0
> > > ---
> > >
> > > > the blend?  Not sure if important - I recall only knl didn't have bw?
> > > >
> > >
> > > Yes, after(including) SKX, all avx512 targets will support AVX512BW.
> > > And i don't think performance for V32HI/V64QI without AVX512BW is important.
> >
> > True.
> >
> > I have no further comments on the patch then - it still needs i386 maintainer
> > approval though.
> >
> > Thanks,
> > Richard.
> >
> > >
> > > > > cut from i386-expand.c:
> > > > > in ix86_expand_sse_movcc
> > > > >  3682    case E_V64QImode:
> > > > >  3683      gen = gen_avx512bw_blendmv64qi; ---> TARGET_AVX512BW needed
> > > > >  3684      break;
> > > > >  3685    case E_V32HImode:
> > > > >  3686      gen = gen_avx512bw_blendmv32hi; --> TARGET_AVX512BW needed
> > > > >  3687      break;
> > > > >  3688    case E_V16SImode:
> > > > >  3689      gen = gen_avx512f_blendmv16si;
> > > > >  3690      break;
> > > > >  3691    case E_V8DImode:
> > > > >  3692      gen = gen_avx512f_blendmv8di;
> > > > >  3693      break;
> > > > >  3694    case E_V8DFmode:
> > > > >
> > > > > > faster, smaller and eventually even easier during expansion?
> > > > > >
> > > > > > +  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, valv, val));
> > > > > > +  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode,
> > > > > > idxv, idx_tmp));
> > > > > >
> > > > > > side-effects in gcc_assert is considered bad style, use
> > > > > >
> > > > > >   ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
> > > > > >   gcc_assert (ok);
> > > > > >
> > > > > > +  vec[5] = constv;
> > > > > > +  ix86_expand_int_vcond (vec);
> > > > > >
> > > > > > this also returns a bool you probably should assert true.
> > > > > >
> > > > >
> > > > > Yes, will change.
> > > > >
> > > > > > Otherwise thanks for tackling this.
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > >         PR target/97194
> > > > > > >         * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > > > > >         * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > > > >         * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > > > >         true for const_int_operand or register_operand under TARGET_AVX2.
> > > > > > >         * config/i386/sse.md (vec_set<mode>): Support both constant
> > > > > > >         and variable index vec_set.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > >         * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > > > >         * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > > > >         * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > > > >         * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > > > >         * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > > > >         * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > > > > >
> > > > > > > --
> > > > > > > BR,
> > > > > > > Hongtao
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > BR,
> > > > > Hongtao
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

[-- Attachment #2: vec_set_rebaserebase_onr11-4901.patch --]
[-- Type: text/x-patch, Size: 15186 bytes --]

From c9d684c37b5f79f68f938f39eeb9e7989b10302d Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Mon, 19 Oct 2020 16:04:39 +0800
Subject: [PATCH] Support variable index vec_set.

gcc/ChangeLog:

	PR target/97194
	* config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
	* config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
	* config/i386/predicates.md (vec_setm_operand): New predicate,
	true for const_int_operand or register_operand under TARGET_AVX2.
	* config/i386/sse.md (vec_set<mode>): Support both constant
	and variable index vec_set.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx2-vec-set-1.c: New test.
	* gcc.target/i386/avx2-vec-set-2.c: New test.
	* gcc.target/i386/avx512bw-vec-set-1.c: New test.
	* gcc.target/i386/avx512bw-vec-set-2.c: New test.
	* gcc.target/i386/avx512f-vec-set-2.c: New test.
	* gcc.target/i386/avx512vl-vec-set-2.c: New test.
---
 gcc/config/i386/i386-expand.c                 | 106 ++++++++++++++++++
 gcc/config/i386/i386-protos.h                 |   1 +
 gcc/config/i386/predicates.md                 |   6 +
 gcc/config/i386/sse.md                        |   9 +-
 .../gcc.target/i386/avx2-vec-set-1.c          |  49 ++++++++
 .../gcc.target/i386/avx2-vec-set-2.c          |  50 +++++++++
 .../gcc.target/i386/avx512bw-vec-set-1.c      |  20 ++++
 .../gcc.target/i386/avx512bw-vec-set-2.c      |  44 ++++++++
 .../gcc.target/i386/avx512f-vec-set-2.c       |  42 +++++++
 .../gcc.target/i386/avx512vl-vec-set-2.c      |  55 +++++++++
 10 files changed, 379 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 6f81b58a08e..af65a4dde68 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -14540,6 +14540,112 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx vals)
   ix86_expand_vector_init_general (mmx_ok, mode, target, vals);
 }
 
+/* Implemented as
+   V setg (V v, int idx, T val)
+   {
+     V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
+     V valv = (V){val, val, val, val, val, val, val, val};
+     V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
+     v = (v & ~mask) | (valv & mask);
+     return v;
+   }.  */
+void
+ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
+{
+  rtx vec[64];
+  machine_mode mode = GET_MODE (target);
+  machine_mode cmp_mode = mode;
+  int n_elts = GET_MODE_NUNITS (mode);
+  rtx valv,idxv,constv,idx_tmp;
+  bool ok = false;
+
+  /* 512-bits vector byte/word broadcast and comparison only available
+     under TARGET_AVX512BW, break 512-bits vector into two 256-bits vector
+     when without TARGET_AVX512BW.  */
+  if ((mode == V32HImode || mode == V64QImode) && !TARGET_AVX512BW)
+    {
+      gcc_assert (TARGET_AVX512F);
+      rtx vhi, vlo, idx_hi;
+      machine_mode half_mode;
+      rtx (*extract_hi)(rtx, rtx);
+      rtx (*extract_lo)(rtx, rtx);
+
+      if (mode == V32HImode)
+	{
+	  half_mode = V16HImode;
+	  extract_hi = gen_vec_extract_hi_v32hi;
+	  extract_lo = gen_vec_extract_lo_v32hi;
+	}
+      else
+	{
+	  half_mode = V32QImode;
+	  extract_hi = gen_vec_extract_hi_v64qi;
+	  extract_lo = gen_vec_extract_lo_v64qi;
+	}
+
+      vhi = gen_reg_rtx (half_mode);
+      vlo = gen_reg_rtx (half_mode);
+      idx_hi = gen_reg_rtx (GET_MODE (idx));
+      emit_insn (extract_hi (vhi, target));
+      emit_insn (extract_lo (vlo, target));
+      vec[0] = idx_hi;
+      vec[1] = idx;
+      vec[2] = GEN_INT (n_elts/2);
+      ix86_expand_binary_operator (MINUS, GET_MODE (idx), vec);
+      ix86_expand_vector_set_var (vhi, val, idx_hi);
+      ix86_expand_vector_set_var (vlo, val, idx);
+      emit_insn (gen_rtx_SET (target, gen_rtx_VEC_CONCAT (mode, vlo, vhi)));
+      return;
+    }
+
+  if (FLOAT_MODE_P (GET_MODE_INNER (mode)))
+    {
+      switch (mode)
+	{
+	case E_V2DFmode:
+	  cmp_mode = V2DImode;
+	  break;
+	case E_V4DFmode:
+	  cmp_mode = V4DImode;
+	  break;
+	case E_V8DFmode:
+	  cmp_mode = V8DImode;
+	  break;
+	case E_V4SFmode:
+	  cmp_mode = V4SImode;
+	  break;
+	case E_V8SFmode:
+	  cmp_mode = V8SImode;
+	  break;
+	case E_V16SFmode:
+	  cmp_mode = V16SImode;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+
+  for (int i = 0; i != n_elts; i++)
+    vec[i] = GEN_INT (i);
+  constv = gen_rtx_CONST_VECTOR (cmp_mode, gen_rtvec_v (n_elts, vec));
+  valv = gen_reg_rtx (mode);
+  idxv = gen_reg_rtx (cmp_mode);
+  idx_tmp = convert_to_mode (GET_MODE_INNER (cmp_mode), idx, 1);
+
+  ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
+  gcc_assert (ok);
+  ok = ix86_expand_vector_init_duplicate (false, cmp_mode, idxv, idx_tmp);
+  gcc_assert (ok);
+  vec[0] = target;
+  vec[1] = valv;
+  vec[2] = target;
+  vec[3] = gen_rtx_EQ (mode, idxv, constv);
+  vec[4] = idxv;
+  vec[5] = constv;
+  ok = ix86_expand_int_vcond (vec);
+  gcc_assert (ok);
+}
+
 void
 ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
 {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index b70d598ce20..d18b3410881 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -244,6 +244,7 @@ extern rtx ix86_rewrite_tls_address (rtx);
 
 extern void ix86_expand_vector_init (bool, rtx, rtx);
 extern void ix86_expand_vector_set (bool, rtx, rtx, int);
+extern void ix86_expand_vector_set_var (rtx, rtx, rtx);
 extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 36f9dfcc586..be5aaa4d76f 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1023,6 +1023,12 @@ (define_predicate "incdec_operand"
   return op == const1_rtx || op == constm1_rtx;
 })
 
+;; True for registers, or const_int_operand, used to vec_setm expander.
+(define_predicate "vec_setm_operand"
+  (ior (and (match_operand 0 "register_operand")
+	    (match_test "TARGET_AVX2"))
+       (match_code "const_int")))
+
 ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
 (define_predicate "reg_or_pm1_operand"
   (ior (match_operand 0 "register_operand")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b153a87fb98..1798e5dea75 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
 (define_expand "vec_set<mode>"
   [(match_operand:V 0 "register_operand")
    (match_operand:<ssescalarmode> 1 "register_operand")
-   (match_operand 2 "const_int_operand")]
+   (match_operand 2 "vec_setm_operand")]
   "TARGET_SSE"
 {
-  ix86_expand_vector_set (false, operands[0], operands[1],
-			  INTVAL (operands[2]));
+  if (CONST_INT_P (operands[2]))
+    ix86_expand_vector_set (false, operands[0], operands[1],
+			    INTVAL (operands[2]));
+  else
+    ix86_expand_vector_set_var (operands[0], operands[1], operands[2]);
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c b/gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c
new file mode 100644
index 00000000000..4c16ec5dfc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2 -mno-avx512f" } */
+/* { dg-final { scan-assembler-times {(?n)vpcmpeq[bwdq]} 12 } } */
+/* { dg-final { scan-assembler-times {(?n)vp?blendv} 12 } } */
+
+typedef char v32qi __attribute__ ((vector_size (32)));
+typedef char v16qi __attribute__ ((vector_size (16)));
+
+typedef short v16hi __attribute__ ((vector_size (32)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+
+typedef int v8si __attribute__ ((vector_size (32)));
+typedef int v4si __attribute__ ((vector_size (16)));
+
+typedef long long v4di __attribute__ ((vector_size (32)));
+typedef long long v2di __attribute__ ((vector_size (16)));
+
+typedef float v8sf __attribute__ ((vector_size (32)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+
+typedef double v4df __attribute__ ((vector_size (32)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+#define FOO(VTYPE, TYPE)			\
+  VTYPE						\
+  __attribute__ ((noipa))			\
+  foo_##VTYPE (VTYPE a, TYPE b, unsigned int c)	\
+  {						\
+    a[c] = b;					\
+    return a;					\
+  }						\
+
+FOO (v16qi, char);
+FOO (v32qi, char);
+
+FOO (v8hi, short);
+FOO (v16hi, short);
+
+FOO (v4si, int);
+FOO (v8si, int);
+
+FOO (v2di, long long);
+FOO (v4di, long long);
+
+FOO (v4sf, float);
+FOO (v8sf, float);
+
+FOO (v2df, double);
+FOO (v4df, double);
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c
new file mode 100644
index 00000000000..9086ef406f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx2 } */
+/* { dg-options "-O2 -mavx2" } */
+
+
+#ifndef CHECK
+#define CHECK "avx2-check.h"
+#endif
+
+#ifndef TEST
+#define TEST avx2_test
+#endif
+
+#include CHECK
+
+#include "avx2-vec-set-1.c"
+
+#define CALC_TEST(vtype, type, N, idx)				\
+do								\
+  {								\
+    int i,val = idx * idx - idx * 3 + 16;			\
+    type res[N],exp[N];						\
+    vtype resv;							\
+    for (i = 0; i < N; i++)					\
+      {								\
+	res[i] = i * i - i * 3 + 15;				\
+	exp[i] = res[i];					\
+      }								\
+    exp[idx] = val;						\
+    resv = foo_##vtype (*(vtype *)&res[0], val, idx);		\
+    for (i = 0; i < N; i++)					\
+      {								\
+	if (resv[i] != exp[i])					\
+	  abort ();						\
+      }								\
+  }								\
+while (0)
+
+static void
+TEST (void)
+{
+  CALC_TEST (v32qi, char, 32, 17);
+  CALC_TEST (v16qi, char, 16, 5);
+  CALC_TEST (v16hi, short, 16, 9);
+  CALC_TEST (v8hi, short, 8, 6);
+  CALC_TEST (v8si, int, 8, 3);
+  CALC_TEST (v4si, int, 4, 2);
+  CALC_TEST (v4di, long long, 4, 1);
+  CALC_TEST (v2di, long long, 2, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c
new file mode 100644
index 00000000000..5cfbc85732e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times {(?n)(?:vp?broadcast|vmovddup)} 36 } } */
+/* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]+\$0} 18 } } */
+
+typedef char v64qi __attribute__ ((vector_size (64)));
+typedef short v32hi __attribute__ ((vector_size (64)));
+typedef int v16si __attribute__ ((vector_size (64)));
+typedef long long v8di __attribute__ ((vector_size (64)));
+typedef float v16sf __attribute__ ((vector_size (64)));
+typedef double v8df __attribute__ ((vector_size (64)));
+
+#include "avx2-vec-set-1.c"
+
+FOO (v64qi, char);
+FOO (v32hi, short);
+FOO (v16si, int);
+FOO (v8di, long long);
+FOO (v16sf, float);
+FOO (v8df, double);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c
new file mode 100644
index 00000000000..22e64183ebd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512bw } */
+/* { dg-options "-O2 -mavx512bw" } */
+
+
+#ifndef CHECK
+#define CHECK "avx512f-check.h"
+#endif
+
+#define AVX512BW
+
+#include CHECK
+
+#include "avx512bw-vec-set-1.c"
+
+#define CALC_TEST(vtype, type, N, idx)				\
+do								\
+  {								\
+    int i,val = idx * idx - idx * 3 + 16;			\
+    type res[N],exp[N];						\
+    vtype resv;							\
+    for (i = 0; i < N; i++)					\
+      {								\
+	res[i] = i * i - i * 3 + 15;				\
+	exp[i] = res[i];					\
+      }								\
+    exp[idx] = val;						\
+    resv = foo_##vtype (*(vtype *)&res[0], val, idx);		\
+    for (i = 0; i < N; i++)					\
+      {								\
+	if (resv[i] != exp[i])					\
+	  abort ();						\
+      }								\
+  }								\
+while (0)
+
+static void
+test_512 (void)
+{
+  CALC_TEST (v64qi, char, 64, 50);
+  CALC_TEST (v32hi, short, 32, 30);
+  CALC_TEST (v16si, int, 16, 15);
+  CALC_TEST (v8di, long long, 8, 7);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c
new file mode 100644
index 00000000000..8f2aa03ec11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O2 -mavx512f -mno-avx512bw" } */
+
+
+#ifndef CHECK
+#define CHECK "avx512f-check.h"
+#endif
+
+#define AVX512F
+
+#include CHECK
+
+#include "avx512bw-vec-set-1.c"
+
+#define CALC_TEST(vtype, type, N, idx)				\
+do								\
+  {								\
+    int i,val = idx * idx - idx * 3 + 16;			\
+    type res[N],exp[N];						\
+    vtype resv;							\
+    for (i = 0; i < N; i++)					\
+      {								\
+	res[i] = i * i - i * 3 + 15;				\
+	exp[i] = res[i];					\
+      }								\
+    exp[idx] = val;						\
+    resv = foo_##vtype (*(vtype *)&res[0], val, idx);		\
+    for (i = 0; i < N; i++)					\
+      {								\
+	if (resv[i] != exp[i])					\
+	  abort ();						\
+      }								\
+  }								\
+while (0)
+
+static void
+test_512 (void)
+{
+  CALC_TEST (v64qi, char, 64, 50);
+  CALC_TEST (v32hi, short, 32, 30);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c
new file mode 100644
index 00000000000..4f327427a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512bw } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-options "-O2 -mavx512bw -mavx512vl" } */
+
+
+#ifndef CHECK
+#define CHECK "avx512f-check.h"
+#endif
+
+#define AVX512VL
+#define AVX512BW
+
+#include CHECK
+
+#include "avx512bw-vec-set-1.c"
+
+#define CALC_TEST(vtype, type, N, idx)				\
+do								\
+  {								\
+    int i,val = idx * idx - idx * 3 + 16;			\
+    type res[N],exp[N];						\
+    vtype resv;							\
+    for (i = 0; i < N; i++)					\
+      {								\
+	res[i] = i * i - i * 3 + 15;				\
+	exp[i] = res[i];					\
+      }								\
+    exp[idx] = val;						\
+    resv = foo_##vtype (*(vtype *)&res[0], val, idx);		\
+    for (i = 0; i < N; i++)					\
+      {								\
+	if (resv[i] != exp[i])					\
+	  abort ();						\
+      }								\
+  }								\
+while (0)
+
+static void
+test_256 (void)
+{
+  CALC_TEST (v32qi, char, 32, 17);
+  CALC_TEST (v16hi, short, 16, 9);
+  CALC_TEST (v8si, int, 8, 3);
+  CALC_TEST (v4di, long long, 4, 1);
+}
+
+static void
+test_128 (void)
+{
+  CALC_TEST (v16qi, char, 16, 5);
+  CALC_TEST (v8hi, short, 8, 6);
+  CALC_TEST (v4si, int, 4, 2);
+  CALC_TEST (v2di, long long, 2, 0);
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-10-20  7:36         ` Richard Biener
@ 2020-10-27  7:51           ` Hongtao Liu
  2020-11-11  8:03             ` Hongtao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Hongtao Liu @ 2020-10-27  7:51 UTC (permalink / raw)
  To: GCC Patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 5438 bytes --]

ping^1

On Tue, Oct 20, 2020 at 3:36 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Oct 20, 2020 at 4:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Oct 19, 2020 at 5:55 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Oct 19, 2020 at 11:37 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Mon, Oct 19, 2020 at 5:07 PM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Mon, Oct 19, 2020 at 10:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > > >
> > > > > > Hi:
> > > > > >   It's implemented as below:
> > > > > > V setg (V v, int idx, T val)
> > > > > >
> > > > > > {
> > > > > >   V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
> > > > > >   V valv = (V){val, val, val, val, val, val, val, val};
> > > > > >   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
> > > > > >   v = (v & ~mask) | (valv & mask);
> > > > > >   return v;
> > > > > > }
> > > > > >
> > > > > > Bootstrap is fine, regression test for i386/x86-64 backend is ok.
> > > > > > Ok for trunk?
> > > > >
> > > > > Hmm, I guess you're trying to keep the code for !AVX512BW simple
> > > > > but isn't just splitting the compare into
> > > > >
> > > > >  clow = {0, 1, 2, 3 ... } == idxv
> > > > >  chigh = {16, 17, 18, ... } == idxv;
> > > > >  cmp = {clow, chigh}
> > > > >
> > > >
> > > > We also don't have 512-bits byte/word blend instructions without
> > > > TARGET_AVX512W, so how to use 512-bits cmp?
> > >
> > > Oh, I see.  Guess two back-to-back vpternlog could emulate
> >
> > Yes, we can have something like vpternlogd %zmm0, %zmm1, %zmm2, 0xD8,
> > but since we don't have 512-bits bytes/word broadcast instruction,
> > It would need 2 broadcast and 1 vec_concat to get 1 512-bits vector.
> > it wouldn't save many instructions compared to my version(as below).
> >
> > ---
> >         leal    -16(%rsi), %eax
> >         vmovd   %edi, %xmm2
> >         vmovdqa .LC0(%rip), %ymm4
> >         vextracti64x4   $0x1, %zmm0, %ymm3
> >         vmovd   %eax, %xmm1
> >         vpbroadcastw    %xmm2, %ymm2
> >         vpbroadcastw    %xmm1, %ymm1
> >         vpcmpeqw        %ymm4, %ymm1, %ymm1
> >         vpblendvb       %ymm1, %ymm2, %ymm3, %ymm3
> >         vmovd   %esi, %xmm1
> >         vpbroadcastw    %xmm1, %ymm1
> >         vpcmpeqw        %ymm4, %ymm1, %ymm1
> >         vpblendvb       %ymm1, %ymm2, %ymm0, %ymm0
> >         vinserti64x4    $0x1, %ymm3, %zmm0, %zmm0
> > ---
> >
> > > the blend?  Not sure if important - I recall only knl didn't have bw?
> > >
> >
> > Yes, after(including) SKX, all avx512 targets will support AVX512BW.
> > And i don't think performance for V32HI/V64QI without AVX512BW is important.
>
> True.
>
> I have no further comments on the patch then - it still needs i386 maintainer
> approval though.
>
> Thanks,
> Richard.
>
> >
> > > > cut from i386-expand.c:
> > > > in ix86_expand_sse_movcc
> > > >  3682    case E_V64QImode:
> > > >  3683      gen = gen_avx512bw_blendmv64qi; ---> TARGET_AVX512BW needed
> > > >  3684      break;
> > > >  3685    case E_V32HImode:
> > > >  3686      gen = gen_avx512bw_blendmv32hi; --> TARGET_AVX512BW needed
> > > >  3687      break;
> > > >  3688    case E_V16SImode:
> > > >  3689      gen = gen_avx512f_blendmv16si;
> > > >  3690      break;
> > > >  3691    case E_V8DImode:
> > > >  3692      gen = gen_avx512f_blendmv8di;
> > > >  3693      break;
> > > >  3694    case E_V8DFmode:
> > > >
> > > > > faster, smaller and eventually even easier during expansion?
> > > > >
> > > > > +  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, valv, val));
> > > > > +  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode,
> > > > > idxv, idx_tmp));
> > > > >
> > > > > side-effects in gcc_assert is considered bad style, use
> > > > >
> > > > >   ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
> > > > >   gcc_assert (ok);
> > > > >
> > > > > +  vec[5] = constv;
> > > > > +  ix86_expand_int_vcond (vec);
> > > > >
> > > > > this also returns a bool you probably should assert true.
> > > > >
> > > >
> > > > Yes, will change.
> > > >
> > > > > Otherwise thanks for tackling this.
> > > > >
> > > > > Richard.
> > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         PR target/97194
> > > > > >         * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > > > >         * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > > >         * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > > >         true for const_int_operand or register_operand under TARGET_AVX2.
> > > > > >         * config/i386/sse.md (vec_set<mode>): Support both constant
> > > > > >         and variable index vec_set.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >         * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > > >         * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > > >         * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > > >         * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > > >         * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > > >         * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > > > >
> > > > > > --
> > > > > > BR,
> > > > > > Hongtao
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao

[-- Attachment #2: 0001-Support-variable-index-vec_set-V2.patch --]
[-- Type: text/x-patch, Size: 15186 bytes --]

From 3b24e66d861c38292da43a55f5b2a51d0f89d043 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Mon, 19 Oct 2020 16:04:39 +0800
Subject: [PATCH] Support variable index vec_set.

gcc/ChangeLog:

	PR target/97194
	* config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
	* config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
	* config/i386/predicates.md (vec_setm_operand): New predicate,
	true for const_int_operand or register_operand under TARGET_AVX2.
	* config/i386/sse.md (vec_set<mode>): Support both constant
	and variable index vec_set.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx2-vec-set-1.c: New test.
	* gcc.target/i386/avx2-vec-set-2.c: New test.
	* gcc.target/i386/avx512bw-vec-set-1.c: New test.
	* gcc.target/i386/avx512bw-vec-set-2.c: New test.
	* gcc.target/i386/avx512f-vec-set-2.c: New test.
	* gcc.target/i386/avx512vl-vec-set-2.c: New test.
---
 gcc/config/i386/i386-expand.c                 | 106 ++++++++++++++++++
 gcc/config/i386/i386-protos.h                 |   1 +
 gcc/config/i386/predicates.md                 |   6 +
 gcc/config/i386/sse.md                        |   9 +-
 .../gcc.target/i386/avx2-vec-set-1.c          |  49 ++++++++
 .../gcc.target/i386/avx2-vec-set-2.c          |  50 +++++++++
 .../gcc.target/i386/avx512bw-vec-set-1.c      |  20 ++++
 .../gcc.target/i386/avx512bw-vec-set-2.c      |  44 ++++++++
 .../gcc.target/i386/avx512f-vec-set-2.c       |  42 +++++++
 .../gcc.target/i386/avx512vl-vec-set-2.c      |  55 +++++++++
 10 files changed, 379 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 3e8afe683dc..fe449ac855e 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -14235,6 +14235,112 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx vals)
   ix86_expand_vector_init_general (mmx_ok, mode, target, vals);
 }
 
+/* Implemented as
+   V setg (V v, int idx, T val)
+   {
+     V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
+     V valv = (V){val, val, val, val, val, val, val, val};
+     V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
+     v = (v & ~mask) | (valv & mask);
+     return v;
+   }.  */
+void
+ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
+{
+  rtx vec[64];
+  machine_mode mode = GET_MODE (target);
+  machine_mode cmp_mode = mode;
+  int n_elts = GET_MODE_NUNITS (mode);
+  rtx valv,idxv,constv,idx_tmp;
+  bool ok = false;
+
+  /* 512-bits vector byte/word broadcast and comparison only available
+     under TARGET_AVX512BW, break 512-bits vector into two 256-bits vector
+     when without TARGET_AVX512BW.  */
+  if ((mode == V32HImode || mode == V64QImode) && !TARGET_AVX512BW)
+    {
+      gcc_assert (TARGET_AVX512F);
+      rtx vhi, vlo, idx_hi;
+      machine_mode half_mode;
+      rtx (*extract_hi)(rtx, rtx);
+      rtx (*extract_lo)(rtx, rtx);
+
+      if (mode == V32HImode)
+	{
+	  half_mode = V16HImode;
+	  extract_hi = gen_vec_extract_hi_v32hi;
+	  extract_lo = gen_vec_extract_lo_v32hi;
+	}
+      else
+	{
+	  half_mode = V32QImode;
+	  extract_hi = gen_vec_extract_hi_v64qi;
+	  extract_lo = gen_vec_extract_lo_v64qi;
+	}
+
+      vhi = gen_reg_rtx (half_mode);
+      vlo = gen_reg_rtx (half_mode);
+      idx_hi = gen_reg_rtx (GET_MODE (idx));
+      emit_insn (extract_hi (vhi, target));
+      emit_insn (extract_lo (vlo, target));
+      vec[0] = idx_hi;
+      vec[1] = idx;
+      vec[2] = GEN_INT (n_elts/2);
+      ix86_expand_binary_operator (MINUS, GET_MODE (idx), vec);
+      ix86_expand_vector_set_var (vhi, val, idx_hi);
+      ix86_expand_vector_set_var (vlo, val, idx);
+      emit_insn (gen_rtx_SET (target, gen_rtx_VEC_CONCAT (mode, vlo, vhi)));
+      return;
+    }
+
+  if (FLOAT_MODE_P (GET_MODE_INNER (mode)))
+    {
+      switch (mode)
+	{
+	case E_V2DFmode:
+	  cmp_mode = V2DImode;
+	  break;
+	case E_V4DFmode:
+	  cmp_mode = V4DImode;
+	  break;
+	case E_V8DFmode:
+	  cmp_mode = V8DImode;
+	  break;
+	case E_V4SFmode:
+	  cmp_mode = V4SImode;
+	  break;
+	case E_V8SFmode:
+	  cmp_mode = V8SImode;
+	  break;
+	case E_V16SFmode:
+	  cmp_mode = V16SImode;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+
+  for (int i = 0; i != n_elts; i++)
+    vec[i] = GEN_INT (i);
+  constv = gen_rtx_CONST_VECTOR (cmp_mode, gen_rtvec_v (n_elts, vec));
+  valv = gen_reg_rtx (mode);
+  idxv = gen_reg_rtx (cmp_mode);
+  idx_tmp = convert_to_mode (GET_MODE_INNER (cmp_mode), idx, 1);
+
+  ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
+  gcc_assert (ok);
+  ok = ix86_expand_vector_init_duplicate (false, cmp_mode, idxv, idx_tmp);
+  gcc_assert (ok);
+  vec[0] = target;
+  vec[1] = valv;
+  vec[2] = target;
+  vec[3] = gen_rtx_EQ (mode, idxv, constv);
+  vec[4] = idxv;
+  vec[5] = constv;
+  ok = ix86_expand_int_vcond (vec);
+  gcc_assert (ok);
+}
+
 void
 ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
 {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index c5b700efd0e..7a1dc3d4d64 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -242,6 +242,7 @@ extern rtx ix86_rewrite_tls_address (rtx);
 
 extern void ix86_expand_vector_init (bool, rtx, rtx);
 extern void ix86_expand_vector_set (bool, rtx, rtx, int);
+extern void ix86_expand_vector_set_var (rtx, rtx, rtx);
 extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index be57cdaf768..d2a2eece02b 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1023,6 +1023,12 @@ (define_predicate "incdec_operand"
   return op == const1_rtx || op == constm1_rtx;
 })
 
+;; True for registers, or const_int_operand, used to vec_setm expander.
+(define_predicate "vec_setm_operand"
+  (ior (and (match_operand 0 "register_operand")
+	    (match_test "TARGET_AVX2"))
+       (match_code "const_int")))
+
 ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
 (define_predicate "reg_or_pm1_operand"
   (ior (match_operand 0 "register_operand")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 52635f6bc08..35a64b7b333 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8085,11 +8085,14 @@ (define_insn "vec_setv2df_0"
 (define_expand "vec_set<mode>"
   [(match_operand:V 0 "register_operand")
    (match_operand:<ssescalarmode> 1 "register_operand")
-   (match_operand 2 "const_int_operand")]
+   (match_operand 2 "vec_setm_operand")]
   "TARGET_SSE"
 {
-  ix86_expand_vector_set (false, operands[0], operands[1],
-			  INTVAL (operands[2]));
+  if (CONST_INT_P (operands[2]))
+    ix86_expand_vector_set (false, operands[0], operands[1],
+			    INTVAL (operands[2]));
+  else
+    ix86_expand_vector_set_var (operands[0], operands[1], operands[2]);
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c b/gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c
new file mode 100644
index 00000000000..4c16ec5dfc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2 -mno-avx512f" } */
+/* { dg-final { scan-assembler-times {(?n)vpcmpeq[bwdq]} 12 } } */
+/* { dg-final { scan-assembler-times {(?n)vp?blendv} 12 } } */
+
+typedef char v32qi __attribute__ ((vector_size (32)));
+typedef char v16qi __attribute__ ((vector_size (16)));
+
+typedef short v16hi __attribute__ ((vector_size (32)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+
+typedef int v8si __attribute__ ((vector_size (32)));
+typedef int v4si __attribute__ ((vector_size (16)));
+
+typedef long long v4di __attribute__ ((vector_size (32)));
+typedef long long v2di __attribute__ ((vector_size (16)));
+
+typedef float v8sf __attribute__ ((vector_size (32)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+
+typedef double v4df __attribute__ ((vector_size (32)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+#define FOO(VTYPE, TYPE)			\
+  VTYPE						\
+  __attribute__ ((noipa))			\
+  foo_##VTYPE (VTYPE a, TYPE b, unsigned int c)	\
+  {						\
+    a[c] = b;					\
+    return a;					\
+  }						\
+
+FOO (v16qi, char);
+FOO (v32qi, char);
+
+FOO (v8hi, short);
+FOO (v16hi, short);
+
+FOO (v4si, int);
+FOO (v8si, int);
+
+FOO (v2di, long long);
+FOO (v4di, long long);
+
+FOO (v4sf, float);
+FOO (v8sf, float);
+
+FOO (v2df, double);
+FOO (v4df, double);
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c
new file mode 100644
index 00000000000..9086ef406f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx2 } */
+/* { dg-options "-O2 -mavx2" } */
+
+
+#ifndef CHECK
+#define CHECK "avx2-check.h"
+#endif
+
+#ifndef TEST
+#define TEST avx2_test
+#endif
+
+#include CHECK
+
+#include "avx2-vec-set-1.c"
+
+#define CALC_TEST(vtype, type, N, idx)				\
+do								\
+  {								\
+    int i,val = idx * idx - idx * 3 + 16;			\
+    type res[N],exp[N];						\
+    vtype resv;							\
+    for (i = 0; i < N; i++)					\
+      {								\
+	res[i] = i * i - i * 3 + 15;				\
+	exp[i] = res[i];					\
+      }								\
+    exp[idx] = val;						\
+    resv = foo_##vtype (*(vtype *)&res[0], val, idx);		\
+    for (i = 0; i < N; i++)					\
+      {								\
+	if (resv[i] != exp[i])					\
+	  abort ();						\
+      }								\
+  }								\
+while (0)
+
+static void
+TEST (void)
+{
+  CALC_TEST (v32qi, char, 32, 17);
+  CALC_TEST (v16qi, char, 16, 5);
+  CALC_TEST (v16hi, short, 16, 9);
+  CALC_TEST (v8hi, short, 8, 6);
+  CALC_TEST (v8si, int, 8, 3);
+  CALC_TEST (v4si, int, 4, 2);
+  CALC_TEST (v4di, long long, 4, 1);
+  CALC_TEST (v2di, long long, 2, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c
new file mode 100644
index 00000000000..5cfbc85732e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times {(?n)(?:vp?broadcast|vmovddup)} 36 } } */
+/* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]+\$0} 18 } } */
+
+typedef char v64qi __attribute__ ((vector_size (64)));
+typedef short v32hi __attribute__ ((vector_size (64)));
+typedef int v16si __attribute__ ((vector_size (64)));
+typedef long long v8di __attribute__ ((vector_size (64)));
+typedef float v16sf __attribute__ ((vector_size (64)));
+typedef double v8df __attribute__ ((vector_size (64)));
+
+#include "avx2-vec-set-1.c"
+
+FOO (v64qi, char);
+FOO (v32hi, short);
+FOO (v16si, int);
+FOO (v8di, long long);
+FOO (v16sf, float);
+FOO (v8df, double);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c
new file mode 100644
index 00000000000..22e64183ebd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512bw } */
+/* { dg-options "-O2 -mavx512bw" } */
+
+
+#ifndef CHECK
+#define CHECK "avx512f-check.h"
+#endif
+
+#define AVX512BW
+
+#include CHECK
+
+#include "avx512bw-vec-set-1.c"
+
+#define CALC_TEST(vtype, type, N, idx)				\
+do								\
+  {								\
+    int i,val = idx * idx - idx * 3 + 16;			\
+    type res[N],exp[N];						\
+    vtype resv;							\
+    for (i = 0; i < N; i++)					\
+      {								\
+	res[i] = i * i - i * 3 + 15;				\
+	exp[i] = res[i];					\
+      }								\
+    exp[idx] = val;						\
+    resv = foo_##vtype (*(vtype *)&res[0], val, idx);		\
+    for (i = 0; i < N; i++)					\
+      {								\
+	if (resv[i] != exp[i])					\
+	  abort ();						\
+      }								\
+  }								\
+while (0)
+
+static void
+test_512 (void)
+{
+  CALC_TEST (v64qi, char, 64, 50);
+  CALC_TEST (v32hi, short, 32, 30);
+  CALC_TEST (v16si, int, 16, 15);
+  CALC_TEST (v8di, long long, 8, 7);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c
new file mode 100644
index 00000000000..8f2aa03ec11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O2 -mavx512f -mno-avx512bw" } */
+
+
+#ifndef CHECK
+#define CHECK "avx512f-check.h"
+#endif
+
+#define AVX512F
+
+#include CHECK
+
+#include "avx512bw-vec-set-1.c"
+
+#define CALC_TEST(vtype, type, N, idx)				\
+do								\
+  {								\
+    int i,val = idx * idx - idx * 3 + 16;			\
+    type res[N],exp[N];						\
+    vtype resv;							\
+    for (i = 0; i < N; i++)					\
+      {								\
+	res[i] = i * i - i * 3 + 15;				\
+	exp[i] = res[i];					\
+      }								\
+    exp[idx] = val;						\
+    resv = foo_##vtype (*(vtype *)&res[0], val, idx);		\
+    for (i = 0; i < N; i++)					\
+      {								\
+	if (resv[i] != exp[i])					\
+	  abort ();						\
+      }								\
+  }								\
+while (0)
+
+static void
+test_512 (void)
+{
+  CALC_TEST (v64qi, char, 64, 50);
+  CALC_TEST (v32hi, short, 32, 30);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c
new file mode 100644
index 00000000000..4f327427a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512bw } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-options "-O2 -mavx512bw -mavx512vl" } */
+
+
+#ifndef CHECK
+#define CHECK "avx512f-check.h"
+#endif
+
+#define AVX512VL
+#define AVX512BW
+
+#include CHECK
+
+#include "avx512bw-vec-set-1.c"
+
+#define CALC_TEST(vtype, type, N, idx)				\
+do								\
+  {								\
+    int i,val = idx * idx - idx * 3 + 16;			\
+    type res[N],exp[N];						\
+    vtype resv;							\
+    for (i = 0; i < N; i++)					\
+      {								\
+	res[i] = i * i - i * 3 + 15;				\
+	exp[i] = res[i];					\
+      }								\
+    exp[idx] = val;						\
+    resv = foo_##vtype (*(vtype *)&res[0], val, idx);		\
+    for (i = 0; i < N; i++)					\
+      {								\
+	if (resv[i] != exp[i])					\
+	  abort ();						\
+      }								\
+  }								\
+while (0)
+
+static void
+test_256 (void)
+{
+  CALC_TEST (v32qi, char, 32, 17);
+  CALC_TEST (v16hi, short, 16, 9);
+  CALC_TEST (v8si, int, 8, 3);
+  CALC_TEST (v4di, long long, 4, 1);
+}
+
+static void
+test_128 (void)
+{
+  CALC_TEST (v16qi, char, 16, 5);
+  CALC_TEST (v8hi, short, 8, 6);
+  CALC_TEST (v4si, int, 4, 2);
+  CALC_TEST (v2di, long long, 2, 0);
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-10-20  2:37       ` Hongtao Liu
@ 2020-10-20  7:36         ` Richard Biener
  2020-10-27  7:51           ` Hongtao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2020-10-20  7:36 UTC (permalink / raw)
  To: Hongtao Liu, Kirill Yukhin; +Cc: GCC Patches, H. J. Lu

On Tue, Oct 20, 2020 at 4:35 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Oct 19, 2020 at 5:55 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Oct 19, 2020 at 11:37 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Mon, Oct 19, 2020 at 5:07 PM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Mon, Oct 19, 2020 at 10:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > >
> > > > > Hi:
> > > > >   It's implemented as below:
> > > > > V setg (V v, int idx, T val)
> > > > >
> > > > > {
> > > > >   V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
> > > > >   V valv = (V){val, val, val, val, val, val, val, val};
> > > > >   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
> > > > >   v = (v & ~mask) | (valv & mask);
> > > > >   return v;
> > > > > }
> > > > >
> > > > > Bootstrap is fine, regression test for i386/x86-64 backend is ok.
> > > > > Ok for trunk?
> > > >
> > > > Hmm, I guess you're trying to keep the code for !AVX512BW simple
> > > > but isn't just splitting the compare into
> > > >
> > > >  clow = {0, 1, 2, 3 ... } == idxv
> > > >  chigh = {16, 17, 18, ... } == idxv;
> > > >  cmp = {clow, chigh}
> > > >
> > >
> > > We also don't have 512-bits byte/word blend instructions without
> > > TARGET_AVX512W, so how to use 512-bits cmp?
> >
> > Oh, I see.  Guess two back-to-back vpternlog could emulate
>
> Yes, we can have something like vpternlogd %zmm0, %zmm1, %zmm2, 0xD8,
> but since we don't have 512-bits bytes/word broadcast instruction,
> It would need 2 broadcast and 1 vec_concat to get 1 512-bits vector.
> it wouldn't save many instructions compared to my version(as below).
>
> ---
>         leal    -16(%rsi), %eax
>         vmovd   %edi, %xmm2
>         vmovdqa .LC0(%rip), %ymm4
>         vextracti64x4   $0x1, %zmm0, %ymm3
>         vmovd   %eax, %xmm1
>         vpbroadcastw    %xmm2, %ymm2
>         vpbroadcastw    %xmm1, %ymm1
>         vpcmpeqw        %ymm4, %ymm1, %ymm1
>         vpblendvb       %ymm1, %ymm2, %ymm3, %ymm3
>         vmovd   %esi, %xmm1
>         vpbroadcastw    %xmm1, %ymm1
>         vpcmpeqw        %ymm4, %ymm1, %ymm1
>         vpblendvb       %ymm1, %ymm2, %ymm0, %ymm0
>         vinserti64x4    $0x1, %ymm3, %zmm0, %zmm0
> ---
>
> > the blend?  Not sure if important - I recall only knl didn't have bw?
> >
>
> Yes, after(including) SKX, all avx512 targets will support AVX512BW.
> And i don't think performance for V32HI/V64QI without AVX512BW is important.

True.

I have no further comments on the patch then - it still needs i386 maintainer
approval though.

Thanks,
Richard.

>
> > > cut from i386-expand.c:
> > > in ix86_expand_sse_movcc
> > >  3682    case E_V64QImode:
> > >  3683      gen = gen_avx512bw_blendmv64qi; ---> TARGET_AVX512BW needed
> > >  3684      break;
> > >  3685    case E_V32HImode:
> > >  3686      gen = gen_avx512bw_blendmv32hi; --> TARGET_AVX512BW needed
> > >  3687      break;
> > >  3688    case E_V16SImode:
> > >  3689      gen = gen_avx512f_blendmv16si;
> > >  3690      break;
> > >  3691    case E_V8DImode:
> > >  3692      gen = gen_avx512f_blendmv8di;
> > >  3693      break;
> > >  3694    case E_V8DFmode:
> > >
> > > > faster, smaller and eventually even easier during expansion?
> > > >
> > > > +  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, valv, val));
> > > > +  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode,
> > > > idxv, idx_tmp));
> > > >
> > > > side-effects in gcc_assert is considered bad style, use
> > > >
> > > >   ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
> > > >   gcc_assert (ok);
> > > >
> > > > +  vec[5] = constv;
> > > > +  ix86_expand_int_vcond (vec);
> > > >
> > > > this also returns a bool you probably should assert true.
> > > >
> > >
> > > Yes, will change.
> > >
> > > > Otherwise thanks for tackling this.
> > > >
> > > > Richard.
> > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         PR target/97194
> > > > >         * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > > >         * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > > >         * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > > >         true for const_int_operand or register_operand under TARGET_AVX2.
> > > > >         * config/i386/sse.md (vec_set<mode>): Support both constant
> > > > >         and variable index vec_set.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > > >         * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > > >         * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > > >         * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > > >         * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > > >         * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > > >
> > > > > --
> > > > > BR,
> > > > > Hongtao
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-10-19  9:55     ` Richard Biener
@ 2020-10-20  2:37       ` Hongtao Liu
  2020-10-20  7:36         ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Hongtao Liu @ 2020-10-20  2:37 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, H. J. Lu

On Mon, Oct 19, 2020 at 5:55 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Oct 19, 2020 at 11:37 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Oct 19, 2020 at 5:07 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Oct 19, 2020 at 10:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > Hi:
> > > >   It's implemented as below:
> > > > V setg (V v, int idx, T val)
> > > >
> > > > {
> > > >   V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
> > > >   V valv = (V){val, val, val, val, val, val, val, val};
> > > >   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
> > > >   v = (v & ~mask) | (valv & mask);
> > > >   return v;
> > > > }
> > > >
> > > > Bootstrap is fine, regression test for i386/x86-64 backend is ok.
> > > > Ok for trunk?
> > >
> > > Hmm, I guess you're trying to keep the code for !AVX512BW simple
> > > but isn't just splitting the compare into
> > >
> > >  clow = {0, 1, 2, 3 ... } == idxv
> > >  chigh = {16, 17, 18, ... } == idxv;
> > >  cmp = {clow, chigh}
> > >
> >
> > We also don't have 512-bits byte/word blend instructions without
> > TARGET_AVX512W, so how to use 512-bits cmp?
>
> Oh, I see.  Guess two back-to-back vpternlog could emulate

Yes, we can have something like vpternlogd %zmm0, %zmm1, %zmm2, 0xD8,
but since we don't have 512-bits bytes/word broadcast instruction,
It would need 2 broadcast and 1 vec_concat to get 1 512-bits vector.
it wouldn't save many instructions compared to my version(as below).

---
        leal    -16(%rsi), %eax
        vmovd   %edi, %xmm2
        vmovdqa .LC0(%rip), %ymm4
        vextracti64x4   $0x1, %zmm0, %ymm3
        vmovd   %eax, %xmm1
        vpbroadcastw    %xmm2, %ymm2
        vpbroadcastw    %xmm1, %ymm1
        vpcmpeqw        %ymm4, %ymm1, %ymm1
        vpblendvb       %ymm1, %ymm2, %ymm3, %ymm3
        vmovd   %esi, %xmm1
        vpbroadcastw    %xmm1, %ymm1
        vpcmpeqw        %ymm4, %ymm1, %ymm1
        vpblendvb       %ymm1, %ymm2, %ymm0, %ymm0
        vinserti64x4    $0x1, %ymm3, %zmm0, %zmm0
---

> the blend?  Not sure if important - I recall only knl didn't have bw?
>

Yes, after(including) SKX, all avx512 targets will support AVX512BW.
And i don't think performance for V32HI/V64QI without AVX512BW is important.


> > cut from i386-expand.c:
> > in ix86_expand_sse_movcc
> >  3682    case E_V64QImode:
> >  3683      gen = gen_avx512bw_blendmv64qi; ---> TARGET_AVX512BW needed
> >  3684      break;
> >  3685    case E_V32HImode:
> >  3686      gen = gen_avx512bw_blendmv32hi; --> TARGET_AVX512BW needed
> >  3687      break;
> >  3688    case E_V16SImode:
> >  3689      gen = gen_avx512f_blendmv16si;
> >  3690      break;
> >  3691    case E_V8DImode:
> >  3692      gen = gen_avx512f_blendmv8di;
> >  3693      break;
> >  3694    case E_V8DFmode:
> >
> > > faster, smaller and eventually even easier during expansion?
> > >
> > > +  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, valv, val));
> > > +  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode,
> > > idxv, idx_tmp));
> > >
> > > side-effects in gcc_assert is considered bad style, use
> > >
> > >   ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
> > >   gcc_assert (ok);
> > >
> > > +  vec[5] = constv;
> > > +  ix86_expand_int_vcond (vec);
> > >
> > > this also returns a bool you probably should assert true.
> > >
> >
> > Yes, will change.
> >
> > > Otherwise thanks for tackling this.
> > >
> > > Richard.
> > >
> > > > gcc/ChangeLog:
> > > >
> > > >         PR target/97194
> > > >         * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > > >         * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > > >         * config/i386/predicates.md (vec_setm_operand): New predicate,
> > > >         true for const_int_operand or register_operand under TARGET_AVX2.
> > > >         * config/i386/sse.md (vec_set<mode>): Support both constant
> > > >         and variable index vec_set.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         * gcc.target/i386/avx2-vec-set-1.c: New test.
> > > >         * gcc.target/i386/avx2-vec-set-2.c: New test.
> > > >         * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > > >         * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > > >         * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > > >         * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-10-19  9:39   ` Hongtao Liu
@ 2020-10-19  9:55     ` Richard Biener
  2020-10-20  2:37       ` Hongtao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2020-10-19  9:55 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: GCC Patches, H. J. Lu

On Mon, Oct 19, 2020 at 11:37 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Oct 19, 2020 at 5:07 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Oct 19, 2020 at 10:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > Hi:
> > >   It's implemented as below:
> > > V setg (V v, int idx, T val)
> > >
> > > {
> > >   V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
> > >   V valv = (V){val, val, val, val, val, val, val, val};
> > >   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
> > >   v = (v & ~mask) | (valv & mask);
> > >   return v;
> > > }
> > >
> > > Bootstrap is fine, regression test for i386/x86-64 backend is ok.
> > > Ok for trunk?
> >
> > Hmm, I guess you're trying to keep the code for !AVX512BW simple
> > but isn't just splitting the compare into
> >
> >  clow = {0, 1, 2, 3 ... } == idxv
> >  chigh = {16, 17, 18, ... } == idxv;
> >  cmp = {clow, chigh}
> >
>
> We also don't have 512-bits byte/word blend instructions without
> TARGET_AVX512W, so how to use 512-bits cmp?

Oh, I see.  Guess two back-to-back vpternlog could emulate
the blend?  Not sure if important - I recall only knl didn't have bw?

> cut from i386-expand.c:
> in ix86_expand_sse_movcc
>  3682    case E_V64QImode:
>  3683      gen = gen_avx512bw_blendmv64qi; ---> TARGET_AVX512BW needed
>  3684      break;
>  3685    case E_V32HImode:
>  3686      gen = gen_avx512bw_blendmv32hi; --> TARGET_AVX512BW needed
>  3687      break;
>  3688    case E_V16SImode:
>  3689      gen = gen_avx512f_blendmv16si;
>  3690      break;
>  3691    case E_V8DImode:
>  3692      gen = gen_avx512f_blendmv8di;
>  3693      break;
>  3694    case E_V8DFmode:
>
> > faster, smaller and eventually even easier during expansion?
> >
> > +  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, valv, val));
> > +  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode,
> > idxv, idx_tmp));
> >
> > side-effects in gcc_assert is considered bad style, use
> >
> >   ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
> >   gcc_assert (ok);
> >
> > +  vec[5] = constv;
> > +  ix86_expand_int_vcond (vec);
> >
> > this also returns a bool you probably should assert true.
> >
>
> Yes, will change.
>
> > Otherwise thanks for tackling this.
> >
> > Richard.
> >
> > > gcc/ChangeLog:
> > >
> > >         PR target/97194
> > >         * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > >         * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > >         * config/i386/predicates.md (vec_setm_operand): New predicate,
> > >         true for const_int_operand or register_operand under TARGET_AVX2.
> > >         * config/i386/sse.md (vec_set<mode>): Support both constant
> > >         and variable index vec_set.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.target/i386/avx2-vec-set-1.c: New test.
> > >         * gcc.target/i386/avx2-vec-set-2.c: New test.
> > >         * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > >         * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > >         * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > >         * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-10-19  9:07 ` Richard Biener
@ 2020-10-19  9:39   ` Hongtao Liu
  2020-10-19  9:55     ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Hongtao Liu @ 2020-10-19  9:39 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, H. J. Lu

On Mon, Oct 19, 2020 at 5:07 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Oct 19, 2020 at 10:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > Hi:
> >   It's implemented as below:
> > V setg (V v, int idx, T val)
> >
> > {
> >   V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
> >   V valv = (V){val, val, val, val, val, val, val, val};
> >   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
> >   v = (v & ~mask) | (valv & mask);
> >   return v;
> > }
> >
> > Bootstrap is fine, regression test for i386/x86-64 backend is ok.
> > Ok for trunk?
>
> Hmm, I guess you're trying to keep the code for !AVX512BW simple
> but isn't just splitting the compare into
>
>  clow = {0, 1, 2, 3 ... } == idxv
>  chigh = {16, 17, 18, ... } == idxv;
>  cmp = {clow, chigh}
>

We also don't have 512-bits byte/word blend instructions without
TARGET_AVX512W, so how to use 512-bits cmp?
cut from i386-expand.c:
in ix86_expand_sse_movcc
 3682    case E_V64QImode:
 3683      gen = gen_avx512bw_blendmv64qi; ---> TARGET_AVX512BW needed
 3684      break;
 3685    case E_V32HImode:
 3686      gen = gen_avx512bw_blendmv32hi; --> TARGET_AVX512BW needed
 3687      break;
 3688    case E_V16SImode:
 3689      gen = gen_avx512f_blendmv16si;
 3690      break;
 3691    case E_V8DImode:
 3692      gen = gen_avx512f_blendmv8di;
 3693      break;
 3694    case E_V8DFmode:

> faster, smaller and eventually even easier during expansion?
>
> +  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, valv, val));
> +  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode,
> idxv, idx_tmp));
>
> side-effects in gcc_assert is considered bad style, use
>
>   ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
>   gcc_assert (ok);
>
> +  vec[5] = constv;
> +  ix86_expand_int_vcond (vec);
>
> this also returns a bool you probably should assert true.
>

Yes, will change.

> Otherwise thanks for tackling this.
>
> Richard.
>
> > gcc/ChangeLog:
> >
> >         PR target/97194
> >         * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> >         * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> >         * config/i386/predicates.md (vec_setm_operand): New predicate,
> >         true for const_int_operand or register_operand under TARGET_AVX2.
> >         * config/i386/sse.md (vec_set<mode>): Support both constant
> >         and variable index vec_set.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/avx2-vec-set-1.c: New test.
> >         * gcc.target/i386/avx2-vec-set-2.c: New test.
> >         * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> >         * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> >         * gcc.target/i386/avx512f-vec-set-2.c: New test.
> >         * gcc.target/i386/avx512vl-vec-set-2.c: New test.
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
  2020-10-19  8:23 Hongtao Liu
@ 2020-10-19  9:07 ` Richard Biener
  2020-10-19  9:39   ` Hongtao Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2020-10-19  9:07 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: GCC Patches, H. J. Lu

On Mon, Oct 19, 2020 at 10:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi:
>   It's implemented as below:
> V setg (V v, int idx, T val)
>
> {
>   V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
>   V valv = (V){val, val, val, val, val, val, val, val};
>   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
>   v = (v & ~mask) | (valv & mask);
>   return v;
> }
>
> Bootstrap is fine, regression test for i386/x86-64 backend is ok.
> Ok for trunk?

Hmm, I guess you're trying to keep the code for !AVX512BW simple
but isn't just splitting the compare into

 clow = {0, 1, 2, 3 ... } == idxv
 chigh = {16, 17, 18, ... } == idxv;
 cmp = {clow, chigh}

faster, smaller and eventually even easier during expansion?

+  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, valv, val));
+  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode,
idxv, idx_tmp));

side-effects in gcc_assert is considered bad style, use

  ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
  gcc_assert (ok);

+  vec[5] = constv;
+  ix86_expand_int_vcond (vec);

this also returns a bool you probably should assert true.

Otherwise thanks for tackling this.

Richard.

> gcc/ChangeLog:
>
>         PR target/97194
>         * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
>         * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
>         * config/i386/predicates.md (vec_setm_operand): New predicate,
>         true for const_int_operand or register_operand under TARGET_AVX2.
>         * config/i386/sse.md (vec_set<mode>): Support both constant
>         and variable index vec_set.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx2-vec-set-1.c: New test.
>         * gcc.target/i386/avx2-vec-set-2.c: New test.
>         * gcc.target/i386/avx512bw-vec-set-1.c: New test.
>         * gcc.target/i386/avx512bw-vec-set-2.c: New test.
>         * gcc.target/i386/avx512f-vec-set-2.c: New test.
>         * gcc.target/i386/avx512vl-vec-set-2.c: New test.
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.
@ 2020-10-19  8:23 Hongtao Liu
  2020-10-19  9:07 ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Hongtao Liu @ 2020-10-19  8:23 UTC (permalink / raw)
  To: GCC Patches, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 1186 bytes --]

Hi:
  It's implemented as below:
V setg (V v, int idx, T val)

{
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  v = (v & ~mask) | (valv & mask);
  return v;
}

Bootstrap is fine, regression test for i386/x86-64 backend is ok.
Ok for trunk?

gcc/ChangeLog:

        PR target/97194
        * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
        * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
        * config/i386/predicates.md (vec_setm_operand): New predicate,
        true for const_int_operand or register_operand under TARGET_AVX2.
        * config/i386/sse.md (vec_set<mode>): Support both constant
        and variable index vec_set.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/avx2-vec-set-1.c: New test.
        * gcc.target/i386/avx2-vec-set-2.c: New test.
        * gcc.target/i386/avx512bw-vec-set-1.c: New test.
        * gcc.target/i386/avx512bw-vec-set-2.c: New test.
        * gcc.target/i386/avx512f-vec-set-2.c: New test.
        * gcc.target/i386/avx512vl-vec-set-2.c: New test.

-- 
BR,
Hongtao

[-- Attachment #2: 0001-Support-variable-index-vec_set.patch --]
[-- Type: text/x-patch, Size: 15121 bytes --]

From d00b6ada0fc420da7cf1e91ccbf12ee0a923d8a3 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Mon, 19 Oct 2020 16:04:39 +0800
Subject: [PATCH] Support variable index vec_set.

gcc/ChangeLog:

	PR target/97194
	* config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
	* config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
	* config/i386/predicates.md (vec_setm_operand): New predicate,
	true for const_int_operand or register_operand under TARGET_AVX2.
	* config/i386/sse.md (vec_set<mode>): Support both constant
	and variable index vec_set.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx2-vec-set-1.c: New test.
	* gcc.target/i386/avx2-vec-set-2.c: New test.
	* gcc.target/i386/avx512bw-vec-set-1.c: New test.
	* gcc.target/i386/avx512bw-vec-set-2.c: New test.
	* gcc.target/i386/avx512f-vec-set-2.c: New test.
	* gcc.target/i386/avx512vl-vec-set-2.c: New test.
---
 gcc/config/i386/i386-expand.c                 | 102 ++++++++++++++++++
 gcc/config/i386/i386-protos.h                 |   1 +
 gcc/config/i386/predicates.md                 |   6 ++
 gcc/config/i386/sse.md                        |   9 +-
 .../gcc.target/i386/avx2-vec-set-1.c          |  49 +++++++++
 .../gcc.target/i386/avx2-vec-set-2.c          |  50 +++++++++
 .../gcc.target/i386/avx512bw-vec-set-1.c      |  20 ++++
 .../gcc.target/i386/avx512bw-vec-set-2.c      |  44 ++++++++
 .../gcc.target/i386/avx512f-vec-set-2.c       |  42 ++++++++
 .../gcc.target/i386/avx512vl-vec-set-2.c      |  55 ++++++++++
 10 files changed, 375 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index e6f8b314f18..63b11e5f945 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -14201,6 +14201,108 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx vals)
   ix86_expand_vector_init_general (mmx_ok, mode, target, vals);
 }
 
+/* Implemented as
+   V setg (V v, int idx, T val)
+   {
+     V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
+     V valv = (V){val, val, val, val, val, val, val, val};
+     V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
+     v = (v & ~mask) | (valv & mask);
+     return v;
+   }.  */
+void
+ix86_expand_vector_set_var (rtx target, rtx val, rtx idx)
+{
+  rtx vec[64];
+  machine_mode mode = GET_MODE (target);
+  machine_mode cmp_mode = mode;
+  int n_elts = GET_MODE_NUNITS (mode);
+  rtx valv,idxv,constv,idx_tmp;
+
+  /* 512-bits vector byte/word broadcast and comparison only available
+     under TARGET_AVX512BW, break 512-bits vector into two 256-bits vector
+     when without TARGET_AVX512BW.  */
+  if ((mode == V32HImode || mode == V64QImode) && !TARGET_AVX512BW)
+    {
+      gcc_assert (TARGET_AVX512F);
+      rtx vhi, vlo, idx_hi;
+      machine_mode half_mode;
+      rtx (*extract_hi)(rtx, rtx);
+      rtx (*extract_lo)(rtx, rtx);
+
+      if (mode == V32HImode)
+	{
+	  half_mode = V16HImode;
+	  extract_hi = gen_vec_extract_hi_v32hi;
+	  extract_lo = gen_vec_extract_lo_v32hi;
+	}
+      else
+	{
+	  half_mode = V32QImode;
+	  extract_hi = gen_vec_extract_hi_v64qi;
+	  extract_lo = gen_vec_extract_lo_v64qi;
+	}
+
+      vhi = gen_reg_rtx (half_mode);
+      vlo = gen_reg_rtx (half_mode);
+      idx_hi = gen_reg_rtx (GET_MODE (idx));
+      emit_insn (extract_hi (vhi, target));
+      emit_insn (extract_lo (vlo, target));
+      vec[0] = idx_hi;
+      vec[1] = idx;
+      vec[2] = GEN_INT (n_elts/2);
+      ix86_expand_binary_operator (MINUS, GET_MODE (idx), vec);
+      ix86_expand_vector_set_var (vhi, val, idx_hi);
+      ix86_expand_vector_set_var (vlo, val, idx);
+      emit_insn (gen_rtx_SET (target, gen_rtx_VEC_CONCAT (mode, vlo, vhi)));
+      return;
+    }
+
+  if (FLOAT_MODE_P (GET_MODE_INNER (mode)))
+    {
+      switch (mode)
+	{
+	case E_V2DFmode:
+	  cmp_mode = V2DImode;
+	  break;
+	case E_V4DFmode:
+	  cmp_mode = V4DImode;
+	  break;
+	case E_V8DFmode:
+	  cmp_mode = V8DImode;
+	  break;
+	case E_V4SFmode:
+	  cmp_mode = V4SImode;
+	  break;
+	case E_V8SFmode:
+	  cmp_mode = V8SImode;
+	  break;
+	case E_V16SFmode:
+	  cmp_mode = V16SImode;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+
+  for (int i = 0; i != n_elts; i++)
+    vec[i] = GEN_INT (i);
+  constv = gen_rtx_CONST_VECTOR (cmp_mode, gen_rtvec_v (n_elts, vec));
+  valv = gen_reg_rtx (mode);
+  idxv = gen_reg_rtx (cmp_mode);
+  idx_tmp = convert_to_mode (GET_MODE_INNER (cmp_mode), idx, 1);
+
+  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, valv, val));
+  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode, idxv, idx_tmp));
+  vec[0] = target;
+  vec[1] = valv;
+  vec[2] = target;
+  vec[3] = gen_rtx_EQ (mode, idxv, constv);
+  vec[4] = idxv;
+  vec[5] = constv;
+  ix86_expand_int_vcond (vec);
+}
+
 void
 ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
 {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index c5b700efd0e..7a1dc3d4d64 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -242,6 +242,7 @@ extern rtx ix86_rewrite_tls_address (rtx);
 
 extern void ix86_expand_vector_init (bool, rtx, rtx);
 extern void ix86_expand_vector_set (bool, rtx, rtx, int);
+extern void ix86_expand_vector_set_var (rtx, rtx, rtx);
 extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index b03f9cd1c8c..5941a50d5d0 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1023,6 +1023,12 @@ (define_predicate "incdec_operand"
   return op == const1_rtx || op == constm1_rtx;
 })
 
+;; True for registers, or const_int_operand, used to vec_setm expander.
+(define_predicate "vec_setm_operand"
+  (ior (and (match_operand 0 "register_operand")
+	    (match_test "TARGET_AVX2"))
+       (match_code "const_int")))
+
 ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
 (define_predicate "reg_or_pm1_operand"
   (ior (match_operand 0 "register_operand")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 934b60a288f..30e39b41260 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8310,11 +8310,14 @@ (define_insn "vec_setv2df_0"
 (define_expand "vec_set<mode>"
   [(match_operand:V 0 "register_operand")
    (match_operand:<ssescalarmode> 1 "register_operand")
-   (match_operand 2 "const_int_operand")]
+   (match_operand 2 "vec_setm_operand")]
   "TARGET_SSE"
 {
-  ix86_expand_vector_set (false, operands[0], operands[1],
-			  INTVAL (operands[2]));
+  if (CONST_INT_P (operands[2]))
+    ix86_expand_vector_set (false, operands[0], operands[1],
+			    INTVAL (operands[2]));
+  else
+    ix86_expand_vector_set_var (operands[0], operands[1], operands[2]);
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c b/gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c
new file mode 100644
index 00000000000..4c16ec5dfc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vec-set-1.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2 -mno-avx512f" } */
+/* { dg-final { scan-assembler-times {(?n)vpcmpeq[bwdq]} 12 } } */
+/* { dg-final { scan-assembler-times {(?n)vp?blendv} 12 } } */
+
+typedef char v32qi __attribute__ ((vector_size (32)));
+typedef char v16qi __attribute__ ((vector_size (16)));
+
+typedef short v16hi __attribute__ ((vector_size (32)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+
+typedef int v8si __attribute__ ((vector_size (32)));
+typedef int v4si __attribute__ ((vector_size (16)));
+
+typedef long long v4di __attribute__ ((vector_size (32)));
+typedef long long v2di __attribute__ ((vector_size (16)));
+
+typedef float v8sf __attribute__ ((vector_size (32)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+
+typedef double v4df __attribute__ ((vector_size (32)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+#define FOO(VTYPE, TYPE)			\
+  VTYPE						\
+  __attribute__ ((noipa))			\
+  foo_##VTYPE (VTYPE a, TYPE b, unsigned int c)	\
+  {						\
+    a[c] = b;					\
+    return a;					\
+  }						\
+
+FOO (v16qi, char);
+FOO (v32qi, char);
+
+FOO (v8hi, short);
+FOO (v16hi, short);
+
+FOO (v4si, int);
+FOO (v8si, int);
+
+FOO (v2di, long long);
+FOO (v4di, long long);
+
+FOO (v4sf, float);
+FOO (v8sf, float);
+
+FOO (v2df, double);
+FOO (v4df, double);
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c
new file mode 100644
index 00000000000..9086ef406f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vec-set-2.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx2 } */
+/* { dg-options "-O2 -mavx2" } */
+
+
+#ifndef CHECK
+#define CHECK "avx2-check.h"
+#endif
+
+#ifndef TEST
+#define TEST avx2_test
+#endif
+
+#include CHECK
+
+#include "avx2-vec-set-1.c"
+
+#define CALC_TEST(vtype, type, N, idx)				\
+do								\
+  {								\
+    int i,val = idx * idx - idx * 3 + 16;			\
+    type res[N],exp[N];						\
+    vtype resv;							\
+    for (i = 0; i < N; i++)					\
+      {								\
+	res[i] = i * i - i * 3 + 15;				\
+	exp[i] = res[i];					\
+      }								\
+    exp[idx] = val;						\
+    resv = foo_##vtype (*(vtype *)&res[0], val, idx);		\
+    for (i = 0; i < N; i++)					\
+      {								\
+	if (resv[i] != exp[i])					\
+	  abort ();						\
+      }								\
+  }								\
+while (0)
+
+static void
+TEST (void)
+{
+  CALC_TEST (v32qi, char, 32, 17);
+  CALC_TEST (v16qi, char, 16, 5);
+  CALC_TEST (v16hi, short, 16, 9);
+  CALC_TEST (v8hi, short, 8, 6);
+  CALC_TEST (v8si, int, 8, 3);
+  CALC_TEST (v4si, int, 4, 2);
+  CALC_TEST (v4di, long long, 4, 1);
+  CALC_TEST (v2di, long long, 2, 0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c
new file mode 100644
index 00000000000..5cfbc85732e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times {(?n)(?:vp?broadcast|vmovddup)} 36 } } */
+/* { dg-final { scan-assembler-times {(?n)vpcmp[bwdq][ \t]+\$0} 18 } } */
+
+typedef char v64qi __attribute__ ((vector_size (64)));
+typedef short v32hi __attribute__ ((vector_size (64)));
+typedef int v16si __attribute__ ((vector_size (64)));
+typedef long long v8di __attribute__ ((vector_size (64)));
+typedef float v16sf __attribute__ ((vector_size (64)));
+typedef double v8df __attribute__ ((vector_size (64)));
+
+#include "avx2-vec-set-1.c"
+
+FOO (v64qi, char);
+FOO (v32hi, short);
+FOO (v16si, int);
+FOO (v8di, long long);
+FOO (v16sf, float);
+FOO (v8df, double);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c
new file mode 100644
index 00000000000..22e64183ebd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-vec-set-2.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512bw } */
+/* { dg-options "-O2 -mavx512bw" } */
+
+
+#ifndef CHECK
+#define CHECK "avx512f-check.h"
+#endif
+
+#define AVX512BW
+
+#include CHECK
+
+#include "avx512bw-vec-set-1.c"
+
+#define CALC_TEST(vtype, type, N, idx)				\
+do								\
+  {								\
+    int i,val = idx * idx - idx * 3 + 16;			\
+    type res[N],exp[N];						\
+    vtype resv;							\
+    for (i = 0; i < N; i++)					\
+      {								\
+	res[i] = i * i - i * 3 + 15;				\
+	exp[i] = res[i];					\
+      }								\
+    exp[idx] = val;						\
+    resv = foo_##vtype (*(vtype *)&res[0], val, idx);		\
+    for (i = 0; i < N; i++)					\
+      {								\
+	if (resv[i] != exp[i])					\
+	  abort ();						\
+      }								\
+  }								\
+while (0)
+
+static void
+test_512 (void)
+{
+  CALC_TEST (v64qi, char, 64, 50);
+  CALC_TEST (v32hi, short, 32, 30);
+  CALC_TEST (v16si, int, 16, 15);
+  CALC_TEST (v8di, long long, 8, 7);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c
new file mode 100644
index 00000000000..8f2aa03ec11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vec-set-2.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O2 -mavx512f -mno-avx512bw" } */
+
+
+#ifndef CHECK
+#define CHECK "avx512f-check.h"
+#endif
+
+#define AVX512F
+
+#include CHECK
+
+#include "avx512bw-vec-set-1.c"
+
+#define CALC_TEST(vtype, type, N, idx)				\
+do								\
+  {								\
+    int i,val = idx * idx - idx * 3 + 16;			\
+    type res[N],exp[N];						\
+    vtype resv;							\
+    for (i = 0; i < N; i++)					\
+      {								\
+	res[i] = i * i - i * 3 + 15;				\
+	exp[i] = res[i];					\
+      }								\
+    exp[idx] = val;						\
+    resv = foo_##vtype (*(vtype *)&res[0], val, idx);		\
+    for (i = 0; i < N; i++)					\
+      {								\
+	if (resv[i] != exp[i])					\
+	  abort ();						\
+      }								\
+  }								\
+while (0)
+
+static void
+test_512 (void)
+{
+  CALC_TEST (v64qi, char, 64, 50);
+  CALC_TEST (v32hi, short, 32, 30);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c
new file mode 100644
index 00000000000..4f327427a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vec-set-2.c
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512bw } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-options "-O2 -mavx512bw -mavx512vl" } */
+
+
+#ifndef CHECK
+#define CHECK "avx512f-check.h"
+#endif
+
+#define AVX512VL
+#define AVX512BW
+
+#include CHECK
+
+#include "avx512bw-vec-set-1.c"
+
+#define CALC_TEST(vtype, type, N, idx)				\
+do								\
+  {								\
+    int i,val = idx * idx - idx * 3 + 16;			\
+    type res[N],exp[N];						\
+    vtype resv;							\
+    for (i = 0; i < N; i++)					\
+      {								\
+	res[i] = i * i - i * 3 + 15;				\
+	exp[i] = res[i];					\
+      }								\
+    exp[idx] = val;						\
+    resv = foo_##vtype (*(vtype *)&res[0], val, idx);		\
+    for (i = 0; i < N; i++)					\
+      {								\
+	if (resv[i] != exp[i])					\
+	  abort ();						\
+      }								\
+  }								\
+while (0)
+
+static void
+test_256 (void)
+{
+  CALC_TEST (v32qi, char, 32, 17);
+  CALC_TEST (v16hi, short, 16, 9);
+  CALC_TEST (v8si, int, 8, 3);
+  CALC_TEST (v4di, long long, 4, 1);
+}
+
+static void
+test_128 (void)
+{
+  CALC_TEST (v16qi, char, 16, 5);
+  CALC_TEST (v8hi, short, 8, 6);
+  CALC_TEST (v4si, int, 4, 2);
+  CALC_TEST (v2di, long long, 2, 0);
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-11-26  1:43 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-11  8:45 [PATCH] [PR target/97194] [AVX2] Support variable index vec_set Uros Bizjak
2020-11-12  2:06 ` Hongtao Liu
2020-11-12  8:20   ` Uros Bizjak
2020-11-12  9:12     ` Hongtao Liu
2020-11-12  9:15       ` Hongtao Liu
2020-11-12  9:25         ` Hongtao Liu
2020-11-12 13:59           ` Richard Biener
2020-11-12 17:51             ` Uros Bizjak
2020-11-12 18:26               ` Uros Bizjak
2020-11-12 18:34                 ` Uros Bizjak
2020-11-16 11:57             ` Uros Bizjak
2020-11-19 10:54               ` Richard Sandiford
2020-11-16 10:16       ` Uros Bizjak
  -- strict thread matches above, loose matches on Subject: below --
2020-10-19  8:23 Hongtao Liu
2020-10-19  9:07 ` Richard Biener
2020-10-19  9:39   ` Hongtao Liu
2020-10-19  9:55     ` Richard Biener
2020-10-20  2:37       ` Hongtao Liu
2020-10-20  7:36         ` Richard Biener
2020-10-27  7:51           ` Hongtao Liu
2020-11-11  8:03             ` Hongtao Liu
2020-11-25 19:15               ` Jeff Law
2020-11-26  1:45                 ` Hongtao Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).