* [PATCH] Support logic shift left/right for avx512 mask type.
@ 2021-07-20 12:33 liuhongt
2021-07-20 13:40 ` Uros Bizjak
0 siblings, 1 reply; 6+ messages in thread
From: liuhongt @ 2021-07-20 12:33 UTC (permalink / raw)
To: gcc-patches; +Cc: ubizjak, crazylht, hjl.tools, rguenther
Hi:
As mention in https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
----cut start-----
> note for the lowpart we can just view-convert away the excess bits,
> fully re-using the mask. We generate surprisingly "good" code:
>
> kmovb %k1, %edi
> shrb $4, %dil
> kmovb %edi, %k2
>
> besides the lack of using kshiftrb. I guess we're just lacking
> a mask register alternative for
Yes, we can do it similar as kor/kand/kxor.
---cut end--------
Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk?
gcc/ChangeLog:
* config/i386/constraints.md (Wb): New constraint.
(Ww): Ditto.
* config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
shift.
(*ashlqi3_1): Ditto.
(*<insn><mode>3_1): Ditto.
(*<insn><mode>3_1): Ditto.
* config/i386/sse.md (k<code><mode>): New define_split after
it to convert generic shift pattern to mask shift ones.
gcc/testsuite/ChangeLog:
* gcc.target/i386/mask-shift.c: New test.
---
gcc/config/i386/constraints.md | 10 +++
gcc/config/i386/i386.md | 94 +++++++++++++++-------
gcc/config/i386/sse.md | 14 ++++
gcc/testsuite/gcc.target/i386/mask-shift.c | 83 +++++++++++++++++++
4 files changed, 173 insertions(+), 28 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/mask-shift.c
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 485e3f5b2cf..4aa28a5621c 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -222,6 +222,16 @@ (define_constraint "BC"
(match_operand 0 "vector_all_ones_operand"))))
;; Integer constant constraints.
+(define_constraint "Wb"
+ "Integer constant in the range 0 @dots{} 7, for 8-bit shifts."
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (ival, 0, 7)")))
+
+(define_constraint "Ww"
+ "Integer constant in the range 0 @dots{} 15, for 16-bit shifts."
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (ival, 0, 15)")))
+
(define_constraint "I"
"Integer constant in the range 0 @dots{} 31, for 32-bit shifts."
(and (match_code "const_int")
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8b809c49fe0..c5f9bd4d4d8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1136,6 +1136,7 @@ (define_mode_attr di [(SI "nF") (DI "Wd")])
;; Immediate operand constraint for shifts.
(define_mode_attr S [(QI "I") (HI "I") (SI "I") (DI "J") (TI "O")])
+(define_mode_attr KS [(QI "Wb") (HI "Ww") (SI "I") (DI "J")])
;; Print register name in the specified mode.
(define_mode_attr k [(QI "b") (HI "w") (SI "k") (DI "q")])
@@ -11088,9 +11089,9 @@ (define_insn "*bmi2_ashl<mode>3_1"
(set_attr "mode" "<MODE>")])
(define_insn "*ashl<mode>3_1"
- [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
- (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm")
- (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r")))
+ [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k")
+ (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k")
+ (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r,<KS>")))
(clobber (reg:CC FLAGS_REG))]
"ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)"
{
@@ -11098,6 +11099,7 @@ (define_insn "*ashl<mode>3_1"
{
case TYPE_LEA:
case TYPE_ISHIFTX:
+ case TYPE_MSKLOG:
return "#";
case TYPE_ALU:
@@ -11113,7 +11115,11 @@ (define_insn "*ashl<mode>3_1"
return "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
}
}
- [(set_attr "isa" "*,*,bmi2")
+ [(set_attr "isa" "*,*,bmi2,avx512bw")
(set (attr "type")
(cond [(eq_attr "alternative" "1")
(const_string "lea")
@@ -11123,6 +11129,8 @@ (define_insn "*ashl<mode>3_1"
(match_operand 0 "register_operand"))
(match_operand 2 "const1_operand"))
(const_string "alu")
+ (eq_attr "alternative" "3")
+ (const_string "msklog")
]
(const_string "ishift")))
(set (attr "length_immediate")
@@ -11218,15 +11226,16 @@ (define_split
"operands[2] = gen_lowpart (SImode, operands[2]);")
(define_insn "*ashlhi3_1"
- [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp")
- (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l")
- (match_operand:QI 2 "nonmemory_operand" "cI,M")))
+ [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp,?k")
+ (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l,k")
+ (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww")))
(clobber (reg:CC FLAGS_REG))]
"ix86_binary_operator_ok (ASHIFT, HImode, operands)"
{
switch (get_attr_type (insn))
{
case TYPE_LEA:
+ case TYPE_MSKLOG:
return "#";
case TYPE_ALU:
@@ -11241,9 +11246,12 @@ (define_insn "*ashlhi3_1"
return "sal{w}\t{%2, %0|%0, %2}";
}
}
- [(set (attr "type")
+ [(set_attr "isa" "*,*,avx512f")
+ (set (attr "type")
(cond [(eq_attr "alternative" "1")
(const_string "lea")
+ (eq_attr "alternative" "2")
+ (const_string "msklog")
(and (and (match_test "TARGET_DOUBLE_WITH_ADD")
(match_operand 0 "register_operand"))
(match_operand 2 "const1_operand"))
@@ -11259,18 +11270,19 @@ (define_insn "*ashlhi3_1"
(match_test "optimize_function_for_size_p (cfun)")))))
(const_string "0")
(const_string "*")))
- (set_attr "mode" "HI,SI")])
+ (set_attr "mode" "HI,SI,HI")])
(define_insn "*ashlqi3_1"
- [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp")
- (ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l")
- (match_operand:QI 2 "nonmemory_operand" "cI,cI,M")))
+ [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp,?k")
+ (ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l,k")
+ (match_operand:QI 2 "nonmemory_operand" "cI,cI,M,Wb")))
(clobber (reg:CC FLAGS_REG))]
"ix86_binary_operator_ok (ASHIFT, QImode, operands)"
{
switch (get_attr_type (insn))
{
case TYPE_LEA:
+ case TYPE_MSKLOG:
return "#";
case TYPE_ALU:
@@ -11298,9 +11307,12 @@ (define_insn "*ashlqi3_1"
}
}
}
- [(set (attr "type")
+ [(set_attr "isa" "*,*,*,avx512dq")
+ (set (attr "type")
(cond [(eq_attr "alternative" "2")
(const_string "lea")
+ (eq_attr "alternative" "3")
+ (const_string "msklog")
(and (and (match_test "TARGET_DOUBLE_WITH_ADD")
(match_operand 0 "register_operand"))
(match_operand 2 "const1_operand"))
@@ -11316,7 +11334,7 @@ (define_insn "*ashlqi3_1"
(match_test "optimize_function_for_size_p (cfun)")))))
(const_string "0")
(const_string "*")))
- (set_attr "mode" "QI,SI,SI")
+ (set_attr "mode" "QI,SI,SI,QI")
;; Potential partial reg stall on alternative 1.
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "1")
@@ -11819,16 +11837,17 @@ (define_insn "*bmi2_<insn><mode>3_1"
(set_attr "mode" "<MODE>")])
(define_insn "*<insn><mode>3_1"
- [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+ [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,?k")
(any_shiftrt:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
- (match_operand:QI 2 "nonmemory_operand" "c<S>,r")))
+ (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,k")
+ (match_operand:QI 2 "nonmemory_operand" "c<S>,r,<KS>")))
(clobber (reg:CC FLAGS_REG))]
"ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
{
switch (get_attr_type (insn))
{
case TYPE_ISHIFTX:
+ case TYPE_MSKLOG:
return "#";
default:
@@ -11839,11 +11858,16 @@ (define_insn "*<insn><mode>3_1"
return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
}
}
- [(set_attr "isa" "*,bmi2")
- (set_attr "type" "ishift,ishiftx")
+ [(set_attr "isa" "*,bmi2,avx512bw")
+ (set_attr "type" "ishift,ishiftx,msklog")
+ (set (attr "enabled")
+ (if_then_else (eq_attr "alternative" "2")
+ (symbol_ref "<CODE> == LSHIFTRT && TARGET_AVX512BW")
+ (const_string "*")))
(set (attr "length_immediate")
(if_then_else
- (and (match_operand 2 "const1_operand")
+ (and (and (match_operand 2 "const1_operand")
+ (eq_attr "alternative" "0"))
(ior (match_test "TARGET_SHIFT1")
(match_test "optimize_function_for_size_p (cfun)")))
(const_string "0")
@@ -11916,27 +11940,41 @@ (define_split
"operands[2] = gen_lowpart (SImode, operands[2]);")
(define_insn "*<insn><mode>3_1"
- [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m")
+ [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m, ?k")
(any_shiftrt:SWI12
- (match_operand:SWI12 1 "nonimmediate_operand" "0")
- (match_operand:QI 2 "nonmemory_operand" "c<S>")))
+ (match_operand:SWI12 1 "nonimmediate_operand" "0, k")
+ (match_operand:QI 2 "nonmemory_operand" "c<S>, <KS>")))
(clobber (reg:CC FLAGS_REG))]
"ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
{
- if (operands[2] == const1_rtx
- && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
- return "<shift>{<imodesuffix>}\t%0";
- else
- return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
+ switch (get_attr_type (insn))
+ {
+ case TYPE_ISHIFT:
+ if (operands[2] == const1_rtx
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ return "<shift>{<imodesuffix>}\t%0";
+ else
+ return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
+ case TYPE_MSKLOG:
+ return "#";
+ default:
+ gcc_unreachable ();
+ }
}
- [(set_attr "type" "ishift")
+ [(set_attr "type" "ishift,msklog")
(set (attr "length_immediate")
(if_then_else
- (and (match_operand 2 "const1_operand")
+ (and (and (match_operand 2 "const1_operand")
+ (eq_attr "alternative" "0"))
(ior (match_test "TARGET_SHIFT1")
(match_test "optimize_function_for_size_p (cfun)")))
(const_string "0")
(const_string "*")))
+ (set (attr "enabled")
+ (if_then_else (eq_attr "alternative" "1")
+ (symbol_ref "<CODE> == LSHIFTRT && TARGET_AVX512F
+ && (<MODE>mode != QImode || TARGET_AVX512DQ)")
+ (const_string "*")))
(set_attr "mode" "<MODE>")])
(define_insn "*<insn><mode>3_1_slp"
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ab29999023d..f8759e4d758 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1755,6 +1755,20 @@ (define_insn "k<code><mode>"
(set_attr "prefix" "vex")
(set_attr "mode" "<MODE>")])
+(define_split
+ [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
+ (any_lshift:SWI1248_AVX512BW
+ (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
+ (match_operand 2 "const_int_operand")))
+ (clobber (reg:CC FLAGS_REG))]
+ "TARGET_AVX512F && reload_completed"
+ [(parallel
+ [(set (match_dup 0)
+ (any_lshift:SWI1248_AVX512BW
+ (match_dup 1)
+ (match_dup 2)))
+ (unspec [(const_int 0)] UNSPEC_MASKOP)])])
+
(define_insn "ktest<mode>"
[(set (reg:CC FLAGS_REG)
(unspec:CC
diff --git a/gcc/testsuite/gcc.target/i386/mask-shift.c b/gcc/testsuite/gcc.target/i386/mask-shift.c
new file mode 100644
index 00000000000..4cb6ef37821
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mask-shift.c
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512dq -O2" } */
+
+#include<immintrin.h>
+void
+fooq (__m512i a, __m512i b, void* p)
+{
+ __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
+ m1 >>= 4;
+ _mm512_mask_storeu_epi64 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftrb} "1" } } */
+
+void
+food (__m512i a, __m512i b, void* p)
+{
+ __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
+ m1 >>= 8;
+ _mm512_mask_storeu_epi32 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftrw} "1" } } */
+
+void
+foow (__m512i a, __m512i b, void* p)
+{
+ __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
+ m1 >>= 16;
+ _mm512_mask_storeu_epi16 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftrd} "1" } } */
+
+void
+foob (__m512i a, __m512i b, void* p)
+{
+ __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
+ m1 >>= 32;
+ _mm512_mask_storeu_epi8 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftrq} "1" { target { ! ia32 } } } } */
+
+void
+fooq1 (__m512i a, __m512i b, void* p)
+{
+ __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
+ m1 <<= 4;
+ _mm512_mask_storeu_epi64 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftlb} "1" } } */
+
+void
+food1 (__m512i a, __m512i b, void* p)
+{
+ __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
+ m1 <<= 8;
+ _mm512_mask_storeu_epi32 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftlw} "1" } } */
+
+void
+foow1 (__m512i a, __m512i b, void* p)
+{
+ __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
+ m1 <<= 16;
+ _mm512_mask_storeu_epi16 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftld} "1" } } */
+
+void
+foob1 (__m512i a, __m512i b, void* p)
+{
+ __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
+ m1 <<= 32;
+ _mm512_mask_storeu_epi8 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftlq} "1" { target { ! ia32 } } } } */
--
2.18.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Support logic shift left/right for avx512 mask type.
2021-07-20 12:33 [PATCH] Support logic shift left/right for avx512 mask type liuhongt
@ 2021-07-20 13:40 ` Uros Bizjak
2021-07-21 3:11 ` Hongtao Liu
0 siblings, 1 reply; 6+ messages in thread
From: Uros Bizjak @ 2021-07-20 13:40 UTC (permalink / raw)
To: liuhongt; +Cc: gcc-patches, Hongtao Liu, H. J. Lu, Richard Biener
On Tue, Jul 20, 2021 at 2:33 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> Hi:
> As mention in https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
>
> ----cut start-----
> > note for the lowpart we can just view-convert away the excess bits,
> > fully re-using the mask. We generate surprisingly "good" code:
> >
> > kmovb %k1, %edi
> > shrb $4, %dil
> > kmovb %edi, %k2
> >
> > besides the lack of using kshiftrb. I guess we're just lacking
> > a mask register alternative for
> Yes, we can do it similar as kor/kand/kxor.
> ---cut end--------
>
> Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/constraints.md (Wb): New constraint.
> (Ww): Ditto.
> * config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
> shift.
> (*ashlqi3_1): Ditto.
> (*<insn><mode>3_1): Ditto.
> (*<insn><mode>3_1): Ditto.
> * config/i386/sse.md (k<code><mode>): New define_split after
> it to convert generic shift pattern to mask shift ones.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/mask-shift.c: New test.
> ---
> gcc/config/i386/constraints.md | 10 +++
> gcc/config/i386/i386.md | 94 +++++++++++++++-------
> gcc/config/i386/sse.md | 14 ++++
> gcc/testsuite/gcc.target/i386/mask-shift.c | 83 +++++++++++++++++++
> 4 files changed, 173 insertions(+), 28 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/mask-shift.c
>
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index 485e3f5b2cf..4aa28a5621c 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -222,6 +222,16 @@ (define_constraint "BC"
> (match_operand 0 "vector_all_ones_operand"))))
>
> ;; Integer constant constraints.
> +(define_constraint "Wb"
> + "Integer constant in the range 0 @dots{} 7, for 8-bit shifts."
> + (and (match_code "const_int")
> + (match_test "IN_RANGE (ival, 0, 7)")))
> +
> +(define_constraint "Ww"
> + "Integer constant in the range 0 @dots{} 15, for 16-bit shifts."
> + (and (match_code "const_int")
> + (match_test "IN_RANGE (ival, 0, 15)")))
> +
> (define_constraint "I"
> "Integer constant in the range 0 @dots{} 31, for 32-bit shifts."
> (and (match_code "const_int")
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 8b809c49fe0..c5f9bd4d4d8 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1136,6 +1136,7 @@ (define_mode_attr di [(SI "nF") (DI "Wd")])
>
> ;; Immediate operand constraint for shifts.
> (define_mode_attr S [(QI "I") (HI "I") (SI "I") (DI "J") (TI "O")])
> +(define_mode_attr KS [(QI "Wb") (HI "Ww") (SI "I") (DI "J")])
>
> ;; Print register name in the specified mode.
> (define_mode_attr k [(QI "b") (HI "w") (SI "k") (DI "q")])
> @@ -11088,9 +11089,9 @@ (define_insn "*bmi2_ashl<mode>3_1"
> (set_attr "mode" "<MODE>")])
>
> (define_insn "*ashl<mode>3_1"
> - [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
> - (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm")
> - (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r")))
> + [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k")
> + (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k")
> + (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r,<KS>")))
> (clobber (reg:CC FLAGS_REG))]
> "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)"
> {
> @@ -11098,6 +11099,7 @@ (define_insn "*ashl<mode>3_1"
> {
> case TYPE_LEA:
> case TYPE_ISHIFTX:
> + case TYPE_MSKLOG:
> return "#";
>
> case TYPE_ALU:
> @@ -11113,7 +11115,11 @@ (define_insn "*ashl<mode>3_1"
> return "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
> }
> }
> - [(set_attr "isa" "*,*,bmi2")
> + [(set_attr "isa" "*,*,bmi2,avx512bw")
> (set (attr "type")
> (cond [(eq_attr "alternative" "1")
> (const_string "lea")
> @@ -11123,6 +11129,8 @@ (define_insn "*ashl<mode>3_1"
> (match_operand 0 "register_operand"))
> (match_operand 2 "const1_operand"))
> (const_string "alu")
> + (eq_attr "alternative" "3")
> + (const_string "msklog")
> ]
> (const_string "ishift")))
> (set (attr "length_immediate")
> @@ -11218,15 +11226,16 @@ (define_split
> "operands[2] = gen_lowpart (SImode, operands[2]);")
>
> (define_insn "*ashlhi3_1"
> - [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp")
> - (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l")
> - (match_operand:QI 2 "nonmemory_operand" "cI,M")))
> + [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp,?k")
> + (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l,k")
> + (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww")))
> (clobber (reg:CC FLAGS_REG))]
> "ix86_binary_operator_ok (ASHIFT, HImode, operands)"
> {
> switch (get_attr_type (insn))
> {
> case TYPE_LEA:
> + case TYPE_MSKLOG:
> return "#";
>
> case TYPE_ALU:
> @@ -11241,9 +11246,12 @@ (define_insn "*ashlhi3_1"
> return "sal{w}\t{%2, %0|%0, %2}";
> }
> }
> - [(set (attr "type")
> + [(set_attr "isa" "*,*,avx512f")
> + (set (attr "type")
> (cond [(eq_attr "alternative" "1")
> (const_string "lea")
> + (eq_attr "alternative" "2")
> + (const_string "msklog")
> (and (and (match_test "TARGET_DOUBLE_WITH_ADD")
> (match_operand 0 "register_operand"))
> (match_operand 2 "const1_operand"))
> @@ -11259,18 +11270,19 @@ (define_insn "*ashlhi3_1"
> (match_test "optimize_function_for_size_p (cfun)")))))
> (const_string "0")
> (const_string "*")))
> - (set_attr "mode" "HI,SI")])
> + (set_attr "mode" "HI,SI,HI")])
>
> (define_insn "*ashlqi3_1"
> - [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp")
> - (ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l")
> - (match_operand:QI 2 "nonmemory_operand" "cI,cI,M")))
> + [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp,?k")
> + (ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l,k")
> + (match_operand:QI 2 "nonmemory_operand" "cI,cI,M,Wb")))
> (clobber (reg:CC FLAGS_REG))]
> "ix86_binary_operator_ok (ASHIFT, QImode, operands)"
> {
> switch (get_attr_type (insn))
> {
> case TYPE_LEA:
> + case TYPE_MSKLOG:
> return "#";
>
> case TYPE_ALU:
> @@ -11298,9 +11307,12 @@ (define_insn "*ashlqi3_1"
> }
> }
> }
> - [(set (attr "type")
> + [(set_attr "isa" "*,*,*,avx512dq")
> + (set (attr "type")
> (cond [(eq_attr "alternative" "2")
> (const_string "lea")
> + (eq_attr "alternative" "3")
> + (const_string "msklog")
> (and (and (match_test "TARGET_DOUBLE_WITH_ADD")
> (match_operand 0 "register_operand"))
> (match_operand 2 "const1_operand"))
> @@ -11316,7 +11334,7 @@ (define_insn "*ashlqi3_1"
> (match_test "optimize_function_for_size_p (cfun)")))))
> (const_string "0")
> (const_string "*")))
> - (set_attr "mode" "QI,SI,SI")
> + (set_attr "mode" "QI,SI,SI,QI")
> ;; Potential partial reg stall on alternative 1.
> (set (attr "preferred_for_speed")
> (cond [(eq_attr "alternative" "1")
> @@ -11819,16 +11837,17 @@ (define_insn "*bmi2_<insn><mode>3_1"
> (set_attr "mode" "<MODE>")])
>
> (define_insn "*<insn><mode>3_1"
> - [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
> + [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,?k")
> (any_shiftrt:SWI48
> - (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
> - (match_operand:QI 2 "nonmemory_operand" "c<S>,r")))
> + (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,k")
> + (match_operand:QI 2 "nonmemory_operand" "c<S>,r,<KS>")))
> (clobber (reg:CC FLAGS_REG))]
> "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> {
> switch (get_attr_type (insn))
> {
> case TYPE_ISHIFTX:
> + case TYPE_MSKLOG:
> return "#";
>
> default:
> @@ -11839,11 +11858,16 @@ (define_insn "*<insn><mode>3_1"
> return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
> }
> }
> - [(set_attr "isa" "*,bmi2")
> - (set_attr "type" "ishift,ishiftx")
> + [(set_attr "isa" "*,bmi2,avx512bw")
> + (set_attr "type" "ishift,ishiftx,msklog")
> + (set (attr "enabled")
> + (if_then_else (eq_attr "alternative" "2")
> + (symbol_ref "<CODE> == LSHIFTRT && TARGET_AVX512BW")
Please rather split the pattern to ASHIFTRT and LSHIFTRT. The
macroization has no point if we need to use enabled attribute in this
way.
> + (const_string "*")))
> (set (attr "length_immediate")
> (if_then_else
> - (and (match_operand 2 "const1_operand")
> + (and (and (match_operand 2 "const1_operand")
> + (eq_attr "alternative" "0"))
> (ior (match_test "TARGET_SHIFT1")
> (match_test "optimize_function_for_size_p (cfun)")))
> (const_string "0")
> @@ -11916,27 +11940,41 @@ (define_split
> "operands[2] = gen_lowpart (SImode, operands[2]);")
>
> (define_insn "*<insn><mode>3_1"
> - [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m")
> + [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m, ?k")
> (any_shiftrt:SWI12
> - (match_operand:SWI12 1 "nonimmediate_operand" "0")
> - (match_operand:QI 2 "nonmemory_operand" "c<S>")))
> + (match_operand:SWI12 1 "nonimmediate_operand" "0, k")
> + (match_operand:QI 2 "nonmemory_operand" "c<S>, <KS>")))
> (clobber (reg:CC FLAGS_REG))]
> "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> {
> - if (operands[2] == const1_rtx
> - && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
> - return "<shift>{<imodesuffix>}\t%0";
> - else
> - return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
> + switch (get_attr_type (insn))
> + {
> + case TYPE_ISHIFT:
> + if (operands[2] == const1_rtx
> + && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
> + return "<shift>{<imodesuffix>}\t%0";
> + else
> + return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
> + case TYPE_MSKLOG:
> + return "#";
> + default:
> + gcc_unreachable ();
> + }
> }
> - [(set_attr "type" "ishift")
> + [(set_attr "type" "ishift,msklog")
> (set (attr "length_immediate")
> (if_then_else
> - (and (match_operand 2 "const1_operand")
> + (and (and (match_operand 2 "const1_operand")
> + (eq_attr "alternative" "0"))
> (ior (match_test "TARGET_SHIFT1")
> (match_test "optimize_function_for_size_p (cfun)")))
> (const_string "0")
> (const_string "*")))
> + (set (attr "enabled")
> + (if_then_else (eq_attr "alternative" "1")
> + (symbol_ref "<CODE> == LSHIFTRT && TARGET_AVX512F
> + && (<MODE>mode != QImode || TARGET_AVX512DQ)")
Also here, please split out LSHIFTRT and perhaps use conditional
constraint to avoid enabled attribute.
Uros.
> + (const_string "*")))
> (set_attr "mode" "<MODE>")])
>
> (define_insn "*<insn><mode>3_1_slp"
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index ab29999023d..f8759e4d758 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -1755,6 +1755,20 @@ (define_insn "k<code><mode>"
> (set_attr "prefix" "vex")
> (set_attr "mode" "<MODE>")])
>
> +(define_split
> + [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> + (any_lshift:SWI1248_AVX512BW
> + (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
> + (match_operand 2 "const_int_operand")))
> + (clobber (reg:CC FLAGS_REG))]
> + "TARGET_AVX512F && reload_completed"
> + [(parallel
> + [(set (match_dup 0)
> + (any_lshift:SWI1248_AVX512BW
> + (match_dup 1)
> + (match_dup 2)))
> + (unspec [(const_int 0)] UNSPEC_MASKOP)])])
> +
> (define_insn "ktest<mode>"
> [(set (reg:CC FLAGS_REG)
> (unspec:CC
> diff --git a/gcc/testsuite/gcc.target/i386/mask-shift.c b/gcc/testsuite/gcc.target/i386/mask-shift.c
> new file mode 100644
> index 00000000000..4cb6ef37821
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/mask-shift.c
> @@ -0,0 +1,83 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bw -mavx512dq -O2" } */
> +
> +#include<immintrin.h>
> +void
> +fooq (__m512i a, __m512i b, void* p)
> +{
> + __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
> + m1 >>= 4;
> + _mm512_mask_storeu_epi64 (p, m1, a);
> +}
> +
> +/* { dg-final { scan-assembler-times {(?n)kshiftrb} "1" } } */
> +
> +void
> +food (__m512i a, __m512i b, void* p)
> +{
> + __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
> + m1 >>= 8;
> + _mm512_mask_storeu_epi32 (p, m1, a);
> +}
> +
> +/* { dg-final { scan-assembler-times {(?n)kshiftrw} "1" } } */
> +
> +void
> +foow (__m512i a, __m512i b, void* p)
> +{
> + __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
> + m1 >>= 16;
> + _mm512_mask_storeu_epi16 (p, m1, a);
> +}
> +
> +/* { dg-final { scan-assembler-times {(?n)kshiftrd} "1" } } */
> +
> +void
> +foob (__m512i a, __m512i b, void* p)
> +{
> + __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
> + m1 >>= 32;
> + _mm512_mask_storeu_epi8 (p, m1, a);
> +}
> +
> +/* { dg-final { scan-assembler-times {(?n)kshiftrq} "1" { target { ! ia32 } } } } */
> +
> +void
> +fooq1 (__m512i a, __m512i b, void* p)
> +{
> + __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
> + m1 <<= 4;
> + _mm512_mask_storeu_epi64 (p, m1, a);
> +}
> +
> +/* { dg-final { scan-assembler-times {(?n)kshiftlb} "1" } } */
> +
> +void
> +food1 (__m512i a, __m512i b, void* p)
> +{
> + __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
> + m1 <<= 8;
> + _mm512_mask_storeu_epi32 (p, m1, a);
> +}
> +
> +/* { dg-final { scan-assembler-times {(?n)kshiftlw} "1" } } */
> +
> +void
> +foow1 (__m512i a, __m512i b, void* p)
> +{
> + __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
> + m1 <<= 16;
> + _mm512_mask_storeu_epi16 (p, m1, a);
> +}
> +
> +/* { dg-final { scan-assembler-times {(?n)kshiftld} "1" } } */
> +
> +void
> +foob1 (__m512i a, __m512i b, void* p)
> +{
> + __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
> + m1 <<= 32;
> + _mm512_mask_storeu_epi8 (p, m1, a);
> +}
> +
> +/* { dg-final { scan-assembler-times {(?n)kshiftlq} "1" { target { ! ia32 } } } } */
> --
> 2.18.1
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Support logic shift left/right for avx512 mask type.
2021-07-20 13:40 ` Uros Bizjak
@ 2021-07-21 3:11 ` Hongtao Liu
2021-07-21 8:22 ` Uros Bizjak
0 siblings, 1 reply; 6+ messages in thread
From: Hongtao Liu @ 2021-07-21 3:11 UTC (permalink / raw)
To: Uros Bizjak; +Cc: liuhongt, gcc-patches, H. J. Lu, Richard Biener
[-- Attachment #1: Type: text/plain, Size: 16563 bytes --]
On Tue, Jul 20, 2021 at 9:41 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Tue, Jul 20, 2021 at 2:33 PM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > Hi:
> > As mention in https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
> >
> > ----cut start-----
> > > note for the lowpart we can just view-convert away the excess bits,
> > > fully re-using the mask. We generate surprisingly "good" code:
> > >
> > > kmovb %k1, %edi
> > > shrb $4, %dil
> > > kmovb %edi, %k2
> > >
> > > besides the lack of using kshiftrb. I guess we're just lacking
> > > a mask register alternative for
> > Yes, we can do it similar as kor/kand/kxor.
> > ---cut end--------
> >
> > Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/constraints.md (Wb): New constraint.
> > (Ww): Ditto.
> > * config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
> > shift.
> > (*ashlqi3_1): Ditto.
> > (*<insn><mode>3_1): Ditto.
> > (*<insn><mode>3_1): Ditto.
> > * config/i386/sse.md (k<code><mode>): New define_split after
> > it to convert generic shift pattern to mask shift ones.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/mask-shift.c: New test.
> > ---
> > gcc/config/i386/constraints.md | 10 +++
> > gcc/config/i386/i386.md | 94 +++++++++++++++-------
> > gcc/config/i386/sse.md | 14 ++++
> > gcc/testsuite/gcc.target/i386/mask-shift.c | 83 +++++++++++++++++++
> > 4 files changed, 173 insertions(+), 28 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/i386/mask-shift.c
> >
> > diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> > index 485e3f5b2cf..4aa28a5621c 100644
> > --- a/gcc/config/i386/constraints.md
> > +++ b/gcc/config/i386/constraints.md
> > @@ -222,6 +222,16 @@ (define_constraint "BC"
> > (match_operand 0 "vector_all_ones_operand"))))
> >
> > ;; Integer constant constraints.
> > +(define_constraint "Wb"
> > + "Integer constant in the range 0 @dots{} 7, for 8-bit shifts."
> > + (and (match_code "const_int")
> > + (match_test "IN_RANGE (ival, 0, 7)")))
> > +
> > +(define_constraint "Ww"
> > + "Integer constant in the range 0 @dots{} 15, for 16-bit shifts."
> > + (and (match_code "const_int")
> > + (match_test "IN_RANGE (ival, 0, 15)")))
> > +
> > (define_constraint "I"
> > "Integer constant in the range 0 @dots{} 31, for 32-bit shifts."
> > (and (match_code "const_int")
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 8b809c49fe0..c5f9bd4d4d8 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -1136,6 +1136,7 @@ (define_mode_attr di [(SI "nF") (DI "Wd")])
> >
> > ;; Immediate operand constraint for shifts.
> > (define_mode_attr S [(QI "I") (HI "I") (SI "I") (DI "J") (TI "O")])
> > +(define_mode_attr KS [(QI "Wb") (HI "Ww") (SI "I") (DI "J")])
> >
> > ;; Print register name in the specified mode.
> > (define_mode_attr k [(QI "b") (HI "w") (SI "k") (DI "q")])
> > @@ -11088,9 +11089,9 @@ (define_insn "*bmi2_ashl<mode>3_1"
> > (set_attr "mode" "<MODE>")])
> >
> > (define_insn "*ashl<mode>3_1"
> > - [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
> > - (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm")
> > - (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r")))
> > + [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k")
> > + (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k")
> > + (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r,<KS>")))
> > (clobber (reg:CC FLAGS_REG))]
> > "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)"
> > {
> > @@ -11098,6 +11099,7 @@ (define_insn "*ashl<mode>3_1"
> > {
> > case TYPE_LEA:
> > case TYPE_ISHIFTX:
> > + case TYPE_MSKLOG:
> > return "#";
> >
> > case TYPE_ALU:
> > @@ -11113,7 +11115,11 @@ (define_insn "*ashl<mode>3_1"
> > return "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
> > }
> > }
> > - [(set_attr "isa" "*,*,bmi2")
> > + [(set_attr "isa" "*,*,bmi2,avx512bw")
> > (set (attr "type")
> > (cond [(eq_attr "alternative" "1")
> > (const_string "lea")
> > @@ -11123,6 +11129,8 @@ (define_insn "*ashl<mode>3_1"
> > (match_operand 0 "register_operand"))
> > (match_operand 2 "const1_operand"))
> > (const_string "alu")
> > + (eq_attr "alternative" "3")
> > + (const_string "msklog")
> > ]
> > (const_string "ishift")))
> > (set (attr "length_immediate")
> > @@ -11218,15 +11226,16 @@ (define_split
> > "operands[2] = gen_lowpart (SImode, operands[2]);")
> >
> > (define_insn "*ashlhi3_1"
> > - [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp")
> > - (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l")
> > - (match_operand:QI 2 "nonmemory_operand" "cI,M")))
> > + [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp,?k")
> > + (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l,k")
> > + (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww")))
> > (clobber (reg:CC FLAGS_REG))]
> > "ix86_binary_operator_ok (ASHIFT, HImode, operands)"
> > {
> > switch (get_attr_type (insn))
> > {
> > case TYPE_LEA:
> > + case TYPE_MSKLOG:
> > return "#";
> >
> > case TYPE_ALU:
> > @@ -11241,9 +11246,12 @@ (define_insn "*ashlhi3_1"
> > return "sal{w}\t{%2, %0|%0, %2}";
> > }
> > }
> > - [(set (attr "type")
> > + [(set_attr "isa" "*,*,avx512f")
> > + (set (attr "type")
> > (cond [(eq_attr "alternative" "1")
> > (const_string "lea")
> > + (eq_attr "alternative" "2")
> > + (const_string "msklog")
> > (and (and (match_test "TARGET_DOUBLE_WITH_ADD")
> > (match_operand 0 "register_operand"))
> > (match_operand 2 "const1_operand"))
> > @@ -11259,18 +11270,19 @@ (define_insn "*ashlhi3_1"
> > (match_test "optimize_function_for_size_p (cfun)")))))
> > (const_string "0")
> > (const_string "*")))
> > - (set_attr "mode" "HI,SI")])
> > + (set_attr "mode" "HI,SI,HI")])
> >
> > (define_insn "*ashlqi3_1"
> > - [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp")
> > - (ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l")
> > - (match_operand:QI 2 "nonmemory_operand" "cI,cI,M")))
> > + [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp,?k")
> > + (ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l,k")
> > + (match_operand:QI 2 "nonmemory_operand" "cI,cI,M,Wb")))
> > (clobber (reg:CC FLAGS_REG))]
> > "ix86_binary_operator_ok (ASHIFT, QImode, operands)"
> > {
> > switch (get_attr_type (insn))
> > {
> > case TYPE_LEA:
> > + case TYPE_MSKLOG:
> > return "#";
> >
> > case TYPE_ALU:
> > @@ -11298,9 +11307,12 @@ (define_insn "*ashlqi3_1"
> > }
> > }
> > }
> > - [(set (attr "type")
> > + [(set_attr "isa" "*,*,*,avx512dq")
> > + (set (attr "type")
> > (cond [(eq_attr "alternative" "2")
> > (const_string "lea")
> > + (eq_attr "alternative" "3")
> > + (const_string "msklog")
> > (and (and (match_test "TARGET_DOUBLE_WITH_ADD")
> > (match_operand 0 "register_operand"))
> > (match_operand 2 "const1_operand"))
> > @@ -11316,7 +11334,7 @@ (define_insn "*ashlqi3_1"
> > (match_test "optimize_function_for_size_p (cfun)")))))
> > (const_string "0")
> > (const_string "*")))
> > - (set_attr "mode" "QI,SI,SI")
> > + (set_attr "mode" "QI,SI,SI,QI")
> > ;; Potential partial reg stall on alternative 1.
> > (set (attr "preferred_for_speed")
> > (cond [(eq_attr "alternative" "1")
> > @@ -11819,16 +11837,17 @@ (define_insn "*bmi2_<insn><mode>3_1"
> > (set_attr "mode" "<MODE>")])
> >
> > (define_insn "*<insn><mode>3_1"
> > - [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
> > + [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,?k")
> > (any_shiftrt:SWI48
> > - (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
> > - (match_operand:QI 2 "nonmemory_operand" "c<S>,r")))
> > + (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,k")
> > + (match_operand:QI 2 "nonmemory_operand" "c<S>,r,<KS>")))
> > (clobber (reg:CC FLAGS_REG))]
> > "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> > {
> > switch (get_attr_type (insn))
> > {
> > case TYPE_ISHIFTX:
> > + case TYPE_MSKLOG:
> > return "#";
> >
> > default:
> > @@ -11839,11 +11858,16 @@ (define_insn "*<insn><mode>3_1"
> > return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
> > }
> > }
> > - [(set_attr "isa" "*,bmi2")
> > - (set_attr "type" "ishift,ishiftx")
> > + [(set_attr "isa" "*,bmi2,avx512bw")
> > + (set_attr "type" "ishift,ishiftx,msklog")
> > + (set (attr "enabled")
> > + (if_then_else (eq_attr "alternative" "2")
> > + (symbol_ref "<CODE> == LSHIFTRT && TARGET_AVX512BW")
>
> Please rather split the pattern to ASHIFTRT and LSHIFTRT. The
> macroization has no point if we need to use enabled attribute in this
> way.
Changed.
>
> > + (const_string "*")))
> > (set (attr "length_immediate")
> > (if_then_else
> > - (and (match_operand 2 "const1_operand")
> > + (and (and (match_operand 2 "const1_operand")
> > + (eq_attr "alternative" "0"))
> > (ior (match_test "TARGET_SHIFT1")
> > (match_test "optimize_function_for_size_p (cfun)")))
> > (const_string "0")
> > @@ -11916,27 +11940,41 @@ (define_split
> > "operands[2] = gen_lowpart (SImode, operands[2]);")
> >
> > (define_insn "*<insn><mode>3_1"
> > - [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m")
> > + [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m, ?k")
> > (any_shiftrt:SWI12
> > - (match_operand:SWI12 1 "nonimmediate_operand" "0")
> > - (match_operand:QI 2 "nonmemory_operand" "c<S>")))
> > + (match_operand:SWI12 1 "nonimmediate_operand" "0, k")
> > + (match_operand:QI 2 "nonmemory_operand" "c<S>, <KS>")))
> > (clobber (reg:CC FLAGS_REG))]
> > "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> > {
> > - if (operands[2] == const1_rtx
> > - && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
> > - return "<shift>{<imodesuffix>}\t%0";
> > - else
> > - return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
> > + switch (get_attr_type (insn))
> > + {
> > + case TYPE_ISHIFT:
> > + if (operands[2] == const1_rtx
> > + && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
> > + return "<shift>{<imodesuffix>}\t%0";
> > + else
> > + return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
> > + case TYPE_MSKLOG:
> > + return "#";
> > + default:
> > + gcc_unreachable ();
> > + }
> > }
> > - [(set_attr "type" "ishift")
> > + [(set_attr "type" "ishift,msklog")
> > (set (attr "length_immediate")
> > (if_then_else
> > - (and (match_operand 2 "const1_operand")
> > + (and (and (match_operand 2 "const1_operand")
> > + (eq_attr "alternative" "0"))
> > (ior (match_test "TARGET_SHIFT1")
> > (match_test "optimize_function_for_size_p (cfun)")))
> > (const_string "0")
> > (const_string "*")))
> > + (set (attr "enabled")
> > + (if_then_else (eq_attr "alternative" "1")
> > + (symbol_ref "<CODE> == LSHIFTRT && TARGET_AVX512F
> > + && (<MODE>mode != QImode || TARGET_AVX512DQ)")
>
> Also here, please split out LSHIFTRT and perhaps use conditional
> constraint to avoid enabled attribute.
>
Changed.
> Uros.
>
> > + (const_string "*")))
> > (set_attr "mode" "<MODE>")])
> >
> > (define_insn "*<insn><mode>3_1_slp"
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index ab29999023d..f8759e4d758 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -1755,6 +1755,20 @@ (define_insn "k<code><mode>"
> > (set_attr "prefix" "vex")
> > (set_attr "mode" "<MODE>")])
> >
> > +(define_split
> > + [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> > + (any_lshift:SWI1248_AVX512BW
> > + (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
> > + (match_operand 2 "const_int_operand")))
> > + (clobber (reg:CC FLAGS_REG))]
> > + "TARGET_AVX512F && reload_completed"
> > + [(parallel
> > + [(set (match_dup 0)
> > + (any_lshift:SWI1248_AVX512BW
> > + (match_dup 1)
> > + (match_dup 2)))
> > + (unspec [(const_int 0)] UNSPEC_MASKOP)])])
> > +
> > (define_insn "ktest<mode>"
> > [(set (reg:CC FLAGS_REG)
> > (unspec:CC
> > diff --git a/gcc/testsuite/gcc.target/i386/mask-shift.c b/gcc/testsuite/gcc.target/i386/mask-shift.c
> > new file mode 100644
> > index 00000000000..4cb6ef37821
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/mask-shift.c
> > @@ -0,0 +1,83 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx512bw -mavx512dq -O2" } */
> > +
> > +#include<immintrin.h>
> > +void
> > +fooq (__m512i a, __m512i b, void* p)
> > +{
> > + __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
> > + m1 >>= 4;
> > + _mm512_mask_storeu_epi64 (p, m1, a);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {(?n)kshiftrb} "1" } } */
> > +
> > +void
> > +food (__m512i a, __m512i b, void* p)
> > +{
> > + __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
> > + m1 >>= 8;
> > + _mm512_mask_storeu_epi32 (p, m1, a);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {(?n)kshiftrw} "1" } } */
> > +
> > +void
> > +foow (__m512i a, __m512i b, void* p)
> > +{
> > + __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
> > + m1 >>= 16;
> > + _mm512_mask_storeu_epi16 (p, m1, a);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {(?n)kshiftrd} "1" } } */
> > +
> > +void
> > +foob (__m512i a, __m512i b, void* p)
> > +{
> > + __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
> > + m1 >>= 32;
> > + _mm512_mask_storeu_epi8 (p, m1, a);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {(?n)kshiftrq} "1" { target { ! ia32 } } } } */
> > +
> > +void
> > +fooq1 (__m512i a, __m512i b, void* p)
> > +{
> > + __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
> > + m1 <<= 4;
> > + _mm512_mask_storeu_epi64 (p, m1, a);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {(?n)kshiftlb} "1" } } */
> > +
> > +void
> > +food1 (__m512i a, __m512i b, void* p)
> > +{
> > + __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
> > + m1 <<= 8;
> > + _mm512_mask_storeu_epi32 (p, m1, a);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {(?n)kshiftlw} "1" } } */
> > +
> > +void
> > +foow1 (__m512i a, __m512i b, void* p)
> > +{
> > + __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
> > + m1 <<= 16;
> > + _mm512_mask_storeu_epi16 (p, m1, a);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {(?n)kshiftld} "1" } } */
> > +
> > +void
> > +foob1 (__m512i a, __m512i b, void* p)
> > +{
> > + __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
> > + m1 <<= 32;
> > + _mm512_mask_storeu_epi8 (p, m1, a);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {(?n)kshiftlq} "1" { target { ! ia32 } } } } */
> > --
> > 2.18.1
> >
Update patch.
gcc/ChangeLog:
* config/i386/constraints.md (Wb): New constraint.
(Ww): Ditto.
* config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
shift.
(*ashlqi3_1): Ditto.
(*<insn><mode>3_1): Split to ..
(*ashr<mode>3_1): this, ...
(*lshr<mode>3_1): and this, also extend this pattern to avx512
mask registers.
(*<insn><mode>3_1): Split to ..
(*ashr<mode>3_1): this, ...
(*lshr<mode>3_1): and this, also extend this pattern to avx512
mask registers.
* config/i386/sse.md (k<code><mode>): New define_split after
it to convert generic shift pattern to mask shift ones.
--
BR,
Hongtao
[-- Attachment #2: v2-0001-Support-logic-shift-left-right-for-avx512-mask-type.patch --]
[-- Type: text/x-patch, Size: 14599 bytes --]
From 6f731b7ec4244faf8c0c49197a78cfcbbdd42dc9 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Tue, 20 Jul 2021 18:32:35 +0800
Subject: [PATCH] Support logic shift left/right for avx512 mask type.
gcc/ChangeLog:
* config/i386/constraints.md (Wb): New constraint.
(Ww): Ditto.
* config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
shift.
(*ashlqi3_1): Ditto.
(*<insn><mode>3_1): Split to ..
(*ashr<mode>3_1): this, ...
(*lshr<mode>3_1): and this, also extend this pattern to avx512
mask registers.
(*<insn><mode>3_1): Split to ..
(*ashr<mode>3_1): this, ...
(*lshr<mode>3_1): and this, also extend this pattern to avx512
mask registers.
* config/i386/sse.md (k<code><mode>): New define_split after
it to convert generic shift pattern to mask shift ones.
gcc/testsuite/ChangeLog:
* gcc.target/i386/mask-shift.c: New test.
---
gcc/config/i386/constraints.md | 10 ++
gcc/config/i386/i386.md | 133 +++++++++++++++++----
gcc/config/i386/sse.md | 14 +++
gcc/testsuite/gcc.target/i386/mask-shift.c | 83 +++++++++++++
4 files changed, 216 insertions(+), 24 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/mask-shift.c
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 485e3f5b2cf..4aa28a5621c 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -222,6 +222,16 @@ (define_constraint "BC"
(match_operand 0 "vector_all_ones_operand"))))
;; Integer constant constraints.
+(define_constraint "Wb"
+ "Integer constant in the range 0 @dots{} 7, for 8-bit shifts."
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (ival, 0, 7)")))
+
+(define_constraint "Ww"
+ "Integer constant in the range 0 @dots{} 15, for 16-bit shifts."
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (ival, 0, 15)")))
+
(define_constraint "I"
"Integer constant in the range 0 @dots{} 31, for 32-bit shifts."
(and (match_code "const_int")
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8b809c49fe0..61aee28e2ea 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1136,6 +1136,7 @@ (define_mode_attr di [(SI "nF") (DI "Wd")])
;; Immediate operand constraint for shifts.
(define_mode_attr S [(QI "I") (HI "I") (SI "I") (DI "J") (TI "O")])
+(define_mode_attr KS [(QI "Wb") (HI "Ww") (SI "I") (DI "J")])
;; Print register name in the specified mode.
(define_mode_attr k [(QI "b") (HI "w") (SI "k") (DI "q")])
@@ -11088,9 +11089,9 @@ (define_insn "*bmi2_ashl<mode>3_1"
(set_attr "mode" "<MODE>")])
(define_insn "*ashl<mode>3_1"
- [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
- (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm")
- (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r")))
+ [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k")
+ (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k")
+ (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r,<KS>")))
(clobber (reg:CC FLAGS_REG))]
"ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)"
{
@@ -11098,6 +11099,7 @@ (define_insn "*ashl<mode>3_1"
{
case TYPE_LEA:
case TYPE_ISHIFTX:
+ case TYPE_MSKLOG:
return "#";
case TYPE_ALU:
@@ -11113,7 +11115,7 @@ (define_insn "*ashl<mode>3_1"
return "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
}
}
- [(set_attr "isa" "*,*,bmi2")
+ [(set_attr "isa" "*,*,bmi2,avx512bw")
(set (attr "type")
(cond [(eq_attr "alternative" "1")
(const_string "lea")
@@ -11123,6 +11125,8 @@ (define_insn "*ashl<mode>3_1"
(match_operand 0 "register_operand"))
(match_operand 2 "const1_operand"))
(const_string "alu")
+ (eq_attr "alternative" "3")
+ (const_string "msklog")
]
(const_string "ishift")))
(set (attr "length_immediate")
@@ -11218,15 +11222,16 @@ (define_split
"operands[2] = gen_lowpart (SImode, operands[2]);")
(define_insn "*ashlhi3_1"
- [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp")
- (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l")
- (match_operand:QI 2 "nonmemory_operand" "cI,M")))
+ [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp,?k")
+ (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l,k")
+ (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww")))
(clobber (reg:CC FLAGS_REG))]
"ix86_binary_operator_ok (ASHIFT, HImode, operands)"
{
switch (get_attr_type (insn))
{
case TYPE_LEA:
+ case TYPE_MSKLOG:
return "#";
case TYPE_ALU:
@@ -11241,9 +11246,12 @@ (define_insn "*ashlhi3_1"
return "sal{w}\t{%2, %0|%0, %2}";
}
}
- [(set (attr "type")
+ [(set_attr "isa" "*,*,avx512f")
+ (set (attr "type")
(cond [(eq_attr "alternative" "1")
(const_string "lea")
+ (eq_attr "alternative" "2")
+ (const_string "msklog")
(and (and (match_test "TARGET_DOUBLE_WITH_ADD")
(match_operand 0 "register_operand"))
(match_operand 2 "const1_operand"))
@@ -11259,18 +11267,19 @@ (define_insn "*ashlhi3_1"
(match_test "optimize_function_for_size_p (cfun)")))))
(const_string "0")
(const_string "*")))
- (set_attr "mode" "HI,SI")])
+ (set_attr "mode" "HI,SI,HI")])
(define_insn "*ashlqi3_1"
- [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp")
- (ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l")
- (match_operand:QI 2 "nonmemory_operand" "cI,cI,M")))
+ [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp,?k")
+ (ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l,k")
+ (match_operand:QI 2 "nonmemory_operand" "cI,cI,M,Wb")))
(clobber (reg:CC FLAGS_REG))]
"ix86_binary_operator_ok (ASHIFT, QImode, operands)"
{
switch (get_attr_type (insn))
{
case TYPE_LEA:
+ case TYPE_MSKLOG:
return "#";
case TYPE_ALU:
@@ -11298,9 +11307,12 @@ (define_insn "*ashlqi3_1"
}
}
}
- [(set (attr "type")
+ [(set_attr "isa" "*,*,*,avx512dq")
+ (set (attr "type")
(cond [(eq_attr "alternative" "2")
(const_string "lea")
+ (eq_attr "alternative" "3")
+ (const_string "msklog")
(and (and (match_test "TARGET_DOUBLE_WITH_ADD")
(match_operand 0 "register_operand"))
(match_operand 2 "const1_operand"))
@@ -11316,7 +11328,7 @@ (define_insn "*ashlqi3_1"
(match_test "optimize_function_for_size_p (cfun)")))))
(const_string "0")
(const_string "*")))
- (set_attr "mode" "QI,SI,SI")
+ (set_attr "mode" "QI,SI,SI,QI")
;; Potential partial reg stall on alternative 1.
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "1")
@@ -11818,13 +11830,13 @@ (define_insn "*bmi2_<insn><mode>3_1"
[(set_attr "type" "ishiftx")
(set_attr "mode" "<MODE>")])
-(define_insn "*<insn><mode>3_1"
+(define_insn "*ashr<mode>3_1"
[(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
- (any_shiftrt:SWI48
+ (ashiftrt:SWI48
(match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
(match_operand:QI 2 "nonmemory_operand" "c<S>,r")))
(clobber (reg:CC FLAGS_REG))]
- "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+ "ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands)"
{
switch (get_attr_type (insn))
{
@@ -11834,9 +11846,9 @@ (define_insn "*<insn><mode>3_1"
default:
if (operands[2] == const1_rtx
&& (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
- return "<shift>{<imodesuffix>}\t%0";
+ return "sar{<imodesuffix>}\t%0";
else
- return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
+ return "sar{<imodesuffix>}\t{%2, %0|%0, %2}";
}
}
[(set_attr "isa" "*,bmi2")
@@ -11850,6 +11862,40 @@ (define_insn "*<insn><mode>3_1"
(const_string "*")))
(set_attr "mode" "<MODE>")])
+(define_insn "*lshr<mode>3_1"
+ [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,?k")
+ (lshiftrt:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,k")
+ (match_operand:QI 2 "nonmemory_operand" "c<S>,r,<KS>")))
+ (clobber (reg:CC FLAGS_REG))]
+ "ix86_binary_operator_ok (LSHIFTRT, <MODE>mode, operands)"
+{
+ switch (get_attr_type (insn))
+ {
+ case TYPE_ISHIFTX:
+ case TYPE_MSKLOG:
+ return "#";
+
+ default:
+ if (operands[2] == const1_rtx
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ return "shr{<imodesuffix>}\t%0";
+ else
+ return "shr{<imodesuffix>}\t{%2, %0|%0, %2}";
+ }
+}
+ [(set_attr "isa" "*,bmi2,avx512bw")
+ (set_attr "type" "ishift,ishiftx,msklog")
+ (set (attr "length_immediate")
+ (if_then_else
+ (and (and (match_operand 2 "const1_operand")
+ (eq_attr "alternative" "0"))
+ (ior (match_test "TARGET_SHIFT1")
+ (match_test "optimize_function_for_size_p (cfun)")))
+ (const_string "0")
+ (const_string "*")))
+ (set_attr "mode" "<MODE>")])
+
;; Convert shift to the shiftx pattern to avoid flags dependency.
(define_split
[(set (match_operand:SWI48 0 "register_operand")
@@ -11915,19 +11961,19 @@ (define_split
(zero_extend:DI (any_shiftrt:SI (match_dup 1) (match_dup 2))))]
"operands[2] = gen_lowpart (SImode, operands[2]);")
-(define_insn "*<insn><mode>3_1"
+(define_insn "*ashr<mode>3_1"
[(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m")
- (any_shiftrt:SWI12
+ (ashiftrt:SWI12
(match_operand:SWI12 1 "nonimmediate_operand" "0")
(match_operand:QI 2 "nonmemory_operand" "c<S>")))
(clobber (reg:CC FLAGS_REG))]
- "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+ "ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands)"
{
if (operands[2] == const1_rtx
&& (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
- return "<shift>{<imodesuffix>}\t%0";
+ return "sar{<imodesuffix>}\t%0";
else
- return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
+ return "sar{<imodesuffix>}\t{%2, %0|%0, %2}";
}
[(set_attr "type" "ishift")
(set (attr "length_immediate")
@@ -11939,6 +11985,45 @@ (define_insn "*<insn><mode>3_1"
(const_string "*")))
(set_attr "mode" "<MODE>")])
+(define_insn "*lshr<mode>3_1"
+ [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m, ?k")
+ (lshiftrt:SWI12
+ (match_operand:SWI12 1 "nonimmediate_operand" "0, k")
+ (match_operand:QI 2 "nonmemory_operand" "c<S>, <KS>")))
+ (clobber (reg:CC FLAGS_REG))]
+ "ix86_binary_operator_ok (LSHIFTRT, <MODE>mode, operands)"
+{
+ switch (get_attr_type (insn))
+ {
+ case TYPE_ISHIFT:
+ if (operands[2] == const1_rtx
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ return "shr{<imodesuffix>}\t%0";
+ else
+ return "shr{<imodesuffix>}\t{%2, %0|%0, %2}";
+ case TYPE_MSKLOG:
+ return "#";
+ default:
+ gcc_unreachable ();
+ }
+}
+ [(set (attr "isa")
+ (if_then_else (eq_attr "alternative" "1")
+ (if_then_else (match_test "<MODE>mode == QImode")
+ (const_string "avx512dq")
+ (const_string "avx512f"))
+ (const_string "*")))
+ (set_attr "type" "ishift,msklog")
+ (set (attr "length_immediate")
+ (if_then_else
+ (and (and (match_operand 2 "const1_operand")
+ (eq_attr "alternative" "0"))
+ (ior (match_test "TARGET_SHIFT1")
+ (match_test "optimize_function_for_size_p (cfun)")))
+ (const_string "0")
+ (const_string "*")))
+ (set_attr "mode" "<MODE>")])
+
(define_insn "*<insn><mode>3_1_slp"
[(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+<r>"))
(any_shiftrt:SWI12 (match_operand:SWI12 1 "register_operand" "0")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ab29999023d..f8759e4d758 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1755,6 +1755,20 @@ (define_insn "k<code><mode>"
(set_attr "prefix" "vex")
(set_attr "mode" "<MODE>")])
+(define_split
+ [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
+ (any_lshift:SWI1248_AVX512BW
+ (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
+ (match_operand 2 "const_int_operand")))
+ (clobber (reg:CC FLAGS_REG))]
+ "TARGET_AVX512F && reload_completed"
+ [(parallel
+ [(set (match_dup 0)
+ (any_lshift:SWI1248_AVX512BW
+ (match_dup 1)
+ (match_dup 2)))
+ (unspec [(const_int 0)] UNSPEC_MASKOP)])])
+
(define_insn "ktest<mode>"
[(set (reg:CC FLAGS_REG)
(unspec:CC
diff --git a/gcc/testsuite/gcc.target/i386/mask-shift.c b/gcc/testsuite/gcc.target/i386/mask-shift.c
new file mode 100644
index 00000000000..4cb6ef37821
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mask-shift.c
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512dq -O2" } */
+
+#include<immintrin.h>
+void
+fooq (__m512i a, __m512i b, void* p)
+{
+ __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
+ m1 >>= 4;
+ _mm512_mask_storeu_epi64 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftrb} "1" } } */
+
+void
+food (__m512i a, __m512i b, void* p)
+{
+ __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
+ m1 >>= 8;
+ _mm512_mask_storeu_epi32 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftrw} "1" } } */
+
+void
+foow (__m512i a, __m512i b, void* p)
+{
+ __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
+ m1 >>= 16;
+ _mm512_mask_storeu_epi16 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftrd} "1" } } */
+
+void
+foob (__m512i a, __m512i b, void* p)
+{
+ __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
+ m1 >>= 32;
+ _mm512_mask_storeu_epi8 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftrq} "1" { target { ! ia32 } } } } */
+
+void
+fooq1 (__m512i a, __m512i b, void* p)
+{
+ __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
+ m1 <<= 4;
+ _mm512_mask_storeu_epi64 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftlb} "1" } } */
+
+void
+food1 (__m512i a, __m512i b, void* p)
+{
+ __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
+ m1 <<= 8;
+ _mm512_mask_storeu_epi32 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftlw} "1" } } */
+
+void
+foow1 (__m512i a, __m512i b, void* p)
+{
+ __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
+ m1 <<= 16;
+ _mm512_mask_storeu_epi16 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftld} "1" } } */
+
+void
+foob1 (__m512i a, __m512i b, void* p)
+{
+ __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
+ m1 <<= 32;
+ _mm512_mask_storeu_epi8 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftlq} "1" { target { ! ia32 } } } } */
--
2.18.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Support logic shift left/right for avx512 mask type.
2021-07-21 3:11 ` Hongtao Liu
@ 2021-07-21 8:22 ` Uros Bizjak
2021-07-22 1:32 ` Liu, Hongtao
0 siblings, 1 reply; 6+ messages in thread
From: Uros Bizjak @ 2021-07-21 8:22 UTC (permalink / raw)
To: Hongtao Liu; +Cc: liuhongt, gcc-patches, H. J. Lu, Richard Biener
On Wed, Jul 21, 2021 at 5:05 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Jul 20, 2021 at 9:41 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Tue, Jul 20, 2021 at 2:33 PM liuhongt <hongtao.liu@intel.com> wrote:
> > >
> > > Hi:
> > > As mention in https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
> > >
> > > ----cut start-----
> > > > note for the lowpart we can just view-convert away the excess bits,
> > > > fully re-using the mask. We generate surprisingly "good" code:
> > > >
> > > > kmovb %k1, %edi
> > > > shrb $4, %dil
> > > > kmovb %edi, %k2
> > > >
> > > > besides the lack of using kshiftrb. I guess we're just lacking
> > > > a mask register alternative for
> > > Yes, we can do it similar as kor/kand/kxor.
> > > ---cut end--------
> > >
> > > Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/constraints.md (Wb): New constraint.
> > > (Ww): Ditto.
> > > * config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
> > > shift.
> > > (*ashlqi3_1): Ditto.
> > > (*<insn><mode>3_1): Ditto.
> > > (*<insn><mode>3_1): Ditto.
> > > * config/i386/sse.md (k<code><mode>): New define_split after
> > > it to convert generic shift pattern to mask shift ones.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/mask-shift.c: New test.
+(define_insn "*lshr<mode>3_1"
+ [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m, ?k")
+ (lshiftrt:SWI12
+ (match_operand:SWI12 1 "nonimmediate_operand" "0, k")
+ (match_operand:QI 2 "nonmemory_operand" "c<S>, <KS>")))
+ (clobber (reg:CC FLAGS_REG))]
+ "ix86_binary_operator_ok (LSHIFTRT, <MODE>mode, operands)"
Also split this one to QImode and HImode to avoid conditions in isa attribute.
OK with this change.
Thanks,
Uros.
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH] Support logic shift left/right for avx512 mask type.
2021-07-21 8:22 ` Uros Bizjak
@ 2021-07-22 1:32 ` Liu, Hongtao
2021-07-22 6:38 ` Richard Biener
0 siblings, 1 reply; 6+ messages in thread
From: Liu, Hongtao @ 2021-07-22 1:32 UTC (permalink / raw)
To: Uros Bizjak, Hongtao Liu; +Cc: gcc-patches, H. J. Lu, Richard Biener
[-- Attachment #1: Type: text/plain, Size: 2487 bytes --]
>-----Original Message-----
>From: Uros Bizjak <ubizjak@gmail.com>
>Sent: Wednesday, July 21, 2021 4:23 PM
>To: Hongtao Liu <crazylht@gmail.com>
>Cc: Liu, Hongtao <hongtao.liu@intel.com>; gcc-patches@gcc.gnu.org; H. J. Lu
><hjl.tools@gmail.com>; Richard Biener <rguenther@suse.de>
>Subject: Re: [PATCH] Support logic shift left/right for avx512 mask type.
>
>On Wed, Jul 21, 2021 at 5:05 AM Hongtao Liu <crazylht@gmail.com> wrote:
>>
>> On Tue, Jul 20, 2021 at 9:41 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>> >
>> > On Tue, Jul 20, 2021 at 2:33 PM liuhongt <hongtao.liu@intel.com> wrote:
>> > >
>> > > Hi:
>> > > As mention in
>> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
>> > >
>> > > ----cut start-----
>> > > > note for the lowpart we can just view-convert away the excess
>> > > > bits, fully re-using the mask. We generate surprisingly "good" code:
>> > > >
>> > > > kmovb %k1, %edi
>> > > > shrb $4, %dil
>> > > > kmovb %edi, %k2
>> > > >
>> > > > besides the lack of using kshiftrb. I guess we're just lacking
>> > > > a mask register alternative for
>> > > Yes, we can do it similar as kor/kand/kxor.
>> > > ---cut end--------
>> > >
>> > > Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
>> > > Ok for trunk?
>> > >
>> > > gcc/ChangeLog:
>> > >
>> > > * config/i386/constraints.md (Wb): New constraint.
>> > > (Ww): Ditto.
>> > > * config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
>> > > shift.
>> > > (*ashlqi3_1): Ditto.
>> > > (*<insn><mode>3_1): Ditto.
>> > > (*<insn><mode>3_1): Ditto.
>> > > * config/i386/sse.md (k<code><mode>): New define_split after
>> > > it to convert generic shift pattern to mask shift ones.
>> > >
>> > > gcc/testsuite/ChangeLog:
>> > >
>> > > * gcc.target/i386/mask-shift.c: New test.
>
>
>+(define_insn "*lshr<mode>3_1"
>+ [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m, ?k")
>+ (lshiftrt:SWI12
>+ (match_operand:SWI12 1 "nonimmediate_operand" "0, k")
>+ (match_operand:QI 2 "nonmemory_operand" "c<S>, <KS>")))
>+ (clobber (reg:CC FLAGS_REG))]
>+ "ix86_binary_operator_ok (LSHIFTRT, <MODE>mode, operands)"
>
>Also split this one to QImode and HImode to avoid conditions in isa attribute.
>
>OK with this change.
>
Thanks for the review, here's the patch I'm check in.
>Thanks,
>Uros.
[-- Attachment #2: V3-0001-Support-logic-shift-left-right-for-avx512-mask-type.patch --]
[-- Type: application/octet-stream, Size: 15479 bytes --]
From b5eecfe8dc1c07e5b52772a9a492ad6a1dad5404 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Tue, 20 Jul 2021 18:32:35 +0800
Subject: [PATCH] Support logic shift left/right for avx512 mask type.
gcc/ChangeLog:
* config/i386/constraints.md (Wb): New constraint.
(Ww): Ditto.
* config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
shift.
(*ashlqi3_1): Ditto.
(*<insn><mode>3_1): Split to ..
(*ashr<mode>3_1): this, ...
(*lshr<mode>3_1): and this, also extend this pattern to avx512
mask registers.
(*<insn><mode>3_1): Split to ..
(*ashr<mode>3_1): this, ...
(*lshrqi3_1): and this, also extend this pattern to avx512
mask registers.
(*lshrhi3_1): And this, also extend this pattern to avx512
mask registers.
* config/i386/sse.md (k<code><mode>): New define_split after
it to convert generic shift pattern to mask shift ones.
gcc/testsuite/ChangeLog:
* gcc.target/i386/mask-shift.c: New test.
---
gcc/config/i386/constraints.md | 10 ++
gcc/config/i386/i386.md | 162 ++++++++++++++++++---
gcc/config/i386/sse.md | 14 ++
gcc/testsuite/gcc.target/i386/mask-shift.c | 83 +++++++++++
4 files changed, 245 insertions(+), 24 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/mask-shift.c
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 485e3f5b2cf..4aa28a5621c 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -222,6 +222,16 @@ (define_constraint "BC"
(match_operand 0 "vector_all_ones_operand"))))
;; Integer constant constraints.
+(define_constraint "Wb"
+ "Integer constant in the range 0 @dots{} 7, for 8-bit shifts."
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (ival, 0, 7)")))
+
+(define_constraint "Ww"
+ "Integer constant in the range 0 @dots{} 15, for 16-bit shifts."
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (ival, 0, 15)")))
+
(define_constraint "I"
"Integer constant in the range 0 @dots{} 31, for 32-bit shifts."
(and (match_code "const_int")
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8b809c49fe0..44ae18eb4b2 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1136,6 +1136,7 @@ (define_mode_attr di [(SI "nF") (DI "Wd")])
;; Immediate operand constraint for shifts.
(define_mode_attr S [(QI "I") (HI "I") (SI "I") (DI "J") (TI "O")])
+(define_mode_attr KS [(QI "Wb") (HI "Ww") (SI "I") (DI "J")])
;; Print register name in the specified mode.
(define_mode_attr k [(QI "b") (HI "w") (SI "k") (DI "q")])
@@ -11088,9 +11089,9 @@ (define_insn "*bmi2_ashl<mode>3_1"
(set_attr "mode" "<MODE>")])
(define_insn "*ashl<mode>3_1"
- [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
- (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm")
- (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r")))
+ [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k")
+ (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k")
+ (match_operand:QI 2 "nonmemory_operand" "c<S>,M,r,<KS>")))
(clobber (reg:CC FLAGS_REG))]
"ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)"
{
@@ -11098,6 +11099,7 @@ (define_insn "*ashl<mode>3_1"
{
case TYPE_LEA:
case TYPE_ISHIFTX:
+ case TYPE_MSKLOG:
return "#";
case TYPE_ALU:
@@ -11113,7 +11115,7 @@ (define_insn "*ashl<mode>3_1"
return "sal{<imodesuffix>}\t{%2, %0|%0, %2}";
}
}
- [(set_attr "isa" "*,*,bmi2")
+ [(set_attr "isa" "*,*,bmi2,avx512bw")
(set (attr "type")
(cond [(eq_attr "alternative" "1")
(const_string "lea")
@@ -11123,6 +11125,8 @@ (define_insn "*ashl<mode>3_1"
(match_operand 0 "register_operand"))
(match_operand 2 "const1_operand"))
(const_string "alu")
+ (eq_attr "alternative" "3")
+ (const_string "msklog")
]
(const_string "ishift")))
(set (attr "length_immediate")
@@ -11218,15 +11222,16 @@ (define_split
"operands[2] = gen_lowpart (SImode, operands[2]);")
(define_insn "*ashlhi3_1"
- [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp")
- (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l")
- (match_operand:QI 2 "nonmemory_operand" "cI,M")))
+ [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp,?k")
+ (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l,k")
+ (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww")))
(clobber (reg:CC FLAGS_REG))]
"ix86_binary_operator_ok (ASHIFT, HImode, operands)"
{
switch (get_attr_type (insn))
{
case TYPE_LEA:
+ case TYPE_MSKLOG:
return "#";
case TYPE_ALU:
@@ -11241,9 +11246,12 @@ (define_insn "*ashlhi3_1"
return "sal{w}\t{%2, %0|%0, %2}";
}
}
- [(set (attr "type")
+ [(set_attr "isa" "*,*,avx512f")
+ (set (attr "type")
(cond [(eq_attr "alternative" "1")
(const_string "lea")
+ (eq_attr "alternative" "2")
+ (const_string "msklog")
(and (and (match_test "TARGET_DOUBLE_WITH_ADD")
(match_operand 0 "register_operand"))
(match_operand 2 "const1_operand"))
@@ -11259,18 +11267,19 @@ (define_insn "*ashlhi3_1"
(match_test "optimize_function_for_size_p (cfun)")))))
(const_string "0")
(const_string "*")))
- (set_attr "mode" "HI,SI")])
+ (set_attr "mode" "HI,SI,HI")])
(define_insn "*ashlqi3_1"
- [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp")
- (ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l")
- (match_operand:QI 2 "nonmemory_operand" "cI,cI,M")))
+ [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp,?k")
+ (ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l,k")
+ (match_operand:QI 2 "nonmemory_operand" "cI,cI,M,Wb")))
(clobber (reg:CC FLAGS_REG))]
"ix86_binary_operator_ok (ASHIFT, QImode, operands)"
{
switch (get_attr_type (insn))
{
case TYPE_LEA:
+ case TYPE_MSKLOG:
return "#";
case TYPE_ALU:
@@ -11298,9 +11307,12 @@ (define_insn "*ashlqi3_1"
}
}
}
- [(set (attr "type")
+ [(set_attr "isa" "*,*,*,avx512dq")
+ (set (attr "type")
(cond [(eq_attr "alternative" "2")
(const_string "lea")
+ (eq_attr "alternative" "3")
+ (const_string "msklog")
(and (and (match_test "TARGET_DOUBLE_WITH_ADD")
(match_operand 0 "register_operand"))
(match_operand 2 "const1_operand"))
@@ -11316,7 +11328,7 @@ (define_insn "*ashlqi3_1"
(match_test "optimize_function_for_size_p (cfun)")))))
(const_string "0")
(const_string "*")))
- (set_attr "mode" "QI,SI,SI")
+ (set_attr "mode" "QI,SI,SI,QI")
;; Potential partial reg stall on alternative 1.
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "1")
@@ -11818,13 +11830,13 @@ (define_insn "*bmi2_<insn><mode>3_1"
[(set_attr "type" "ishiftx")
(set_attr "mode" "<MODE>")])
-(define_insn "*<insn><mode>3_1"
+(define_insn "*ashr<mode>3_1"
[(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
- (any_shiftrt:SWI48
+ (ashiftrt:SWI48
(match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
(match_operand:QI 2 "nonmemory_operand" "c<S>,r")))
(clobber (reg:CC FLAGS_REG))]
- "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+ "ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands)"
{
switch (get_attr_type (insn))
{
@@ -11834,9 +11846,9 @@ (define_insn "*<insn><mode>3_1"
default:
if (operands[2] == const1_rtx
&& (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
- return "<shift>{<imodesuffix>}\t%0";
+ return "sar{<imodesuffix>}\t%0";
else
- return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
+ return "sar{<imodesuffix>}\t{%2, %0|%0, %2}";
}
}
[(set_attr "isa" "*,bmi2")
@@ -11850,6 +11862,40 @@ (define_insn "*<insn><mode>3_1"
(const_string "*")))
(set_attr "mode" "<MODE>")])
+(define_insn "*lshr<mode>3_1"
+ [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,?k")
+ (lshiftrt:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,k")
+ (match_operand:QI 2 "nonmemory_operand" "c<S>,r,<KS>")))
+ (clobber (reg:CC FLAGS_REG))]
+ "ix86_binary_operator_ok (LSHIFTRT, <MODE>mode, operands)"
+{
+ switch (get_attr_type (insn))
+ {
+ case TYPE_ISHIFTX:
+ case TYPE_MSKLOG:
+ return "#";
+
+ default:
+ if (operands[2] == const1_rtx
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ return "shr{<imodesuffix>}\t%0";
+ else
+ return "shr{<imodesuffix>}\t{%2, %0|%0, %2}";
+ }
+}
+ [(set_attr "isa" "*,bmi2,avx512bw")
+ (set_attr "type" "ishift,ishiftx,msklog")
+ (set (attr "length_immediate")
+ (if_then_else
+ (and (and (match_operand 2 "const1_operand")
+ (eq_attr "alternative" "0"))
+ (ior (match_test "TARGET_SHIFT1")
+ (match_test "optimize_function_for_size_p (cfun)")))
+ (const_string "0")
+ (const_string "*")))
+ (set_attr "mode" "<MODE>")])
+
;; Convert shift to the shiftx pattern to avoid flags dependency.
(define_split
[(set (match_operand:SWI48 0 "register_operand")
@@ -11915,19 +11961,19 @@ (define_split
(zero_extend:DI (any_shiftrt:SI (match_dup 1) (match_dup 2))))]
"operands[2] = gen_lowpart (SImode, operands[2]);")
-(define_insn "*<insn><mode>3_1"
+(define_insn "*ashr<mode>3_1"
[(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m")
- (any_shiftrt:SWI12
+ (ashiftrt:SWI12
(match_operand:SWI12 1 "nonimmediate_operand" "0")
(match_operand:QI 2 "nonmemory_operand" "c<S>")))
(clobber (reg:CC FLAGS_REG))]
- "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+ "ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands)"
{
if (operands[2] == const1_rtx
&& (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
- return "<shift>{<imodesuffix>}\t%0";
+ return "sar{<imodesuffix>}\t%0";
else
- return "<shift>{<imodesuffix>}\t{%2, %0|%0, %2}";
+ return "sar{<imodesuffix>}\t{%2, %0|%0, %2}";
}
[(set_attr "type" "ishift")
(set (attr "length_immediate")
@@ -11939,6 +11985,74 @@ (define_insn "*<insn><mode>3_1"
(const_string "*")))
(set_attr "mode" "<MODE>")])
+(define_insn "*lshrqi3_1"
+ [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,?k")
+ (lshiftrt:QI
+ (match_operand:QI 1 "nonimmediate_operand" "0, k")
+ (match_operand:QI 2 "nonmemory_operand" "cI,Wb")))
+ (clobber (reg:CC FLAGS_REG))]
+ "ix86_binary_operator_ok (LSHIFTRT, QImode, operands)"
+{
+ switch (get_attr_type (insn))
+ {
+ case TYPE_ISHIFT:
+ if (operands[2] == const1_rtx
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ return "shr{b}\t%0";
+ else
+ return "shr{b}\t{%2, %0|%0, %2}";
+ case TYPE_MSKLOG:
+ return "#";
+ default:
+ gcc_unreachable ();
+ }
+}
+ [(set_attr "isa" "*,avx512dq")
+ (set_attr "type" "ishift,msklog")
+ (set (attr "length_immediate")
+ (if_then_else
+ (and (and (match_operand 2 "const1_operand")
+ (eq_attr "alternative" "0"))
+ (ior (match_test "TARGET_SHIFT1")
+ (match_test "optimize_function_for_size_p (cfun)")))
+ (const_string "0")
+ (const_string "*")))
+ (set_attr "mode" "QI")])
+
+(define_insn "*lshrhi3_1"
+ [(set (match_operand:HI 0 "nonimmediate_operand" "=rm, ?k")
+ (lshiftrt:HI
+ (match_operand:HI 1 "nonimmediate_operand" "0, k")
+ (match_operand:QI 2 "nonmemory_operand" "cI, Ww")))
+ (clobber (reg:CC FLAGS_REG))]
+ "ix86_binary_operator_ok (LSHIFTRT, HImode, operands)"
+{
+ switch (get_attr_type (insn))
+ {
+ case TYPE_ISHIFT:
+ if (operands[2] == const1_rtx
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+ return "shr{w}\t%0";
+ else
+ return "shr{w}\t{%2, %0|%0, %2}";
+ case TYPE_MSKLOG:
+ return "#";
+ default:
+ gcc_unreachable ();
+ }
+}
+ [(set_attr "isa" "*, avx512f")
+ (set_attr "type" "ishift,msklog")
+ (set (attr "length_immediate")
+ (if_then_else
+ (and (and (match_operand 2 "const1_operand")
+ (eq_attr "alternative" "0"))
+ (ior (match_test "TARGET_SHIFT1")
+ (match_test "optimize_function_for_size_p (cfun)")))
+ (const_string "0")
+ (const_string "*")))
+ (set_attr "mode" "HI")])
+
(define_insn "*<insn><mode>3_1_slp"
[(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+<r>"))
(any_shiftrt:SWI12 (match_operand:SWI12 1 "register_operand" "0")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ab29999023d..f8759e4d758 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1755,6 +1755,20 @@ (define_insn "k<code><mode>"
(set_attr "prefix" "vex")
(set_attr "mode" "<MODE>")])
+(define_split
+ [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
+ (any_lshift:SWI1248_AVX512BW
+ (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
+ (match_operand 2 "const_int_operand")))
+ (clobber (reg:CC FLAGS_REG))]
+ "TARGET_AVX512F && reload_completed"
+ [(parallel
+ [(set (match_dup 0)
+ (any_lshift:SWI1248_AVX512BW
+ (match_dup 1)
+ (match_dup 2)))
+ (unspec [(const_int 0)] UNSPEC_MASKOP)])])
+
(define_insn "ktest<mode>"
[(set (reg:CC FLAGS_REG)
(unspec:CC
diff --git a/gcc/testsuite/gcc.target/i386/mask-shift.c b/gcc/testsuite/gcc.target/i386/mask-shift.c
new file mode 100644
index 00000000000..4cb6ef37821
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mask-shift.c
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512dq -O2" } */
+
+#include<immintrin.h>
+void
+fooq (__m512i a, __m512i b, void* p)
+{
+ __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
+ m1 >>= 4;
+ _mm512_mask_storeu_epi64 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftrb} "1" } } */
+
+void
+food (__m512i a, __m512i b, void* p)
+{
+ __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
+ m1 >>= 8;
+ _mm512_mask_storeu_epi32 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftrw} "1" } } */
+
+void
+foow (__m512i a, __m512i b, void* p)
+{
+ __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
+ m1 >>= 16;
+ _mm512_mask_storeu_epi16 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftrd} "1" } } */
+
+void
+foob (__m512i a, __m512i b, void* p)
+{
+ __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
+ m1 >>= 32;
+ _mm512_mask_storeu_epi8 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftrq} "1" { target { ! ia32 } } } } */
+
+void
+fooq1 (__m512i a, __m512i b, void* p)
+{
+ __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
+ m1 <<= 4;
+ _mm512_mask_storeu_epi64 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftlb} "1" } } */
+
+void
+food1 (__m512i a, __m512i b, void* p)
+{
+ __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
+ m1 <<= 8;
+ _mm512_mask_storeu_epi32 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftlw} "1" } } */
+
+void
+foow1 (__m512i a, __m512i b, void* p)
+{
+ __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
+ m1 <<= 16;
+ _mm512_mask_storeu_epi16 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftld} "1" } } */
+
+void
+foob1 (__m512i a, __m512i b, void* p)
+{
+ __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
+ m1 <<= 32;
+ _mm512_mask_storeu_epi8 (p, m1, a);
+}
+
+/* { dg-final { scan-assembler-times {(?n)kshiftlq} "1" { target { ! ia32 } } } } */
--
2.18.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH] Support logic shift left/right for avx512 mask type.
2021-07-22 1:32 ` Liu, Hongtao
@ 2021-07-22 6:38 ` Richard Biener
0 siblings, 0 replies; 6+ messages in thread
From: Richard Biener @ 2021-07-22 6:38 UTC (permalink / raw)
To: Liu, Hongtao; +Cc: Uros Bizjak, Hongtao Liu, gcc-patches, H. J. Lu
On Thu, 22 Jul 2021, Liu, Hongtao wrote:
>
>
> >-----Original Message-----
> >From: Uros Bizjak <ubizjak@gmail.com>
> >Sent: Wednesday, July 21, 2021 4:23 PM
> >To: Hongtao Liu <crazylht@gmail.com>
> >Cc: Liu, Hongtao <hongtao.liu@intel.com>; gcc-patches@gcc.gnu.org; H. J. Lu
> ><hjl.tools@gmail.com>; Richard Biener <rguenther@suse.de>
> >Subject: Re: [PATCH] Support logic shift left/right for avx512 mask type.
> >
> >On Wed, Jul 21, 2021 at 5:05 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >>
> >> On Tue, Jul 20, 2021 at 9:41 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >> >
> >> > On Tue, Jul 20, 2021 at 2:33 PM liuhongt <hongtao.liu@intel.com> wrote:
> >> > >
> >> > > Hi:
> >> > > As mention in
> >> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
> >> > >
> >> > > ----cut start-----
> >> > > > note for the lowpart we can just view-convert away the excess
> >> > > > bits, fully re-using the mask. We generate surprisingly "good" code:
> >> > > >
> >> > > > kmovb %k1, %edi
> >> > > > shrb $4, %dil
> >> > > > kmovb %edi, %k2
> >> > > >
> >> > > > besides the lack of using kshiftrb. I guess we're just lacking
> >> > > > a mask register alternative for
> >> > > Yes, we can do it similar as kor/kand/kxor.
> >> > > ---cut end--------
> >> > >
> >> > > Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
> >> > > Ok for trunk?
> >> > >
> >> > > gcc/ChangeLog:
> >> > >
> >> > > * config/i386/constraints.md (Wb): New constraint.
> >> > > (Ww): Ditto.
> >> > > * config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
> >> > > shift.
> >> > > (*ashlqi3_1): Ditto.
> >> > > (*<insn><mode>3_1): Ditto.
> >> > > (*<insn><mode>3_1): Ditto.
> >> > > * config/i386/sse.md (k<code><mode>): New define_split after
> >> > > it to convert generic shift pattern to mask shift ones.
> >> > >
> >> > > gcc/testsuite/ChangeLog:
> >> > >
> >> > > * gcc.target/i386/mask-shift.c: New test.
> >
> >
> >+(define_insn "*lshr<mode>3_1"
> >+ [(set (match_operand:SWI12 0 "nonimmediate_operand" "=<r>m, ?k")
> >+ (lshiftrt:SWI12
> >+ (match_operand:SWI12 1 "nonimmediate_operand" "0, k")
> >+ (match_operand:QI 2 "nonmemory_operand" "c<S>, <KS>")))
> >+ (clobber (reg:CC FLAGS_REG))]
> >+ "ix86_binary_operator_ok (LSHIFTRT, <MODE>mode, operands)"
> >
> >Also split this one to QImode and HImode to avoid conditions in isa attribute.
> >
> >OK with this change.
> >
>
> Thanks for the review, here's the patch I'm check in.
Works with my experimental patches, thanks!
Richard.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-07-22 6:38 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-20 12:33 [PATCH] Support logic shift left/right for avx512 mask type liuhongt
2021-07-20 13:40 ` Uros Bizjak
2021-07-21 3:11 ` Hongtao Liu
2021-07-21 8:22 ` Uros Bizjak
2021-07-22 1:32 ` Liu, Hongtao
2021-07-22 6:38 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).