public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [x86_64 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.
@ 2023-05-29 18:17 Roger Sayle
  2023-05-30  6:16 ` Jakub Jelinek
  2023-05-30  7:39 ` Uros Bizjak
  0 siblings, 2 replies; 4+ messages in thread
From: Roger Sayle @ 2023-05-29 18:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: 'Uros Bizjak'

[-- Attachment #1: Type: text/plain, Size: 3692 bytes --]


This is my proposed minimal fix for PR target/109973 (hopefully suitable
for backporting) that follows Jakub Jelinek's suggestion that we introduce
CCZmode and CCCmode variants of ptest and vptest, so that the i386
backend treats [v]ptest instructions similarly to testl instructions;
using different CCmodes to indicate which condition flags are desired,
and then relying on the RTL cmpelim pass to eliminate redundant tests.

This conveniently matches Intel's intrinsics, that provide different
functions for retrieving different flags, _mm_testz_si128 tests the
Z flag, _mm_testc_si128 tests the carry flag.  Currently we use the
same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
optimization is only valid when the ptest/vptest instruction is used to
set/test the Z flag.

The downside, as predicted by Jakub, is that GCC's cmpelim pass is
currently COMPARE-centric and not able to merge the ptests from expressions
such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
known issue, PR target/80040.  I've some follow-up patches to improve
things, but this first patch fixes the wrong-code regression, replacing
it with a rare missed-optimization (hopefully suitable for GCC 13).

The only change that was unanticipated was the tweak to ix86_match_ccmode.
Oddly, CCZmode is allowable for CCmode, but CCCmode isn't.  Given that
CCZmode means just the Z flag, CCCmode means just the C flag, and
CCmode means all the flags, I'm guessing this asymmetry is unintentional.
Perhaps a super-safe fix is to explicitly test for CCZmode, CCCmode or
CCmode
in the *<sse4_1>_ptest<mode> pattern's predicate, and not attempt to
re-use ix86_match_ccmode?

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-05-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR targt/109973
        * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
        CODE_for_sse4_1_ptestzv2di.
        (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
        (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
        (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
        * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
        when expanding UNSPEC_PTEST to compare against zero.
        * config/i386/i386-features.cc (scalar_chain::convert_compare):
        Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
        (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
result.
        (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
        * config/i386/i386.cc (ix86_match_ccmode): Allow the SET_SRC to be
        an UNSPEC, in addition to a COMPARE.  Consider CCCmode to be a form
        of CCmode.
        * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
        to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
        (*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn.
        Remove ":CC" flags specification, and use ix86_match_ccmode instead.
        (<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
        (<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
        (<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
        current behavior.
        (*ptest<mode>_and): Specify CCZ to only perform this optimization
        when only the Z flag is required.

gcc/testsuite/ChangeLog
        PR targt/109973
        * gcc.target/i386/pr109973-1.c: New test case.
        * gcc.target/i386/pr109973-2.c: Likewise.


Thanks,
Roger
--


[-- Attachment #2: patchpt.txt --]
[-- Type: text/plain, Size: 10646 bytes --]

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index c91e380..383b68a 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1004,8 +1004,8 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps_sfix, "__builtin_ia32_
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_roundv4sf2, "__builtin_ia32_roundps_az", IX86_BUILTIN_ROUNDPS_AZ, UNKNOWN, (int) V4SF_FTYPE_V4SF)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_roundv4sf2_sfix, "__builtin_ia32_roundps_az_sfix", IX86_BUILTIN_ROUNDPS_AZ_SFIX, UNKNOWN, (int) V4SI_FTYPE_V4SF)
 
-BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_ptestv2di, "__builtin_ia32_ptestz128", IX86_BUILTIN_PTESTZ, EQ, (int) INT_FTYPE_V2DI_V2DI_PTEST)
-BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_ptestv2di, "__builtin_ia32_ptestc128", IX86_BUILTIN_PTESTC, LTU, (int) INT_FTYPE_V2DI_V2DI_PTEST)
+BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_ptestzv2di, "__builtin_ia32_ptestz128", IX86_BUILTIN_PTESTZ, EQ, (int) INT_FTYPE_V2DI_V2DI_PTEST)
+BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_ptestcv2di, "__builtin_ia32_ptestc128", IX86_BUILTIN_PTESTC, LTU, (int) INT_FTYPE_V2DI_V2DI_PTEST)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_ptestv2di, "__builtin_ia32_ptestnzc128", IX86_BUILTIN_PTESTNZC, GTU, (int) INT_FTYPE_V2DI_V2DI_PTEST)
 
 /* SSE4.2 */
@@ -1164,8 +1164,8 @@ BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_vtestpd256, "__builtin_ia32_vtestnzc
 BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_vtestps256, "__builtin_ia32_vtestzps256", IX86_BUILTIN_VTESTZPS256, EQ, (int) INT_FTYPE_V8SF_V8SF_PTEST)
 BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_vtestps256, "__builtin_ia32_vtestcps256", IX86_BUILTIN_VTESTCPS256, LTU, (int) INT_FTYPE_V8SF_V8SF_PTEST)
 BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_vtestps256, "__builtin_ia32_vtestnzcps256", IX86_BUILTIN_VTESTNZCPS256, GTU, (int) INT_FTYPE_V8SF_V8SF_PTEST)
-BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_ptestv4di, "__builtin_ia32_ptestz256", IX86_BUILTIN_PTESTZ256, EQ, (int) INT_FTYPE_V4DI_V4DI_PTEST)
-BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_ptestv4di, "__builtin_ia32_ptestc256", IX86_BUILTIN_PTESTC256, LTU, (int) INT_FTYPE_V4DI_V4DI_PTEST)
+BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_ptestzv4di, "__builtin_ia32_ptestz256", IX86_BUILTIN_PTESTZ256, EQ, (int) INT_FTYPE_V4DI_V4DI_PTEST)
+BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_ptestcv4di, "__builtin_ia32_ptestc256", IX86_BUILTIN_PTESTC256, LTU, (int) INT_FTYPE_V4DI_V4DI_PTEST)
 BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_ptestv4di, "__builtin_ia32_ptestnzc256", IX86_BUILTIN_PTESTNZC256, GTU, (int) INT_FTYPE_V4DI_V4DI_PTEST)
 
 BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_movmskpd256, "__builtin_ia32_movmskpd256", IX86_BUILTIN_MOVMSKPD256, UNKNOWN, (int) INT_FTYPE_V4DF )
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 5a57be8..ab793ae 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2370,8 +2370,8 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
       tmp = gen_reg_rtx (mode);
       emit_insn (gen_rtx_SET (tmp, gen_rtx_XOR (mode, op0, op1)));
       tmp = gen_lowpart (p_mode, tmp);
-      emit_insn (gen_rtx_SET (gen_rtx_REG (CCmode, FLAGS_REG),
-			      gen_rtx_UNSPEC (CCmode,
+      emit_insn (gen_rtx_SET (gen_rtx_REG (CCZmode, FLAGS_REG),
+			      gen_rtx_UNSPEC (CCZmode,
 					      gen_rtvec (2, tmp, tmp),
 					      UNSPEC_PTEST)));
       tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index a0a7348..3417f6b 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -974,7 +974,7 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
     }
 }
 
-/* Convert COMPARE to vector mode.  */
+/* Convert CCZmode COMPARE to vector mode.  */
 
 rtx
 scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
@@ -1023,7 +1023,7 @@ scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
 	  emit_insn_before (gen_rtx_SET (tmp, op11), insn);
 	  op11 = tmp;
 	}
-      return gen_rtx_UNSPEC (CCmode, gen_rtvec (2, op11, op12),
+      return gen_rtx_UNSPEC (CCZmode, gen_rtvec (2, op11, op12),
 			     UNSPEC_PTEST);
     }
   else
@@ -1052,7 +1052,7 @@ scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
       src = tmp;
     }
 
-  return gen_rtx_UNSPEC (CCmode, gen_rtvec (2, src, src), UNSPEC_PTEST);
+  return gen_rtx_UNSPEC (CCZmode, gen_rtvec (2, src, src), UNSPEC_PTEST);
 }
 
 /* Helper function for converting INSN to vector mode.  */
@@ -1219,7 +1219,7 @@ general_scalar_chain::convert_insn (rtx_insn *insn)
       break;
 
     case COMPARE:
-      dst = gen_rtx_REG (CCmode, FLAGS_REG);
+      dst = gen_rtx_REG (CCZmode, FLAGS_REG);
       src = convert_compare (XEXP (src, 0), XEXP (src, 1), insn);
       break;
 
@@ -1726,7 +1726,7 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
       break;
 
     case COMPARE:
-      dst = gen_rtx_REG (CCmode, FLAGS_REG);
+      dst = gen_rtx_REG (CCZmode, FLAGS_REG);
       src = convert_compare (XEXP (src, 0), XEXP (src, 1), insn);
       break;
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 202abf0..7de9f3f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -15864,7 +15864,8 @@ ix86_match_ccmode (rtx insn, machine_mode req_mode)
   if (GET_CODE (set) == PARALLEL)
     set = XVECEXP (set, 0, 0);
   gcc_assert (GET_CODE (set) == SET);
-  gcc_assert (GET_CODE (SET_SRC (set)) == COMPARE);
+  gcc_assert (GET_CODE (SET_SRC (set)) == COMPARE
+	      || GET_CODE (SET_SRC (set)) == UNSPEC);
 
   set_mode = GET_MODE (SET_DEST (set));
   switch (set_mode)
@@ -15890,10 +15891,12 @@ ix86_match_ccmode (rtx insn, machine_mode req_mode)
     case E_CCZmode:
       break;
 
+    case E_CCCmode:
+      if (req_mode == CCmode)
+	break;
+      /* FALLTHRU */
     case E_CCGZmode:
-
     case E_CCAmode:
-    case E_CCCmode:
     case E_CCOmode:
     case E_CCPmode:
     case E_CCSmode:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 0656a5ce..9590ca2 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -20423,10 +20423,10 @@
 		    UNSPEC_MOVMSK)
 		 (match_operand 2 "const_int_operand")))]
   "TARGET_SSE4_1 && (INTVAL (operands[2]) == (int) (<vi1avx2const>))"
-  [(set (reg:CC FLAGS_REG)
-	(unspec:CC [(match_dup 0)
-		    (match_dup 0)]
-		   UNSPEC_PTEST))])
+  [(set (reg:CCZ FLAGS_REG)
+	(unspec:CCZ [(match_dup 0)
+		     (match_dup 0)]
+		    UNSPEC_PTEST))])
 
 (define_expand "sse2_maskmovdqu"
   [(set (match_operand:V16QI 0 "memory_operand")
@@ -23078,13 +23078,13 @@
    (set_attr "mode" "<MODE>")])
 
 ;; ptest is very similar to comiss and ucomiss when setting FLAGS_REG.
-;; But it is not a really compare instruction.
-(define_insn "<sse4_1>_ptest<mode>"
-  [(set (reg:CC FLAGS_REG)
-	(unspec:CC [(match_operand:V_AVX 0 "register_operand" "Yr, *x, x")
-		    (match_operand:V_AVX 1 "vector_operand" "YrBm, *xBm, xm")]
-		   UNSPEC_PTEST))]
-  "TARGET_SSE4_1"
+;; But it is not really a compare instruction.
+(define_insn "*<sse4_1>_ptest<mode>"
+  [(set (reg FLAGS_REG)
+	(unspec [(match_operand:V_AVX 0 "register_operand" "Yr, *x, x")
+		 (match_operand:V_AVX 1 "vector_operand" "YrBm, *xBm, xm")]
+		UNSPEC_PTEST))]
+  "TARGET_SSE4_1 && ix86_match_ccmode (insn, CCmode)"
   "%vptest\t{%1, %0|%0, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
    (set_attr "type" "ssecomi")
@@ -23097,6 +23097,30 @@
      (const_string "*")))
    (set_attr "mode" "<sseinsnmode>")])
 
+;; Expand a ptest to set the Z flag.
+(define_expand "<sse4_1>_ptestz<mode>"
+  [(set (reg:CCZ FLAGS_REG)
+	(unspec:CCZ [(match_operand:V_AVX 0 "register_operand")
+		     (match_operand:V_AVX 1 "vector_operand")]
+		    UNSPEC_PTEST))]
+  "TARGET_SSE4_1")
+
+;; Expand a ptest to set the C flag
+(define_expand "<sse4_1>_ptestc<mode>"
+  [(set (reg:CCC FLAGS_REG)
+	(unspec:CCC [(match_operand:V_AVX 0 "register_operand")
+		     (match_operand:V_AVX 1 "vector_operand")]
+		    UNSPEC_PTEST))]
+  "TARGET_SSE4_1")
+
+;; Expand a ptest to set both the Z and C flags
+(define_expand "<sse4_1>_ptest<mode>"
+  [(set (reg:CC FLAGS_REG)
+	(unspec:CC [(match_operand:V_AVX 0 "register_operand")
+		    (match_operand:V_AVX 1 "vector_operand")]
+		   UNSPEC_PTEST))]
+  "TARGET_SSE4_1")
+
 (define_insn "ptesttf2"
   [(set (reg:CC FLAGS_REG)
 	(unspec:CC [(match_operand:TF 0 "register_operand" "Yr, *x, x")
@@ -23111,17 +23135,17 @@
    (set_attr "mode" "TI")])
 
 (define_insn_and_split "*ptest<mode>_and"
-  [(set (reg:CC FLAGS_REG)
-	(unspec:CC [(and:V_AVX (match_operand:V_AVX 0 "register_operand")
-			       (match_operand:V_AVX 1 "vector_operand"))
-		    (and:V_AVX (match_dup 0) (match_dup 1))]
+  [(set (reg:CCZ FLAGS_REG)
+	(unspec:CCZ [(and:V_AVX (match_operand:V_AVX 0 "register_operand")
+				(match_operand:V_AVX 1 "vector_operand"))
+		     (and:V_AVX (match_dup 0) (match_dup 1))]
 		   UNSPEC_PTEST))]
   "TARGET_SSE4_1
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(set (reg:CC FLAGS_REG)
-	(unspec:CC [(match_dup 0) (match_dup 1)] UNSPEC_PTEST))])
+  [(set (reg:CCZ FLAGS_REG)
+	(unspec:CCZ [(match_dup 0) (match_dup 1)] UNSPEC_PTEST))])
 
 (define_expand "nearbyint<mode>2"
   [(set (match_operand:VFH 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/pr109973-1.c b/gcc/testsuite/gcc.target/i386/pr109973-1.c
new file mode 100644
index 0000000..a1b6136b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109973-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+
+typedef long long __m256i __attribute__ ((__vector_size__ (32)));
+
+int
+foo (__m256i x, __m256i y)
+{
+  __m256i a = x & y;
+  return __builtin_ia32_ptestc256 (a, a);
+}
+
+/* { dg-final { scan-assembler "vpand" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr109973-2.c b/gcc/testsuite/gcc.target/i386/pr109973-2.c
new file mode 100644
index 0000000..167f6ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109973-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse4.1" } */
+
+typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+
+int
+foo (__m128i x, __m128i y)
+{
+  __m128i a = x & y;
+  return __builtin_ia32_ptestc128 (a, a);
+}
+
+/* { dg-final { scan-assembler "pand" } } */

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [x86_64 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.
  2023-05-29 18:17 [x86_64 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest Roger Sayle
@ 2023-05-30  6:16 ` Jakub Jelinek
  2023-05-30  7:39 ` Uros Bizjak
  1 sibling, 0 replies; 4+ messages in thread
From: Jakub Jelinek @ 2023-05-30  6:16 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches, 'Uros Bizjak'

On Mon, May 29, 2023 at 07:17:42PM +0100, Roger Sayle wrote:
> The only change that was unanticipated was the tweak to ix86_match_ccmode.
> Oddly, CCZmode is allowable for CCmode, but CCCmode isn't.  Given that

So another option would be to use CCZmode for the ptestz cases and keep
CCmode for ptestc, I think we don't have any modes which cover C and Z
flags but nothing else, and for the optimization we only need to find out
if it is CCZmode.
Though, I'm certainly not familiar with the CC mode details in the backend,
so certainly need to defer this to Uros.

> CCZmode means just the Z flag, CCCmode means just the C flag, and
> CCmode means all the flags, I'm guessing this asymmetry is unintentional.
> Perhaps a super-safe fix is to explicitly test for CCZmode, CCCmode or
> CCmode
> in the *<sse4_1>_ptest<mode> pattern's predicate, and not attempt to
> re-use ix86_match_ccmode?
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

	Jakub


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [x86_64 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.
  2023-05-29 18:17 [x86_64 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest Roger Sayle
  2023-05-30  6:16 ` Jakub Jelinek
@ 2023-05-30  7:39 ` Uros Bizjak
  2023-05-30  8:30   ` Uros Bizjak
  1 sibling, 1 reply; 4+ messages in thread
From: Uros Bizjak @ 2023-05-30  7:39 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches, Jakub Jelinek

On Mon, May 29, 2023 at 8:17 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This is my proposed minimal fix for PR target/109973 (hopefully suitable
> for backporting) that follows Jakub Jelinek's suggestion that we introduce
> CCZmode and CCCmode variants of ptest and vptest, so that the i386
> backend treats [v]ptest instructions similarly to testl instructions;
> using different CCmodes to indicate which condition flags are desired,
> and then relying on the RTL cmpelim pass to eliminate redundant tests.
>
> This conveniently matches Intel's intrinsics, that provide different
> functions for retrieving different flags, _mm_testz_si128 tests the
> Z flag, _mm_testc_si128 tests the carry flag.  Currently we use the
> same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
> optimization is only valid when the ptest/vptest instruction is used to
> set/test the Z flag.
>
> The downside, as predicted by Jakub, is that GCC's cmpelim pass is
> currently COMPARE-centric and not able to merge the ptests from expressions
> such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
> known issue, PR target/80040.  I've some follow-up patches to improve
> things, but this first patch fixes the wrong-code regression, replacing
> it with a rare missed-optimization (hopefully suitable for GCC 13).
>
> The only change that was unanticipated was the tweak to ix86_match_ccmode.
> Oddly, CCZmode is allowable for CCmode, but CCCmode isn't.  Given that
> CCZmode means just the Z flag, CCCmode means just the C flag, and
> CCmode means all the flags, I'm guessing this asymmetry is unintentional.
> Perhaps a super-safe fix is to explicitly test for CCZmode, CCCmode or
> CCmode
> in the *<sse4_1>_ptest<mode> pattern's predicate, and not attempt to
> re-use ix86_match_ccmode?

It is actually the other way. CCZmode should NOT be allowed for CCmode
in ix86_match_ccmode. When CCmode is requested, we don't assume
anything about FLAGS bits, so we expect all bits to be valid. CCZmode
implies only Z bit, and should be compatible only with itself. So, the
"break;" is in the wrong place, it should be before E_CCZmode.

Uros.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-05-29  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR targt/109973
>         * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
>         CODE_for_sse4_1_ptestzv2di.
>         (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
>         (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
>         (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
>         * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
>         when expanding UNSPEC_PTEST to compare against zero.
>         * config/i386/i386-features.cc (scalar_chain::convert_compare):
>         Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
>         (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
> result.
>         (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
>         * config/i386/i386.cc (ix86_match_ccmode): Allow the SET_SRC to be
>         an UNSPEC, in addition to a COMPARE.  Consider CCCmode to be a form
>         of CCmode.
>         * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
>         to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
>         (*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn.
>         Remove ":CC" flags specification, and use ix86_match_ccmode instead.
>         (<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
>         (<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
>         (<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
>         current behavior.
>         (*ptest<mode>_and): Specify CCZ to only perform this optimization
>         when only the Z flag is required.
>
> gcc/testsuite/ChangeLog
>         PR targt/109973
>         * gcc.target/i386/pr109973-1.c: New test case.
>         * gcc.target/i386/pr109973-2.c: Likewise.
>
>
> Thanks,
> Roger
> --
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [x86_64 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.
  2023-05-30  7:39 ` Uros Bizjak
@ 2023-05-30  8:30   ` Uros Bizjak
  0 siblings, 0 replies; 4+ messages in thread
From: Uros Bizjak @ 2023-05-30  8:30 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches, Jakub Jelinek

On Tue, May 30, 2023 at 9:39 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, May 29, 2023 at 8:17 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
> >
> >
> > This is my proposed minimal fix for PR target/109973 (hopefully suitable
> > for backporting) that follows Jakub Jelinek's suggestion that we introduce
> > CCZmode and CCCmode variants of ptest and vptest, so that the i386
> > backend treats [v]ptest instructions similarly to testl instructions;
> > using different CCmodes to indicate which condition flags are desired,
> > and then relying on the RTL cmpelim pass to eliminate redundant tests.
> >
> > This conveniently matches Intel's intrinsics, that provide different
> > functions for retrieving different flags, _mm_testz_si128 tests the
> > Z flag, _mm_testc_si128 tests the carry flag.  Currently we use the
> > same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
> > optimization is only valid when the ptest/vptest instruction is used to
> > set/test the Z flag.
> >
> > The downside, as predicted by Jakub, is that GCC's cmpelim pass is
> > currently COMPARE-centric and not able to merge the ptests from expressions
> > such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
> > known issue, PR target/80040.  I've some follow-up patches to improve
> > things, but this first patch fixes the wrong-code regression, replacing
> > it with a rare missed-optimization (hopefully suitable for GCC 13).
> >
> > The only change that was unanticipated was the tweak to ix86_match_ccmode.
> > Oddly, CCZmode is allowable for CCmode, but CCCmode isn't.  Given that
> > CCZmode means just the Z flag, CCCmode means just the C flag, and
> > CCmode means all the flags, I'm guessing this asymmetry is unintentional.
> > Perhaps a super-safe fix is to explicitly test for CCZmode, CCCmode or
> > CCmode
> > in the *<sse4_1>_ptest<mode> pattern's predicate, and not attempt to
> > re-use ix86_match_ccmode?
>
> It is actually the other way. CCZmode should NOT be allowed for CCmode
> in ix86_match_ccmode. When CCmode is requested, we don't assume
> anything about FLAGS bits, so we expect all bits to be valid. CCZmode
> implies only Z bit, and should be compatible only with itself. So, the
> "break;" is in the wrong place, it should be before E_CCZmode.

Hm, but PTEST is the *PRODUCER* of flags, not the consumer...

So, the whole picture should be like this:

(define_insn "*cmp<mode>_ccno_1"
  [(set (reg FLAGS_REG)
    (compare (match_operand:SWI 0 "nonimmediate_operand" "<r>,?m<r>")
         (match_operand:SWI 1 "const0_operand")))]
  "ix86_match_ccmode (insn, CCNOmode)"

The above means that the compare PROVIDES all bits, but O is
guaranteed to be zero.

(define_insn "*cmp<mode>_1"
  [(set (reg FLAGS_REG)
    (compare (match_operand:SWI 0 "nonimmediate_operand" "<r>m,<r>")
         (match_operand:SWI 1 "<general_operand>" "<r><i>,<r><m>")))]
  "ix86_match_ccmode (insn, CCmode)"

The above means that compare PROVIDES all bits.

+(define_expand "<sse4_1>_ptest<mode>"
+  [(set (reg:CC FLAGS_REG)
+ (unspec:CC [(match_operand:V_AVX 0 "register_operand")
+    (match_operand:V_AVX 1 "vector_operand")]
+   UNSPEC_PTEST))]
+  "TARGET_SSE4_1")

This is not true, PTEST does not provide all FLAGS bits in a general sense.

So, I think your original patch is OK, but please introduce the
ix86_match_ptest_ccmode function instead of reusing ix86_match_ccmode.

Uros.


>
> Uros.
>
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-05-29  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR targt/109973
> >         * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
> >         CODE_for_sse4_1_ptestzv2di.
> >         (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
> >         (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
> >         (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
> >         * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
> >         when expanding UNSPEC_PTEST to compare against zero.
> >         * config/i386/i386-features.cc (scalar_chain::convert_compare):
> >         Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
> >         (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
> > result.
> >         (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
> >         * config/i386/i386.cc (ix86_match_ccmode): Allow the SET_SRC to be
> >         an UNSPEC, in addition to a COMPARE.  Consider CCCmode to be a form
> >         of CCmode.
> >         * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
> >         to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
> >         (*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn.
> >         Remove ":CC" flags specification, and use ix86_match_ccmode instead.
> >         (<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
> >         (<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
> >         (<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
> >         current behavior.
> >         (*ptest<mode>_and): Specify CCZ to only perform this optimization
> >         when only the Z flag is required.
> >
> > gcc/testsuite/ChangeLog
> >         PR targt/109973
> >         * gcc.target/i386/pr109973-1.c: New test case.
> >         * gcc.target/i386/pr109973-2.c: Likewise.
> >
> >
> > Thanks,
> > Roger
> > --
> >

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-05-30  8:30 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-29 18:17 [x86_64 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest Roger Sayle
2023-05-30  6:16 ` Jakub Jelinek
2023-05-30  7:39 ` Uros Bizjak
2023-05-30  8:30   ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).