[PATCH] x86: Optimize load of const all 1s float vectors

public inbox for gcc-patches@gcc.gnu.org

* [PATCH] x86: Optimize load of const all 1s float vectors
@ 2021-08-07 14:41 H.J. Lu
  2021-08-08 20:23 ` Uros Bizjak
  0 siblings, 1 reply; 7+ messages in thread
From: H.J. Lu @ 2021-08-07 14:41 UTC
  To: gcc-patches; +Cc: Uros Bizjak, liuhongt

Update vector_all_ones_operand to return true for const all 1s float
vectors.
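
For readers unfamiliar with the constant in question, here is a minimal
sketch in C of a float vector with every bit set, written with GCC's
generic vector extension.  The type and function names (v4sf, v4si,
all_bits_set_v4sf, lane_bits) are illustrative assumptions and do not
come from the patch or the testsuite.  Whether a given compiler
materializes this value with vpcmpeqd or with a load from memory depends
on the compiler version and options.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit vector types, declared with GCC's vector_size
   attribute.  They are not taken from the patch.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Return a float vector in which every bit of every lane is set.
   The bits are copied from an integer vector of -1 values, so no
   floating-point conversion takes place.  */
v4sf
all_bits_set_v4sf (void)
{
  v4si ones = { -1, -1, -1, -1 };
  v4sf result;
  memcpy (&result, &ones, sizeof result);
  return result;
}

/* Return the raw 32-bit pattern stored in lane LANE (0..3) of V.  */
uint32_t
lane_bits (v4sf v, int lane)
{
  uint32_t bits[4];
  memcpy (bits, &v, sizeof bits);
  return bits[lane];
}
```

Each lane of the result reads back as the pattern 0xFFFFFFFF, the same
bit pattern as an integer vector of -1 values.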

gcc/

	PR target/101804
	* config/i386/predicates.md (vector_all_ones_operand): Return
	true for const all 1s float vectors.

gcc/testsuite/

	PR target/101804
	* gcc.target/i386/avx2-gather-2.c: Pass -march=skylake instead
	of "-mavx2 -mtune=skylake".  Scan vpcmpeqd.
---
 gcc/config/i386/predicates.md                 | 7 ++++---
 gcc/testsuite/gcc.target/i386/avx2-gather-2.c | 3 ++-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 6aa1ea32627..9637e64ea58 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1126,9 +1126,10 @@ (define_predicate "float_vector_all_ones_operand"
 
 /* Return true if operand is a vector constant that is all ones. */
 (define_predicate "vector_all_ones_operand"
-  (and (match_code "const_vector")
-       (match_test "INTEGRAL_MODE_P (GET_MODE (op))")
-       (match_test "op == CONSTM1_RTX (GET_MODE (op))")))
+  (ior (and (match_code "const_vector")
+	    (match_test "INTEGRAL_MODE_P (GET_MODE (op))")
+	    (match_test "op == CONSTM1_RTX (GET_MODE (op))"))
+       (match_operand 0 "float_vector_all_ones_operand")))
 
 ; Return true when OP is operand acceptable for vector memory operand.
 ; Only AVX can have misaligned memory operand.
diff --git a/gcc/testsuite/gcc.target/i386/avx2-gather-2.c b/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
index 1a704afd834..ad5ef73107c 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mavx2 -fdump-tree-vect-details -mtune=skylake" } */
+/* { dg-options "-O3 -fdump-tree-vect-details -march=skylake" } */
 
 #include "avx2-gather-1.c"
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 16 "vect" } } */
+/* { dg-final { scan-assembler "vpcmpeqd" } } */
-- 
2.31.1



* Re: [PATCH] x86: Optimize load of const all 1s float vectors
  2021-08-07 14:41 [PATCH] x86: Optimize load of const all 1s float vectors H.J. Lu
@ 2021-08-08 20:23 ` Uros Bizjak
  2021-08-09 15:23   ` [PATCH v2] " H.J. Lu
  0 siblings, 1 reply; 7+ messages in thread
From: Uros Bizjak @ 2021-08-08 20:23 UTC
  To: H.J. Lu; +Cc: gcc-patches, liuhongt

On Sat, Aug 7, 2021 at 4:41 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Update vector_all_ones_operand to return true for const all 1s float
> vectors.
>
> gcc/
>
>         PR target/101804
>         * config/i386/predicates.md (vector_all_ones_operand): Return
>         true for const all 1s float vectors.
>
> gcc/testsuite/
>
>         PR target/101804
>         * gcc.target/i386/avx2-gather-2.c: Pass -march=skylake instead
>         of "-mavx2 -mtune=skylake".  Scan vpcmpeqd.

No, vector_all_ones_operand is intended to be integer minus-one. Use
float_vector_all_ones_operand in a specific place, where it is needed.

Uros.



* [PATCH v2] x86: Optimize load of const all 1s float vectors
  2021-08-08 20:23 ` Uros Bizjak
@ 2021-08-09 15:23   ` H.J. Lu
  2021-08-09 15:27     ` Uros Bizjak
  0 siblings, 1 reply; 7+ messages in thread
From: H.J. Lu @ 2021-08-09 15:23 UTC
  To: Uros Bizjak; +Cc: gcc-patches, liuhongt

On Sun, Aug 8, 2021 at 1:23 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Sat, Aug 7, 2021 at 4:41 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > Update vector_all_ones_operand to return true for const all 1s float
> > vectors.
> >
> > gcc/
> >
> >         PR target/101804
> >         * config/i386/predicates.md (vector_all_ones_operand): Return
> >         true for const all 1s float vectors.
> >
> > gcc/testsuite/
> >
> >         PR target/101804
> >         * gcc.target/i386/avx2-gather-2.c: Pass -march=skylake instead
> >         of "-mavx2 -mtune=skylake".  Scan vpcmpeqd.
>
> No, vector_all_ones_operand is intended to be integer minus-one. Use
> float_vector_all_ones_operand in a specific place, where it is needed.
>

Like this?

-- 
H.J.

[-- Attachment #2: v2-0001-x86-Optimize-load-of-const-all-1s-float-vectors.patch --]
[-- Type: text/x-patch, Size: 3343 bytes --]

From 017dee0c9ee946e16fbb1b938c1dd62ac0f95b09 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 6 Aug 2021 12:32:01 -0700
Subject: [PATCH v2] x86: Optimize load of const all 1s float vectors

Check float_vector_all_ones_operand for vector floating-point modes to
optimize load of const all 1s float vectors.

gcc/

	PR target/101804
	* config/i386/constraints.md (BC): For vector floating-point
	modes, also check float_vector_all_ones_operand.
	* config/i386/i386.c (standard_sse_constant_p): Likewise.
	(standard_sse_constant_opcode): Likewise.

gcc/testsuite/

	PR target/101804
	* gcc.target/i386/avx2-gather-2.c: Pass -march=skylake instead
	of "-mavx2 -mtune=skylake".  Scan vpcmpeqd.
---
 gcc/config/i386/constraints.md                |  4 +++-
 gcc/config/i386/i386.c                        | 11 +++++++++--
 gcc/testsuite/gcc.target/i386/avx2-gather-2.c |  3 ++-
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 4aa28a5621c..cb1a803ab87 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -219,7 +219,9 @@ (define_constraint "BC"
   "@internal SSE constant -1 operand."
   (and (match_test "TARGET_SSE")
        (ior (match_test "op == constm1_rtx")
-	    (match_operand 0 "vector_all_ones_operand"))))
+	    (match_operand 0 "vector_all_ones_operand")
+	    (and (match_test "GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT")
+		 (match_operand 0 "float_vector_all_ones_operand")))))
 
 ;; Integer constant constraints.
 (define_constraint "Wb"
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index aea224ab235..4d4ab6a03d6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5073,7 +5073,11 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
   if (x == const0_rtx || const0_operand (x, mode))
     return 1;
 
-  if (x == constm1_rtx || vector_all_ones_operand (x, mode))
+  if (x == constm1_rtx
+      || vector_all_ones_operand (x, mode)
+      || ((GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+	   || GET_MODE_CLASS (pred_mode) == MODE_VECTOR_FLOAT)
+	  && float_vector_all_ones_operand (x, mode)))
     {
       /* VOIDmode integer constant, get mode from the predicate.  */
       if (mode == VOIDmode)
@@ -5171,7 +5175,10 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
 	  gcc_unreachable ();
 	}
     }
-  else if (x == constm1_rtx || vector_all_ones_operand (x, mode))
+  else if (x == constm1_rtx
+	   || vector_all_ones_operand (x, mode)
+	   || (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+	       && float_vector_all_ones_operand (x, mode)))
     {
       enum attr_mode insn_mode = get_attr_mode (insn);
       
diff --git a/gcc/testsuite/gcc.target/i386/avx2-gather-2.c b/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
index 1a704afd834..ad5ef73107c 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mavx2 -fdump-tree-vect-details -mtune=skylake" } */
+/* { dg-options "-O3 -fdump-tree-vect-details -march=skylake" } */
 
 #include "avx2-gather-1.c"
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 16 "vect" } } */
+/* { dg-final { scan-assembler "vpcmpeqd" } } */
-- 
2.31.1



* Re: [PATCH v2] x86: Optimize load of const all 1s float vectors
  2021-08-09 15:23   ` [PATCH v2] " H.J. Lu
@ 2021-08-09 15:27     ` Uros Bizjak
  2021-08-09 17:46       ` [PATCH v3] x86: Optimize load of const all 1s FP vectors H.J. Lu
  0 siblings, 1 reply; 7+ messages in thread
From: Uros Bizjak @ 2021-08-09 15:27 UTC
  To: H.J. Lu; +Cc: gcc-patches, liuhongt

On Mon, Aug 9, 2021 at 5:24 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sun, Aug 8, 2021 at 1:23 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Sat, Aug 7, 2021 at 4:41 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > Update vector_all_ones_operand to return true for const all 1s float
> > > vectors.
> > >
> > > gcc/
> > >
> > >         PR target/101804
> > >         * config/i386/predicates.md (vector_all_ones_operand): Return
> > >         true for const all 1s float vectors.
> > >
> > > gcc/testsuite/
> > >
> > >         PR target/101804
> > >         * gcc.target/i386/avx2-gather-2.c: Pass -march=skylake instead
> > >         of "-mavx2 -mtune=skylake".  Scan vpcmpeqd.
> >
> > No, vector_all_ones_operand is intended to be integer minus-one. Use
> > float_vector_all_ones_operand in a specific place, where it is needed.
> >
>
> Like this?

Please also add a new constraint; BC is intended for integer values.

Uros.


* [PATCH v3] x86: Optimize load of const all 1s FP vectors
  2021-08-09 15:27     ` Uros Bizjak
@ 2021-08-09 17:46       ` H.J. Lu
  2021-08-09 18:53         ` Uros Bizjak
  0 siblings, 1 reply; 7+ messages in thread
From: H.J. Lu @ 2021-08-09 17:46 UTC
  To: Uros Bizjak; +Cc: gcc-patches, liuhongt

On Mon, Aug 9, 2021 at 8:27 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Aug 9, 2021 at 5:24 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sun, Aug 8, 2021 at 1:23 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Sat, Aug 7, 2021 at 4:41 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > Update vector_all_ones_operand to return true for const all 1s float
> > > > vectors.
> > > >
> > > > gcc/
> > > >
> > > >         PR target/101804
> > > >         * config/i386/predicates.md (vector_all_ones_operand): Return
> > > >         true for const all 1s float vectors.
> > > >
> > > > gcc/testsuite/
> > > >
> > > >         PR target/101804
> > > >         * gcc.target/i386/avx2-gather-2.c: Pass -march=skylake instead
> > > >         of "-mavx2 -mtune=skylake".  Scan vpcmpeqd.
> > >
> > > No, vector_all_ones_operand is intended to be integer minus-one. Use
> > > float_vector_all_ones_operand in a specific place, where it is needed.
> > >
> >
> > Like this?
>
> Please also add a new constraint; BC is intended for integer values.
>
> Uros.

Here is the v3 patch with the new BF constraint.  OK for master?

Thanks.

-- 
H.J.

[-- Attachment #2: v3-0001-x86-Optimize-load-of-const-all-1s-FP-vectors.patch --]
[-- Type: text/x-patch, Size: 5244 bytes --]

From 6d4f8d82ad2c6d284c2c7afc199af27749da6418 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 6 Aug 2021 12:32:01 -0700
Subject: [PATCH v3] x86: Optimize load of const all 1s FP vectors

Check float_vector_all_ones_operand for vector floating-point modes to
optimize load of const all 1s floating-point vectors.

gcc/

	PR target/101804
	* config/i386/constraints.md (BC): Document for integer SSE
	constant -1 operand.
	(BF): New constraint for const all 1s floating-point vectors.
	* config/i386/i386.c (standard_sse_constant_p): For vector
	floating-point modes, also check float_vector_all_ones_operand.
	(standard_sse_constant_opcode): Likewise.
	* config/i386/sse.md (sseconstm1): New mode attribute.
	(mov<mode>_internal): Replace BC with <sseconstm1>.

gcc/testsuite/

	PR target/101804
	* gcc.target/i386/avx2-gather-2.c: Pass -march=skylake instead
	of "-mavx2 -mtune=skylake".  Scan vpcmpeqd.
---
 gcc/config/i386/constraints.md                | 10 ++++++++--
 gcc/config/i386/i386.c                        | 11 +++++++++--
 gcc/config/i386/sse.md                        | 11 ++++++++++-
 gcc/testsuite/gcc.target/i386/avx2-gather-2.c |  3 ++-
 4 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 4aa28a5621c..5a8c52b52e0 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -166,7 +166,8 @@ (define_register_constraint "YW"
 ;;  s  Sibcall memory operand, not valid for TARGET_X32
 ;;  w  Call memory operand, not valid for TARGET_X32
 ;;  z  Constant call address operand.
-;;  C  SSE constant operand.
+;;  C  Integer SSE constant -1 operand.
+;;  F  Floating-point SSE constant -1 operand.
 
 (define_constraint "Bf"
   "@internal Flags register operand."
@@ -216,11 +217,16 @@ (define_constraint "Bz"
   (match_operand 0 "constant_call_address_operand"))
 
 (define_constraint "BC"
-  "@internal SSE constant -1 operand."
+  "@internal integer SSE constant -1 operand."
   (and (match_test "TARGET_SSE")
        (ior (match_test "op == constm1_rtx")
 	    (match_operand 0 "vector_all_ones_operand"))))
 
+(define_constraint "BF"
+  "@internal floating-point SSE constant -1 operand."
+  (and (match_test "TARGET_SSE")
+       (match_operand 0 "float_vector_all_ones_operand")))
+
 ;; Integer constant constraints.
 (define_constraint "Wb"
   "Integer constant in the range 0 @dots{} 7, for 8-bit shifts."
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index aea224ab235..4d4ab6a03d6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5073,7 +5073,11 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
   if (x == const0_rtx || const0_operand (x, mode))
     return 1;
 
-  if (x == constm1_rtx || vector_all_ones_operand (x, mode))
+  if (x == constm1_rtx
+      || vector_all_ones_operand (x, mode)
+      || ((GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+	   || GET_MODE_CLASS (pred_mode) == MODE_VECTOR_FLOAT)
+	  && float_vector_all_ones_operand (x, mode)))
     {
       /* VOIDmode integer constant, get mode from the predicate.  */
       if (mode == VOIDmode)
@@ -5171,7 +5175,10 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
 	  gcc_unreachable ();
 	}
     }
-  else if (x == constm1_rtx || vector_all_ones_operand (x, mode))
+  else if (x == constm1_rtx
+	   || vector_all_ones_operand (x, mode)
+	   || (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+	       && float_vector_all_ones_operand (x, mode)))
     {
       enum attr_mode insn_mode = get_attr_mode (insn);
       
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a46a2373547..5255d42900e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -777,6 +777,15 @@ (define_mode_attr sseinsnmode
    (V4SF "V4SF") (V2DF "V2DF")
    (TI "TI")])
 
+;; SSE constant -1 constraint
+(define_mode_attr sseconstm1
+  [(V64QI "BC") (V32HI "BC") (V16SI "BC") (V8DI "BC") (V4TI "BC")
+   (V32QI "BC") (V16HI "BC") (V8SI "BC") (V4DI "BC") (V2TI "BC")
+   (V16QI "BC") (V8HI "BC") (V4SI "BC") (V2DI "BC") (V1TI "BC")
+   (V16SF "BF") (V8DF "BF")
+   (V8SF "BF") (V4DF "BF")
+   (V4SF "BF") (V2DF "BF")])
+
 ;; Mapping of vector modes to corresponding mask size
 (define_mode_attr avx512fmaskmode
   [(V64QI "DI") (V32QI "SI") (V16QI "HI")
@@ -1056,7 +1065,7 @@ (define_insn "mov<mode>_internal"
   [(set (match_operand:VMOVE 0 "nonimmediate_operand"
 	 "=v,v ,v ,m")
 	(match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
-	 " C,BC,vm,v"))]
+	 " C,<sseconstm1>,vm,v"))]
   "TARGET_SSE
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
diff --git a/gcc/testsuite/gcc.target/i386/avx2-gather-2.c b/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
index 1a704afd834..ad5ef73107c 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mavx2 -fdump-tree-vect-details -mtune=skylake" } */
+/* { dg-options "-O3 -fdump-tree-vect-details -march=skylake" } */
 
 #include "avx2-gather-1.c"
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 16 "vect" } } */
+/* { dg-final { scan-assembler "vpcmpeqd" } } */
-- 
2.31.1



* Re: [PATCH v3] x86: Optimize load of const all 1s FP vectors
  2021-08-09 17:46       ` [PATCH v3] x86: Optimize load of const all 1s FP vectors H.J. Lu
@ 2021-08-09 18:53         ` Uros Bizjak
  2021-08-09 19:14           ` [PATCH v4] x86: Optimize load of const FP all bits set vectors H.J. Lu
  0 siblings, 1 reply; 7+ messages in thread
From: Uros Bizjak @ 2021-08-09 18:53 UTC
  To: H.J. Lu; +Cc: gcc-patches, liuhongt

On Mon, Aug 9, 2021 at 7:47 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Aug 9, 2021 at 8:27 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Mon, Aug 9, 2021 at 5:24 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Sun, Aug 8, 2021 at 1:23 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Sat, Aug 7, 2021 at 4:41 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > >
> > > > > Update vector_all_ones_operand to return true for const all 1s float
> > > > > vectors.
> > > > >
> > > > > gcc/
> > > > >
> > > > >         PR target/101804
> > > > >         * config/i386/predicates.md (vector_all_ones_operand): Return
> > > > >         true for const all 1s float vectors.
> > > > >
> > > > > gcc/testsuite/
> > > > >
> > > > >         PR target/101804
> > > > >         * gcc.target/i386/avx2-gather-2.c: Pass -march=skylake instead
> > > > >         of "-mavx2 -mtune=skylake".  Scan vpcmpeqd.
> > > >
> > > > No, vector_all_ones_operand is intended to be integer minus-one. Use
> > > > float_vector_all_ones_operand in a specific place, where it is needed.
> > > >
> > >
> > > Like this?
> >
> > Please also add a new constraint; BC is intended for integer values.
> >
> > Uros.
>
> Here is the v3 patch with the new BF constraint.  OK for master?

OK with some comment fixes.

+;;  C  Integer SSE constant -1 operand.
+;;  F  Floating-point SSE constant -1 operand.

Maybe we should simply say "... SSE constant with all bits set" here.
"... SSE constant -1" is ambiguous; someone could interpret it as the
constant -1.0.
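
The difference between the two readings can be checked directly.  The
following is a minimal sketch in C, assuming IEEE 754 binary32 floats;
the helper names float_bits and float_from_bits are hypothetical and are
not part of GCC.  It shows that -1.0f and the all-bits-set pattern are
different encodings, and that the latter decodes to a NaN.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit encoding of F (assumes a 32-bit float).  */
uint32_t
float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Return the float whose raw 32-bit encoding is BITS.  */
float
float_from_bits (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Return nonzero if F is a NaN; a NaN never compares equal to itself.
   This relies on default floating-point semantics (no -ffast-math).  */
int
is_nan (float f)
{
  return f != f;
}
```

Under these assumptions, float_bits (-1.0f) is 0xBF800000, while
float_from_bits (0xFFFFFFFF) is a NaN, so "constant -1" and "all bits
set" name different floating-point values.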

-  "@internal SSE constant -1 operand."
+  "@internal integer SSE constant -1 operand."

Also here.

+(define_constraint "BF"
+  "@internal floating-point SSE constant -1 operand."

And here.

Thanks,
Uros.


* [PATCH v4] x86: Optimize load of const FP all bits set vectors
  2021-08-09 18:53         ` Uros Bizjak
@ 2021-08-09 19:14           ` H.J. Lu
  0 siblings, 0 replies; 7+ messages in thread
From: H.J. Lu @ 2021-08-09 19:14 UTC
  To: Uros Bizjak; +Cc: gcc-patches, liuhongt

On Mon, Aug 9, 2021 at 11:53 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Aug 9, 2021 at 7:47 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Mon, Aug 9, 2021 at 8:27 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Mon, Aug 9, 2021 at 5:24 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > On Sun, Aug 8, 2021 at 1:23 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > On Sat, Aug 7, 2021 at 4:41 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > > >
> > > > > > Update vector_all_ones_operand to return true for const all 1s float
> > > > > > vectors.
> > > > > >
> > > > > > gcc/
> > > > > >
> > > > > >         PR target/101804
> > > > > >         * config/i386/predicates.md (vector_all_ones_operand): Return
> > > > > >         true for const all 1s float vectors.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > >
> > > > > >         PR target/101804
> > > > > >         * gcc.target/i386/avx2-gather-2.c: Pass -march=skylake instead
> > > > > >         of "-mavx2 -mtune=skylake".  Scan vpcmpeqd.
> > > > >
> > > > > No, vector_all_ones_operand is intended to be integer minus-one. Use
> > > > > float_vector_all_ones_operand in a specific place, where it is needed.
> > > > >
> > > >
> > > > Like this?
> > >
> > > Please also add a new constraint; BC is intended for integer values.
> > >
> > > Uros.
> >
> > Here is the v3 patch with the new BF constraint.  OK for master?
>
> OK with some comment fixes.
>
> +;;  C  Integer SSE constant -1 operand.
> +;;  F  Floating-point SSE constant -1 operand.
>
> Maybe we should simply say "... SSE constant with all bits set" here.
> "... SSE constant -1" is ambiguous; someone could interpret it as the
> constant -1.0.
>
> -  "@internal SSE constant -1 operand."
> +  "@internal integer SSE constant -1 operand."
>
> Also here.
>
> +(define_constraint "BF"
> +  "@internal floating-point SSE constant -1 operand."
>
> And here.
>
> Thanks,
> Uros.

This is the patch I am going to check in.

Thanks.

-- 
H.J.

[-- Attachment #2: v4-0001-x86-Optimize-load-of-const-FP-all-bits-set-vector.patch --]
[-- Type: text/x-patch, Size: 5338 bytes --]

From 93499102a52d29974b47e1d32274f6a08a4d6580 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 6 Aug 2021 12:32:01 -0700
Subject: [PATCH v4] x86: Optimize load of const FP all bits set vectors

Check float_vector_all_ones_operand for vector floating-point modes to
optimize load of const floating-point all bits set vectors.
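
The split between the two constraints can be modelled outside the
machine description.  The following is a toy sketch in plain C, not GCC
code: the table mirrors the mode-to-constraint pairs of the sseconstm1
mode attribute in this patch, while the struct, the array and the lookup
function (mode_constraint, sseconstm1_model, lookup_sseconstm1) are
hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One entry of the model: a vector mode name and the constraint that
   the sseconstm1 attribute in the patch assigns to it.  */
struct mode_constraint
{
  const char *mode;
  const char *constraint;
};

/* Integer vector modes map to "BC", floating-point ones to "BF".  */
static const struct mode_constraint sseconstm1_model[] = {
  { "V64QI", "BC" }, { "V32HI", "BC" }, { "V16SI", "BC" },
  { "V8DI", "BC" }, { "V4TI", "BC" },
  { "V32QI", "BC" }, { "V16HI", "BC" }, { "V8SI", "BC" },
  { "V4DI", "BC" }, { "V2TI", "BC" },
  { "V16QI", "BC" }, { "V8HI", "BC" }, { "V4SI", "BC" },
  { "V2DI", "BC" }, { "V1TI", "BC" },
  { "V16SF", "BF" }, { "V8DF", "BF" },
  { "V8SF", "BF" }, { "V4DF", "BF" },
  { "V4SF", "BF" }, { "V2DF", "BF" },
};

/* Return the constraint string for MODE, or NULL if MODE is not in
   the table.  */
const char *
lookup_sseconstm1 (const char *mode)
{
  size_t n = sizeof sseconstm1_model / sizeof sseconstm1_model[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (sseconstm1_model[i].mode, mode) == 0)
      return sseconstm1_model[i].constraint;
  return NULL;
}
```

In this model lookup_sseconstm1 ("V8SF") yields "BF" and
lookup_sseconstm1 ("V8SI") yields "BC", which is the per-mode choice
that replaces the fixed BC alternative in mov<mode>_internal.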

gcc/

	PR target/101804
	* config/i386/constraints.md (BC): Document for integer SSE
	constant all bits set operand.
	(BF): New constraint for const floating-point all bits set
	vectors.
	* config/i386/i386.c (standard_sse_constant_p): For vector
	floating-point modes, also check float_vector_all_ones_operand.
	(standard_sse_constant_opcode): Likewise.
	* config/i386/sse.md (sseconstm1): New mode attribute.
	(mov<mode>_internal): Replace BC with <sseconstm1>.

gcc/testsuite/

	PR target/101804
	* gcc.target/i386/avx2-gather-2.c: Pass -march=skylake instead
	of "-mavx2 -mtune=skylake".  Scan vpcmpeqd.

---
 gcc/config/i386/constraints.md                | 10 ++++++++--
 gcc/config/i386/i386.c                        | 11 +++++++++--
 gcc/config/i386/sse.md                        | 11 ++++++++++-
 gcc/testsuite/gcc.target/i386/avx2-gather-2.c |  3 ++-
 4 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 4aa28a5621c..87cceac4cfb 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -166,7 +166,8 @@ (define_register_constraint "YW"
 ;;  s  Sibcall memory operand, not valid for TARGET_X32
 ;;  w  Call memory operand, not valid for TARGET_X32
 ;;  z  Constant call address operand.
-;;  C  SSE constant operand.
+;;  C  Integer SSE constant with all bits set operand.
+;;  F  Floating-point SSE constant with all bits set operand.
 
 (define_constraint "Bf"
   "@internal Flags register operand."
@@ -216,11 +217,16 @@ (define_constraint "Bz"
   (match_operand 0 "constant_call_address_operand"))
 
 (define_constraint "BC"
-  "@internal SSE constant -1 operand."
+  "@internal integer SSE constant with all bits set operand."
   (and (match_test "TARGET_SSE")
        (ior (match_test "op == constm1_rtx")
 	    (match_operand 0 "vector_all_ones_operand"))))
 
+(define_constraint "BF"
+  "@internal floating-point SSE constant with all bits set operand."
+  (and (match_test "TARGET_SSE")
+       (match_operand 0 "float_vector_all_ones_operand")))
+
 ;; Integer constant constraints.
 (define_constraint "Wb"
   "Integer constant in the range 0 @dots{} 7, for 8-bit shifts."
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index aea224ab235..4d4ab6a03d6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5073,7 +5073,11 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
   if (x == const0_rtx || const0_operand (x, mode))
     return 1;
 
-  if (x == constm1_rtx || vector_all_ones_operand (x, mode))
+  if (x == constm1_rtx
+      || vector_all_ones_operand (x, mode)
+      || ((GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+	   || GET_MODE_CLASS (pred_mode) == MODE_VECTOR_FLOAT)
+	  && float_vector_all_ones_operand (x, mode)))
     {
       /* VOIDmode integer constant, get mode from the predicate.  */
       if (mode == VOIDmode)
@@ -5171,7 +5175,10 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
 	  gcc_unreachable ();
 	}
     }
-  else if (x == constm1_rtx || vector_all_ones_operand (x, mode))
+  else if (x == constm1_rtx
+	   || vector_all_ones_operand (x, mode)
+	   || (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+	       && float_vector_all_ones_operand (x, mode)))
     {
       enum attr_mode insn_mode = get_attr_mode (insn);
       
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a46a2373547..5255d42900e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -777,6 +777,15 @@ (define_mode_attr sseinsnmode
    (V4SF "V4SF") (V2DF "V2DF")
    (TI "TI")])
 
+;; SSE constant -1 constraint
+(define_mode_attr sseconstm1
+  [(V64QI "BC") (V32HI "BC") (V16SI "BC") (V8DI "BC") (V4TI "BC")
+   (V32QI "BC") (V16HI "BC") (V8SI "BC") (V4DI "BC") (V2TI "BC")
+   (V16QI "BC") (V8HI "BC") (V4SI "BC") (V2DI "BC") (V1TI "BC")
+   (V16SF "BF") (V8DF "BF")
+   (V8SF "BF") (V4DF "BF")
+   (V4SF "BF") (V2DF "BF")])
+
 ;; Mapping of vector modes to corresponding mask size
 (define_mode_attr avx512fmaskmode
   [(V64QI "DI") (V32QI "SI") (V16QI "HI")
@@ -1056,7 +1065,7 @@ (define_insn "mov<mode>_internal"
   [(set (match_operand:VMOVE 0 "nonimmediate_operand"
 	 "=v,v ,v ,m")
 	(match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
-	 " C,BC,vm,v"))]
+	 " C,<sseconstm1>,vm,v"))]
   "TARGET_SSE
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
diff --git a/gcc/testsuite/gcc.target/i386/avx2-gather-2.c b/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
index 1a704afd834..ad5ef73107c 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mavx2 -fdump-tree-vect-details -mtune=skylake" } */
+/* { dg-options "-O3 -fdump-tree-vect-details -march=skylake" } */
 
 #include "avx2-gather-1.c"
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 16 "vect" } } */
+/* { dg-final { scan-assembler "vpcmpeqd" } } */
-- 
2.31.1



