* [PR97903][ARM] Missed optimization in lowering to vtst
From: Prathamesh Kulkarni @ 2021-02-05  9:52 UTC
  To: gcc Patches, Kyrill Tkachov


Hi,
For the following test-case:

#include <arm_neon.h>

uint8x8_t f1(int8x8_t a, int8x8_t b) {
  return (uint8x8_t) ((a & b) != 0);
}

GCC fails to lower the test operation to vtst and instead emits:
f1:
        vand    d0, d0, d1
        vceq.i8 d0, d0, #0
        vmvn    d0, d0
        bx      lr

The attached patch tries to fix this by adding a pattern that matches the
following combine attempt:
Trying 7, 8 -> 9:
    7: r120:V8QI=r123:V8QI&r124:V8QI
      REG_DEAD r124:V8QI
      REG_DEAD r123:V8QI
    8: r122:V8QI=-r120:V8QI==const_vector
      REG_DEAD r120:V8QI
    9: r121:V8QI=~r122:V8QI
      REG_DEAD r122:V8QI
Failed to match this instruction:
(set (reg:V8QI 121)
    (plus:V8QI (eq:V8QI (and:V8QI (reg:V8QI 123)
                (reg:V8QI 124))
            (const_vector:V8QI [
                    (const_int 0 [0]) repeated x8
                ]))
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])))

Essentially it converts:
r120 = (and r123 r124)
r122 = (neg (eq r120 0))
r121 = (not r122)
-->
r121 = vtst r123, r124

(I guess combine simplifies (not (neg X)) to (plus X -1) above, since
~(-X) == X - 1 in two's complement.)
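
To make the rewrite above easier to check by hand, here is a minimal scalar
sketch of one vector lane. It is not part of the patch. The helper names
(eq_zero, not_neg_form, plus_form, vtst_lane, count_mismatches) are
hypothetical, and it assumes the comparison yields 1 for a true lane, which
is what the negation in insn 8 suggests.

```c
#include <stdint.h>

/* Assumption: (eq x 0) yields 1 when the lane is zero, 0 otherwise. */
static int8_t eq_zero(int8_t x) { return x == 0; }

/* Insns 8 and 9 as written in the dump: (not (neg (eq x 0))). */
static int8_t not_neg_form(int8_t x) { return (int8_t) ~(-eq_zero(x)); }

/* The form combine failed to match: (plus (eq x 0) -1). */
static int8_t plus_form(int8_t x) { return (int8_t) (eq_zero(x) + (-1)); }

/* Per-lane model of vtst: all ones when (a & b) is non-zero, else zero. */
static int8_t vtst_lane(int8_t a, int8_t b) { return (a & b) != 0 ? -1 : 0; }

/* Count the (a, b) pairs for which the three forms disagree. */
int count_mismatches(void)
{
  int bad = 0;
  for (int a = -128; a <= 127; a++)
    for (int b = -128; b <= 127; b++)
      {
        int8_t t = (int8_t) (a & b);
        if (not_neg_form(t) != plus_form(t)
            || plus_form(t) != vtst_lane((int8_t) a, (int8_t) b))
          bad++;
      }
  return bad;
}
```

If the assumption holds, count_mismatches() finds no disagreement over all
8-bit input pairs, which is why a single vtst can stand in for the
vand/vceq/vmvn sequence.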

Code-gen after patch:
f1:
        vtst.8  d0, d0, d1
        bx      lr
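
For readers without an ARM target at hand, the lane-wise result can also be
modelled with GCC's generic vector extensions. This is only a sketch: the
v8qi typedef and the names f1_generic and lanes_match are hypothetical
stand-ins for the NEON types and for f1 in the test-case, and the code says
nothing about which instructions a non-ARM target emits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for int8x8_t using GCC's vector_size attribute. */
typedef int8_t v8qi __attribute__ ((vector_size (8)));

/* Same expression as f1: a vector comparison gives -1 in every true lane
   and 0 in every false lane.  */
v8qi f1_generic(v8qi a, v8qi b)
{
  return (v8qi) ((a & b) != 0);
}

/* Return 1 when every lane of f1_generic matches the scalar model
   "(a & b) != 0 ? -1 : 0", and 0 otherwise.  */
int lanes_match(const int8_t *a, const int8_t *b)
{
  v8qi va, vb;
  memcpy(&va, a, sizeof va);
  memcpy(&vb, b, sizeof vb);
  v8qi r = f1_generic(va, vb);
  for (int i = 0; i < 8; i++)
    if (r[i] != (int8_t) ((a[i] & b[i]) != 0 ? -1 : 0))
      return 0;
  return 1;
}
```

The all-ones / all-zeros mask per lane is the result that vtst.8 is
documented to produce, so the single instruction above computes the same
value as the three-instruction sequence.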

Bootstrapped and tested on arm-linux-gnueabihf, and
cross-tested on arm*-*-*.
Does it look OK for the next stage 1?

Thanks,
Prathamesh

[-- Attachment #2: pr97903-1.diff --]
[-- Type: application/octet-stream, Size: 1955 bytes --]

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index fec2cc91d24..cc9372b178c 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2588,6 +2588,18 @@
   [(set_attr "type" "neon_tst<q>")]
 )
 
+(define_insn "neon_vtst_combine<mode>"
+  [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
+        (plus:VDQIW
+	  (eq:VDQIW
+	    (and:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
+		       (match_operand:VDQIW 2 "s_register_operand" "w"))
+	    (match_operand:VDQIW 3 "zero_operand" "i"))
+	  (match_operand:VDQIW 4 "minus_one_operand" "i")))]
+  "TARGET_NEON"
+  "vtst.<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+)
+
 (define_insn "neon_vabd<sup><mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
         (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index c661f015fc5..9db061dc88c 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -200,6 +200,10 @@
   (and (match_code "const_int,const_double,const_vector")
        (match_test "op == CONST0_RTX (mode)")))
 
+(define_predicate "minus_one_operand"
+  (and (match_code "const_int,const_double,const_vector")
+       (match_test "op == CONSTM1_RTX (mode)")))
+
 ;; Match a register, or zero in the appropriate mode.
 (define_predicate "reg_or_zero_operand"
   (ior (match_operand 0 "s_register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/pr97903.c b/gcc/testsuite/gcc.target/arm/pr97903.c
new file mode 100644
index 00000000000..f4058bb1588
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr97903.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" }  */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+
+uint8x8_t f1(int8x8_t a, int8x8_t b) {
+  return (uint8x8_t) ((a & b) != 0);
+}
+
+/* { dg-final { scan-assembler "vtst" } } */


* RE: [PR97903][ARM] Missed optimization in lowering to vtst
From: Kyrylo Tkachov @ 2021-02-05 10:12 UTC
  To: Prathamesh Kulkarni, gcc Patches

Hi Prathamesh,

> -----Original Message-----
> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> Sent: 05 February 2021 09:53
> To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> Subject: [PR97903][ARM] Missed optimization in lowering to vtst
> 

+(define_insn "neon_vtst_combine<mode>"
+  [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
+        (plus:VDQIW
+	  (eq:VDQIW
+	    (and:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
+		       (match_operand:VDQIW 2 "s_register_operand" "w"))
+	    (match_operand:VDQIW 3 "zero_operand" "i"))
+	  (match_operand:VDQIW 4 "minus_one_operand" "i")))]
+  "TARGET_NEON"
+  "vtst.<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+)

This will need a type attribute for scheduling.

> Bootstrapped + tested on arm-linux-gnueabihf, and
> cross tested on arm*-*-*.
> Does it look OK for next stage-1 ?

It looks sensible to me for stage 1.
Thanks,
Kyrill



* Re: [PR97903][ARM] Missed optimization in lowering to vtst
From: Prathamesh Kulkarni @ 2021-05-05  8:34 UTC
  To: Kyrylo Tkachov; +Cc: gcc Patches


On Fri, 5 Feb 2021 at 15:42, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>
> This will need a type attribute for scheduling.
>
> > Bootstrapped + tested on arm-linux-gnueabihf, and
> > cross tested on arm*-*-*.
> > Does it look OK for next stage-1 ?
>
> It looks sensible to me for stage 1.
Hi Kyrill,
Would it be OK to commit the attached patch, which adds the type
attribute, once testing passes?

Thanks,
Prathamesh

[-- Attachment #2: pr97903-2.diff --]
[-- Type: application/octet-stream, Size: 1475 bytes --]

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index fec2cc91d24..2a1e304f2ba 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2588,6 +2588,19 @@
   [(set_attr "type" "neon_tst<q>")]
 )
 
+(define_insn "neon_vtst_combine<mode>"
+  [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
+        (plus:VDQIW
+	  (eq:VDQIW
+	    (and:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
+		       (match_operand:VDQIW 2 "s_register_operand" "w"))
+	    (match_operand:VDQIW 3 "zero_operand" "i"))
+	  (match_operand:VDQIW 4 "minus_one_operand" "i")))]
+  "TARGET_NEON"
+  "vtst.<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_tst<q>")]
+)
+
 (define_insn "neon_vabd<sup><mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
         (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index c661f015fc5..9db061dc88c 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -200,6 +200,10 @@
   (and (match_code "const_int,const_double,const_vector")
        (match_test "op == CONST0_RTX (mode)")))
 
+(define_predicate "minus_one_operand"
+  (and (match_code "const_int,const_double,const_vector")
+       (match_test "op == CONSTM1_RTX (mode)")))
+
 ;; Match a register, or zero in the appropriate mode.
 (define_predicate "reg_or_zero_operand"
   (ior (match_operand 0 "s_register_operand")


* RE: [PR97903][ARM] Missed optimization in lowering to vtst
From: Kyrylo Tkachov @ 2021-05-05  8:42 UTC
  To: Prathamesh Kulkarni; +Cc: gcc Patches

Hi Prathamesh,

> -----Original Message-----
> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> Sent: 05 May 2021 09:35
> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: gcc Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PR97903][ARM] Missed optimization in lowering to vtst
> 
> Hi Kyrill,
> Would it be OK to commit the attached patch after testing passes ?

Ok.
Thanks,
Kyrill

