[ARM] PR98435: Missed optimization in expanding vector constructor

public inbox for gcc-patches@gcc.gnu.org

* [ARM] PR98435: Missed optimization in expanding vector constructor
@ 2021-06-04  7:25 Prathamesh Kulkarni
  2021-06-04  7:45 ` Christophe Lyon
  0 siblings, 1 reply; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-06-04  7:25 UTC (permalink / raw)
  To: gcc Patches, Kyrill Tkachov

Hi,
As mentioned in the PR, for the following test case:

#include <arm_neon.h>

bfloat16x4_t f1 (bfloat16_t a)
{
  return vdup_n_bf16 (a);
}

bfloat16x4_t f2 (bfloat16_t a)
{
  return (bfloat16x4_t) {a, a, a, a};
}

Compiling for arm-linux-gnueabi with -O3 -mfpu=neon -mfloat-abi=softfp
-march=armv8.2-a+bf16+fp16 results in the constructor in f2 not being
expanded to a vdup, unlike the intrinsic in f1:

f1:
        vdup.16 d16, r0
        vmov    r0, r1, d16  @ v4bf
        bx      lr

f2:
        mov     r3, r0  @ __bf16
        adr     r1, .L4
        ldrd    r0, [r1]
        mov     r2, r3  @ __bf16
        mov     ip, r3  @ __bf16
        bfi     r1, r2, #0, #16
        bfi     r0, ip, #0, #16
        bfi     r1, r3, #16, #16
        bfi     r0, r2, #16, #16
        bx      lr
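For reference, the three-instruction sequence in f1 can be modelled at the bit level in plain C. This is a sketch under stated assumptions: the bfloat16 argument is treated as its raw 16-bit pattern, the lane layout is little-endian as on arm-linux-gnueabi, and the helper names model_vdup16/model_vmov are hypothetical. vdup.16 copies the low 16 bits of r0 into all four lanes of d16, and, as the listing shows, with -mfloat-abi=softfp the 64-bit vector is handed back in r0 (low word) and r1 (high word).

```c
#include <stdint.h>

/* Hypothetical model of "vdup.16 d16, r0": the low 16 bits of r0 are
   copied into each of the four 16-bit lanes of the 64-bit D register. */
static uint64_t model_vdup16(uint32_t r0)
{
    uint64_t lane = r0 & 0xFFFFu;          /* upper half of r0 is ignored */
    return lane * 0x0001000100010001ULL;   /* same bits in lanes 0..3 */
}

/* Two core registers holding a 64-bit value. */
struct core_pair {
    uint32_t r0;  /* bits 31..0 of the D register */
    uint32_t r1;  /* bits 63..32 of the D register */
};

/* Hypothetical model of "vmov r0, r1, d16": split the D register into
   the two core registers used to return the vector under softfp. */
static struct core_pair model_vmov(uint64_t d16)
{
    struct core_pair out = { (uint32_t) d16, (uint32_t) (d16 >> 32) };
    return out;
}
```

For example, 0x3F80 (the bfloat16 encoding of 1.0) comes back as 0x3F803F80 in both r0 and r1, which is what f2 should also produce once the constructor is expanded as a vdup.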

This seems to happen because the vec_init pattern in neon.md uses the VDQ
mode iterator, which doesn't include V4BF. In the attached patch, I changed
the iterator to VDQX, which seems to work for the test case, and the
compiler now generates:

f2:
        vdup.16 d16, r0
        vmov    r0, r1, d16  @ v4bf
        bx      lr
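A rough C model of why the V4BF constructor missed the pattern: a define_expand written with a mode iterator is instantiated once per mode in that iterator, and each instance is named vec_init<mode><V_elem_l> with both parts in lower case, so a mode that is not listed simply gets no vec_init pattern. The mode lists below are abridged and illustrative: only the fact that VDQ lacks V4BF while VDQX covers it comes from this thread, and the struct and function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A vector mode and its element mode, as <mode> and <V_elem_l> see them. */
struct mode_info {
    const char *mode;
    const char *elem;
};

/* Abridged, illustrative iterator contents (not copied from iterators.md). */
static const struct mode_info vdq_modes[] = {
    { "V8QI", "QI" }, { "V4HI", "HI" }, { "V2SF", "SF" }, { "V4SF", "SF" },
};
static const struct mode_info vdqx_modes[] = {
    { "V8QI", "QI" }, { "V4HI", "HI" }, { "V2SF", "SF" }, { "V4SF", "SF" },
    { "V4BF", "BF" },
};

/* Spell the instantiated name "vec_init<mode><V_elem_l>" in lower case. */
static void pattern_name(const struct mode_info *m, char *buf, size_t len)
{
    snprintf(buf, len, "vec_init%s%s", m->mode, m->elem);
    for (char *p = buf; *p != '\0'; p++)
        *p = (char) tolower((unsigned char) *p);
}

/* True if instantiating the expander over MODES yields pattern NAME. */
static bool iterator_defines(const struct mode_info *modes, size_t n,
                             const char *name)
{
    char buf[32];
    for (size_t i = 0; i < n; i++) {
        pattern_name(&modes[i], buf, sizeof buf);
        if (strcmp(buf, name) == 0)
            return true;
    }
    return false;
}

#define N_MODES(a) (sizeof (a) / sizeof (a)[0])
```

With the VDQ list there is no vec_initv4bfbf, which is consistent with f2 being built element by element; the VDQX list provides one.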

However, the pattern is also enabled for TARGET_HAVE_MVE, and I am not
sure whether either VDQ or VDQX is the correct iterator for MVE, since
MVE has only 128-bit vectors.

Thanks,
Prathamesh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-06-04  7:25 [ARM] PR98435: Missed optimization in expanding vector constructor Prathamesh Kulkarni
@ 2021-06-04  7:45 ` Christophe Lyon
  2021-06-09 10:28   ` Prathamesh Kulkarni
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe Lyon @ 2021-06-04  7:45 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: gcc Patches, Kyrill Tkachov

On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> [...]
> However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> sure if either VDQ or VDQX are correct modes for MVE since MVE has
> only 128-bit vectors ?
>

I think patterns common to both Neon and MVE should be moved to
vec-common.md, I don't know why such patterns were left in neon.md.

That being said, I suggest you look at other similar patterns in
vec-common.md, most of which are gated on ARM_HAVE_<MODE>_ARITH,
and possibly beware of issues with iwmmxt :-)

Christophe

> Thanks,
> Prathamesh


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-06-04  7:45 ` Christophe Lyon
@ 2021-06-09 10:28   ` Prathamesh Kulkarni
  2021-06-14  8:01     ` Prathamesh Kulkarni
  0 siblings, 1 reply; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-06-09 10:28 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc Patches, Kyrill Tkachov

[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]

On Fri, 4 Jun 2021 at 13:15, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>
> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > [...]
> > However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> > sure if either VDQ or VDQX are correct modes for MVE since MVE has
> > only 128-bit vectors ?
> >
>
> I think patterns common to both Neon and MVE should be moved to
> vec-common.md, I don't know why such patterns were left in neon.md.
Since we end up calling neon_expand_vector_init for both NEON and MVE,
I am not sure whether we should separate the pattern.
Would it make sense to FAIL if the mode size isn't 16 bytes for MVE, as
in the attached patch, so that neon_expand_vector_init is only called for
128-bit vectors? Although hard-coding 16 in the pattern doesn't seem like
a good idea to me either.

Thanks,
Prathamesh
>
> That being said, I suggest you look at other similar patterns in
> vec-common.md, most of which are gated on
> ARM_HAVE_<MODE>_ARITH
> and possibly beware of issues with iwmmxt :-)
>
> Christophe
>
> > Thanks,
> > Prathamesh

[-- Attachment #2: pr98435-2.diff --]
[-- Type: application/octet-stream, Size: 541 bytes --]

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a6573317cf..27dd672ca76 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -459,10 +459,12 @@
 )
 
 (define_expand "vec_init<mode><V_elem_l>"
-  [(match_operand:VDQ 0 "s_register_operand")
+  [(match_operand:VDQX 0 "s_register_operand")
    (match_operand 1 "" "")]
   "TARGET_NEON || TARGET_HAVE_MVE"
 {
+  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
+    FAIL;
   neon_expand_vector_init (operands[0], operands[1]);
   DONE;
 })
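The control flow of this version of the expander body can be sketched as a plain C function, with TARGET_HAVE_MVE and GET_MODE_SIZE (GET_MODE (operands[0])) turned into parameters. The enum and function names are hypothetical; the sketch only shows that the pattern stays enabled for every VDQX mode, and that MVE targets bail out with FAIL at expansion time for anything that is not 16 bytes.

```c
#include <stdbool.h>

/* Stand-ins for the DONE/FAIL outcomes of a define_expand body. */
enum expand_result { EXPAND_DONE, EXPAND_FAIL };

/* Hypothetical model of the pr98435-2.diff expander body. */
static enum expand_result model_vec_init_body(bool target_have_mve,
                                              unsigned mode_size_bytes)
{
    /* MVE only has 128-bit vectors, so reject other sizes. */
    if (target_have_mve && mode_size_bytes != 16)
        return EXPAND_FAIL;
    /* neon_expand_vector_init (operands[0], operands[1]) would run here. */
    return EXPAND_DONE;
}
```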


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-06-09 10:28   ` Prathamesh Kulkarni
@ 2021-06-14  8:01     ` Prathamesh Kulkarni
  2021-06-21  8:34       ` Prathamesh Kulkarni
  2021-06-24 16:31       ` Kyrylo Tkachov
  0 siblings, 2 replies; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-06-14  8:01 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc Patches, Kyrill Tkachov

[-- Attachment #1: Type: text/plain, Size: 2666 bytes --]

On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> >
> > On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > [...]
> > > However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> > > sure if either VDQ or VDQX are correct modes for MVE since MVE has
> > > only 128-bit vectors ?
> > >
> >
> > I think patterns common to both Neon and MVE should be moved to
> > vec-common.md, I don't know why such patterns were left in neon.md.
> Since we end up calling neon_expand_vector_init for both NEON and MVE,
> I am not sure if we should separate the pattern ?
> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> in attached patch so
> it will call neon_expand_vector_init only for 128-bit vectors ?
> Altho hard-coding 16 in the pattern doesn't seem a good idea to me either.
ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
(attaching patch as text).

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > That being said, I suggest you look at other similar patterns in
> > vec-common.md, most of which are gated on
> > ARM_HAVE_<MODE>_ARITH
> > and possibly beware of issues with iwmmxt :-)
> >
> > Christophe
> >
> > > Thanks,
> > > Prathamesh

[-- Attachment #2: pr98435-2.txt --]
[-- Type: text/plain, Size: 541 bytes --]

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a6573317cf..27dd672ca76 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -459,10 +459,12 @@
 )
 
 (define_expand "vec_init<mode><V_elem_l>"
-  [(match_operand:VDQ 0 "s_register_operand")
+  [(match_operand:VDQX 0 "s_register_operand")
    (match_operand 1 "" "")]
   "TARGET_NEON || TARGET_HAVE_MVE"
 {
+  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
+    FAIL;
   neon_expand_vector_init (operands[0], operands[1]);
   DONE;
 })


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-06-14  8:01     ` Prathamesh Kulkarni
@ 2021-06-21  8:34       ` Prathamesh Kulkarni
  2021-06-24 16:31       ` Kyrylo Tkachov
  1 sibling, 0 replies; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-06-21  8:34 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc Patches, Kyrill Tkachov

On Mon, 14 Jun 2021 at 13:31, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Fri, 4 Jun 2021 at 13:15, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> > >
> > > On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > [...]
> > > > However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> > > > sure if either VDQ or VDQX are correct modes for MVE since MVE has
> > > > only 128-bit vectors ?
> > > >
> > >
> > > I think patterns common to both Neon and MVE should be moved to
> > > vec-common.md, I don't know why such patterns were left in neon.md.
> > Since we end up calling neon_expand_vector_init for both NEON and MVE,
> > I am not sure if we should separate the pattern ?
> > Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> > in attached patch so
> > it will call neon_expand_vector_init only for 128-bit vectors ?
> > Altho hard-coding 16 in the pattern doesn't seem a good idea to me either.
> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> (attaching patch as text).
ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572648.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > That being said, I suggest you look at other similar patterns in
> > > vec-common.md, most of which are gated on
> > > ARM_HAVE_<MODE>_ARITH
> > > and possibly beware of issues with iwmmxt :-)
> > >
> > > Christophe
> > >
> > > > Thanks,
> > > > Prathamesh


* RE: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-06-14  8:01     ` Prathamesh Kulkarni
  2021-06-21  8:34       ` Prathamesh Kulkarni
@ 2021-06-24 16:31       ` Kyrylo Tkachov
  2021-06-28  8:37         ` Prathamesh Kulkarni
  1 sibling, 1 reply; 29+ messages in thread
From: Kyrylo Tkachov @ 2021-06-24 16:31 UTC (permalink / raw)
  To: Prathamesh Kulkarni, Christophe Lyon; +Cc: gcc Patches



> -----Original Message-----
> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> Sent: 14 June 2021 09:02
> To: Christophe Lyon <christophe.lyon@linaro.org>
> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> 
> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Fri, 4 Jun 2021 at 13:15, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> > >
> > > On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > [...]
> > > > However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> > > > sure if either VDQ or VDQX are correct modes for MVE since MVE has
> > > > only 128-bit vectors ?
> > > >
> > >
> > > I think patterns common to both Neon and MVE should be moved to
> > > vec-common.md, I don't know why such patterns were left in neon.md.
> > Since we end up calling neon_expand_vector_init for both NEON and MVE,
> > I am not sure if we should separate the pattern ?
> > Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> > in attached patch so
> > it will call neon_expand_vector_init only for 128-bit vectors ?
> > Altho hard-coding 16 in the pattern doesn't seem a good idea to me either.
> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> (attaching patch as text).
> 

--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -459,10 +459,12 @@
 )
 
 (define_expand "vec_init<mode><V_elem_l>"
-  [(match_operand:VDQ 0 "s_register_operand")
+  [(match_operand:VDQX 0 "s_register_operand")
    (match_operand 1 "" "")]
   "TARGET_NEON || TARGET_HAVE_MVE"
 {
+  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
+    FAIL;
   neon_expand_vector_init (operands[0], operands[1]);
   DONE;
 })

I think we should move this to vec-common.md, as Christophe said.
Perhaps, rather than making it FAIL for non-16-byte MVE sizes, we could just disable it in the expander condition?
"TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (< VDQ>mode) != 16)"

Thanks,
Kyrill

> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > That being said, I suggest you look at other similar patterns in
> > > vec-common.md, most of which are gated on
> > > ARM_HAVE_<MODE>_ARITH
> > > and possibly beware of issues with iwmmxt :-)
> > >
> > > Christophe
> > >
> > > > Thanks,
> > > > Prathamesh


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-06-24 16:31       ` Kyrylo Tkachov
@ 2021-06-28  8:37         ` Prathamesh Kulkarni
  2021-06-28  8:40           ` Kyrylo Tkachov
  0 siblings, 1 reply; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-06-28  8:37 UTC (permalink / raw)
  To: Kyrylo Tkachov; +Cc: Christophe Lyon, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 4527 bytes --]

On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > Sent: 14 June 2021 09:02
> > To: Christophe Lyon <christophe.lyon@linaro.org>
> > Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> >
> > On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > On Fri, 4 Jun 2021 at 13:15, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> > > >
> > > > On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > [...]
> > > > > However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> > > > > sure if either VDQ or VDQX are correct modes for MVE since MVE has
> > > > > only 128-bit vectors ?
> > > > >
> > > >
> > > > I think patterns common to both Neon and MVE should be moved to
> > > > vec-common.md, I don't know why such patterns were left in neon.md.
> > > Since we end up calling neon_expand_vector_init for both NEON and MVE,
> > > I am not sure if we should separate the pattern ?
> > > Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> > > in attached patch so
> > > it will call neon_expand_vector_init only for 128-bit vectors ?
> > > Altho hard-coding 16 in the pattern doesn't seem a good idea to me either.
> > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> > (attaching patch as text).
> >
>
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -459,10 +459,12 @@
>  )
>
>  (define_expand "vec_init<mode><V_elem_l>"
> -  [(match_operand:VDQ 0 "s_register_operand")
> +  [(match_operand:VDQX 0 "s_register_operand")
>     (match_operand 1 "" "")]
>    "TARGET_NEON || TARGET_HAVE_MVE"
>  {
> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> +    FAIL;
>    neon_expand_vector_init (operands[0], operands[1]);
>    DONE;
>  })
>
> I think we should move this to vec-common.md like Christophe said.
> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in the expander condition?
> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (< VDQ>mode) != 16)"
Is it OK to use <MODE>mode? Using <VDQ>mode resulted in a lot of
build errors.
Also, I think the comparison should be inverted, i.e.
GET_MODE_SIZE (<MODE>mode) == 16, since we want the pattern to be
enabled when the target is MVE and the vector size is 16 bytes.
Do the changes in the attached patch look OK?
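The two expander conditions discussed here can be compared as plain C predicates, with TARGET_NEON, TARGET_HAVE_MVE and GET_MODE_SIZE (<MODE>mode) turned into parameters (the function names are hypothetical). With != 16, an MVE-only target would get the pattern exactly for the 64-bit modes it cannot handle; with == 16 it gets it only for 128-bit modes, while Neon targets are unaffected either way.

```c
#include <stdbool.h>

/* Condition as first suggested: on MVE, enabled only when the mode is
   NOT 16 bytes, i.e. for the 64-bit modes MVE does not have. */
static bool cond_ne16(bool target_neon, bool target_have_mve,
                      unsigned mode_size)
{
    return target_neon || (target_have_mve && mode_size != 16);
}

/* Inverted condition from the attached patch: on MVE, enabled only
   for 128-bit (16-byte) vector modes. */
static bool cond_eq16(bool target_neon, bool target_have_mve,
                      unsigned mode_size)
{
    return target_neon || (target_have_mve && mode_size == 16);
}
```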

Thanks,
Prathamesh
>
> Thanks,
> Kyrill
>
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > That being said, I suggest you look at other similar patterns in
> > > > vec-common.md, most of which are gated on
> > > > ARM_HAVE_<MODE>_ARITH
> > > > and possibly beware of issues with iwmmxt :-)
> > > >
> > > > Christophe
> > > >
> > > > > Thanks,
> > > > > Prathamesh

[-- Attachment #2: pr98435-3.diff --]
[-- Type: application/octet-stream, Size: 1060 bytes --]

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a6573317cf..0c98b3a8f23 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -458,15 +458,6 @@
   [(set_attr "type" "neon_store1_one_lane_q,neon_to_gp_q")]
 )
 
-(define_expand "vec_init<mode><V_elem_l>"
-  [(match_operand:VDQ 0 "s_register_operand")
-   (match_operand 1 "" "")]
-  "TARGET_NEON || TARGET_HAVE_MVE"
-{
-  neon_expand_vector_init (operands[0], operands[1]);
-  DONE;
-})
-
 ;; Doubleword and quadword arithmetic.
 
 ;; NOTE: some other instructions also support 64-bit integer
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 8e35151da46..7a9187bf0e4 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -565,3 +565,12 @@
 
   DONE;
 })
+
+(define_expand "vec_init<mode><V_elem_l>"
+  [(match_operand:VDQX 0 "s_register_operand")
+   (match_operand 1 "" "")]
+  "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<MODE>mode) == 16)"
+{
+  neon_expand_vector_init (operands[0], operands[1]);
+  DONE;
+})


* RE: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-06-28  8:37         ` Prathamesh Kulkarni
@ 2021-06-28  8:40           ` Kyrylo Tkachov
  2021-06-28  9:17             ` Christophe LYON
  0 siblings, 1 reply; 29+ messages in thread
From: Kyrylo Tkachov @ 2021-06-28  8:40 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Christophe Lyon, gcc Patches



> -----Original Message-----
> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> Sent: 28 June 2021 09:38
> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> 
> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
> > [...]
> > I think we should move this to vec-common.md like Christophe said.
> > Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in the expander condition?
> > "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (< VDQ>mode) != 16)"
> Is it OK to use <MODE>mode ? Because using <VDQ>mode resulted in lot
> of build errors.
> Also, I think the comparison should be inverted, ie, GET_MODE_SIZE
> (<MODE>mode) == 16 since
> we want to make the pattern pass if target is MVE and vector size is 16 bytes ?
> Do these changes in attached patch look OK ?

Yes, you're right.
Ok.
Thanks,
Kyrill


> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Kyrill
> >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > That being said, I suggest you look at other similar patterns in
> > > > > vec-common.md, most of which are gated on
> > > > > ARM_HAVE_<MODE>_ARITH
> > > > > and possibly beware of issues with iwmmxt :-)
> > > > >
> > > > > Christophe
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-06-28  8:40           ` Kyrylo Tkachov
@ 2021-06-28  9:17             ` Christophe LYON
  2021-06-29 10:46               ` Prathamesh Kulkarni
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe LYON @ 2021-06-28  9:17 UTC (permalink / raw)
  To: Kyrylo Tkachov, Prathamesh Kulkarni; +Cc: gcc Patches


On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
>
>> -----Original Message-----
>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> Sent: 28 June 2021 09:38
>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-patches@gcc.gnu.org>
>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>>
>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>>>> Sent: 14 June 2021 09:02
>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>>>>
>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
>>>> <prathamesh.kulkarni@linaro.org> wrote:
>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
>> <christophe.lyon@linaro.org>
>>>> wrote:
>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>> Hi,
>>>>>>> As mentioned in PR, for the following test-case:
>>>>>>>
>>>>>>> #include <arm_neon.h>
>>>>>>>
>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
>>>>>>> {
>>>>>>>    return vdup_n_bf16 (a);
>>>>>>> }
>>>>>>>
>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
>>>>>>> {
>>>>>>>    return (bfloat16x4_t) {a, a, a, a};
>>>>>>> }
>>>>>>>
>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
>>>>>>>
>>>>>>> f1:
>>>>>>>          vdup.16 d16, r0
>>>>>>>          vmov    r0, r1, d16  @ v4bf
>>>>>>>          bx      lr
>>>>>>>
>>>>>>> f2:
>>>>>>>          mov     r3, r0  @ __bf16
>>>>>>>          adr     r1, .L4
>>>>>>>          ldrd    r0, [r1]
>>>>>>>          mov     r2, r3  @ __bf16
>>>>>>>          mov     ip, r3  @ __bf16
>>>>>>>          bfi     r1, r2, #0, #16
>>>>>>>          bfi     r0, ip, #0, #16
>>>>>>>          bfi     r1, r3, #16, #16
>>>>>>>          bfi     r0, r2, #16, #16
>>>>>>>          bx      lr
>>>>>>>
>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ
>>>> mode
>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed
>>>>>>> mode
>>>>>>> to VDQX which seems to work for the test-case, and the compiler
>> now
>>>> generates:
>>>>>>> f2:
>>>>>>>          vdup.16 d16, r0
>>>>>>>          vmov    r0, r1, d16  @ v4bf
>>>>>>>          bx      lr
>>>>>>>
>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am
>>>> not
>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE
>> has
>>>>>>> only 128-bit vectors ?
>>>>>>>
>>>>>> I think patterns common to both Neon and MVE should be moved to
>>>>>> vec-common.md, I don't know why such patterns were left in
>> neon.md.
>>>>> Since we end up calling neon_expand_vector_init for both NEON and
>> MVE,
>>>>> I am not sure if we should separate the pattern ?
>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
>>>>> in attached patch so
>>>>> it will call neon_expand_vector_init only for 128-bit vectors ?
>>>>> Altho hard-coding 16 in the pattern doesn't seem a good idea to me
>> either.
>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
>>>> (attaching patch as text).
>>>>
>>> --- a/gcc/config/arm/neon.md
>>> +++ b/gcc/config/arm/neon.md
>>> @@ -459,10 +459,12 @@
>>>   )
>>>
>>>   (define_expand "vec_init<mode><V_elem_l>"
>>> -  [(match_operand:VDQ 0 "s_register_operand")
>>> +  [(match_operand:VDQX 0 "s_register_operand")
>>>      (match_operand 1 "" "")]
>>>     "TARGET_NEON || TARGET_HAVE_MVE"
>>>   {
>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
>>> +    FAIL;
>>>     neon_expand_vector_init (operands[0], operands[1]);
>>>     DONE;
>>>   })
>>>
>>> I think we should move this to vec-common.md like Christophe said.
>>> Perhaps rather than making it FAIL for non-16 MVE sizes we could just disable it in the expander condition?
>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
>> Is it OK to use <MODE>mode? Using <VDQ>mode resulted in a lot
>> of build errors.
>> Also, I think the comparison should be inverted, i.e. GET_MODE_SIZE
>> (<MODE>mode) == 16, since
>> we want the pattern to be enabled if the target is MVE and the vector size is 16 bytes?
>> Do the changes in the attached patch look OK?
> Yes, you're right.


Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?

(maybe with a && !TARGET_REALLY_IWMMXT if needed)


Christophe


> Ok.
> Thanks,
> Kyrill
>
>
>> Thanks,
>> Prathamesh
>>> Thanks,
>>> Kyrill
>>>
>>>> Thanks,
>>>> Prathamesh
>>>>> Thanks,
>>>>> Prathamesh
>>>>>> That being said, I suggest you look at other similar patterns in
>>>>>> vec-common.md, most of which are gated on
>>>>>> ARM_HAVE_<MODE>_ARITH
>>>>>> and possibly beware of issues with iwmmxt :-)
>>>>>>
>>>>>> Christophe
>>>>>>
>>>>>>> Thanks,
>>>>>>> Prathamesh

* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-06-28  9:17             ` Christophe LYON
@ 2021-06-29 10:46               ` Prathamesh Kulkarni
  2021-06-30 15:21                 ` Christophe LYON
  0 siblings, 1 reply; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-06-29 10:46 UTC (permalink / raw)
  To: Christophe LYON; +Cc: Kyrylo Tkachov, gcc Patches

On Mon, 28 Jun 2021 at 14:48, Christophe LYON
<christophe.lyon@foss.st.com> wrote:
>
>
> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
> >
> >> -----Original Message-----
> >> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >> Sent: 28 June 2021 09:38
> >> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-
> >> patches@gcc.gnu.org>
> >> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> >> constructor
> >>
> >> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >> wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >>>> Sent: 14 June 2021 09:02
> >>>> To: Christophe Lyon <christophe.lyon@linaro.org>
> >>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> >>>> <Kyrylo.Tkachov@arm.com>
> >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> >>>> constructor
> >>>>
> >>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> >>>> <prathamesh.kulkarni@linaro.org> wrote:
> >>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
> >> <christophe.lyon@linaro.org>
> >>>> wrote:
> >>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> >>>>>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>>>> Hi,
> >>>>>>> As mentioned in PR, for the following test-case:
> >>>>>>>
> >>>>>>> #include <arm_neon.h>
> >>>>>>>
> >>>>>>> bfloat16x4_t f1 (bfloat16_t a)
> >>>>>>> {
> >>>>>>>    return vdup_n_bf16 (a);
> >>>>>>> }
> >>>>>>>
> >>>>>>> bfloat16x4_t f2 (bfloat16_t a)
> >>>>>>> {
> >>>>>>>    return (bfloat16x4_t) {a, a, a, a};
> >>>>>>> }
> >>>>>>>
> >>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> >>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> >>>>>>>
> >>>>>>> f1:
> >>>>>>>          vdup.16 d16, r0
> >>>>>>>          vmov    r0, r1, d16  @ v4bf
> >>>>>>>          bx      lr
> >>>>>>>
> >>>>>>> f2:
> >>>>>>>          mov     r3, r0  @ __bf16
> >>>>>>>          adr     r1, .L4
> >>>>>>>          ldrd    r0, [r1]
> >>>>>>>          mov     r2, r3  @ __bf16
> >>>>>>>          mov     ip, r3  @ __bf16
> >>>>>>>          bfi     r1, r2, #0, #16
> >>>>>>>          bfi     r0, ip, #0, #16
> >>>>>>>          bfi     r1, r3, #16, #16
> >>>>>>>          bfi     r0, r2, #16, #16
> >>>>>>>          bx      lr
> >>>>>>>
> >>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ
> >>>> mode
> >>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed
> >>>>>>> mode
> >>>>>>> to VDQX which seems to work for the test-case, and the compiler
> >> now
> >>>> generates:
> >>>>>>> f2:
> >>>>>>>          vdup.16 d16, r0
> >>>>>>>          vmov    r0, r1, d16  @ v4bf
> >>>>>>>          bx      lr
> >>>>>>>
> >>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am
> >>>> not
> >>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE
> >> has
> >>>>>>> only 128-bit vectors ?
> >>>>>>>
> >>>>>> I think patterns common to both Neon and MVE should be moved to
> >>>>>> vec-common.md, I don't know why such patterns were left in
> >> neon.md.
> >>>>> Since we end up calling neon_expand_vector_init for both NEON and
> >> MVE,
> >>>>> I am not sure if we should separate the pattern ?
> >>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> >>>>> in attached patch so
> >>>>> it will call neon_expand_vector_init only for 128-bit vectors ?
> >>>>> Altho hard-coding 16 in the pattern doesn't seem a good idea to me
> >> either.
> >>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> >>>> (attaching patch as text).
> >>>>
> >>> --- a/gcc/config/arm/neon.md
> >>> +++ b/gcc/config/arm/neon.md
> >>> @@ -459,10 +459,12 @@
> >>>   )
> >>>
> >>>   (define_expand "vec_init<mode><V_elem_l>"
> >>> -  [(match_operand:VDQ 0 "s_register_operand")
> >>> +  [(match_operand:VDQX 0 "s_register_operand")
> >>>      (match_operand 1 "" "")]
> >>>     "TARGET_NEON || TARGET_HAVE_MVE"
> >>>   {
> >>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> >>> +    FAIL;
> >>>     neon_expand_vector_init (operands[0], operands[1]);
> >>>     DONE;
> >>>   })
> >>>
> >>> I think we should move this to vec-common.md like Christophe said.
> >>> Perhaps rather than making it FAIL for non-16 MVE sizes we could just disable it in the expander condition?
> >>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
> >> Is it OK to use <MODE>mode? Using <VDQ>mode resulted in a lot
> >> of build errors.
> >> Also, I think the comparison should be inverted, i.e. GET_MODE_SIZE
> >> (<MODE>mode) == 16, since
> >> we want the pattern to be enabled if the target is MVE and the vector size is 16 bytes?
> >> Do the changes in the attached patch look OK?
> > Yes, you're right.
>
>
> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
>
> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
I wonder if this should be ARM_HAVE_<MODE>_LDST instead since we're
initializing the vector ?

Thanks,
Prathamesh
>
>
> Christophe
>
>
> > Ok.
> > Thanks,
> > Kyrill
> >
> >
> >> Thanks,
> >> Prathamesh
> >>> Thanks,
> >>> Kyrill
> >>>
> >>>> Thanks,
> >>>> Prathamesh
> >>>>> Thanks,
> >>>>> Prathamesh
> >>>>>> That being said, I suggest you look at other similar patterns in
> >>>>>> vec-common.md, most of which are gated on
> >>>>>> ARM_HAVE_<MODE>_ARITH
> >>>>>> and possibly beware of issues with iwmmxt :-)
> >>>>>>
> >>>>>> Christophe
> >>>>>>
> >>>>>>> Thanks,
> >>>>>>> Prathamesh

* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-06-29 10:46               ` Prathamesh Kulkarni
@ 2021-06-30 15:21                 ` Christophe LYON
  2021-07-01 10:56                   ` Prathamesh Kulkarni
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe LYON @ 2021-06-30 15:21 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Kyrylo Tkachov, gcc Patches


On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> <christophe.lyon@foss.st.com> wrote:
>>
>> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
>>>> -----Original Message-----
>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>>>> Sent: 28 June 2021 09:38
>>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-
>>>> patches@gcc.gnu.org>
>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
>>>> constructor
>>>>
>>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>>>> wrote:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>>>>>> Sent: 14 June 2021 09:02
>>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
>>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
>>>>>> <Kyrylo.Tkachov@arm.com>
>>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
>>>>>> constructor
>>>>>>
>>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
>>>>>> <prathamesh.kulkarni@linaro.org> wrote:
>>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
>>>> <christophe.lyon@linaro.org>
>>>>>> wrote:
>>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
>>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>>>> Hi,
>>>>>>>>> As mentioned in PR, for the following test-case:
>>>>>>>>>
>>>>>>>>> #include <arm_neon.h>
>>>>>>>>>
>>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
>>>>>>>>> {
>>>>>>>>>     return vdup_n_bf16 (a);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
>>>>>>>>> {
>>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
>>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
>>>>>>>>>
>>>>>>>>> f1:
>>>>>>>>>           vdup.16 d16, r0
>>>>>>>>>           vmov    r0, r1, d16  @ v4bf
>>>>>>>>>           bx      lr
>>>>>>>>>
>>>>>>>>> f2:
>>>>>>>>>           mov     r3, r0  @ __bf16
>>>>>>>>>           adr     r1, .L4
>>>>>>>>>           ldrd    r0, [r1]
>>>>>>>>>           mov     r2, r3  @ __bf16
>>>>>>>>>           mov     ip, r3  @ __bf16
>>>>>>>>>           bfi     r1, r2, #0, #16
>>>>>>>>>           bfi     r0, ip, #0, #16
>>>>>>>>>           bfi     r1, r3, #16, #16
>>>>>>>>>           bfi     r0, r2, #16, #16
>>>>>>>>>           bx      lr
>>>>>>>>>
>>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ
>>>>>> mode
>>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed
>>>>>>>>> mode
>>>>>>>>> to VDQX which seems to work for the test-case, and the compiler
>>>> now
>>>>>> generates:
>>>>>>>>> f2:
>>>>>>>>>           vdup.16 d16, r0
>>>>>>>>>           vmov    r0, r1, d16  @ v4bf
>>>>>>>>>           bx      lr
>>>>>>>>>
>>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am
>>>>>> not
>>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE
>>>> has
>>>>>>>>> only 128-bit vectors ?
>>>>>>>>>
>>>>>>>> I think patterns common to both Neon and MVE should be moved to
>>>>>>>> vec-common.md, I don't know why such patterns were left in
>>>> neon.md.
>>>>>>> Since we end up calling neon_expand_vector_init for both NEON and
>>>> MVE,
>>>>>>> I am not sure if we should separate the pattern ?
>>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
>>>>>>> in attached patch so
>>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ?
>>>>>>> Altho hard-coding 16 in the pattern doesn't seem a good idea to me
>>>> either.
>>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
>>>>>> (attaching patch as text).
>>>>>>
>>>>> --- a/gcc/config/arm/neon.md
>>>>> +++ b/gcc/config/arm/neon.md
>>>>> @@ -459,10 +459,12 @@
>>>>>    )
>>>>>
>>>>>    (define_expand "vec_init<mode><V_elem_l>"
>>>>> -  [(match_operand:VDQ 0 "s_register_operand")
>>>>> +  [(match_operand:VDQX 0 "s_register_operand")
>>>>>       (match_operand 1 "" "")]
>>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
>>>>>    {
>>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
>>>>> +    FAIL;
>>>>>      neon_expand_vector_init (operands[0], operands[1]);
>>>>>      DONE;
>>>>>    })
>>>>>
>>>>> I think we should move this to vec-common.md like Christophe said.
>>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we could just disable it in the expander condition?
>>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
>>>> Is it OK to use <MODE>mode? Using <VDQ>mode resulted in a lot
>>>> of build errors.
>>>> Also, I think the comparison should be inverted, i.e. GET_MODE_SIZE
>>>> (<MODE>mode) == 16, since
>>>> we want the pattern to be enabled if the target is MVE and the vector size is 16 bytes?
>>>> Do the changes in the attached patch look OK?
>>> Yes, you're right.
>>
>> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
>>
>> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
> I wonder if this should be ARM_HAVE_<MODE>_LDST instead since we're
> initializing the vector ?


Well, it really depends on which modes you want to enable.


Looks like your change from VDQ to VDQX adds V4BF, V8BF and DI.

Are they all OK for Neon?

They are not OK for MVE.

Ideally you could add testcases to cover the supported and
unsupported modes for both Neon and MVE.

Before your patch, the expander is enabled for MVE for 64-bit modes
(V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash 
or is there something else preventing the match?
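To make the mode question concrete, here is a small C model (not GCC code) of the VDQX iterator. The mode lists, sizes and VDQ membership are assumptions based on this thread and gcc/config/arm/iterators.md, so please check them against the tree; `struct mode_info`, `collect_modes`, `added_by_vdqx` and `old_mve_non_128` are hypothetical helpers. The model lists the modes gained by switching from VDQ to VDQX, and the non-128-bit VDQ modes that the old "TARGET_NEON || TARGET_HAVE_MVE" condition also enabled on MVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative model, not GCC code: VDQX members with their GET_MODE_SIZE
   in bytes and whether the old VDQ iterator also contained them.
   Contents are assumed; check gcc/config/arm/iterators.md.  */
struct mode_info
{
  const char *name;
  int size;
  bool in_vdq;
};

static const struct mode_info vdqx_modes[] = {
  {"V8QI", 8, true},   {"V16QI", 16, true}, {"V4HI", 8, true},
  {"V8HI", 16, true},  {"V2SI", 8, true},   {"V4SI", 16, true},
  {"V4HF", 8, true},   {"V8HF", 16, true},  {"V4BF", 8, false},
  {"V8BF", 16, false}, {"V2SF", 8, true},   {"V4SF", 16, true},
  {"DI", 8, false},    {"V2DI", 16, true},
};

typedef bool (*mode_pred) (const struct mode_info *);

/* Modes gained by changing the operand from VDQ to VDQX.  */
bool
added_by_vdqx (const struct mode_info *m)
{
  return !m->in_vdq;
}

/* VDQ modes that "TARGET_NEON || TARGET_HAVE_MVE" also enabled on an
   MVE target even though they are not 128 bits wide.  */
bool
old_mve_non_128 (const struct mode_info *m)
{
  return m->in_vdq && m->size != 16;
}

/* Write the names of the modes accepted by PRED into BUF, separated by
   spaces, and return how many names were written.  */
int
collect_modes (mode_pred pred, char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
  size_t used = 0;
  int count = 0;
  buf[0] = '\0';
  for (size_t i = 0; i < sizeof vdqx_modes / sizeof vdqx_modes[0]; i++)
    {
      if (!pred (&vdqx_modes[i]))
        continue;
      int n = snprintf (buf + used, len - used, "%s%s",
                        count > 0 ? " " : "", vdqx_modes[i].name);
      if (n < 0 || (size_t) n >= len - used)
        break;  /* Out of room; BUF stays NUL-terminated.  */
      used += (size_t) n;
      count++;
    }
  return count;
}
```

Under these assumptions, collect_modes (added_by_vdqx, buf, sizeof buf) yields "V4BF V8BF DI", and old_mve_non_128 yields V8QI, V4HI and V2SI plus V4HF and V2SF, assuming VDQ really does contain the two FP modes.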


Thanks,


Christophe


> Thanks,
> Prathamesh
>>
>> Christophe
>>
>>
>>> Ok.
>>> Thanks,
>>> Kyrill
>>>
>>>
>>>> Thanks,
>>>> Prathamesh
>>>>> Thanks,
>>>>> Kyrill
>>>>>
>>>>>> Thanks,
>>>>>> Prathamesh
>>>>>>> Thanks,
>>>>>>> Prathamesh
>>>>>>>> That being said, I suggest you look at other similar patterns in
>>>>>>>> vec-common.md, most of which are gated on
>>>>>>>> ARM_HAVE_<MODE>_ARITH
>>>>>>>> and possibly beware of issues with iwmmxt :-)
>>>>>>>>
>>>>>>>> Christophe
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Prathamesh

* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-06-30 15:21                 ` Christophe LYON
@ 2021-07-01 10:56                   ` Prathamesh Kulkarni
  2021-07-06  7:05                     ` Prathamesh Kulkarni
  0 siblings, 1 reply; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-07-01 10:56 UTC (permalink / raw)
  To: Christophe LYON; +Cc: Kyrylo Tkachov, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 6840 bytes --]

On Wed, 30 Jun 2021 at 20:51, Christophe LYON
<christophe.lyon@foss.st.com> wrote:
>
>
> On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> > <christophe.lyon@foss.st.com> wrote:
> >>
> >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
> >>>> -----Original Message-----
> >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >>>> Sent: 28 June 2021 09:38
> >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-
> >>>> patches@gcc.gnu.org>
> >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> >>>> constructor
> >>>>
> >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >>>> wrote:
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >>>>>> Sent: 14 June 2021 09:02
> >>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
> >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> >>>>>> <Kyrylo.Tkachov@arm.com>
> >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> >>>>>> constructor
> >>>>>>
> >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
> >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
> >>>> <christophe.lyon@linaro.org>
> >>>>>> wrote:
> >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> >>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>>>>>> Hi,
> >>>>>>>>> As mentioned in PR, for the following test-case:
> >>>>>>>>>
> >>>>>>>>> #include <arm_neon.h>
> >>>>>>>>>
> >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
> >>>>>>>>> {
> >>>>>>>>>     return vdup_n_bf16 (a);
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
> >>>>>>>>> {
> >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> >>>>>>>>>
> >>>>>>>>> f1:
> >>>>>>>>>           vdup.16 d16, r0
> >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> >>>>>>>>>           bx      lr
> >>>>>>>>>
> >>>>>>>>> f2:
> >>>>>>>>>           mov     r3, r0  @ __bf16
> >>>>>>>>>           adr     r1, .L4
> >>>>>>>>>           ldrd    r0, [r1]
> >>>>>>>>>           mov     r2, r3  @ __bf16
> >>>>>>>>>           mov     ip, r3  @ __bf16
> >>>>>>>>>           bfi     r1, r2, #0, #16
> >>>>>>>>>           bfi     r0, ip, #0, #16
> >>>>>>>>>           bfi     r1, r3, #16, #16
> >>>>>>>>>           bfi     r0, r2, #16, #16
> >>>>>>>>>           bx      lr
> >>>>>>>>>
> >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ
> >>>>>> mode
> >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed
> >>>>>>>>> mode
> >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler
> >>>> now
> >>>>>> generates:
> >>>>>>>>> f2:
> >>>>>>>>>           vdup.16 d16, r0
> >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> >>>>>>>>>           bx      lr
> >>>>>>>>>
> >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am
> >>>>>> not
> >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE
> >>>> has
> >>>>>>>>> only 128-bit vectors ?
> >>>>>>>>>
> >>>>>>>> I think patterns common to both Neon and MVE should be moved to
> >>>>>>>> vec-common.md, I don't know why such patterns were left in
> >>>> neon.md.
> >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and
> >>>> MVE,
> >>>>>>> I am not sure if we should separate the pattern ?
> >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> >>>>>>> in attached patch so
> >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ?
> >>>>>>> Altho hard-coding 16 in the pattern doesn't seem a good idea to me
> >>>> either.
> >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> >>>>>> (attaching patch as text).
> >>>>>>
> >>>>> --- a/gcc/config/arm/neon.md
> >>>>> +++ b/gcc/config/arm/neon.md
> >>>>> @@ -459,10 +459,12 @@
> >>>>>    )
> >>>>>
> >>>>>    (define_expand "vec_init<mode><V_elem_l>"
> >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
> >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
> >>>>>       (match_operand 1 "" "")]
> >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
> >>>>>    {
> >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> >>>>> +    FAIL;
> >>>>>      neon_expand_vector_init (operands[0], operands[1]);
> >>>>>      DONE;
> >>>>>    })
> >>>>>
> >>>>> I think we should move this to vec-common.md like Christophe said.
> >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we could just disable it in the expander condition?
> >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
> >>>> Is it OK to use <MODE>mode? Using <VDQ>mode resulted in a lot
> >>>> of build errors.
> >>>> Also, I think the comparison should be inverted, i.e. GET_MODE_SIZE
> >>>> (<MODE>mode) == 16, since
> >>>> we want the pattern to be enabled if the target is MVE and the vector size is 16 bytes?
> >>>> Do the changes in the attached patch look OK?
> >>> Yes, you're right.
> >>
> >> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
> >>
> >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
> > I wonder if this should be ARM_HAVE_<MODE>_LDST instead since we're
> > initializing the vector ?
>
>
> Well, it really depends on which modes you want to enable.
>
>
> Looks like your change from VDQ to VDQX adds V4BF, V8BF and DI.
>
> Are they all OK for Neon?
>
> They are not OK for MVE.
>
> Ideally you could add testcases to cover the supported and
> unsupported modes for both Neon and MVE.
>
> Before your patch, the expander is enabled for MVE for 64-bit modes
> (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash
> or is there something else preventing the match?
Hi,
Apparently there is a VALID_MVE_MODE macro, so would it be better to use:
TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE (<MODE>mode))
as in the attached patch?
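A rough C model of how the two proposed MVE gates differ over VDQX. It assumes that VALID_MVE_MODE in gcc/config/arm/arm.h accepts exactly V2DI, V4SI, V8HI, V16QI, V8HF, V4SF and V2DF, and that the VDQX members and sizes are as listed below; both should be checked against the tree. `valid_mve_mode`, `gate_size16`, `gate_valid_mve` and `count_disagreements` are hypothetical names. Under those assumptions, the only mode where "GET_MODE_SIZE (<MODE>mode) == 16" and "VALID_MVE_MODE (<MODE>mode)" disagree on an MVE-only target is V8BF, which the size check would admit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed contents of VALID_MVE_MODE (gcc/config/arm/arm.h); verify.  */
static const char *const mve_modes[] = {
  "V2DI", "V4SI", "V8HI", "V16QI", "V8HF", "V4SF", "V2DF"
};

/* Hypothetical stand-in for VALID_MVE_MODE, keyed on the mode name.  */
bool
valid_mve_mode (const char *name)
{
  for (size_t i = 0; i < sizeof mve_modes / sizeof mve_modes[0]; i++)
    if (strcmp (name, mve_modes[i]) == 0)
      return true;
  return false;
}

/* VDQX members and their GET_MODE_SIZE in bytes (assumed).  */
struct vdqx_mode
{
  const char *name;
  int size;
};

static const struct vdqx_mode vdqx_modes[] = {
  {"V8QI", 8},  {"V16QI", 16}, {"V4HI", 8},  {"V8HI", 16}, {"V2SI", 8},
  {"V4SI", 16}, {"V4HF", 8},   {"V8HF", 16}, {"V4BF", 8},  {"V8BF", 16},
  {"V2SF", 8},  {"V4SF", 16},  {"DI", 8},    {"V2DI", 16},
};

/* Earlier revision: TARGET_NEON || (TARGET_HAVE_MVE && size == 16).  */
bool
gate_size16 (bool neon, bool mve, const struct vdqx_mode *m)
{
  return neon || (mve && m->size == 16);
}

/* pr98435-4: TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE (mode)).  */
bool
gate_valid_mve (bool neon, bool mve, const struct vdqx_mode *m)
{
  return neon || (mve && valid_mve_mode (m->name));
}

/* On an MVE-only target (no Neon), count the VDQX modes on which the two
   gates disagree and store the first such mode in *FIRST (NULL if none).  */
int
count_disagreements (const char **first)
{
  assert (first != NULL);
  int count = 0;
  *first = NULL;
  for (size_t i = 0; i < sizeof vdqx_modes / sizeof vdqx_modes[0]; i++)
    {
      const struct vdqx_mode *m = &vdqx_modes[i];
      if (gate_size16 (false, true, m) != gate_valid_mve (false, true, m))
        {
          if (*first == NULL)
            *first = m->name;
          count++;
        }
    }
  return count;
}
```

The difference is that VALID_MVE_MODE encodes which 128-bit modes MVE actually supports, whereas a size check admits any 16-byte member of the iterator.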

Thanks,
Prathamesh
>
>
> Thanks,
>
>
> Christophe
>
>
> > Thanks,
> > Prathamesh
> >>
> >> Christophe
> >>
> >>
> >>> Ok.
> >>> Thanks,
> >>> Kyrill
> >>>
> >>>
> >>>> Thanks,
> >>>> Prathamesh
> >>>>> Thanks,
> >>>>> Kyrill
> >>>>>
> >>>>>> Thanks,
> >>>>>> Prathamesh
> >>>>>>> Thanks,
> >>>>>>> Prathamesh
> >>>>>>>> That being said, I suggest you look at other similar patterns in
> >>>>>>>> vec-common.md, most of which are gated on
> >>>>>>>> ARM_HAVE_<MODE>_ARITH
> >>>>>>>> and possibly beware of issues with iwmmxt :-)
> >>>>>>>>
> >>>>>>>> Christophe
> >>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Prathamesh

[-- Attachment #2: pr98435-4.txt --]
[-- Type: text/plain, Size: 1056 bytes --]

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a6573317cf..0c98b3a8f23 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -458,15 +458,6 @@
   [(set_attr "type" "neon_store1_one_lane_q,neon_to_gp_q")]
 )
 
-(define_expand "vec_init<mode><V_elem_l>"
-  [(match_operand:VDQ 0 "s_register_operand")
-   (match_operand 1 "" "")]
-  "TARGET_NEON || TARGET_HAVE_MVE"
-{
-  neon_expand_vector_init (operands[0], operands[1]);
-  DONE;
-})
-
 ;; Doubleword and quadword arithmetic.
 
 ;; NOTE: some other instructions also support 64-bit integer
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 8e35151da46..7858be9f28e 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -565,3 +565,12 @@
 
   DONE;
 })
+
+(define_expand "vec_init<mode><V_elem_l>"
+  [(match_operand:VDQX 0 "s_register_operand")
+   (match_operand 1 "" "")]
+  "TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE (<MODE>mode))" 
+{
+  neon_expand_vector_init (operands[0], operands[1]);
+  DONE;
+})

* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-07-01 10:56                   ` Prathamesh Kulkarni
@ 2021-07-06  7:05                     ` Prathamesh Kulkarni
  2021-07-06  8:03                       ` Kyrylo Tkachov
  0 siblings, 1 reply; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-07-06  7:05 UTC (permalink / raw)
  To: Christophe LYON; +Cc: Kyrylo Tkachov, gcc Patches

On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Wed, 30 Jun 2021 at 20:51, Christophe LYON
> <christophe.lyon@foss.st.com> wrote:
> >
> >
> > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> > > <christophe.lyon@foss.st.com> wrote:
> > >>
> > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
> > >>>> -----Original Message-----
> > >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > >>>> Sent: 28 June 2021 09:38
> > >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > >>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-
> > >>>> patches@gcc.gnu.org>
> > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> > >>>> constructor
> > >>>>
> > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > >>>> wrote:
> > >>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > >>>>>> Sent: 14 June 2021 09:02
> > >>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
> > >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> > >>>>>> <Kyrylo.Tkachov@arm.com>
> > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> > >>>>>> constructor
> > >>>>>>
> > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> > >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
> > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
> > >>>> <christophe.lyon@linaro.org>
> > >>>>>> wrote:
> > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> > >>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
> > >>>>>>>>> Hi,
> > >>>>>>>>> As mentioned in PR, for the following test-case:
> > >>>>>>>>>
> > >>>>>>>>> #include <arm_neon.h>
> > >>>>>>>>>
> > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
> > >>>>>>>>> {
> > >>>>>>>>>     return vdup_n_bf16 (a);
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
> > >>>>>>>>> {
> > >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> > >>>>>>>>>
> > >>>>>>>>> f1:
> > >>>>>>>>>           vdup.16 d16, r0
> > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> > >>>>>>>>>           bx      lr
> > >>>>>>>>>
> > >>>>>>>>> f2:
> > >>>>>>>>>           mov     r3, r0  @ __bf16
> > >>>>>>>>>           adr     r1, .L4
> > >>>>>>>>>           ldrd    r0, [r1]
> > >>>>>>>>>           mov     r2, r3  @ __bf16
> > >>>>>>>>>           mov     ip, r3  @ __bf16
> > >>>>>>>>>           bfi     r1, r2, #0, #16
> > >>>>>>>>>           bfi     r0, ip, #0, #16
> > >>>>>>>>>           bfi     r1, r3, #16, #16
> > >>>>>>>>>           bfi     r0, r2, #16, #16
> > >>>>>>>>>           bx      lr
> > >>>>>>>>>
> > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ
> > >>>>>> mode
> > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed
> > >>>>>>>>> mode
> > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler
> > >>>> now
> > >>>>>> generates:
> > >>>>>>>>> f2:
> > >>>>>>>>>           vdup.16 d16, r0
> > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> > >>>>>>>>>           bx      lr
> > >>>>>>>>>
> > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am
> > >>>>>> not
> > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE
> > >>>> has
> > >>>>>>>>> only 128-bit vectors ?
> > >>>>>>>>>
> > >>>>>>>> I think patterns common to both Neon and MVE should be moved to
> > >>>>>>>> vec-common.md, I don't know why such patterns were left in
> > >>>> neon.md.
> > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and
> > >>>> MVE,
> > >>>>>>> I am not sure if we should separate the pattern ?
> > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> > >>>>>>> in attached patch so
> > >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ?
> > >>>>>>> Altho hard-coding 16 in the pattern doesn't seem a good idea to me
> > >>>> either.
> > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> > >>>>>> (attaching patch as text).
> > >>>>>>
> > >>>>> --- a/gcc/config/arm/neon.md
> > >>>>> +++ b/gcc/config/arm/neon.md
> > >>>>> @@ -459,10 +459,12 @@
> > >>>>>    )
> > >>>>>
> > >>>>>    (define_expand "vec_init<mode><V_elem_l>"
> > >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
> > >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
> > >>>>>       (match_operand 1 "" "")]
> > >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
> > >>>>>    {
> > >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> > >>>>> +    FAIL;
> > >>>>>      neon_expand_vector_init (operands[0], operands[1]);
> > >>>>>      DONE;
> > >>>>>    })
> > >>>>>
> > >>>>> I think we should move this to vec-common.md like Christophe said.
> > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we could just disable it in the expander condition?
> > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
> > >>>> Is it OK to use <MODE>mode? Using <VDQ>mode resulted in a lot
> > >>>> of build errors.
> > >>>> Also, I think the comparison should be inverted, i.e. GET_MODE_SIZE
> > >>>> (<MODE>mode) == 16, since
> > >>>> we want the pattern to be enabled if the target is MVE and the vector size is 16 bytes?
> > >>>> Do the changes in the attached patch look OK?
> > >>> Yes, you're right.
> > >>
> > >> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
> > >>
> > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
> > > I wonder if this should be ARM_HAVE_<MODE>_LDST instead since we're
> > > initializing the vector ?
> >
> >
> > Well, it really depends on which modes you want to enable.
> >
> >
> > Looks like your move VDQ -> VDQX adds V4BF, V8BF and DI.
> >
> > Are they all OK for Neon?
> >
> > They are not OK for MVE.
> >
> > Ideally you could add testcases to cover the supported and
> > unsupported modes for both Neon and MVE.
> >
> > Before your patch, the expander is enabled for MVE for 64 bit modes
> > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash
> > or is there something else preventing the match?
> Hi,
> Apparently there is a VALID_MVE_MODE macro, so would it be better to use:
> TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(<MODE>mode))
> as in the attached patch?
ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html

Thanks,
Prathamesh


* RE: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-07-06  7:05                     ` Prathamesh Kulkarni
@ 2021-07-06  8:03                       ` Kyrylo Tkachov
  2021-07-06  9:25                         ` Prathamesh Kulkarni
  0 siblings, 1 reply; 29+ messages in thread
From: Kyrylo Tkachov @ 2021-07-06  8:03 UTC (permalink / raw)
  To: Prathamesh Kulkarni, Christophe LYON; +Cc: gcc Patches



> -----Original Message-----
> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> Sent: 06 July 2021 08:06
> To: Christophe LYON <christophe.lyon@foss.st.com>
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc Patches <gcc-
> patches@gcc.gnu.org>
> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> constructor
> 
> On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
> > <christophe.lyon@foss.st.com> wrote:
> > >
> > >
> > > Well, it really depends on which modes you want to enable.
> > >
> > >
> > > Looks like your move VDQ -> VDQX adds V4BF, V8BF and DI.
> > >
> > > Are they all OK for Neon?
> > >
> > > They are not OK for MVE.
> > >
> > > Ideally you could add testcases to cover the supported and
> > > unsupported modes for both Neon and MVE.
> > >
> > > Before your patch, the expander is enabled for MVE for 64 bit modes
> > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash
> > > or is there something else preventing the match?
> > Hi,
> > Apparently there is a VALID_MVE_MODE macro, so would it be better to use:
> > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(<MODE>mode))
> > as in the attached patch?

The change is OK. I would like to see some testcases, as Christophe suggested, but this patch just moves the expander around rather than introducing new functionality.
Thanks,
Kyrill

> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-07-06  8:03                       ` Kyrylo Tkachov
@ 2021-07-06  9:25                         ` Prathamesh Kulkarni
  2021-07-06  9:28                           ` Kyrylo Tkachov
  2021-08-03  9:29                           ` Christophe Lyon
  0 siblings, 2 replies; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-07-06  9:25 UTC (permalink / raw)
  To: Kyrylo Tkachov; +Cc: Christophe LYON, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 9930 bytes --]

On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > Sent: 06 July 2021 08:06
> > To: Christophe LYON <christophe.lyon@foss.st.com>
> > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc Patches <gcc-
> > patches@gcc.gnu.org>
> > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> > constructor
> >
> > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > Hi,
> > > Apparently there is a VALID_MVE_MODE macro, so would it be better to use:
> > > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(<MODE>mode))
> > > as in the attached patch?
>
> The change is OK. I would like to see some testcases, as Christophe suggested, but this patch just moves the expander around rather than introducing new functionality.
Hi Kyrill,
As mentioned in the first email, the patch improves code generation for
the following test case:

bfloat16x4_t f (bfloat16_t a)
{
  return (bfloat16x4_t) {a, a, a, a};
}

Before patch:
f:
        mov     r3, r0  @ __bf16
        adr     r1, .L4
        ldrd    r0, [r1]
        mov     r2, r3  @ __bf16
        mov     ip, r3  @ __bf16
        bfi     r1, r2, #0, #16
        bfi     r0, ip, #0, #16
        bfi     r1, r3, #16, #16
        bfi     r0, r2, #16, #16
        bx      lr

After patch:
f:
        vdup.16 d16, r0
        vmov    r0, r1, d16  @ v4bf
        bx      lr

This is because the patch changes the mode iterator from VDQ to VDQX,
which also covers the bfloat16 vector modes.
I have included the test in the attached patch.
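As a rough illustration of what the two f sequences above compute, here is a small C model that mirrors each instruction one-to-one. It is a hand-written sketch, not compiler output, and it assumes little-endian lane order; the value stored at .L4 is not shown in the thread, so it is a parameter, and the helper names (bfi, before_patch, after_patch, check_same_result) are hypothetical. The model makes the waste in the old code visible: the literal-pool load is entirely overwritten by the four bfi instructions, so the result equals that of a single vdup.16.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "bfi rd, rn, #lsb, #width": copy the low `width` bits of rn
   into rd at bit position `lsb`; all other bits of rd are preserved. */
static uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width)
{
    uint32_t field = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t mask = field << lsb;
    return (rd & ~mask) | ((rn << lsb) & mask);
}

/* Before the patch: ldrd loads a literal-pool constant into r0/r1 (its
   value is not shown in the thread, so it is a parameter here), then four
   bfi instructions overwrite every 16-bit lane with `a`. */
static void before_patch(uint16_t a, uint32_t pool_lo, uint32_t pool_hi,
                         uint32_t *r0, uint32_t *r1)
{
    uint32_t lo = pool_lo, hi = pool_hi;
    hi = bfi(hi, a, 0, 16);   /* bfi r1, r2, #0, #16  */
    lo = bfi(lo, a, 0, 16);   /* bfi r0, ip, #0, #16  */
    hi = bfi(hi, a, 16, 16);  /* bfi r1, r3, #16, #16 */
    lo = bfi(lo, a, 16, 16);  /* bfi r0, r2, #16, #16 */
    *r0 = lo;
    *r1 = hi;
}

/* After the patch: vdup.16 copies `a` into all four 16-bit lanes of d16,
   and "vmov r0, r1, d16" moves the low word to r0 and the high word to r1. */
static void after_patch(uint16_t a, uint32_t *r0, uint32_t *r1)
{
    uint64_t d16 = 0;
    for (int lane = 0; lane < 4; lane++)
        d16 |= (uint64_t)a << (16 * lane);
    *r0 = (uint32_t)d16;
    *r1 = (uint32_t)(d16 >> 32);
}

/* Both sequences yield the same r0:r1 pair, whatever the pool constant. */
static void check_same_result(uint16_t a, uint32_t pool_lo, uint32_t pool_hi)
{
    uint32_t b0, b1, a0, a1;
    before_patch(a, pool_lo, pool_hi, &b0, &b1);
    after_patch(a, &a0, &a1);
    assert(b0 == a0 && b1 == a1);
}
```

For example, after_patch(0x3f80, &r0, &r1) (0x3f80 is bfloat16 1.0) leaves r0 and r1 both equal to 0x3f803f80, and before_patch produces the same pair for any pool constant, at the cost of a load and four extra instructions.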
I think Christophe's concerns were mainly about whether the right modes
get enabled for MVE.
Unfortunately, I am not sure how to test for that, because the front end
rejects invalid modes, so we never reach the pattern.
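To make the MVE gating question concrete, below is a hypothetical C model of the expander conditions discussed earlier in the thread. struct target and its fields stand in for TARGET_NEON and TARGET_HAVE_MVE, and the per-mode predicate is passed in as a plain bool rather than evaluating GCC's real VALID_MVE_MODE; treating V8BF as rejected by that predicate is an assumption based on Christophe's note that the BF modes are not OK for MVE. The asserts show why "!= 16" had to be inverted, and why a size test alone is weaker than a per-mode check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TARGET_NEON and TARGET_HAVE_MVE. */
struct target {
    bool neon;
    bool mve;
};

/* First suggestion: TARGET_NEON || (TARGET_HAVE_MVE && size != 16). */
static bool enabled_ne16(struct target t, unsigned mode_bytes)
{
    return t.neon || (t.mve && mode_bytes != 16);
}

/* Inverted test: on MVE, only 16-byte (128-bit) modes are enabled. */
static bool enabled_eq16(struct target t, unsigned mode_bytes)
{
    return t.neon || (t.mve && mode_bytes == 16);
}

/* Form used in the attached patch: a per-mode predicate replaces the
   size test.  Its result (VALID_MVE_MODE in GCC) is simply passed in. */
static bool enabled_valid_mve(struct target t, bool valid_mve_mode)
{
    return t.neon || (t.mve && valid_mve_mode);
}

static void check_mve_gating(void)
{
    const struct target mve_only = { .neon = false, .mve = true };

    /* V4HI (8 bytes) should stay disabled on an MVE-only target. */
    assert(enabled_ne16(mve_only, 8));    /* "!= 16": wrongly enabled */
    assert(!enabled_eq16(mve_only, 8));   /* "== 16": disabled */

    /* V8HI (16 bytes) should be enabled. */
    assert(!enabled_ne16(mve_only, 16));  /* "!= 16": wrongly disabled */
    assert(enabled_eq16(mve_only, 16));   /* "== 16": enabled */

    /* V8BF is also 16 bytes, so the size test would enable it; a per-mode
       predicate that rejects it (assumed here) keeps it disabled. */
    assert(!enabled_valid_mve(mve_only, false));
}
```

On a Neon target every variant returns true regardless of size, so the choice of MVE predicate only matters for MVE-only configurations.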

Thanks,
Prathamesh
> Thanks,
> Kyrill

[-- Attachment #2: pr98435-5.txt --]
[-- Type: text/plain, Size: 1780 bytes --]

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a6573317cf..0c98b3a8f23 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -458,15 +458,6 @@
   [(set_attr "type" "neon_store1_one_lane_q,neon_to_gp_q")]
 )
 
-(define_expand "vec_init<mode><V_elem_l>"
-  [(match_operand:VDQ 0 "s_register_operand")
-   (match_operand 1 "" "")]
-  "TARGET_NEON || TARGET_HAVE_MVE"
-{
-  neon_expand_vector_init (operands[0], operands[1]);
-  DONE;
-})
-
 ;; Doubleword and quadword arithmetic.
 
 ;; NOTE: some other instructions also support 64-bit integer
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 8e35151da46..7858be9f28e 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -565,3 +565,12 @@
 
   DONE;
 })
+
+(define_expand "vec_init<mode><V_elem_l>"
+  [(match_operand:VDQX 0 "s_register_operand")
+   (match_operand 1 "" "")]
+  "TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE (<MODE>mode))"
+{
+  neon_expand_vector_init (operands[0], operands[1]);
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr98435.c b/gcc/testsuite/gcc.target/arm/simd/pr98435.c
new file mode 100644
index 00000000000..0af8633fd56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr98435.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math" } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-mfloat-abi=softfp -march=armv8.2-a+bf16+fp16" } */
+
+#include <arm_neon.h>
+
+bfloat16x4_t f (bfloat16_t a)
+{
+  return (bfloat16x4_t) {a, a, a, a};
+}
+
+/* { dg-final { scan-assembler {\tvdup.16\td[0-9]+, r0} } } */
+/* { dg-final { scan-assembler {\tvmov\tr0, r1, d[0-9]+} } } */


* RE: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-07-06  9:25                         ` Prathamesh Kulkarni
@ 2021-07-06  9:28                           ` Kyrylo Tkachov
  2021-07-06 10:16                             ` Christophe Lyon
  2021-08-03  9:29                           ` Christophe Lyon
  1 sibling, 1 reply; 29+ messages in thread
From: Kyrylo Tkachov @ 2021-07-06  9:28 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Christophe LYON, gcc Patches



> -----Original Message-----
> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> Sent: 06 July 2021 10:25
> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Christophe LYON <christophe.lyon@foss.st.com>; gcc Patches <gcc-
> patches@gcc.gnu.org>
> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> constructor
> 
> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> wrote:
> >
> >
> >
> > The change is OK. I would like to see some testcases, as Christophe
> suggested, but this patch just moves the expander around rather than
> introducing new functionality.
> Hi Kyrill,
> As mentioned in the first email, the patch improves code-gen for the
> following test-case:
> 
> bfloat16x4_t f (bfloat16_t a)
> {
>   return (bfloat16x4_t) {a, a, a, a};
> }
> 
> Before patch:
> f:
>         mov     r3, r0  @ __bf16
>         adr     r1, .L4
>         ldrd    r0, [r1]
>         mov     r2, r3  @ __bf16
>         mov     ip, r3  @ __bf16
>         bfi     r1, r2, #0, #16
>         bfi     r0, ip, #0, #16
>         bfi     r1, r3, #16, #16
>         bfi     r0, r2, #16, #16
>         bx      lr
> 
> After patch:
> f:
>         vdup.16 d16, r0
>         vmov    r0, r1, d16  @ v4bf
>         bx      lr
> 
> because the patch changes mode from VDQ to VDQX to accommodate bf modes.
> I have included the test in the attached patch.
> I think Christophe's concerns were mainly about the right modes
> getting enabled for MVE.
> Unfortunately, I am not sure how to test for that because the FE
> catches invalid modes, and we don't end up hitting the pattern.

Ah, that should be ok then.
Thanks,
Kyrill

> 
> Thanks,
> Prathamesh
> > Thanks,
> > Kyrill
> >
> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > >
> > > > > Christophe
> > > > >
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > >>
> > > > > >> Christophe
> > > > > >>
> > > > > >>
> > > > > >>> Ok.
> > > > > >>> Thanks,
> > > > > >>> Kyrill
> > > > > >>>
> > > > > >>>
> > > > > >>>> Thanks,
> > > > > >>>> Prathamesh
> > > > > >>>>> Thanks,
> > > > > >>>>> Kyrill
> > > > > >>>>>
> > > > > >>>>>> Thanks,
> > > > > >>>>>> Prathamesh
> > > > > >>>>>>> Thanks,
> > > > > >>>>>>> Prathamesh
> > > > > >>>>>>>> That being said, I suggest you look at other similar patterns in
> > > > > >>>>>>>> vec-common.md, most of which are gated on ARM_HAVE_<MODE>_ARITH,
> > > > > >>>>>>>> and possibly beware of issues with iwmmxt :-)
> > > > > >>>>>>>>
> > > > > >>>>>>>> Christophe
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Thanks,
> > > > > >>>>>>>>> Prathamesh


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-07-06  9:28                           ` Kyrylo Tkachov
@ 2021-07-06 10:16                             ` Christophe Lyon
  0 siblings, 0 replies; 29+ messages in thread
From: Christophe Lyon @ 2021-07-06 10:16 UTC
  To: Kyrylo Tkachov; +Cc: Prathamesh Kulkarni, gcc Patches

On Tue, 6 Jul 2021 at 11:28, Kyrylo Tkachov via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> > -----Original Message-----
> > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > Sent: 06 July 2021 10:25
> > To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > Cc: Christophe LYON <christophe.lyon@foss.st.com>; gcc Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> >
> > On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > > > Sent: 06 July 2021 08:06
> > > > To: Christophe LYON <christophe.lyon@foss.st.com>
> > > > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc Patches <gcc-patches@gcc.gnu.org>
> > > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> > > >
> > > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
> > > > <prathamesh.kulkarni@linaro.org> wrote:
> > > > >
> > > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
> > > > > <christophe.lyon@foss.st.com> wrote:
> > > > > >
> > > > > >
> > > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> > > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> > > > > > > <christophe.lyon@foss.st.com> wrote:
> > > > > > >>
> > > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
> > > > > > >>>> -----Original Message-----
> > > > > > >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > > > > > >>>> Sent: 28 June 2021 09:38
> > > > > > >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > > > > > >>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-patches@gcc.gnu.org>
> > > > > > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> > > > > > >>>>
> > > > > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > > > > > >>>> wrote:
> > > > > > >>>>>
> > > > > > >>>>>> -----Original Message-----
> > > > > > >>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > > > > > >>>>>> Sent: 14 June 2021 09:02
> > > > > > >>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
> > > > > > >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> > > > > > >>>>>> <Kyrylo.Tkachov@arm.com>
> > > > > > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> > > > > > >>>>>>
> > > > > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> > > > > > >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
> > > > > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
> > > > > > >>>>>>> <christophe.lyon@linaro.org> wrote:
> > > > > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> > > > > > >>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
> > > > > > >>>>>>>>> Hi,
> > > > > > >>>>>>>>> As mentioned in PR, for the following test-case:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> #include <arm_neon.h>
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
> > > > > > >>>>>>>>> {
> > > > > > >>>>>>>>>     return vdup_n_bf16 (a);
> > > > > > >>>>>>>>> }
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
> > > > > > >>>>>>>>> {
> > > > > > >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
> > > > > > >>>>>>>>> }
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> > > > > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> f1:
> > > > > > >>>>>>>>>           vdup.16 d16, r0
> > > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> > > > > > >>>>>>>>>           bx      lr
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> f2:
> > > > > > >>>>>>>>>           mov     r3, r0  @ __bf16
> > > > > > >>>>>>>>>           adr     r1, .L4
> > > > > > >>>>>>>>>           ldrd    r0, [r1]
> > > > > > >>>>>>>>>           mov     r2, r3  @ __bf16
> > > > > > >>>>>>>>>           mov     ip, r3  @ __bf16
> > > > > > >>>>>>>>>           bfi     r1, r2, #0, #16
> > > > > > >>>>>>>>>           bfi     r0, ip, #0, #16
> > > > > > >>>>>>>>>           bfi     r1, r3, #16, #16
> > > > > > >>>>>>>>>           bfi     r0, r2, #16, #16
> > > > > > >>>>>>>>>           bx      lr
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ mode
> > > > > > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed mode
> > > > > > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler now generates:
> > > > > > >>>>>>>>> f2:
> > > > > > >>>>>>>>>           vdup.16 d16, r0
> > > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> > > > > > >>>>>>>>>           bx      lr
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> > > > > > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE has
> > > > > > >>>>>>>>> only 128-bit vectors?
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>> I think patterns common to both Neon and MVE should be moved to
> > > > > > >>>>>>>> vec-common.md; I don't know why such patterns were left in neon.md.
> > > > > > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and MVE,
> > > > > > >>>>>>> I am not sure if we should separate the pattern?
> > > > > > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE, as
> > > > > > >>>>>>> in the attached patch, so that it calls neon_expand_vector_init only for
> > > > > > >>>>>>> 128-bit vectors? Although hard-coding 16 in the pattern doesn't seem a
> > > > > > >>>>>>> good idea to me either.
> > > > > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> > > > > > >>>>>> (attaching patch as text).
> > > > > > >>>>>>
> > > > > > >>>>> --- a/gcc/config/arm/neon.md
> > > > > > >>>>> +++ b/gcc/config/arm/neon.md
> > > > > > >>>>> @@ -459,10 +459,12 @@
> > > > > > >>>>>    )
> > > > > > >>>>>
> > > > > > >>>>>    (define_expand "vec_init<mode><V_elem_l>"
> > > > > > >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
> > > > > > >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
> > > > > > >>>>>       (match_operand 1 "" "")]
> > > > > > >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
> > > > > > >>>>>    {
> > > > > > >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> > > > > > >>>>> +    FAIL;
> > > > > > >>>>>      neon_expand_vector_init (operands[0], operands[1]);
> > > > > > >>>>>      DONE;
> > > > > > >>>>>    })
> > > > > > >>>>>
> > > > > > >>>>> I think we should move this to vec-common.md like Christophe said.
> > > > > > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just
> > > > > > >>>>> disable it in the expander condition?
> > > > > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
> > > > > > >>>> Is it OK to use <MODE>mode? Using <VDQ>mode resulted in a lot
> > > > > > >>>> of build errors.
> > > > > > >>>> Also, I think the comparison should be inverted, i.e.,
> > > > > > >>>> GET_MODE_SIZE (<MODE>mode) == 16, since we want the pattern to
> > > > > > >>>> match if the target is MVE and the vector size is 16 bytes?
> > > > > > >>>> Do these changes in the attached patch look OK?
> > > > > > >>> Yes, you're right.
> > > > > > >>
> > > > > > >> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
> > > > > > >>
> > > > > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
> > > > > > > I wonder if this should be ARM_HAVE_<MODE>_LDST instead, since we're
> > > > > > > initializing the vector?
> > > > > >
> > > > > >
> > > > > > Well, it really depends on which modes you want to enable.
> > > > > >
> > > > > >
> > > > > > Looks like your move VDQ -> VDQX adds V4BF, V8BF and DI.
> > > > > >
> > > > > > Are they all OK for Neon?
> > > > > >
> > > > > > They are not OK for MVE.
> > > > > >
> > > > > > Ideally you could add testcases to cover the supported and
> > > > > > unsupported modes for both Neon and MVE.
> > > > > >
> > > > > > Before your patch, the expander is enabled for MVE for 64-bit modes
> > > > > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash,
> > > > > > or is there something else preventing the match?
> > > > > Hi,
> > > > > Apparently there is a VALID_MVE_MODE macro, so is it better to use:
> > > > > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE (<MODE>mode))
> > > > > as in the attached patch?
> > >
> > > The change is ok. I would like to see some testcases like Christophe
> > > suggested, but this patch just moves the expander around rather than
> > > introducing new functionality.
> > Hi Kyrill,
> > As mentioned in the first email, the patch improves code-gen for the
> > following test-case:
> >
> > bfloat16x4_t f (bfloat16_t a)
> > {
> >   return (bfloat16x4_t) {a, a, a, a};
> > }
> >
> > Before patch:
> > f:
> >         mov     r3, r0  @ __bf16
> >         adr     r1, .L4
> >         ldrd    r0, [r1]
> >         mov     r2, r3  @ __bf16
> >         mov     ip, r3  @ __bf16
> >         bfi     r1, r2, #0, #16
> >         bfi     r0, ip, #0, #16
> >         bfi     r1, r3, #16, #16
> >         bfi     r0, r2, #16, #16
> >         bx      lr
> >
> > After patch:
> > f:
> >         vdup.16 d16, r0
> >         vmov    r0, r1, d16  @ v4bf
> >         bx      lr
> >
> > because the patch changes mode from VDQ to VDQX to accommodate bf modes.
> > I have included the test in the attached patch.
> > I think Christophe's concerns were mainly about the right modes
> > getting enabled for MVE.
> > Unfortunately, I am not sure how to test for that because the FE
> > catches invalid modes, and we don't end up hitting the pattern.
>

Wouldn't testcases with e.g.
return (int32x4_t) {a,a,a,a};
exercise the other modes?
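
For illustration, a minimal sketch of the kind of testcase suggested here. It assumes GCC's generic vector extension as a portable stand-in for the arm_neon.h types, so it builds on any GCC or Clang host; the names v4si, v4hi, dup_v4si and dup_v4hi are hypothetical and not taken from the patch. Each function returns a splatted brace constructor, the form of vector initialization that, on an Arm target, would be expected to go through the vec_init expander discussed in this thread, covering one 128-bit and one 64-bit mode.

```c
#include <stdint.h>

/* Hypothetical stand-ins for int32x4_t and int16x4_t, built with GCC's
   generic vector extension so that this sketch compiles on any host.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));  /* 128-bit (V4SImode) */
typedef int16_t v4hi __attribute__ ((vector_size (8)));   /* 64-bit (V4HImode) */

/* Splat A into all four lanes of a 128-bit vector.  On an Arm target with
   Neon or MVE, this constructor is expected to reach the vec_init expander.  */
v4si
dup_v4si (int32_t a)
{
  return (v4si) {a, a, a, a};
}

/* The same for a 64-bit vector.  Per the discussion, MVE only has 128-bit
   vectors, so this case mainly exercises the Neon side.  */
v4hi
dup_v4hi (int16_t a)
{
  return (v4hi) {a, a, a, a};
}
```

A real gcc.target/arm test would more likely use int32x4_t and friends from arm_neon.h and scan the generated assembly, with effective-target guards for the Neon or MVE configuration under test; that harness is not shown here.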

> Ah, that should be ok then.
> Thanks,
> Kyrill
>
> >
> > Thanks,
> > Prathamesh
> > > Thanks,
> > > Kyrill
> > >
> > > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >
> > > > > > Christophe
> > > > > >
> > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > >>
> > > > > > >> Christophe
> > > > > > >>
> > > > > > >>
> > > > > > >>> Ok.
> > > > > > >>> Thanks,
> > > > > > >>> Kyrill
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>> Thanks,
> > > > > > >>>> Prathamesh
> > > > > > >>>>> Thanks,
> > > > > > >>>>> Kyrill
> > > > > > >>>>>
> > > > > > >>>>>> Thanks,
> > > > > > >>>>>> Prathamesh
> > > > > > >>>>>>> Thanks,
> > > > > > >>>>>>> Prathamesh
> > > > > > >>>>>>>> That being said, I suggest you look at other similar patterns in
> > > > > > >>>>>>>> vec-common.md, most of which are gated on ARM_HAVE_<MODE>_ARITH,
> > > > > > >>>>>>>> and possibly beware of issues with iwmmxt :-)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Christophe
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>> Thanks,
> > > > > > >>>>>>>>> Prathamesh


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-07-06  9:25                         ` Prathamesh Kulkarni
  2021-07-06  9:28                           ` Kyrylo Tkachov
@ 2021-08-03  9:29                           ` Christophe Lyon
  2021-08-03 10:56                             ` Prathamesh Kulkarni
  1 sibling, 1 reply; 29+ messages in thread
From: Christophe Lyon @ 2021-08-03  9:29 UTC
  To: Prathamesh Kulkarni; +Cc: Kyrylo Tkachov, gcc Patches

On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:

> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > > Sent: 06 July 2021 08:06
> > > To: Christophe LYON <christophe.lyon@foss.st.com>
> > > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc Patches <gcc-patches@gcc.gnu.org>
> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> > >
> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
> > > <prathamesh.kulkarni@linaro.org> wrote:
> > > >
> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
> > > > <christophe.lyon@foss.st.com> wrote:
> > > > >
> > > > >
> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> > > > > > <christophe.lyon@foss.st.com> wrote:
> > > > > >>
> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
> > > > > >>>> -----Original Message-----
> > > > > >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > > > > >>>> Sent: 28 June 2021 09:38
> > > > > >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > > > > >>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-patches@gcc.gnu.org>
> > > > > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> > > > > >>>>
> > > > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > > > > >>>> wrote:
> > > > > >>>>>
> > > > > >>>>>> -----Original Message-----
> > > > > >>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > > > > >>>>>> Sent: 14 June 2021 09:02
> > > > > >>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
> > > > > >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> > > > > >>>>>> <Kyrylo.Tkachov@arm.com>
> > > > > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> > > > > >>>>>>
> > > > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> > > > > >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
> > > > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
> > > > > >>>>>>> <christophe.lyon@linaro.org> wrote:
> > > > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> > > > > >>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
> > > > > >>>>>>>>> Hi,
> > > > > >>>>>>>>> As mentioned in PR, for the following test-case:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> #include <arm_neon.h>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
> > > > > >>>>>>>>> {
> > > > > >>>>>>>>>     return vdup_n_bf16 (a);
> > > > > >>>>>>>>> }
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
> > > > > >>>>>>>>> {
> > > > > >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
> > > > > >>>>>>>>> }
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> > > > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> f1:
> > > > > >>>>>>>>>           vdup.16 d16, r0
> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> > > > > >>>>>>>>>           bx      lr
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> f2:
> > > > > >>>>>>>>>           mov     r3, r0  @ __bf16
> > > > > >>>>>>>>>           adr     r1, .L4
> > > > > >>>>>>>>>           ldrd    r0, [r1]
> > > > > >>>>>>>>>           mov     r2, r3  @ __bf16
> > > > > >>>>>>>>>           mov     ip, r3  @ __bf16
> > > > > >>>>>>>>>           bfi     r1, r2, #0, #16
> > > > > >>>>>>>>>           bfi     r0, ip, #0, #16
> > > > > >>>>>>>>>           bfi     r1, r3, #16, #16
> > > > > >>>>>>>>>           bfi     r0, r2, #16, #16
> > > > > >>>>>>>>>           bx      lr
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ mode
> > > > > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed mode
> > > > > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler now generates:
> > > > > >>>>>>>>> f2:
> > > > > >>>>>>>>>           vdup.16 d16, r0
> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> > > > > >>>>>>>>>           bx      lr
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> > > > > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE has
> > > > > >>>>>>>>> only 128-bit vectors?
> > > > > >>>>>>>>>
> > > > > >>>>>>>> I think patterns common to both Neon and MVE should be moved to
> > > > > >>>>>>>> vec-common.md; I don't know why such patterns were left in neon.md.
> > > > > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and MVE,
> > > > > >>>>>>> I am not sure if we should separate the pattern?
> > > > > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE, as
> > > > > >>>>>>> in the attached patch, so that it calls neon_expand_vector_init only for
> > > > > >>>>>>> 128-bit vectors? Although hard-coding 16 in the pattern doesn't seem a
> > > > > >>>>>>> good idea to me either.
> > > > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> > > > > >>>>>> (attaching patch as text).
> > > > > >>>>>>
> > > > > >>>>> --- a/gcc/config/arm/neon.md
> > > > > >>>>> +++ b/gcc/config/arm/neon.md
> > > > > >>>>> @@ -459,10 +459,12 @@
> > > > > >>>>>    )
> > > > > >>>>>
> > > > > >>>>>    (define_expand "vec_init<mode><V_elem_l>"
> > > > > >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
> > > > > >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
> > > > > >>>>>       (match_operand 1 "" "")]
> > > > > >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
> > > > > >>>>>    {
> > > > > >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> > > > > >>>>> +    FAIL;
> > > > > >>>>>      neon_expand_vector_init (operands[0], operands[1]);
> > > > > >>>>>      DONE;
> > > > > >>>>>    })
> > > > > >>>>>
> > > > > >>>>> I think we should move this to vec-common.md like Christophe said.
> > > > > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just
> > > > > >>>>> disable it in the expander condition?
> > > > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
> > > > > >>>> Is it OK to use <MODE>mode? Using <VDQ>mode resulted in a lot
> > > > > >>>> of build errors.
> > > > > >>>> Also, I think the comparison should be inverted, i.e.,
> > > > > >>>> GET_MODE_SIZE (<MODE>mode) == 16, since we want the pattern to
> > > > > >>>> match if the target is MVE and the vector size is 16 bytes?
> > > > > >>>> Do these changes in the attached patch look OK?
> > > > > >>> Yes, you're right.
> > > > > >>
> > > > > >> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
> > > > > >>
> > > > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
> > > > > > I wonder if this should be ARM_HAVE_<MODE>_LDST instead, since we're
> > > > > > initializing the vector?
> > > > >
> > > > >
> > > > > Well, it really depends on which modes you want to enable.
> > > > >
> > > > >
> > > > > Looks like your move VDQ -> VDQX adds V4BF, V8BF and DI.
> > > > >
> > > > > Are they all OK for Neon?
> > > > >
> > > > > They are not OK for MVE.
> > > > >
> > > > > Ideally you could add testcases to cover the supported and
> > > > > unsupported modes for both Neon and MVE.
> > > > >
> > > > > Before your patch, the expander is enabled for MVE for 64-bit modes
> > > > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash,
> > > > > or is there something else preventing the match?
> > > > Hi,
> > > > Apparently there is a VALID_MVE_MODE macro, so is it better to use:
> > > > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE (<MODE>mode))
> > > > as in the attached patch?
> >
> > The change is ok. I would like to see some testcases like Christophe
> > suggested, but this patch just moves the expander around rather than
> > introducing new functionality.
> Hi Kyrill,
> As mentioned in the first email, the patch improves code-gen for the
> following test-case:
>
> bfloat16x4_t f (bfloat16_t a)
> {
>   return (bfloat16x4_t) {a, a, a, a};
> }
>
> Before patch:
> f:
>         mov     r3, r0  @ __bf16
>         adr     r1, .L4
>         ldrd    r0, [r1]
>         mov     r2, r3  @ __bf16
>         mov     ip, r3  @ __bf16
>         bfi     r1, r2, #0, #16
>         bfi     r0, ip, #0, #16
>         bfi     r1, r3, #16, #16
>         bfi     r0, r2, #16, #16
>         bx      lr
>
> After patch:
> f:
>         vdup.16 d16, r0
>         vmov    r0, r1, d16  @ v4bf
>         bx      lr
>
> because the patch changes mode from VDQ to VDQX to accommodate bf modes.
> I have included the test in the attached patch.
> I think Christophe's concerns were mainly about the right modes
> getting enabled for MVE.
> Unfortunately, I am not sure how to test for that because the FE
> catches invalid modes, and we don't end up hitting the pattern.
>
>
Hi Prathamesh,

The new testcase fails on arm-linux-gnueabihf:
 FAIL: gcc.target/arm/simd/pr98435.c (test for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-soft.h: No such file or directory
compilation terminated.

This is because the test does not check whether -mfloat-abi=softfp is actually supported.

Can you fix that?

Thanks

Christophe



> Thanks,
> Prathamesh
> > Thanks,
> > Kyrill
> >
> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > >
> > > > > Christophe
> > > > >
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > >>
> > > > > >> Christophe
> > > > > >>
> > > > > >>
> > > > > >>> Ok.
> > > > > >>> Thanks,
> > > > > >>> Kyrill
> > > > > >>>
> > > > > >>>
> > > > > >>>> Thanks,
> > > > > >>>> Prathamesh
> > > > > >>>>> Thanks,
> > > > > >>>>> Kyrill
> > > > > >>>>>
> > > > > >>>>>> Thanks,
> > > > > >>>>>> Prathamesh
> > > > > >>>>>>> Thanks,
> > > > > >>>>>>> Prathamesh
> > > > > >>>>>>>> That being said, I suggest you look at other similar patterns in
> > > > > >>>>>>>> vec-common.md, most of which are gated on ARM_HAVE_<MODE>_ARITH,
> > > > > >>>>>>>> and possibly beware of issues with iwmmxt :-)
> > > > > >>>>>>>>
> > > > > >>>>>>>> Christophe
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Thanks,
> > > > > >>>>>>>>> Prathamesh
>


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-08-03  9:29                           ` Christophe Lyon
@ 2021-08-03 10:56                             ` Prathamesh Kulkarni
  2021-08-03 15:22                               ` Christophe Lyon
  0 siblings, 1 reply; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-08-03 10:56 UTC
  To: Christophe Lyon; +Cc: Kyrylo Tkachov, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 11625 bytes --]

On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
<christophe.lyon.oss@gmail.com> wrote:
>
>
>
> On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>
>> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>> >
>> >
>> >
>> > > -----Original Message-----
>> > > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> > > Sent: 06 July 2021 08:06
>> > > To: Christophe LYON <christophe.lyon@foss.st.com>
>> > > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc Patches <gcc-patches@gcc.gnu.org>
>> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>> > >
>> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
>> > > <prathamesh.kulkarni@linaro.org> wrote:
>> > > >
>> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
>> > > > <christophe.lyon@foss.st.com> wrote:
>> > > > >
>> > > > >
>> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
>> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
>> > > > > > <christophe.lyon@foss.st.com> wrote:
>> > > > > >>
>> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
>> > > > > >>>> -----Original Message-----
>> > > > > >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> > > > > >>>> Sent: 28 June 2021 09:38
>> > > > > >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> > > > > >>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-patches@gcc.gnu.org>
>> > > > > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>> > > > > >>>>
>> > > > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> > > > > >>>> wrote:
>> > > > > >>>>>
>> > > > > >>>>>> -----Original Message-----
>> > > > > >>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> > > > > >>>>>> Sent: 14 June 2021 09:02
>> > > > > >>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
>> > > > > >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
>> > > > > >>>>>> <Kyrylo.Tkachov@arm.com>
>> > > > > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>> > > > > >>>>>>
>> > > > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
>> > > > > >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
>> > > > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
>> > > > > >>>>>>> <christophe.lyon@linaro.org> wrote:
>> > > > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
>> > > > > >>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>> > > > > >>>>>>>>> Hi,
>> > > > > >>>>>>>>> As mentioned in PR, for the following test-case:
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> #include <arm_neon.h>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
>> > > > > >>>>>>>>> {
>> > > > > >>>>>>>>>     return vdup_n_bf16 (a);
>> > > > > >>>>>>>>> }
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
>> > > > > >>>>>>>>> {
>> > > > > >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
>> > > > > >>>>>>>>> }
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
>> > > > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> f1:
>> > > > > >>>>>>>>>           vdup.16 d16, r0
>> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
>> > > > > >>>>>>>>>           bx      lr
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> f2:
>> > > > > >>>>>>>>>           mov     r3, r0  @ __bf16
>> > > > > >>>>>>>>>           adr     r1, .L4
>> > > > > >>>>>>>>>           ldrd    r0, [r1]
>> > > > > >>>>>>>>>           mov     r2, r3  @ __bf16
>> > > > > >>>>>>>>>           mov     ip, r3  @ __bf16
>> > > > > >>>>>>>>>           bfi     r1, r2, #0, #16
>> > > > > >>>>>>>>>           bfi     r0, ip, #0, #16
>> > > > > >>>>>>>>>           bfi     r1, r3, #16, #16
>> > > > > >>>>>>>>>           bfi     r0, r2, #16, #16
>> > > > > >>>>>>>>>           bx      lr
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ mode
>> > > > > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed mode
>> > > > > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler now generates:
>> > > > > >>>>>>>>> f2:
>> > > > > >>>>>>>>>           vdup.16 d16, r0
>> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
>> > > > > >>>>>>>>>           bx      lr
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am not
>> > > > > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE has
>> > > > > >>>>>>>>> only 128-bit vectors?
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>> I think patterns common to both Neon and MVE should be moved to
>> > > > > >>>>>>>> vec-common.md; I don't know why such patterns were left in neon.md.
>> > > > > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and MVE,
>> > > > > >>>>>>> I am not sure if we should separate the pattern?
>> > > > > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE, as
>> > > > > >>>>>>> in the attached patch, so that it calls neon_expand_vector_init only for
>> > > > > >>>>>>> 128-bit vectors? Although hard-coding 16 in the pattern doesn't seem a
>> > > > > >>>>>>> good idea to me either.
>> > > > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
>> > > > > >>>>>> (attaching patch as text).
>> > > > > >>>>>>
>> > > > > >>>>> --- a/gcc/config/arm/neon.md
>> > > > > >>>>> +++ b/gcc/config/arm/neon.md
>> > > > > >>>>> @@ -459,10 +459,12 @@
>> > > > > >>>>>    )
>> > > > > >>>>>
>> > > > > >>>>>    (define_expand "vec_init<mode><V_elem_l>"
>> > > > > >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
>> > > > > >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
>> > > > > >>>>>       (match_operand 1 "" "")]
>> > > > > >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
>> > > > > >>>>>    {
>> > > > > >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
>> > > > > >>>>> +    FAIL;
>> > > > > >>>>>      neon_expand_vector_init (operands[0], operands[1]);
>> > > > > >>>>>      DONE;
>> > > > > >>>>>    })
>> > > > > >>>>>
>> > > > > >>>>> I think we should move this to vec-common.md like Christophe said.
>> > > > > >>>>> Perhaps, rather than making it FAIL for non-16-byte MVE sizes, we could just disable it in
>> > > > > >>>>> the expander condition?
>> > > > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
>> > > > > >>>> Is it OK to use <MODE>mode? Using <VDQ>mode resulted in a lot
>> > > > > >>>> of build errors.
>> > > > > >>>> Also, I think the comparison should be inverted, i.e. GET_MODE_SIZE
>> > > > > >>>> (<MODE>mode) == 16, since we want the pattern to pass if the target
>> > > > > >>>> is MVE and the vector size is 16 bytes.
>> > > > > >>>> Do these changes in the attached patch look OK?
>> > > > > >>> Yes, you're right.
>> > > > > >>
>> > > > > >> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
>> > > > > >>
>> > > > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
>> > > > > > I wonder if this should be ARM_HAVE_<MODE>_LDST instead, since we're
>> > > > > > initializing the vector?
>> > > > >
>> > > > >
>> > > > > Well, it really depends on which modes you want to enable.
>> > > > >
>> > > > >
>> > > > > Looks like your move VDQ -> VDQX adds V4BF, V8BF and DI.
>> > > > >
>> > > > > Are they all OK for Neon?
>> > > > >
>> > > > > They are not OK for MVE.
>> > > > >
>> > > > > Ideally you could add testcases to cover the supported and
>> > > > > unsupported modes for both Neon and MVE.
>> > > > >
>> > > > > Before your patch, the expander is enabled for MVE for 64 bit modes
>> > > > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash
>> > > > > or is there something else preventing the match?
>> > > > Hi,
>> > > > Apparently there is a VALID_MVE_MODE macro, so is it better to use:
>> > > > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE (<MODE>mode))
>> > > > as in the attached patch?
>> >
>> > The change is ok. I would like to see some testcases like Christophe suggested, but this patch just moves the expander around rather than introducing new functionality.
>> Hi Kyrill,
>> As mentioned in the first email, the patch improves code-gen for
>> the following test-case:
>>
>> bfloat16x4_t f (bfloat16_t a)
>> {
>>   return (bfloat16x4_t) {a, a, a, a};
>> }
>>
>> Before patch:
>> f:
>>         mov     r3, r0  @ __bf16
>>         adr     r1, .L4
>>         ldrd    r0, [r1]
>>         mov     r2, r3  @ __bf16
>>         mov     ip, r3  @ __bf16
>>         bfi     r1, r2, #0, #16
>>         bfi     r0, ip, #0, #16
>>         bfi     r1, r3, #16, #16
>>         bfi     r0, r2, #16, #16
>>         bx      lr
>>
>> After patch:
>> f:
>>         vdup.16 d16, r0
>>         vmov    r0, r1, d16  @ v4bf
>>         bx      lr
>>
>> because the patch changes mode from VDQ to VDQX to accommodate bf modes.
>> I have included the test in the attached patch.
>> I think Christophe's concerns were mainly about the right modes
>> getting enabled for MVE.
>> Unfortunately, I am not sure how to test for that, because the FE
>> catches invalid modes and we don't end up hitting the pattern.
>>
>
> Hi Prathamesh,
>
> The new testcase fails on arm-linux-gnueabihf:
>  FAIL: gcc.target/arm/simd/pr98435.c (test for excess errors)
> Excess errors:
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-soft.h: No such file or directory
> compilation terminated.
>
> Because you don't check whether -mfloat-abi=softfp is actually supported.
>
> Can you fix that?
Oops, sorry about that.
The attached patch fixes the test by requiring arm_softfloat and makes
it UNSUPPORTED on arm-linux-gnueabihf.
Does it look OK?

Thanks,
Prathamesh
>
> Thanks
>
> Christophe
>
>
>>
>> Thanks,
>> Prathamesh
>> > Thanks,
>> > Kyrill
>> >
>> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
>> > >
>> > > Thanks,
>> > > Prathamesh
>> > > >
>> > > > Thanks,
>> > > > Prathamesh
>> > > > >
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > >
>> > > > > Christophe
>> > > > >
>> > > > >
>> > > > > > Thanks,
>> > > > > > Prathamesh
>> > > > > >>
>> > > > > >> Christophe
>> > > > > >>
>> > > > > >>
>> > > > > >>> Ok.
>> > > > > >>> Thanks,
>> > > > > >>> Kyrill
>> > > > > >>>
>> > > > > >>>
>> > > > > >>>> Thanks,
>> > > > > >>>> Prathamesh
>> > > > > >>>>> Thanks,
>> > > > > >>>>> Kyrill
>> > > > > >>>>>
>> > > > > >>>>>> Thanks,
>> > > > > >>>>>> Prathamesh
>> > > > > >>>>>>> Thanks,
>> > > > > >>>>>>> Prathamesh
>> > > > > >>>>>>>> That being said, I suggest you look at other similar patterns in
>> > > > > >>>>>>>> vec-common.md, most of which are gated on
>> > > > > >>>>>>>> ARM_HAVE_<MODE>_ARITH
>> > > > > >>>>>>>> and possibly beware of issues with iwmmxt :-)
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> Christophe
>> > > > > >>>>>>>>
>> > > > > >>>>>>>>> Thanks,
>> > > > > >>>>>>>>> Prathamesh

[-- Attachment #2: pr98435-test-fix.diff --]
[-- Type: application/octet-stream, Size: 566 bytes --]

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr98435.c b/gcc/testsuite/gcc.target/arm/simd/pr98435.c
index 0af8633fd56..4f6f6208bdf 100644
--- a/gcc/testsuite/gcc.target/arm/simd/pr98435.c
+++ b/gcc/testsuite/gcc.target/arm/simd/pr98435.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math" } */
 /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-require-effective-target arm_softfloat } */
 /* { dg-add-options arm_v8_2a_bf16_neon } */
 /* { dg-additional-options "-mfloat-abi=softfp -march=armv8.2-a+bf16+fp16" } */
 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-08-03 10:56                             ` Prathamesh Kulkarni
@ 2021-08-03 15:22                               ` Christophe Lyon
  2021-08-05 12:27                                 ` Prathamesh Kulkarni
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe Lyon @ 2021-08-03 15:22 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Kyrylo Tkachov, gcc Patches

On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:

> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
> <christophe.lyon.oss@gmail.com> wrote:
> >
> > Hi Prathamesh,
> >
> > The new testcase fails on arm-linux-gnueabihf:
> >  FAIL: gcc.target/arm/simd/pr98435.c (test for excess errors)
> > Excess errors:
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-soft.h: No such file or directory
> > compilation terminated.
> >
> > Because you don't check whether -mfloat-abi=softfp is actually supported.
> >
> > Can you fix that?
> Oops, sorry about that.
> The attached patch fixes the test by requiring arm_softfloat and makes
> it UNSUPPORTED on arm-linux-gnueabihf.
> Does it look OK ?
>
>
I don't think that's right: it would make the test unsupported if softfp is
not the default, even if the toolchain has the needed multilibs.
Did you check, e.g., with arm-eabi and multilibs enabled?

Christophe



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-08-03 15:22                               ` Christophe Lyon
@ 2021-08-05 12:27                                 ` Prathamesh Kulkarni
  2021-08-05 12:34                                   ` Christophe Lyon
  0 siblings, 1 reply; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-08-05 12:27 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Kyrylo Tkachov, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 13089 bytes --]

On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
<christophe.lyon.oss@gmail.com> wrote:
>
> I don't think that's right: it would make the test unsupported if softfp is not the default even if the toolchain has the needed multilibs.
> Did you check eg. with arm-eabi and multilibs enabled?
Ah OK, thanks for pointing it out!
Does the attached patch look correct ?

Thanks,
Prathamesh
>
> Christophe
>
>>
>> Thanks,
>> Prathamesh
>> >
>> > Thanks
>> >
>> > Christophe
>> >
>> >
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> > Thanks,
>> >> > Kyrill
>> >> >
>> >> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
>> >> > >
>> >> > > Thanks,
>> >> > > Prathamesh
>> >> > > >
>> >> > > > Thanks,
>> >> > > > Prathamesh
>> >> > > > >
>> >> > > > >
>> >> > > > > Thanks,
>> >> > > > >
>> >> > > > >
>> >> > > > > Christophe
>> >> > > > >
>> >> > > > >
>> >> > > > > > Thanks,
>> >> > > > > > Prathamesh
>> >> > > > > >>
>> >> > > > > >> Christophe
>> >> > > > > >>
>> >> > > > > >>
>> >> > > > > >>> Ok.
>> >> > > > > >>> Thanks,
>> >> > > > > >>> Kyrill
>> >> > > > > >>>
>> >> > > > > >>>
>> >> > > > > >>>> Thanks,
>> >> > > > > >>>> Prathamesh
>> >> > > > > >>>>> Thanks,
>> >> > > > > >>>>> Kyrill
>> >> > > > > >>>>>
>> >> > > > > >>>>>> Thanks,
>> >> > > > > >>>>>> Prathamesh
>> >> > > > > >>>>>>> Thanks,
>> >> > > > > >>>>>>> Prathamesh
>> >> > > > > >>>>>>>> That being said, I suggest you look at other similar patterns in
>> >> > > > > >>>>>>>> vec-common.md, most of which are gated on
>> >> > > > > >>>>>>>> ARM_HAVE_<MODE>_ARITH
>> >> > > > > >>>>>>>> and possibly beware of issues with iwmmxt :-)
>> >> > > > > >>>>>>>>
>> >> > > > > >>>>>>>> Christophe
>> >> > > > > >>>>>>>>
>> >> > > > > >>>>>>>>> Thanks,
>> >> > > > > >>>>>>>>> Prathamesh

[-- Attachment #2: pr98435-test-fix-2.txt --]
[-- Type: text/plain, Size: 562 bytes --]

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr98435.c b/gcc/testsuite/gcc.target/arm/simd/pr98435.c
index 0af8633fd56..b7ba511e2d9 100644
--- a/gcc/testsuite/gcc.target/arm/simd/pr98435.c
+++ b/gcc/testsuite/gcc.target/arm/simd/pr98435.c
@@ -3,6 +3,7 @@
 /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
 /* { dg-add-options arm_v8_2a_bf16_neon } */
 /* { dg-additional-options "-mfloat-abi=softfp -march=armv8.2-a+bf16+fp16" } */
+/* { dg-skip-if "skip test for hard float" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
 
 #include <arm_neon.h>
 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-08-05 12:27                                 ` Prathamesh Kulkarni
@ 2021-08-05 12:34                                   ` Christophe Lyon
  2021-08-06  8:59                                     ` Prathamesh Kulkarni
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe Lyon @ 2021-08-05 12:34 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Kyrylo Tkachov, gcc Patches

On Thu, Aug 5, 2021 at 2:28 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:

> On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
> <christophe.lyon.oss@gmail.com> wrote:
> >
> >
> >
> > On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
> >>
> >> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
> >> <christophe.lyon.oss@gmail.com> wrote:
> >> >
> >> >
> >> >
> >> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >> >>
> >> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > > -----Original Message-----
> >> >> > > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >> >> > > Sent: 06 July 2021 08:06
> >> >> > > To: Christophe LYON <christophe.lyon@foss.st.com>
> >> >> > > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc Patches <gcc-patches@gcc.gnu.org>
> >> >> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> >> >> > >
> >> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
> >> >> > > <prathamesh.kulkarni@linaro.org> wrote:
> >> >> > > >
> >> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
> >> >> > > > <christophe.lyon@foss.st.com> wrote:
> >> >> > > > >
> >> >> > > > >
> >> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> >> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> >> >> > > > > > <christophe.lyon@foss.st.com> wrote:
> >> >> > > > > >>
> >> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
> >> >> > > > > >>>> -----Original Message-----
> >> >> > > > > >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >> >> > > > > >>>> Sent: 28 June 2021 09:38
> >> >> > > > > >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >> >> > > > > >>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-patches@gcc.gnu.org>
> >> >> > > > > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> >> >> > > > > >>>>
> >> >> > > > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
> >> >> > > > > >>>>>
> >> >> > > > > >>>>>> -----Original Message-----
> >> >> > > > > >>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >> >> > > > > >>>>>> Sent: 14 June 2021 09:02
> >> >> > > > > >>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
> >> >> > > > > >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >> >> > > > > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> >> >> > > > > >>>>>>
> >> >> > > > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> >> >> > > > > >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
> >> >> > > > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> >> >> > > > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >> >> > > > > >>>>>>>>> Hi,
> >> >> > > > > >>>>>>>>> As mentioned in PR, for the following test-case:
> >> >> > > > > >>>>>>>>>
> >> >> > > > > >>>>>>>>> #include <arm_neon.h>
> >> >> > > > > >>>>>>>>>
> >> >> > > > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
> >> >> > > > > >>>>>>>>> {
> >> >> > > > > >>>>>>>>>     return vdup_n_bf16 (a);
> >> >> > > > > >>>>>>>>> }
> >> >> > > > > >>>>>>>>>
> >> >> > > > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
> >> >> > > > > >>>>>>>>> {
> >> >> > > > > >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
> >> >> > > > > >>>>>>>>> }
> >> >> > > > > >>>>>>>>>
> >> >> > > > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> >> >> > > > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> >> >> > > > > >>>>>>>>>
> >> >> > > > > >>>>>>>>> f1:
> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> >> >> > > > > >>>>>>>>>           bx      lr
> >> >> > > > > >>>>>>>>>
> >> >> > > > > >>>>>>>>> f2:
> >> >> > > > > >>>>>>>>>           mov     r3, r0  @ __bf16
> >> >> > > > > >>>>>>>>>           adr     r1, .L4
> >> >> > > > > >>>>>>>>>           ldrd    r0, [r1]
> >> >> > > > > >>>>>>>>>           mov     r2, r3  @ __bf16
> >> >> > > > > >>>>>>>>>           mov     ip, r3  @ __bf16
> >> >> > > > > >>>>>>>>>           bfi     r1, r2, #0, #16
> >> >> > > > > >>>>>>>>>           bfi     r0, ip, #0, #16
> >> >> > > > > >>>>>>>>>           bfi     r1, r3, #16, #16
> >> >> > > > > >>>>>>>>>           bfi     r0, r2, #16, #16
> >> >> > > > > >>>>>>>>>           bx      lr
> >> >> > > > > >>>>>>>>>
> >> >> > > > > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ mode
> >> >> > > > > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed mode
> >> >> > > > > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler now generates:
> >> >> > > > > >>>>>>>>> f2:
> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> >> >> > > > > >>>>>>>>>           bx      lr
> >> >> > > > > >>>>>>>>>
> >> >> > > > > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> >> >> > > > > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE has
> >> >> > > > > >>>>>>>>> only 128-bit vectors ?
> >> >> > > > > >>>>>>>>>
> >> >> > > > > >>>>>>>> I think patterns common to both Neon and MVE should be moved to
> >> >> > > > > >>>>>>>> vec-common.md, I don't know why such patterns were left in neon.md.
> >> >> > > > > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and MVE,
> >> >> > > > > >>>>>>> I am not sure if we should separate the pattern ?
> >> >> > > > > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> >> >> > > > > >>>>>>> in attached patch so
> >> >> > > > > >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ?
> >> >> > > > > >>>>>>> Although hard-coding 16 in the pattern doesn't seem a good idea to me either.
> >> >> > > > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> >> >> > > > > >>>>>> (attaching patch as text).
> >> >> > > > > >>>>>>
> >> >> > > > > >>>>> --- a/gcc/config/arm/neon.md
> >> >> > > > > >>>>> +++ b/gcc/config/arm/neon.md
> >> >> > > > > >>>>> @@ -459,10 +459,12 @@
> >> >> > > > > >>>>>    )
> >> >> > > > > >>>>>
> >> >> > > > > >>>>>    (define_expand "vec_init<mode><V_elem_l>"
> >> >> > > > > >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
> >> >> > > > > >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
> >> >> > > > > >>>>>       (match_operand 1 "" "")]
> >> >> > > > > >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
> >> >> > > > > >>>>>    {
> >> >> > > > > >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> >> >> > > > > >>>>> +    FAIL;
> >> >> > > > > >>>>>      neon_expand_vector_init (operands[0], operands[1]);
> >> >> > > > > >>>>>      DONE;
> >> >> > > > > >>>>>    })
> >> >> > > > > >>>>>
> >> >> > > > > >>>>> I think we should move this to vec-common.md like Christophe said.
> >> >> > > > > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in the expander condition?
> >> >> > > > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
> >> >> > > > > >>>> Is it OK to use <MODE>mode ? Because using <VDQ>mode resulted in a lot
> >> >> > > > > >>>> of build errors.
> >> >> > > > > >>>> Also, I think the comparison should be inverted, ie, GET_MODE_SIZE
> >> >> > > > > >>>> (<MODE>mode) == 16 since
> >> >> > > > > >>>> we want to make the pattern pass if target is MVE and vector size is 16 bytes ?
> >> >> > > > > >>>> Do these changes in attached patch look OK ?
> >> >> > > > > >>> Yes, you're right.
> >> >> > > > > >>
> >> >> > > > > >> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
> >> >> > > > > >>
> >> >> > > > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
> >> >> > > > > > I wonder if this should be ARM_HAVE_<MODE>_LDST instead since we're
> >> >> > > > > > initializing the vector ?
> >> >> > > > >
> >> >> > > > >
> >> >> > > > > Well, it really depends on which modes you want to enable.
> >> >> > > > >
> >> >> > > > >
> >> >> > > > > Looks like your move VDQ -> VDQX adds V4BF, V8BF and DI.
> >> >> > > > >
> >> >> > > > > Are they all OK for Neon?
> >> >> > > > >
> >> >> > > > > They are not OK for MVE.
> >> >> > > > >
> >> >> > > > > Ideally you could add testcases to cover the supported and
> >> >> > > > > unsupported modes for both Neon and MVE.
> >> >> > > > >
> >> >> > > > > Before your patch, the expander is enabled for MVE for 64 bit modes
> >> >> > > > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash
> >> >> > > > > or is there something else preventing the match?
> >> >> > > > Hi,
> >> >> > > > Apparently there is VALID_MVE_MODE macro, so is it better to use:
> >> >> > > > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(<MODE>mode))
> >> >> > > > as in the attached patch ?
> >> >> >
> >> >> > The change is ok. I would like to see some testcases like Christophe suggested, but this patch just moves the expander around rather than introducing new functionality.
> >> >> Hi Kyrill,
> >> >> As mentioned in the first email, the patch improves code-gen for
> >> >> the following test-case:
> >> >>
> >> >> bfloat16x4_t f (bfloat16_t a)
> >> >> {
> >> >>   return (bfloat16x4_t) {a, a, a, a};
> >> >> }
> >> >>
> >> >> Before patch:
> >> >> f:
> >> >>         mov     r3, r0  @ __bf16
> >> >>         adr     r1, .L4
> >> >>         ldrd    r0, [r1]
> >> >>         mov     r2, r3  @ __bf16
> >> >>         mov     ip, r3  @ __bf16
> >> >>         bfi     r1, r2, #0, #16
> >> >>         bfi     r0, ip, #0, #16
> >> >>         bfi     r1, r3, #16, #16
> >> >>         bfi     r0, r2, #16, #16
> >> >>         bx      lr
> >> >>
> >> >> After patch:
> >> >> f:
> >> >>         vdup.16 d16, r0
> >> >>         vmov    r0, r1, d16  @ v4bf
> >> >>         bx      lr
> >> >>
> >> >> because the patch changes mode from VDQ to VDQX to accommodate bf modes.
> >> >> I have included the test in the attached patch.
> >> >> I think Christophe's concerns were mainly about the right modes
> >> >> getting enabled for MVE.
> >> >> Unfortunately, I am not sure how to test for that because the FE
> >> >> catches invalid modes, and we don't
> >> >> end up hitting the pattern.
> >> >>
> >> >
> >> > Hi Prathamesh,
> >> >
> >> > The new testcase fails on arm-linux-gnueabihf:
> >> >  FAIL: gcc.target/arm/simd/pr98435.c (test for excess errors)
> >> > Excess errors:
> >> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-soft.h: No such file or directory
> >> > compilation terminated.
> >> >
> >> > Because you don't check whether -mfloat-abi=softfp is actually supported.
> >> >
> >> > Can you fix that?
> >> Oops, sorry about that.
> >> The attached patch fixes the test by requiring arm_softfloat and makes
> >> it UNSUPPORTED on arm-linux-gnueabihf.
> >> Does it look OK ?
> >>
> >
> > I don't think that's right: it would make the test unsupported if softfp is not the default even if the toolchain has the needed multilibs.
> > Did you check eg. with arm-eabi and multilibs enabled?
> Ah OK, thanks for pointing it out!
> Does the attached patch look correct ?
>
>
I don't think so: this would skip the test even if the toolchain has multilibs enabled.
Did you check eg. with arm-eabi and multilibs enabled and the usual option overrides?


Christophe

> Thanks,
> Prathamesh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-08-05 12:34                                   ` Christophe Lyon
@ 2021-08-06  8:59                                     ` Prathamesh Kulkarni
  2021-08-06  9:19                                       ` Christophe Lyon
  0 siblings, 1 reply; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-08-06  8:59 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Kyrylo Tkachov, gcc Patches

On Thu, 5 Aug 2021 at 18:05, Christophe Lyon
<christophe.lyon.oss@gmail.com> wrote:
>
> I don't think so: this would skip the test even if the toolchain has multilibs enabled.
> Did you check eg. with arm-eabi and multilibs enabled and the usual option overrides?
It showed 3 PASS with the second patch:
/* { dg-skip-if "skip test for hard float" { *-*-* } { "-mfloat-abi=hard" } { "" } } */

I ran it using make check-gcc RUNTESTFLAGS="simd.exp=pr98435.c"
and built the toolchain using:
abe.sh --target arm-eabi --build all --set multilib=aprofile gcc=gcc.git~master.
I suppose that's correct ?

gcc -v output:
Configured with:
'/home/prathamesh.kulkarni/abe-toolchain-2/snapshots/gcc.git~master/configure'
SHELL=/bin/bash
--with-mpc=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
--with-mpfr=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
--with-gmp=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
--with-gnu-as --with-gnu-ld --disable-libmudflap --enable-lto
--enable-shared --without-included-gettext --enable-nls
--with-system-zlib --disable-sjlj-exceptions
--enable-gnu-unique-object --enable-linker-build-id
--disable-libstdcxx-pch --enable-c99 --enable-clocale=gnu
--enable-libstdcxx-debug --enable-long-long --with-cloog=no
--with-ppl=no --with-isl=no --enable-multilib
--with-multilib-list=aprofile --enable-threads=no --disable-multiarch
--with-sysroot=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu/arm-eabi
--with-newlib --enable-checking=yes --disable-bootstrap
--enable-languages=c,c++,lto
--prefix=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=arm-eabi

Thanks,
Prathamesh
>
>
> Christophe
>


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-08-06  8:59                                     ` Prathamesh Kulkarni
@ 2021-08-06  9:19                                       ` Christophe Lyon
  2021-08-06  9:50                                         ` Prathamesh Kulkarni
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe Lyon @ 2021-08-06  9:19 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Kyrylo Tkachov, gcc Patches

On Fri, Aug 6, 2021 at 11:00 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:

> On Thu, 5 Aug 2021 at 18:05, Christophe Lyon
> <christophe.lyon.oss@gmail.com> wrote:
> >
> >
> >
> > On Thu, Aug 5, 2021 at 2:28 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
> >>
> >> On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
> >> <christophe.lyon.oss@gmail.com> wrote:
> >> >
> >> >
> >> >
> >> > On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
> >> >>
> >> >> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
> >> >> <christophe.lyon.oss@gmail.com> wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >> >> >>
> >> >> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > > -----Original Message-----
> >> >> >> > > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >> >> >> > > Sent: 06 July 2021 08:06
> >> >> >> > > To: Christophe LYON <christophe.lyon@foss.st.com>
> >> >> >> > > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc Patches <gcc-patches@gcc.gnu.org>
> >> >> >> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> >> >> >> > >
> >> >> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
> >> >> >> > > <prathamesh.kulkarni@linaro.org> wrote:
> >> >> >> > > >
> >> >> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
> >> >> >> > > > <christophe.lyon@foss.st.com> wrote:
> >> >> >> > > > >
> >> >> >> > > > >
> >> >> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> >> >> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> >> >> >> > > > > > <christophe.lyon@foss.st.com> wrote:
> >> >> >> > > > > >>
> >> >> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
> >> >> >> > > > > >>>> -----Original Message-----
> >> >> >> > > > > >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >> >> >> > > > > >>>> Sent: 28 June 2021 09:38
> >> >> >> > > > > >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >> >> >> > > > > >>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-patches@gcc.gnu.org>
> >> >> >> > > > > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> >> >> >> > > > > >>>>
> >> >> >> > > > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
> >> >> >> > > > > >>>>>
> >> >> >> > > > > >>>>>> -----Original Message-----
> >> >> >> > > > > >>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >> >> >> > > > > >>>>>> Sent: 14 June 2021 09:02
> >> >> >> > > > > >>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
> >> >> >> > > > > >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >> >> >> > > > > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> >> >> >> > > > > >>>>>>
> >> >> >> > > > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> >> >> >> > > > > >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
> >> >> >> > > > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> >> >> >> > > > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> >> >> >> > > > > >>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
> >> >> >> > > > > >>>>>>>>> Hi,
> >> >> >> > > > > >>>>>>>>> As mentioned in PR, for the following test-case:
> >> >> >> > > > > >>>>>>>>>
> >> >> >> > > > > >>>>>>>>> #include <arm_neon.h>
> >> >> >> > > > > >>>>>>>>>
> >> >> >> > > > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
> >> >> >> > > > > >>>>>>>>> {
> >> >> >> > > > > >>>>>>>>>     return vdup_n_bf16 (a);
> >> >> >> > > > > >>>>>>>>> }
> >> >> >> > > > > >>>>>>>>>
> >> >> >> > > > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
> >> >> >> > > > > >>>>>>>>> {
> >> >> >> > > > > >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
> >> >> >> > > > > >>>>>>>>> }
> >> >> >> > > > > >>>>>>>>>
> >> >> >> > > > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> >> >> >> > > > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> >> >> >> > > > > >>>>>>>>>
> >> >> >> > > > > >>>>>>>>> f1:
> >> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
> >> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> >> >> >> > > > > >>>>>>>>>           bx      lr
> >> >> >> > > > > >>>>>>>>>
> >> >> >> > > > > >>>>>>>>> f2:
> >> >> >> > > > > >>>>>>>>>           mov     r3, r0  @ __bf16
> >> >> >> > > > > >>>>>>>>>           adr     r1, .L4
> >> >> >> > > > > >>>>>>>>>           ldrd    r0, [r1]
> >> >> >> > > > > >>>>>>>>>           mov     r2, r3  @ __bf16
> >> >> >> > > > > >>>>>>>>>           mov     ip, r3  @ __bf16
> >> >> >> > > > > >>>>>>>>>           bfi     r1, r2, #0, #16
> >> >> >> > > > > >>>>>>>>>           bfi     r0, ip, #0, #16
> >> >> >> > > > > >>>>>>>>>           bfi     r1, r3, #16, #16
> >> >> >> > > > > >>>>>>>>>           bfi     r0, r2, #16, #16
> >> >> >> > > > > >>>>>>>>>           bx      lr
> >> >> >> > > > > >>>>>>>>>
> >> >> >> > > > > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ mode
> >> >> >> > > > > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed mode
> >> >> >> > > > > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler now generates:
> >> >> >> > > > > >>>>>>>>> f2:
> >> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
> >> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> >> >> >> > > > > >>>>>>>>>           bx      lr
> >> >> >> > > > > >>>>>>>>>
> >> >> >> > > > > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> >> >> >> > > > > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE has
> >> >> >> > > > > >>>>>>>>> only 128-bit vectors ?
> >> >> >> > > > > >>>>>>>>>
> >> >> >> > > > > >>>>>>>> I think patterns common to both Neon and MVE should be moved to
> >> >> >> > > > > >>>>>>>> vec-common.md, I don't know why such patterns were left in neon.md.
> >> >> >> > > > > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and MVE,
> >> >> >> > > > > >>>>>>> I am not sure if we should separate the pattern ?
> >> >> >> > > > > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> >> >> >> > > > > >>>>>>> in attached patch so
> >> >> >> > > > > >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ?
> >> >> >> > > > > >>>>>>> Although hard-coding 16 in the pattern doesn't seem a good idea to me either.
> >> >> >> > > > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> >> >> >> > > > > >>>>>> (attaching patch as text).
> >> >> >> > > > > >>>>>>
> >> >> >> > > > > >>>>> --- a/gcc/config/arm/neon.md
> >> >> >> > > > > >>>>> +++ b/gcc/config/arm/neon.md
> >> >> >> > > > > >>>>> @@ -459,10 +459,12 @@
> >> >> >> > > > > >>>>>    )
> >> >> >> > > > > >>>>>
> >> >> >> > > > > >>>>>    (define_expand "vec_init<mode><V_elem_l>"
> >> >> >> > > > > >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
> >> >> >> > > > > >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
> >> >> >> > > > > >>>>>       (match_operand 1 "" "")]
> >> >> >> > > > > >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
> >> >> >> > > > > >>>>>    {
> >> >> >> > > > > >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> >> >> >> > > > > >>>>> +    FAIL;
> >> >> >> > > > > >>>>>      neon_expand_vector_init (operands[0], operands[1]);
> >> >> >> > > > > >>>>>      DONE;
> >> >> >> > > > > >>>>>    })
> >> >> >> > > > > >>>>>
> >> >> >> > > > > >>>>> I think we should move this to vec-common.md like Christophe said.
> >> >> >> > > > > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in the expander condition?
> >> >> >> > > > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
> >> >> >> > > > > >>>> Is it OK to use <MODE>mode ? Because using <VDQ>mode resulted in a lot
> >> >> >> > > > > >>>> of build errors.
> >> >> >> > > > > >>>> Also, I think the comparison should be inverted, i.e., GET_MODE_SIZE
> >> >> >> > > > > >>>> (<MODE>mode) == 16 since
> >> >> >> > > > > >>>> we want to make the pattern pass if target is MVE and vector size is 16 bytes ?
> >> >> >> > > > > >>>> Do these changes in attached patch look OK ?
> >> >> >> > > > > >>> Yes, you're right.
> >> >> >> > > > > >>
> >> >> >> > > > > >> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
> >> >> >> > > > > >>
> >> >> >> > > > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
> >> >> >> > > > > > I wonder if this should be ARM_HAVE_<MODE>_LDST instead since we're
> >> >> >> > > > > > initializing the vector ?
> >> >> >> > > > >
> >> >> >> > > > >
> >> >> >> > > > > Well, it really depends on which modes you want to enable.
> >> >> >> > > > >
> >> >> >> > > > >
> >> >> >> > > > > Looks like your move VDQ -> VDQX adds V4BF, V8BF and DI.
> >> >> >> > > > >
> >> >> >> > > > > Are they all OK for Neon?
> >> >> >> > > > >
> >> >> >> > > > > They are not OK for MVE.
> >> >> >> > > > >
> >> >> >> > > > > Ideally you could add testcases to cover the supported and
> >> >> >> > > > > unsupported modes for both Neon and MVE.
> >> >> >> > > > >
> >> >> >> > > > > Before your patch, the expander is enabled for MVE for 64 bit modes
> >> >> >> > > > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash
> >> >> >> > > > > or is there something else preventing the match?
> >> >> >> > > > Hi,
> >> >> >> > > > Apparently there is VALID_MVE_MODE macro, so is it better to use:
> >> >> >> > > > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(<MODE>mode))
> >> >> >> > > > as in the attached patch ?
> >> >> >> >
> >> >> >> > The change is ok. I would like to see some testcases like Christophe suggested, but this patch just moves the expander around rather than introducing new functionality.
> >> >> >> Hi Kyrill,
> >> >> >> As mentioned in the first email, the patch improves code-gen for
> >> >> >> following test-case:
> >> >> >>
> >> >> >> bfloat16x4_t f (bfloat16_t a)
> >> >> >> {
> >> >> >>   return (bfloat16x4_t) {a, a, a, a};
> >> >> >> }
> >> >> >>
> >> >> >> Before patch:
> >> >> >> f:
> >> >> >>         mov     r3, r0  @ __bf16
> >> >> >>         adr     r1, .L4
> >> >> >>         ldrd    r0, [r1]
> >> >> >>         mov     r2, r3  @ __bf16
> >> >> >>         mov     ip, r3  @ __bf16
> >> >> >>         bfi     r1, r2, #0, #16
> >> >> >>         bfi     r0, ip, #0, #16
> >> >> >>         bfi     r1, r3, #16, #16
> >> >> >>         bfi     r0, r2, #16, #16
> >> >> >>         bx      lr
> >> >> >>
> >> >> >> After patch:
> >> >> >> f:
> >> >> >>         vdup.16 d16, r0
> >> >> >>         vmov    r0, r1, d16  @ v4bf
> >> >> >>         bx      lr
> >> >> >>
> >> >> >> because the patch changes mode from VDQ to VDQX to accommodate bf modes.
> >> >> >> I have included the test in the attached patch.
> >> >> >> I think Christophe's concerns were mainly about the right modes
> >> >> >> getting enabled for MVE.
> >> >> >> Unfortunately, I am not sure how to test for that because the FE
> >> >> >> catches invalid modes, and we don't
> >> >> >> end up hitting the pattern.
> >> >> >>
> >> >> >
> >> >> > Hi Prathamesh,
> >> >> >
> >> >> > The new testcase fails on arm-linux-gnueabihf:
> >> >> >  FAIL: gcc.target/arm/simd/pr98435.c (test for excess errors)
> >> >> > Excess errors:
> >> >> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-soft.h: No such file or directory
> >> >> > compilation terminated.
> >> >> >
> >> >> > Because you don't check whether -mfloat-abi=softfp is actually supported.
> >> >> >
> >> >> > Can you fix that?
> >> >> Oops, sorry about that.
> >> >> The attached patch fixes the test by requiring arm_softfloat and makes
> >> >> it UNSUPPORTED on arm-linux-gnueabihf.
> >> >> Does it look OK ?
> >> >>
> >> >
> >> > I don't think that's right: it would make the test unsupported if softfp is not the default even if the toolchain has the needed multilibs.
> >> > Did you check eg. with arm-eabi and multilibs enabled?
> >> Ah OK, thanks for pointing it out!
> >> Does the attached patch look correct ?
> >>
> >
> > I don't think so: this would skip the test even if the toolchain has multilibs enabled.
> > Did you check eg. with arm-eabi and multilibs enabled and the usual option overrides?
> It showed 3 PASSes with the second patch:
> /* { dg-skip-if "skip test for hard float" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
>
> I ran it using make check-gcc RUNTESTFLAGS="simd.exp=pr98435.c"
> and built the toolchain using:
> abe.sh --target arm-eabi --build all --set multilib=aprofile gcc=gcc.git~master.
> I suppose that's correct ?
>

I use rmprofile for arm-eabi, but since aprofile also includes both hard
and soft multilibs, that should be OK.
However, I meant overriding the flags used for testing. Here is my current
list:

-mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd
-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
-mthumb/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
-mthumb/-mfloat-abi=soft/-march=armv6s-m
-mthumb/-mfloat-abi=soft/-march=armv7-m
-mthumb/-mfloat-abi=hard/-march=armv7e-m+fp
-mthumb/-mfloat-abi=hard/-march=armv7e-m+fp.dp
-mthumb/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp
-mthumb/-mfloat-abi=soft/-march=armv8.1-m.main

Christophe


> gcc -v output:
> Configured with:
> '/home/prathamesh.kulkarni/abe-toolchain-2/snapshots/gcc.git~master/configure'
> SHELL=/bin/bash
> --with-mpc=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
> --with-mpfr=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
> --with-gmp=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
> --with-gnu-as --with-gnu-ld --disable-libmudflap --enable-lto
> --enable-shared --without-included-gettext --enable-nls
> --with-system-zlib --disable-sjlj-exceptions
> --enable-gnu-unique-object --enable-linker-build-id
> --disable-libstdcxx-pch --enable-c99 --enable-clocale=gnu
> --enable-libstdcxx-debug --enable-long-long --with-cloog=no
> --with-ppl=no --with-isl=no --enable-multilib
> --with-multilib-list=aprofile --enable-threads=no --disable-multiarch
> --with-sysroot=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu/arm-eabi
> --with-newlib --enable-checking=yes --disable-bootstrap
> --enable-languages=c,c++,lto
> --prefix=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
> --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
> --target=arm-eabi
>
> Thanks,
> Prathamesh
> >
> >
> > Christophe
> >
> >> Thanks,
> >> Prathamesh
> >> >
> >> > Christophe
> >> >
> >> >>
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >> > Christophe
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Prathamesh
> >> >> >> > Thanks,
> >> >> >> > Kyrill
> >> >> >> >
> >> >> >> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
> >> >> >> > >
> >> >> >> > > Thanks,
> >> >> >> > > Prathamesh
> >> >> >> > > >
> >> >> >> > > > Thanks,
> >> >> >> > > > Prathamesh
> >> >> >> > > > >
> >> >> >> > > > >
> >> >> >> > > > > Thanks,
> >> >> >> > > > >
> >> >> >> > > > >
> >> >> >> > > > > Christophe
> >> >> >> > > > >
> >> >> >> > > > >
> >> >> >> > > > > > Thanks,
> >> >> >> > > > > > Prathamesh
> >> >> >> > > > > >>
> >> >> >> > > > > >> Christophe
> >> >> >> > > > > >>
> >> >> >> > > > > >>
> >> >> >> > > > > >>> Ok.
> >> >> >> > > > > >>> Thanks,
> >> >> >> > > > > >>> Kyrill
> >> >> >> > > > > >>>
> >> >> >> > > > > >>>
> >> >> >> > > > > >>>> Thanks,
> >> >> >> > > > > >>>> Prathamesh
> >> >> >> > > > > >>>>> Thanks,
> >> >> >> > > > > >>>>> Kyrill
> >> >> >> > > > > >>>>>
> >> >> >> > > > > >>>>>> Thanks,
> >> >> >> > > > > >>>>>> Prathamesh
> >> >> >> > > > > >>>>>>> Thanks,
> >> >> >> > > > > >>>>>>> Prathamesh
> >> >> >> > > > > >>>>>>>> That being said, I suggest you look at other similar patterns in
> >> >> >> > > > > >>>>>>>> vec-common.md, most of which are gated on
> >> >> >> > > > > >>>>>>>> ARM_HAVE_<MODE>_ARITH
> >> >> >> > > > > >>>>>>>> and possibly beware of issues with iwmmxt :-)
> >> >> >> > > > > >>>>>>>>
> >> >> >> > > > > >>>>>>>> Christophe
> >> >> >> > > > > >>>>>>>>
> >> >> >> > > > > >>>>>>>>> Thanks,
> >> >> >> > > > > >>>>>>>>> Prathamesh
>


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-08-06  9:19                                       ` Christophe Lyon
@ 2021-08-06  9:50                                         ` Prathamesh Kulkarni
  2021-08-06 12:01                                           ` Christophe Lyon
  0 siblings, 1 reply; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-08-06  9:50 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Kyrylo Tkachov, gcc Patches

On Fri, 6 Aug 2021 at 14:49, Christophe Lyon
<christophe.lyon.oss@gmail.com> wrote:
>
>
>
> On Fri, Aug 6, 2021 at 11:00 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>>
>> On Thu, 5 Aug 2021 at 18:05, Christophe Lyon
>> <christophe.lyon.oss@gmail.com> wrote:
>> >
>> >
>> >
>> > On Thu, Aug 5, 2021 at 2:28 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>> >>
>> >> On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
>> >> <christophe.lyon.oss@gmail.com> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>> >> >>
>> >> >> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
>> >> >> <christophe.lyon.oss@gmail.com> wrote:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> >> >> >>
>> >> >> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > > -----Original Message-----
>> >> >> >> > > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> >> >> >> > > Sent: 06 July 2021 08:06
>> >> >> >> > > To: Christophe LYON <christophe.lyon@foss.st.com>
>> >> >> >> > > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc Patches <gcc-patches@gcc.gnu.org>
>> >> >> >> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>> >> >> >> > >
>> >> >> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
>> >> >> >> > > <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >> > > >
>> >> >> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
>> >> >> >> > > > <christophe.lyon@foss.st.com> wrote:
>> >> >> >> > > > >
>> >> >> >> > > > >
>> >> >> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
>> >> >> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
>> >> >> >> > > > > > <christophe.lyon@foss.st.com> wrote:
>> >> >> >> > > > > >>
>> >> >> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
>> >> >> >> > > > > >>>> -----Original Message-----
>> >> >> >> > > > > >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> >> >> >> > > > > >>>> Sent: 28 June 2021 09:38
>> >> >> >> > > > > >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> >> >> >> > > > > >>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-patches@gcc.gnu.org>
>> >> >> >> > > > > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>> >> >> >> > > > > >>>>
>> >> >> >> > > > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>> >> >> >> > > > > >>>>>
>> >> >> >> > > > > >>>>>> -----Original Message-----
>> >> >> >> > > > > >>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> >> >> >> > > > > >>>>>> Sent: 14 June 2021 09:02
>> >> >> >> > > > > >>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
>> >> >> >> > > > > >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> >> >> >> > > > > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>> >> >> >> > > > > >>>>>>
>> >> >> >> > > > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
>> >> >> >> > > > > >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >> > > > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>> >> >> >> > > > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
>> >> >> >> > > > > >>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>> >> >> >> > > > > >>>>>>>>> Hi,
>> >> >> >> > > > > >>>>>>>>> As mentioned in PR, for the following test-case:
>> >> >> >> > > > > >>>>>>>>>
>> >> >> >> > > > > >>>>>>>>> #include <arm_neon.h>
>> >> >> >> > > > > >>>>>>>>>
>> >> >> >> > > > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
>> >> >> >> > > > > >>>>>>>>> {
>> >> >> >> > > > > >>>>>>>>>     return vdup_n_bf16 (a);
>> >> >> >> > > > > >>>>>>>>> }
>> >> >> >> > > > > >>>>>>>>>
>> >> >> >> > > > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
>> >> >> >> > > > > >>>>>>>>> {
>> >> >> >> > > > > >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
>> >> >> >> > > > > >>>>>>>>> }
>> >> >> >> > > > > >>>>>>>>>
>> >> >> >> > > > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
>> >> >> >> > > > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
>> >> >> >> > > > > >>>>>>>>>
>> >> >> >> > > > > >>>>>>>>> f1:
>> >> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
>> >> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
>> >> >> >> > > > > >>>>>>>>>           bx      lr
>> >> >> >> > > > > >>>>>>>>>
>> >> >> >> > > > > >>>>>>>>> f2:
>> >> >> >> > > > > >>>>>>>>>           mov     r3, r0  @ __bf16
>> >> >> >> > > > > >>>>>>>>>           adr     r1, .L4
>> >> >> >> > > > > >>>>>>>>>           ldrd    r0, [r1]
>> >> >> >> > > > > >>>>>>>>>           mov     r2, r3  @ __bf16
>> >> >> >> > > > > >>>>>>>>>           mov     ip, r3  @ __bf16
>> >> >> >> > > > > >>>>>>>>>           bfi     r1, r2, #0, #16
>> >> >> >> > > > > >>>>>>>>>           bfi     r0, ip, #0, #16
>> >> >> >> > > > > >>>>>>>>>           bfi     r1, r3, #16, #16
>> >> >> >> > > > > >>>>>>>>>           bfi     r0, r2, #16, #16
>> >> >> >> > > > > >>>>>>>>>           bx      lr
>> >> >> >> > > > > >>>>>>>>>
>> >> >> >> > > > > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ mode
>> >> >> >> > > > > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed mode
>> >> >> >> > > > > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler now generates:
>> >> >> >> > > > > >>>>>>>>> f2:
>> >> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
>> >> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
>> >> >> >> > > > > >>>>>>>>>           bx      lr
>> >> >> >> > > > > >>>>>>>>>
>> >> >> >> > > > > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am not
>> >> >> >> > > > > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE has
>> >> >> >> > > > > >>>>>>>>> only 128-bit vectors ?
>> >> >> >> > > > > >>>>>>>>>
>> >> >> >> > > > > >>>>>>>> I think patterns common to both Neon and MVE should be moved to
>> >> >> >> > > > > >>>>>>>> vec-common.md, I don't know why such patterns were left in neon.md.
>> >> >> >> > > > > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and MVE,
>> >> >> >> > > > > >>>>>>> I am not sure if we should separate the pattern ?
>> >> >> >> > > > > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
>> >> >> >> > > > > >>>>>>> in attached patch so
>> >> >> >> > > > > >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ?
>> >> >> >> > > > > >>>>>>> Although hard-coding 16 in the pattern doesn't seem a good idea to me either.
>> >> >> >> > > > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
>> >> >> >> > > > > >>>>>> (attaching patch as text).
>> >> >> >> > > > > >>>>>>
>> >> >> >> > > > > >>>>> --- a/gcc/config/arm/neon.md
>> >> >> >> > > > > >>>>> +++ b/gcc/config/arm/neon.md
>> >> >> >> > > > > >>>>> @@ -459,10 +459,12 @@
>> >> >> >> > > > > >>>>>    )
>> >> >> >> > > > > >>>>>
>> >> >> >> > > > > >>>>>    (define_expand "vec_init<mode><V_elem_l>"
>> >> >> >> > > > > >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
>> >> >> >> > > > > >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
>> >> >> >> > > > > >>>>>       (match_operand 1 "" "")]
>> >> >> >> > > > > >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
>> >> >> >> > > > > >>>>>    {
>> >> >> >> > > > > >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
>> >> >> >> > > > > >>>>> +    FAIL;
>> >> >> >> > > > > >>>>>      neon_expand_vector_init (operands[0], operands[1]);
>> >> >> >> > > > > >>>>>      DONE;
>> >> >> >> > > > > >>>>>    })
>> >> >> >> > > > > >>>>>
>> >> >> >> > > > > >>>>> I think we should move this to vec-common.md like Christophe said.
>> >> >> >> > > > > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in the expander condition?
>> >> >> >> > > > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
>> >> >> >> > > > > >>>> Is it OK to use <MODE>mode ? Because using <VDQ>mode resulted in a lot
>> >> >> >> > > > > >>>> of build errors.
>> >> >> >> > > > > >>>> Also, I think the comparison should be inverted, i.e., GET_MODE_SIZE
>> >> >> >> > > > > >>>> (<MODE>mode) == 16 since
>> >> >> >> > > > > >>>> we want to make the pattern pass if target is MVE and vector size is 16 bytes ?
>> >> >> >> > > > > >>>> Do these changes in attached patch look OK ?
>> >> >> >> > > > > >>> Yes, you're right.
>> >> >> >> > > > > >>
>> >> >> >> > > > > >> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
>> >> >> >> > > > > >>
>> >> >> >> > > > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
>> >> >> >> > > > > > I wonder if this should be ARM_HAVE_<MODE>_LDST instead since we're
>> >> >> >> > > > > > initializing the vector ?
>> >> >> >> > > > >
>> >> >> >> > > > >
>> >> >> >> > > > > Well, it really depends on which modes you want to enable.
>> >> >> >> > > > >
>> >> >> >> > > > >
>> >> >> >> > > > > Looks like your move VDQ -> VDQX adds V4BF, V8BF and DI.
>> >> >> >> > > > >
>> >> >> >> > > > > Are they all OK for Neon?
>> >> >> >> > > > >
>> >> >> >> > > > > They are not OK for MVE.
>> >> >> >> > > > >
>> >> >> >> > > > > Ideally you could add testcases to cover the supported and
>> >> >> >> > > > > unsupported modes for both Neon and MVE.
>> >> >> >> > > > >
>> >> >> >> > > > > Before your patch, the expander is enabled for MVE for 64 bit modes
>> >> >> >> > > > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash
>> >> >> >> > > > > or is there something else preventing the match?
>> >> >> >> > > > Hi,
>> >> >> >> > > > Apparently there is VALID_MVE_MODE macro, so is it better to use:
>> >> >> >> > > > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(<MODE>mode))
>> >> >> >> > > > as in the attached patch ?
>> >> >> >> >
>> >> >> >> > The change is ok. I would like to see some testcases like Christophe suggested, but this patch just moves the expander around rather than introducing new functionality.
>> >> >> >> Hi Kyrill,
>> >> >> >> As mentioned in the first email, the patch improves code-gen for
>> >> >> >> following test-case:
>> >> >> >>
>> >> >> >> bfloat16x4_t f (bfloat16_t a)
>> >> >> >> {
>> >> >> >>   return (bfloat16x4_t) {a, a, a, a};
>> >> >> >> }
>> >> >> >>
>> >> >> >> Before patch:
>> >> >> >> f:
>> >> >> >>         mov     r3, r0  @ __bf16
>> >> >> >>         adr     r1, .L4
>> >> >> >>         ldrd    r0, [r1]
>> >> >> >>         mov     r2, r3  @ __bf16
>> >> >> >>         mov     ip, r3  @ __bf16
>> >> >> >>         bfi     r1, r2, #0, #16
>> >> >> >>         bfi     r0, ip, #0, #16
>> >> >> >>         bfi     r1, r3, #16, #16
>> >> >> >>         bfi     r0, r2, #16, #16
>> >> >> >>         bx      lr
>> >> >> >>
>> >> >> >> After patch:
>> >> >> >> f:
>> >> >> >>         vdup.16 d16, r0
>> >> >> >>         vmov    r0, r1, d16  @ v4bf
>> >> >> >>         bx      lr
>> >> >> >>
>> >> >> >> because the patch changes mode from VDQ to VDQX to accommodate bf modes.
>> >> >> >> I have included the test in the attached patch.
>> >> >> >> I think Christophe's concerns were mainly about the right modes
>> >> >> >> getting enabled for MVE.
>> >> >> >> Unfortunately, I am not sure how to test for that because the FE
>> >> >> >> catches invalid modes, and we don't
>> >> >> >> end up hitting the pattern.
>> >> >> >>
>> >> >> >
>> >> >> > Hi Prathamesh,
>> >> >> >
>> >> >> > The new testcase fails on arm-linux-gnueabihf:
>> >> >> >  FAIL: gcc.target/arm/simd/pr98435.c (test for excess errors)
>> >> >> > Excess errors:
>> >> >> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-soft.h: No such file or directory
>> >> >> > compilation terminated.
>> >> >> >
>> >> >> > Because you don't check whether -mfloat-abi=softfp is actually supported.
>> >> >> >
>> >> >> > Can you fix that?
>> >> >> Oops, sorry about that.
>> >> >> The attached patch fixes the test by requiring arm_softfloat and makes
>> >> >> it UNSUPPORTED on arm-linux-gnueabihf.
>> >> >> Does it look OK ?
>> >> >>
>> >> >
>> >> > I don't think that's right: it would make the test unsupported if softfp is not the default even if the toolchain has the needed multilibs.
>> >> > Did you check e.g. with arm-eabi and multilibs enabled?
>> >> Ah OK, thanks for pointing it out!
>> >> Does the attached patch look correct ?
>> >>
>> >
>> > I don't think so: this would skip the test even if the toolchain has multilibs enabled.
>> > Did you check e.g. with arm-eabi and multilibs enabled and the usual option overrides?
>> It showed 3 PASS with second patch:
>> /* { dg-skip-if "skip test for hard float" { *-*-* } {
>> "-mfloat-abi=hard" } { "" } } */
>>
>> I ran it using make check-gcc RUNTESTFLAGS="simd.exp=pr98435.c"
>> and built the toolchain using:
>> abe.sh --target arm-eabi --build all --set multilib=aprofile gcc=gcc.git~master.
>> I suppose that's correct ?
>
>
> I use rmprofile for arm-eabi, but since aprofile also includes both hard and soft multilibs, that should be OK.
> However, I meant overriding the flags used for testing. Here is my current list:
>
> -mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd
> -mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> -mthumb/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> -mthumb/-mfloat-abi=soft/-march=armv6s-m
> -mthumb/-mfloat-abi=soft/-march=armv7-m
> -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp
> -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp.dp
> -mthumb/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp
> -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
Ah right, thanks for the list.
So, with these options -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp,
the test used to PASS but with the patch applied, it now appears UNSUPPORTED
because it skips the test for -mfloat-abi=hard.

So I guess what we want to check is if -mfloat-abi=hard is used, then
the target has multilib support enabled ?
Could you suggest how to check for that with dejagnu ?

Thanks,
Prathamesh
>
> Christophe
>
>>
>> gcc -v output:
>> Configured with:
>> '/home/prathamesh.kulkarni/abe-toolchain-2/snapshots/gcc.git~master/configure'
>> SHELL=/bin/bash
>> --with-mpc=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
>> --with-mpfr=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
>> --with-gmp=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
>> --with-gnu-as --with-gnu-ld --disable-libmudflap --enable-lto
>> --enable-shared --without-included-gettext --enable-nls
>> --with-system-zlib --disable-sjlj-exceptions
>> --enable-gnu-unique-object --enable-linker-build-id
>> --disable-libstdcxx-pch --enable-c99 --enable-clocale=gnu
>> --enable-libstdcxx-debug --enable-long-long --with-cloog=no
>> --with-ppl=no --with-isl=no --enable-multilib
>> --with-multilib-list=aprofile --enable-threads=no --disable-multiarch
>> --with-sysroot=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu/arm-eabi
>> --with-newlib --enable-checking=yes --disable-bootstrap
>> --enable-languages=c,c++,lto
>> --prefix=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
>> --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
>> --target=arm-eabi
>>
>> Thanks,
>> Prathamesh
>> >
>> >
>> > Christophe
>> >
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> > Christophe
>> >> >
>> >> >>
>> >> >> Thanks,
>> >> >> Prathamesh
>> >> >> >
>> >> >> > Thanks
>> >> >> >
>> >> >> > Christophe
>> >> >> >
>> >> >> >
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> Prathamesh
>> >> >> >> > Thanks,
>> >> >> >> > Kyrill
>> >> >> >> >
>> >> >> >> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
>> >> >> >> > >
>> >> >> >> > > Thanks,
>> >> >> >> > > Prathamesh
>> >> >> >> > > >
>> >> >> >> > > > Thanks,
>> >> >> >> > > > Prathamesh
>> >> >> >> > > > >
>> >> >> >> > > > >
>> >> >> >> > > > > Thanks,
>> >> >> >> > > > >
>> >> >> >> > > > >
>> >> >> >> > > > > Christophe
>> >> >> >> > > > >
>> >> >> >> > > > >
>> >> >> >> > > > > > Thanks,
>> >> >> >> > > > > > Prathamesh
>> >> >> >> > > > > >>
>> >> >> >> > > > > >> Christophe
>> >> >> >> > > > > >>
>> >> >> >> > > > > >>
>> >> >> >> > > > > >>> Ok.
>> >> >> >> > > > > >>> Thanks,
>> >> >> >> > > > > >>> Kyrill
>> >> >> >> > > > > >>>
>> >> >> >> > > > > >>>
>> >> >> >> > > > > >>>> Thanks,
>> >> >> >> > > > > >>>> Prathamesh
>> >> >> >> > > > > >>>>> Thanks,
>> >> >> >> > > > > >>>>> Kyrill
>> >> >> >> > > > > >>>>>
>> >> >> >> > > > > >>>>>> Thanks,
>> >> >> >> > > > > >>>>>> Prathamesh
>> >> >> >> > > > > >>>>>>> Thanks,
>> >> >> >> > > > > >>>>>>> Prathamesh
>> >> >> >> > > > > >>>>>>>> That being said, I suggest you look at other similar patterns in
>> >> >> >> > > > > >>>>>>>> vec-common.md, most of which are gated on
>> >> >> >> > > > > >>>>>>>> ARM_HAVE_<MODE>_ARITH
>> >> >> >> > > > > >>>>>>>> and possibly beware of issues with iwmmxt :-)
>> >> >> >> > > > > >>>>>>>>
>> >> >> >> > > > > >>>>>>>> Christophe
>> >> >> >> > > > > >>>>>>>>
>> >> >> >> > > > > >>>>>>>>> Thanks,
>> >> >> >> > > > > >>>>>>>>> Prathamesh


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-08-06  9:50                                         ` Prathamesh Kulkarni
@ 2021-08-06 12:01                                           ` Christophe Lyon
  2021-08-09  5:07                                             ` Prathamesh Kulkarni
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe Lyon @ 2021-08-06 12:01 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Kyrylo Tkachov, gcc Patches

On Fri, Aug 6, 2021 at 11:51 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:

> On Fri, 6 Aug 2021 at 14:49, Christophe Lyon
> <christophe.lyon.oss@gmail.com> wrote:
> >
> >
> >
> > On Fri, Aug 6, 2021 at 11:00 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
> >>
> >> On Thu, 5 Aug 2021 at 18:05, Christophe Lyon
> >> <christophe.lyon.oss@gmail.com> wrote:
> >> >
> >> >
> >> >
> >> > On Thu, Aug 5, 2021 at 2:28 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
> >> >>
> >> >> On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
> >> >> <christophe.lyon.oss@gmail.com> wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
> >> >> >>
> >> >> >> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
> >> >> >> <christophe.lyon.oss@gmail.com> wrote:
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >> >> >> >>
> >> >> >> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > > -----Original Message-----
> >> >> >> >> > > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >> >> >> >> > > Sent: 06 July 2021 08:06
> >> >> >> >> > > To: Christophe LYON <christophe.lyon@foss.st.com>
> >> >> >> >> > > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc Patches <gcc-patches@gcc.gnu.org>
> >> >> >> >> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> >> >> >> >> > >
> >> >> >> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
> >> >> >> >> > > <prathamesh.kulkarni@linaro.org> wrote:
> >> >> >> >> > > >
> >> >> >> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
> >> >> >> >> > > > <christophe.lyon@foss.st.com> wrote:
> >> >> >> >> > > > >
> >> >> >> >> > > > >
> >> >> >> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> >> >> >> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> >> >> >> >> > > > > > <christophe.lyon@foss.st.com> wrote:
> >> >> >> >> > > > > >>
> >> >> >> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
> >> >> >> >> > > > > >>>> -----Original Message-----
> >> >> >> >> > > > > >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >> >> >> >> > > > > >>>> Sent: 28 June 2021 09:38
> >> >> >> >> > > > > >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >> >> >> >> > > > > >>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-patches@gcc.gnu.org>
> >> >> >> >> > > > > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> >> >> >> >> > > > > >>>>
> >> >> >> >> > > > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
> >> >> >> >> > > > > >>>>>
> >> >> >> >> > > > > >>>>>> -----Original Message-----
> >> >> >> >> > > > > >>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> >> >> >> >> > > > > >>>>>> Sent: 14 June 2021 09:02
> >> >> >> >> > > > > >>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
> >> >> >> >> > > > > >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >> >> >> >> > > > > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
> >> >> >> >> > > > > >>>>>>
> >> >> >> >> > > > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> >> >> >> >> > > > > >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
> >> >> >> >> > > > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> >> >> >> >> > > > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >> >> >> >> > > > > >>>>>>>>> Hi,
> >> >> >> >> > > > > >>>>>>>>> As mentioned in PR, for the following test-case:
> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> > > > > >>>>>>>>> #include <arm_neon.h>
> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> > > > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
> >> >> >> >> > > > > >>>>>>>>> {
> >> >> >> >> > > > > >>>>>>>>>     return vdup_n_bf16 (a);
> >> >> >> >> > > > > >>>>>>>>> }
> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> > > > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
> >> >> >> >> > > > > >>>>>>>>> {
> >> >> >> >> > > > > >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
> >> >> >> >> > > > > >>>>>>>>> }
> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> > > > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> >> >> >> >> > > > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> > > > > >>>>>>>>> f1:
> >> >> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
> >> >> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> >> >> >> >> > > > > >>>>>>>>>           bx      lr
> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> > > > > >>>>>>>>> f2:
> >> >> >> >> > > > > >>>>>>>>>           mov     r3, r0  @ __bf16
> >> >> >> >> > > > > >>>>>>>>>           adr     r1, .L4
> >> >> >> >> > > > > >>>>>>>>>           ldrd    r0, [r1]
> >> >> >> >> > > > > >>>>>>>>>           mov     r2, r3  @ __bf16
> >> >> >> >> > > > > >>>>>>>>>           mov     ip, r3  @ __bf16
> >> >> >> >> > > > > >>>>>>>>>           bfi     r1, r2, #0, #16
> >> >> >> >> > > > > >>>>>>>>>           bfi     r0, ip, #0, #16
> >> >> >> >> > > > > >>>>>>>>>           bfi     r1, r3, #16, #16
> >> >> >> >> > > > > >>>>>>>>>           bfi     r0, r2, #16, #16
> >> >> >> >> > > > > >>>>>>>>>           bx      lr
> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> > > > > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ mode
> >> >> >> >> > > > > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed mode
> >> >> >> >> > > > > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler now generates:
> >> >> >> >> > > > > >>>>>>>>> f2:
> >> >> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
> >> >> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> >> >> >> >> > > > > >>>>>>>>>           bx      lr
> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> > > > > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> >> >> >> >> > > > > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE has
> >> >> >> >> > > > > >>>>>>>>> only 128-bit vectors ?
> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> > > > > >>>>>>>> I think patterns common to both Neon and MVE should be moved to
> >> >> >> >> > > > > >>>>>>>> vec-common.md, I don't know why such patterns were left in neon.md.
> >> >> >> >> > > > > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and MVE,
> >> >> >> >> > > > > >>>>>>> I am not sure if we should separate the pattern ?
> >> >> >> >> > > > > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> >> >> >> >> > > > > >>>>>>> in attached patch so
> >> >> >> >> > > > > >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ?
> >> >> >> >> > > > > >>>>>>> Although hard-coding 16 in the pattern doesn't seem a good idea to me
> >> >> >> >> > > > > >>>>>>> either.
> >> >> >> >> > > > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> >> >> >> >> > > > > >>>>>> (attaching patch as text).
> >> >> >> >> > > > > >>>>>>
> >> >> >> >> > > > > >>>>> --- a/gcc/config/arm/neon.md
> >> >> >> >> > > > > >>>>> +++ b/gcc/config/arm/neon.md
> >> >> >> >> > > > > >>>>> @@ -459,10 +459,12 @@
> >> >> >> >> > > > > >>>>>    )
> >> >> >> >> > > > > >>>>>
> >> >> >> >> > > > > >>>>>    (define_expand "vec_init<mode><V_elem_l>"
> >> >> >> >> > > > > >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
> >> >> >> >> > > > > >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
> >> >> >> >> > > > > >>>>>       (match_operand 1 "" "")]
> >> >> >> >> > > > > >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
> >> >> >> >> > > > > >>>>>    {
> >> >> >> >> > > > > >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> >> >> >> >> > > > > >>>>> +    FAIL;
> >> >> >> >> > > > > >>>>>      neon_expand_vector_init (operands[0], operands[1]);
> >> >> >> >> > > > > >>>>>      DONE;
> >> >> >> >> > > > > >>>>>    })
> >> >> >> >> > > > > >>>>>
> >> >> >> >> > > > > >>>>> I think we should move this to vec-common.md like Christophe said.
> >> >> >> >> > > > > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in
> >> >> >> >> > > > > >>>>> the expander condition?
> >> >> >> >> > > > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
> >> >> >> >> > > > > >>>> Is it OK to use <MODE>mode ? Because using <VDQ>mode resulted in lot
> >> >> >> >> > > > > >>>> of build errors.
> >> >> >> >> > > > > >>>> Also, I think the comparison should be inverted, ie, GET_MODE_SIZE
> >> >> >> >> > > > > >>>> (<MODE>mode) == 16 since
> >> >> >> >> > > > > >>>> we want to make the pattern pass if target is MVE and vector size is 16 bytes ?
> >> >> >> >> > > > > >>>> Do these changes in attached patch look OK ?
> >> >> >> >> > > > > >>> Yes, you're right.
> >> >> >> >> > > > > >>
> >> >> >> >> > > > > >> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
> >> >> >> >> > > > > >>
> >> >> >> >> > > > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
> >> >> >> >> > > > > > I wonder if this should be ARM_HAVE_<MODE>_LDST instead since we're
> >> >> >> >> > > > > > initializing the vector ?
> >> >> >> >> > > > >
> >> >> >> >> > > > >
> >> >> >> >> > > > > Well, it really depends on which modes you want to enable.
> >> >> >> >> > > > >
> >> >> >> >> > > > >
> >> >> >> >> > > > > Looks like your move VDQ -> VDQX adds V4BF, V8BF and DI.
> >> >> >> >> > > > >
> >> >> >> >> > > > > Are they all OK for Neon?
> >> >> >> >> > > > >
> >> >> >> >> > > > > They are not OK for MVE.
> >> >> >> >> > > > >
> >> >> >> >> > > > > Ideally you could add testcases to cover the supported and
> >> >> >> >> > > > > unsupported modes for both Neon and MVE.
> >> >> >> >> > > > >
> >> >> >> >> > > > > Before your patch, the expander is enabled for MVE for 64 bit modes
> >> >> >> >> > > > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash
> >> >> >> >> > > > > or is there something else preventing the match?
> >> >> >> >> > > > Hi,
> >> >> >> >> > > > Apparently there is VALID_MVE_MODE macro, so is it better to use:
> >> >> >> >> > > > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(<MODE>mode))
> >> >> >> >> > > > as in the attached patch ?
> >> >> >> >> >
> >> >> >> >> > The change is ok. I would like to see some testcases like Christophe suggested, but this patch just moves the expander around rather than introducing new functionality.
> >> >> >> >> Hi Kyrill,
> >> >> >> >> As mentioned in the first email, the patch improves code-gen for
> >> >> >> >> following test-case:
> >> >> >> >>
> >> >> >> >> bfloat16x4_t f (bfloat16_t a)
> >> >> >> >> {
> >> >> >> >>   return (bfloat16x4_t) {a, a, a, a};
> >> >> >> >> }
> >> >> >> >>
> >> >> >> >> Before patch:
> >> >> >> >> f:
> >> >> >> >>         mov     r3, r0  @ __bf16
> >> >> >> >>         adr     r1, .L4
> >> >> >> >>         ldrd    r0, [r1]
> >> >> >> >>         mov     r2, r3  @ __bf16
> >> >> >> >>         mov     ip, r3  @ __bf16
> >> >> >> >>         bfi     r1, r2, #0, #16
> >> >> >> >>         bfi     r0, ip, #0, #16
> >> >> >> >>         bfi     r1, r3, #16, #16
> >> >> >> >>         bfi     r0, r2, #16, #16
> >> >> >> >>         bx      lr
> >> >> >> >>
> >> >> >> >> After patch:
> >> >> >> >> f:
> >> >> >> >>         vdup.16 d16, r0
> >> >> >> >>         vmov    r0, r1, d16  @ v4bf
> >> >> >> >>         bx      lr
> >> >> >> >>
> >> >> >> >> because the patch changes mode from VDQ to VDQX to accommodate bf modes.
> >> >> >> >> I have included the test in the attached patch.
> >> >> >> >> I think Christophe's concerns were mainly about the right modes
> >> >> >> >> getting enabled for MVE.
> >> >> >> >> Unfortunately, I am not sure how to test for that because the FE
> >> >> >> >> catches invalid modes, and we don't end up hitting the pattern.
> >> >> >> >>
> >> >> >> >
> >> >> >> > Hi Prathamesh,
> >> >> >> >
> >> >> >> > The new testcase fails on arm-linux-gnueabihf:
> >> >> >> >  FAIL: gcc.target/arm/simd/pr98435.c (test for excess errors)
> >> >> >> > Excess errors:
> >> >> >> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-soft.h: No such file or directory
> >> >> >> > compilation terminated.
> >> >> >> >
> >> >> >> > Because you don't check whether -mfloat-abi=softfp is actually supported.
> >> >> >> >
> >> >> >> > Can you fix that?
> >> >> >> Oops, sorry about that.
> >> >> >> The attached patch fixes the test by requiring arm_softfloat and makes
> >> >> >> it UNSUPPORTED on arm-linux-gnueabihf.
> >> >> >> Does it look OK ?
> >> >> >>
> >> >> >
> >> >> > I don't think that's right: it would make the test unsupported if softfp is not the default even if the toolchain has the needed multilibs.
> >> >> > Did you check e.g. with arm-eabi and multilibs enabled?
> >> >> Ah OK, thanks for pointing it out!
> >> >> Does the attached patch look correct ?
> >> >>
> >> >
> >> > I don't think so: this would skip the test even if the toolchain has multilibs enabled.
> >> > Did you check e.g. with arm-eabi and multilibs enabled and the usual option overrides?
> >> It showed 3 PASS with second patch:
> >> /* { dg-skip-if "skip test for hard float" { *-*-* } {
> >> "-mfloat-abi=hard" } { "" } } */
> >>
> >> I ran it using make check-gcc RUNTESTFLAGS="simd.exp=pr98435.c"
> >> and built the toolchain using:
> >> abe.sh --target arm-eabi --build all --set multilib=aprofile gcc=gcc.git~master.
> >> I suppose that's correct ?
> >
> >
> > I use rmprofile for arm-eabi, but since aprofile also includes both hard and soft multilibs, that should be OK.
> > However, I meant overriding the flags used for testing. Here is my current list:
> >
> > -mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd
> > -mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> > -mthumb/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> > -mthumb/-mfloat-abi=soft/-march=armv6s-m
> > -mthumb/-mfloat-abi=soft/-march=armv7-m
> > -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp
> > -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp.dp
> > -mthumb/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp
> > -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
> Ah right, thanks for the list.
> So, with these options -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp,
> the test used to PASS but with the patch applied, it now appears UNSUPPORTED
> because it skips the test for -mfloat-abi=hard.
>

Yes, that's what I wrote above.


> So I guess what we want to check is if -mfloat-abi=hard is used, then
> the target has multilib support enabled ?
> Could you suggest how to check for that with dejagnu ?
>

No, since you want to use softfp, you want to make sure that softfp is
accepted by the toolchain.
Looking at target-supports.exp, I'd suggest you try arm_softfp_ok.

Christophe
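
A rough sketch of how the pr98435.c header might use this suggestion, for illustration only (it is not the committed test). Only arm_softfp_ok comes from this thread; the arm_v8_2a_bf16_neon_ok effective target and the matching "dg-add-options arm_v8_2a_bf16_neon" are assumed to exist in target-supports.exp and should be checked there, along with whether the options they add interact cleanly with an explicit -mfloat-abi=softfp. The file only builds with an ARM cross compiler, so it is a testsuite fragment rather than something to run on the host.

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_softfp_ok } */
/* Assumption: arm_v8_2a_bf16_neon_ok and "dg-add-options arm_v8_2a_bf16_neon"
   are provided by target-supports.exp; verify before relying on them.  */
/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
/* { dg-options "-O3 -mfloat-abi=softfp" } */
/* { dg-add-options arm_v8_2a_bf16_neon } */

#include <arm_neon.h>

bfloat16x4_t f (bfloat16_t a)
{
  return (bfloat16x4_t) {a, a, a, a};
}

/* With the vec_init expander enabled for V4BF, the splat should become a
   single vdup.16 instead of the bfi sequence.  */
/* { dg-final { scan-assembler {vdup\.16\s+d[0-9]+, r[0-9]+} } } */
/* { dg-final { scan-assembler-not {bfi} } } */
```

Using dg-require-effective-target rather than dg-skip-if on "-mfloat-abi=hard" means the test is only skipped when the toolchain really cannot handle softfp, which is the distinction being drawn above.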


> Thanks,
> Prathamesh
> >
> > Christophe
> >
> >>
> >> gcc -v output:
> >> Configured with:
> >> '/home/prathamesh.kulkarni/abe-toolchain-2/snapshots/gcc.git~master/configure'
> >> SHELL=/bin/bash
> >> --with-mpc=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
> >> --with-mpfr=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
> >> --with-gmp=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
> >> --with-gnu-as --with-gnu-ld --disable-libmudflap --enable-lto
> >> --enable-shared --without-included-gettext --enable-nls
> >> --with-system-zlib --disable-sjlj-exceptions
> >> --enable-gnu-unique-object --enable-linker-build-id
> >> --disable-libstdcxx-pch --enable-c99 --enable-clocale=gnu
> >> --enable-libstdcxx-debug --enable-long-long --with-cloog=no
> >> --with-ppl=no --with-isl=no --enable-multilib
> >> --with-multilib-list=aprofile --enable-threads=no --disable-multiarch
> >> --with-sysroot=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu/arm-eabi
> >> --with-newlib --enable-checking=yes --disable-bootstrap
> >> --enable-languages=c,c++,lto
> >> --prefix=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
> >> --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
> >> --target=arm-eabi
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> >
> >> > Christophe
> >> >
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> > Christophe
> >> >> >
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Prathamesh
> >> >> >> >
> >> >> >> > Thanks
> >> >> >> >
> >> >> >> > Christophe
> >> >> >> >
> >> >> >> >
> >> >> >> >>
> >> >> >> >> Thanks,
> >> >> >> >> Prathamesh
> >> >> >> >> > Thanks,
> >> >> >> >> > Kyrill
> >> >> >> >> >
> >> >> >> >> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
> >> >> >> >> > >
> >> >> >> >> > > Thanks,
> >> >> >> >> > > Prathamesh
> >> >> >> >> > > >
> >> >> >> >> > > > Thanks,
> >> >> >> >> > > > Prathamesh
> >> >> >> >> > > > >
> >> >> >> >> > > > >
> >> >> >> >> > > > > Thanks,
> >> >> >> >> > > > >
> >> >> >> >> > > > >
> >> >> >> >> > > > > Christophe
> >> >> >> >> > > > >
> >> >> >> >> > > > >
> >> >> >> >> > > > > > Thanks,
> >> >> >> >> > > > > > Prathamesh
> >> >> >> >> > > > > >>
> >> >> >> >> > > > > >> Christophe
> >> >> >> >> > > > > >>
> >> >> >> >> > > > > >>
> >> >> >> >> > > > > >>> Ok.
> >> >> >> >> > > > > >>> Thanks,
> >> >> >> >> > > > > >>> Kyrill
> >> >> >> >> > > > > >>>
> >> >> >> >> > > > > >>>
> >> >> >> >> > > > > >>>> Thanks,
> >> >> >> >> > > > > >>>> Prathamesh
> >> >> >> >> > > > > >>>>> Thanks,
> >> >> >> >> > > > > >>>>> Kyrill
> >> >> >> >> > > > > >>>>>
> >> >> >> >> > > > > >>>>>> Thanks,
> >> >> >> >> > > > > >>>>>> Prathamesh
> >> >> >> >> > > > > >>>>>>> Thanks,
> >> >> >> >> > > > > >>>>>>> Prathamesh
> >> >> >> >> > > > > >>>>>>>> That being said, I suggest you look at other similar patterns in
> >> >> >> >> > > > > >>>>>>>> vec-common.md, most of which are gated on
> >> >> >> >> > > > > >>>>>>>> ARM_HAVE_<MODE>_ARITH
> >> >> >> >> > > > > >>>>>>>> and possibly beware of issues with iwmmxt :-)
> >> >> >> >> > > > > >>>>>>>>
> >> >> >> >> > > > > >>>>>>>> Christophe
> >> >> >> >> > > > > >>>>>>>>
> >> >> >> >> > > > > >>>>>>>>> Thanks,
> >> >> >> >> > > > > >>>>>>>>> Prathamesh
>
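
The expander condition went through three variants in the discussion quoted above: Kyrill's "GET_MODE_SIZE (...) != 16", Prathamesh's inverted "== 16", and the "VALID_MVE_MODE (<MODE>mode)" form. A rough C sketch of how each gate treats a few modes follows. The mode table is a hypothetical, partial stand-in (not GCC's actual VDQ/VDQX iterators), and the mve_ok column encodes only what the thread states (MVE has only 128-bit vectors; V4BF, V8BF and DI are not OK for MVE); it is not the real VALID_MVE_MODE definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, partial model of the vec_init expander gating discussed in
   this thread.  Not GCC's VDQ/VDQX iterators and not VALID_MVE_MODE.  */
struct mode_info {
    const char *name;
    unsigned size;   /* GET_MODE_SIZE in bytes */
    bool mve_ok;     /* stand-in for VALID_MVE_MODE, per the thread only */
};

static const struct mode_info modes[] = {
    {"V8QI", 8, false},  {"V4HI", 8, false},  {"V2SI", 8, false},
    {"V16QI", 16, true}, {"V8HI", 16, true},  {"V4SI", 16, true},
    {"V4BF", 8, false},  {"V8BF", 16, false}, {"DI", 8, false},
};

/* First suggestion, as written in the thread: size != 16.  */
bool gate_size_ne16(bool neon, bool mve, const struct mode_info *m)
{
    return neon || (mve && m->size != 16);
}

/* Inverted comparison: only 16-byte modes pass under MVE.  */
bool gate_size_eq16(bool neon, bool mve, const struct mode_info *m)
{
    return neon || (mve && m->size == 16);
}

/* Per-mode validity check instead of a size check.  */
bool gate_valid_mve(bool neon, bool mve, const struct mode_info *m)
{
    return neon || (mve && m->mve_ok);
}

/* Look up a mode by name in the illustrative table; NULL if absent.  */
const struct mode_info *find_mode(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof modes / sizeof modes[0]; i++)
        if (strcmp(modes[i].name, name) == 0)
            return &modes[i];
    return NULL;
}
```

With neon = false and mve = true, the "!= 16" gate admits V8QI and rejects V16QI (the opposite of what is wanted), and "== 16" fixes that. The V8BF row shows why a pure size check is still not enough: it is 16 bytes, so "== 16" admits it for MVE, whereas a per-mode validity check rejects it.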


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-08-06 12:01                                           ` Christophe Lyon
@ 2021-08-09  5:07                                             ` Prathamesh Kulkarni
  2021-08-09 16:19                                               ` Christophe Lyon
  0 siblings, 1 reply; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-08-09  5:07 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Kyrylo Tkachov, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 20288 bytes --]

On Fri, 6 Aug 2021 at 17:31, Christophe Lyon
<christophe.lyon.oss@gmail.com> wrote:
>
>
>
> On Fri, Aug 6, 2021 at 11:51 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>>
>> On Fri, 6 Aug 2021 at 14:49, Christophe Lyon
>> <christophe.lyon.oss@gmail.com> wrote:
>> >
>> >
>> >
>> > On Fri, Aug 6, 2021 at 11:00 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>> >>
>> >> On Thu, 5 Aug 2021 at 18:05, Christophe Lyon
>> >> <christophe.lyon.oss@gmail.com> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Aug 5, 2021 at 2:28 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>> >> >>
>> >> >> On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
>> >> >> <christophe.lyon.oss@gmail.com> wrote:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >>
>> >> >> >> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
>> >> >> >> <christophe.lyon.oss@gmail.com> wrote:
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> >> >> >> >>
>> >> >> >> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > > -----Original Message-----
>> >> >> >> >> > > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> >> >> >> >> > > Sent: 06 July 2021 08:06
>> >> >> >> >> > > To: Christophe LYON <christophe.lyon@foss.st.com>
>> >> >> >> >> > > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc Patches <gcc-patches@gcc.gnu.org>
>> >> >> >> >> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>> >> >> >> >> > >
>> >> >> >> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
>> >> >> >> >> > > <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >> >> > > >
>> >> >> >> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
>> >> >> >> >> > > > <christophe.lyon@foss.st.com> wrote:
>> >> >> >> >> > > > >
>> >> >> >> >> > > > >
>> >> >> >> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
>> >> >> >> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
>> >> >> >> >> > > > > > <christophe.lyon@foss.st.com> wrote:
>> >> >> >> >> > > > > >>
>> >> >> >> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
>> >> >> >> >> > > > > >>>> -----Original Message-----
>> >> >> >> >> > > > > >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> >> >> >> >> > > > > >>>> Sent: 28 June 2021 09:38
>> >> >> >> >> > > > > >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> >> >> >> >> > > > > >>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-patches@gcc.gnu.org>
>> >> >> >> >> > > > > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>> >> >> >> >> > > > > >>>>
>> >> >> >> >> > > > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>> >> >> >> >> > > > > >>>>>
>> >> >> >> >> > > > > >>>>>> -----Original Message-----
>> >> >> >> >> > > > > >>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> >> >> >> >> > > > > >>>>>> Sent: 14 June 2021 09:02
>> >> >> >> >> > > > > >>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
>> >> >> >> >> > > > > >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> >> >> >> >> > > > > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>> >> >> >> >> > > > > >>>>>>
>> >> >> >> >> > > > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
>> >> >> >> >> > > > > >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >> >> > > > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>> >> >> >> >> > > > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> >> >> >> >> > > > > >>>>>>>>> Hi,
>> >> >> >> >> > > > > >>>>>>>>> As mentioned in PR, for the following test-case:
>> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> > > > > >>>>>>>>> #include <arm_neon.h>
>> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> > > > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
>> >> >> >> >> > > > > >>>>>>>>> {
>> >> >> >> >> > > > > >>>>>>>>>     return vdup_n_bf16 (a);
>> >> >> >> >> > > > > >>>>>>>>> }
>> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> > > > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
>> >> >> >> >> > > > > >>>>>>>>> {
>> >> >> >> >> > > > > >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
>> >> >> >> >> > > > > >>>>>>>>> }
>> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> > > > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
>> >> >> >> >> > > > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
>> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> > > > > >>>>>>>>> f1:
>> >> >> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
>> >> >> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
>> >> >> >> >> > > > > >>>>>>>>>           bx      lr
>> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> > > > > >>>>>>>>> f2:
>> >> >> >> >> > > > > >>>>>>>>>           mov     r3, r0  @ __bf16
>> >> >> >> >> > > > > >>>>>>>>>           adr     r1, .L4
>> >> >> >> >> > > > > >>>>>>>>>           ldrd    r0, [r1]
>> >> >> >> >> > > > > >>>>>>>>>           mov     r2, r3  @ __bf16
>> >> >> >> >> > > > > >>>>>>>>>           mov     ip, r3  @ __bf16
>> >> >> >> >> > > > > >>>>>>>>>           bfi     r1, r2, #0, #16
>> >> >> >> >> > > > > >>>>>>>>>           bfi     r0, ip, #0, #16
>> >> >> >> >> > > > > >>>>>>>>>           bfi     r1, r3, #16, #16
>> >> >> >> >> > > > > >>>>>>>>>           bfi     r0, r2, #16, #16
>> >> >> >> >> > > > > >>>>>>>>>           bx      lr
>> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> > > > > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ mode
>> >> >> >> >> > > > > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed mode
>> >> >> >> >> > > > > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler now generates:
>> >> >> >> >> > > > > >>>>>>>>> f2:
>> >> >> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
>> >> >> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
>> >> >> >> >> > > > > >>>>>>>>>           bx      lr
>> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> > > > > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am not
>> >> >> >> >> > > > > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE has
>> >> >> >> >> > > > > >>>>>>>>> only 128-bit vectors ?
>> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> > > > > >>>>>>>> I think patterns common to both Neon and MVE should be moved to
>> >> >> >> >> > > > > >>>>>>>> vec-common.md, I don't know why such patterns were left in neon.md.
>> >> >> >> >> > > > > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and MVE,
>> >> >> >> >> > > > > >>>>>>> I am not sure if we should separate the pattern ?
>> >> >> >> >> > > > > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
>> >> >> >> >> > > > > >>>>>>> in attached patch so
>> >> >> >> >> > > > > >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ?
>> >> >> >> >> > > > > >>>>>>> Although hard-coding 16 in the pattern doesn't seem a good idea to me either.
>> >> >> >> >> > > > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
>> >> >> >> >> > > > > >>>>>> (attaching patch as text).
>> >> >> >> >> > > > > >>>>>>
>> >> >> >> >> > > > > >>>>> --- a/gcc/config/arm/neon.md
>> >> >> >> >> > > > > >>>>> +++ b/gcc/config/arm/neon.md
>> >> >> >> >> > > > > >>>>> @@ -459,10 +459,12 @@
>> >> >> >> >> > > > > >>>>>    )
>> >> >> >> >> > > > > >>>>>
>> >> >> >> >> > > > > >>>>>    (define_expand "vec_init<mode><V_elem_l>"
>> >> >> >> >> > > > > >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
>> >> >> >> >> > > > > >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
>> >> >> >> >> > > > > >>>>>       (match_operand 1 "" "")]
>> >> >> >> >> > > > > >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
>> >> >> >> >> > > > > >>>>>    {
>> >> >> >> >> > > > > >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
>> >> >> >> >> > > > > >>>>> +    FAIL;
>> >> >> >> >> > > > > >>>>>      neon_expand_vector_init (operands[0], operands[1]);
>> >> >> >> >> > > > > >>>>>      DONE;
>> >> >> >> >> > > > > >>>>>    })
>> >> >> >> >> > > > > >>>>>
>> >> >> >> >> > > > > >>>>> I think we should move this to vec-common.md like Christophe said.
>> >> >> >> >> > > > > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in the expander condition?
>> >> >> >> >> > > > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
>> >> >> >> >> > > > > >>>> Is it OK to use <MODE>mode ? Because using <VDQ>mode resulted in a lot of build errors.
>> >> >> >> >> > > > > >>>> Also, I think the comparison should be inverted, i.e., GET_MODE_SIZE (<MODE>mode) == 16, since
>> >> >> >> >> > > > > >>>> we want to make the pattern pass if target is MVE and vector size is 16 bytes ?
>> >> >> >> >> > > > > >>>> Do these changes in attached patch look OK ?
>> >> >> >> >> > > > > >>> Yes, you're right.
>> >> >> >> >> > > > > >>
>> >> >> >> >> > > > > >> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
>> >> >> >> >> > > > > >>
>> >> >> >> >> > > > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
>> >> >> >> >> > > > > > I wonder if this should be ARM_HAVE_<MODE>_LDST instead since we're initializing the vector ?
>> >> >> >> >> > > > >
>> >> >> >> >> > > > >
>> >> >> >> >> > > > > Well, it really depends on which modes you want to enable.
>> >> >> >> >> > > > >
>> >> >> >> >> > > > >
>> >> >> >> >> > > > > Looks like your move VDQ -> VDQX adds V4BF, V8BF and DI.
>> >> >> >> >> > > > >
>> >> >> >> >> > > > > Are they all OK for Neon?
>> >> >> >> >> > > > >
>> >> >> >> >> > > > > They are not OK for MVE.
>> >> >> >> >> > > > >
>> >> >> >> >> > > > > Ideally you could add testcases to cover the supported and
>> >> >> >> >> > > > > unsupported modes for both Neon and MVE.
>> >> >> >> >> > > > >
>> >> >> >> >> > > > > Before your patch, the expander is enabled for MVE for 64 bit modes
>> >> >> >> >> > > > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash
>> >> >> >> >> > > > > or is there something else preventing the match?
>> >> >> >> >> > > > Hi,
>> >> >> >> >> > > > Apparently there is VALID_MVE_MODE macro, so is it better to use:
>> >> >> >> >> > > > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(<MODE>mode))
>> >> >> >> >> > > > as in the attached patch ?
>> >> >> >> >> >
>> >> >> >> >> > The change is ok. I would like to see some testcases like Christophe suggested, but this patch just moves the expander around rather than introducing new functionality.
>> >> >> >> >> Hi Kyrill,
>> >> >> >> >> As mentioned in the first email, the patch improves code-gen for
>> >> >> >> >> following test-case:
>> >> >> >> >>
>> >> >> >> >> bfloat16x4_t f (bfloat16_t a)
>> >> >> >> >> {
>> >> >> >> >>   return (bfloat16x4_t) {a, a, a, a};
>> >> >> >> >> }
>> >> >> >> >>
>> >> >> >> >> Before patch:
>> >> >> >> >> f:
>> >> >> >> >>         mov     r3, r0  @ __bf16
>> >> >> >> >>         adr     r1, .L4
>> >> >> >> >>         ldrd    r0, [r1]
>> >> >> >> >>         mov     r2, r3  @ __bf16
>> >> >> >> >>         mov     ip, r3  @ __bf16
>> >> >> >> >>         bfi     r1, r2, #0, #16
>> >> >> >> >>         bfi     r0, ip, #0, #16
>> >> >> >> >>         bfi     r1, r3, #16, #16
>> >> >> >> >>         bfi     r0, r2, #16, #16
>> >> >> >> >>         bx      lr
>> >> >> >> >>
>> >> >> >> >> After patch:
>> >> >> >> >> f:
>> >> >> >> >>         vdup.16 d16, r0
>> >> >> >> >>         vmov    r0, r1, d16  @ v4bf
>> >> >> >> >>         bx      lr
>> >> >> >> >>
>> >> >> >> >> because the patch changes mode from VDQ to VDQX to accommodate bf modes.
>> >> >> >> >> I have included the test in the attached patch.
>> >> >> >> >> I think Christophe's concerns were mainly about the right modes
>> >> >> >> >> getting enabled for MVE.
>> >> >> >> >> Unfortunately, I am not sure how to test for that because the FE
>> >> >> >> >> catches invalid modes, and we don't
>> >> >> >> >> end up hitting the pattern.
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> > Hi Prathamesh,
>> >> >> >> >
>> >> >> >> > The new testcase fails on arm-linux-gnueabihf:
>> >> >> >> >  FAIL: gcc.target/arm/simd/pr98435.c (test for excess errors)
>> >> >> >> > Excess errors:
>> >> >> >> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-soft.h: No such file or directory
>> >> >> >> > compilation terminated.
>> >> >> >> >
>> >> >> >> > Because you don't check whether -mfloat-abi=softfp is actually supported.
>> >> >> >> >
>> >> >> >> > Can you fix that?
>> >> >> >> Oops, sorry about that.
>> >> >> >> The attached patch fixes the test by requiring arm_softfloat and makes
>> >> >> >> it UNSUPPORTED on arm-linux-gnueabihf.
>> >> >> >> Does it look OK ?
>> >> >> >>
>> >> >> >
>> >> >> > I don't think that's right: it would make the test unsupported if softfp is not the default even if the toolchain has the needed multilibs.
>> >> >> > Did you check eg. with arm-eabi and multilibs enabled?
>> >> >> Ah OK, thanks for pointing it out!
>> >> >> Does the attached patch look correct ?
>> >> >>
>> >> >
>> >> >> > I don't think so: this would skip the test even if the toolchain has multilibs enabled.
>> >> > Did you check eg. with arm-eabi and multilibs enabled and the usual option overrides?
>> >> It showed 3 PASS with second patch:
>> >> /* { dg-skip-if "skip test for hard float" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
>> >>
>> >> I ran it using make check-gcc RUNTESTFLAGS="simd.exp=pr98435.c"
>> >> and built the toolchain using:
>> >> abe.sh --target arm-eabi --build all --set multilib=aprofile gcc=gcc.git~master.
>> >> I suppose that's correct ?
>> >
>> >
>> > I use rmprofile for arm-eabi, but since aprofile also includes both hard and soft multilibs, that should be OK.
>> > However, I meant overriding the flags used for testing. Here is my current list:
>> >
>> > -mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd
>> > -mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
>> > -mthumb/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
>> > -mthumb/-mfloat-abi=soft/-march=armv6s-m
>> > -mthumb/-mfloat-abi=soft/-march=armv7-m
>> > -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp
>> > -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp.dp
>> > -mthumb/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp
>> > -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
>> Ah right, thanks for the list.
>> So, with these options -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp,
>> the test used to PASS but with the patch applied, it now appears UNSUPPORTED
>> because it skips the test for -mfloat-abi=hard.
>
>
> Yes, that's what I wrote above.
>
>>
>> So I guess what we want to check is if -mfloat-abi=hard is used, then
>> the target has multilib support enabled ?
>> Could you suggest how to check for that with dejagnu ?
>
>
> No, since you want to use softfp, you want to make sure that softfp is accepted by the toolchain.
> Looking at target-supports.exp, I'd suggest you try arm_softfp_ok.
That worked, thanks!
It skipped the test on armhf and passed on arm-eabi with multilibs enabled.
Is this patch OK to commit ?

Thanks,
Prathamesh
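
A rough C model of why the first attempt (the dg-skip-if on -mfloat-abi=hard) made the test UNSUPPORTED for the hard-float entries in Christophe's list above. That directive decides purely from the option string, so it skips a variant whether or not a softfp multilib is installed; arm_softfp_ok instead probes whether the toolchain actually accepts -mfloat-abi=softfp, which this sketch does not attempt to model. The helper names are hypothetical, and comparing whole '/'-separated tokens is a simplification of DejaGnu's glob-based option matching.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Test variants from the thread; options within a variant are separated by '/'. */
static const char *const variants[] = {
  "-mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd",
  "-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd",
  "-mthumb/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd",
  "-mthumb/-mfloat-abi=soft/-march=armv6s-m",
  "-mthumb/-mfloat-abi=soft/-march=armv7-m",
  "-mthumb/-mfloat-abi=hard/-march=armv7e-m+fp",
  "-mthumb/-mfloat-abi=hard/-march=armv7e-m+fp.dp",
  "-mthumb/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp",
  "-mthumb/-mfloat-abi=soft/-march=armv8.1-m.main",
};

#define NUM_VARIANTS (sizeof variants / sizeof variants[0])

/* Hypothetical helper: true if OPT appears in VARIANT as a whole
   '/'-separated option, so "-mfloat-abi=soft" does not match
   "-mfloat-abi=softfp".  */
static bool variant_has_option (const char *variant, const char *opt)
{
  size_t opt_len = strlen (opt);
  const char *p = variant;
  while (*p != '\0')
    {
      const char *end = strchr (p, '/');
      size_t len = end ? (size_t) (end - p) : strlen (p);
      if (len == opt_len && strncmp (p, opt, len) == 0)
        return true;
      if (end == NULL)
        break;
      p = end + 1;
    }
  return false;
}

/* Hypothetical helper: how many variants a "skip if OPT is present"
   rule would skip, independent of which multilibs are installed.  */
static size_t count_skipped (const char *opt)
{
  size_t count = 0;
  for (size_t i = 0; i < NUM_VARIANTS; i++)
    if (variant_has_option (variants[i], opt))
      count++;
  return count;
}
```

Under this model, five of the nine variants carry -mfloat-abi=hard and would be skipped by that directive alone, consistent with the armv7e-m+fp run turning UNSUPPORTED.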
>
> Christophe
>
>>
>> Thanks,
>> Prathamesh
>> >
>> > Christophe
>> >
>> >>
>> >> gcc -v output:
>> >> Configured with:
>> >> '/home/prathamesh.kulkarni/abe-toolchain-2/snapshots/gcc.git~master/configure'
>> >> SHELL=/bin/bash
>> >> --with-mpc=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
>> >> --with-mpfr=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
>> >> --with-gmp=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
>> >> --with-gnu-as --with-gnu-ld --disable-libmudflap --enable-lto
>> >> --enable-shared --without-included-gettext --enable-nls
>> >> --with-system-zlib --disable-sjlj-exceptions
>> >> --enable-gnu-unique-object --enable-linker-build-id
>> >> --disable-libstdcxx-pch --enable-c99 --enable-clocale=gnu
>> >> --enable-libstdcxx-debug --enable-long-long --with-cloog=no
>> >> --with-ppl=no --with-isl=no --enable-multilib
>> >> --with-multilib-list=aprofile --enable-threads=no --disable-multiarch
>> >> --with-sysroot=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu/arm-eabi
>> >> --with-newlib --enable-checking=yes --disable-bootstrap
>> >> --enable-languages=c,c++,lto
>> >> --prefix=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
>> >> --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
>> >> --target=arm-eabi
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> >
>> >> > Christophe
>> >> >
>> >> >> Thanks,
>> >> >> Prathamesh
>> >> >> >
>> >> >> > Christophe
>> >> >> >
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> Prathamesh
>> >> >> >> >
>> >> >> >> > Thanks
>> >> >> >> >
>> >> >> >> > Christophe
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> Thanks,
>> >> >> >> >> Prathamesh
>> >> >> >> >> > Thanks,
>> >> >> >> >> > Kyrill
>> >> >> >> >> >
>> >> >> >> >> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
>> >> >> >> >> > >
>> >> >> >> >> > > Thanks,
>> >> >> >> >> > > Prathamesh
>> >> >> >> >> > > >
>> >> >> >> >> > > > Thanks,
>> >> >> >> >> > > > Prathamesh
>> >> >> >> >> > > > >
>> >> >> >> >> > > > >
>> >> >> >> >> > > > > Thanks,
>> >> >> >> >> > > > >
>> >> >> >> >> > > > >
>> >> >> >> >> > > > > Christophe
>> >> >> >> >> > > > >
>> >> >> >> >> > > > >
>> >> >> >> >> > > > > > Thanks,
>> >> >> >> >> > > > > > Prathamesh
>> >> >> >> >> > > > > >>
>> >> >> >> >> > > > > >> Christophe
>> >> >> >> >> > > > > >>
>> >> >> >> >> > > > > >>
>> >> >> >> >> > > > > >>> Ok.
>> >> >> >> >> > > > > >>> Thanks,
>> >> >> >> >> > > > > >>> Kyrill
>> >> >> >> >> > > > > >>>
>> >> >> >> >> > > > > >>>
>> >> >> >> >> > > > > >>>> Thanks,
>> >> >> >> >> > > > > >>>> Prathamesh
>> >> >> >> >> > > > > >>>>> Thanks,
>> >> >> >> >> > > > > >>>>> Kyrill
>> >> >> >> >> > > > > >>>>>
>> >> >> >> >> > > > > >>>>>> Thanks,
>> >> >> >> >> > > > > >>>>>> Prathamesh
>> >> >> >> >> > > > > >>>>>>> Thanks,
>> >> >> >> >> > > > > >>>>>>> Prathamesh
>> >> >> >> >> > > > > >>>>>>>> That being said, I suggest you look at other similar patterns in
>> >> >> >> >> > > > > >>>>>>>> vec-common.md, most of which are gated on
>> >> >> >> >> > > > > >>>>>>>> ARM_HAVE_<MODE>_ARITH
>> >> >> >> >> > > > > >>>>>>>> and possibly beware of issues with iwmmxt :-)
>> >> >> >> >> > > > > >>>>>>>>
>> >> >> >> >> > > > > >>>>>>>> Christophe
>> >> >> >> >> > > > > >>>>>>>>
>> >> >> >> >> > > > > >>>>>>>>> Thanks,
>> >> >> >> >> > > > > >>>>>>>>> Prathamesh

[-- Attachment #2: pr98435-test-fix-3.txt --]
[-- Type: text/plain, Size: 564 bytes --]

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr98435.c b/gcc/testsuite/gcc.target/arm/simd/pr98435.c
index 0af8633fd56..a4c6a1c85e0 100644
--- a/gcc/testsuite/gcc.target/arm/simd/pr98435.c
+++ b/gcc/testsuite/gcc.target/arm/simd/pr98435.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ffast-math" } */
+/* { dg-require-effective-target arm_softfp_ok } */
 /* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
 /* { dg-add-options arm_v8_2a_bf16_neon } */
 /* { dg-additional-options "-mfloat-abi=softfp -march=armv8.2-a+bf16+fp16" } */

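The improvement the test checks for rests on a constructor whose elements are all the same value, like f2 in the thread, being a broadcast, which is what vdup.16 performs. The portable sketch below checks that equivalence with the GCC/Clang vector_size extension. It uses uint16_t lanes to stand in for the raw bits of bfloat16_t so it runs on any host; that substitution is an assumption, and this is not how the Arm test itself is written.

```c
#include <stdint.h>

/* Four 16-bit lanes in a 64-bit vector, via the GCC/Clang vector_size
   extension; uint16_t stands in for the raw bits of a bfloat16 value.  */
typedef uint16_t v4u16 __attribute__ ((vector_size (8)));

/* Same shape as f2 in the thread: a constructor with identical elements.  */
static v4u16 build_with_constructor (uint16_t a)
{
  return (v4u16) { a, a, a, a };
}

/* Explicit broadcast, the operation vdup.16 performs: every lane gets A.  */
static v4u16 build_with_broadcast (uint16_t a)
{
  v4u16 v = { 0, 0, 0, 0 };
  for (int i = 0; i < 4; i++)
    v[i] = a;
  return v;
}
```

With a = 0x3f80 (the bfloat16 bit pattern for 1.0), both functions produce four lanes of 0x3f80.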
* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-08-09  5:07                                             ` Prathamesh Kulkarni
@ 2021-08-09 16:19                                               ` Christophe Lyon
  2021-08-13  7:04                                                 ` Prathamesh Kulkarni
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe Lyon @ 2021-08-09 16:19 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Kyrylo Tkachov, gcc Patches

On Mon, Aug 9, 2021 at 7:07 AM Prathamesh Kulkarni <
prathamesh.kulkarni@linaro.org> wrote:

> On Fri, 6 Aug 2021 at 17:31, Christophe Lyon
> <christophe.lyon.oss@gmail.com> wrote:
> >
> >
> >
> > On Fri, Aug 6, 2021 at 11:51 AM Prathamesh Kulkarni <
> prathamesh.kulkarni@linaro.org> wrote:
> >>
> >> On Fri, 6 Aug 2021 at 14:49, Christophe Lyon
> >> <christophe.lyon.oss@gmail.com> wrote:
> >> >
> >> >
> >> >
> >> > On Fri, Aug 6, 2021 at 11:00 AM Prathamesh Kulkarni <
> prathamesh.kulkarni@linaro.org> wrote:
> >> >>
> >> >> On Thu, 5 Aug 2021 at 18:05, Christophe Lyon
> >> >> <christophe.lyon.oss@gmail.com> wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Thu, Aug 5, 2021 at 2:28 PM Prathamesh Kulkarni <
> prathamesh.kulkarni@linaro.org> wrote:
> >> >> >>
> >> >> >> On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
> >> >> >> <christophe.lyon.oss@gmail.com> wrote:
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni <
> prathamesh.kulkarni@linaro.org> wrote:
> >> >> >> >>
> >> >> >> >> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
> >> >> >> >> <christophe.lyon.oss@gmail.com> wrote:
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via
> Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >> >> >> >> >>
> >> >> >> >> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <
> Kyrylo.Tkachov@arm.com> wrote:
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> > > -----Original Message-----
> >> >> >> >> >> > > From: Prathamesh Kulkarni <
> prathamesh.kulkarni@linaro.org>
> >> >> >> >> >> > > Sent: 06 July 2021 08:06
> >> >> >> >> >> > > To: Christophe LYON <christophe.lyon@foss.st.com>
> >> >> >> >> >> > > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc
> Patches <gcc-
> >> >> >> >> >> > > patches@gcc.gnu.org>
> >> >> >> >> >> > > Subject: Re: [ARM] PR98435: Missed optimization in
> expanding vector
> >> >> >> >> >> > > constructor
> >> >> >> >> >> > >
> >> >> >> >> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
> >> >> >> >> >> > > <prathamesh.kulkarni@linaro.org> wrote:
> >> >> >> >> >> > > >
> >> >> >> >> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
> >> >> >> >> >> > > > <christophe.lyon@foss.st.com> wrote:
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> >> >> >> >> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> >> >> >> >> >> > > > > > <christophe.lyon@foss.st.com> wrote:
> >> >> >> >> >> > > > > >>
> >> >> >> >> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via
> Gcc-patches wrote:
> >> >> >> >> >> > > > > >>>> -----Original Message-----
> >> >> >> >> >> > > > > >>>> From: Prathamesh Kulkarni <
> prathamesh.kulkarni@linaro.org>
> >> >> >> >> >> > > > > >>>> Sent: 28 June 2021 09:38
> >> >> >> >> >> > > > > >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >> >> >> >> >> > > > > >>>> Cc: Christophe Lyon <
> christophe.lyon@linaro.org>; gcc Patches
> >> >> >> >> >> > > <gcc-
> >> >> >> >> >> > > > > >>>> patches@gcc.gnu.org>
> >> >> >> >> >> > > > > >>>> Subject: Re: [ARM] PR98435: Missed
> optimization in expanding
> >> >> >> >> >> > > vector
> >> >> >> >> >> > > > > >>>> constructor
> >> >> >> >> >> > > > > >>>>
> >> >> >> >> >> > > > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov
> >> >> >> >> >> > > <Kyrylo.Tkachov@arm.com>
> >> >> >> >> >> > > > > >>>> wrote:
> >> >> >> >> >> > > > > >>>>>
> >> >> >> >> >> > > > > >>>>>> -----Original Message-----
> >> >> >> >> >> > > > > >>>>>> From: Prathamesh Kulkarni <
> prathamesh.kulkarni@linaro.org>
> >> >> >> >> >> > > > > >>>>>> Sent: 14 June 2021 09:02
> >> >> >> >> >> > > > > >>>>>> To: Christophe Lyon <
> christophe.lyon@linaro.org>
> >> >> >> >> >> > > > > >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>;
> Kyrylo Tkachov
> >> >> >> >> >> > > > > >>>>>> <Kyrylo.Tkachov@arm.com>
> >> >> >> >> >> > > > > >>>>>> Subject: Re: [ARM] PR98435: Missed
> optimization in expanding
> >> >> >> >> >> > > vector
> >> >> >> >> >> > > > > >>>>>> constructor
> >> >> >> >> >> > > > > >>>>>>
> >> >> >> >> >> > > > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh
> Kulkarni
> >> >> >> >> >> > > > > >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
> >> >> >> >> >> > > > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
> >> >> >> >> >> > > > > >>>> <christophe.lyon@linaro.org>
> >> >> >> >> >> > > > > >>>>>> wrote:
> >> >> >> >> >> > > > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh
> Kulkarni via Gcc-
> >> >> >> >> >> > > patches
> >> >> >> >> >> > > > > >>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
> >> >> >> >> >> > > > > >>>>>>>>> Hi,
> >> >> >> >> >> > > > > >>>>>>>>> As mentioned in PR, for the following
> test-case:
> >> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> >> > > > > >>>>>>>>> #include <arm_neon.h>
> >> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> >> > > > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
> >> >> >> >> >> > > > > >>>>>>>>> {
> >> >> >> >> >> > > > > >>>>>>>>>     return vdup_n_bf16 (a);
> >> >> >> >> >> > > > > >>>>>>>>> }
> >> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> >> > > > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
> >> >> >> >> >> > > > > >>>>>>>>> {
> >> >> >> >> >> > > > > >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
> >> >> >> >> >> > > > > >>>>>>>>> }
> >> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> >> > > > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3
> -mfpu=neon -mfloat-
> >> >> >> >> >> > > > > >>>> abi=softfp
> >> >> >> >> >> > > > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2
> not being
> >> >> >> >> >> > > vectorized:
> >> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> >> > > > > >>>>>>>>> f1:
> >> >> >> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
> >> >> >> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> >> >> >> >> >> > > > > >>>>>>>>>           bx      lr
> >> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> >> > > > > >>>>>>>>> f2:
> >> >> >> >> >> > > > > >>>>>>>>>           mov     r3, r0  @ __bf16
> >> >> >> >> >> > > > > >>>>>>>>>           adr     r1, .L4
> >> >> >> >> >> > > > > >>>>>>>>>           ldrd    r0, [r1]
> >> >> >> >> >> > > > > >>>>>>>>>           mov     r2, r3  @ __bf16
> >> >> >> >> >> > > > > >>>>>>>>>           mov     ip, r3  @ __bf16
> >> >> >> >> >> > > > > >>>>>>>>>           bfi     r1, r2, #0, #16
> >> >> >> >> >> > > > > >>>>>>>>>           bfi     r0, ip, #0, #16
> >> >> >> >> >> > > > > >>>>>>>>>           bfi     r1, r3, #16, #16
> >> >> >> >> >> > > > > >>>>>>>>>           bfi     r0, r2, #16, #16
> >> >> >> >> >> > > > > >>>>>>>>>           bx      lr
> >> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> >> > > > > >>>>>>>>> This seems to happen because vec_init
> pattern in neon.md
> >> >> >> >> >> > > has VDQ
> >> >> >> >> >> > > > > >>>>>> mode
> >> >> >> >> >> > > > > >>>>>>>>> iterator, which doesn't include V4BF. In
> attached patch, I
> >> >> >> >> >> > > changed
> >> >> >> >> >> > > > > >>>>>>>>> mode
> >> >> >> >> >> > > > > >>>>>>>>> to VDQX which seems to work for the
> test-case, and the
> >> >> >> >> >> > > compiler
> >> >> >> >> >> > > > > >>>> now
> >> >> >> >> >> > > > > >>>>>> generates:
> >> >> >> >> >> > > > > >>>>>>>>> f2:
> >> >> >> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
> >> >> >> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> >> >> >> >> >> > > > > >>>>>>>>>           bx      lr
> >> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> >> > > > > >>>>>>>>> However, the pattern is also gated on
> TARGET_HAVE_MVE
> >> >> >> >> >> > > and I am
> >> >> >> >> >> > > > > >>>>>> not
> >> >> >> >> >> > > > > >>>>>>>>> sure if either VDQ or VDQX are correct
> modes for MVE since
> >> >> >> >> >> > > MVE
> >> >> >> >> >> > > > > >>>> has
> >> >> >> >> >> > > > > >>>>>>>>> only 128-bit vectors ?
> >> >> >> >> >> > > > > >>>>>>>>>
> >> >> >> >> >> > > > > >>>>>>>> I think patterns common to both Neon and
> MVE should be
> >> >> >> >> >> > > moved to
> >> >> >> >> >> > > > > >>>>>>>> vec-common.md, I don't know why such
> patterns were left in
> >> >> >> >> >> > > > > >>>> neon.md.
> >> >> >> >> >> > > > > >>>>>>> Since we end up calling
> neon_expand_vector_init for both
> >> >> >> >> >> > > NEON and
> >> >> >> >> >> > > > > >>>> MVE,
> >> >> >> >> >> > > > > >>>>>>> I am not sure if we should separate the
> pattern ?
> >> >> >> >> >> > > > > >>>>>>> Would it make sense to FAIL if the mode
> size isn't 16 bytes for
> >> >> >> >> >> > > MVE as
> >> >> >> >> >> > > > > >>>>>>> in attached patch so
> >> >> >> >> >> > > > > >>>>>>> it will call neon_expand_vector_init only
> for 128-bit vectors ?
> >> >> >> >> >> > > > > >>>>>>> Although hard-coding 16 in the pattern doesn't seem a good idea to me either.
> >> >> >> >> >> > > > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> >> >> >> >> >> > > > > >>>>>> (attaching patch as text).
> >> >> >> >> >> > > > > >>>>>>
> >> >> >> >> >> > > > > >>>>> --- a/gcc/config/arm/neon.md
> >> >> >> >> >> > > > > >>>>> +++ b/gcc/config/arm/neon.md
> >> >> >> >> >> > > > > >>>>> @@ -459,10 +459,12 @@
> >> >> >> >> >> > > > > >>>>>    )
> >> >> >> >> >> > > > > >>>>>
> >> >> >> >> >> > > > > >>>>>    (define_expand "vec_init<mode><V_elem_l>"
> >> >> >> >> >> > > > > >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
> >> >> >> >> >> > > > > >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
> >> >> >> >> >> > > > > >>>>>       (match_operand 1 "" "")]
> >> >> >> >> >> > > > > >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
> >> >> >> >> >> > > > > >>>>>    {
> >> >> >> >> >> > > > > >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> >> >> >> >> >> > > > > >>>>> +    FAIL;
> >> >> >> >> >> > > > > >>>>>      neon_expand_vector_init (operands[0], operands[1]);
> >> >> >> >> >> > > > > >>>>>      DONE;
> >> >> >> >> >> > > > > >>>>>    })
> >> >> >> >> >> > > > > >>>>>
> >> >> >> >> >> > > > > >>>>> I think we should move this to vec-common.md
> like Christophe
> >> >> >> >> >> > > said.
> >> >> >> >> >> > > > > >>>>> Perhaps rather than making it FAIL for non-16
> MVE sizes we just
> >> >> >> >> >> > > disable it in
> >> >> >> >> >> > > > > >>>> the expander condition?
> >> >> >> >> >> > > > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
> >> >> >> >> >> > > > > >>>> Is it OK to use <MODE>mode ? Because using
> <VDQ>mode resulted
> >> >> >> >> >> > > in lot
> >> >> >> >> >> > > > > >>>> of build errors.
> >> >> >> >> >> > > > > >>>> Also, I think the comparison should be
> inverted, ie, GET_MODE_SIZE
> >> >> >> >> >> > > > > >>>> (<MODE>mode) == 16 since
> >> >> >> >> >> > > > > >>>> we want to make the pattern pass if target is
> MVE and vector size is
> >> >> >> >> >> > > 16 bytes ?
> >> >> >> >> >> > > > > >>>> Do these changes in attached patch look OK ?
> >> >> >> >> >> > > > > >>> Yes, you're right.
> >> >> >> >> >> > > > > >>
> >> >> >> >> >> > > > > >> Can't this be ARM_HAVE_<MODE>_ARITH like in most
> expanders in
> >> >> >> >> >> > > vec-common.md?
> >> >> >> >> >> > > > > >>
> >> >> >> >> >> > > > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
> >> >> >> >> >> > > > > > I wonder if this should be ARM_HAVE_<MODE>_LDST
> instead since
> >> >> >> >> >> > > we're
> >> >> >> >> >> > > > > > initializing the vector ?
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > > Well, it really depends on which modes you want to
> enable.
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > > Looks like your move VDQ -> VDQX adds V4BF, V8BF and DI.
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > > Are they all OK for Neon?
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > > They are not OK for MVE.
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > > Ideally you could add testcases to cover the supported and
> >> >> >> >> >> > > > > unsupported modes for both Neon and MVE.
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > > Before your patch, the expander is enabled for MVE
> for 64 bit modes
> >> >> >> >> >> > > > > (V8QI, V4HI, V2SI): what happens in this case? Does
> the compiler crash
> >> >> >> >> >> > > > > or is there something else preventing the match?
> >> >> >> >> >> > > > Hi,
> >> >> >> >> >> > > > Apparently there is VALID_MVE_MODE macro, so is it
> better to use:
> >> >> >> >> >> > > > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(<MODE>mode))
> >> >> >> >> >> > > > as in the attached patch ?
> >> >> >> >> >> >
> >> >> >> >> >> > The change is ok. I would like to see some testcases like
> Christophe suggested, but this patch just moves the expander around rather
> than introducing new functionality.
> >> >> >> >> >> Hi Kyrill,
> >> >> >> >> >> As mentioned in the first email, the patch improves
> code-gen for
> >> >> >> >> >> following test-case:
> >> >> >> >> >>
> >> >> >> >> >> bfloat16x4_t f (bfloat16_t a)
> >> >> >> >> >> {
> >> >> >> >> >>   return (bfloat16x4_t) {a, a, a, a};
> >> >> >> >> >> }
> >> >> >> >> >>
> >> >> >> >> >> Before patch:
> >> >> >> >> >> f:
> >> >> >> >> >>         mov     r3, r0  @ __bf16
> >> >> >> >> >>         adr     r1, .L4
> >> >> >> >> >>         ldrd    r0, [r1]
> >> >> >> >> >>         mov     r2, r3  @ __bf16
> >> >> >> >> >>         mov     ip, r3  @ __bf16
> >> >> >> >> >>         bfi     r1, r2, #0, #16
> >> >> >> >> >>         bfi     r0, ip, #0, #16
> >> >> >> >> >>         bfi     r1, r3, #16, #16
> >> >> >> >> >>         bfi     r0, r2, #16, #16
> >> >> >> >> >>         bx      lr
> >> >> >> >> >>
> >> >> >> >> >> After patch:
> >> >> >> >> >> f:
> >> >> >> >> >>         vdup.16 d16, r0
> >> >> >> >> >>         vmov    r0, r1, d16  @ v4bf
> >> >> >> >> >>         bx      lr
> >> >> >> >> >>
> >> >> >> >> >> because the patch changes mode from VDQ to VDQX to
> accommodate bf modes.
> >> >> >> >> >> I have included the test in the attached patch.
> >> >> >> >> >> I think Christophe's concerns were mainly about the right
> modes
> >> >> >> >> >> getting enabled for MVE.
> >> >> >> >> >> Unfortunately, I am not sure how to test for that because
> the FE
> >> >> >> >> >> catches invalid modes, and we don't
> >> >> >> >> >> end up hitting the pattern.
> >> >> >> >> >>
> >> >> >> >> >
> >> >> >> >> > Hi Prathamesh,
> >> >> >> >> >
> >> >> >> >> > The new testcase fails on arm-linux-gnueabihf:
> >> >> >> >> >  FAIL: gcc.target/arm/simd/pr98435.c (test for excess errors)
> >> >> >> >> > Excess errors:
> >> >> >> >> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-soft.h: No such file or directory
> >> >> >> >> > compilation terminated.
> >> >> >> >> >
> >> >> >> >> > Because you don't check whether -mfloat-abi=softfp is actually supported.
> >> >> >> >> >
> >> >> >> >> > Can you fix that?
> >> >> >> >> Oops, sorry about that.
> >> >> >> >> The attached patch fixes the test by requiring arm_softfloat and makes
> >> >> >> >> it UNSUPPORTED on arm-linux-gnueabihf.
> >> >> >> >> Does it look OK ?
> >> >> >> >>
> >> >> >> >
> >> >> >> > I don't think that's right: it would make the test unsupported if softfp is not the default even if the toolchain has the needed multilibs.
> >> >> >> > Did you check eg. with arm-eabi and multilibs enabled?
> >> >> >> Ah OK, thanks for pointing it out!
> >> >> >> Does the attached patch look correct ?
> >> >> >>
> >> >> >
> >> >> > I don't think so: this would skip the test even if the toolchain has multilibs enabled.
> >> >> > Did you check eg. with arm-eabi and multilibs enabled and the usual option overrides?
> >> >> It showed 3 PASS with second patch:
> >> >> /* { dg-skip-if "skip test for hard float" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
> >> >>
> >> >> I ran it using make check-gcc RUNTESTFLAGS="simd.exp=pr98435.c"
> >> >> and built the toolchain using:
> >> >> abe.sh --target arm-eabi --build all --set multilib=aprofile gcc=gcc.git~master.
> >> >> I suppose that's correct ?
> >> >
> >> >
> >> > I use rmprofile for arm-eabi, but since aprofile also includes both hard and soft multilibs, that should be OK.
> >> > However, I meant overriding the flags used for testing. Here is my current list:
> >> >
> >> > -mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd
> >> > -mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> >> > -mthumb/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> >> > -mthumb/-mfloat-abi=soft/-march=armv6s-m
> >> > -mthumb/-mfloat-abi=soft/-march=armv7-m
> >> > -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp
> >> > -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp.dp
> >> > -mthumb/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp
> >> > -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
> >> Ah right, thanks for the list.
> >> So, with these options -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp,
> >> the test used to PASS but with the patch applied, it now appears UNSUPPORTED
> >> because it skips the test for -mfloat-abi=hard.
> >
> >
> > Yes, that's what I wrote above.
> >
> >>
> >> So I guess what we want to check is if -mfloat-abi=hard is used, then
> >> the target has multilib support enabled ?
> >> Could you suggest how to check for that with dejagnu ?
> >
> >
> > No, since you want to use softfp, you want to make sure that softfp is accepted by the toolchain.
> > Looking at target-supports.exp, I'd suggest you try arm_softfp_ok.
> That worked, thanks!
> It skipped the test on armhf and passed on arm-eabi with multilibs enabled.
> Is this patch OK to commit ?
>
>
LGTM :-)


> Thanks,
> Prathamesh
> >
> > Christophe
> >
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> > Christophe
> >> >
> >> >>
> >> >> gcc -v output:
> >> >> Configured with:
> >> >> '/home/prathamesh.kulkarni/abe-toolchain-2/snapshots/gcc.git~master/configure'
> >> >> SHELL=/bin/bash
> >> >> --with-mpc=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
> >> >> --with-mpfr=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
> >> >> --with-gmp=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
> >> >> --with-gnu-as --with-gnu-ld --disable-libmudflap --enable-lto
> >> >> --enable-shared --without-included-gettext --enable-nls
> >> >> --with-system-zlib --disable-sjlj-exceptions
> >> >> --enable-gnu-unique-object --enable-linker-build-id
> >> >> --disable-libstdcxx-pch --enable-c99 --enable-clocale=gnu
> >> >> --enable-libstdcxx-debug --enable-long-long --with-cloog=no
> >> >> --with-ppl=no --with-isl=no --enable-multilib
> >> >> --with-multilib-list=aprofile --enable-threads=no --disable-multiarch
> >> >> --with-sysroot=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu/arm-eabi
> >> >> --with-newlib --enable-checking=yes --disable-bootstrap
> >> >> --enable-languages=c,c++,lto
> >> >> --prefix=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
> >> >> --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
> >> >> --target=arm-eabi
> >> >>
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> >
> >> >> > Christophe
> >> >> >
> >> >> >> Thanks,
> >> >> >> Prathamesh
> >> >> >> >
> >> >> >> > Christophe
> >> >> >> >
> >> >> >> >>
> >> >> >> >> Thanks,
> >> >> >> >> Prathamesh
> >> >> >> >> >
> >> >> >> >> > Thanks
> >> >> >> >> >
> >> >> >> >> > Christophe
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >>
> >> >> >> >> >> Thanks,
> >> >> >> >> >> Prathamesh
> >> >> >> >> >> > Thanks,
> >> >> >> >> >> > Kyrill
> >> >> >> >> >> >
> >> >> >> >> >> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
> >> >> >> >> >> > >
> >> >> >> >> >> > > Thanks,
> >> >> >> >> >> > > Prathamesh
> >> >> >> >> >> > > >
> >> >> >> >> >> > > > Thanks,
> >> >> >> >> >> > > > Prathamesh
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > > Thanks,
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > > Christophe
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > >
> >> >> >> >> >> > > > > > Thanks,
> >> >> >> >> >> > > > > > Prathamesh
> >> >> >> >> >> > > > > >>
> >> >> >> >> >> > > > > >> Christophe
> >> >> >> >> >> > > > > >>
> >> >> >> >> >> > > > > >>
> >> >> >> >> >> > > > > >>> Ok.
> >> >> >> >> >> > > > > >>> Thanks,
> >> >> >> >> >> > > > > >>> Kyrill
> >> >> >> >> >> > > > > >>>
> >> >> >> >> >> > > > > >>>
> >> >> >> >> >> > > > > >>>> Thanks,
> >> >> >> >> >> > > > > >>>> Prathamesh
> >> >> >> >> >> > > > > >>>>> Thanks,
> >> >> >> >> >> > > > > >>>>> Kyrill
> >> >> >> >> >> > > > > >>>>>
> >> >> >> >> >> > > > > >>>>>> Thanks,
> >> >> >> >> >> > > > > >>>>>> Prathamesh
> >> >> >> >> >> > > > > >>>>>>> Thanks,
> >> >> >> >> >> > > > > >>>>>>> Prathamesh
> >> >> >> >> >> > > > > >>>>>>>> That being said, I suggest you look at other similar patterns in
> >> >> >> >> >> > > > > >>>>>>>> vec-common.md, most of which are gated on
> >> >> >> >> >> > > > > >>>>>>>> ARM_HAVE_<MODE>_ARITH
> >> >> >> >> >> > > > > >>>>>>>> and possibly beware of issues with iwmmxt :-)
> >> >> >> >> >> > > > > >>>>>>>>
> >> >> >> >> >> > > > > >>>>>>>> Christophe
> >> >> >> >> >> > > > > >>>>>>>>
> >> >> >> >> >> > > > > >>>>>>>>> Thanks,
> >> >> >> >> >> > > > > >>>>>>>>> Prathamesh
>


* Re: [ARM] PR98435: Missed optimization in expanding vector constructor
  2021-08-09 16:19                                               ` Christophe Lyon
@ 2021-08-13  7:04                                                 ` Prathamesh Kulkarni
  0 siblings, 0 replies; 29+ messages in thread
From: Prathamesh Kulkarni @ 2021-08-13  7:04 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Kyrylo Tkachov, gcc Patches

On Mon, 9 Aug 2021 at 21:50, Christophe Lyon
<christophe.lyon.oss@gmail.com> wrote:
>
>
>
> On Mon, Aug 9, 2021 at 7:07 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>>
>> On Fri, 6 Aug 2021 at 17:31, Christophe Lyon
>> <christophe.lyon.oss@gmail.com> wrote:
>> >
>> >
>> >
>> > On Fri, Aug 6, 2021 at 11:51 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>> >>
>> >> On Fri, 6 Aug 2021 at 14:49, Christophe Lyon
>> >> <christophe.lyon.oss@gmail.com> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Fri, Aug 6, 2021 at 11:00 AM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>> >> >>
>> >> >> On Thu, 5 Aug 2021 at 18:05, Christophe Lyon
>> >> >> <christophe.lyon.oss@gmail.com> wrote:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Thu, Aug 5, 2021 at 2:28 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >>
>> >> >> >> On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
>> >> >> >> <christophe.lyon.oss@gmail.com> wrote:
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >> >>
>> >> >> >> >> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
>> >> >> >> >> <christophe.lyon.oss@gmail.com> wrote:
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> > > -----Original Message-----
>> >> >> >> >> >> > > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> >> >> >> >> >> > > Sent: 06 July 2021 08:06
>> >> >> >> >> >> > > To: Christophe LYON <christophe.lyon@foss.st.com>
>> >> >> >> >> >> > > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc Patches <gcc-patches@gcc.gnu.org>
>> >> >> >> >> >> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>> >> >> >> >> >> > >
>> >> >> >> >> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
>> >> >> >> >> >> > > <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >> >> >> > > >
>> >> >> >> >> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
>> >> >> >> >> >> > > > <christophe.lyon@foss.st.com> wrote:
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
>> >> >> >> >> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
>> >> >> >> >> >> > > > > > <christophe.lyon@foss.st.com> wrote:
>> >> >> >> >> >> > > > > >>
>> >> >> >> >> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
>> >> >> >> >> >> > > > > >>>> -----Original Message-----
>> >> >> >> >> >> > > > > >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> >> >> >> >> >> > > > > >>>> Sent: 28 June 2021 09:38
>> >> >> >> >> >> > > > > >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> >> >> >> >> >> > > > > >>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches <gcc-patches@gcc.gnu.org>
>> >> >> >> >> >> > > > > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>> >> >> >> >> >> > > > > >>>>
>> >> >> >> >> >> > > > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>> >> >> >> >> >> > > > > >>>>>
>> >> >> >> >> >> > > > > >>>>>> -----Original Message-----
>> >> >> >> >> >> > > > > >>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> >> >> >> >> >> > > > > >>>>>> Sent: 14 June 2021 09:02
>> >> >> >> >> >> > > > > >>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
>> >> >> >> >> >> > > > > >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> >> >> >> >> >> > > > > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
>> >> >> >> >> >> > > > > >>>>>>
>> >> >> >> >> >> > > > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
>> >> >> >> >> >> > > > > >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >> >> >> > > > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>> >> >> >> >> >> > > > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
>> >> >> >> >> >> > > > > >>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>> >> >> >> >> >> > > > > >>>>>>>>> Hi,
>> >> >> >> >> >> > > > > >>>>>>>>> As mentioned in PR, for the following test-case:
>> >> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> >> > > > > >>>>>>>>> #include <arm_neon.h>
>> >> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> >> > > > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
>> >> >> >> >> >> > > > > >>>>>>>>> {
>> >> >> >> >> >> > > > > >>>>>>>>>     return vdup_n_bf16 (a);
>> >> >> >> >> >> > > > > >>>>>>>>> }
>> >> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> >> > > > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
>> >> >> >> >> >> > > > > >>>>>>>>> {
>> >> >> >> >> >> > > > > >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
>> >> >> >> >> >> > > > > >>>>>>>>> }
>> >> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> >> > > > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
>> >> >> >> >> >> > > > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
>> >> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> >> > > > > >>>>>>>>> f1:
>> >> >> >> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
>> >> >> >> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
>> >> >> >> >> >> > > > > >>>>>>>>>           bx      lr
>> >> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> >> > > > > >>>>>>>>> f2:
>> >> >> >> >> >> > > > > >>>>>>>>>           mov     r3, r0  @ __bf16
>> >> >> >> >> >> > > > > >>>>>>>>>           adr     r1, .L4
>> >> >> >> >> >> > > > > >>>>>>>>>           ldrd    r0, [r1]
>> >> >> >> >> >> > > > > >>>>>>>>>           mov     r2, r3  @ __bf16
>> >> >> >> >> >> > > > > >>>>>>>>>           mov     ip, r3  @ __bf16
>> >> >> >> >> >> > > > > >>>>>>>>>           bfi     r1, r2, #0, #16
>> >> >> >> >> >> > > > > >>>>>>>>>           bfi     r0, ip, #0, #16
>> >> >> >> >> >> > > > > >>>>>>>>>           bfi     r1, r3, #16, #16
>> >> >> >> >> >> > > > > >>>>>>>>>           bfi     r0, r2, #16, #16
>> >> >> >> >> >> > > > > >>>>>>>>>           bx      lr
>> >> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> >> > > > > >>>>>>>>> This seems to happen because vec_init pattern in neon.md has VDQ mode
>> >> >> >> >> >> > > > > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I changed mode
>> >> >> >> >> >> > > > > >>>>>>>>> to VDQX which seems to work for the test-case, and the compiler now generates:
>> >> >> >> >> >> > > > > >>>>>>>>> f2:
>> >> >> >> >> >> > > > > >>>>>>>>>           vdup.16 d16, r0
>> >> >> >> >> >> > > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
>> >> >> >> >> >> > > > > >>>>>>>>>           bx      lr
>> >> >> >> >> >> > > > > >>>>>>>>>
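A host-side C model may help when comparing the two f2 listings. It assumes (as with softfp) that the bfloat16 bits arrive in the low half of r0 and that the 64-bit vector is returned in r0 (low word) and r1 (high word); the helper names are hypothetical, and `bfi` only mirrors the ARM BFI semantics (insert the low `width` bits of the source at bit `lsb`). What it illustrates is that the four `bfi` instructions overwrite every 16-bit half of whatever `ldrd` loaded from the .L4 literal pool, so the scalar sequence computes the same value as `vdup.16` followed by `vmov`, only with more instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ARM "BFI dst, src, #lsb, #width": copy the low `width` bits of
   src into dst at bit position lsb, leaving the other bits of dst intact. */
static uint32_t bfi(uint32_t dst, uint32_t src, unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb + width <= 32);   /* BFI operand constraints */
    uint32_t field = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t mask = field << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

/* Model of "vdup.16 d16, r0; vmov r0, r1, d16": every 16-bit lane of d16
   holds the low half of r0, and d16 is split back into r0 (low) / r1 (high). */
static void model_vdup16(uint32_t a, uint32_t *r0, uint32_t *r1)
{
    uint32_t h = a & 0xFFFFu;
    *r0 = h | (h << 16);
    *r1 = h | (h << 16);
}

/* Model of the pre-patch scalar sequence.  pool_lo/pool_hi stand in for the
   (unshown) literal-pool constant at .L4 that "ldrd r0, [r1]" loads. */
static void model_scalar(uint32_t a, uint32_t pool_lo, uint32_t pool_hi,
                         uint32_t *r0, uint32_t *r1)
{
    uint32_t r3 = a;            /* mov  r3, r0                 */
    uint32_t lo = pool_lo;      /* ldrd r0, [r1]: low word     */
    uint32_t hi = pool_hi;      /*                high word    */
    uint32_t r2 = r3, ip = r3;  /* mov  r2, r3 / mov ip, r3    */
    hi = bfi(hi, r2, 0, 16);    /* bfi  r1, r2, #0, #16        */
    lo = bfi(lo, ip, 0, 16);    /* bfi  r0, ip, #0, #16        */
    hi = bfi(hi, r3, 16, 16);   /* bfi  r1, r3, #16, #16       */
    lo = bfi(lo, r2, 16, 16);   /* bfi  r0, r2, #16, #16       */
    *r0 = lo;
    *r1 = hi;
}
```

For example, with a = 0x3f80 (the bfloat16 encoding of 1.0) both models leave 0x3f803f80 in each of r0 and r1, whatever the pool contents are.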
>> >> >> >> >> >> > > > > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE and I am not
>> >> >> >> >> >> > > > > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since MVE has
>> >> >> >> >> >> > > > > >>>>>>>>> only 128-bit vectors ?
>> >> >> >> >> >> > > > > >>>>>>>>>
>> >> >> >> >> >> > > > > >>>>>>>> I think patterns common to both Neon and MVE should be moved to
>> >> >> >> >> >> > > > > >>>>>>>> vec-common.md, I don't know why such patterns were left in neon.md.
>> >> >> >> >> >> > > > > >>>>>>> Since we end up calling neon_expand_vector_init for both NEON and MVE,
>> >> >> >> >> >> > > > > >>>>>>> I am not sure if we should separate the pattern?
>> >> >> >> >> >> > > > > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for MVE, as in
>> >> >> >> >> >> > > > > >>>>>>> the attached patch, so that it calls neon_expand_vector_init only for 128-bit vectors?
>> >> >> >> >> >> > > > > >>>>>>> Although hard-coding 16 in the pattern doesn't seem a good idea to me either.
>> >> >> >> >> >> > > > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
>> >> >> >> >> >> > > > > >>>>>> (attaching patch as text).
>> >> >> >> >> >> > > > > >>>>>>
>> >> >> >> >> >> > > > > >>>>> --- a/gcc/config/arm/neon.md
>> >> >> >> >> >> > > > > >>>>> +++ b/gcc/config/arm/neon.md
>> >> >> >> >> >> > > > > >>>>> @@ -459,10 +459,12 @@
>> >> >> >> >> >> > > > > >>>>>    )
>> >> >> >> >> >> > > > > >>>>>
>> >> >> >> >> >> > > > > >>>>>    (define_expand "vec_init<mode><V_elem_l>"
>> >> >> >> >> >> > > > > >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
>> >> >> >> >> >> > > > > >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
>> >> >> >> >> >> > > > > >>>>>       (match_operand 1 "" "")]
>> >> >> >> >> >> > > > > >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
>> >> >> >> >> >> > > > > >>>>>    {
>> >> >> >> >> >> > > > > >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
>> >> >> >> >> >> > > > > >>>>> +    FAIL;
>> >> >> >> >> >> > > > > >>>>>      neon_expand_vector_init (operands[0], operands[1]);
>> >> >> >> >> >> > > > > >>>>>      DONE;
>> >> >> >> >> >> > > > > >>>>>    })
>> >> >> >> >> >> > > > > >>>>>
>> >> >> >> >> >> > > > > >>>>> I think we should move this to vec-common.md like Christophe said.
>> >> >> >> >> >> > > > > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in the expander condition?
>> >> >> >> >> >> > > > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
>> >> >> >> >> >> > > > > >>>> Is it OK to use <MODE>mode? Using <VDQ>mode resulted in a lot of build errors.
>> >> >> >> >> >> > > > > >>>> Also, I think the comparison should be inverted, i.e., GET_MODE_SIZE (<MODE>mode) == 16,
>> >> >> >> >> >> > > > > >>>> since we want the pattern to be enabled if the target is MVE and the vector size is 16 bytes?
>> >> >> >> >> >> > > > > >>>> Do these changes in the attached patch look OK?
>> >> >> >> >> >> > > > > >>> Yes, you're right.
>> >> >> >> >> >> > > > > >>
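The comparison direction in the expander condition is easy to get backwards. As a sketch under stated assumptions (plain booleans stand in for TARGET_NEON and TARGET_HAVE_MVE, and `mode_size` for GET_MODE_SIZE (<MODE>mode) in bytes; this is a hypothetical model, not GCC source), the following C functions contrast the first-draft `!= 16` test with the inverted `== 16` test. Only the inverted form keeps MVE, which has only 128-bit vectors, away from the 64-bit modes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC source: target_neon / target_mve stand in for
   TARGET_NEON / TARGET_HAVE_MVE, mode_size for GET_MODE_SIZE (<MODE>mode). */

/* The modes in the VDQ/VDQX iterators are 64-bit (8 bytes) or 128-bit (16). */
static void check_mode_size(unsigned mode_size)
{
    assert(mode_size == 8 || mode_size == 16);
}

/* First suggestion: enables MVE for every size *except* 16 bytes. */
static bool cond_first_draft(bool target_neon, bool target_mve,
                             unsigned mode_size)
{
    check_mode_size(mode_size);
    return target_neon || (target_mve && mode_size != 16);
}

/* Inverted test: MVE only gets the 128-bit (16-byte) modes. */
static bool cond_inverted(bool target_neon, bool target_mve,
                          unsigned mode_size)
{
    check_mode_size(mode_size);
    return target_neon || (target_mve && mode_size == 16);
}
```

With `target_mve` set and `mode_size` equal to 8, the first draft returns true and the inverted form returns false; for 16-byte modes the results swap, while Neon is unaffected either way. Later in the thread the raw size test was replaced by VALID_MVE_MODE (<MODE>mode).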
>> >> >> >> >> >> > > > > >> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in vec-common.md?
>> >> >> >> >> >> > > > > >>
>> >> >> >> >> >> > > > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
>> >> >> >> >> >> > > > > > I wonder if this should be ARM_HAVE_<MODE>_LDST instead, since we're
>> >> >> >> >> >> > > > > > initializing the vector?
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > > Well, it really depends on which modes you want to enable.
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > > Looks like your move VDQ -> VDQX adds V4BF, V8BF and DI.
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > > Are they all OK for Neon?
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > > They are not OK for MVE.
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > > Ideally you could add testcases to cover the supported and
>> >> >> >> >> >> > > > > unsupported modes for both Neon and MVE.
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > > Before your patch, the expander is enabled for MVE for 64 bit modes
>> >> >> >> >> >> > > > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash
>> >> >> >> >> >> > > > > or is there something else preventing the match?
>> >> >> >> >> >> > > > Hi,
>> >> >> >> >> >> > > > Apparently there is a VALID_MVE_MODE macro, so is it better to use:
>> >> >> >> >> >> > > > TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE(<MODE>mode))
>> >> >> >> >> >> > > > as in the attached patch?
>> >> >> >> >> >> >
>> >> >> >> >> >> > The change is ok. I would like to see some testcases like Christophe suggested, but this patch just moves the expander around rather than introducing new functionality.
>> >> >> >> >> >> Hi Kyrill,
>> >> >> >> >> >> As mentioned in the first email, the patch improves code-gen for
>> >> >> >> >> >> following test-case:
>> >> >> >> >> >>
>> >> >> >> >> >> bfloat16x4_t f (bfloat16_t a)
>> >> >> >> >> >> {
>> >> >> >> >> >>   return (bfloat16x4_t) {a, a, a, a};
>> >> >> >> >> >> }
>> >> >> >> >> >>
>> >> >> >> >> >> Before patch:
>> >> >> >> >> >> f:
>> >> >> >> >> >>         mov     r3, r0  @ __bf16
>> >> >> >> >> >>         adr     r1, .L4
>> >> >> >> >> >>         ldrd    r0, [r1]
>> >> >> >> >> >>         mov     r2, r3  @ __bf16
>> >> >> >> >> >>         mov     ip, r3  @ __bf16
>> >> >> >> >> >>         bfi     r1, r2, #0, #16
>> >> >> >> >> >>         bfi     r0, ip, #0, #16
>> >> >> >> >> >>         bfi     r1, r3, #16, #16
>> >> >> >> >> >>         bfi     r0, r2, #16, #16
>> >> >> >> >> >>         bx      lr
>> >> >> >> >> >>
>> >> >> >> >> >> After patch:
>> >> >> >> >> >> f:
>> >> >> >> >> >>         vdup.16 d16, r0
>> >> >> >> >> >>         vmov    r0, r1, d16  @ v4bf
>> >> >> >> >> >>         bx      lr
>> >> >> >> >> >>
>> >> >> >> >> >> because the patch changes mode from VDQ to VDQX to accommodate bf modes.
>> >> >> >> >> >> I have included the test in the attached patch.
>> >> >> >> >> >> I think Christophe's concerns were mainly about the right modes
>> >> >> >> >> >> getting enabled for MVE.
>> >> >> >> >> >> Unfortunately, I am not sure how to test for that because the FE
>> >> >> >> >> >> catches invalid modes, and we don't
>> >> >> >> >> >> end up hitting the pattern.
>> >> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> > Hi Prathamesh,
>> >> >> >> >> >
>> >> >> >> >> > The new testcase fails on arm-linux-gnueabihf:
>> >> >> >> >> >  FAIL: gcc.target/arm/simd/pr98435.c (test for excess errors)
>> >> >> >> >> > Excess errors:
>> >> >> >> >> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc/sysroot-arm-none-linux-gnueabihf/usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-soft.h: No such file or directory
>> >> >> >> >> > compilation terminated.
>> >> >> >> >> >
>> >> >> >> >> > Because you don't check whether -mfloat-abi=softfp is actually supported.
>> >> >> >> >> >
>> >> >> >> >> > Can you fix that?
>> >> >> >> >> Oops, sorry about that.
>> >> >> >> >> The attached patch fixes the test by requiring arm_softfloat and makes
>> >> >> >> >> it UNSUPPORTED on arm-linux-gnueabihf.
>> >> >> >> >> Does it look OK ?
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> > I don't think that's right: it would make the test unsupported if softfp is not the default even if the toolchain has the needed multilibs.
>> >> >> >> > Did you check eg. with arm-eabi and multilibs enabled?
>> >> >> >> Ah OK, thanks for pointing it out!
>> >> >> >> Does the attached patch look correct ?
>> >> >> >>
>> >> >> >
>> >> >> > I don't think so: this would skip the test even if the toolchain has multilibs enabled.
>> >> >> > Did you check eg. with arm-eabi and multilibs enabled and the usual option overrides?
>> >> >> It showed 3 PASS with second patch:
>> >> >> /* { dg-skip-if "skip test for hard float" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
>> >> >>
>> >> >> I ran it using make check-gcc RUNTESTFLAGS="simd.exp=pr98435.c"
>> >> >> and built the toolchain using:
>> >> >> abe.sh --target arm-eabi --build all --set multilib=aprofile gcc=gcc.git~master.
>> >> >> I suppose that's correct ?
>> >> >
>> >> >
>> >> > I use rmprofile for arm-eabi, but since aprofile also includes both hard and soft multilibs, that should be OK.
>> >> > However, I meant overriding the flags used for testing. Here is my current list:
>> >> >
>> >> > -mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd
>> >> > -mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
>> >> > -mthumb/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
>> >> > -mthumb/-mfloat-abi=soft/-march=armv6s-m
>> >> > -mthumb/-mfloat-abi=soft/-march=armv7-m
>> >> > -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp
>> >> > -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp.dp
>> >> > -mthumb/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp
>> >> > -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
>> >> Ah right, thanks for the list.
>> >> So, with these options -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp,
>> >> the test used to PASS but with the patch applied, it now appears UNSUPPORTED
>> >> because it skips the test for -mfloat-abi=hard.
>> >
>> >
>> > Yes, that's what I wrote above.
>> >
>> >>
>> >> So I guess what we want to check is if -mfloat-abi=hard is used, then
>> >> the target has multilib support enabled ?
>> >> Could you suggest how to check for that with dejagnu ?
>> >
>> >
>> > No, since you want to use softfp, you want to make sure that softfp is accepted by the toolchain.
>> > Looking at target-supports.exp, I'd suggest you try arm_softfp_ok.
>> That worked, thanks!
>> It skipped the test on armhf and passed on arm-eabi with multilibs enabled.
>> Is this patch OK to commit ?
>>
>
> LGTM :-)
Thanks. Since the patch was a one-line fix to a test case, I committed
it as e37ddb91a83335e58f16ef9ce9080668ad6ad47f under the "obvious" rule.
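The committed test change itself is not quoted in this thread. As a hedged sketch only, a compile-only test gated the way Christophe suggested might start like the following. The option string is the one from the original report and `arm_softfp_ok` is the effective-target keyword from target-supports.exp named in the thread; the `scan-assembler-times` pattern is an illustrative assumption, not the contents of gcc.target/arm/simd/pr98435.c. It only builds for Arm targets, so it is a DejaGnu fragment rather than a host program.

```c
/* Hypothetical sketch, not the committed pr98435.c.  */
/* { dg-do compile } */
/* Mark the test UNSUPPORTED when the toolchain cannot use
   -mfloat-abi=softfp (e.g. a hard-float-only arm-linux-gnueabihf sysroot
   that lacks gnu/stubs-soft.h).  */
/* { dg-require-effective-target arm_softfp_ok } */
/* { dg-options "-O3 -mfpu=neon -mfloat-abi=softfp -march=armv8.2-a+bf16+fp16" } */

#include <arm_neon.h>

bfloat16x4_t f (bfloat16_t a)
{
  return (bfloat16x4_t) {a, a, a, a};
}

/* Expect the constructor to be expanded as a single vdup.16.  */
/* { dg-final { scan-assembler-times {vdup\.16\s+d[0-9]+, r[0-9]+} 1 } } */
```

Unlike the earlier dg-skip-if on -mfloat-abi=hard, which made the test UNSUPPORTED under hard-float option overrides even on multilib toolchains, an effective-target check asks whether softfp is actually usable; per the thread, that skipped the test on armhf and let it pass on arm-eabi with multilibs enabled.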

Thanks,
Prathamesh
>
>>
>> Thanks,
>> Prathamesh
>> >
>> > Christophe
>> >
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> > Christophe
>> >> >
>> >> >>
>> >> >> gcc -v output:
>> >> >> Configured with:
>> >> >> '/home/prathamesh.kulkarni/abe-toolchain-2/snapshots/gcc.git~master/configure'
>> >> >> SHELL=/bin/bash
>> >> >> --with-mpc=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
>> >> >> --with-mpfr=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
>> >> >> --with-gmp=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
>> >> >> --with-gnu-as --with-gnu-ld --disable-libmudflap --enable-lto
>> >> >> --enable-shared --without-included-gettext --enable-nls
>> >> >> --with-system-zlib --disable-sjlj-exceptions
>> >> >> --enable-gnu-unique-object --enable-linker-build-id
>> >> >> --disable-libstdcxx-pch --enable-c99 --enable-clocale=gnu
>> >> >> --enable-libstdcxx-debug --enable-long-long --with-cloog=no
>> >> >> --with-ppl=no --with-isl=no --enable-multilib
>> >> >> --with-multilib-list=aprofile --enable-threads=no --disable-multiarch
>> >> >> --with-sysroot=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu/arm-eabi
>> >> >> --with-newlib --enable-checking=yes --disable-bootstrap
>> >> >> --enable-languages=c,c++,lto
>> >> >> --prefix=/home/prathamesh.kulkarni/abe-toolchain-2/builds/destdir/x86_64-pc-linux-gnu
>> >> >> --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
>> >> >> --target=arm-eabi
>> >> >>
>> >> >> Thanks,
>> >> >> Prathamesh
>> >> >> >
>> >> >> >
>> >> >> > Christophe
>> >> >> >
>> >> >> >> Thanks,
>> >> >> >> Prathamesh
>> >> >> >> >
>> >> >> >> > Christophe
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> Thanks,
>> >> >> >> >> Prathamesh
>> >> >> >> >> >
>> >> >> >> >> > Thanks
>> >> >> >> >> >
>> >> >> >> >> > Christophe
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >>
>> >> >> >> >> >> Thanks,
>> >> >> >> >> >> Prathamesh
>> >> >> >> >> >> > Thanks,
>> >> >> >> >> >> > Kyrill
>> >> >> >> >> >> >
>> >> >> >> >> >> > > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
>> >> >> >> >> >> > >
>> >> >> >> >> >> > > Thanks,
>> >> >> >> >> >> > > Prathamesh
>> >> >> >> >> >> > > >
>> >> >> >> >> >> > > > Thanks,
>> >> >> >> >> >> > > > Prathamesh
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > > Thanks,
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > > Christophe
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > >
>> >> >> >> >> >> > > > > > Thanks,
>> >> >> >> >> >> > > > > > Prathamesh
>> >> >> >> >> >> > > > > >>
>> >> >> >> >> >> > > > > >> Christophe
>> >> >> >> >> >> > > > > >>
>> >> >> >> >> >> > > > > >>
>> >> >> >> >> >> > > > > >>> Ok.
>> >> >> >> >> >> > > > > >>> Thanks,
>> >> >> >> >> >> > > > > >>> Kyrill
>> >> >> >> >> >> > > > > >>>
>> >> >> >> >> >> > > > > >>>
>> >> >> >> >> >> > > > > >>>> Thanks,
>> >> >> >> >> >> > > > > >>>> Prathamesh
>> >> >> >> >> >> > > > > >>>>> Thanks,
>> >> >> >> >> >> > > > > >>>>> Kyrill
>> >> >> >> >> >> > > > > >>>>>
>> >> >> >> >> >> > > > > >>>>>> Thanks,
>> >> >> >> >> >> > > > > >>>>>> Prathamesh
>> >> >> >> >> >> > > > > >>>>>>> Thanks,
>> >> >> >> >> >> > > > > >>>>>>> Prathamesh
>> >> >> >> >> >> > > > > >>>>>>>> That being said, I suggest you look at other similar patterns in
>> >> >> >> >> >> > > > > >>>>>>>> vec-common.md, most of which are gated on
>> >> >> >> >> >> > > > > >>>>>>>> ARM_HAVE_<MODE>_ARITH
>> >> >> >> >> >> > > > > >>>>>>>> and possibly beware of issues with iwmmxt :-)
>> >> >> >> >> >> > > > > >>>>>>>>
>> >> >> >> >> >> > > > > >>>>>>>> Christophe
>> >> >> >> >> >> > > > > >>>>>>>>
>> >> >> >> >> >> > > > > >>>>>>>>> Thanks,
>> >> >> >> >> >> > > > > >>>>>>>>> Prathamesh


end of thread

Thread overview: 29+ messages
2021-06-04  7:25 [ARM] PR98435: Missed optimization in expanding vector constructor Prathamesh Kulkarni
2021-06-04  7:45 ` Christophe Lyon
2021-06-09 10:28   ` Prathamesh Kulkarni
2021-06-14  8:01     ` Prathamesh Kulkarni
2021-06-21  8:34       ` Prathamesh Kulkarni
2021-06-24 16:31       ` Kyrylo Tkachov
2021-06-28  8:37         ` Prathamesh Kulkarni
2021-06-28  8:40           ` Kyrylo Tkachov
2021-06-28  9:17             ` Christophe LYON
2021-06-29 10:46               ` Prathamesh Kulkarni
2021-06-30 15:21                 ` Christophe LYON
2021-07-01 10:56                   ` Prathamesh Kulkarni
2021-07-06  7:05                     ` Prathamesh Kulkarni
2021-07-06  8:03                       ` Kyrylo Tkachov
2021-07-06  9:25                         ` Prathamesh Kulkarni
2021-07-06  9:28                           ` Kyrylo Tkachov
2021-07-06 10:16                             ` Christophe Lyon
2021-08-03  9:29                           ` Christophe Lyon
2021-08-03 10:56                             ` Prathamesh Kulkarni
2021-08-03 15:22                               ` Christophe Lyon
2021-08-05 12:27                                 ` Prathamesh Kulkarni
2021-08-05 12:34                                   ` Christophe Lyon
2021-08-06  8:59                                     ` Prathamesh Kulkarni
2021-08-06  9:19                                       ` Christophe Lyon
2021-08-06  9:50                                         ` Prathamesh Kulkarni
2021-08-06 12:01                                           ` Christophe Lyon
2021-08-09  5:07                                             ` Prathamesh Kulkarni
2021-08-09 16:19                                               ` Christophe Lyon
2021-08-13  7:04                                                 ` Prathamesh Kulkarni
