Re: [ARM] PR98435: Missed optimization in expanding vector constructor

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Cc: Christophe LYON <christophe.lyon@foss.st.com>,
	gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [ARM] PR98435: Missed optimization in expanding vector constructor
Date: Tue, 6 Jul 2021 14:55:16 +0530	[thread overview]
Message-ID: <CAAgBjM=E5Ye-LiH2FYrE90tM0YqpLT5OQ6iz5yEr-RbPBzdRrw@mail.gmail.com> (raw)
In-Reply-To: <PAXPR08MB69266327F7D386A1BA8DA46E931B9@PAXPR08MB6926.eurprd08.prod.outlook.com>

[-- Attachment #1: Type: text/plain, Size: 9930 bytes --]

On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > Sent: 06 July 2021 08:06
> > To: Christophe LYON <christophe.lyon@foss.st.com>
> > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc Patches <gcc-
> > patches@gcc.gnu.org>
> > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> > constructor
> >
> > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> > >
> > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
> > > <christophe.lyon@foss.st.com> wrote:
> > > >
> > > >
> > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> > > > > <christophe.lyon@foss.st.com> wrote:
> > > > >>
> > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
> > > > >>>> -----Original Message-----
> > > > >>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > > > >>>> Sent: 28 June 2021 09:38
> > > > >>>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > > > >>>> Cc: Christophe Lyon <christophe.lyon@linaro.org>; gcc Patches
> > <gcc-
> > > > >>>> patches@gcc.gnu.org>
> > > > >>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding
> > vector
> > > > >>>> constructor
> > > > >>>>
> > > > >>>> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov
> > <Kyrylo.Tkachov@arm.com>
> > > > >>>> wrote:
> > > > >>>>>
> > > > >>>>>> -----Original Message-----
> > > > >>>>>> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> > > > >>>>>> Sent: 14 June 2021 09:02
> > > > >>>>>> To: Christophe Lyon <christophe.lyon@linaro.org>
> > > > >>>>>> Cc: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> > > > >>>>>> <Kyrylo.Tkachov@arm.com>
> > > > >>>>>> Subject: Re: [ARM] PR98435: Missed optimization in expanding
> > vector
> > > > >>>>>> constructor
> > > > >>>>>>
> > > > >>>>>> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> > > > >>>>>> <prathamesh.kulkarni@linaro.org> wrote:
> > > > >>>>>>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
> > > > >>>> <christophe.lyon@linaro.org>
> > > > >>>>>> wrote:
> > > > >>>>>>>> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-
> > patches
> > > > >>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
> > > > >>>>>>>>> Hi,
> > > > >>>>>>>>> As mentioned in PR, for the following test-case:
> > > > >>>>>>>>>
> > > > >>>>>>>>> #include <arm_neon.h>
> > > > >>>>>>>>>
> > > > >>>>>>>>> bfloat16x4_t f1 (bfloat16_t a)
> > > > >>>>>>>>> {
> > > > >>>>>>>>>     return vdup_n_bf16 (a);
> > > > >>>>>>>>> }
> > > > >>>>>>>>>
> > > > >>>>>>>>> bfloat16x4_t f2 (bfloat16_t a)
> > > > >>>>>>>>> {
> > > > >>>>>>>>>     return (bfloat16x4_t) {a, a, a, a};
> > > > >>>>>>>>> }
> > > > >>>>>>>>>
> > > > >>>>>>>>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-
> > > > >>>> abi=softfp
> > > > >>>>>>>>> -march=armv8.2-a+bf16+fp16 results in f2 not being
> > vectorized:
> > > > >>>>>>>>>
> > > > >>>>>>>>> f1:
> > > > >>>>>>>>>           vdup.16 d16, r0
> > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> > > > >>>>>>>>>           bx      lr
> > > > >>>>>>>>>
> > > > >>>>>>>>> f2:
> > > > >>>>>>>>>           mov     r3, r0  @ __bf16
> > > > >>>>>>>>>           adr     r1, .L4
> > > > >>>>>>>>>           ldrd    r0, [r1]
> > > > >>>>>>>>>           mov     r2, r3  @ __bf16
> > > > >>>>>>>>>           mov     ip, r3  @ __bf16
> > > > >>>>>>>>>           bfi     r1, r2, #0, #16
> > > > >>>>>>>>>           bfi     r0, ip, #0, #16
> > > > >>>>>>>>>           bfi     r1, r3, #16, #16
> > > > >>>>>>>>>           bfi     r0, r2, #16, #16
> > > > >>>>>>>>>           bx      lr
> > > > >>>>>>>>>
> > > > >>>>>>>>> This seems to happen because vec_init pattern in neon.md
> > has VDQ
> > > > >>>>>> mode
> > > > >>>>>>>>> iterator, which doesn't include V4BF. In attached patch, I
> > changed
> > > > >>>>>>>>> mode
> > > > >>>>>>>>> to VDQX which seems to work for the test-case, and the
> > compiler
> > > > >>>> now
> > > > >>>>>> generates:
> > > > >>>>>>>>> f2:
> > > > >>>>>>>>>           vdup.16 d16, r0
> > > > >>>>>>>>>           vmov    r0, r1, d16  @ v4bf
> > > > >>>>>>>>>           bx      lr
> > > > >>>>>>>>>
> > > > >>>>>>>>> However, the pattern is also gated on TARGET_HAVE_MVE
> > and I am
> > > > >>>>>> not
> > > > >>>>>>>>> sure if either VDQ or VDQX are correct modes for MVE since
> > MVE
> > > > >>>> has
> > > > >>>>>>>>> only 128-bit vectors ?
> > > > >>>>>>>>>
> > > > >>>>>>>> I think patterns common to both Neon and MVE should be
> > moved to
> > > > >>>>>>>> vec-common.md, I don't know why such patterns were left in
> > > > >>>> neon.md.
> > > > >>>>>>> Since we end up calling neon_expand_vector_init for both
> > NEON and
> > > > >>>> MVE,
> > > > >>>>>>> I am not sure if we should separate the pattern ?
> > > > >>>>>>> Would it make sense to FAIL if the mode size isn't 16 bytes for
> > MVE as
> > > > >>>>>>> in attached patch so
> > > > >>>>>>> it will call neon_expand_vector_init only for 128-bit vectors ?
> > > > >>>>>>> Altho hard-coding 16 in the pattern doesn't seem a good idea to
> > me
> > > > >>>> either.
> > > > >>>>>> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-
> > June/572342.html
> > > > >>>>>> (attaching patch as text).
> > > > >>>>>>
> > > > >>>>> --- a/gcc/config/arm/neon.md
> > > > >>>>> +++ b/gcc/config/arm/neon.md
> > > > >>>>> @@ -459,10 +459,12 @@
> > > > >>>>>    )
> > > > >>>>>
> > > > >>>>>    (define_expand "vec_init<mode><V_elem_l>"
> > > > >>>>> -  [(match_operand:VDQ 0 "s_register_operand")
> > > > >>>>> +  [(match_operand:VDQX 0 "s_register_operand")
> > > > >>>>>       (match_operand 1 "" "")]
> > > > >>>>>      "TARGET_NEON || TARGET_HAVE_MVE"
> > > > >>>>>    {
> > > > >>>>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE
> > > > >>>> (operands[0])) != 16)
> > > > >>>>> +    FAIL;
> > > > >>>>>      neon_expand_vector_init (operands[0], operands[1]);
> > > > >>>>>      DONE;
> > > > >>>>>    })
> > > > >>>>>
> > > > >>>>> I think we should move this to vec-common.md like Christophe
> > said.
> > > > >>>>> Perhaps rather than making it FAIL for non-16 MVE sizes we just
> > disable it in
> > > > >>>> the expander condition?
> > > > >>>>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<
> > > > >>>> VDQ>mode) != 16)"
> > > > >>>> Is it OK to use <MODE>mode ? Because using <VDQ>mode resulted
> > in lot
> > > > >>>> of build errors.
> > > > >>>> Also, I think the comparison should be inverted, ie, GET_MODE_SIZE
> > > > >>>> (<MODE>mode) == 16 since
> > > > >>>> we want to make the pattern pass if target is MVE and vector size is
> > 16 bytes ?
> > > > >>>> Do these changes in attached patch look OK ?
> > > > >>> Yes, you're right.
> > > > >>
> > > > >> Can't this be ARM_HAVE_<MODE>_ARITH like in most expanders in
> > vec-common.md?
> > > > >>
> > > > >> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
> > > > > I wonder if this should be ARM_HAVE_<MODE>_LDST instead since
> > we're
> > > > > initializing the vector ?
> > > >
> > > >
> > > > Well, it really depends on which modes you want to enable.
> > > >
> > > >
> > > > Looks like your move VDQ -> VDQ adds V4BF, V8BF and DI.
> > > >
> > > > Are they all OK for Neon?
> > > >
> > > > They are not OK for MVE.
> > > >
> > > > Ideally you could add testcases to cover to the supported and
> > > > unsupported modes for both Neon and MVE.\
> > > >
> > > > Before your patch, the expander is enabled for MVE for 64 bit modes
> > > > (V8QI, V4HI, V2SI): what happens in this case? Does the compiler crash
> > > > or is there something else preventing the match?
> > > Hi,
> > > Apparently there is VALID_MVE_MODE macro, so is it better to use:
> > > TARGET_NEON || (TARGET_HAVE_MVE &&
> > VALID_MVE_MODE(<MODE>mode))
> > > as in the attached patch ?
>
> The change is ok. I would like to see some testcases like Christophe suggested, but this patch just moves the expander around rather than introducing new functionality.
Hi Kyrill,
As mentioned in the first email, the patch improves code-gen for
following test-case:

bfloat16x4_t f (bfloat16_t a)
{
  return (bfloat16x4_t) {a, a, a, a};
}

Before patch:
f:
        mov     r3, r0  @ __bf16
        adr     r1, .L4
        ldrd    r0, [r1]
        mov     r2, r3  @ __bf16
        mov     ip, r3  @ __bf16
        bfi     r1, r2, #0, #16
        bfi     r0, ip, #0, #16
        bfi     r1, r3, #16, #16
        bfi     r0, r2, #16, #16
        bx      lr

After patch:
f:
        vdup.16 d16, r0
        vmov    r0, r1, d16  @ v4bf
        bx      lr

because the patch changes mode from VDQ to VDQX to accommodate bf modes.
I have included the test in the attached patch.
I think Christophe's concerns were mainly about the right modes
getting enabled for MVE.
Unfortunately, I am not sure how to test for that because the FE
catches invalid modes, and we don't
end up hitting the pattern.

Thanks,
Prathamesh
> Thanks,
> Kyrill
>
> > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574206.html
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > >
> > > > Thanks,
> > > >
> > > >
> > > > Christophe
> > > >
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > >>
> > > > >> Christophe
> > > > >>
> > > > >>
> > > > >>> Ok.
> > > > >>> Thanks,
> > > > >>> Kyrill
> > > > >>>
> > > > >>>
> > > > >>>> Thanks,
> > > > >>>> Prathamesh
> > > > >>>>> Thanks,
> > > > >>>>> Kyrill
> > > > >>>>>
> > > > >>>>>> Thanks,
> > > > >>>>>> Prathamesh
> > > > >>>>>>> Thanks,
> > > > >>>>>>> Prathamesh
> > > > >>>>>>>> That being said, I suggest you look at other similar patterns in
> > > > >>>>>>>> vec-common.md, most of which are gated on
> > > > >>>>>>>> ARM_HAVE_<MODE>_ARITH
> > > > >>>>>>>> and possibly beware of issues with iwmmxt :-)
> > > > >>>>>>>>
> > > > >>>>>>>> Christophe
> > > > >>>>>>>>
> > > > >>>>>>>>> Thanks,
> > > > >>>>>>>>> Prathamesh

[-- Attachment #2: pr98435-5.txt --]
[-- Type: text/plain, Size: 1780 bytes --]

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a6573317cf..0c98b3a8f23 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -458,15 +458,6 @@
   [(set_attr "type" "neon_store1_one_lane_q,neon_to_gp_q")]
 )
 
-(define_expand "vec_init<mode><V_elem_l>"
-  [(match_operand:VDQ 0 "s_register_operand")
-   (match_operand 1 "" "")]
-  "TARGET_NEON || TARGET_HAVE_MVE"
-{
-  neon_expand_vector_init (operands[0], operands[1]);
-  DONE;
-})
-
 ;; Doubleword and quadword arithmetic.
 
 ;; NOTE: some other instructions also support 64-bit integer
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 8e35151da46..7858be9f28e 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -565,3 +565,12 @@
 
   DONE;
 })
+
+(define_expand "vec_init<mode><V_elem_l>"
+  [(match_operand:VDQX 0 "s_register_operand")
+   (match_operand 1 "" "")]
+  "TARGET_NEON || (TARGET_HAVE_MVE && VALID_MVE_MODE (<MODE>mode))" 
+{
+  neon_expand_vector_init (operands[0], operands[1]);
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr98435.c b/gcc/testsuite/gcc.target/arm/simd/pr98435.c
new file mode 100644
index 00000000000..0af8633fd56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr98435.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math" } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-mfloat-abi=softfp -march=armv8.2-a+bf16+fp16" } */
+
+#include <arm_neon.h>
+
+bfloat16x4_t f (bfloat16_t a)
+{
+  return (bfloat16x4_t) {a, a, a, a};
+}
+
+/* { dg-final { scan-assembler {\tvdup.16\td[0-9]+, r0} } } */
+/* { dg-final { scan-assembler {\tvmov\tr0, r1, d[0-9]+} } } */

next prev parent reply	other threads:[~2021-07-06  9:25 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-04  7:25 Prathamesh Kulkarni
2021-06-04  7:45 ` Christophe Lyon
2021-06-09 10:28   ` Prathamesh Kulkarni
2021-06-14  8:01     ` Prathamesh Kulkarni
2021-06-21  8:34       ` Prathamesh Kulkarni
2021-06-24 16:31       ` Kyrylo Tkachov
2021-06-28  8:37         ` Prathamesh Kulkarni
2021-06-28  8:40           ` Kyrylo Tkachov
2021-06-28  9:17             ` Christophe LYON
2021-06-29 10:46               ` Prathamesh Kulkarni
2021-06-30 15:21                 ` Christophe LYON
2021-07-01 10:56                   ` Prathamesh Kulkarni
2021-07-06  7:05                     ` Prathamesh Kulkarni
2021-07-06  8:03                       ` Kyrylo Tkachov
2021-07-06  9:25                         ` Prathamesh Kulkarni [this message]
2021-07-06  9:28                           ` Kyrylo Tkachov
2021-07-06 10:16                             ` Christophe Lyon
2021-08-03  9:29                           ` Christophe Lyon
2021-08-03 10:56                             ` Prathamesh Kulkarni
2021-08-03 15:22                               ` Christophe Lyon
2021-08-05 12:27                                 ` Prathamesh Kulkarni
2021-08-05 12:34                                   ` Christophe Lyon
2021-08-06  8:59                                     ` Prathamesh Kulkarni
2021-08-06  9:19                                       ` Christophe Lyon
2021-08-06  9:50                                         ` Prathamesh Kulkarni
2021-08-06 12:01                                           ` Christophe Lyon
2021-08-09  5:07                                             ` Prathamesh Kulkarni
2021-08-09 16:19                                               ` Christophe Lyon
2021-08-13  7:04                                                 ` Prathamesh Kulkarni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAAgBjM=E5Ye-LiH2FYrE90tM0YqpLT5OQ6iz5yEr-RbPBzdRrw@mail.gmail.com' \
    --to=prathamesh.kulkarni@linaro.org \
    --cc=Kyrylo.Tkachov@arm.com \
    --cc=christophe.lyon@foss.st.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).