public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [aarch64] Code-gen for vector initialization involving constants
Date: Tue, 02 May 2023 13:52:28 +0100	[thread overview]
Message-ID: <mpt7ctqvqsz.fsf@arm.com> (raw)
In-Reply-To: <CAAgBjMnP5Z3bsge08n5GWszBBqPgwV66KaZjr6vYOs3GUvv2qw@mail.gmail.com> (Prathamesh Kulkarni's message of "Tue, 2 May 2023 18:08:27 +0530")

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Tue, 2 May 2023 at 17:32, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > On Tue, 2 May 2023 at 14:56, Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >> > [aarch64] Improve code-gen for vector initialization with single constant element.
>> >> >
>> >> > gcc/ChangeLog:
>> >> >       * config/aarch64/aarc64.cc (aarch64_expand_vector_init): Tweak condition
>> >> >       if (n_var == n_elts && n_elts <= 16) to allow a single constant,
>> >> >       and if maxv == 1, use constant element for duplicating into register.
>> >> >
>> >> > gcc/testsuite/ChangeLog:
>> >> >       * gcc.target/aarch64/vec-init-single-const.c: New test.
>> >> >
>> >> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> >> > index 2b0de7ca038..f46750133a6 100644
>> >> > --- a/gcc/config/aarch64/aarch64.cc
>> >> > +++ b/gcc/config/aarch64/aarch64.cc
>> >> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>> >> >       and matches[X][1] with the count of duplicate elements (if X is the
>> >> >       earliest element which has duplicates).  */
>> >> >
>> >> > -  if (n_var == n_elts && n_elts <= 16)
>> >> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
>> >> >      {
>> >> >        int matches[16][2] = {0};
>> >> >        for (int i = 0; i < n_elts; i++)
>> >> > @@ -22227,6 +22227,18 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>> >> >            vector register.  For big-endian we want that position to hold
>> >> >            the last element of VALS.  */
>> >> >         maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
>> >> > +
>> >> > +       /* If we have a single constant element, use that for duplicating
>> >> > +          instead.  */
>> >> > +       if (n_var == n_elts - 1)
>> >> > +         for (int i = 0; i < n_elts; i++)
>> >> > +           if (CONST_INT_P (XVECEXP (vals, 0, i))
>> >> > +               || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
>> >> > +             {
>> >> > +               maxelement = i;
>> >> > +               break;
>> >> > +             }
>> >> > +
>> >> >         rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
>> >> >         aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
>> >>
>> >> We don't want to force the constant into a register though.
>> > OK right, sorry.
>> > With the attached patch, for the following test-case:
>> > int64x2_t f_s64(int64_t x)
>> > {
>> >   return (int64x2_t) { x, 1 };
>> > }
>> >
>> > it loads constant from memory (same code-gen as without patch).
>> > f_s64:
>> >         adrp    x1, .LC0
>> >         ldr     q0, [x1, #:lo12:.LC0]
>> >         ins     v0.d[0], x0
>> >         ret
>> >
>> > Does the patch look OK ?
>> >
>> > Thanks,
>> > Prathamesh
>> > [...]
>> > [aarch64] Improve code-gen for vector initialization with single constant element.
>> >
>> > gcc/ChangeLog:
>> >       * config/aarch64/aarc64.cc (aarch64_expand_vector_init): Tweak condition
>> >       if (n_var == n_elts && n_elts <= 16) to allow a single constant,
>> >       and if maxv == 1, use constant element for duplicating into register.
>> >
>> > gcc/testsuite/ChangeLog:
>> >       * gcc.target/aarch64/vec-init-single-const.c: New test.
>> >
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index 2b0de7ca038..97309ddec4f 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>> >       and matches[X][1] with the count of duplicate elements (if X is the
>> >       earliest element which has duplicates).  */
>> >
>> > -  if (n_var == n_elts && n_elts <= 16)
>> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
>>
>> No need for the extra brackets.
> Adjusted, thanks. Sorry if this sounds like a silly question, but why
> do we need the n_elts <= 16 check ?
> Won't n_elts be always <= 16 since max number of elements in a vector
> would be 16 for V16QI ?

Was wondering the same thing :)

Let's leave it though.

>> >      {
>> >        int matches[16][2] = {0};
>> >        for (int i = 0; i < n_elts; i++)
>> > @@ -22227,8 +22227,26 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>> >            vector register.  For big-endian we want that position to hold
>> >            the last element of VALS.  */
>> >         maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
>> > -       rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
>> > -       aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
>> > +
>> > +       /* If we have a single constant element, use that for duplicating
>> > +          instead.  */
>> > +       if (n_var == n_elts - 1)
>> > +         for (int i = 0; i < n_elts; i++)
>> > +           if (CONST_INT_P (XVECEXP (vals, 0, i))
>> > +               || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
>> > +             {
>> > +               maxelement = i;
>> > +               break;
>> > +             }
>> > +
>> > +       rtx maxval = XVECEXP (vals, 0, maxelement);
>> > +       if (!(CONST_INT_P (maxval) || CONST_DOUBLE_P (maxval)))
>> > +         {
>> > +           rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
>> > +           aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
>> > +         }
>> > +       else
>> > +         aarch64_emit_move (target, gen_vec_duplicate (mode, maxval));
>> >       }
>> >        else
>> >       {
>>
>> This seems a bit convoluted.  It might be easier to record whether
>> we see a CONST_INT_P or a CONST_DOUBLE_P during the previous loop,
>> and if so what the constant is.  Then handle that case first,
>> as a separate arm of the "if".
> Adjusted in the attached patch. Does it look OK ?

I meant: adjust

      int maxelement = 0;
      int maxv = 0;
      for (int i = 0; i < n_elts; i++)
	if (matches[i][1] > maxv)
	  {
	    maxelement = i;
	    maxv = matches[i][1];
	  }

so that it also records any CONST_INT or CONST_DOUBLE (as an rtx).

>> > diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c
>> > new file mode 100644
>> > index 00000000000..682fd43439a
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c
>> > @@ -0,0 +1,66 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2" } */
>> > +/* { dg-final { check-function-bodies "**" "" "" } } */
>> > +
>> > +#include <arm_neon.h>
>> > +
>> > +/*
>> > +** f_s8:
>> > +**   ...
>> > +**   dup     v[0-9]+\.16b, w[0-9]+
>> > +**   movi    v[0-9]+\.8b, 0x1
>> > +**   ins     v[0-9]+\.b\[15\], v[0-9]+\.b\[0\]
>> > +**   ...
>> > +**   ret
>>
>> Like with the divide-and-conquer patch, there's nothing that requires
>> the first two instructions to be in that order.
> Hmm, will it be OK to disable scheduling by passing
> -fno-schedule-insns -fno-schedule-insns2
> for the test ?

Guess we might as well try that for now.

Elsewhere I've used:

  (
     first sequence
  |
     second sequence
  )
     common part

but we probably have enough control over the unscheduled sequence
for that not to be necessary here.

>> What is the second ... hiding?  What sequences do we actually generate?
> Sorry, added them by mistake. They were the exact sequences. Adjusted
> tests in the patch.
>>
>> BTW, remember to say how patches were tested :-)
> Right, sorry. The patch is under bootstrap+test on aarch64-linux-gnu.

Please also test the new tests on big-endian.

> +/*
> +** f_s8:
> +**	dup	v[0-9]+\.16b, w[0-9]+

Without the ...s, this must be v0 and w0 respectively

> +**	movi	v[0-9]+\.8b, 0x1

Would be good to capture the register number here and use \1 in the
following line.

> +**	ins	v[0-9]+\.b\[15\], v[0-9]+\.b\[0\]

Similarly v0 for the first operand here.

Thanks,
Richard

> +**	ret
> +*/
> +
> +int8x16_t f_s8(int8_t x)
> +{
> +  return (int8x16_t) { x, x, x, x, x, x, x, x,
> +                       x, x, x, x, x, x, x, 1 };
> +}
> +
> +/*
> +** f_s16:
> +**	dup	v[0-9]+\.8h, w[0-9]+
> +**	movi	v[0-9]+\.4h, 0x1
> +**	ins	v[0-9]+\.h\[7\], v[0-9]+\.h\[0\]
> +**	ret
> +*/
> +
> +int16x8_t f_s16(int16_t x)
> +{
> +  return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> +}
> +
> +/*
> +** f_s32:
> +**	dup	v[0-9]\.4s, w[0-9]+
> +**	movi	v[0-9]\.2s, 0x1
> +**	ins	v[0-9]+\.s\[3\], v[0-9]+\.s\[0\]
> +**	ret
> +*/
> +
> +int32x4_t f_s32(int32_t x)
> +{
> +  return (int32x4_t) { x, x, x, 1 };
> +}
> +
> +/*
> +** f_s64:
> +**	adrp	x[0-9]+, .LC[0-9]+
> +**	ldr	q[0-9]+, \[x[0-9]+, #:lo12:.LC[0-9]+\]
> +**	ins	v[0-9]+\.d\[0\], x[0-9]+
> +**	ret
> +*/
> +
> +int64x2_t f_s64(int64_t x)
> +{
> +  return (int64x2_t) { x, 1 };
> +}

  reply	other threads:[~2023-05-02 12:52 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-03  7:16 Prathamesh Kulkarni
2023-02-13  6:28 ` Prathamesh Kulkarni
2023-04-03 18:12   ` Prathamesh Kulkarni
2023-04-25 10:59 ` Richard Sandiford
2023-05-02  5:41   ` Prathamesh Kulkarni
2023-05-02  9:25     ` Richard Sandiford
2023-05-02 10:22       ` Prathamesh Kulkarni
2023-05-02 12:02         ` Richard Sandiford
2023-05-02 12:38           ` Prathamesh Kulkarni
2023-05-02 12:52             ` Richard Sandiford [this message]
2023-05-03 11:28               ` Prathamesh Kulkarni
2023-05-11 19:15                 ` Richard Sandiford
2023-05-15 14:09                   ` Prathamesh Kulkarni
2023-05-15 18:59                     ` Richard Sandiford
2023-05-17 15:23                       ` Prathamesh Kulkarni
2023-05-18  8:07                         ` Richard Sandiford
2023-05-18 14:41                           ` Prathamesh Kulkarni
2023-05-18 16:34                             ` Richard Sandiford
2023-05-19 10:56                               ` Prathamesh Kulkarni
2023-05-22  8:48                                 ` Richard Sandiford
2023-05-24  9:29                                   ` Prathamesh Kulkarni
2023-05-24 10:10                                     ` Richard Sandiford
2023-05-24 19:13                                       ` Prathamesh Kulkarni
2023-05-24 19:58                                         ` Richard Sandiford
2023-05-25  6:47                                           ` Prathamesh Kulkarni
2023-05-25  7:34                                             ` Richard Sandiford
2023-05-25  9:56                                               ` Prathamesh Kulkarni
2023-05-26  3:04                                                 ` Prathamesh Kulkarni
2023-05-30 18:53                                                   ` Richard Sandiford
2023-06-12 17:52                                                     ` Prathamesh Kulkarni
2023-05-24 19:50                                       ` Prathamesh Kulkarni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mpt7ctqvqsz.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=prathamesh.kulkarni@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).