From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [aarch64] Code-gen for vector initialization involving constants
Date: Tue, 02 May 2023 13:52:28 +0100 [thread overview]
Message-ID: <mpt7ctqvqsz.fsf@arm.com> (raw)
In-Reply-To: <CAAgBjMnP5Z3bsge08n5GWszBBqPgwV66KaZjr6vYOs3GUvv2qw@mail.gmail.com> (Prathamesh Kulkarni's message of "Tue, 2 May 2023 18:08:27 +0530")
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Tue, 2 May 2023 at 17:32, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > On Tue, 2 May 2023 at 14:56, Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >> > [aarch64] Improve code-gen for vector initialization with single constant element.
>> >> >
>> >> > gcc/ChangeLog:
>> >> > * config/aarch64/aarc64.cc (aarch64_expand_vector_init): Tweak condition
>> >> > if (n_var == n_elts && n_elts <= 16) to allow a single constant,
>> >> > and if maxv == 1, use constant element for duplicating into register.
>> >> >
>> >> > gcc/testsuite/ChangeLog:
>> >> > * gcc.target/aarch64/vec-init-single-const.c: New test.
>> >> >
>> >> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> >> > index 2b0de7ca038..f46750133a6 100644
>> >> > --- a/gcc/config/aarch64/aarch64.cc
>> >> > +++ b/gcc/config/aarch64/aarch64.cc
>> >> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>> >> > and matches[X][1] with the count of duplicate elements (if X is the
>> >> > earliest element which has duplicates). */
>> >> >
>> >> > - if (n_var == n_elts && n_elts <= 16)
>> >> > + if ((n_var >= n_elts - 1) && n_elts <= 16)
>> >> > {
>> >> > int matches[16][2] = {0};
>> >> > for (int i = 0; i < n_elts; i++)
>> >> > @@ -22227,6 +22227,18 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>> >> > vector register. For big-endian we want that position to hold
>> >> > the last element of VALS. */
>> >> > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
>> >> > +
>> >> > + /* If we have a single constant element, use that for duplicating
>> >> > + instead. */
>> >> > + if (n_var == n_elts - 1)
>> >> > + for (int i = 0; i < n_elts; i++)
>> >> > + if (CONST_INT_P (XVECEXP (vals, 0, i))
>> >> > + || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
>> >> > + {
>> >> > + maxelement = i;
>> >> > + break;
>> >> > + }
>> >> > +
>> >> > rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
>> >> > aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
>> >>
>> >> We don't want to force the constant into a register though.
>> > OK right, sorry.
>> > With the attached patch, for the following test-case:
>> > int64x2_t f_s64(int64_t x)
>> > {
>> > return (int64x2_t) { x, 1 };
>> > }
>> >
>> > it loads constant from memory (same code-gen as without patch).
>> > f_s64:
>> > adrp x1, .LC0
>> > ldr q0, [x1, #:lo12:.LC0]
>> > ins v0.d[0], x0
>> > ret
>> >
>> > Does the patch look OK ?
>> >
>> > Thanks,
>> > Prathamesh
>> > [...]
>> > [aarch64] Improve code-gen for vector initialization with single constant element.
>> >
>> > gcc/ChangeLog:
>> > * config/aarch64/aarc64.cc (aarch64_expand_vector_init): Tweak condition
>> > if (n_var == n_elts && n_elts <= 16) to allow a single constant,
>> > and if maxv == 1, use constant element for duplicating into register.
>> >
>> > gcc/testsuite/ChangeLog:
>> > * gcc.target/aarch64/vec-init-single-const.c: New test.
>> >
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index 2b0de7ca038..97309ddec4f 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>> > and matches[X][1] with the count of duplicate elements (if X is the
>> > earliest element which has duplicates). */
>> >
>> > - if (n_var == n_elts && n_elts <= 16)
>> > + if ((n_var >= n_elts - 1) && n_elts <= 16)
>>
>> No need for the extra brackets.
> Adjusted, thanks. Sorry if this sounds like a silly question, but why
> do we need the n_elts <= 16 check ?
> Won't n_elts be always <= 16 since max number of elements in a vector
> would be 16 for V16QI ?
Was wondering the same thing :)
Let's leave it though.
>> > {
>> > int matches[16][2] = {0};
>> > for (int i = 0; i < n_elts; i++)
>> > @@ -22227,8 +22227,26 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>> > vector register. For big-endian we want that position to hold
>> > the last element of VALS. */
>> > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
>> > - rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
>> > - aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
>> > +
>> > + /* If we have a single constant element, use that for duplicating
>> > + instead. */
>> > + if (n_var == n_elts - 1)
>> > + for (int i = 0; i < n_elts; i++)
>> > + if (CONST_INT_P (XVECEXP (vals, 0, i))
>> > + || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
>> > + {
>> > + maxelement = i;
>> > + break;
>> > + }
>> > +
>> > + rtx maxval = XVECEXP (vals, 0, maxelement);
>> > + if (!(CONST_INT_P (maxval) || CONST_DOUBLE_P (maxval)))
>> > + {
>> > + rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
>> > + aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
>> > + }
>> > + else
>> > + aarch64_emit_move (target, gen_vec_duplicate (mode, maxval));
>> > }
>> > else
>> > {
>>
>> This seems a bit convoluted. It might be easier to record whether
>> we see a CONST_INT_P or a CONST_DOUBLE_P during the previous loop,
>> and if so what the constant is. Then handle that case first,
>> as a separate arm of the "if".
> Adjusted in the attached patch. Does it look OK ?
I meant: adjust
int maxelement = 0;
int maxv = 0;
for (int i = 0; i < n_elts; i++)
if (matches[i][1] > maxv)
{
maxelement = i;
maxv = matches[i][1];
}
so that it also records any CONST_INT or CONST_DOUBLE (as an rtx).
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c
>> > new file mode 100644
>> > index 00000000000..682fd43439a
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const.c
>> > @@ -0,0 +1,66 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2" } */
>> > +/* { dg-final { check-function-bodies "**" "" "" } } */
>> > +
>> > +#include <arm_neon.h>
>> > +
>> > +/*
>> > +** f_s8:
>> > +** ...
>> > +** dup v[0-9]+\.16b, w[0-9]+
>> > +** movi v[0-9]+\.8b, 0x1
>> > +** ins v[0-9]+\.b\[15\], v[0-9]+\.b\[0\]
>> > +** ...
>> > +** ret
>>
>> Like with the divide-and-conquer patch, there's nothing that requires
>> the first two instructions to be in that order.
> Hmm, will it be OK to disable scheduling by passing
> -fno-schedule-insns -fno-schedule-insns2
> for the test ?
Guess we might as well try that for now.
Elsewhere I've used:
(
first sequence
|
second sequence
)
common part
but we probably have enough control over the unscheduled sequence
for that not to be necessary here.
>> What is the second ... hiding? What sequences do we actually generate?
> Sorry, added them by mistake. They were the exact sequences. Adjusted
> tests in the patch.
>>
>> BTW, remember to say how patches were tested :-)
> Right, sorry. The patch is under bootstrap+test on aarch64-linux-gnu.
Please also test the new tests on big-endian.
> +/*
> +** f_s8:
> +** dup v[0-9]+\.16b, w[0-9]+
Without the ...s, this must be v0 and w0 respectively
> +** movi v[0-9]+\.8b, 0x1
Would be good to capture the register number here and use \1 in the
following line.
> +** ins v[0-9]+\.b\[15\], v[0-9]+\.b\[0\]
Similarly v0 for the first operand here.
Thanks,
Richard
> +** ret
> +*/
> +
> +int8x16_t f_s8(int8_t x)
> +{
> + return (int8x16_t) { x, x, x, x, x, x, x, x,
> + x, x, x, x, x, x, x, 1 };
> +}
> +
> +/*
> +** f_s16:
> +** dup v[0-9]+\.8h, w[0-9]+
> +** movi v[0-9]+\.4h, 0x1
> +** ins v[0-9]+\.h\[7\], v[0-9]+\.h\[0\]
> +** ret
> +*/
> +
> +int16x8_t f_s16(int16_t x)
> +{
> + return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> +}
> +
> +/*
> +** f_s32:
> +** dup v[0-9]\.4s, w[0-9]+
> +** movi v[0-9]\.2s, 0x1
> +** ins v[0-9]+\.s\[3\], v[0-9]+\.s\[0\]
> +** ret
> +*/
> +
> +int32x4_t f_s32(int32_t x)
> +{
> + return (int32x4_t) { x, x, x, 1 };
> +}
> +
> +/*
> +** f_s64:
> +** adrp x[0-9]+, .LC[0-9]+
> +** ldr q[0-9]+, \[x[0-9]+, #:lo12:.LC[0-9]+\]
> +** ins v[0-9]+\.d\[0\], x[0-9]+
> +** ret
> +*/
> +
> +int64x2_t f_s64(int64_t x)
> +{
> + return (int64x2_t) { x, 1 };
> +}
next prev parent reply other threads:[~2023-05-02 12:52 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-03 7:16 Prathamesh Kulkarni
2023-02-13 6:28 ` Prathamesh Kulkarni
2023-04-03 18:12 ` Prathamesh Kulkarni
2023-04-25 10:59 ` Richard Sandiford
2023-05-02 5:41 ` Prathamesh Kulkarni
2023-05-02 9:25 ` Richard Sandiford
2023-05-02 10:22 ` Prathamesh Kulkarni
2023-05-02 12:02 ` Richard Sandiford
2023-05-02 12:38 ` Prathamesh Kulkarni
2023-05-02 12:52 ` Richard Sandiford [this message]
2023-05-03 11:28 ` Prathamesh Kulkarni
2023-05-11 19:15 ` Richard Sandiford
2023-05-15 14:09 ` Prathamesh Kulkarni
2023-05-15 18:59 ` Richard Sandiford
2023-05-17 15:23 ` Prathamesh Kulkarni
2023-05-18 8:07 ` Richard Sandiford
2023-05-18 14:41 ` Prathamesh Kulkarni
2023-05-18 16:34 ` Richard Sandiford
2023-05-19 10:56 ` Prathamesh Kulkarni
2023-05-22 8:48 ` Richard Sandiford
2023-05-24 9:29 ` Prathamesh Kulkarni
2023-05-24 10:10 ` Richard Sandiford
2023-05-24 19:13 ` Prathamesh Kulkarni
2023-05-24 19:58 ` Richard Sandiford
2023-05-25 6:47 ` Prathamesh Kulkarni
2023-05-25 7:34 ` Richard Sandiford
2023-05-25 9:56 ` Prathamesh Kulkarni
2023-05-26 3:04 ` Prathamesh Kulkarni
2023-05-30 18:53 ` Richard Sandiford
2023-06-12 17:52 ` Prathamesh Kulkarni
2023-05-24 19:50 ` Prathamesh Kulkarni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mpt7ctqvqsz.fsf@arm.com \
--to=richard.sandiford@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=prathamesh.kulkarni@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).