Re: [aarch64] Code-gen for vector initialization involving constants

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>,
	gcc Patches <gcc-patches@gcc.gnu.org>,
	 richard.sandiford@arm.com
Subject: Re: [aarch64] Code-gen for vector initialization involving constants
Date: Wed, 24 May 2023 14:59:36 +0530	[thread overview]
Message-ID: <CAAgBjMksK3d5Wtv3bOAeL4Rjjwt8_yUV3yt47uxuLRYWPsXJog@mail.gmail.com> (raw)
In-Reply-To: <mptilckpx8n.fsf@arm.com>

On Mon, 22 May 2023 at 14:18, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > Hi Richard,
> > Thanks for the suggestions. Does the attached patch look OK ?
> > Boostrap+test in progress on aarch64-linux-gnu.
>
> Like I say, please wait for the tests to complete before sending an RFA.
> It saves a review cycle if the tests don't in fact pass.
Right, sorry, will post patches after completion of testing henceforth.
>
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 29dbacfa917..e611a7cca25 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -22332,6 +22332,43 @@ aarch64_unzip_vector_init (machine_mode mode, rtx vals, bool even_p)
> >    return gen_rtx_PARALLEL (new_mode, vec);
> >  }
> >
> > +/* Return true if INSN is a scalar move.  */
> > +
> > +static bool
> > +scalar_move_insn_p (const rtx_insn *insn)
> > +{
> > +  rtx set = single_set (insn);
> > +  if (!set)
> > +    return false;
> > +  rtx src = SET_SRC (set);
> > +  rtx dest = SET_DEST (set);
> > +  return is_a<scalar_mode>(GET_MODE (dest))
> > +      && aarch64_mov_operand_p (src, GET_MODE (src));
>
> Formatting:
>
>   return (is_a<scalar_mode>(GET_MODE (dest))
>           && aarch64_mov_operand_p (src, GET_MODE (src)));
>
> OK with that change if the tests pass, thanks.
Unfortunately, the patch regressed vec-init-21.c:

int8x16_t f_s8(int8_t x, int8_t y)
{
  return (int8x16_t) { x, y, 1, 2, 3, 4, 5, 6,
                       7, 8, 9, 10, 11, 12, 13, 14 };
}

-O3 code-gen trunk:
f_s8:
        adrp    x2, .LC0
        ldr     q0, [x2, #:lo12:.LC0]
        ins     v0.b[0], w0
        ins     v0.b[1], w1
        ret

-O3 code-gen patch:
f_s8:
        adrp    x2, .LC0
        ldr     d31, [x2, #:lo12:.LC0]
        adrp    x2, .LC1
        ldr     d0, [x2, #:lo12:.LC1]
        ins     v31.b[0], w0
        ins     v0.b[0], w1
        zip1    v0.16b, v31.16b, v0.16b
        ret

With trunk, it chooses the fallback sequence because both fallback
and zip1 sequence had cost = 20, however with patch applied,
we end up with zip1 sequence cost = 24 and fallback sequence
cost = 28.

This happens because of using insn_cost instead of
set_rtx_cost for the following expression:
(set (reg:QI 100)
    (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0))
set_rtx_cost returns 0 for above expression but insn_cost returns 4.

This expression template appears twice in fallback sequence, which raises
the cost to 28 from 20, while it appears once in each half of zip1 sequence,
which raises the cost to 24 from 20, and so it now prefers zip1 sequence
instead.

I assumed this expression would be ignored because it looks like a scalar move,
but that doesn't seem to be the case ?
aarch64_classify_symbolic_expression returns
SYMBOL_FORCE_TO_MEM for (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)
and thus aarch64_mov_operand_p returns false.

Another issue with the zip1 sequence above is using same register x2
for loading another half of constant in:
adrp    x2, .LC1

I guess this will create an output dependency from adrp x2, .LC0 ->
adrp x2, .LC1
and anti-dependency from  ldr d31, [x2, #:lo12:.LC0] -> adrp x2, .LC1
essentially forcing almost the entire sequence (except ins
instructions) to execute sequentially ?

Fallback sequence rtl, cost = 28
(set (reg:V16QI 96)
    (const_vector:V16QI [
            (const_int 7 [0x7])
            (const_int 8 [0x8])
            (const_int 1 [0x1])
            (const_int 2 [0x2])
            (const_int 3 [0x3])
            (const_int 4 [0x4])
            (const_int 5 [0x5])
            (const_int 6 [0x6])
            (const_int 7 [0x7])
            (const_int 8 [0x8])
            (const_int 9 [0x9])
            (const_int 10 [0xa])
            (const_int 11 [0xb])
            (const_int 12 [0xc])
            (const_int 13 [0xd])
            (const_int 14 [0xe])
        ]))
cost = 12
(set (reg:QI 101)
    (subreg/s/u:QI (reg/v:SI 93 [ x ]) 0))
cost = 4
(set (reg:V16QI 96)
    (vec_merge:V16QI (vec_duplicate:V16QI (reg:QI 101))
        (reg:V16QI 96)
        (const_int 1 [0x1])))
cost = 4
(set (reg:QI 102)
    (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0))
cost = 4
(set (reg:V16QI 96)
    (vec_merge:V16QI (vec_duplicate:V16QI (reg:QI 102))
        (reg:V16QI 96)
        (const_int 2 [0x2])))
cost = 4

zip1 sequence rtl, cost = 24
(set (reg:V8QI 97)
    (const_vector:V8QI [
            (const_int 7 [0x7])
            (const_int 1 [0x1])
            (const_int 3 [0x3])
            (const_int 5 [0x5])
            (const_int 7 [0x7])
            (const_int 9 [0x9])
            (const_int 11 [0xb])
            (const_int 13 [0xd])
        ]))
cost = 12
(set (reg:QI 98)
    (subreg/s/u:QI (reg/v:SI 93 [ x ]) 0))
cost = 4
(set (reg:V8QI 97)
    (vec_merge:V8QI (vec_duplicate:V8QI (reg:QI 98))
        (reg:V8QI 97)
        (const_int 1 [0x1])))
cost = 4
(set (reg:V8QI 99)
    (const_vector:V8QI [
            (const_int 8 [0x8])
            (const_int 2 [0x2])
            (const_int 4 [0x4])
            (const_int 6 [0x6])
            (const_int 8 [0x8])
            (const_int 10 [0xa])
            (const_int 12 [0xc])
            (const_int 14 [0xe])
        ]))
cost = 12
(set (reg:QI 100)
    (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0))
cost = 4
(set (reg:V8QI 99)
    (vec_merge:V8QI (vec_duplicate:V8QI (reg:QI 100))
        (reg:V8QI 99)
        (const_int 1 [0x1])))
cost = 4
(set (reg:V16QI 96)
    (unspec:V16QI [
            (subreg:V16QI (reg:V8QI 97) 0)
            (subreg:V16QI (reg:V8QI 99) 0)
        ] UNSPEC_ZIP1))
cost = 4

Thanks,
Prathamesh
>
> Richard

next prev parent reply	other threads:[~2023-05-24  9:30 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-03  7:16 Prathamesh Kulkarni
2023-02-13  6:28 ` Prathamesh Kulkarni
2023-04-03 18:12   ` Prathamesh Kulkarni
2023-04-25 10:59 ` Richard Sandiford
2023-05-02  5:41   ` Prathamesh Kulkarni
2023-05-02  9:25     ` Richard Sandiford
2023-05-02 10:22       ` Prathamesh Kulkarni
2023-05-02 12:02         ` Richard Sandiford
2023-05-02 12:38           ` Prathamesh Kulkarni
2023-05-02 12:52             ` Richard Sandiford
2023-05-03 11:28               ` Prathamesh Kulkarni
2023-05-11 19:15                 ` Richard Sandiford
2023-05-15 14:09                   ` Prathamesh Kulkarni
2023-05-15 18:59                     ` Richard Sandiford
2023-05-17 15:23                       ` Prathamesh Kulkarni
2023-05-18  8:07                         ` Richard Sandiford
2023-05-18 14:41                           ` Prathamesh Kulkarni
2023-05-18 16:34                             ` Richard Sandiford
2023-05-19 10:56                               ` Prathamesh Kulkarni
2023-05-22  8:48                                 ` Richard Sandiford
2023-05-24  9:29                                   ` Prathamesh Kulkarni [this message]
2023-05-24 10:10                                     ` Richard Sandiford
2023-05-24 19:13                                       ` Prathamesh Kulkarni
2023-05-24 19:58                                         ` Richard Sandiford
2023-05-25  6:47                                           ` Prathamesh Kulkarni
2023-05-25  7:34                                             ` Richard Sandiford
2023-05-25  9:56                                               ` Prathamesh Kulkarni
2023-05-26  3:04                                                 ` Prathamesh Kulkarni
2023-05-30 18:53                                                   ` Richard Sandiford
2023-06-12 17:52                                                     ` Prathamesh Kulkarni
2023-05-24 19:50                                       ` Prathamesh Kulkarni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAAgBjMksK3d5Wtv3bOAeL4Rjjwt8_yUV3yt47uxuLRYWPsXJog@mail.gmail.com \
    --to=prathamesh.kulkarni@linaro.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).