From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=4MOh=BN=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id CEB653858D28
	for <gcc-patches@gcc.gnu.org>; Wed, 24 May 2023 19:58:08 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CEB653858D28
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 78B6B1FB;
	Wed, 24 May 2023 12:58:53 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E93D23F6C4;
	Wed, 24 May 2023 12:58:07 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Mail-Followup-To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>,gcc Patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [aarch64] Code-gen for vector initialization involving constants
References: <CAAgBjMnwGk4fOc3PTM_agTXXFvt=767a3-AWOfSr23Xja6K81w@mail.gmail.com>
	<mpt354o448m.fsf@arm.com>
	<CAAgBjMm2o_64mbmOsnzjZMrLNRozav95sCFR73cz+iA2SLURwg@mail.gmail.com>
	<mpt354fw0dc.fsf@arm.com>
	<CAAgBjM=ryaC9h-9N_WWmPjcqE_fEA7xXPK8wye8TvFXsnMTyHg@mail.gmail.com>
	<mpto7n3ueju.fsf@arm.com>
	<CAAgBjMnP5Z3bsge08n5GWszBBqPgwV66KaZjr6vYOs3GUvv2qw@mail.gmail.com>
	<mpt7ctqvqsz.fsf@arm.com>
	<CAAgBjMn6tOT3HpOdjB3f47D_qAVC2S1jsoGVGvnJ7uk6vDcKdA@mail.gmail.com>
	<mpt3542el2f.fsf@arm.com>
	<CAAgBjMm9__ZPxuNvtaiwSFaS35kWqj2StsQY6atYnUPMzUYYsw@mail.gmail.com>
	<mpth6sd5slj.fsf@arm.com>
	<CAAgBjMnAT6keAqMqtj2+0Z6ZKyKDQ+e+jWHANgEESVFHG_WM2w@mail.gmail.com>
	<mpt353uysem.fsf@arm.com>
	<CAAgBjMnXfy0R8AjJhaXZ7Af7CF7n=xgPEycc4TKJ77V-Tcy5mQ@mail.gmail.com>
	<mptjzx5y4wj.fsf@arm.com>
	<CAAgBjMkzx9rB_WQD78jv18NiV+R+5bg9Jr3UFPonVmxTSSh+OA@mail.gmail.com>
	<mptilckpx8n.fsf@arm.com>
	<CAAgBjMksK3d5Wtv3bOAeL4Rjjwt8_yUV3yt47uxuLRYWPsXJog@mail.gmail.com>
	<mptwn0yjazm.fsf@arm.com>
	<CAAgBjMmB9WqP85xw3u7jFrk6Geq5E2LDgKorAJn0G+gkknTZMw@mail.gmail.com>
Date: Wed, 24 May 2023 20:58:06 +0100
In-Reply-To: <CAAgBjMmB9WqP85xw3u7jFrk6Geq5E2LDgKorAJn0G+gkknTZMw@mail.gmail.com>
	(Prathamesh Kulkarni's message of "Thu, 25 May 2023 00:43:27 +0530")
Message-ID: <mptv8ghcxip.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-27.8 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_STOCKGEN,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Wed, 24 May 2023 at 15:40, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > On Mon, 22 May 2023 at 14:18, Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> >> > Hi Richard,
>> >> > Thanks for the suggestions. Does the attached patch look OK ?
>> >> > Boostrap+test in progress on aarch64-linux-gnu.
>> >>
>> >> Like I say, please wait for the tests to complete before sending an R=
FA.
>> >> It saves a review cycle if the tests don't in fact pass.
>> > Right, sorry, will post patches after completion of testing henceforth.
>> >>
>> >> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aar=
ch64.cc
>> >> > index 29dbacfa917..e611a7cca25 100644
>> >> > --- a/gcc/config/aarch64/aarch64.cc
>> >> > +++ b/gcc/config/aarch64/aarch64.cc
>> >> > @@ -22332,6 +22332,43 @@ aarch64_unzip_vector_init (machine_mode mo=
de, rtx vals, bool even_p)
>> >> >    return gen_rtx_PARALLEL (new_mode, vec);
>> >> >  }
>> >> >
>> >> > +/* Return true if INSN is a scalar move.  */
>> >> > +
>> >> > +static bool
>> >> > +scalar_move_insn_p (const rtx_insn *insn)
>> >> > +{
>> >> > +  rtx set =3D single_set (insn);
>> >> > +  if (!set)
>> >> > +    return false;
>> >> > +  rtx src =3D SET_SRC (set);
>> >> > +  rtx dest =3D SET_DEST (set);
>> >> > +  return is_a<scalar_mode>(GET_MODE (dest))
>> >> > +      && aarch64_mov_operand_p (src, GET_MODE (src));
>> >>
>> >> Formatting:
>> >>
>> >>   return (is_a<scalar_mode>(GET_MODE (dest))
>> >>           && aarch64_mov_operand_p (src, GET_MODE (src)));
>> >>
>> >> OK with that change if the tests pass, thanks.
>> > Unfortunately, the patch regressed vec-init-21.c:
>> >
>> > int8x16_t f_s8(int8_t x, int8_t y)
>> > {
>> >   return (int8x16_t) { x, y, 1, 2, 3, 4, 5, 6,
>> >                        7, 8, 9, 10, 11, 12, 13, 14 };
>> > }
>> >
>> > -O3 code-gen trunk:
>> > f_s8:
>> >         adrp    x2, .LC0
>> >         ldr     q0, [x2, #:lo12:.LC0]
>> >         ins     v0.b[0], w0
>> >         ins     v0.b[1], w1
>> >         ret
>> >
>> > -O3 code-gen patch:
>> > f_s8:
>> >         adrp    x2, .LC0
>> >         ldr     d31, [x2, #:lo12:.LC0]
>> >         adrp    x2, .LC1
>> >         ldr     d0, [x2, #:lo12:.LC1]
>> >         ins     v31.b[0], w0
>> >         ins     v0.b[0], w1
>> >         zip1    v0.16b, v31.16b, v0.16b
>> >         ret
>> >
>> > With trunk, it chooses the fallback sequence because both fallback
>> > and zip1 sequence had cost =3D 20, however with patch applied,
>> > we end up with zip1 sequence cost =3D 24 and fallback sequence
>> > cost =3D 28.
>> >
>> > This happens because of using insn_cost instead of
>> > set_rtx_cost for the following expression:
>> > (set (reg:QI 100)
>> >     (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0))
>> > set_rtx_cost returns 0 for above expression but insn_cost returns 4.
>>
>> Yeah, was wondering why you'd dropped the set_rtx_cost thing,
>> but decided not to question it since using insn_cost seemed
>> reasonable if it worked.
> The attached patch uses set_rtx_cost for single_set and insn_cost
> otherwise for non debug insns similar to seq_cost.

FWIW, I think with the aarch64_mov_operand fix, the old way of using
insn_cost for everything would have worked too.  But either way is fine.

>> > This expression template appears twice in fallback sequence, which rai=
ses
>> > the cost to 28 from 20, while it appears once in each half of zip1 seq=
uence,
>> > which raises the cost to 24 from 20, and so it now prefers zip1 sequen=
ce
>> > instead.
>> >
>> > I assumed this expression would be ignored because it looks like a sca=
lar move,
>> > but that doesn't seem to be the case ?
>> > aarch64_classify_symbolic_expression returns
>> > SYMBOL_FORCE_TO_MEM for (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)
>> > and thus aarch64_mov_operand_p returns false.
>>
>> Ah, I guess it should be aarch64_mov_operand instead.  Confusing that
>> they're so different...
> Thanks, using aarch64_mov_operand worked.
>>
>> > Another issue with the zip1 sequence above is using same register x2
>> > for loading another half of constant in:
>> > adrp    x2, .LC1
>> >
>> > I guess this will create an output dependency from adrp x2, .LC0 ->
>> > adrp x2, .LC1
>> > and anti-dependency from  ldr d31, [x2, #:lo12:.LC0] -> adrp x2, .LC1
>> > essentially forcing almost the entire sequence (except ins
>> > instructions) to execute sequentially ?
>>
>> I'd expect modern cores to handle that via renaming.
> Ah right, thanks for the clarification.
>
> For some reason, it seems git diff is not formatting the patch correctly =
:/
> Or perhaps I am doing something wrongly.

No, I think it's fine.  It's just tabs vs. spaces.  A leading
"+" followed by a tab is still only indented 8 columns, whereas
"+" followed by 6 spaces is indented 7 columns.  So indentation
can look a bit weird in the diff.

I was accounting for that though. :)

> For eg, it shows:
> +  return is_a<scalar_mode>(GET_MODE (dest))
> +        && aarch64_mov_operand (src, GET_MODE (src));
> but after applying the patch, it's formatted correctly with "&&
> aarch64..."  right below is_a<scalar_mode>, both on column 10.

Yeah, the indentation itself was OK.  But there's an =E2=80=9Cemacs rule=E2=
=80=9D
that says that parens should be used when splitting an expression
over multiple lines like this.  So:

-------
Formatting:

  return (is_a<scalar_mode>(GET_MODE (dest))
	  && aarch64_mov_operand_p (src, GET_MODE (src)));
-------

was about adding the parens.

> +  for (; seq; seq =3D NEXT_INSN (seq))
> +    if (NONDEBUG_INSN_P (seq)
> +	&& !scalar_move_insn_p (seq))
> +      {
> +	if (rtx set =3D single_set (seq))
> +	  cost +=3D set_rtx_cost (set, speed);
> +	else
> +	  {
> +	    int this_cost =3D insn_cost (CONST_CAST_RTX_INSN (seq), speed);
> +	    if (this_cost > 0)
> +	      cost +=3D this_cost;
> +	    else
> +	      cost++;
> +	  }
> +      }

I think it'd be better to do the single_set first, and pass the set
to scalar_move_insn_p.  I.e.

  for (; seq; seq =3D NEXT_INSN (seq))
    if (NONDEBUG_INSN_P (seq))
      {
	if (rtx set =3D single_set (seq))
	  {
	    if (!scalar_move_insn_p (set))
	      cost +=3D set_rtx_cost (set, speed);
	  }
	else
	  {
	    int this_cost =3D insn_cost (CONST_CAST_RTX_INSN (seq), speed);
	    if (this_cost > 0)
	      cost +=3D this_cost;
	    else
	      cost++;
	  }
      }

Then scalar_move_insn_p can just be the last three statements,
adjusted as follows:

  rtx src =3D SET_SRC (set);
  rtx dest =3D SET_DEST (set);
  return (is_a<scalar_mode> (GET_MODE (dest))
	  && aarch64_mov_operand (src, GET_MODE (dest)));

Note the space after ">", and that the mode passed to aarch64_mov_operand
is the destination mode (since the source mode might be VOIDmode).

OK with that change, thanks.

Thanks,
Richard