From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=4MOh=BN=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id F2A7C3858CDB
	for <gcc-patches@gcc.gnu.org>; Wed, 24 May 2023 10:10:38 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F2A7C3858CDB
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BA3E91042;
	Wed, 24 May 2023 03:11:23 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3B87E3F67D;
	Wed, 24 May 2023 03:10:38 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Mail-Followup-To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>,gcc Patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [aarch64] Code-gen for vector initialization involving constants
References: <CAAgBjMnwGk4fOc3PTM_agTXXFvt=767a3-AWOfSr23Xja6K81w@mail.gmail.com>
	<mpt354o448m.fsf@arm.com>
	<CAAgBjMm2o_64mbmOsnzjZMrLNRozav95sCFR73cz+iA2SLURwg@mail.gmail.com>
	<mpt354fw0dc.fsf@arm.com>
	<CAAgBjM=ryaC9h-9N_WWmPjcqE_fEA7xXPK8wye8TvFXsnMTyHg@mail.gmail.com>
	<mpto7n3ueju.fsf@arm.com>
	<CAAgBjMnP5Z3bsge08n5GWszBBqPgwV66KaZjr6vYOs3GUvv2qw@mail.gmail.com>
	<mpt7ctqvqsz.fsf@arm.com>
	<CAAgBjMn6tOT3HpOdjB3f47D_qAVC2S1jsoGVGvnJ7uk6vDcKdA@mail.gmail.com>
	<mpt3542el2f.fsf@arm.com>
	<CAAgBjMm9__ZPxuNvtaiwSFaS35kWqj2StsQY6atYnUPMzUYYsw@mail.gmail.com>
	<mpth6sd5slj.fsf@arm.com>
	<CAAgBjMnAT6keAqMqtj2+0Z6ZKyKDQ+e+jWHANgEESVFHG_WM2w@mail.gmail.com>
	<mpt353uysem.fsf@arm.com>
	<CAAgBjMnXfy0R8AjJhaXZ7Af7CF7n=xgPEycc4TKJ77V-Tcy5mQ@mail.gmail.com>
	<mptjzx5y4wj.fsf@arm.com>
	<CAAgBjMkzx9rB_WQD78jv18NiV+R+5bg9Jr3UFPonVmxTSSh+OA@mail.gmail.com>
	<mptilckpx8n.fsf@arm.com>
	<CAAgBjMksK3d5Wtv3bOAeL4Rjjwt8_yUV3yt47uxuLRYWPsXJog@mail.gmail.com>
Date: Wed, 24 May 2023 11:10:37 +0100
In-Reply-To: <CAAgBjMksK3d5Wtv3bOAeL4Rjjwt8_yUV3yt47uxuLRYWPsXJog@mail.gmail.com>
	(Prathamesh Kulkarni's message of "Wed, 24 May 2023 14:59:36 +0530")
Message-ID: <mptwn0yjazm.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-28.0 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_STOCKGEN,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Mon, 22 May 2023 at 14:18, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > Hi Richard,
>> > Thanks for the suggestions. Does the attached patch look OK ?
>> > Boostrap+test in progress on aarch64-linux-gnu.
>>
>> Like I say, please wait for the tests to complete before sending an RFA.
>> It saves a review cycle if the tests don't in fact pass.
> Right, sorry, will post patches after completion of testing henceforth.
>>
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index 29dbacfa917..e611a7cca25 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -22332,6 +22332,43 @@ aarch64_unzip_vector_init (machine_mode mode, rtx vals, bool even_p)
>> >    return gen_rtx_PARALLEL (new_mode, vec);
>> >  }
>> >
>> > +/* Return true if INSN is a scalar move.  */
>> > +
>> > +static bool
>> > +scalar_move_insn_p (const rtx_insn *insn)
>> > +{
>> > +  rtx set = single_set (insn);
>> > +  if (!set)
>> > +    return false;
>> > +  rtx src = SET_SRC (set);
>> > +  rtx dest = SET_DEST (set);
>> > +  return is_a<scalar_mode>(GET_MODE (dest))
>> > +      && aarch64_mov_operand_p (src, GET_MODE (src));
>>
>> Formatting:
>>
>>   return (is_a<scalar_mode>(GET_MODE (dest))
>>           && aarch64_mov_operand_p (src, GET_MODE (src)));
>>
>> OK with that change if the tests pass, thanks.
> Unfortunately, the patch regressed vec-init-21.c:
>
> int8x16_t f_s8(int8_t x, int8_t y)
> {
>   return (int8x16_t) { x, y, 1, 2, 3, 4, 5, 6,
>                        7, 8, 9, 10, 11, 12, 13, 14 };
> }
>
> -O3 code-gen trunk:
> f_s8:
>         adrp    x2, .LC0
>         ldr     q0, [x2, #:lo12:.LC0]
>         ins     v0.b[0], w0
>         ins     v0.b[1], w1
>         ret
>
> -O3 code-gen patch:
> f_s8:
>         adrp    x2, .LC0
>         ldr     d31, [x2, #:lo12:.LC0]
>         adrp    x2, .LC1
>         ldr     d0, [x2, #:lo12:.LC1]
>         ins     v31.b[0], w0
>         ins     v0.b[0], w1
>         zip1    v0.16b, v31.16b, v0.16b
>         ret
>
> With trunk, it chooses the fallback sequence because both fallback
> and zip1 sequence had cost = 20, however with patch applied,
> we end up with zip1 sequence cost = 24 and fallback sequence
> cost = 28.
>
> This happens because of using insn_cost instead of
> set_rtx_cost for the following expression:
> (set (reg:QI 100)
>     (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0))
> set_rtx_cost returns 0 for above expression but insn_cost returns 4.

Yeah, was wondering why you'd dropped the set_rtx_cost thing,
but decided not to question it since using insn_cost seemed
reasonable if it worked.

> This expression template appears twice in fallback sequence, which raises
> the cost to 28 from 20, while it appears once in each half of zip1 sequence,
> which raises the cost to 24 from 20, and so it now prefers zip1 sequence
> instead.
>
> I assumed this expression would be ignored because it looks like a scalar move,
> but that doesn't seem to be the case ?
> aarch64_classify_symbolic_expression returns
> SYMBOL_FORCE_TO_MEM for (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0)
> and thus aarch64_mov_operand_p returns false.

Ah, I guess it should be aarch64_mov_operand instead.  Confusing that
they're so different...

> Another issue with the zip1 sequence above is using same register x2
> for loading another half of constant in:
> adrp    x2, .LC1
>
> I guess this will create an output dependency from adrp x2, .LC0 ->
> adrp x2, .LC1
> and anti-dependency from  ldr d31, [x2, #:lo12:.LC0] -> adrp x2, .LC1
> essentially forcing almost the entire sequence (except ins
> instructions) to execute sequentially ?

I'd expect modern cores to handle that via renaming.

Thanks,
Richard