From mboxrd@z Thu Jan  1 00:00:00 1970
From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Mail-Followup-To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>,gcc Patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [aarch64] Code-gen for vector initialization involving constants
References: <CAAgBjMnwGk4fOc3PTM_agTXXFvt=767a3-AWOfSr23Xja6K81w@mail.gmail.com>
Date: Tue, 25 Apr 2023 11:59:21 +0100
In-Reply-To: <CAAgBjMnwGk4fOc3PTM_agTXXFvt=767a3-AWOfSr23Xja6K81w@mail.gmail.com>
	(Prathamesh Kulkarni's message of "Fri, 3 Feb 2023 12:46:33 +0530")
Message-ID: <mpt354o448m.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
List-Id: <gcc-patches.gcc.gnu.org>

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> Hi Richard,
> While digging thru aarch64_expand_vector_init, I noticed it gives
> priority to loading a constant first:
>  /* Initialise a vector which is part-variable.  We want to first try
>      to build those lanes which are constant in the most efficient way we
>      can.  */
>
> which results in suboptimal code-gen for following case:
> int16x8_t f_s16(int16_t x)
> {
>   return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> }
>
> code-gen trunk:
> f_s16:
>         movi    v0.8h, 0x1
>         ins     v0.h[0], w0
>         ins     v0.h[1], w0
>         ins     v0.h[2], w0
>         ins     v0.h[3], w0
>         ins     v0.h[4], w0
>         ins     v0.h[5], w0
>         ins     v0.h[6], w0
>         ret
>
> The attached patch tweaks the following condition:
> if (n_var == n_elts && n_elts <= 16)
>   {
>     ...
>   }
>
> to pass if maxv >= 80% of n_elts, with 80% being an
> arbitrary "high enough" threshold. The intent is to dup
> the most repeating variable if its repetition
> is "high enough" and insert constants which should be "better" than
> loading constant first and inserting variables like in the above case.

I'm not too keen on the 80%.  Like you say, it seems a bit arbitrary.

The case above can also be handled by relaxing n_var == n_elts to
n_var >= n_elts - 1, so that if there's just one constant element,
we look for duplicated variable elements.  If there are none
(maxv == 1), but there is a constant element, we can duplicate
the constant element into a register.
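
To make that selection concrete, here is a rough standalone sketch of the
decision in plain C.  It is not the aarch64_expand_vector_init code: the
struct lane model, the enum and the name choose_strategy are all
hypothetical, and the n_elts <= 16 bound is simply carried over from the
existing condition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one vector lane: either a constant or a
   variable identified by a small integer id.  */
struct lane {
  bool is_const;
  int id;   /* variable id when !is_const, constant value otherwise */
};

/* Hypothetical strategy names, for illustration only.  */
enum init_strategy {
  STRATEGY_DUP_VAR,     /* DUP the most-repeated variable, INS the rest */
  STRATEGY_DUP_CONST,   /* duplicate the lone constant, INS the variables */
  STRATEGY_CONST_FIRST  /* existing path: build the constant lanes first */
};

enum init_strategy
choose_strategy (const struct lane *lanes, size_t n_elts)
{
  size_t n_var = 0, maxv = 0;

  /* Count the variable lanes and find the largest number of lanes
     that share one variable (maxv).  */
  for (size_t i = 0; i < n_elts; i++)
    {
      if (lanes[i].is_const)
        continue;
      n_var++;
      size_t count = 0;
      for (size_t j = 0; j < n_elts; j++)
        if (!lanes[j].is_const && lanes[j].id == lanes[i].id)
          count++;
      if (count > maxv)
        maxv = count;
    }

  /* Relaxed test, n_var >= n_elts - 1: at most one constant lane.
     Written as n_var + 1 >= n_elts to avoid unsigned wrap-around.  */
  if (n_var + 1 >= n_elts && n_elts <= 16)
    {
      if (n_var == n_elts || maxv > 1)
        return STRATEGY_DUP_VAR;
      /* maxv <= 1 and exactly one constant lane.  */
      return STRATEGY_DUP_CONST;
    }
  return STRATEGY_CONST_FIRST;
}
```

With this model, { x, x, x, x, x, x, x, 1 } selects STRATEGY_DUP_VAR,
{ a, b, c, 1 } selects STRATEGY_DUP_CONST, and anything with two or more
constant lanes falls back to STRATEGY_CONST_FIRST.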

The case when there's more than one constant element needs more thought
(and testcases :-)).  E.g. after a certain point, it would probably be
better to load the variable and constant parts separately and blend them
using TBL.  It also matters whether the constants are equal or not.

There are also cases that could be handled using EXT.

Plus, if we're inserting many variable elements that are already
in GPRs, we can probably do better by coalescing them into bigger
GPR values and inserting them as wider elements.
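
As a sketch of the coalescing idea only (the helper names pack_h2 and
pack_h4 are hypothetical, and the layout assumes little-endian lane
numbering, with the lower-numbered lane in the low bits), two 16-bit
values can be combined in a GPR so that one 32-bit lane insert covers
two 16-bit lanes:

```c
#include <stdint.h>

/* Combine two 16-bit lane values into one 32-bit value.  LO lands in
   the lower-numbered lane under the little-endian assumption above.  */
uint32_t
pack_h2 (int16_t lo, int16_t hi)
{
  return (uint32_t) (uint16_t) lo | ((uint32_t) (uint16_t) hi << 16);
}

/* Combine four 16-bit lane values into one 64-bit value, so a single
   64-bit lane insert would cover four 16-bit lanes.  */
uint64_t
pack_h4 (int16_t e0, int16_t e1, int16_t e2, int16_t e3)
{
  return (uint64_t) pack_h2 (e0, e1)
         | ((uint64_t) pack_h2 (e2, e3) << 32);
}
```

Whether the extra GPR shifts and ORs pay for the saved INS instructions
would need measuring; the sketch only shows the data layout.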

Because of things like that, I think we should stick to the
single-constant case for now.

Thanks,
Richard