From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <mptlf5id9bw.fsf@arm.com> <mpto8aebumt.fsf@arm.com>
In-Reply-To: <mpto8aebumt.fsf@arm.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Wed, 4 Aug 2021 13:45:45 +0200
Message-ID: <CAFiYyc2asRzpT7Rp5UTqDo==_qG91DdyOkxWJB38P9fyU4TokA@mail.gmail.com>
Subject: Re: [PATCH 6/8] aarch64: Tweak MLA vector costs
To: Richard Sandiford <richard.sandiford@arm.com>,
 GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>

On Tue, Aug 3, 2021 at 2:10 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> The issue-based vector costs currently assume that a multiply-add
> sequence can be implemented using a single instruction.  This is
> generally true for scalars (which have a 4-operand instruction)
> and SVE (which allows the output to be tied to any input).
> However, for Advanced SIMD, multiplying two values and adding
> an invariant will end up being a move and an MLA.
>
> The only target to use the issue-based vector costs is Neoverse V1,
> which would generally prefer SVE in this case anyway.  I therefore
> don't have a self-contained testcase.  However, the distinction
> becomes more important with a later patch.

But we do cost any invariants separately (for the prologue), so they
should already be available in a register.  Why doesn't that work here?
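
To make the case concrete, here is a minimal C sketch of the kind of loop
under discussion.  It is an illustration only, not code from the patch, and
the names madd_invariant, mla_model and madd_invariant_mla_model are
hypothetical.  The first function multiplies two values and adds a
loop-invariant addend.  The second models the destructive accumulator of
Advanced SIMD MLA, where the accumulator is both an input and the output:
even with the invariant already in a register, it has to be copied before
each accumulate if it is to survive into the next iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Loop under discussion: multiply two loaded values and add a
   loop-invariant addend K.  Unsigned arithmetic keeps overflow
   well-defined.  */
void
madd_invariant (uint32_t *dst, const uint32_t *b, const uint32_t *c,
                uint32_t k, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dst[i] = b[i] * c[i] + k;
}

/* Hypothetical model of a destructive multiply-accumulate: ACC is both
   an input and the output, as with Advanced SIMD MLA.  */
static void
mla_model (uint32_t *acc, uint32_t x, uint32_t y)
{
  *acc += x * y;
}

/* The same computation using only the destructive form.  K must stay
   live for the next iteration, so each iteration first copies it (the
   "MOV") and then accumulates into the copy (the "MLA").  Returns the
   number of modelled operations so that the extra copy is visible.  */
size_t
madd_invariant_mla_model (uint32_t *dst, const uint32_t *b,
                          const uint32_t *c, uint32_t k, size_t n)
{
  size_t ops = 0;
  for (size_t i = 0; i < n; ++i)
    {
      uint32_t acc = k;              /* MOV: preserve the invariant.  */
      ++ops;
      mla_model (&acc, b[i], c[i]);  /* MLA: acc += b[i] * c[i].  */
      ++ops;
      dst[i] = acc;
    }
  return ops;
}
```

Both functions produce the same results; the model simply counts two
operations per element instead of the one that the current costing assumes.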

> gcc/
>         * config/aarch64/aarch64.c (aarch64_multiply_add_p): Add a vec_flags
>         parameter.  Detect cases in which an Advanced SIMD MLA would almost
>         certainly require a MOV.
>         (aarch64_count_ops): Update accordingly.
> ---
>  gcc/config/aarch64/aarch64.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 084f8caa0da..19045ef6944 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -14767,9 +14767,12 @@ aarch64_integer_truncation_p (stmt_vec_info stmt_info)
>
>  /* Return true if STMT_INFO is the second part of a two-statement multiply-add
>     or multiply-subtract sequence that might be suitable for fusing into a
> -   single instruction.  */
> +   single instruction.  If VEC_FLAGS is zero, analyze the operation as
> +   a scalar one, otherwise analyze it as an operation on vectors with those
> +   VEC_* flags.  */
>  static bool
> -aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info)
> +aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
> +                       unsigned int vec_flags)
>  {
>    gassign *assign = dyn_cast<gassign *> (stmt_info->stmt);
>    if (!assign)
> @@ -14797,6 +14800,22 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info)
>        if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
>         continue;
>
> +      if (vec_flags & VEC_ADVSIMD)
> +       {
> +         /* Scalar and SVE code can tie the result to any FMLA input (or none,
> +            although that requires a MOVPRFX for SVE).  However, Advanced SIMD
> +            only supports MLA forms, so will require a move if the result
> +            cannot be tied to the accumulator.  The most important case in
> +            which this is true is when the accumulator input is invariant.  */
> +         rhs = gimple_op (assign, 3 - i);
> +         if (TREE_CODE (rhs) != SSA_NAME)
> +           return false;
> +         def_stmt_info = vinfo->lookup_def (rhs);
> +         if (!def_stmt_info
> +             || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
> +           return false;
> +       }
> +
>        return true;
>      }
>    return false;
> @@ -15232,7 +15251,7 @@ aarch64_count_ops (class vec_info *vinfo, aarch64_vector_costs *costs,
>      }
>
>    /* Assume that multiply-adds will become a single operation.  */
> -  if (stmt_info && aarch64_multiply_add_p (vinfo, stmt_info))
> +  if (stmt_info && aarch64_multiply_add_p (vinfo, stmt_info, vec_flags))
>      return;
>
>    /* When costing scalar statements in vector code, the count already