From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 462E3393AC16
 for <gcc-patches@gcc.gnu.org>; Tue,  3 Aug 2021 12:06:04 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 462E3393AC16
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id F0EE613A1
 for <gcc-patches@gcc.gnu.org>; Tue,  3 Aug 2021 05:06:03 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 98EF23F40C
 for <gcc-patches@gcc.gnu.org>; Tue,  3 Aug 2021 05:06:03 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: [PATCH 6/8] aarch64: Tweak MLA vector costs
References: <mptlf5id9bw.fsf@arm.com>
Date: Tue, 03 Aug 2021 13:06:02 +0100
In-Reply-To: <mptlf5id9bw.fsf@arm.com> (Richard Sandiford's message of "Tue,
 03 Aug 2021 13:03:15 +0100")
Message-ID: <mpto8aebumt.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Aug 2021 12:06:05 -0000

The issue-based vector costs currently assume that a multiply-add
sequence can be implemented using a single instruction.  This is
generally true for scalars (which have a 4-operand instruction)
and SVE (which allows the output to be tied to any input).
However, for Advanced SIMD, multiplying two values and adding
an invariant will end up being a move and an MLA.

The only target to use the issue-based vector costs is Neoverse V1,
which would generally prefer SVE in this case anyway.  I therefore
don't have a self-contained testcase.  However, the distinction
becomes more important with a later patch.

gcc/
	* config/aarch64/aarch64.c (aarch64_multiply_add_p): Add a vec_flags
	parameter.  Detect cases in which an Advanced SIMD MLA would almost
	certainly require a MOV.
	(aarch64_count_ops): Update accordingly.
---
 gcc/config/aarch64/aarch64.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 084f8caa0da..19045ef6944 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14767,9 +14767,12 @@ aarch64_integer_truncation_p (stmt_vec_info stmt_info)
 
 /* Return true if STMT_INFO is the second part of a two-statement multiply-add
    or multiply-subtract sequence that might be suitable for fusing into a
-   single instruction.  */
+   single instruction.  If VEC_FLAGS is zero, analyze the operation as
+   a scalar one, otherwise analyze it as an operation on vectors with those
+   VEC_* flags.  */
 static bool
-aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info)
+aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
+			unsigned int vec_flags)
 {
   gassign *assign = dyn_cast<gassign *> (stmt_info->stmt);
   if (!assign)
@@ -14797,6 +14800,22 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info)
       if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
 	continue;
 
+      if (vec_flags & VEC_ADVSIMD)
+	{
+	  /* Scalar and SVE code can tie the result to any FMLA input (or none,
+	     although that requires a MOVPRFX for SVE).  However, Advanced SIMD
+	     only supports MLA forms, so will require a move if the result
+	     cannot be tied to the accumulator.  The most important case in
+	     which this is true is when the accumulator input is invariant.  */
+	  rhs = gimple_op (assign, 3 - i);
+	  if (TREE_CODE (rhs) != SSA_NAME)
+	    return false;
+	  def_stmt_info = vinfo->lookup_def (rhs);
+	  if (!def_stmt_info
+	      || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
+	    return false;
+	}
+
       return true;
     }
   return false;
@@ -15232,7 +15251,7 @@ aarch64_count_ops (class vec_info *vinfo, aarch64_vector_costs *costs,
     }
 
   /* Assume that multiply-adds will become a single operation.  */
-  if (stmt_info && aarch64_multiply_add_p (vinfo, stmt_info))
+  if (stmt_info && aarch64_multiply_add_p (vinfo, stmt_info, vec_flags))
     return;
 
   /* When costing scalar statements in vector code, the count already