From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 462E3393AC16 for ; Tue, 3 Aug 2021 12:06:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 462E3393AC16 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id F0EE613A1 for ; Tue, 3 Aug 2021 05:06:03 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 98EF23F40C for ; Tue, 3 Aug 2021 05:06:03 -0700 (PDT) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: [PATCH 6/8] aarch64: Tweak MLA vector costs References: Date: Tue, 03 Aug 2021 13:06:02 +0100 In-Reply-To: (Richard Sandiford's message of "Tue, 03 Aug 2021 13:03:15 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Aug 2021 12:06:05 -0000 The issue-based vector costs currently assume that a multiply-add sequence can be implemented using a single instruction. This is generally true for scalars (which have a 4-operand instruction) and SVE (which allows the output to be tied to any input). However, for Advanced SIMD, multiplying two values and adding an invariant will end up being a move and an MLA. The only target to use the issue-based vector costs is Neoverse V1, which would generally prefer SVE in this case anyway. I therefore don't have a self-contained testcase. However, the distinction becomes more important with a later patch. gcc/ * config/aarch64/aarch64.c (aarch64_multiply_add_p): Add a vec_flags parameter. Detect cases in which an Advanced SIMD MLA would almost certainly require a MOV. (aarch64_count_ops): Update accordingly. --- gcc/config/aarch64/aarch64.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 084f8caa0da..19045ef6944 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -14767,9 +14767,12 @@ aarch64_integer_truncation_p (stmt_vec_info stmt_info) /* Return true if STMT_INFO is the second part of a two-statement multiply-add or multiply-subtract sequence that might be suitable for fusing into a - single instruction. */ + single instruction. If VEC_FLAGS is zero, analyze the operation as + a scalar one, otherwise analyze it as an operation on vectors with those + VEC_* flags. */ static bool -aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info) +aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info, + unsigned int vec_flags) { gassign *assign = dyn_cast (stmt_info->stmt); if (!assign) @@ -14797,6 +14800,22 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info) if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR) continue; + if (vec_flags & VEC_ADVSIMD) + { + /* Scalar and SVE code can tie the result to any FMLA input (or none, + although that requires a MOVPRFX for SVE). However, Advanced SIMD + only supports MLA forms, so will require a move if the result + cannot be tied to the accumulator. The most important case in + which this is true is when the accumulator input is invariant. */ + rhs = gimple_op (assign, 3 - i); + if (TREE_CODE (rhs) != SSA_NAME) + return false; + def_stmt_info = vinfo->lookup_def (rhs); + if (!def_stmt_info + || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def) + return false; + } + return true; } return false; @@ -15232,7 +15251,7 @@ aarch64_count_ops (class vec_info *vinfo, aarch64_vector_costs *costs, } /* Assume that multiply-adds will become a single operation. */ - if (stmt_info && aarch64_multiply_add_p (vinfo, stmt_info)) + if (stmt_info && aarch64_multiply_add_p (vinfo, stmt_info, vec_flags)) return; /* When costing scalar statements in vector code, the count already