From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 1130)
	id 92169393B004; Tue, 3 Aug 2021 12:01:26 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 92169393B004
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
From: Richard Sandiford
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r12-2692] aarch64: Tweak the cost of elementwise stores
X-Act-Checkin: gcc
X-Git-Author: Richard Sandiford
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 78770e0e5d9fef70679e1db4eb2fb06596fbb2f8
X-Git-Newrev: 537afb0857c8f60c2b60a09fad4660420cd13e8f
Message-Id: <20210803120126.92169393B004@sourceware.org>
Date: Tue, 3 Aug 2021 12:01:26 +0000 (GMT)
X-BeenThere: gcc-cvs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-cvs mailing list
X-List-Received-Date: Tue, 03 Aug 2021 12:01:26 -0000

https://gcc.gnu.org/g:537afb0857c8f60c2b60a09fad4660420cd13e8f

commit r12-2692-g537afb0857c8f60c2b60a09fad4660420cd13e8f
Author: Richard Sandiford
Date:   Tue Aug 3 13:00:46 2021 +0100

    aarch64: Tweak the cost of elementwise stores

    When the vectoriser scalarises a strided store, it counts one
    scalar_store for each element plus one vec_to_scalar extraction
    for each element.  However, extracting element 0 is free on
    AArch64, so it should have zero cost.

    I don't have a testcase that requires this for existing -mtune
    options, but it becomes more important with a later patch.

    gcc/
            * config/aarch64/aarch64.c (aarch64_is_store_elt_extraction): New
            function, split out from...
            (aarch64_detect_vector_stmt_subtype): ...here.
            (aarch64_add_stmt_cost): Treat extracting element 0 as free.

Diff:
---
 gcc/config/aarch64/aarch64.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 36f11808916..084f8caa0da 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14622,6 +14622,18 @@ aarch64_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
     }
 }
 
+/* Return true if an operaton of kind KIND for STMT_INFO represents
+   the extraction of an element from a vector in preparation for
+   storing the element to memory.  */
+static bool
+aarch64_is_store_elt_extraction (vect_cost_for_stmt kind,
+                                 stmt_vec_info stmt_info)
+{
+  return (kind == vec_to_scalar
+          && STMT_VINFO_DATA_REF (stmt_info)
+          && DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)));
+}
+
 /* Return true if STMT_INFO represents part of a reduction.  */
 static bool
 aarch64_is_reduction (stmt_vec_info stmt_info)
@@ -14959,9 +14971,7 @@ aarch64_detect_vector_stmt_subtype (vec_info *vinfo, vect_cost_for_stmt kind,
   /* Detect cases in which vec_to_scalar is describing the extraction of a
      vector element in preparation for a scalar store.  The store itself is
      costed separately.  */
-  if (kind == vec_to_scalar
-      && STMT_VINFO_DATA_REF (stmt_info)
-      && DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)))
+  if (aarch64_is_store_elt_extraction (kind, stmt_info))
     return simd_costs->store_elt_extra_cost;
 
   /* Detect SVE gather loads, which are costed as a single scalar_load
@@ -15382,6 +15392,12 @@ aarch64_add_stmt_cost (class vec_info *vinfo, void *data, int count,
       if (vectype && aarch64_sve_only_stmt_p (stmt_info, vectype))
         costs->saw_sve_only_op = true;
 
+      /* If we scalarize a strided store, the vectorizer costs one
+         vec_to_scalar for each element.  However, we can store the first
+         element using an FP store without a separate extract step.  */
+      if (aarch64_is_store_elt_extraction (kind, stmt_info))
+        count -= 1;
+
       stmt_cost = aarch64_detect_scalar_stmt_subtype
         (vinfo, kind, stmt_info, stmt_cost);
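
For readers who want the arithmetic spelled out, here is a rough,
standalone sketch of how the element count changes for one scalarised
store.  It is not GCC code: struct toy_costs, toy_cost_before and
toy_cost_after are hypothetical names, the cost values are placeholders
rather than real -mtune numbers, and it assumes a single costing call
that covers one vector's worth of elements.

```c
#include <assert.h>

/* Hypothetical per-statement costs.  The real values come from the
   -mtune cost tables and are not given in this commit.  */
struct toy_costs
{
  int scalar_store_cost;    /* Cost of one scalar store.  */
  int store_elt_extra_cost; /* Cost of one vec_to_scalar extraction.  */
};

/* Cost of scalarising an NELEMS-element store before the patch:
   one scalar store and one extraction for every element.  */
static int
toy_cost_before (const struct toy_costs *c, int nelems)
{
  assert (nelems >= 1);
  return nelems * (c->scalar_store_cost + c->store_elt_extra_cost);
}

/* Cost after the patch: element 0 needs no extraction, so the
   extraction count drops by one (mirroring "count -= 1").  */
static int
toy_cost_after (const struct toy_costs *c, int nelems)
{
  assert (nelems >= 1);
  return (nelems * c->scalar_store_cost
          + (nelems - 1) * c->store_elt_extra_cost);
}
```

With the illustrative costs { 1, 2 } and a 4-element store,
toy_cost_before gives 4 * (1 + 2) = 12 and toy_cost_after gives
4 * 1 + 3 * 2 = 10, so the saving is exactly one extraction.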