From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
    id B84453853D28; Mon, 3 Apr 2023 08:58:10 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B84453853D28
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
    s=default; t=1680512290;
    bh=B5lKs8s0x66aMRjK+2w0Z9632xkRED0Hyt+b1zdL8ws=;
    h=From:To:Subject:Date:In-Reply-To:References:From;
    b=gM6jzx+Esb/l1sRhjrnk68qE5jaKsCl5sDlXENphmdicEz838cj6KxCHR4/UenZEG
     3dICT+sy+0K1XYaUpDAe7KBjROU/8BA3g9jxrRTIKCbR4kTNNSKWkjikbqfzUpUH/O
     bbUoZlSxsru8Orsqf06cnAQM/XQeKM5SGLFbA56s=
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/109072] [12 Regression] SLP costs for vec duplicate too high since g:4963079769c99c4073adfd799885410ad484cbbe
Date: Mon, 03 Apr 2023 08:58:10 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: rsandifo at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.3
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109072

--- Comment #11 from CVS Commits ---
The releases/gcc-12 branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:eff10fe7384d1504f2c92db1fd44c663f737f57d

commit r12-9383-geff10fe7384d1504f2c92db1fd44c663f737f57d
Author: Richard Sandiford
Date:   Mon Apr 3 09:57:08 2023 +0100

    aarch64: Restore vectorisation of vld1 inputs [PR109072]

    Before GCC 12, we would vectorize:

      int32_t arr[] = { x, x, x, x };

    at
    -O3.

    Vectorizing the store on its own is often a loss, particularly for
    integers, so g:4963079769c99c4073adfd799885410ad484cbbe suppressed it.
    This was necessary to fix regressions from enabling vectorisation
    at -O2.

    However, the vectorisation is important if the code subsequently loads
    from the array using vld1:

      return vld1q_s32 (arr);

    This approach of initialising an array and loading from it is the
    recommended endian-agnostic way of constructing an ACLE vector.

    As discussed in the PR notes, the general fix would be to fold the
    store and load-back to a constructor (preferably before vectorisation).
    But that's clearly not stage 4 material.

    This patch instead delays folding vld1 until after inlining and
    records which decls a vld1 loads from.  It then treats vector
    stores to those decls as free, on the optimistic assumption that
    they will be removed later.  The patch also brute-forces
    vectorization of plain constructor+store sequences, since some
    of the CPU costs make that (dubiously) expensive even when the
    store is discounted.

    Delaying folding showed that we were failing to update the vops.
    The patch fixes that too.

    Thanks to Tamar for discussion & help with testing.

    gcc/
            PR target/109072
            * config/aarch64/aarch64-protos.h (aarch64_vector_load_decl): Declare.
            * config/aarch64/aarch64.h (machine_function::vector_load_decls): New
            variable.
            * config/aarch64/aarch64-builtins.cc
            (aarch64_record_vector_load_arg): New function.
            (aarch64_general_gimple_fold_builtin): Delay folding of vld1 until
            after inlining.  Record which decls are loaded from.  Fix handling
            of vops for loads and stores.
            * config/aarch64/aarch64.cc (aarch64_vector_load_decl): New function.
            (aarch64_accesses_vector_load_decl_p): Likewise.
            (aarch64_vector_costs::m_stores_to_vector_load_decl): New member
            variable.
            (aarch64_vector_costs::add_stmt_cost): If the function has a vld1
            that loads from a decl, treat vector stores to those decls as
            zero cost.
            (aarch64_vector_costs::finish_cost): ...and in that case,
            if the vector code does nothing more than a store, give the
            prologue a zero cost as well.

    gcc/testsuite/
            PR target/109072
            * gcc.target/aarch64/pr109072_1.c: New test.
            * gcc.target/aarch64/pr109072_2.c: Likewise.

    (cherry picked from commit fcb411564a655a01f759eea3bb16bfd1bc879bfd)
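
For reference, the source pattern that the commit message describes can be sketched as a small standalone C file. This is only an illustration and is not taken from pr109072_1.c or pr109072_2.c: the names vec4_s32, dup_via_array, store_lanes, all_lanes_equal and self_check are hypothetical, and the non-AArch64 branch is an assumed plain-C stand-in so that the file also builds on other hosts. Only vld1q_s32 and vst1q_s32 are real ACLE intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>

/* vec4_s32 is a hypothetical alias used only in this sketch.  */
typedef int32x4_t vec4_s32;

/* The pattern from the commit message: initialise an array, then load
   it back with vld1.  The array store is what the cost model may now
   treat as free.  */
static vec4_s32 dup_via_array(int32_t x)
{
    int32_t arr[] = { x, x, x, x };
    return vld1q_s32(arr);
}

static void store_lanes(int32_t out[4], vec4_s32 v)
{
    vst1q_s32(out, v);
}
#else
/* Assumption: a plain-C stand-in so the sketch builds off AArch64.
   It mimics the data movement only, not the code generation.  */
typedef struct { int32_t lane[4]; } vec4_s32;

static vec4_s32 dup_via_array(int32_t x)
{
    int32_t arr[] = { x, x, x, x };
    vec4_s32 v;
    memcpy(v.lane, arr, sizeof arr);
    return v;
}

static void store_lanes(int32_t out[4], vec4_s32 v)
{
    memcpy(out, v.lane, sizeof v.lane);
}
#endif

/* Returns 1 if every lane of the constructed vector holds x.  */
int all_lanes_equal(int32_t x)
{
    int32_t out[4];
    store_lanes(out, dup_via_array(x));
    return out[0] == x && out[1] == x && out[2] == x && out[3] == x;
}

/* Hypothetical self-check; call it from a test driver.  */
void self_check(void)
{
    assert(all_lanes_equal(42));
}
```

On an AArch64 target, compiling dup_via_array at -O2 or -O3 with a compiler that includes this fix and inspecting the assembly is one way to see whether the store and load-back were vectorised. The exact instructions emitted depend on the compiler version and tuning options.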