From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 95F733858C31; Fri, 29 Dec 2023 15:59:40 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 95F733858C31
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1703865580;
	bh=tdv/fllVC7zlWlBk+sU0hyjRHFSa5/Zm8L/PKlhi6xw=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=MpZSATxBOhUxI7xo8zjAcvDTIzUFGiTLGu7KWnSBg/iRK45avSwecdoc4TIWe30Z5
	 uNYuEqD0gULkJBc7i7EqM/5ZWmcY0ck3pkkVVoeTekHn/2DRRlmp8yvnf/UVMGGskz
	 ZmHNJxEapwooAV1Om4MbnJCGyEbs3vTqOOnJMUq4=
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/110625] [14 Regression][AArch64] Vect: SLP fails to
 vectorize a loop as the reduction_latency calculated by new costs is too
 large
Date: Fri, 29 Dec 2023 15:59:38 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: rsandifo at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625

--- Comment #24 from GCC Commits ---
The master branch has been updated by Tamar Christina:

https://gcc.gnu.org/g:984bdeaa39b6417b11736b2c167ef82119e272dc

commit r14-6865-g984bdeaa39b6417b11736b2c167ef82119e272dc
Author: Tamar Christina
Date:   Fri Dec 29 15:58:29 2023 +0000

    AArch64: Update costing for vector conversions [PR110625]

    In gimple the operation

    short _8;
    double _9;
    _9 = (double) _8;

    denotes two operations on AArch64.  First we have to widen from short to
    long and then convert this integer to a double.

    Currently, however, we only count the widen/truncate operations:

    (double) _5 6 times vec_promote_demote costs 12 in body
    (double) _5 12 times vec_promote_demote costs 24 in body

    but not the actual conversion operation, which needs an additional 12
    instructions in the attached testcase.  Without this the attached testcase
    ends up incorrectly thinking that it's beneficial to vectorize the loop at
    a very high VF = 8 (4x unrolled).

    Because we can't change the mid-end to account for this, the costing code
    in the backend now keeps track of whether the previous operation was a
    promotion/demotion and adjusts the expected number of instructions as
    follows:

    1. If it's the first FLOAT_EXPR and the precisions of the lhs and rhs are
       different, double it, since we need to convert and promote.
    2. If the previous operation was a demotion/promotion, then reduce the
       cost of the current operation by the amount we added extra in the
       last.

    With the patch we get:

    (double) _5 6 times vec_promote_demote costs 24 in body
    (double) _5 12 times vec_promote_demote costs 36 in body

    which correctly accounts for 30 operations.

    This fixes the 16% regression in imagick in SPECCPU 2017 reported on
    Neoverse N2 and using the new generic Armv9-a cost model.

    gcc/ChangeLog:

            PR target/110625
            * config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost):
            Adjust throughput and latency calculations for vector conversions.
            (class aarch64_vector_costs): Add m_num_last_promote_demote.

    gcc/testsuite/ChangeLog:

            PR target/110625
            * gcc.target/aarch64/pr110625_4.c: New test.
            * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Add
            --param aarch64-sve-compare-costs=0.
            * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
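
The two adjustment rules in the commit message above can be checked against the quoted dump lines with a small model. This is only a sketch in Python, not the code in gcc/config/aarch64/aarch64.cc: the class name, the method, and its parameters are hypothetical, the cost of 2 per instruction is inferred from "6 times ... costs 12", and the formula "double the count, then subtract the previous count" is one reading of rules 1 and 2 that happens to reproduce the reported numbers.

```python
# Illustrative model only; NOT the implementation in aarch64.cc.
COST_PER_OP = 2  # assumption: inferred from "6 times ... costs 12" in the dump


class PromoteDemoteTracker:
    """Hypothetical stand-in for the m_num_last_promote_demote bookkeeping."""

    def __init__(self):
        self.num_last_promote_demote = 0

    def adjust(self, count, is_promote_demote, precision_differs):
        """Return the adjusted instruction count for one costed statement."""
        if is_promote_demote and precision_differs:
            # Rule 1: double the count to cover the conversion as well as
            # the promotion.  Rule 2: subtract the extra that was already
            # added for the immediately preceding promotion/demotion.
            new_count = count * 2 - self.num_last_promote_demote
            self.num_last_promote_demote = count
            return new_count
        # Any other statement breaks the chain of promotions/demotions.
        self.num_last_promote_demote = 0
        return count


tracker = PromoteDemoteTracker()
# The two "(double) _5" statements from the dump: 6 times, then 12 times.
counts = [tracker.adjust(6, True, True), tracker.adjust(12, True, True)]
costs = [c * COST_PER_OP for c in counts]
print(counts, costs, sum(counts))
```

Under these assumptions the model gives counts of 12 and 18, costs of 24 and 36, and 30 operations in total, matching the "with the patch" dump lines. Without the adjustment the same statements would be counted as 6 + 12 = 18 operations, which is the 12-instruction shortfall the commit message describes.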