From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 95F733858C31; Fri, 29 Dec 2023 15:59:40 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 95F733858C31
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1703865580;
	bh=tdv/fllVC7zlWlBk+sU0hyjRHFSa5/Zm8L/PKlhi6xw=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=MpZSATxBOhUxI7xo8zjAcvDTIzUFGiTLGu7KWnSBg/iRK45avSwecdoc4TIWe30Z5
	 uNYuEqD0gULkJBc7i7EqM/5ZWmcY0ck3pkkVVoeTekHn/2DRRlmp8yvnf/UVMGGskz
	 ZmHNJxEapwooAV1Om4MbnJCGyEbs3vTqOOnJMUq4=
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/110625] [14 Regression][AArch64] Vect: SLP fails to
 vectorize a loop as the reduction_latency calculated by new costs is too
 large
Date: Fri, 29 Dec 2023 15:59:38 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: rsandifo at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-110625-4-tMRZXvDsNi@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-110625-4@http.gcc.gnu.org/bugzilla/>
References: <bug-110625-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #24 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:984bdeaa39b6417b11736b2c167ef82119e272dc

commit r14-6865-g984bdeaa39b6417b11736b2c167ef82119e272dc
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Fri Dec 29 15:58:29 2023 +0000

    AArch64: Update costing for vector conversions [PR110625]

    In gimple the operation

    short _8;
    double _9;
    _9 = (double) _8;

    denotes two operations on AArch64.  First we have to widen from short to
    long and then convert this integer to a double.
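
As an illustration of the two steps described above, the following C sketch
writes the same conversion once directly and once with the widening made
explicit. The function names are hypothetical, and int64_t stands in for
"long" on the assumption of an LP64 AArch64 target. It is not the testcase
attached to the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Direct form: a single conversion statement, as in the gimple above.  */
void convert_direct(double *out, const short *in, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (double) in[i];
}

/* Two-step form: the widening and the int-to-double conversion are
   written as separate operations.  */
void convert_two_step(double *out, const short *in, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++) {
        int64_t widened = (int64_t) in[i];  /* step 1: widen short to 64 bits */
        out[i] = (double) widened;          /* step 2: integer to double */
    }
}
```

Both functions produce identical results for every short value, since each
one is exactly representable as a double. The difference is only in how many
operations are visible to a cost model.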

    Currently however we only count the widen/truncate operations:

    (double) _5 6 times vec_promote_demote costs 12 in body
    (double) _5 12 times vec_promote_demote costs 24 in body

    but not the actual conversion operation, which needs an additional 12
    instructions in the attached testcase.  Without this the attached
    testcase ends up incorrectly thinking that it's beneficial to vectorize
    the loop at a very high VF = 8 (4x unrolled).

    Because we can't change the mid-end to account for this, the costing
    code in the backend now keeps track of whether the previous operation was
    a promotion/demotion and adjusts the expected number of instructions as
    follows:

    1. If it's the first FLOAT_EXPR and the precisions of the lhs and rhs
       differ, double it, since we need to convert and promote.
    2. If the previous operation was a demotion/promotion, then reduce the
       cost of the current operation by the extra amount we added in the
       last one.

    With the patch we get:

    (double) _5 6 times vec_promote_demote costs 24 in body
    (double) _5 12 times vec_promote_demote costs 36 in body

    which correctly accounts for 30 operations.
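
The arithmetic behind these numbers can be reproduced with a small model.
This is only a sketch under stated assumptions: struct conv_cost_state,
adjust_count and stmt_cost are hypothetical names, the per-statement cost
of 2 is inferred from the "6 times ... costs 12" dump line, and resetting
the counter on unrelated statements is a guess. Only the member name
m_num_last_promote_demote comes from the ChangeLog below. The real logic
lives in aarch64_vector_costs::add_stmt_cost.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cost of one vec_promote_demote statement, inferred from the
   dump line "6 times vec_promote_demote costs 12".  */
enum { UNIT_COST = 2 };

/* Hypothetical state kept between statements; the field mirrors the
   m_num_last_promote_demote member named in the ChangeLog.  */
struct conv_cost_state {
    int num_last_promote_demote;
};

/* Return the adjusted statement count for one costing call.  For a
   precision-changing float conversion the count is doubled (convert plus
   promote), minus the count recorded for the previous promote/demote.  */
int adjust_count(struct conv_cost_state *st, int count,
                 int is_precision_changing_float_conv)
{
    assert(st != NULL && count >= 0);
    if (!is_precision_changing_float_conv) {
        /* Assumption: any other statement resets the tracking.  */
        st->num_last_promote_demote = 0;
        return count;
    }
    int adjusted = count * 2 - st->num_last_promote_demote;
    st->num_last_promote_demote = count;
    return adjusted;
}

/* Cost reported for a given statement count under the assumed unit cost.  */
int stmt_cost(int count)
{
    return count * UNIT_COST;
}
```

Feeding the two counts from the dump (6, then 12) through this model gives
12 and 18 statements, i.e. costs of 24 and 36, for a total of 30 operations.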

    This fixes the 16% regression in imagick in SPECCPU 2017 reported on
    Neoverse N2 and using the new generic Armv9-a cost model.

    gcc/ChangeLog:

            PR target/110625
            * config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost):
            Adjust throughput and latency calculations for vector conversions.
            (class aarch64_vector_costs): Add m_num_last_promote_demote.

    gcc/testsuite/ChangeLog:

            PR target/110625
            * gcc.target/aarch64/pr110625_4.c: New test.
            * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Add
            --param aarch64-sve-compare-costs=0.
            * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.