From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id AA76038582B5; Fri, 12 Jan 2024 12:38:27 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org AA76038582B5
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1705063107;
	bh=EeTsZ4VtlDdl6N6PO+vgkkzpY78L0za1DFM4xjQ8rc4=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=PsAIJmDheILUL1GmDRC8U0uDeKdTlK9vss8SvUCFtoMRf2I5oIChKaADWyYNFvo4R
	 H2HcgbwQXbZTz2aULPco4agbywXyqoaLyxVTeLcLSb3/h/pct/wt0VJrqaVmEt3YaB
	 Pluq9jBV/qMYPDH/NEbo4Dq433yU/B7nMqQVTNkk=
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/113196] [14 Regression] Failure to use ushll{,2}
Date: Fri, 12 Jan 2024 12:38:25 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rsandifo at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113196

--- Comment #2 from GCC Commits ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:74e3e839ab2d368413207455af2fdaaacc73842b

commit r14-7187-g74e3e839ab2d368413207455af2fdaaacc73842b
Author: Richard Sandiford
Date:   Fri Jan 12 12:38:01 2024 +0000

    aarch64: Rework uxtl->zip optimisation [PR113196]

    g:f26f92b534f9 implemented unsigned extensions using ZIPs rather than
    UXTL{,2}, since the former has a higher throughput than the latter on many
    cores.  The optimisation worked by lowering directly to ZIP during
    expand, so that the zero input could be hoisted and shared.

    However, changing to ZIP means that zero extensions no longer benefit
    from some existing combine patterns.  The patch included new patterns
    for UADDW and USUBW, but the PR shows that other patterns were affected
    as well.

    This patch instead introduces the ZIPs during a pre-reload split
    and forcibly hoists the zero move to the outermost scope.  This has
    the disadvantage of executing the move even for a shrink-wrapped
    function, which I suppose could be a problem if it causes a kernel
    to trap and enable Advanced SIMD unnecessarily.  In other circumstances,
    an unused move shouldn't affect things much.

    Also, the RA should be able to rematerialise the move at an
    appropriate point if necessary, such as if there is an intervening
    call.

    In https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641948.html
    I'd then tried to allow a zero to be recombined back into a solitary
    ZIP.  However, that relied on late-combine, which didn't make it into
    GCC 14.  This version instead restricts the split to cases where the
    UXTL executes more frequently than the entry block (which is where we
    plan to put the zero).

    Also, the original optimisation contained a big-endian correction
    that I don't think is needed/correct.  Even on big-endian targets,
    we want the ZIP to take the low half of an element from the input
    vector and the high half from the zero vector.  And the patterns
    map directly to the underlying Advanced SIMD instructions: the use
    of unspecs means that there's no need to adjust for the difference
    between GCC and Arm lane numbering.

    gcc/
            PR target/113196
            * config/aarch64/aarch64.h (machine_function::advsimd_zero_insn):
            New member variable.
            * config/aarch64/aarch64-protos.h (aarch64_split_simd_shift_p):
            Declare.
            * config/aarch64/iterators.md (Vnarrowq2): New mode attribute.
            * config/aarch64/aarch64-simd.md (vec_unpacku_hi_,
            vec_unpacks_hi_): Recombine into...
            (vec_unpack_hi_): ...this.  Move the generation of
            zip2 for zero-extends to...
            (aarch64_simd_vec_unpack_hi_): ...a split of this
            instruction.  Fix big-endian handling.
            (vec_unpacku_lo_, vec_unpacks_lo_): Recombine into...
            (vec_unpack_lo_): ...this.  Move the generation of
            zip1 for zero-extends to...
            (2): ...a split of this instruction.  Fix big-endian handling.
            (*aarch64_zip1_uxtl): New pattern.
            (aarch64_usubw_lo_zip, aarch64_uaddw_lo_zip): Delete.
            (aarch64_usubw_hi_zip, aarch64_uaddw_hi_zip): Likewise.
            * config/aarch64/aarch64.cc (aarch64_get_shareable_reg): New
            function.
            (aarch64_gen_shareable_zero): Use it.
            (aarch64_split_simd_shift_p): New function.

    gcc/testsuite/
            PR target/113196
            * gcc.target/aarch64/pr113196.c: New test.
            * gcc.target/aarch64/simd/vmovl_high_1.c: Remove double include.
            Expect uxtl2 rather than zip2.
            * gcc.target/aarch64/vect_mixed_sizes_8.c: Expect zip1 rather
            than uxtl.
            * gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
            * gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.