From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 85A603858CD1; Fri, 14 Jul 2023 17:13:35 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 85A603858CD1
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1689354815;
	bh=RrJpvwdWUufjFL6wcYr/WqeC2+jx91R1BqPP8GSNgvM=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=q76G44gjxKqSe8bivfd9VQ36jjpggr4dANwC5p5WPoMW9JCJlS1kzKs1Pa2c3wLHE
	 oW11h3mcqXchvfdj+miVO9ocNOha44Ei0Ld4959Vs5poXbF/ee4x9S4w7gtymtd+e0
	 GxoIAV/3YcpKiwEO9ik6ZkHtg+uPBWCLbG3zC/Gs=
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/88873] missing vectorization for decomposed operations on a vector type
Date: Fri, 14 Jul 2023 17:13:34 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 9.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873

--- Comment #8 from CVS Commits ---
The master branch has been updated by Roger Sayle:

https://gcc.gnu.org/g:8911879415d6c2a7baad88235554a912887a1c5c

commit r14-2526-g8911879415d6c2a7baad88235554a912887a1c5c
Author: Roger Sayle
Date:   Fri Jul 14 18:10:05 2023 +0100

    i386: Improved insv of DImode/DFmode {high,low}parts into TImode.

    This is the next piece towards a fix for (the x86_64 ABI issues affecting)
    PR 88873.

    This patch generalizes the recent tweak to ix86_expand_move for setting
    the highpart of a TImode reg from a DImode source using
    *insvti_highpart_1, to handle both DImode and DFmode sources, and also
    use the recently added *insvti_lowpart_1 for setting the lowpart.

    Although this is another intermediate step (not yet a fix), towards
    enabling *insvti and *concat* patterns to be candidates for TImode STV
    (by using V2DI/V2DF instructions), it already improves things a little.

    For the test case from PR 88873

    typedef struct { double x, y; } s_t;
    typedef double v2df __attribute__ ((vector_size (2 * sizeof(double))));

    s_t foo (s_t a, s_t b, s_t c)
    {
      return (s_t) { fma(a.x, b.x, c.x), fma (a.y, b.y, c.y) };
    }

    With -O2 -march=cascadelake, GCC currently generates:

    Before (29 instructions):
            vmovq   %xmm2, -56(%rsp)
            movq    -56(%rsp), %rdx
            vmovq   %xmm4, -40(%rsp)
            movq    $0, -48(%rsp)
            movq    %rdx, -56(%rsp)
            movq    -40(%rsp), %rdx
            vmovq   %xmm0, -24(%rsp)
            movq    %rdx, -40(%rsp)
            movq    -24(%rsp), %rsi
            movq    -56(%rsp), %rax
            movq    $0, -32(%rsp)
            vmovq   %xmm3, -48(%rsp)
            movq    -48(%rsp), %rcx
            vmovq   %xmm5, -32(%rsp)
            vmovq   %rax, %xmm6
            movq    -40(%rsp), %rax
            movq    $0, -16(%rsp)
            movq    %rsi, -24(%rsp)
            movq    -32(%rsp), %rsi
            vpinsrq $1, %rcx, %xmm6, %xmm6
            vmovq   %rax, %xmm7
            vmovq   %xmm1, -16(%rsp)
            vmovapd %xmm6, %xmm3
            vpinsrq $1, %rsi, %xmm7, %xmm7
            vfmadd132pd     -24(%rsp), %xmm7, %xmm3
            vmovapd %xmm3, -56(%rsp)
            vmovsd  -48(%rsp), %xmm1
            vmovsd  -56(%rsp), %xmm0
            ret

    After (20 instructions):
            vmovq   %xmm2, -56(%rsp)
            movq    -56(%rsp), %rax
            vmovq   %xmm3, -48(%rsp)
            vmovq   %xmm4, -40(%rsp)
            movq    -48(%rsp), %rcx
            vmovq   %xmm5, -32(%rsp)
            vmovq   %rax, %xmm6
            movq    -40(%rsp), %rax
            movq    -32(%rsp), %rsi
            vpinsrq $1, %rcx, %xmm6, %xmm6
            vmovq   %xmm0, -24(%rsp)
            vmovq   %rax, %xmm7
            vmovq   %xmm1, -16(%rsp)
            vmovapd %xmm6, %xmm2
            vpinsrq $1, %rsi, %xmm7, %xmm7
            vfmadd132pd     -24(%rsp), %xmm7, %xmm2
            vmovapd %xmm2, -56(%rsp)
            vmovsd  -48(%rsp), %xmm1
            vmovsd  -56(%rsp), %xmm0
            ret

    2023-07-14  Roger Sayle

    gcc/ChangeLog
            * config/i386/i386-expand.cc (ix86_expand_move):
            Generalize special case inserting of 64-bit values into a TImode
            register, to handle both DImode and DFmode using either
            *insvti_lowpart_1 or *insvti_highpart_1.
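
For readers less familiar with the machine modes named above, the following is a minimal C sketch of what "inserting a 64-bit value into the high or low part of a TImode register" means at the source level. It is not taken from the patch: the helper names (set_highpart, set_lowpart, double_bits, pack_doubles) are hypothetical, unsigned __int128 is a GCC/Clang extension used here as a stand-in for TImode, and the sketch assumes an IEEE 754 binary64 double. It only illustrates the bit-level operation, not how ix86_expand_move or the *insvti patterns are implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: unsigned __int128 (GCC/Clang extension) models a TImode value. */
typedef unsigned __int128 u128;

/* Hypothetical helper: replace bits 64..127 of x, keeping bits 0..63. */
static u128 set_highpart(u128 x, uint64_t hi)
{
    return (x & (u128)UINT64_MAX) | ((u128)hi << 64);
}

/* Hypothetical helper: replace bits 0..63 of x, keeping bits 64..127. */
static u128 set_lowpart(u128 x, uint64_t lo)
{
    return (x & ((u128)UINT64_MAX << 64)) | (u128)lo;
}

/* Reinterpret a double (DFmode) as its 64-bit pattern (DImode)
   without violating aliasing rules. */
static uint64_t double_bits(double d)
{
    static_assert(sizeof(double) == sizeof(uint64_t),
                  "sketch assumes a 64-bit double");
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Build a 128-bit value from two doubles, mirroring how a
   struct { double x, y; } occupies one 16-byte slot. */
static u128 pack_doubles(double lo, double hi)
{
    u128 x = 0;
    x = set_lowpart(x, double_bits(lo));
    x = set_highpart(x, double_bits(hi));
    return x;
}
```

Calling pack_doubles(a.x, a.y) corresponds roughly to the step where the two halves of an s_t argument are combined before the packed vfmadd132pd; the commit message above describes the patch as letting both the DImode and the DFmode forms of that insertion go through the *insvti_lowpart_1 and *insvti_highpart_1 patterns.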