From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/54939] Very poor vectorization of loops with complex arithmetic
Date: Fri, 21 Jul 2023 12:28:10 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.8.0
X-Bugzilla-Severity: normal
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: FIXED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Changed-Fields: bug_status resolution cc
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
                 CC|                            |crazylht at gmail dot com

--- Comment #14 from Richard Biener ---
With SSE4.2 we now get

.L3:
        movupd  (%rdx,%rax), %xmm0
        movupd  (%rcx,%rax), %xmm4
        movapd  %xmm0, %xmm1
        palignr $8, %xmm0, %xmm0
        mulpd   %xmm3, %xmm1
        mulpd   %xmm2, %xmm0
        addpd   %xmm4, %xmm1
        addsubpd        %xmm0, %xmm1
        movups  %xmm1, (%rcx,%rax)
        addq    $16, %rax
        cmpq    %rsi, %rax
        jne     .L3

with AVX and FMA

.L4:
        vmovupd (%rdx,%rax), %ymm0
        vmovapd %ymm4, %ymm1
        vfmadd213pd     (%rcx,%rax), %ymm0, %ymm1
        vpermilpd       $5, %ymm0, %ymm0
        vmulpd  %ymm3, %ymm0, %ymm0
        vaddsubpd       %ymm0, %ymm1, %ymm1
        vmovupd %ymm1, (%rcx,%rax)
        addq    $32, %rax
        cmpq    %rsi, %rax
        jne     .L4

so I'd say fixed.  But.  With AVX512 we now get

.L4:
        vmovupd (%rdi,%rax), %zmm0
        vmovapd %zmm7, %zmm2
        vmovapd %zmm4, %zmm6
        vfmadd213pd     (%rcx,%rax), %zmm0, %zmm2
        vpermilpd       $85, %zmm0, %zmm0
        vfmadd132pd     %zmm0, %zmm2, %zmm6
        vfnmadd132pd    %zmm4, %zmm2, %zmm0
        vmovapd %zmm6, %zmm0{%k1}
        vmovupd %zmm0, (%rcx,%rax)
        addq    $64, %rax
        cmpq    %rax, %rsi
        jne     .L4

it's odd that this only happens with -mprefer-vector-width=512 though.  Do we
possibly miss vec_{fm,}{addsub,subadd} for those?  Looks like so.  Tracking
in PR110767.  The vectorizer side is fixed.
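
For readers without the testcase at hand, here is a rough C sketch of the kind of loop the listings above appear to implement. This is an assumption reconstructed from the assembly, not the testcase attached to the PR: the names caxpy and caxpy_addsub_model are hypothetical, and the mapping of a_re and a_im onto the splat registers is only my reading of the SSE4.2 and AVX code. The second function models, in scalar form, the multiply, swap, multiply, addsub sequence that the vectorizer emits per complex element.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical kernel (assumed, not the PR's testcase): y[i] += a * x[i]. */
void caxpy(size_t n, double complex a,
           const double complex *restrict x, double complex *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Scalar model of the vector sequence, with complex values stored as
   interleaved (re, im) pairs of doubles. */
void caxpy_addsub_model(size_t n, double a_re, double a_im,
                        const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double xr = x[2 * i], xi = x[2 * i + 1];

        /* t = x * splat(a_re) + y: mulpd + addpd, or a single FMA. */
        double t0 = xr * a_re + y[2 * i];
        double t1 = xi * a_re + y[2 * i + 1];

        /* s = swap(x) * splat(a_im): palignr/vpermilpd, then mulpd. */
        double s0 = xi * a_im;
        double s1 = xr * a_im;

        /* addsubpd: subtract in the even lane, add in the odd lane. */
        y[2 * i]     = t0 - s0;
        y[2 * i + 1] = t1 + s1;
    }
}
```

The AVX512 listing computes the same values, but forms both t + s and t - s with separate FMAs and merges them under the mask in %k1 instead of using one alternating add/subtract instruction, which seems to be the gap tracked in PR110767.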