From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id EF5A03858C66; Wed, 21 Jun 2023 13:45:29 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org EF5A03858C66
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1687355129;
	bh=vIMAM//5mgYbjcWz+v67fL+bBWcesNXuNSwmDBV4yg0=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=s0Ba5FKqKOGDemwXkSLEBvpgS0ymMXtZzrrzQbT9oBOoqe906B87Mdxai1AA+dric
	 n/3bmNcMLD0vcCaHPANV1FAIibALm3QFw7EOonOfMrK5ieiKm7T4YDpQdX2sNuSgCa
	 mfkQl6o2b9hvBTkxWmc1yMEqjK/hiexwm3qkNvqo=
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/106081] missed vectorization
Date: Wed, 21 Jun 2023 13:45:28 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: dependson
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |96208

--- Comment #5 from Richard Biener ---
PR96208 is the SLP of non-grouped loads.
We now can convert short -> double and we get with the grouped load hacked and
-march=znver3:

.L2:
        vmovdqu (%rax), %ymm0
        vpermq  $27, -24(%rdi), %ymm10
        addq    $32, %rax
        subq    $32, %rdi
        vpshufb %ymm7, %ymm0, %ymm0
        vpermpd $85, %ymm10, %ymm9
        vpermpd $170, %ymm10, %ymm8
        vpermpd $255, %ymm10, %ymm6
        vpmovsxwd       %xmm0, %ymm1
        vextracti128    $0x1, %ymm0, %xmm0
        vbroadcastsd    %xmm10, %ymm10
        vcvtdq2pd       %xmm1, %ymm11
        vextracti128    $0x1, %ymm1, %xmm1
        vpmovsxwd       %xmm0, %ymm0
        vcvtdq2pd       %xmm1, %ymm1
        vfmadd231pd     %ymm10, %ymm11, %ymm5
        vfmadd231pd     %ymm9, %ymm1, %ymm2
        vcvtdq2pd       %xmm0, %ymm1
        vextracti128    $0x1, %ymm0, %xmm0
        vcvtdq2pd       %xmm0, %ymm0
        vfmadd231pd     %ymm8, %ymm1, %ymm4
        vfmadd231pd     %ymm6, %ymm0, %ymm3
        cmpq    %rax, %rdx
        jne     .L2

that is, the 'short' data type forces a higher VF on us and the splat codegen I
hacked in is sub-optimal still.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208
[Bug 96208] non-grouped load can be SLP vectorized for 2-element vectors case