From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 761453858C53; Wed, 28 Jun 2023 12:15:12 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 761453858C53
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1687954512; bh=X1MQ6t7p/xPr2r06pXaZiOBDUWsKxSFqGWf5QWSlzkg=; h=From:To:Subject:Date:In-Reply-To:References:From; b=XXeqWoDSO2GbbwcOmGqGhh149vTUjW/EaZIYOC9OduFyfaOouxqxVvvwgfDGMpA4+ QI5B0NUf2eobdmIGxfJJHr6EX8O0/Q1LMNg2QTcfk+rYiRr3+Y6rMldBQ69FbXnQa/ YPiOwWUQnJw2UsFLsxqKAdRMHtqQ/JHIyVf0M0UA=
From: "hubicka at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/106081] missed vectorization
Date: Wed, 28 Jun 2023 12:15:11 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: hubicka at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081

--- Comment #8 from Jan Hubicka ---
Imagemagick improved by 17% on zen3 and 11% on altra
https://lnt.opensuse.org/db_default/v4/SPEC/37550
https://lnt.opensuse.org/db_default/v4/SPEC/37543
which is cool :)

The loop is now optimized as:

.L2:
        vmovdqu16       (%rax), %zmm0
        vmovupd (%rdx), %zmm2
        addq    $64, %rax
        subq    $64, %rdx
        vpermpd %zmm2, %zmm15, %zmm9
        vpermpd %zmm2, %zmm14, %zmm8
        vpermpd %zmm2, %zmm13, %zmm7
        vpermpd %zmm2, %zmm11, %zmm2
        vpshufb %zmm12, %zmm0, %zmm0
        vpmovsxwd       %ymm0, %zmm1
        vextracti64x4   $0x1, %zmm0, %ymm0
        vpmovsxwd       %ymm0, %zmm0
        vcvtdq2pd       %ymm1, %zmm10
        vextracti32x8   $0x1, %zmm1, %ymm1
        vcvtdq2pd       %ymm1, %zmm1
        vfmadd231pd     %zmm2, %zmm10, %zmm6
        vfmadd231pd     %zmm9, %zmm1, %zmm3
        vcvtdq2pd       %ymm0, %zmm1
        vextracti32x8   $0x1, %zmm0, %ymm0
        vcvtdq2pd       %ymm0, %zmm0
        vfmadd231pd     %zmm8, %zmm1, %zmm5
        vfmadd231pd     %zmm7, %zmm0, %zmm4
        cmpq    %rax, %rcx
        jne     .L2
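For readers who want a scalar picture of what this loop computes, here is a rough C sketch reconstructed only from the assembly above, so treat it as an assumption rather than the actual ImageMagick source or the testcase from this PR. The listing reads 64 bytes of 16-bit values forwards through %rax, reads 8 doubles backwards through %rdx, widens the shorts to double and accumulates with FMAs. That pattern is consistent with a loop that multiplies four 16-bit channels per pixel by one coefficient taken from a kernel walked in reverse. The names `struct pixel`, `struct dpixel` and `accumulate` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical layout: four signed 16-bit channels per pixel (8 bytes),
   so one 64-byte vector load covers 8 pixels. */
struct pixel { short a, b, c, d; };

/* Hypothetical result type: one double accumulator per channel. */
struct dpixel { double a, b, c, d; };

/* Sum over n pixels of coefficient * channel, walking the pixels forwards
   and the kernel backwards (kernel[n-1] pairs with px[0]). */
struct dpixel accumulate(const struct pixel *px, const double *kernel, size_t n)
{
    struct dpixel r = {0.0, 0.0, 0.0, 0.0};
    for (size_t u = 0; u < n; u++) {
        double k = kernel[n - 1 - u];  /* reversed kernel access */
        r.a += k * px[u].a;            /* short -> double conversion, then multiply-add */
        r.b += k * px[u].b;
        r.c += k * px[u].c;
        r.d += k * px[u].d;
    }
    return r;
}
```

In the vector version each of the four vfmadd231pd accumulators holds partial sums that have to be combined after the loop. That changes the order of the floating-point additions, so the compiler presumably needs permission to reassociate (for example -Ofast or -ffast-math) to produce code of this shape.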