From: "crazylht at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/101846] Improve __builtin_shufflevector emitted code
Date: Thu, 12 Aug 2021 01:53:24 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

--- Comment #3 from Hongtao.liu ---
expand_vec_perm_1 is supposed to generate 1 instruction, but it doesn't
consider load of const_vector, if we handle

(In reply to Hongtao.liu from comment #2)
> For foo, vmovdqa is avx_vec_concatv16si/2, and we can add
> define_insn_and_split to combine avx_vec_concatv16si/2 and
> avx512f_zero_extendv16hiv16si2_1, similar for other modes in
> pmovzx{bw,wd,dq}.
>
> For bar, we need to match pmov{wb,dw,qd} in ix86_vectorize_vec_perm_const
> when only one operand is used and selector are truncate index, just like we
> did for pmovzx.
>
> I'll take this.

For bar, when there is a real use of the upper bits, as in

v32hi
foo_dw_512 (v32hi x)
{
  return __builtin_shufflevector (x, x,
                                  0, 2, 4, 6, 8, 10, 12, 14,
                                  16, 18, 20, 22, 24, 26, 28, 30,
                                  16, 17, 18, 19, 20, 21, 22, 23,
                                  24, 25, 26, 27, 28, 29, 30, 31);
}

the vpmovdw version still seems better:

-       vmovdqa64       %zmm0, %zmm1
-       vmovdqa64       .LC0(%rip), %zmm0
-       vpermi2w        %zmm1, %zmm1, %zmm0
+       vpmovdw %zmm0, %ymm1
+       vinserti64x4    $0x0, %ymm1, %zmm0, %zmm0

The conclusion holds for the other 256/512-bit modes, but not for 128-bit
modes, where a single vpshufb would become three instructions:

-       vpshufb .LC2(%rip), %xmm0, %xmm0
+       vpmovdw %xmm0, %xmm1
+       vmovq   %xmm1, %rax
+       vpinsrq $0, %rax, %xmm0, %xmm0
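
To illustrate why the two-instruction sequence is a valid replacement for
foo_dw_512, here is a scalar model in plain C. It is only a sketch and not
GCC code: the helper names (shuffle_ref, trunc_then_insert, sequences_agree)
are hypothetical, and the dword is assembled explicitly from two words so the
model does not depend on host endianness. It checks that "truncate each dword
to a word, then insert the result into the low 256 bits" yields the same
elements as the shuffle selector above.

```c
#include <assert.h>
#include <stdint.h>

enum { NELT = 32 };  /* v32hi: 32 x 16-bit elements in a 512-bit vector */

/* Selector from foo_dw_512: even words first, then the upper half as-is. */
static const int foo_dw_512_sel[NELT] = {
  0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30,
  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
};

/* Reference semantics of __builtin_shufflevector (x, x, sel...).
   Both operands are x, so an index and that index modulo NELT pick
   the same element.  */
static void
shuffle_ref (const uint16_t *x, const int *sel, uint16_t *out)
{
  for (int i = 0; i < NELT; i++)
    out[i] = x[sel[i] % NELT];
}

/* Model of "vpmovdw %zmm0, %ymm1; vinserti64x4 $0x0, %ymm1, %zmm0, %zmm0".
   out must not alias x.  */
static void
trunc_then_insert (const uint16_t *x, uint16_t *out)
{
  uint16_t low[NELT / 2];

  /* vpmovdw: view x as 16 dwords and keep the low 16 bits of each.  */
  for (int i = 0; i < NELT / 2; i++)
    {
      uint32_t dword = (uint32_t) x[2 * i] | ((uint32_t) x[2 * i + 1] << 16);
      low[i] = (uint16_t) dword;
    }

  /* vinserti64x4 $0: replace the low 256 bits, keep the upper 256 bits.  */
  for (int i = 0; i < NELT / 2; i++)
    out[i] = low[i];
  for (int i = NELT / 2; i < NELT; i++)
    out[i] = x[i];
}

/* Returns 1 if both models agree on a test vector of distinct values.  */
int
sequences_agree (void)
{
  uint16_t x[NELT], a[NELT], b[NELT];

  for (int i = 0; i < NELT; i++)
    x[i] = (uint16_t) (0x1000 + 37 * i);

  shuffle_ref (x, foo_dw_512_sel, a);
  trunc_then_insert (x, b);

  for (int i = 0; i < NELT; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

A caller can simply assert (sequences_agree ()). The model covers only the
element mapping; it says nothing about instruction latency or throughput.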