From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/101846] Improve __builtin_shufflevector emitted code
Date: Wed, 15 Dec 2021 11:26:16 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

--- Comment #10 from Jakub Jelinek ---
For bar, the problem is that while vpmovdw is AVX512F, we actually recognize
it only at combine time as vpermw (with selected exact permutation) combined
with low part extraction. And vpermw is only AVX512BW.
In order to optimize it, we'd need to implement what LLVM actually has
support for, namely the "I don't care" possibilities for the permutations.
So, instead of what we emit right now in GIMPLE:
  _1 = VEC_PERM_EXPR ;
  _3 = BIT_FIELD_REF <_1, 256, 0>;
we'd need to emit
  _1 = VEC_PERM_EXPR ;
(we'd need a special VEC_PERM_EXPR variant for that which would only accept
VECTOR_CSTs and reserve all ones for the "ANY" case in there).
And, the hard part, adjust the target const vec perm code to handle those
efficiently, as a wildcard for whatever other element of the vector or
constant 0. One part is the code which verifies the d->perm[?] values, which
would treat the wildcards as anything, but for a successful match we'd
actually need to compute what value is best based on the non-wildcard values
in the permutation.
Another part is the many cases where we construct RTL and try to recog it;
we'd need some new RTL which would stand for CONST_INT_WILDCARD that would
compare equal to any int, but we would need some way for the pattern, if
matched, to actually tell us back which number it wants to use.

With that support, we could recognize the
{ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30,
  ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY,
  ANY }
V32HI permutation as matching the vpmovdw instruction, which puts 0s in the
upper half of the vector.

The foo case is doable even without this I think; the question is whether we
should try to split an arbitrary permutation of 64-byte vectors into
permutations of the two halves which are then merged together, if the
permutation allows that (first half of elements is from first halves of the
inputs and second half of elements is from second halves of the inputs).
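
To make the two checks discussed above concrete, here is a minimal standalone
sketch in C. It is not GCC code and does not use GCC's internal data
structures: PERM_ANY, perm_is_truncation and perm_splits_into_halves are
hypothetical names invented for illustration, and selectors are modelled as
plain int arrays. The first function tests whether a selector with wildcards
is compatible with a "take every even element into the low half" truncation
(the shape given for vpmovdw on V32HI); the second tests the condition stated
for the foo case, assuming a two-input selector with indices in
[0, 2 * nelt).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for a "don't care" selector element.  */
#define PERM_ANY (-1)

/* Return true if SEL (NELT elements, possibly containing PERM_ANY) is
   compatible with a truncating pattern: result element I in the low half
   is source element 2 * I, and every element in the upper half is a
   wildcard, so the instruction is free to put zeros there.  */
bool
perm_is_truncation (const int *sel, size_t nelt)
{
  assert (nelt % 2 == 0);
  for (size_t i = 0; i < nelt / 2; i++)
    if (sel[i] != PERM_ANY && sel[i] != (int) (2 * i))
      return false;
  for (size_t i = nelt / 2; i < nelt; i++)
    if (sel[i] != PERM_ANY)
      return false;
  return true;
}

/* Return true if a two-input selector SEL (indices 0 .. 2 * NELT - 1,
   where index K >= NELT means element K - NELT of the second input) can
   be handled half by half: the low half of the result only reads from
   the low halves of the inputs and the high half only reads from the
   high halves.  Wildcards are compatible with either half.  */
bool
perm_splits_into_halves (const int *sel, size_t nelt)
{
  assert (nelt % 2 == 0);
  for (size_t i = 0; i < nelt; i++)
    {
      if (sel[i] == PERM_ANY)
        continue;
      /* Position of the selected element within its input vector.  */
      size_t src = (size_t) sel[i] % nelt;
      bool src_low = src < nelt / 2;
      bool dst_low = i < nelt / 2;
      if (src_low != dst_low)
        return false;
    }
  return true;
}
```

With nelt = 32, the V32HI selector quoted above passes perm_is_truncation,
while the same selector with any concrete index in the upper half does not.
A real implementation would additionally have to report back which concrete
values were chosen for the wildcards, which this sketch does not attempt.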