From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id BE0BE397B01E; Thu, 12 Aug 2021 01:53:24 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org BE0BE397B01E
From: "crazylht at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/101846] Improve __builtin_shufflevector emitted code
Date: Thu, 12 Aug 2021 01:53:24 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: crazylht at gmail dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-101846-4-ZrFniSJ5tG@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-101846-4@http.gcc.gnu.org/bugzilla/>
References: <bug-101846-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 12 Aug 2021 01:53:24 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
expand_vec_perm_1 is supposed to generate a single instruction, but it doesn't
account for the load of the const_vector, if we handle

(In reply to Hongtao.liu from comment #2)
> For foo, vmovdqa is avx_vec_concatv16si/2, and we can add
> define_insn_and_split to combine avx_vec_concatv16si/2 and
> avx512f_zero_extendv16hiv16si2_1, similar for other modes in
> pmovzx{bw,wd,dq}.
>
> For bar, we need to match pmov{wb,dw,qd} in ix86_vectorize_vec_perm_const
> when only one operand is used and selector are truncate index, just like we
> did for pmovzx.
>
> I'll take this.
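
The "truncate index" condition in the quoted comment can be sketched as a
small predicate. This is only an illustration in plain C, not GCC's actual
code: the name is_truncate_selector and its signature are hypothetical, and
it checks only the low half of the selector for the 2:1 case (0, 2, 4, ...)
that pmov{wb,dw,qd} would produce from a single operand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (assumption, not taken from GCC): return 1 if the
   first nelt / 2 selector entries pick every second element of the first
   operand, i.e. 0, 2, 4, ...  On x86 element numbering this is the low
   half of each double-width lane, which is what a 2:1 truncation keeps.  */
static int
is_truncate_selector (const unsigned *sel, size_t nelt)
{
  assert (nelt % 2 == 0);
  for (size_t i = 0; i < nelt / 2; i++)
    if (sel[i] != 2 * i)
      return 0;
  return 1;
}
```

The upper half of the selector is deliberately left unchecked here; what it
may contain depends on whether the upper bits of the result are used.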

For bar, when there is a real use of the upper bits, as in:
v32hi
foo_dw_512 (v32hi x)
{
  return __builtin_shufflevector (x, x,
                                  0, 2, 4, 6, 8, 10, 12, 14,
                                  16, 18, 20, 22, 24, 26, 28, 30,
                                  16, 17, 18, 19, 20, 21, 22, 23,
                                  24, 25, 26, 27, 28, 29, 30, 31);
}

The vpmovdw version still seems better:

-       vmovdqa64       %zmm0, %zmm1
-       vmovdqa64       .LC0(%rip), %zmm0
-       vpermi2w        %zmm1, %zmm1, %zmm0
+       vpmovdw %zmm0, %ymm1
+       vinserti64x4    $0x0, %ymm1, %zmm0, %zmm0

The conclusion holds for the other 256/512-bit modes, but not for 128-bit
modes, where a single vpshufb would be replaced by three instructions:

-       vpshufb .LC2(%rip), %xmm0, %xmm0
+       vpmovdw %xmm0, %xmm1
+       vmovq   %xmm1, %rax
+       vpinsrq $0, %rax, %xmm0, %xmm0