From: Hongtao Liu
Date: Fri, 1 Jul 2022 10:12:26 +0800
Subject: Re: [x86 PATCH] UNSPEC_PALIGNR optimizations and clean-ups.
To: Roger Sayle
Cc: GCC Patches

On Fri, Jul 1, 2022 at 2:42 AM Roger Sayle wrote:
>
>
> This patch is a follow-up to Hongtao's fix for PR target/105854.  That
> fix is perfectly correct, but the thing that caught my eye was why the
> compiler was generating a shift by zero at all.  Digging deeper, it
> turns out that we can easily optimize __builtin_ia32_palignr for
> alignments of 0 and 64 respectively, which may be simplified to moves
> from the highpart or lowpart.
>
> After adding optimizations to simplify the 64-bit DImode palignr,
> I started to add the corresponding optimizations for vpalignr (i.e.
> 128-bit).  The first oddity is that sse.md uses TImode and a special
> SSESCALARMODE iterator, rather than V1TImode, and indeed the comment
> above SSESCALARMODE hints that this should be "dropped in favor of
> VIMAX_AVX2_AVX512BW".  Hence this patch includes the migration of
> <ssse3_avx2>_palignr<mode> to use VIMAX_AVX2_AVX512BW, basically
> using V1TImode instead of TImode for 128-bit palignr.
>
> But it was only after I'd implemented this clean-up that I stumbled
> across the strange semantics of 128-bit [v]palignr.
> According to
> https://www.felixcloutier.com/x86/palignr, the semantics are subtly
> different based upon how the instruction is encoded.  PALIGNR leaves
> the highpart unmodified, whilst VEX.128 encoded VPALIGNR clears the
> highpart, and (unless I'm mistaken) it looks like GCC currently uses
> the exact same RTL/templates for both, treating one as an alternative
> for the other.

I think as long as patterns or intrinsics only care about the low part,
they should be OK.  But if we want to rely on the default behavior of the
upper bits, we need to restrict them to a specific ISA (i.e. vmovq in
vec_set_0).  Generally, 128-bit legacy SSE instructions behave differently
for the upper bits than their AVX counterparts, which is why vzeroupper is
needed for SSE <-> AVX transitions.

>
> Hence I thought I'd post what I have so far (part optimization and
> part clean-up), to then ask the x86 experts for their opinions.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-06-30  Roger Sayle
>
> gcc/ChangeLog
>         * config/i386/i386-builtin.def (__builtin_ia32_palignr128): Change
>         CODE_FOR_ssse3_palignrti to CODE_FOR_ssse3_palignrv1ti.
>         * config/i386/i386-expand.cc (expand_vec_perm_palignr): Use V1TImode
>         and gen_ssse3_palignrv1ti instead of TImode.
>         * config/i386/sse.md (SSESCALARMODE): Delete.
>         (define_mode_attr ssse3_avx2): Handle V1TImode instead of TImode.
>         (<ssse3_avx2>_palignr<mode>): Use VIMAX_AVX2_AVX512BW as a mode
>         iterator instead of SSESCALARMODE.
>
>         (ssse3_palignrdi): Optimize cases when operands[3] is 0 or 64,
>         using a single move instruction (if required).
>         (define_split): Likewise split UNSPEC_PALIGNR $0 into a move.
>         (define_split): Likewise split UNSPEC_PALIGNR $64 into a move.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/ssse3-palignr-2.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>

+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+        (unspec:DI [(match_operand:DI 1 "register_operand")
+                    (match_operand:DI 2 "register_mmxmem_operand")
+                    (const_int 0)]
+                   UNSPEC_PALIGNR))]
+  ""
+  [(set (match_dup 0) (match_dup 2))])
+
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+        (unspec:DI [(match_operand:DI 1 "register_operand")
+                    (match_operand:DI 2 "register_mmxmem_operand")
+                    (const_int 64)]
+                   UNSPEC_PALIGNR))]
+  ""
+  [(set (match_dup 0) (match_dup 1))])
+

A define_split is assumed to split into two (or more) insns, so pass_combine
will only try a define_split when the number of merged insns is greater than
two.  For palignr, I think most of the time there would be only two merged
insns (constant propagation), so it would be better to turn these into
pre_reload splitters (i.e. define_insn_and_split, like
"*avx512bw_permvar_truncv16siv16hi_1").

--
BR,
Hongtao
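
As a rough illustration of the pre_reload splitter approach suggested above,
the (const_int 0) define_split could be recast as a define_insn_and_split in
the style of "*avx512bw_permvar_truncv16siv16hi_1", pairing a "#" output
template with an ix86_pre_reload_split () insn condition.  This is only a
sketch: the pattern name and the bare TARGET_SSSE3 condition are placeholders
(the real condition would presumably mirror ssse3_palignrdi's), not anything
taken from the actual patch.

;; Illustrative sketch only, not from the patch: the (const_int 0) split
;; recast as a define_insn_and_split, so that combine can recognize the
;; merged insn and the pre-reload split pass then turns it into a move.
(define_insn_and_split "*ssse3_palignrdi_lowpart"
  [(set (match_operand:DI 0 "register_operand")
        (unspec:DI [(match_operand:DI 1 "register_operand")
                    (match_operand:DI 2 "register_mmxmem_operand")
                    (const_int 0)]
                   UNSPEC_PALIGNR))]
  "TARGET_SSSE3 && ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(set (match_dup 0) (match_dup 2))])

Here the "#" template forces the insn to always be split, the "&& 1" split
condition makes the split unconditional once the insn condition holds, and
ix86_pre_reload_split () keeps the pattern from matching once it can no longer
be split before register allocation; the (const_int 64) case would be handled
analogously with (match_dup 1).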