Subject: Re: [x86 PATCH] UNSPEC_PALIGNR optimizations and clean-ups.
From: Hongtao Liu
To: Roger Sayle
Cc: GCC Patches
Date: Tue, 5 Jul 2022 08:30:58 +0800

On Tue, Jul 5, 2022 at 1:48 AM Roger Sayle wrote:
>
> Hi Hongtao,
> Many thanks for your review.  This revised patch implements your
> suggestions of removing the combine splitters, and instead reusing
> the functionality of the ssse3_palignrdi define_insn_and_split.
>
> This revised patch has been tested on x86_64-pc-linux-gnu with make
> bootstrap and make -k check, both with and without
> --target_board=unix{-m32}, with no new failures.  Is this revised
> version Ok for mainline?

Ok.

>
> 2022-07-04  Roger Sayle
>             Hongtao Liu
>
> gcc/ChangeLog
>       * config/i386/i386-builtin.def (__builtin_ia32_palignr128): Change
>       CODE_FOR_ssse3_palignrti to CODE_FOR_ssse3_palignrv1ti.
>       * config/i386/i386-expand.cc (expand_vec_perm_palignr): Use V1TImode
>       and gen_ssse3_palignv1ti instead of TImode.
>       * config/i386/sse.md (SSESCALARMODE): Delete.
>       (define_mode_attr ssse3_avx2): Handle V1TImode instead of TImode.
>       (<ssse3_avx2>_palignr<mode>): Use VIMAX_AVX2_AVX512BW as a mode
>       iterator instead of SSESCALARMODE.
>       (ssse3_palignrdi): Optimize cases where operands[3] is 0 or 64,
>       using a single move instruction (if required).
>
> gcc/testsuite/ChangeLog
>       * gcc.target/i386/ssse3-palignr-2.c: New test case.
>
> Thanks in advance,
> Roger
> --
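[A note for readers of the archive: the 0/64 simplification named in the
ChangeLog above is easiest to see from a scalar model of the 64-bit
palignr.  The instruction shifts the 128-bit concatenation
operands[1]:operands[2] right by the immediate (measured in bits here)
and keeps the low 64 bits, so a shift of 0 selects operands[2] outright
and a shift of 64 selects operands[1].  The sketch below is illustrative
only, not code from the patch; the function name is made up.]

    /* Model of 64-bit palignr: the low 64 bits of (hi:lo) >> shift_bits.  */
    unsigned long long
    palignr_di_model (unsigned long long hi, unsigned long long lo,
                      unsigned int shift_bits)
    {
      if (shift_bits == 0)
        return lo;                  /* A plain move from the lowpart.  */
      if (shift_bits == 64)
        return hi;                  /* A plain move from the highpart.  */
      if (shift_bits < 64)
        return (lo >> shift_bits) | (hi << (64 - shift_bits));
      return shift_bits < 128 ? hi >> (shift_bits - 64) : 0;
    }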
>
> > -----Original Message-----
> > From: Hongtao Liu
> > Sent: 01 July 2022 03:40
> > To: Roger Sayle
> > Cc: GCC Patches
> > Subject: Re: [x86 PATCH] UNSPEC_PALIGNR optimizations and clean-ups.
> >
> > On Fri, Jul 1, 2022 at 10:12 AM Hongtao Liu wrote:
> > >
> > > On Fri, Jul 1, 2022 at 2:42 AM Roger Sayle wrote:
> > > >
> > > >
> > > > This patch is a follow-up to Hongtao's fix for PR target/105854.
> > > > That fix is perfectly correct, but the thing that caught my eye was
> > > > why the compiler was generating a shift by zero at all.  Digging
> > > > deeper, it turns out that we can easily optimize
> > > > __builtin_ia32_palignr for alignments of 0 and 64 respectively,
> > > > which may be simplified to moves from the highpart or lowpart.
> > > >
> > > > After adding optimizations to simplify the 64-bit DImode palignr, I
> > > > started to add the corresponding optimizations for vpalignr (i.e.
> > > > 128-bit).  The first oddity is that sse.md uses TImode and a special
> > > > SSESCALARMODE iterator, rather than V1TImode, and indeed the comment
> > > > above SSESCALARMODE hints that this should be "dropped in favor of
> > > > VIMAX_AVX2_AVX512BW".  Hence this patch includes the migration of
> > > > <ssse3_avx2>_palignr<mode> to use VIMAX_AVX2_AVX512BW, basically
> > > > using V1TImode instead of TImode for 128-bit palignr.
> > > >
> > > > But it was only after I'd implemented this clean-up that I stumbled
> > > > across the strange semantics of 128-bit [v]palignr.  According to
> > > > https://www.felixcloutier.com/x86/palignr, the semantics are subtly
> > > > different based upon how the instruction is encoded.  PALIGNR leaves
> > > > the highpart unmodified, whilst VEX.128 encoded VPALIGNR clears the
> > > > highpart, and (unless I'm mistaken) it looks like GCC currently uses
> > > > the exact same RTL/templates for both, treating one as an
> > > > alternative for the other.
> > > I think as long as patterns or intrinsics only care about the low
> > > part, they should be ok.
> > > But if we want to rely on the default behavior for the upper bits, we
> > > need to restrict them under a specific ISA (i.e. vmovq in
> > > vec_set<mode>_0).
> > > Generally, 128-bit SSE legacy instructions have different behaviors
> > > for the upper bits from the AVX ones, and that's why vzeroupper was
> > > introduced for the SSE <-> AVX instruction transition.
> > > >
> > > > Hence I thought I'd post what I have so far (part optimization and
> > > > part clean-up), to then ask the x86 experts for their opinions.
> > > >
> > > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > > bootstrap and make -k check, both with and without
> > > > --target_board=unix{-m32}, with no new failures.  Ok for mainline?
> > > >
> > > >
> > > > 2022-06-30  Roger Sayle
> > > >
> > > > gcc/ChangeLog
> > > >       * config/i386/i386-builtin.def (__builtin_ia32_palignr128): Change
> > > >       CODE_FOR_ssse3_palignrti to CODE_FOR_ssse3_palignrv1ti.
> > > >       * config/i386/i386-expand.cc (expand_vec_perm_palignr): Use
> > > >       V1TImode and gen_ssse3_palignv1ti instead of TImode.
> > > >       * config/i386/sse.md (SSESCALARMODE): Delete.
> > > >       (define_mode_attr ssse3_avx2): Handle V1TImode instead of TImode.
> > > >       (<ssse3_avx2>_palignr<mode>): Use VIMAX_AVX2_AVX512BW as a mode
> > > >       iterator instead of SSESCALARMODE.
> > > >       (ssse3_palignrdi): Optimize cases when operands[3] is 0 or 64,
> > > >       using a single move instruction (if required).
> > > >       (define_split): Likewise split UNSPEC_PALIGNR $0 into a move.
> > > >       (define_split): Likewise split UNSPEC_PALIGNR $64 into a move.
> > > >
> > > > gcc/testsuite/ChangeLog
> > > >       * gcc.target/i386/ssse3-palignr-2.c: New test case.
> > > >
> > > >
> > > > Thanks in advance,
> > > > Roger
> > > > --
> > > >
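[Aside: for the 128-bit form discussed above, the same boundary cases
collapse at the intrinsic level, where the shift count is given in
bytes rather than bits.  A minimal sketch, assuming <tmmintrin.h> and
-mssse3; the wrapper names are made up for illustration.]

    #include <tmmintrin.h>

    /* palignr concatenates a:b and shifts right by N bytes, keeping
       the low 128 bits of the result.  */
    __m128i
    take_lowpart (__m128i a, __m128i b)
    {
      return _mm_alignr_epi8 (a, b, 0);   /* The window is exactly b.  */
    }

    __m128i
    take_highpart (__m128i a, __m128i b)
    {
      return _mm_alignr_epi8 (a, b, 16);  /* The window is exactly a.  */
    }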
> > >
> > > +(define_split
> > > +  [(set (match_operand:DI 0 "register_operand")
> > > +        (unspec:DI [(match_operand:DI 1 "register_operand")
> > > +                    (match_operand:DI 2 "register_mmxmem_operand")
> > > +                    (const_int 0)]
> > > +                   UNSPEC_PALIGNR))]
> > > +  ""
> > > +  [(set (match_dup 0) (match_dup 2))])
> > > +
> > > +(define_split
> > > +  [(set (match_operand:DI 0 "register_operand")
> > > +        (unspec:DI [(match_operand:DI 1 "register_operand")
> > > +                    (match_operand:DI 2 "register_mmxmem_operand")
> > > +                    (const_int 64)]
> > > +                   UNSPEC_PALIGNR))]
> > > +  ""
> > > +  [(set (match_dup 0) (match_dup 1))])
> > > +
> > > A define_split is assumed to be split into 2 (or more) insns, hence
> > > pass_combine will only try a define_split if the number of merged
> > > insns is greater than 2.
> > > For palignr, I think most of the time there would be only 2 merged
> > > insns (constant propagation), so it is better to change them to a
> > > pre_reload splitter (i.e. define_insn_and_split
> > > "*avx512bw_permvar_truncv16siv16hi_1").
> > I think you can just merge the 2 define_splits into the
> > define_insn_and_split "ssse3_palignrdi" by relaxing the split
> > condition as
> >
> > -  "TARGET_SSSE3 && reload_completed
> > -   && SSE_REGNO_P (REGNO (operands[0]))"
> > +  "(TARGET_SSSE3 && reload_completed
> > +    && SSE_REGNO_P (REGNO (operands[0])))
> > +   || INTVAL (operands[3]) == 0
> > +   || INTVAL (operands[3]) == 64"
> >
> > and you have already handled them by
> >
> > +  if (operands[3] == const0_rtx)
> > +    {
> > +      if (!rtx_equal_p (operands[0], operands[2]))
> > +        emit_move_insn (operands[0], operands[2]);
> > +      else
> > +        emit_note (NOTE_INSN_DELETED);
> > +      DONE;
> > +    }
> > +  else if (INTVAL (operands[3]) == 64)
> > +    {
> > +      if (!rtx_equal_p (operands[0], operands[1]))
> > +        emit_move_insn (operands[0], operands[1]);
> > +      else
> > +        emit_note (NOTE_INSN_DELETED);
> > +      DONE;
> > +    }
> > +
> >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> > --
> > BR,
> > Hongtao

--
BR,
Hongtao
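[Aside: the new test, gcc.target/i386/ssse3-palignr-2.c, is not
reproduced in this thread.  Below is a hypothetical reduced example of
the kind of code the optimization improves; the typedef and function
names are made up, the builtin takes its shift count in bits, and with
the patch each function should compile (at -O2 with -mssse3) to at most
a single move rather than a palignr.]

    /* Hypothetical reduced example; compile with -mssse3.  */
    typedef long long v1di __attribute__ ((vector_size (8)));

    v1di
    take_low (v1di a, v1di b)
    {
      return __builtin_ia32_palignr (a, b, 0);   /* Selects b.  */
    }

    v1di
    take_high (v1di a, v1di b)
    {
      return __builtin_ia32_palignr (a, b, 64);  /* Selects a.  */
    }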