From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yb1-xb35.google.com (mail-yb1-xb35.google.com [IPv6:2607:f8b0:4864:20::b35]) by sourceware.org (Postfix) with ESMTPS id 23F7B3858C50 for ; Fri, 4 Aug 2023 00:37:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 23F7B3858C50 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yb1-xb35.google.com with SMTP id 3f1490d57ef6-d0728058651so1615921276.1 for ; Thu, 03 Aug 2023 17:37:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1691109425; x=1691714225; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=jSn2HZh9oA9bCKfYe/SNIAO1r22kgnpyaDCukWaKMzM=; b=jLZh1XRg1S+5y4mhwu6KpFzJudMLy0YaLVI/ouPMeNypgmZ9IrLtYjRUgo+LveIcyc anb6AQob8Lj0RziIhZfxASNuXt5YCZmN+jKPFbzP4UDu4NBnkLw+7sADQRh14WeG6wRf q5SLRSZG6QLzGPeR9ghOBMxwdVa79DpgRNr/lzBHSFPE7iziFveK+gxhMlHP3EM2d2+j sylrja97p8hoTHg48sl6zcPugx9UfcSwKv9fnKDKQeKMVejGc4SS1dooqEKcT2Zdy0BQ Kyop19MJm4goi3D+QyWjbG7McwrWmkODOMsFd7kNV+PsTXL23JJwFkbohebn7Ihcss47 pXQA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691109425; x=1691714225; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=jSn2HZh9oA9bCKfYe/SNIAO1r22kgnpyaDCukWaKMzM=; b=DB4IT2FLqpNhyKHuTgczTN015L0CBlHN3Ko6Mum21mva0rW05h75ysCgSuwuWhXEPf JkZG6LvwVvjhJDv2UAe/Eg8tMJBZu2iDePkG/12vM+sHZ8E7oYIxZYqgSEWkvJfXd+tP Z+5SsR7hW/hHQtYnzgJTSVRaOQZm0Al48rhkv9CiWx1beilJ66z1zaBRbv7VS+L2dcoB vEXW4K8ExDwuIgR74dgv8/B7CYIWJ0F01rUlDf2n2jqsYJCvhn3hne5BV94yHi9kz60q spuyNxtnB+GpxAfPIHC9OC+9yXclVNYWqci+++qsQ7reh1wLH2f1KoIXkL6LfLEHLX+e Kenw== X-Gm-Message-State: AOJu0YzcywW5nKUMXw/3wjIi4Gy91xsIe2wUwWRj32OE4sZHzCPFqhxT /xfnxj75liHNaSJ9a7o14QAlqNwMqCHQ+kr0Xrs= X-Google-Smtp-Source: AGHT+IF62CQmJrJyeAM6JGclNvGKB6zUl9s15ptCU35jPq0KSl7cwSdZ/WDUW/JBMJpvSW1YnnKKib7+nqsvEUi86sE= X-Received: by 2002:a25:37c3:0:b0:d07:f1ed:51f7 with SMTP id e186-20020a2537c3000000b00d07f1ed51f7mr86209yba.43.1691109425379; Thu, 03 Aug 2023 17:37:05 -0700 (PDT) MIME-Version: 1.0 References: <20230725181118.27484-1-simonaytes.yan@ispras.ru> In-Reply-To: From: Hongtao Liu Date: Fri, 4 Aug 2023 08:43:59 +0800 Message-ID: Subject: Re: [PATCH] Replace invariant ternlog operands To: Alexander Monakov Cc: "Liu, Hongtao" , Yan Simonaytes , "gcc-patches@gcc.gnu.org" , Uros Bizjak Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-7.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Aug 4, 2023 at 1:30=E2=80=AFAM Alexander Monakov wrote: > > > On Thu, 27 Jul 2023, Liu, Hongtao via Gcc-patches wrote: > > > > +;; If the first and the second operands of ternlog are invariant and= ;; > > > +the third operand is memory ;; then we should add load third operand > > > +from memory to register and ;; replace first and second operands wit= h > > > +this register (define_split > > > + [(set (match_operand:V 0 "register_operand") > > > + (unspec:V > > > + [(match_operand:V 1 "register_operand") > > > + (match_operand:V 2 "register_operand") > > > + (match_operand:V 3 "memory_operand") > > > + (match_operand:SI 4 "const_0_to_255_operand")] > > > + UNSPEC_VTERNLOG))] > > > + "ternlog_invariant_operand_mask (operands) =3D=3D 3 && !reload_com= pleted" > > Maybe better with "!reload_completed && ternlog_invariant_operand_mask= (operands) =3D=3D 3" > > I made this change (in both places), plus some style TLC. Ok to apply? Ok. > > From d24304a9efd049e8db6df5ac78de8ca2d941a3c7 Mon Sep 17 00:00:00 2001 > From: Yan Simonaytes > Date: Tue, 25 Jul 2023 20:43:19 +0300 > Subject: [PATCH] Eliminate irrelevant operands of VPTERNLOG > > As mentioned in PR 110202, GCC may be presented with input where control > word of the VPTERNLOG intrinsic implies that some of its operands do not > affect the result. In that case, we can eliminate irrelevant operands > of the instruction by substituting any other operand in their place. > This removes false dependencies. > > For instance, instead of (252 =3D 0xfc =3D _MM_TERNLOG_A | _MM_TERNLOG_B) > > vpternlogq $252, %zmm2, %zmm1, %zmm0 > > emit > > vpternlogq $252, %zmm0, %zmm1, %zmm0 > > When VPTERNLOG is invariant w.r.t first and second operands, and the > third operand is memory, load memory into the output operand first, i.e. > instead of (85 =3D 0x55 =3D ~_MM_TERNLOG_C) > > vpternlogq $85, (%rdi), %zmm1, %zmm0 > > emit > > vmovdqa64 (%rdi), %zmm0 > vpternlogq $85, %zmm0, %zmm0, %zmm0 > > gcc/ChangeLog: > > * config/i386/i386-protos.h (vpternlog_irrelevant_operand_mask): > Declare. > (substitute_vpternlog_operands): Declare. > * config/i386/i386.cc (vpternlog_irrelevant_operand_mask): New > helper. > (substitute_vpternlog_operands): New function. Use them... > * config/i386/sse.md: ... here in new VPTERNLOG define_splits. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/invariant-ternlog-1.c: New test. > * gcc.target/i386/invariant-ternlog-2.c: New test. > --- > gcc/config/i386/i386-protos.h | 3 ++ > gcc/config/i386/i386.cc | 43 +++++++++++++++++++ > gcc/config/i386/sse.md | 42 ++++++++++++++++++ > .../gcc.target/i386/invariant-ternlog-1.c | 21 +++++++++ > .../gcc.target/i386/invariant-ternlog-2.c | 12 ++++++ > 5 files changed, 121 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/invariant-ternlog-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/invariant-ternlog-2.c > > diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.= h > index 27fe73ca65..12e6ff0ebc 100644 > --- a/gcc/config/i386/i386-protos.h > +++ b/gcc/config/i386/i386-protos.h > @@ -70,6 +70,9 @@ extern machine_mode ix86_cc_mode (enum rtx_code, rtx, r= tx); > extern int avx_vpermilp_parallel (rtx par, machine_mode mode); > extern int avx_vperm2f128_parallel (rtx par, machine_mode mode); > > +extern int vpternlog_irrelevant_operand_mask (rtx[]); > +extern void substitute_vpternlog_operands (rtx[]); > + > extern bool ix86_expand_strlen (rtx, rtx, rtx, rtx); > extern bool ix86_expand_set_or_cpymem (rtx, rtx, rtx, rtx, rtx, rtx, > rtx, rtx, rtx, rtx, bool); > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index 32851a514a..9a7c1135a0 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -19420,6 +19420,49 @@ avx_vperm2f128_parallel (rtx par, machine_mode m= ode) > return mask + 1; > } > > +/* Return a mask of VPTERNLOG operands that do not affect output. */ > + > +int > +vpternlog_irrelevant_operand_mask (rtx *operands) > +{ > + int mask =3D 0; > + int imm8 =3D XINT (operands[4], 0); > + > + if (((imm8 >> 4) & 0x0F) =3D=3D (imm8 & 0x0F)) > + mask |=3D 1; > + if (((imm8 >> 2) & 0x33) =3D=3D (imm8 & 0x33)) > + mask |=3D 2; > + if (((imm8 >> 1) & 0x55) =3D=3D (imm8 & 0x55)) > + mask |=3D 4; > + > + return mask; > +} > + > +/* Eliminate false dependencies on operands that do not affect output > + by substituting other operands of a VPTERNLOG. */ > + > +void > +substitute_vpternlog_operands (rtx *operands) > +{ > + int mask =3D vpternlog_irrelevant_operand_mask (operands); > + > + if (mask & 1) /* The first operand is irrelevant. */ > + operands[1] =3D operands[2]; > + > + if (mask & 2) /* The second operand is irrelevant. */ > + operands[2] =3D operands[1]; > + > + if (mask & 4) /* The third operand is irrelevant. */ > + operands[3] =3D operands[1]; > + else if (REG_P (operands[3])) > + { > + if (mask & 1) > + operands[1] =3D operands[3]; > + if (mask & 2) > + operands[2] =3D operands[3]; > + } > +} > + > /* Return a register priority for hard reg REGNO. */ > static int > ix86_register_priority (int hard_regno) > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index f793258b6c..1e2ec4bedc 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -12627,6 +12627,48 @@ (define_insn "*_vternlog_all" > (symbol_ref " =3D=3D 64 || TARGET_AVX512= VL") > (const_string "*")))]) > > +;; When VPTERNLOG happens to be invariant w.r.t first and second operand= s, > +;; and the third operand is memory, eliminate false dependencies by load= ing > +;; memory into the output operand first. > +(define_split > + [(set (match_operand:V 0 "register_operand") > + (unspec:V > + [(match_operand:V 1 "register_operand") > + (match_operand:V 2 "register_operand") > + (match_operand:V 3 "memory_operand") > + (match_operand:SI 4 "const_0_to_255_operand")] > + UNSPEC_VTERNLOG))] > + "!reload_completed && vpternlog_irrelevant_operand_mask (operands) =3D= =3D 3" > + [(set (match_dup 0) > + (match_dup 3)) > + (set (match_dup 0) > + (unspec:V > + [(match_dup 0) > + (match_dup 0) > + (match_dup 0) > + (match_dup 4)] > + UNSPEC_VTERNLOG))]) > + > +;; Eliminate false dependencies when VPTERNLOG is invariant w.r.t any > +;; of input operands (except the case handled in the above split). > +(define_split > + [(set (match_operand:V 0 "register_operand") > + (unspec:V > + [(match_operand:V 1 "register_operand") > + (match_operand:V 2 "register_operand") > + (match_operand:V 3 "nonimmediate_operand") > + (match_operand:SI 4 "const_0_to_255_operand")] > + UNSPEC_VTERNLOG))] > + "!reload_completed && vpternlog_irrelevant_operand_mask (operands) != =3D 0" > + [(set (match_dup 0) > + (unspec:V > + [(match_dup 1) > + (match_dup 2) > + (match_dup 3) > + (match_dup 4)] > + UNSPEC_VTERNLOG))] > + "substitute_vpternlog_operands (operands);") > + > ;; There must be lots of other combinations like > ;; > ;; (any_logic:V > diff --git a/gcc/testsuite/gcc.target/i386/invariant-ternlog-1.c b/gcc/te= stsuite/gcc.target/i386/invariant-ternlog-1.c > new file mode 100644 > index 0000000000..21051c6bba > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/invariant-ternlog-1.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -O2" } */ > +/* { dg-final { scan-assembler-times "vmovdqa" 4 } } */ > +/* { dg-final { scan-assembler-times {vpternlog[^\n\r]*\(%rdx\)} 2 } } *= / > + > +#include > + > +__m512i f(__m512i* a, __m512i* b, __m512i* c) > +{ > + return _mm512_ternarylogic_epi64 (a[0], b[0], c[0], ~_MM_TERNLOG_= B | ~_MM_TERNLOG_C); > +} > + > +__m512i g(__m512i* a, __m512i* b, __m512i* c) > +{ > + return _mm512_ternarylogic_epi64 (a[0], b[0], c[0], ~_MM_TERNLOG_= A | ~_MM_TERNLOG_C); > +} > + > +__m512i h(__m512i* a, __m512i* b, __m512i* c) > +{ > + return _mm512_ternarylogic_epi64 (a[0], b[0], c[0], ~_MM_TERNLOG_= A | ~_MM_TERNLOG_B); > +} > diff --git a/gcc/testsuite/gcc.target/i386/invariant-ternlog-2.c b/gcc/te= stsuite/gcc.target/i386/invariant-ternlog-2.c > new file mode 100644 > index 0000000000..d70bbb0239 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/invariant-ternlog-2.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -O2" } */ > +/* { dg-final { scan-assembler-times "vmovdqa" 1 } } */ > +/* { dg-final { scan-assembler "vpternlog.*zmm0.*zmm0.*zmm0" } } */ > + > +#include > + > +__m512i f(__m512i* a, __m512i* b, __m512i* c) > +{ > + return _mm512_ternarylogic_epi64 (a[0], b[0], c[0], ~_MM_TERNLOG_= C); > +} > + > -- > 2.39.2 > --=20 BR, Hongtao