From: Hongtao Liu
Date: Tue, 20 Jun 2023 16:33:46 +0800
Subject: Re: [PATCH v3] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F
To: Jan Beulich
Cc: "gcc-patches@gcc.gnu.org", Kirill Yukhin, Hongtao Liu
References: <169ca252-3828-b466-4d47-a8fe720ec4ef@suse.com>
In-Reply-To: <169ca252-3828-b466-4d47-a8fe720ec4ef@suse.com>

On Tue, Jun 20, 2023 at 3:07 PM Jan Beulich via Gcc-patches wrote:
>
> There's no reason to constrain this to AVX512VL, unless instructed so by
> -mprefer-vector-width=, as the wider operation is unusable for more
> narrow operands only when the possible memory source is a non-broadcast
> one. This way even the scalar copysign3 can benefit from the
> operation being a single-insn one (leaving aside moves which the
> compiler decides to insert for unclear reasons, and leaving aside the
> fact that bcst_mem_operand() is too restrictive for broadcast to be
> embedded right into VPTERNLOG*).
>
> While there also bring *_vternlog_all's in sync with that
> of the three splitters.
>
> Along with this also request value duplication in
> ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
> excess space allocation in .rodata.*, filled with zeros which are never
> read.
>
> gcc/
>
>     * config/i386/i386-expand.cc (ix86_expand_copysign): Request
>     value duplication by ix86_build_signbit_mask() when AVX512F and
>     not HFmode.
>     * config/i386/sse.md (*_vternlog_all): Convert to
>     2-alternative form. Adjust "mode" attribute. Add "enabled"
>     attribute.
>     (*_vpternlog_1): Also permit when TARGET_AVX512F
>     && !TARGET_PREFER_AVX256.
>     (*_vpternlog_2): Likewise.
>     (*_vpternlog_3): Likewise.
>
> gcc/testsuite/
>     * gcc.target/i386/avx512f-copysign.c: New test.
> ---
> I haven't been able to find documentation on the dejagnu(?) regex syntax
> (?:...). With ordinary (...) failing (producing twice as many matches),
> I could only derive this from other scan-assembler patterns.
>
> I guess the underlying pattern, going along the lines of what
> one_cmpl2 uses, can be applied elsewhere
> as well.

That should be guarded with !TARGET_PREFER_AVX256; let's handle that in a
separate patch.

>
> HFmode could use embedded broadcast too for copysign and alike, but that
> would need to be V2HF -> V8HF (for which I don't think there are any
> existing patterns).
> ---
> v3: Adjust insn conditional as well. Add testcase.
> v2: Respect -mprefer-vector-width=.
>
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -2266,7 +2266,7 @@ ix86_expand_copysign (rtx operands[])
>    else
>      dest = NULL_RTX;
>    op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode);
> -  mask = ix86_build_signbit_mask (vmode, 0, 0);
> +  mask = ix86_build_signbit_mask (vmode, TARGET_AVX512F && mode != HFmode, 0);
>
>    if (CONST_DOUBLE_P (operands[1]))
>      {
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -12399,22 +12399,35 @@
>     (set_attr "mode" "")])
>
>  (define_insn "*_vternlog_all"
> -  [(set (match_operand:V 0 "register_operand" "=v")
> +  [(set (match_operand:V 0 "register_operand" "=v,v")
>          (unspec:V
> -          [(match_operand:V 1 "register_operand" "0")
> -           (match_operand:V 2 "register_operand" "v")
> -           (match_operand:V 3 "bcst_vector_operand" "vmBr")
> +          [(match_operand:V 1 "register_operand" "0,0")
> +           (match_operand:V 2 "register_operand" "v,v")
> +           (match_operand:V 3 "bcst_vector_operand" "vBr,m")
>             (match_operand:SI 4 "const_0_to_255_operand")]
>            UNSPEC_VTERNLOG))]
> -  "TARGET_AVX512F
> +  "( == 64 || TARGET_AVX512VL
> +    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
>    /* Disallow embeded broadcast for vector HFmode since
>       it's not real AVX512FP16 instruction.  */
>    && (GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4
>       || GET_CODE (operands[3]) != VEC_DUPLICATE)"
> -  "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}"
> +{
> +  if (TARGET_AVX512VL)
> +    return "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}";
> +  else
> +    return "vpternlog\t{%4, %g3, %g2, %g0|%g0, %g2, %g3, %4}";
> +}
>    [(set_attr "type" "sselog")
>     (set_attr "prefix" "evex")
> -   (set_attr "mode" "")])
> +   (set (attr "mode")
> +        (if_then_else (match_test "TARGET_AVX512VL")
> +                      (const_string "")
> +                      (const_string "XI")))
> +   (set (attr "enabled")
> +        (if_then_else (eq_attr "alternative" "1")
> +                      (symbol_ref " == 64 || TARGET_AVX512VL")
> +                      (const_string "*")))])
>
>  ;; There must be lots of other combinations like
>  ;;
> @@ -12443,7 +12456,8 @@
>            (any_logic2:V
>              (match_operand:V 3 "regmem_or_bitnot_regmem_operand")
>              (match_operand:V 4 "regmem_or_bitnot_regmem_operand"))))]
> -  "( == 64 || TARGET_AVX512VL)
> +  "( == 64 || TARGET_AVX512VL
> +    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
>     && ix86_pre_reload_split ()
>     && (rtx_equal_p (STRIP_UNARY (operands[1]),
>                      STRIP_UNARY (operands[4]))
> @@ -12527,7 +12541,8 @@
>              (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
>            (match_operand:V 3 "regmem_or_bitnot_regmem_operand"))
>          (match_operand:V 4 "regmem_or_bitnot_regmem_operand")))]
> -  "( == 64 || TARGET_AVX512VL)
> +  "( == 64 || TARGET_AVX512VL
> +    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
>     && ix86_pre_reload_split ()
>     && (rtx_equal_p (STRIP_UNARY (operands[1]),
>                      STRIP_UNARY (operands[4]))
> @@ -12610,7 +12625,8 @@
>            (match_operand:V 1 "regmem_or_bitnot_regmem_operand")
>            (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
>          (match_operand:V 3 "regmem_or_bitnot_regmem_operand")))]
> -  "( == 64 || TARGET_AVX512VL)
> +  "( == 64 || TARGET_AVX512VL
> +    || (TARGET_AVX512F && !TARGET_PREFER_AVX256))
>     && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-copysign.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -mno-avx512vl -O2" } */

Please explicitly add -mprefer-vector-width=512 here. Our tester also
runs unix{-m32 \-march=cascadelake,\ -march=cascadelake}, which sets
-mprefer-vector-width=256; putting -mprefer-vector-width=512 in
dg-options overrides that.

Otherwise LGTM.

> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$(?:216|228|0xd8|0xe4)," 5 } } */
> +
> +double cs_df (double x, double y)
> +{
> +  return __builtin_copysign (x, y);
> +}
> +
> +float cs_sf (float x, float y)
> +{
> +  return __builtin_copysignf (x, y);
> +}
> +
> +typedef double __attribute__ ((vector_size (16))) v2df;
> +typedef double __attribute__ ((vector_size (32))) v4df;
> +typedef double __attribute__ ((vector_size (64))) v8df;
> +
> +v2df cs_v2df (v2df x, v2df y)
> +{
> +  return __builtin_ia32_copysignpd (x, y);
> +}
> +
> +v4df cs_v4df (v4df x, v4df y)
> +{
> +  return __builtin_ia32_copysignpd256 (x, y);
> +}
> +
> +v8df cs_v8df (v8df x, v8df y)
> +{
> +  return __builtin_ia32_copysignpd512 (x, y);
> +}

-- 
BR,
Hongtao