From: Hongtao Liu
Date: Thu, 15 Jun 2023 13:23:06 +0800
Subject: Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD
To: Jan Beulich
Cc: "gcc-patches@gcc.gnu.org", Kirill Yukhin, Hongtao Liu
References: <48be2ae1-66d7-f87f-5997-b5307bd25fbc@suse.com> <9bb236f7-7864-47b0-8831-cc4ebf837b4e@suse.com>
In-Reply-To: <9bb236f7-7864-47b0-8831-cc4ebf837b4e@suse.com>

On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich wrote:
>
> On 14.06.2023 09:41, Hongtao Liu wrote:
> > On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches
> > wrote:
> >>
> >> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
> >> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
> >> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
> >> need for VEX3 in the broadcast ones. When EVEX encoding is required the
> >> broadcast insns are always shorter.
> >>
> >> Add two new alternatives each, one covering the AVX2 case and one
> >> covering AVX512.
> > I think you can just change assemble output for this first alternative
> > when TARGET_AVX2, use vbroadcastss, else use vshufps since
> > vbroadcastss only accept register operand when TARGET_AVX2. And no
> > need to support 2 extra alternatives which doesn't make sense just
> > make RA more confused about the same meaning of different
> > alternatives.
>
> You mean by switching from "@ ..." to C code using "switch
> (which_alternative)"? I can do that, sure. Yet that'll make for a
> more complicated "length_immediate" attribute then. Would be nice
Yes, you can also do something like
   (set (attr "length_immediate")
     (cond [(eq_attr "alternative" "0")
              (if_then_else (match_test "TARGET_AVX2")
                (const_string "")
                (const_string "1"))
            ...]

> if you could confirm that this is what you want, as I may well
> have misunderstood you.
>
> But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly
> different.
Yes, but can we use vpbroadcastd for vec_dupv4si similarly?
>
> >> ---
> >> I'm working from the assumption that the isa attributes to the original
> >> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
> >> or avx_noavx2 as applicable), as the new earlier alternatives cover all
> >> operand forms already when at least AVX2 is enabled.
> >>
> >> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
> >> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_
> >> and elsewhere.)
> > Not sure about this part. I grep prefix_extra, seems only used by
> > znver.md/znver4.md for schedule, and only for comi instructions(?the
> > reservation name seems so).
>
> define_attr "length_vex" and define_attr "length" use it, too.
> Otherwise I would have asked whether the attribute couldn't be
> purged from most insns.
>
> My present understanding is that the attribute is wrong on
> vec_dupv4sf (and hence wants dropping from there altogether), and it
> should be "prefix_data16" instead on *vec_dupv4si, evaluating to 1
> only for the non-AVX pshufd case. I suspect at least the latter
> would be going to far for doing it "while here" right in this patch.
> Plus I think I have seen various other questionable uses of that
> attribute.
>
> >> Is use of Yv for the source operand really necessary in *vec_dupv4si?
> >> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL
> > Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is
> > allowed for scalar mode, but not for 128/256-bit vector modes.
> >
> >       if (TARGET_AVX512F
> >           && (VALID_AVX512F_REG_OR_XI_MODE (mode)
> >               || VALID_AVX512F_SCALAR_MODE (mode)))
> >         return true;
>
> Okay, so I need to switch input constraints for relevant new
> alternatives to Yv (I actually wonder why I did use v in
> vec_dupv4sf, as it was clear to me that SFmode can be in the high
> 16 xmm registers with just AVX512F).
>
> >> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they
> >> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd
> >> alternative (or just m, as Yv is already covered by the 2nd one)?
> > I guess xm is more suitable since we still want to allocate
> > operands[1] to register when sse3_noavx.
> > It didn't hit any error since for avx and above, alternative 1(2rd
> > one) is always matched than alternative 2.
>
> I'm afraid I don't follow: With just -mavx512f the source operand
> can be in, say, %xmm16 (as per your clarification above). This
> would not match Yv, but it would match vm. And hence wrongly
> create an AVX512VL form of vmovddup.
> I didn't try it out earlier,
> because unlike for SFmode / DFmode I thought it's not really clear
> how to get the compiler to reliably put a DImode variable in an xmm
> reg, but it just occurred to me that this can be done the same way
> there. And voila,
>
> typedef long long __attribute__((vector_size(16))) v2di;
>
> v2di bcst(long long ll) {
>     register long long x asm("xmm16") = ll;
>
>     asm("nop %%esp" : "+v" (x));
>     return (v2di){x, x};
> }
>
> compiled with just -mavx512f (and -O2) produces an AVX512VL insn.
Ah, I see, indeed it's a potential bug for -mavx512f -mavx512vl
I meant that with -mavx512vl, _vec_dup_gpr will be matched instead of
vec_dupv2di since it's put before vec_dupv2di in the .md file; when
several patterns match exactly the same RTL, the first one wins.
So we just need to handle non-avx512 cases.
Also, even w/o _vec_dup_gpr, in the same pattern vec_dupv2di, with
-mavx512vl, Yv (alternative 1) will always be matched instead of
Yvm (alternative 2) for a register operand, so the Yv in Yvm is never
matched. That's why I said xm is more suitable (or enough).
You can use -dp to check in the .s file which alternative is matched.
> I'll make another patch, yet for that I'm then also not sure why
> you say xm would be more suitable. Yvm allows for registers (with
> or without AVX, merely SSE being required) just as much as vm
> does, doesn't it? And I don't think I've found any combination of
> destination being v and source being xm anywhere. Plus we want to
> allow for the higher registers when AVX512VL is enabled.
>
> Jan

-- 
BR,
Hongtao
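
The encoding-length claim in the quoted patch description (broadcast never longer than shuffle, always shorter under EVEX) can be checked with simple byte arithmetic. The following is only a rough sketch, assuming register-to-register forms with no displacement or SIB byte. The helper names (`insn_len`, `vshufps_len`, `vbroadcastss_len`) are made up for illustration and are not part of GCC. It relies on two encoding facts: VSHUFPS / VPSHUFD take an imm8, while VBROADCASTSS / VPBROADCASTD sit in the 0F38 opcode map and therefore cannot use the 2-byte VEX prefix.

```c
/* Prefix sizes in bytes for the three encodings discussed in the thread. */
enum prefix { VEX2 = 2, VEX3 = 3, EVEX = 4 };

/* Length of a reg-reg insn: prefix + 1 opcode byte + 1 ModRM byte,
   plus one byte if the insn carries an 8-bit immediate.  */
int insn_len(enum prefix p, int has_imm8)
{
    return (int)p + 1 + 1 + (has_imm8 ? 1 : 0);
}

/* VSHUFPS / VPSHUFD: any prefix form may apply, always with an imm8.  */
int vshufps_len(enum prefix p)
{
    return insn_len(p, 1);
}

/* VBROADCASTSS / VPBROADCASTD: opcode map 0F38, so a request for VEX2
   is promoted to VEX3 (illustrative assumption of this sketch); no imm8.  */
int vbroadcastss_len(enum prefix p)
{
    return insn_len(p == VEX2 ? VEX3 : p, 0);
}
```

Under these assumptions the shuffle and broadcast forms tie at 5 bytes when the shuffle fits 2-byte VEX, the broadcast wins by one byte when the shuffle itself needs VEX3 (6 vs. 5), and it wins again under EVEX (7 vs. 6).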