From: Uros Bizjak
Date: Thu, 15 Jun 2023 09:07:01 +0200
Subject: Re: [PATCH] x86: correct and improve "*vec_dupv2di"
To: Jan Beulich
Cc: "gcc-patches@gcc.gnu.org", Hongtao Liu, Kirill Yukhin

On Thu, Jun 15, 2023 at 8:03 AM Jan Beulich via Gcc-patches wrote:
>
> The input constraint for the %vmovddup alternative was wrong, as the
> upper 16 XMM registers require AVX512VL to be used with this insn. To
> compensate, introduce a new alternative permitting all 32 registers, by
> broadcasting to the full 512 bits in that case if AVX512VL is not
> available.
>
> gcc/
>
>         * config/i386/sse.md (vec_dupv2di): Correct %vmovddup input
>         constraint. Add new AVX512F alternative.
> ---
> Strictly speaking the new alternative could be enabled from AVX2
> onwards, but vmovddup can frequently be a shorter encoding (VEX2
> vs VEX3).
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -25851,19 +25851,39 @@
>            (symbol_ref "true")))])
>
>  (define_insn "*vec_dupv2di"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,v,v,x")
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,v,v,v,x")
>         (vec_duplicate:V2DI
> -         (match_operand:DI 1 "nonimmediate_operand" " 0,Yv,vm,0")))]
> +         (match_operand:DI 1 "nonimmediate_operand" " 0,Yv,vm,Yvm,0")))]
>    "TARGET_SSE"
> -  "@
> -   punpcklqdq\t%0, %0
> -   vpunpcklqdq\t{%d1, %0|%0, %d1}
> -   %vmovddup\t{%1, %0|%0, %1}
> -   movlhps\t%0, %0"
> -  [(set_attr "isa" "sse2_noavx,avx,sse3,noavx")
> -   (set_attr "type" "sselog1,sselog1,sselog1,ssemov")
> -   (set_attr "prefix" "orig,maybe_evex,maybe_vex,orig")
> -   (set_attr "mode" "TI,TI,DF,V4SF")])
> +{
> +  switch (which_alternative)
> +    {
> +    case 0:
> +      return "punpcklqdq\t%0, %0";
> +    case 1:
> +      return "vpunpcklqdq\t{%d1, %0|%0, %d1}";
> +    case 2:
> +      if (TARGET_AVX512VL)
> +        return "vpbroadcastq\t{%1, %0|%0, %1}";
> +      return "vpbroadcastq\t{%1, %g0|%g0, %1}";

You can use

* return TARGET_AVX512VL ? \"vpbroadcastq\t{%1, %0|%0, %1}\" : \"vpbroadcastq\t{%1, %g0|%g0, %1}\";

directly in a multi-output insn template to avoid the above C code.
See e.g. sse2_cvtpd2pi for an example.

Uros.

> +    case 3:
> +      return "%vmovddup\t{%1, %0|%0, %1}";
> +    case 4:
> +      return "movlhps\t%0, %0";
> +    default:
> +      gcc_unreachable ();
> +    }
> +}
> +  [(set_attr "isa" "sse2_noavx,avx,avx512f,sse3,noavx")
> +   (set_attr "type" "sselog1,sselog1,ssemov,sselog1,ssemov")
> +   (set_attr "prefix" "orig,maybe_evex,evex,maybe_vex,orig")
> +   (set_attr "mode" "TI,TI,TI,DF,V4SF")
> +   (set (attr "enabled")
> +        (if_then_else
> +          (eq_attr "alternative" "2")
> +          (symbol_ref "TARGET_AVX512VL
> +                       || (TARGET_AVX512F && !TARGET_PREFER_AVX256)")
> +          (const_string "*")))])
>
>  (define_insn "avx2_vbroadcasti128_"
>    [(set (match_operand:VI_256 0 "register_operand" "=x,v,v")
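
For illustration, here is a rough sketch of how the "* return ..." suggestion might look once folded back into the pattern. It is only an assumption about the resulting layout, not the committed form of the patch: the constraints and attributes are copied unchanged from the quoted diff, and the only change is that the switch statement goes back to an "@" template with alternative 2 written as a one-line C return.

```
;; Sketch only, not the committed pattern.  Constraints and attributes
;; are taken verbatim from the patch as posted; the switch statement is
;; replaced by an "@" template whose third line carries the C code.
(define_insn "*vec_dupv2di"
  [(set (match_operand:V2DI 0 "register_operand" "=x,v,v,v,x")
        (vec_duplicate:V2DI
          (match_operand:DI 1 "nonimmediate_operand" " 0,Yv,vm,Yvm,0")))]
  "TARGET_SSE"
  "@
   punpcklqdq\t%0, %0
   vpunpcklqdq\t{%d1, %0|%0, %d1}
   * return TARGET_AVX512VL ? \"vpbroadcastq\t{%1, %0|%0, %1}\" : \"vpbroadcastq\t{%1, %g0|%g0, %1}\";
   %vmovddup\t{%1, %0|%0, %1}
   movlhps\t%0, %0"
  [(set_attr "isa" "sse2_noavx,avx,avx512f,sse3,noavx")
   (set_attr "type" "sselog1,sselog1,ssemov,sselog1,ssemov")
   (set_attr "prefix" "orig,maybe_evex,evex,maybe_vex,orig")
   (set_attr "mode" "TI,TI,TI,DF,V4SF")
   (set (attr "enabled")
        (if_then_else
          (eq_attr "alternative" "2")
          (symbol_ref "TARGET_AVX512VL
                       || (TARGET_AVX512F && !TARGET_PREFER_AVX256)")
          (const_string "*")))])
```

Only the line starting with "*" is treated as C; the other four lines stay plain output templates, so gcc_unreachable () and the case labels are no longer needed. The string literals inside the C line have to escape their quotes because the whole template is itself a quoted string.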