From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qv1-xf31.google.com (mail-qv1-xf31.google.com [IPv6:2607:f8b0:4864:20::f31]) by sourceware.org (Postfix) with ESMTPS id C783838313AC for ; Fri, 16 Jun 2023 07:33:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C783838313AC Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-qv1-xf31.google.com with SMTP id 6a1803df08f44-62fd844ad58so4926186d6.2 for ; Fri, 16 Jun 2023 00:33:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1686900782; x=1689492782; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=fx6/+aytxDvWgsf3UNInHUcG51hQ1t0lZL6fsHo522I=; b=izrxOSL5yXdvefw+qmuJXa480l9qMzJZc64AoPULAFg79dC6K+Mq6LOaTcD6aZ6kgR 6rqJnZ/qDW1omZxgBaViv/djk8QQsuzVgXAKwsDd5IPjAgTxvbnu28nExsBwcUjlvY6z nilYl1yRVbTSVyaL9kaUK6tdC4VODopOr3D3jvfNZW/hyAGCE6/ERGJx+LfPjbd3qOPn orTI+Gbp6nBEiMvcGWBYmaIBCPrXztC0Pzl28rB+VqGmDy9zJTfabAvr9G3uXnUqJ6DT 7zuGJE/qv18Vw7G/Gu5h6W5/8Kzs+dsrDwThiLlQH8FBcZK6Q7TzTOrig1m/X5cdX/fN Ec2g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686900782; x=1689492782; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fx6/+aytxDvWgsf3UNInHUcG51hQ1t0lZL6fsHo522I=; b=VVnIraPTM8M0ww/KrHXWMOpFTPprXmvkXgCiar+eHUsS7HEVgujYf1JztBcFN/diiJ BDlz6Pa2/5qmLN0FfmYMZK1xfWKxrfkxjLp3hPDVcnGdEOQyhdeord/k7KJGbpT5CNL3 n6PhsE9Jzpk+QQUzShuL+QFmvSvMnIV2ofAAskS98HlO9srGbpLbu0Ds2pa4j5WhUw7H 6WzaI4avXtNa74d3aSGAA5LMeSE0Sl1O/PHUPAfBZDW8H/ix3MYqR1plQIC2eeRWfOlS 8NRInM0MMk1KU99T5kB57Ro46VHnVA5InMH+CbozB5r4oF6mIdXgBg/7Vf7NL0vHZhIK bp5A== X-Gm-Message-State: AC+VfDw1VStc0r5UpKsHRXgFnpw1jAJwrjSCqik5wmbwr5ubB6iWRWZV i/RayXsSYoEd7+C8+xf/zQy3DprMGDOkQD0a+t6PTEQcb8o= X-Google-Smtp-Source: ACHHUZ6jhdouDWJDBHgfxzr5nmWQyYrsOyHlaudewXV8CBnix8A9SLYhck1Q0vB0wZ6+Sa5MQwmU5gSvgOLnciCBp34= X-Received: by 2002:a05:6214:3008:b0:5af:9276:b59d with SMTP id ke8-20020a056214300800b005af9276b59dmr1491978qvb.18.1686900781823; Fri, 16 Jun 2023 00:33:01 -0700 (PDT) MIME-Version: 1.0 References: <20230616020958.1413585-1-hongtao.liu@intel.com> In-Reply-To: <20230616020958.1413585-1-hongtao.liu@intel.com> From: Uros Bizjak Date: Fri, 16 Jun 2023 09:32:55 +0200 Message-ID: Subject: Re: [PATCH 1/2] Reimplement packuswb/packusdw with UNSPEC_US_TRUNCATE instead of original us_truncate. To: liuhongt Cc: gcc-patches@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-8.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Jun 16, 2023 at 4:12=E2=80=AFAM liuhongt wr= ote: > > packuswb/packusdw does unsigned saturation for signed source, but rtl > us_truncate means does unsigned saturation for unsigned source. > So for value -1, packuswb will produce 0, but us_truncate produces > 255. The patch reimplement those related patterns and functions with > UNSPEC_US_TRUNCATE instead of us_truncate. > > The patch will fix below testcase which failed after > g:921b841350c4fc298d09f6c5674663e0f4208610 added constant-folding for US_= TRUNCATE > > FAIL: gcc.target/i386/avx-vpackuswb-1.c execution test > FAIL: gcc.target/i386/avx2-vpackusdw-2.c execution test > FAIL: gcc.target/i386/avx2-vpackuswb-2.c execution test > FAIL: gcc.target/i386/sse2-packuswb-1.c execution test > > Bootstrapped and regtested on x86_64-pc-linux-gnu. > Ok for trunk? Please proofread the ChangeLog entries and comments and fix confusion with truncation / saturation in comments. OK with the above change. Thanks, Uros. > > gcc/ChangeLog: > > PR target/110235 > * config/i386/i386-expand.cc (ix86_split_mmx_pack): Use > UNSPEC_US_TRUNCATE instead of original us_truncate for > packusdw/packuswb. > * config/i386/mmx.md (mmx_packswb): Splitted to > below 2 new patterns. Just say: ...: Substitute with ... > (mmx_packsswb): New reload_completed define_insn_and_split. ...: ... this and ... > (mmx_packuswb): Ditto. ...: ... this. > (mmx_packusdw): Use UNSPEC_US_TRUNCATE instead of original > us_truncate. > (s_trunsuffix): Removed. ...: Remove code iterator. > (any_s_truncate): Removed. ...: Ditto. > * config/i386/sse.md (_packuswb): Use > UNSPEC_US_TRUNCATE instead of original us_truncate. > (_packusdw): Ditto. > * config/i386/i386.md (UNSPEC_US_TRUNCATE): New unspec_c_enum. > --- > gcc/config/i386/i386-expand.cc | 20 ++++++++++++---- > gcc/config/i386/i386.md | 4 ++++ > gcc/config/i386/mmx.md | 43 ++++++++++++++++++++++------------ > gcc/config/i386/sse.md | 20 ++++++++-------- > 4 files changed, 57 insertions(+), 30 deletions(-) > > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand= .cc > index def060ab562..35e2740f9b6 100644 > --- a/gcc/config/i386/i386-expand.cc > +++ b/gcc/config/i386/i386-expand.cc > @@ -1019,6 +1019,7 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_code = code) > rtx op0 =3D operands[0]; > rtx op1 =3D operands[1]; > rtx op2 =3D operands[2]; > + rtx src; > > machine_mode dmode =3D GET_MODE (op0); > machine_mode smode =3D GET_MODE (op1); > @@ -1042,11 +1043,20 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_cod= e code) > op1 =3D lowpart_subreg (sse_smode, op1, GET_MODE (op1)); > op2 =3D lowpart_subreg (sse_smode, op2, GET_MODE (op2)); > > - op1 =3D gen_rtx_fmt_e (code, sse_half_dmode, op1); > - op2 =3D gen_rtx_fmt_e (code, sse_half_dmode, op2); > - rtx insn =3D gen_rtx_SET (dest, gen_rtx_VEC_CONCAT (sse_dmode, > - op1, op2)); > - emit_insn (insn); > + /* For packusdw/packuswb, it does unsigned saturation for > + signed source which is different for rtl US_TRUNCATE. */ paskusdw/packuswb does unsigned saturation of a signed source which is different from generic us_truncate RTX. > + if (code =3D=3D US_TRUNCATE) > + src =3D gen_rtx_UNSPEC (sse_dmode, > + gen_rtvec (2, op1, op2), > + UNSPEC_US_TRUNCATE); > + else > + { > + op1 =3D gen_rtx_fmt_e (code, sse_half_dmode, op1); > + op2 =3D gen_rtx_fmt_e (code, sse_half_dmode, op2); > + src =3D gen_rtx_VEC_CONCAT (sse_dmode, op1, op2); > + } > + > + emit_move_insn (dest, src); > > ix86_move_vector_high_sse_to_mmx (op0); > } > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index 0929115ed4d..070a84d8af9 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -129,6 +129,10 @@ (define_c_enum "unspec" [ > UNSPEC_RSQRT > UNSPEC_PSADBW > > + ;; US_TRUNCATE this is different from rtl us_truncate, > + ;; it does unsigned truncation for signed source. Different from generic us_truncate RTX as it does unsigned saturation of signed source. > + UNSPEC_US_TRUNCATE > + > ;; For AVX/AVX512F support > UNSPEC_SCALEF > UNSPEC_PCMP > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md > index 6fbe3909c8b..315eb4193c4 100644 > --- a/gcc/config/i386/mmx.md > +++ b/gcc/config/i386/mmx.md > @@ -3337,27 +3337,41 @@ (define_split > ;; > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > > -;; Used in signed and unsigned truncations with saturation. > -(define_code_iterator any_s_truncate [ss_truncate us_truncate]) > -;; Instruction suffix for truncations with saturation. > -(define_code_attr s_trunsuffix [(ss_truncate "s") (us_truncate "u")]) > - > -(define_insn_and_split "mmx_packswb" > +(define_insn_and_split "mmx_packsswb" > [(set (match_operand:V8QI 0 "register_operand" "=3Dy,x,Yw") > (vec_concat:V8QI > - (any_s_truncate:V4QI > + (ss_truncate:V4QI > (match_operand:V4HI 1 "register_operand" "0,0,Yw")) > - (any_s_truncate:V4QI > + (ss_truncate:V4QI > (match_operand:V4HI 2 "register_mmxmem_operand" "ym,x,Yw"))))= ] > "TARGET_MMX || TARGET_MMX_WITH_SSE" > "@ > - packswb\t{%2, %0|%0, %2} > + packsswb\t{%2, %0|%0, %2} > + # > + #" > + "&& reload_completed > + && SSE_REGNO_P (REGNO (operands[0]))" > + [(const_int 0)] > + "ix86_split_mmx_pack (operands, SS_TRUNCATE); DONE;" > + [(set_attr "mmx_isa" "native,sse_noavx,avx") > + (set_attr "type" "mmxshft,sselog,sselog") > + (set_attr "mode" "DI,TI,TI")]) > + This instruction does unsigned saturation of signed source and is different from generic us_truncate RTX. > +(define_insn_and_split "mmx_packuswb" > + [(set (match_operand:V8QI 0 "register_operand" "=3Dy,x,Yw") > + (unspec:V8QI > + [(match_operand:V4HI 1 "register_operand" "0,0,Yw") > + (match_operand:V4HI 2 "register_mmxmem_operand" "ym,x,Yw")] > + UNSPEC_US_TRUNCATE))] > + "TARGET_MMX || TARGET_MMX_WITH_SSE" > + "@ > + packuswb\t{%2, %0|%0, %2} > # > #" > "&& reload_completed > && SSE_REGNO_P (REGNO (operands[0]))" > [(const_int 0)] > - "ix86_split_mmx_pack (operands, ); DONE;" > + "ix86_split_mmx_pack (operands, US_TRUNCATE); DONE;" > [(set_attr "mmx_isa" "native,sse_noavx,avx") > (set_attr "type" "mmxshft,sselog,sselog") > (set_attr "mode" "DI,TI,TI")]) > @@ -3384,11 +3398,10 @@ (define_insn_and_split "mmx_packssdw" > > (define_insn_and_split "mmx_packusdw" > [(set (match_operand:V4HI 0 "register_operand" "=3DYr,*x,Yw") > - (vec_concat:V4HI > - (us_truncate:V2HI > - (match_operand:V2SI 1 "register_operand" "0,0,Yw")) > - (us_truncate:V2HI > - (match_operand:V2SI 2 "register_operand" "Yr,*x,Yw"))))] > + (unspec:V4HI > + [(match_operand:V2SI 1 "register_operand" "0,0,Yw") > + (match_operand:V2SI 2 "register_operand" "Yr,*x,Yw")] > + UNSPEC_US_TRUNCATE))] > "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE" > "#" > "&& reload_completed" > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index 7d4b4ec8df5..83e3f534fd2 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -17796,13 +17796,14 @@ (define_insn "_packssdw" > (set_attr "prefix" "orig,") > (set_attr "mode" "")]) > > +;; This is different from rtl unsigned saturation, the instruction does > +;; unsigned saturation for signed value. This instruction does unsigned saturation of signed source and is different from generic us_truncate RTX. > (define_insn "_packuswb" > [(set (match_operand:VI1_AVX512 0 "register_operand" "=3Dx,") > - (vec_concat:VI1_AVX512 > - (us_truncate: > - (match_operand: 1 "register_operand" "0,= ")) > - (us_truncate: > - (match_operand: 2 "vector_operand" "xBm,= m"))))] > + (unspec:VI1_AVX512 > + [(match_operand: 1 "register_operand" "0,"= ) > + (match_operand: 2 "vector_operand" "xBm,m= ")] > + UNSPEC_US_TRUNCATE))] > "TARGET_SSE2 && && " > "@ > packuswb\t{%2, %0|%0, %2} > @@ -21889,11 +21890,10 @@ (define_insn "_mpsadbw" > > (define_insn "_packusdw" > [(set (match_operand:VI2_AVX2 0 "register_operand" "=3DYr,*x,") > - (vec_concat:VI2_AVX2 > - (us_truncate: > - (match_operand: 1 "register_operand" "0,0,")) > - (us_truncate: > - (match_operand: 2 "vector_operand" "YrBm,*xBm,= m"))))] > + (unspec:VI2_AVX2 > + [(match_operand: 1 "register_operand" "0,0,") > + (match_operand: 2 "vector_operand" "YrBm,*xBm,<= v_Yw>m")] > + UNSPEC_US_TRUNCATE))] > "TARGET_SSE4_1 && && " > "@ > packusdw\t{%2, %0|%0, %2} > -- > 2.39.1.388.g2fc9e9ca3c >