From: Richard Biener
Date: Tue, 24 Oct 2023 09:13:49 +0200
Subject: Re: [PATCH] Support vec_cmpmn/vcondmn for v2hf/v4hf.
References: <20231023084803.1600456-1-hongtao.liu@intel.com>
To: Hongtao Liu
Cc: liuhongt, gcc-patches@gcc.gnu.org, hjl.tools@gmail.com

On Tue, Oct 24, 2023 at 7:44 AM Hongtao Liu wrote:
>
> On Tue, Oct 24, 2023 at 1:23 PM Hongtao Liu wrote:
> >
> > On Tue, Oct 24, 2023 at 10:53 AM Hongtao Liu wrote:
> > >
> > > On Mon, Oct 23, 2023 at 8:35 PM Richard Biener wrote:
> > > >
> > > > On Mon, Oct 23, 2023 at 10:48 AM liuhongt wrote:
> > > > >
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > > Ready push to trunk.
> > > >
> > > > vcond and vcondeq shouldn't be necessary if there's
> > > > vcond_mask and vcmp support which is the "modern"
> > > > way of handling vcond.  Unless the ISA really can do
> > > > compare and select with a single instruction.
> > > For testcase
> > >
> > > typedef _Float16 __attribute__((__vector_size__ (4))) __v2hf;
> > > typedef _Float16 __attribute__((__vector_size__ (8))) __v4hf;
> > >
> > >
> > > __v4hf cf, df;
> > >
> > > __v4hf cfu (__v4hf c, __v4hf d) { return (c > d) ? cf : df; }
> > >
> > > The data_mode passes to ix86_get_mask_mode is v4hi, not v4hf since
> > >
> > >   /* Always construct signed integer vector type.  */
> > >   intt = c_common_type_for_size
> > >     (GET_MODE_BITSIZE (SCALAR_TYPE_MODE (TREE_TYPE (type0))), 0);
> > >   if (!intt)
> > >     {
> > >       if (complain & tf_error)
> > >         error_at (location, "could not find an integer type "
> > >                   "of the same size as %qT", TREE_TYPE (type0));
> > >       return error_mark_node;
> > >     }
> > >   result_type = build_opaque_vector_type (intt,
> > >                                           TYPE_VECTOR_SUBPARTS (type0));
> > >   return build_vec_cmp (resultcode, result_type, op0, op1);
> > >
> > > The backend can't distinguish whether it's a vector fp16 comparison or
> > > a vector hi comparison.
> > > the former requires -mavx512fp16, the latter requires -mavx512bw
> > Should we pass type0 instead of result_type here?
> @deftypefn {Target Hook} opt_machine_mode
> TARGET_VECTORIZE_GET_MASK_MODE (machine_mode @var{mode})
> Return the mode to use for a vector mask that holds one boolean
> result for each element of vector mode @var{mode}.  The returned mask mode
> can be a vector of integers (class @code{MODE_VECTOR_INT}), a vector of
> booleans (class @code{MODE_VECTOR_BOOL}) or a scalar integer (class
> @code{MODE_INT}).  Return an empty @code{opt_machine_mode} if no such
> mask mode exists.
>
> Looks like it's on purpose, v2hi is exactly what we needed here.
>
> Then we use either kmask or v4hi for both v4hf and v4hi comparison,
> but can't use v4hi for v4hi comparison, but kmask for v4hf comparison.

I think it's indeed on purpose that the result of v1 < v2 is a signed
integer vector type.
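To make that concrete, here is a minimal sketch in GNU C (not part of the patch). It assumes float/int lanes rather than _Float16/short so that it builds without -mavx512fp16; the names v4sf, v4si and cmp_gt are made up for illustration. It only shows the language-level behaviour: the comparison result is a signed integer vector with the same lane count and width as the operands, with -1 in true lanes and 0 in false lanes.

```c
#include <assert.h>  /* only needed by callers that check lanes with assert */

/* Illustrative typedefs (hypothetical names): four float lanes and
   four int lanes, 16 bytes each.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* The comparison mask occupies exactly as many bytes as the operands.  */
_Static_assert (sizeof (v4si) == sizeof (v4sf), "mask and data sizes differ");

/* Lane-wise a > b.  The result type is a signed integer vector:
   true lanes are -1 (all bits set), false lanes are 0.  */
v4si
cmp_gt (v4sf a, v4sf b)
{
  return a > b;
}
```

Called with a = {1, 5, 3, 0} and b = {2, 4, 3, -1}, cmp_gt should yield {0, -1, 0, -1}. Nothing in the result type records that the operands were floating point, which is the information the mask-mode query loses in the case discussed above.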
But build_vec_cmp should not use the truth type for the result but instead
the truth type for the comparison, so

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 112d28fd656..01dea608980 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -11986,7 +11986,7 @@ build_vec_cmp (tree_code code, tree type,
 {
   tree zero_vec = build_zero_cst (type);
   tree minus_one_vec = build_minus_one_cst (type);
-  tree cmp_type = truth_type_for (type);
+  tree cmp_type = truth_type_for (TREE_TYPE (arg0));
   tree cmp = build2 (code, cmp_type, arg0, arg1);
   return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
 }

> > > >
> > > > Richard.
> > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         PR target/103861
> > > > >         * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle
> > > > >         V2HF/V2BF/V4HF/V4BFmode.
> > > > >         * config/i386/mmx.md (vec_cmpv4hfqi): New expander.
> > > > >         (vcondv4hf): Ditto.
> > > > >         (vcondv4hi): Ditto.
> > > > >         (vconduv4hi): Ditto.
> > > > >         (vcond_mask_v4hi): Ditto.
> > > > >         (vcond_mask_qi): Ditto.
> > > > >         (vec_cmpv2hfqi): Ditto.
> > > > >         (vcondv2hf): Ditto.
> > > > >         (vcondv2hi): Ditto.
> > > > >         (vconduv2hi): Ditto.
> > > > >         (vcond_mask_v2hi): Ditto.
> > > > >         * config/i386/sse.md (vcond): Merge this with ..
> > > > >         (vcond): .. this into ..
> > > > >         (vcond): .. this,
> > > > >         and extend to V8BF/V16BF/V32BFmode.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         * g++.target/i386/part-vect-vcondhf.C: New test.
> > > > >         * gcc.target/i386/part-vect-vec_cmphf.c: New test.
> > > > > ---
> > > > >  gcc/config/i386/i386-expand.cc                |   4 +
> > > > >  gcc/config/i386/mmx.md                        | 237 +++++++++++++++++-
> > > > >  gcc/config/i386/sse.md                        |  25 +-
> > > > >  .../g++.target/i386/part-vect-vcondhf.C       |  34 +++
> > > > >  .../gcc.target/i386/part-vect-vec_cmphf.c     |  26 ++
> > > > >  5 files changed, 304 insertions(+), 22 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/g++.target/i386/part-vect-vcondhf.C
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-vec_cmphf.c
> > > > >
> > > > > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > > > > index 1eae9d7c78c..9658f9c5a2d 100644
> > > > > --- a/gcc/config/i386/i386-expand.cc
> > > > > +++ b/gcc/config/i386/i386-expand.cc
> > > > > @@ -4198,6 +4198,8 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
> > > > >        break;
> > > > >      case E_V8QImode:
> > > > >      case E_V4HImode:
> > > > > +    case E_V4HFmode:
> > > > > +    case E_V4BFmode:
> > > > >      case E_V2SImode:
> > > > >        if (TARGET_SSE4_1)
> > > > >         {
> > > > > @@ -4207,6 +4209,8 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
> > > > >        break;
> > > > >      case E_V4QImode:
> > > > >      case E_V2HImode:
> > > > > +    case E_V2HFmode:
> > > > > +    case E_V2BFmode:
> > > > >        if (TARGET_SSE4_1)
> > > > >         {
> > > > >           gen = gen_mmx_pblendvb_v4qi;
> > > > > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> > > > > index 491a0a51272..b9617e9d8c6 100644
> > > > > --- a/gcc/config/i386/mmx.md
> > > > > +++ b/gcc/config/i386/mmx.md
> > > > > @@ -61,6 +61,9 @@ (define_mode_iterator MMXMODE248 [V4HI V2SI V1DI])
> > > > >  (define_mode_iterator V_32 [V4QI V2HI V1SI V2HF V2BF])
> > > > >
> > > > >  (define_mode_iterator V2FI_32 [V2HF V2BF V2HI])
> > > > > +(define_mode_iterator V4FI_64 [V4HF V4BF V4HI])
> > > > > +(define_mode_iterator V4F_64 [V4HF V4BF])
> > > > > +(define_mode_iterator V2F_32 [V2HF V2BF])
> > > > >  ;; 4-byte integer vector modes
> > > > >  (define_mode_iterator VI_32 [V4QI V2HI])
> > > > >
> > > > > @@ -1972,10 +1975,12 @@ (define_mode_attr mov_to_sse_suffix
> > > > >    [(V2HF "d") (V4HF "q") (V2HI "d") (V4HI "q")])
> > > > >
> > > > >  (define_mode_attr mmxxmmmode
> > > > > -  [(V2HF "V8HF") (V2HI "V8HI") (V2BF "V8BF")])
> > > > > +  [(V2HF "V8HF") (V2HI "V8HI") (V2BF "V8BF")
> > > > > +   (V4HF "V8HF") (V4HI "V8HI") (V4BF "V8BF")])
> > > > >
> > > > >  (define_mode_attr mmxxmmmodelower
> > > > > -  [(V2HF "v8hf") (V2HI "v8hi") (V2BF "v8bf")])
> > > > > +  [(V2HF "v8hf") (V2HI "v8hi") (V2BF "v8bf")
> > > > > +   (V4HF "v8hf") (V4HI "v8hi") (V4BF "v8bf")])
> > > > >
> > > > >  (define_expand "movd__to_sse"
> > > > >    [(set (match_operand: 0 "register_operand")
> > > > > @@ -2114,6 +2119,234 @@ (define_insn_and_split "*mmx_nabs2"
> > > > >    [(set (match_dup 0)
> > > > >         (ior: (match_dup 1) (match_dup 2)))])
> > > > >
> > > > > +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> > > > > +;;
> > > > > +;; Parallel half-precision floating point comparisons
> > > > > +;;
> > > > > +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> > > > > +
> > > > > +(define_expand "vec_cmpv4hfqi"
> > > > > +  [(set (match_operand:QI 0 "register_operand")
> > > > > +       (match_operator:QI 1 ""
> > > > > +         [(match_operand:V4HF 2 "nonimmediate_operand")
> > > > > +          (match_operand:V4HF 3 "nonimmediate_operand")]))]
> > > > > +  "TARGET_MMX_WITH_SSE && TARGET_AVX512FP16 && TARGET_AVX512VL
> > > > > +   && ix86_partial_vec_fp_math"
> > > > > +{
> > > > > +  rtx ops[4];
> > > > > +  ops[3] = gen_reg_rtx (V8HFmode);
> > > > > +  ops[2] = gen_reg_rtx (V8HFmode);
> > > > > +
> > > > > +  emit_insn (gen_movq_v4hf_to_sse (ops[3], operands[3]));
> > > > > +  emit_insn (gen_movq_v4hf_to_sse (ops[2], operands[2]));
> > > > > +  emit_insn (gen_vec_cmpv8hfqi (operands[0], operands[1], ops[2], ops[3]));
> > > > > +  DONE;
> > > > > +})
> > > > > +
> > > > > +(define_expand "vcondv4hf"
> > > > > +  [(set (match_operand:V4FI_64 0 "register_operand")
> > > > > +       (if_then_else:V4FI_64
> > > > > +         (match_operator 3 ""
> > > > > +           [(match_operand:V4HF 4 "nonimmediate_operand")
> > > > > +            (match_operand:V4HF 5 "nonimmediate_operand")])
> > > > > +         (match_operand:V4FI_64 1 "general_operand")
> > > > > +         (match_operand:V4FI_64 2 "general_operand")))]
> > > > > +  "TARGET_AVX512FP16 && TARGET_AVX512VL
> > > > > +  && TARGET_MMX_WITH_SSE && ix86_partial_vec_fp_math"
> > > > > +{
> > > > > +  rtx ops[6];
> > > > > +  ops[5] = gen_reg_rtx (V8HFmode);
> > > > > +  ops[4] = gen_reg_rtx (V8HFmode);
> > > > > +  ops[0] = gen_reg_rtx (mode);
> > > > > +  ops[1] = lowpart_subreg (mode,
> > > > > +                          force_reg (mode, operands[1]),
> > > > > +                          mode);
> > > > > +  ops[2] = lowpart_subreg (mode,
> > > > > +                          force_reg (mode, operands[2]),
> > > > > +                          mode);
> > > > > +  ops[3] = operands[3];
> > > > > +  emit_insn (gen_movq_v4hf_to_sse (ops[4], operands[4]));
> > > > > +  emit_insn (gen_movq_v4hf_to_sse (ops[5], operands[5]));
> > > > > +  bool ok = ix86_expand_fp_vcond (ops);
> > > > > +  gcc_assert (ok);
> > > > > +
> > > > > +  emit_move_insn (operands[0], lowpart_subreg (mode, ops[0],
> > > > > +                                              mode));
> > > > > +  DONE;
> > > > > +})
> > > > > +
> > > > > +(define_expand "vcondv4hi"
> > > > > +  [(set (match_operand:V4F_64 0 "register_operand")
> > > > > +       (if_then_else:V4F_64
> > > > > +         (match_operator 3 ""
> > > > > +           [(match_operand:V4HI 4 "nonimmediate_operand")
> > > > > +            (match_operand:V4HI 5 "nonimmediate_operand")])
> > > > > +         (match_operand:V4F_64 1 "general_operand")
> > > > > +         (match_operand:V4F_64 2 "general_operand")))]
> > > > > +  "TARGET_MMX_WITH_SSE && TARGET_SSE4_1"
> > > > > +{
> > > > > +  bool ok = ix86_expand_int_vcond (operands);
> > > > > +  gcc_assert (ok);
> > > > > +  DONE;
> > > > > +})
> > > > > +
> > > > > +(define_expand "vconduv4hi"
> > > > > +  [(set (match_operand:V4F_64 0 "register_operand")
> > > > > +       (if_then_else:V4F_64
> > > > > +         (match_operator 3 ""
> > > > > +           [(match_operand:V4HI 4 "nonimmediate_operand")
> > > > > +            (match_operand:V4HI 5 "nonimmediate_operand")])
> > > > > +         (match_operand:V4F_64 1 "general_operand")
> > > > > +         (match_operand:V4F_64 2 "general_operand")))]
> > > > > +  "TARGET_MMX_WITH_SSE && TARGET_SSE4_1"
> > > > > +{
> > > > > +  bool ok = ix86_expand_int_vcond (operands);
> > > > > +  gcc_assert (ok);
> > > > > +  DONE;
> > > > > +})
> > > > > +
> > > > > +(define_expand "vcond_mask_v4hi"
> > > > > +  [(set (match_operand:V4F_64 0 "register_operand")
> > > > > +       (vec_merge:V4F_64
> > > > > +         (match_operand:V4F_64 1 "register_operand")
> > > > > +         (match_operand:V4F_64 2 "register_operand")
> > > > > +         (match_operand:V4HI 3 "register_operand")))]
> > > > > +  "TARGET_MMX_WITH_SSE && TARGET_SSE4_1"
> > > > > +{
> > > > > +  ix86_expand_sse_movcc (operands[0], operands[3],
> > > > > +                        operands[1], operands[2]);
> > > > > +  DONE;
> > > > > +})
> > > > > +
> > > > > +(define_expand "vcond_mask_qi"
> > > > > +  [(set (match_operand:V4FI_64 0 "register_operand")
> > > > > +       (vec_merge:V4FI_64
> > > > > +         (match_operand:V4FI_64 1 "register_operand")
> > > > > +         (match_operand:V4FI_64 2 "register_operand")
> > > > > +         (match_operand:QI 3 "register_operand")))]
> > > > > +  "TARGET_MMX_WITH_SSE && TARGET_AVX512BW && TARGET_AVX512VL"
> > > > > +{
> > > > > +  rtx op0 = gen_reg_rtx (mode);
> > > > > +  operands[1] = lowpart_subreg (mode, operands[1], mode);
> > > > > +  operands[2] = lowpart_subreg (mode, operands[2], mode);
> > > > > +  emit_insn (gen_vcond_mask_qi (op0, operands[1],
> > > > > +                                operands[2], operands[3]));
> > > > > +  emit_move_insn (operands[0],
> > > > > +                 lowpart_subreg (mode, op0, mode));
> > > > > +  DONE;
> > > > > +})
> > > > > +
> > > > > +(define_expand "vec_cmpv2hfqi"
> > > > > +  [(set (match_operand:QI 0 "register_operand")
> > > > > +       (match_operator:QI 1 ""
> > > > > +         [(match_operand:V2HF 2 "nonimmediate_operand")
> > > > > +          (match_operand:V2HF 3 "nonimmediate_operand")]))]
> > > > > +  "TARGET_AVX512FP16 && TARGET_AVX512VL
> > > > > +   && ix86_partial_vec_fp_math"
> > > > > +{
> > > > > +  rtx ops[4];
> > > > > +  ops[3] = gen_reg_rtx (V8HFmode);
> > > > > +  ops[2] = gen_reg_rtx (V8HFmode);
> > > > > +
> > > > > +  emit_insn (gen_movd_v2hf_to_sse (ops[3], operands[3]));
> > > > > +  emit_insn (gen_movd_v2hf_to_sse (ops[2], operands[2]));
> > > > > +  emit_insn (gen_vec_cmpv8hfqi (operands[0], operands[1], ops[2], ops[3]));
> > > > > +  DONE;
> > > > > +})
> > > > > +
> > > > > +(define_expand "vcondv2hf"
> > > > > +  [(set (match_operand:V2FI_32 0 "register_operand")
> > > > > +       (if_then_else:V2FI_32
> > > > > +         (match_operator 3 ""
> > > > > +           [(match_operand:V2HF 4 "nonimmediate_operand")
> > > > > +            (match_operand:V2HF 5 "nonimmediate_operand")])
> > > > > +         (match_operand:V2FI_32 1 "general_operand")
> > > > > +         (match_operand:V2FI_32 2 "general_operand")))]
> > > > > +  "TARGET_AVX512FP16 && TARGET_AVX512VL
> > > > > +   && ix86_partial_vec_fp_math"
> > > > > +{
> > > > > +  rtx ops[6];
> > > > > +  ops[5] = gen_reg_rtx (V8HFmode);
> > > > > +  ops[4] = gen_reg_rtx (V8HFmode);
> > > > > +  ops[0] = gen_reg_rtx (mode);
> > > > > +  ops[1] = lowpart_subreg (mode,
> > > > > +                          force_reg (mode, operands[1]),
> > > > > +                          mode);
> > > > > +  ops[2] = lowpart_subreg (mode,
> > > > > +                          force_reg (mode, operands[2]),
> > > > > +                          mode);
> > > > > +  ops[3] = operands[3];
> > > > > +  emit_insn (gen_movd_v2hf_to_sse (ops[4], operands[4]));
> > > > > +  emit_insn (gen_movd_v2hf_to_sse (ops[5], operands[5]));
> > > > > +  bool ok = ix86_expand_fp_vcond (ops);
> > > > > +  gcc_assert (ok);
> > > > > +
> > > > > +  emit_move_insn (operands[0], lowpart_subreg (mode, ops[0],
> > > > > +                                              mode));
> > > > > +  DONE;
> > > > > +})
> > > > > +
> > > > > +(define_expand "vcondv2hi"
> > > > > +  [(set (match_operand:V2F_32 0 "register_operand")
> > > > > +       (if_then_else:V2F_32
> > > > > +         (match_operator 3 ""
> > > > > +           [(match_operand:V2HI 4 "nonimmediate_operand")
> > > > > +            (match_operand:V2HI 5 "nonimmediate_operand")])
> > > > > +         (match_operand:V2F_32 1 "general_operand")
> > > > > +         (match_operand:V2F_32 2 "general_operand")))]
> > > > > +  "TARGET_SSE4_1"
> > > > > +{
> > > > > +  bool ok = ix86_expand_int_vcond (operands);
> > > > > +  gcc_assert (ok);
> > > > > +  DONE;
> > > > > +})
> > > > > +
> > > > > +(define_expand "vconduv2hi"
> > > > > +  [(set (match_operand:V2F_32 0 "register_operand")
> > > > > +       (if_then_else:V2F_32
> > > > > +         (match_operator 3 ""
> > > > > +           [(match_operand:V2HI 4 "nonimmediate_operand")
> > > > > +            (match_operand:V2HI 5 "nonimmediate_operand")])
> > > > > +         (match_operand:V2F_32 1 "general_operand")
> > > > > +         (match_operand:V2F_32 2 "general_operand")))]
> > > > > +  "TARGET_SSE4_1"
> > > > > +{
> > > > > +  bool ok = ix86_expand_int_vcond (operands);
> > > > > +  gcc_assert (ok);
> > > > > +  DONE;
> > > > > +})
> > > > > +
> > > > > +(define_expand "vcond_mask_v2hi"
> > > > > +  [(set (match_operand:V2F_32 0 "register_operand")
> > > > > +       (vec_merge:V2F_32
> > > > > +         (match_operand:V2F_32 1 "register_operand")
> > > > > +         (match_operand:V2F_32 2 "register_operand")
> > > > > +         (match_operand:V2HI 3 "register_operand")))]
> > > > > +  "TARGET_SSE4_1"
> > > > > +{
> > > > > +  ix86_expand_sse_movcc (operands[0], operands[3],
> > > > > +                        operands[1], operands[2]);
> > > > > +  DONE;
> > > > > +})
> > > > > +
> > > > > +(define_expand "vcond_mask_qi"
> > > > > +  [(set (match_operand:V2FI_32 0 "register_operand")
> > > > > +       (vec_merge:V2FI_32
> > > > > +         (match_operand:V2FI_32 1 "register_operand")
> > > > > +         (match_operand:V2FI_32 2 "register_operand")
> > > > > +         (match_operand:QI 3 "register_operand")))]
> > > > > +  "TARGET_AVX512BW && TARGET_AVX512VL"
> > > > > +{
> > > > > +  rtx op0 = gen_reg_rtx (mode);
> > > > > +  operands[1] = lowpart_subreg (mode, operands[1], mode);
> > > > > +  operands[2] = lowpart_subreg (mode, operands[2], mode);
> > > > > +  emit_insn (gen_vcond_mask_qi (op0, operands[1],
> > > > > +                                operands[2], operands[3]));
> > > > > +  emit_move_insn (operands[0],
> > > > > +                 lowpart_subreg (mode, op0, mode));
> > > > > +  DONE;
> > > > > +})
> > > > > +
> > > > >  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> > > > >  ;;
> > > > >  ;; Parallel half-precision floating point rounding operations.
> > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > index c988935d4df..e2a7cbeb722 100644
> > > > > --- a/gcc/config/i386/sse.md
> > > > > +++ b/gcc/config/i386/sse.md
> > > > > @@ -4644,29 +4644,14 @@ (define_expand "vcond"
> > > > >    DONE;
> > > > >  })
> > > > >
> > > > > -(define_expand "vcond"
> > > > > -  [(set (match_operand:VHF_AVX512VL 0 "register_operand")
> > > > > -       (if_then_else:VHF_AVX512VL
> > > > > -         (match_operator 3 ""
> > > > > -           [(match_operand:VHF_AVX512VL 4 "vector_operand")
> > > > > -            (match_operand:VHF_AVX512VL 5 "vector_operand")])
> > > > > -         (match_operand:VHF_AVX512VL 1 "general_operand")
> > > > > -         (match_operand:VHF_AVX512VL 2 "general_operand")))]
> > > > > -  "TARGET_AVX512FP16"
> > > > > -{
> > > > > -  bool ok = ix86_expand_fp_vcond (operands);
> > > > > -  gcc_assert (ok);
> > > > > -  DONE;
> > > > > -})
> > > > > -
> > > > > -(define_expand "vcond"
> > > > > -  [(set (match_operand: 0 "register_operand")
> > > > > -       (if_then_else:
> > > > > +(define_expand "vcond"
> > > > > +  [(set (match_operand:VI2HFBF_AVX512VL 0 "register_operand")
> > > > > +       (if_then_else:VI2HFBF_AVX512VL
> > > > >           (match_operator 3 ""
> > > > >             [(match_operand:VHF_AVX512VL 4 "vector_operand")
> > > > >              (match_operand:VHF_AVX512VL 5 "vector_operand")])
> > > > > -         (match_operand: 1 "general_operand")
> > > > > -         (match_operand: 2 "general_operand")))]
> > > > > +         (match_operand:VI2HFBF_AVX512VL 1 "general_operand")
> > > > > +         (match_operand:VI2HFBF_AVX512VL 2 "general_operand")))]
> > > > >    "TARGET_AVX512FP16"
> > > > >  {
> > > > >    bool ok = ix86_expand_fp_vcond (operands);
> > > > > diff --git a/gcc/testsuite/g++.target/i386/part-vect-vcondhf.C b/gcc/testsuite/g++.target/i386/part-vect-vcondhf.C
> > > > > new file mode 100644
> > > > > index 00000000000..8bf01b7cb4a
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/g++.target/i386/part-vect-vcondhf.C
> > > > > @@ -0,0 +1,34 @@
> > > > > +/* PR target/103861 */
> > > > > +/* { dg-do compile { target { ! ia32 } } } */
> > > > > +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
> > > > > +/* { dg-final { scan-assembler-times "vpcmpeqw" 2 } } */
> > > > > +/* { dg-final { scan-assembler-times "vpcmpgtw" 2 } } */
> > > > > +/* { dg-final { scan-assembler-times "vcmpph" 4 } } */
> > > > > +/* { dg-final { scan-assembler-times "vpblendvb" 4 } } */
> > > > > +typedef unsigned short __attribute__((__vector_size__ (4))) __v2hu;
> > > > > +typedef short __attribute__((__vector_size__ (4))) __v2hi;
> > > > > +
> > > > > +typedef unsigned short __attribute__((__vector_size__ (8))) __v4hu;
> > > > > +typedef short __attribute__((__vector_size__ (8))) __v4hi;
> > > > > +
> > > > > +typedef _Float16 __attribute__((__vector_size__ (4))) __v2hf;
> > > > > +typedef _Float16 __attribute__((__vector_size__ (8))) __v4hf;
> > > > > +
> > > > > +
> > > > > +__v2hu au, bu;
> > > > > +__v2hi as, bs;
> > > > > +__v2hf af, bf;
> > > > > +
> > > > > +__v4hu cu, du;
> > > > > +__v4hi cs, ds;
> > > > > +__v4hf cf, df;
> > > > > +
> > > > > +__v2hf auf (__v2hu a, __v2hu b) { return (a > b) ? af : bf; }
> > > > > +__v2hf asf (__v2hi a, __v2hi b) { return (a > b) ? af : bf; }
> > > > > +__v2hu afu (__v2hf a, __v2hf b) { return (a > b) ? au : bu; }
> > > > > +__v2hi afs (__v2hf a, __v2hf b) { return (a > b) ? as : bs; }
> > > > > +
> > > > > +__v4hf cuf (__v4hu c, __v4hu d) { return (c > d) ? cf : df; }
> > > > > +__v4hf csf (__v4hi c, __v4hi d) { return (c > d) ? cf : df; }
> > > > > +__v4hu cfu (__v4hf c, __v4hf d) { return (c > d) ? cu : du; }
> > > > > +__v4hi cfs (__v4hf c, __v4hf d) { return (c > d) ? cs : ds; }
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/part-vect-vec_cmphf.c b/gcc/testsuite/gcc.target/i386/part-vect-vec_cmphf.c
> > > > > new file mode 100644
> > > > > index 00000000000..ee8659395eb
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/part-vect-vec_cmphf.c
> > > > > @@ -0,0 +1,26 @@
> > > > > +/* { dg-do compile { target { ! ia32 } } } */
> > > > > +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
> > > > > +/* { dg-final { scan-assembler-times "vcmpph" 10 } } */
> > > > > +
> > > > > +typedef _Float16 __attribute__((__vector_size__ (4))) v2hf;
> > > > > +typedef _Float16 __attribute__((__vector_size__ (8))) v4hf;
> > > > > +
> > > > > +
> > > > > +#define VCMPMN(type, op, name) \
> > > > > +type \
> > > > > +__attribute__ ((noinline, noclone)) \
> > > > > +vec_cmp_##type##type##name (type a, type b) \
> > > > > +{ \
> > > > > +  return a op b; \
> > > > > +}
> > > > > +
> > > > > +VCMPMN (v4hf, <, lt)
> > > > > +VCMPMN (v2hf, <, lt)
> > > > > +VCMPMN (v4hf, <=, le)
> > > > > +VCMPMN (v2hf, <=, le)
> > > > > +VCMPMN (v4hf, >, gt)
> > > > > +VCMPMN (v2hf, >, gt)
> > > > > +VCMPMN (v4hf, >=, ge)
> > > > > +VCMPMN (v2hf, >=, ge)
> > > > > +VCMPMN (v4hf, ==, eq)
> > > > > +VCMPMN (v2hf, ==, eq)
> > > > > --
> > > > > 2.31.1
> > > > >
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao