From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt1-x82e.google.com (mail-qt1-x82e.google.com [IPv6:2607:f8b0:4864:20::82e]) by sourceware.org (Postfix) with ESMTPS id 57AB43858D37 for ; Fri, 14 Jan 2022 06:11:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 57AB43858D37 Received: by mail-qt1-x82e.google.com with SMTP id x8so3759321qta.12 for ; Thu, 13 Jan 2022 22:11:24 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=gja8boA8yAiLwSK9nTc35dw62g7eV1vY30lQ2d8mdeo=; b=dp1yIorY1XrE8t8LtcG3g7qsvj63s4bXfLacmeokUEh5lhc/JeCoJ+ewueWsG3eSvG cICAPsVMgw7su710SnejN2coSFi3HfMF5d3mebmyL3IVqbDlYgCoLzShjnfWj8/SBe9q 7CZDPTldYg75E1JLbA3vm+rs5tvUceyFS8aNi/MP1N4bY7ChIDiXrZEI8TiWd/cmaRJ+ Dx3U2a2jxXzOsPUfxLG/UtGGtwxG/wWycbEe9d02+JAH7mOmk/Xm6b65JE9Hnn+qLhpV BwK/LL5InUTQtihX/Nqbl8E5LC4nIkbohZWy9/UD95gR/EaOGOp13DPYaSX+a0GF4FZX NDYg== X-Gm-Message-State: AOAM532djXw58pKhYg3A1JyC0Rn5THDbu330Kl8GK8niGLRFMkQUImgi uT7LlMlwNxnU2ZSLYkGeAaR7Z8eFcLO/V3Omhmc= X-Google-Smtp-Source: ABdhPJw6b3nLfxcSZxPUgC4St9azhIugVuLUu7RY6h8EoJBHi+puyV66jinOrMF9VnmJI6wjKQFShJF2eQhaeN3mcI4= X-Received: by 2002:ac8:5a47:: with SMTP id o7mr6474482qta.613.1642140683287; Thu, 13 Jan 2022 22:11:23 -0800 (PST) MIME-Version: 1.0 References: <20220113072839.8405-1-hongyu.wang@intel.com> In-Reply-To: From: Hongyu Wang Date: Fri, 14 Jan 2022 14:03:49 +0800 Message-ID: Subject: Re: [PATCH] [i386] GLC tuning: Break false dependency for dest register. To: Uros Bizjak Cc: Hongyu Wang , Hongtao Liu , "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jan 2022 06:11:28 -0000 > > No, the approach is wrong. You have to solve output clearing on RTL > > level, please look at how e.g. tzcnt false dep is solved: > > Actually we have considered such approach before, but we found we need > to break original define_insn to remove the mask/rounding subst, > since define_split could not adopt subst, and that would add 6 more > define_insn_and_split and 4 define_insn for each instruction. We think > such approach would introduce too much redundant code. > > Do you think the code size increment is acceptable? Also that 100+ more patterns increases maintenance effort. If we split them at epilogue_complete stage, it seems not much difference to put it under output template... Hongyu Wang =E4=BA=8E2022=E5=B9=B41=E6=9C=8814=E6= =97=A5=E5=91=A8=E4=BA=94 13:38=E5=86=99=E9=81=93=EF=BC=9A > > > No, the approach is wrong. You have to solve output clearing on RTL > > level, please look at how e.g. tzcnt false dep is solved: > > Actually we have considered such approach before, but we found we need > to break original define_insn to remove the mask/rounding subst, > since define_split could not adopt subst, and that would add 6 more > define_insn_and_split and 4 define_insn for each instruction. We think > such approach would introduce too much redundant code. > > Do you think the code size increment is acceptable? > > Uros Bizjak via Gcc-patches =E4=BA=8E2022=E5=B9= =B41=E6=9C=8813=E6=97=A5=E5=91=A8=E5=9B=9B 15:42=E5=86=99=E9=81=93=EF=BC=9A > > > > On Thu, Jan 13, 2022 at 8:28 AM Hongyu Wang wro= te: > > > > > > From: wwwhhhyyy > > > > > > Hi, > > > > > > For GoldenCove micro-architecture, force insert zero-idiom in asm > > > template to break false dependency of dest register for several insns= . > > > > > > The related insns are: > > > > > > VPERM/D/Q/PS/PD > > > VRANGEPD/PS/SD/SS > > > VGETMANTSS/SD/SH > > > VGETMANDPS/PD - mem version only > > > VPMULLQ > > > VFMULCSH/PH > > > VFCMULCSH/PH > > > > > > Bootstraped/regtested on x86_64-pc-linux-gnu{-m32,} > > > > > > Ok for master? > > > > No, the approach is wrong. You have to solve output clearing on RTL > > level, please look at how e.g. tzcnt false dep is solved: > > > > [(set (reg:CCC FLAGS_REG) > > (compare:CCC (match_operand:SWI48 1 "nonimmediate_operand" "rm") > > (const_int 0))) > > (set (match_operand:SWI48 0 "register_operand" "=3Dr") > > (ctz:SWI48 (match_dup 1)))] > > "TARGET_BMI" > > "tzcnt{}\t{%1, %0|%0, %1}"; > > "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed > > && optimize_function_for_speed_p (cfun) > > && !reg_mentioned_p (operands[0], operands[1])" > > [(parallel > > [(set (reg:CCC FLAGS_REG) > > (compare:CCC (match_dup 1) (const_int 0))) > > (set (match_dup 0) > > (ctz:SWI48 (match_dup 1))) > > (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])] > > "ix86_expand_clear (operands[0]);" > > [(set_attr "type" "alu1") > > (set_attr "prefix_0f" "1") > > (set_attr "prefix_rep" "1") > > (set_attr "btver2_decode" "double") > > (set_attr "mode" "")]) > > > > For TARGET_AVOID_FALSE_DEP_FOR_BMI, we split at epilogue_complete when > > insn registers are stable and use ix86_expand_clear to clear output > > operand. Please also note how the final insn is tagged with > > UNSPEC_INSN_FALSE_DEP to avoid combine from recognizing it too early. > > > > Uros. > > > > > > > > gcc/ChangeLog: > > > > > > * config/i386/i386.h (TARGET_DEST_FALSE_DEPENDENCY): New macr= o. > > > * config/i386/i386.md (dest_false_dep): New define_attr. > > > * config/i386/sse.md (__): > > > Insert zero-idiom in output template when attr enabled, set n= ew attribute to > > > true for non-mask/maskz insn. > > > (avx512fp16_sh_v8hf): > > > Likewise. > > > (avx512dq_mul3): Likewise. > > > (_permvar): Likewise. > > > (avx2_perm_1): Likewise. > > > (avx512f_perm_1): Likewise. > > > (avx512dq_rangep): Likew= ise. > > > (avx512dq_ranges): > > > Likewise. > > > (_getmant): Like= wise. > > > (avx512f_vgetmant): > > > Likewise. > > > * config/i386/subst.md (mask3_dest_false_dep_attr): New subst= _attr. > > > (mask4_dest_false_dep_attr): Likewise. > > > (mask6_dest_false_dep_attr): Likewise. > > > (mask10_dest_false_dep_attr): Likewise. > > > (maskc_dest_false_dep_attr): Likewise. > > > (mask_scalar4_dest_false_dep_attr): Likewise. > > > (mask_scalarc_dest_false_dep_attr): Likewise. > > > * config/i386/x86-tune.def (X86_TUNE_DEST_FALSE_DEPENDENCY): = New > > > DEF_TUNE enabled for m_SAPPHIRERAPIDS and m_ALDERLAKE > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/i386/avx2-dest-false-dependency.c: New test. > > > * gcc.target/i386/avx512dq-dest-false-dependency.c: Ditto. > > > * gcc.target/i386/avx512f-dest-false-dependency.c: Ditto. > > > * gcc.target/i386/avx512fp16-dest-false-dependency.c: Ditto. > > > * gcc.target/i386/avx512fp16vl-dest-false-dependency.c: Ditto= . > > > * gcc.target/i386/avx512vl-dest-false-dependency.c: Ditto. > > > --- > > > gcc/config/i386/i386.h | 2 + > > > gcc/config/i386/i386.md | 4 + > > > gcc/config/i386/sse.md | 142 +++++++++++++++-= -- > > > gcc/config/i386/subst.md | 7 + > > > gcc/config/i386/x86-tune.def | 5 + > > > .../i386/avx2-dest-false-dependency.c | 24 +++ > > > .../i386/avx512dq-dest-false-dependency.c | 73 +++++++++ > > > .../i386/avx512f-dest-false-dependency.c | 102 +++++++++++++ > > > .../i386/avx512fp16-dest-false-dependency.c | 45 ++++++ > > > .../i386/avx512fp16vl-dest-false-dependency.c | 24 +++ > > > .../i386/avx512vl-dest-false-dependency.c | 76 ++++++++++ > > > 11 files changed, 486 insertions(+), 18 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.target/i386/avx2-dest-false-dep= endency.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/avx512dq-dest-false= -dependency.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-dest-false-= dependency.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-dest-fal= se-dependency.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-dest-f= alse-dependency.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-dest-false= -dependency.c > > > > > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > > > index 3ac0f698ae2..ddbf6b9825a 100644 > > > --- a/gcc/config/i386/i386.h > > > +++ b/gcc/config/i386/i386.h > > > @@ -429,6 +429,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_= LAST]; > > > ix86_tune_features[X86_TUNE_EXPAND_ABS] > > > #define TARGET_V2DF_REDUCTION_PREFER_HADDPD \ > > > ix86_tune_features[X86_TUNE_V2DF_REDUCTION_PREFER_HADDPD] > > > +#define TARGET_DEST_FALSE_DEPENDENCY \ > > > + ix86_tune_features[X86_TUNE_DEST_FALSE_DEPENDENCY] > > > > > > /* Feature tests against the various architecture variations. */ > > > enum ix86_arch_indices { > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > > > index 9937643a273..40a2b580740 100644 > > > --- a/gcc/config/i386/i386.md > > > +++ b/gcc/config/i386/i386.md > > > @@ -823,6 +823,10 @@ (define_attr "i387_cw" "roundeven,floor,ceil,tru= nc,uninitialized,any" > > > (define_attr "avx_partial_xmm_update" "false,true" > > > (const_string "false")) > > > > > > +;; Define attribute to indicate complex mult insn with false depende= ncy > > > +(define_attr "dest_false_dep" "false,true" > > > + (const_string "false")) > > > + > > > ;; Define attribute to classify add/sub insns that consumes carry fl= ag (CF) > > > (define_attr "use_carry" "0,1" (const_string "0")) > > > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > > > index 0864748875e..c8dace5b2f8 100644 > > > --- a/gcc/config/i386/sse.md > > > +++ b/gcc/config/i386/sse.md > > > @@ -6536,9 +6536,20 @@ (define_insn "__<= maskc_name>" > > > (match_operand:VF_AVX512FP16VL 2 "nonimmediate_operand" = "")] > > > UNSPEC_COMPLEX_F_C_MUL))] > > > "TARGET_AVX512FP16 && " > > > - "v\t{%2, %1, %0|%0, %1, %2}" > > > +{ > > > + if (TARGET_DEST_FALSE_DEPENDENCY > > > + && get_attr_dest_false_dep (insn) =3D=3D > > > + DEST_FALSE_DEP_TRUE) > > > + output_asm_insn ("vxorps\t{%x0, %x0, %x0}", operands); > > > + return "v\t{%2, %1,= %0|%0, %1, %2}"; > > > +} > > > [(set_attr "type" "ssemul") > > > - (set_attr "mode" "")]) > > > + (set_attr "mode" "") > > > + (set (attr "dest_false_dep") > > > + (if_then_else > > > + (match_test "") > > > + (const_string "false") > > > + (const_string "true")))]) > > > > > > (define_expand "avx512fp16_fmaddcsh_v8hf_maskz" > > > [(match_operand:V8HF 0 "register_operand") > > > @@ -6742,9 +6753,20 @@ (define_insn "avx512fp16_sh_v8h= f > > (match_dup 1) > > > (const_int 3)))] > > > "TARGET_AVX512FP16" > > > - "vsh\t{%2, %1, %0|%0, %1, %2}" > > > +{ > > > + if (TARGET_DEST_FALSE_DEPENDENCY > > > + && get_attr_dest_false_dep (insn) =3D=3D > > > + DEST_FALSE_DEP_TRUE) > > > + output_asm_insn ("vxorps\t{%x0, %x0, %x0}", operands); > > > + return "vsh\t{%2, %1, %0|%0, %1, %2}"; > > > +} > > > [(set_attr "type" "ssemul") > > > - (set_attr "mode" "V8HF")]) > > > + (set_attr "mode" "V8HF") > > > + (set (attr "dest_false_dep") > > > + (if_then_else > > > + (match_test "") > > > + (const_string "false") > > > + (const_string "true")))]) > > > > > > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;= ; > > > ;; > > > @@ -15207,10 +15229,21 @@ (define_insn "avx512dq_mul3" > > > (match_operand:VI8_AVX512VL 2 "bcst_vector_operand" "vmBr")= ))] > > > "TARGET_AVX512DQ && > > > && ix86_binary_operator_ok (MULT, mode, operands)" > > > - "vpmullq\t{%2, %1, %0|%0, %1, %2}" > > > +{ > > > + if (TARGET_DEST_FALSE_DEPENDENCY > > > + && get_attr_dest_false_dep (insn) =3D=3D > > > + DEST_FALSE_DEP_TRUE) > > > + output_asm_insn ("vxorps\t{%x0, %x0, %x0}", operands); > > > + return "vpmullq\t{%2, %1, %0|%0, %1,= %2}"; > > > +} > > > [(set_attr "type" "sseimul") > > > (set_attr "prefix" "evex") > > > - (set_attr "mode" "")]) > > > + (set_attr "mode" "") > > > + (set (attr "dest_false_dep") > > > + (if_then_else > > > + (match_test "") > > > + (const_string "false") > > > + (const_string "true")))]) > > > > > > (define_expand "cond_mul" > > > [(set (match_operand:VI4_AVX512VL 0 "register_operand") > > > @@ -24636,10 +24669,21 @@ (define_insn "_permvar" > > > (match_operand: 2 "register_operand" "v")] > > > UNSPEC_VPERMVAR))] > > > "TARGET_AVX2 && " > > > - "vperm\t{%1, %2, %0|%0, %2, %1}" > > > +{ > > > + if (TARGET_DEST_FALSE_DEPENDENCY > > > + && get_attr_dest_false_dep (insn) =3D=3D > > > + DEST_FALSE_DEP_TRUE) > > > + output_asm_insn ("vxorps\t{%x0, %x0, %x0}", operands); > > > + return "vperm\t{%1, %2, %0|%0, %2, %1}"; > > > +} > > > [(set_attr "type" "sselog") > > > (set_attr "prefix" "") > > > - (set_attr "mode" "")]) > > > + (set_attr "mode" "") > > > + (set (attr "dest_false_dep") > > > + (if_then_else > > > + (match_test "") > > > + (const_string "false") > > > + (const_string "true")))]) > > > > > > (define_insn "_permvar" > > > [(set (match_operand:VI1_AVX512VL 0 "register_operand" "=3Dv") > > > @@ -24873,11 +24917,20 @@ (define_insn "avx2_perm_1" > > > mask |=3D INTVAL (operands[4]) << 4; > > > mask |=3D INTVAL (operands[5]) << 6; > > > operands[2] =3D GEN_INT (mask); > > > + if (TARGET_DEST_FALSE_DEPENDENCY > > > + && get_attr_dest_false_dep (insn) =3D=3D > > > + DEST_FALSE_DEP_TRUE) > > > + output_asm_insn ("vxorps\t{%x0, %x0, %x0}", operands); > > > return "vperm\t{%2, %1, %0|%0, %1, %2}"; > > > } > > > [(set_attr "type" "sselog") > > > (set_attr "prefix" "") > > > - (set_attr "mode" "")]) > > > + (set_attr "mode" "") > > > + (set (attr "dest_false_dep") > > > + (if_then_else > > > + (match_test "") > > > + (const_string "false") > > > + (const_string "true")))]) > > > > > > (define_expand "avx512f_perm" > > > [(match_operand:V8FI 0 "register_operand") > > > @@ -24944,11 +24997,20 @@ (define_insn "avx512f_perm_1" > > > mask |=3D INTVAL (operands[4]) << 4; > > > mask |=3D INTVAL (operands[5]) << 6; > > > operands[2] =3D GEN_INT (mask); > > > + if (TARGET_DEST_FALSE_DEPENDENCY > > > + && get_attr_dest_false_dep (insn) =3D=3D > > > + DEST_FALSE_DEP_TRUE) > > > + output_asm_insn ("vxorps\t{%x0, %x0, %x0}", operands); > > > return "vperm\t{%2, %1, %0|%0, %1, %2}"; > > > } > > > [(set_attr "type" "sselog") > > > (set_attr "prefix" "") > > > - (set_attr "mode" "")]) > > > + (set_attr "mode" "") > > > + (set (attr "dest_false_dep") > > > + (if_then_else > > > + (match_test "") > > > + (const_string "false") > > > + (const_string "true")))]) > > > > > > (define_insn "avx2_permv2ti" > > > [(set (match_operand:V4DI 0 "register_operand" "=3Dx") > > > @@ -26843,10 +26905,21 @@ (define_insn "avx512dq_rangep" > > > (match_operand:SI 3 "const_0_to_15_operand")] > > > UNSPEC_RANGE))] > > > "TARGET_AVX512DQ && " > > > - "vrange\t{%3, %2, %1, %0|%0, %1, %2, %3}" > > > +{ > > > + if (TARGET_DEST_FALSE_DEPENDENCY > > > + && get_attr_dest_false_dep (insn) =3D=3D > > > + DEST_FALSE_DEP_TRUE) > > > + output_asm_insn ("vxorps\t{%x0, %x0, %x0}", operands); > > > + return "vrange\t{%3, %2, %1= , %0|%0, %1, %2, %3}"= ; > > > +} > > > [(set_attr "type" "sse") > > > (set_attr "prefix" "evex") > > > - (set_attr "mode" "")]) > > > + (set_attr "mode" "") > > > + (set (attr "dest_false_dep") > > > + (if_then_else > > > + (match_test "") > > > + (const_string "false") > > > + (const_string "true")))]) > > > > > > (define_insn "avx512dq_ranges" > > > [(set (match_operand:VF_128 0 "register_operand" "=3Dv") > > > @@ -26859,10 +26932,21 @@ (define_insn "avx512dq_ranges > > > (match_dup 1) > > > (const_int 1)))] > > > "TARGET_AVX512DQ" > > > - "vrange\t{%3, = %2, %1, %0|%0, %1, %2, %3}" > > > +{ > > > + if (TARGET_DEST_FALSE_DEPENDENCY > > > + && get_attr_dest_false_dep (insn) =3D=3D > > > + DEST_FALSE_DEP_TRUE) > > > + output_asm_insn ("vxorps\t{%x0, %x0, %x0}", operands); > > > + return "vrange\t{%3, %2, %1, %0|%0, %1, %2, %3}"; > > > +} > > > [(set_attr "type" "sse") > > > (set_attr "prefix" "evex") > > > - (set_attr "mode" "")]) > > > + (set_attr "mode" "") > > > + (set (attr "dest_false_dep") > > > + (if_then_else > > > + (match_test "") > > > + (const_string "false") > > > + (const_string "true")))]) > > > > > > (define_insn "avx512dq_fpclass" > > > [(set (match_operand: 0 "register_operand" "=3Dk"= ) > > > @@ -26899,9 +26983,20 @@ (define_insn "_getmant" > > > (match_operand:SI 2 "const_0_to_15_operand")] > > > UNSPEC_GETMANT))] > > > "TARGET_AVX512F" > > > - "vgetmant\t{%2, %1, %0|%0, %1, %2}"; > > > +{ > > > + if (TARGET_DEST_FALSE_DEPENDENCY > > > + && get_attr_dest_false_dep (insn) =3D=3D > > > + DEST_FALSE_DEP_TRUE) > > > + output_asm_insn ("vxorps\t{%x0, %x0, %x0}", operands); > > > + return "vgetmant\t{%2, %1, = %0|%0, %1, %2}"; > > > +} > > > [(set_attr "prefix" "evex") > > > - (set_attr "mode" "")]) > > > + (set_attr "mode" "") > > > + (set (attr "dest_false_dep") > > > + (if_then_else > > > + (match_test "!MEM_P (operands[1]) || ") > > > + (const_string "false") > > > + (const_string "true")))]) > > > > > > (define_insn "avx512f_vgetmant" > > > [(set (match_operand:VFH_128 0 "register_operand" "=3Dv") > > > @@ -26914,9 +27009,20 @@ (define_insn "avx512f_vgetmant > > (match_dup 1) > > > (const_int 1)))] > > > "TARGET_AVX512F" > > > - "vgetmant\t{%3, %2, %1, %0|%0, %1, %2<= round_saeonly_scalar_mask_op4>, %3}"; > > > +{ > > > + if (TARGET_DEST_FALSE_DEPENDENCY > > > + && get_attr_dest_false_dep (insn) =3D=3D > > > + DEST_FALSE_DEP_TRUE) > > > + output_asm_insn ("vxorps\t{%x0, %x0, %x0}", operands); > > > + return "vgetmant\t{%3, %2, %1, %0|%0, %1, %2, %3}"; > > > +} > > > [(set_attr "prefix" "evex") > > > - (set_attr "mode" "")]) > > > + (set_attr "mode" "") > > > + (set (attr "dest_false_dep") > > > + (if_then_else > > > + (match_test "") > > > + (const_string "false") > > > + (const_string "true")))]) > > > > > > ;; The correct representation for this is absolutely enormous, and > > > ;; surely not generally useful. > > > diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md > > > index 21d445cc46c..802a8715b01 100644 > > > --- a/gcc/config/i386/subst.md > > > +++ b/gcc/config/i386/subst.md > > > @@ -71,6 +71,11 @@ (define_subst_attr "bcst_mask_prefix3" "mask" "ori= g,maybe_evex" "evex,evex") > > > (define_subst_attr "mask_prefix4" "mask" "orig,orig,vex" "evex,evex,= evex") > > > (define_subst_attr "bcst_mask_prefix4" "mask" "orig,orig,maybe_evex"= "evex,evex,evex") > > > (define_subst_attr "mask_expand_op3" "mask" "3" "5") > > > +(define_subst_attr "mask3_dest_false_dep_attr" "mask" "0" "operands[= 3] !=3D CONST0_RTX(mode)") > > > +(define_subst_attr "mask4_dest_false_dep_attr" "mask" "0" "operands[= 4] !=3D CONST0_RTX(mode)") > > > +(define_subst_attr "mask6_dest_false_dep_attr" "mask" "0" "operands[= 6] !=3D CONST0_RTX(mode)") > > > +(define_subst_attr "mask10_dest_false_dep_attr" "mask" "0" "operands= [10] !=3D CONST0_RTX(mode)") > > > +(define_subst_attr "maskc_dest_false_dep_attr" "maskc" "0" "operands= [3] !=3D CONST0_RTX(mode)") > > > > > > (define_subst "mask" > > > [(set (match_operand:SUBST_V 0) > > > @@ -337,6 +342,8 @@ (define_subst_attr "mask_scalarc_operand3" "mask_= scalarc" "" "%{%4%}%N3") > > > (define_subst_attr "mask_scalar_operand3" "mask_scalar" "" "%{%4%}%N= 3") > > > (define_subst_attr "mask_scalar_operand4" "mask_scalar" "" "%{%5%}%N= 4") > > > (define_subst_attr "mask_scalarcz_operand4" "mask_scalarcz" "" "%{%5= %}%N4") > > > +(define_subst_attr "mask_scalar4_dest_false_dep_attr" "mask_scalar" = "0" "operands[4] !=3D CONST0_RTX(mode)") > > > +(define_subst_attr "mask_scalarc_dest_false_dep_attr" "mask_scalarc"= "0" "operands[3] !=3D CONST0_RTX(V8HFmode)") > > > > > > (define_subst "mask_scalar" > > > [(set (match_operand:SUBST_V 0) > > > diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.= def > > > index 0d3fd078068..1b42c96fc38 100644 > > > --- a/gcc/config/i386/x86-tune.def > > > +++ b/gcc/config/i386/x86-tune.def > > > @@ -79,6 +79,11 @@ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPEND= ENCY, > > > m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 > > > | m_BDVER | m_ZNVER | m_ALDERLAKE | m_GENERIC) > > > > > > +/* X86_TUNE_DEST_FALSE_DEPENDENCY: This knob inserts zero-idiom befo= re > > > + several insns to break false dependency on the dest register. */ > > > +DEF_TUNE (X86_TUNE_DEST_FALSE_DEPENDENCY, > > > + "dest_false_dependency", m_SAPPHIRERAPIDS | m_ALDERLAKE) > > > + > > > /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and depe= ndencies > > > are resolved on SSE register parts instead of whole registers, so= we may > > > maintain just lower part of scalar values in proper format leavin= g the > > > diff --git a/gcc/testsuite/gcc.target/i386/avx2-dest-false-dependency= .c b/gcc/testsuite/gcc.target/i386/avx2-dest-false-dependency.c > > > new file mode 100644 > > > index 00000000000..e138920ce18 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/i386/avx2-dest-false-dependency.c > > > @@ -0,0 +1,24 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-mavx2 -mtune-ctrl=3Ddest_false_dependency -O2" } *= / > > > + > > > + > > > +#include > > > + > > > +extern __m256i i1, i2, i3, i4; > > > +extern __m256d d1, d2; > > > +extern __m256 f1, f2; > > > + > > > +void vperm_test (void) > > > +{ > > > + i3 =3D _mm256_permutevar8x32_epi32 (i1, i2); > > > + i4 =3D _mm256_permute4x64_epi64 (i1, 12); > > > + d2 =3D _mm256_permute4x64_pd (d1, 12); > > > + f2 =3D _mm256_permutevar8x32_ps (f1, i2); > > > +} > > > + > > > +/* { dg-final { scan-assembler-times "vxorps" 4 } } */ > > > +/* { dg-final { scan-assembler-times "vpermd" 1 } } */ > > > +/* { dg-final { scan-assembler-times "vpermq" 1 } } */ > > > +/* { dg-final { scan-assembler-times "vpermpd" 1 } } */ > > > +/* { dg-final { scan-assembler-times "vpermps" 1 } } */ > > > + > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-depend= ency.c b/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dependency.c > > > new file mode 100644 > > > index 00000000000..2feb58f2cd8 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dependency.c > > > @@ -0,0 +1,73 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-mavx512dq -mavx512vl -mtune-ctrl=3Ddest_false_depe= ndency -O2" } */ > > > + > > > +#include > > > + > > > +extern __m512i i1; > > > +extern __m256i i2; > > > +extern __m128i i3; > > > +extern __m512d d1; > > > +extern __m256d d2; > > > +extern __m128d d3; > > > +extern __m512 f1; > > > +extern __m256 f2; > > > +extern __m128 f3; > > > + > > > +__mmask32 m32; > > > +__mmask16 m16; > > > +__mmask8 m8; > > > + > > > +void mullo_test (void) > > > +{ > > > + i1 =3D _mm512_mullo_epi64 (i1, i1); > > > + i1 =3D _mm512_mask_mullo_epi64 (i1, m8, i1, i1); > > > + i1 =3D _mm512_maskz_mullo_epi64 (m8, i1, i1); > > > + i2 =3D _mm256_mullo_epi64 (i2, i2); > > > + i2 =3D _mm256_mask_mullo_epi64 (i2, m8, i2, i2); > > > + i2 =3D _mm256_maskz_mullo_epi64 (m8, i2, i2); > > > + i3 =3D _mm_mullo_epi64 (i3, i3); > > > + i3 =3D _mm_mask_mullo_epi64 (i3, m8, i3, i3); > > > + i3 =3D _mm_maskz_mullo_epi64 (m8, i3, i3); > > > +} > > > + > > > +void range_test (void) > > > +{ > > > + d1 =3D _mm512_range_pd (d1, d1, 15); > > > + d1 =3D _mm512_range_round_pd (d1, d1, 15, 8); > > > + d1 =3D _mm512_mask_range_pd (d1, m8, d1, d1, 15); > > > + d1 =3D _mm512_mask_range_round_pd (d1, m8, d1, d1, 15, 8); > > > + d1 =3D _mm512_maskz_range_pd (m8, d1, d1, 15); > > > + d1 =3D _mm512_maskz_range_round_pd (m8, d1, d1, 15, 8); > > > + d2 =3D _mm256_range_pd (d2, d2, 15); > > > + d2 =3D _mm256_mask_range_pd (d2, m8, d2, d2, 15); > > > + d2 =3D _mm256_maskz_range_pd (m8, d2, d2, 15); > > > + d3 =3D _mm_range_pd (d3, d3, 15); > > > + d3 =3D _mm_mask_range_pd (d3, m8, d3, d3, 15); > > > + d3 =3D _mm_maskz_range_pd (m8, d3, d3, 15); > > > + d3 =3D _mm_range_sd (d3, d3, 15); > > > + d3 =3D _mm_mask_range_sd (d3, m8, d3, d3, 15); > > > + d3 =3D _mm_maskz_range_sd (m8, d3, d3, 15); > > > + > > > + f1 =3D _mm512_range_ps (f1, f1, 15); > > > + f1 =3D _mm512_range_round_ps (f1, f1, 15, 8); > > > + f1 =3D _mm512_mask_range_ps (f1, m16, f1, f1, 15); > > > + f1 =3D _mm512_mask_range_round_ps (f1, m16, f1, f1, 15, 8); > > > + f1 =3D _mm512_maskz_range_ps (m16, f1, f1, 15); > > > + f1 =3D _mm512_maskz_range_round_ps (m16, f1, f1, 15, 8); > > > + f2 =3D _mm256_range_ps (f2, f2, 15); > > > + f2 =3D _mm256_mask_range_ps (f2, m8, f2, f2, 15); > > > + f2 =3D _mm256_maskz_range_ps (m8, f2, f2, 15); > > > + f3 =3D _mm_range_ps (f3, f3, 15); > > > + f3 =3D _mm_mask_range_ps (f3, m8, f3, f3, 15); > > > + f3 =3D _mm_maskz_range_ps (m8, f3, f3, 15); > > > + f3 =3D _mm_range_ss (f3, f3, 15); > > > + f3 =3D _mm_mask_range_ss (f3, m8, f3, f3, 15); > > > + f3 =3D _mm_maskz_range_ss (m8, f3, f3, 15); > > > +} > > > + > > > +/* { dg-final { scan-assembler-times "vxorps" 26 } } */ > > > +/* { dg-final { scan-assembler-times "vpmullq" 9 } } */ > > > +/* { dg-final { scan-assembler-times "vrangepd" 12 } } */ > > > +/* { dg-final { scan-assembler-times "vrangesd" 3 } } */ > > > +/* { dg-final { scan-assembler-times "vrangeps" 12 } } */ > > > +/* { dg-final { scan-assembler-times "vrangess" 3 } } */ > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-dest-false-depende= ncy.c b/gcc/testsuite/gcc.target/i386/avx512f-dest-false-dependency.c > > > new file mode 100644 > > > index 00000000000..9650839970e > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/i386/avx512f-dest-false-dependency.c > > > @@ -0,0 +1,102 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-mavx512f -mtune-ctrl=3Ddest_false_dependency -O2" = } */ > > > + > > > +#include > > > + > > > +extern __m512i i1; > > > +extern __m512d d1, *pd1; > > > +extern __m128d d2; > > > +extern __m512 f1, *pf1; > > > +extern __m128 f2; > > > + > > > +__mmask16 m16; > > > +__mmask8 m8; > > > + > > > +void vperm_test (void) > > > +{ > > > + d1 =3D _mm512_permutex_pd (d1, 12); > > > + d1 =3D _mm512_mask_permutex_pd (d1, m8, d1, 12); > > > + d1 =3D _mm512_maskz_permutex_pd (m8, d1, 12); > > > + d1 =3D _mm512_permutexvar_pd (i1, d1); > > > + d1 =3D _mm512_mask_permutexvar_pd (d1, m8, i1, d1); > > > + d1 =3D _mm512_maskz_permutexvar_pd (m8, i1, d1); > > > + > > > + f1 =3D _mm512_permutexvar_ps (i1, f1); > > > + f1 =3D _mm512_mask_permutexvar_ps (f1, m16, i1, f1); > > > + f1 =3D _mm512_maskz_permutexvar_ps (m16, i1, f1); > > > + > > > + i1 =3D _mm512_permutexvar_epi64 (i1, i1); > > > + i1 =3D _mm512_mask_permutexvar_epi64 (i1, m8, i1, i1); > > > + i1 =3D _mm512_maskz_permutexvar_epi64 (m8, i1, i1); > > > + i1 =3D _mm512_permutex_epi64 (i1, 12); > > > + i1 =3D _mm512_mask_permutex_epi64 (i1, m8, i1, 12); > > > + i1 =3D _mm512_maskz_permutex_epi64 (m8, i1, 12); > > > + > > > + i1 =3D _mm512_permutexvar_epi32 (i1, i1); > > > + i1 =3D _mm512_mask_permutexvar_epi32 (i1, m16, i1, i1); > > > + i1 =3D _mm512_maskz_permutexvar_epi32 (m16, i1, i1); > > > +} > > > + > > > +void getmant_test (void) > > > +{ > > > + d1 =3D _mm512_getmant_pd (*pd1, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + d1 =3D _mm512_getmant_round_pd (*pd1, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src, 8); > > > + d1 =3D _mm512_mask_getmant_pd (d1, m8, *pd1, _MM_MANT_NORM_p75_1p5= , > > > + _MM_MANT_SIGN_src); > > > + d1 =3D _mm512_mask_getmant_round_pd (d1, m8, *pd1, _MM_MANT_NORM_p= 75_1p5, > > > + _MM_MANT_SIGN_src, 8); > > > + d1 =3D _mm512_maskz_getmant_pd (m8, *pd1, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + d1 =3D _mm512_maskz_getmant_round_pd (m8, *pd1, _MM_MANT_NORM_p75_= 1p5, > > > + _MM_MANT_SIGN_src, 8); > > > + f1 =3D _mm512_getmant_ps (*pf1, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + f1 =3D _mm512_getmant_round_ps (*pf1, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src, 8); > > > + f1 =3D _mm512_mask_getmant_ps (f1, m16, *pf1, _MM_MANT_NORM_p75_1p= 5, > > > + _MM_MANT_SIGN_src); > > > + f1 =3D _mm512_mask_getmant_round_ps (f1, m16, *pf1, _MM_MANT_NORM_= p75_1p5, > > > + _MM_MANT_SIGN_src, 8); > > > + f1 =3D _mm512_maskz_getmant_ps (m16, *pf1, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + f1 =3D _mm512_maskz_getmant_round_ps (m16, *pf1, _MM_MANT_NORM_p75= _1p5, > > > + _MM_MANT_SIGN_src, 8); > > > + > > > + d2 =3D _mm_getmant_sd (d2, d2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + d2 =3D _mm_getmant_round_sd (d2, d2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src, 8); > > > + d2 =3D _mm_mask_getmant_sd (d2, m8, d2, d2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + d2 =3D _mm_mask_getmant_round_sd (d2, m8, d2, d2, _MM_MANT_NORM_p7= 5_1p5, > > > + _MM_MANT_SIGN_src, 8); > > > + d2 =3D _mm_maskz_getmant_sd (m8, d2, d2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + d2 =3D _mm_maskz_getmant_round_sd (m8, d2, d2, _MM_MANT_NORM_p75_1= p5, > > > + _MM_MANT_SIGN_src, 8); > > > + f2 =3D _mm_getmant_ss (f2, f2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + f2 =3D _mm_getmant_round_ss (f2, f2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src, 8); > > > + f2 =3D _mm_mask_getmant_ss (f2, m8, f2, f2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + f2 =3D _mm_mask_getmant_round_ss (f2, m8, f2, f2, _MM_MANT_NORM_p7= 5_1p5, > > > + _MM_MANT_SIGN_src, 8); > > > + f2 =3D _mm_maskz_getmant_ss (m8, f2, f2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + f2 =3D _mm_maskz_getmant_round_ss (m8, f2, f2, _MM_MANT_NORM_p75_1= p5, > > > + _MM_MANT_SIGN_src, 8); > > > + > > > +} > > > + > > > +/* { dg-final { scan-assembler-times "vxorps" 24 } } */ > > > +/* { dg-final { scan-assembler-times "vpermd" 3 } } */ > > > +/* { dg-final { scan-assembler-times "vpermq" 6 } } */ > > > +/* { dg-final { scan-assembler-times "vpermps" 3 } } */ > > > +/* { dg-final { scan-assembler-times "vpermpd" 6 } } */ > > > +/* { dg-final { scan-assembler-times "vgetmantpd" 6 } } */ > > > +/* { dg-final { scan-assembler-times "vgetmantps" 6 } } */ > > > +/* { dg-final { scan-assembler-times "vgetmantsd" 6 } } */ > > > +/* { dg-final { scan-assembler-times "vgetmantss" 6 } } */ > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-dest-false-depe= ndency.c b/gcc/testsuite/gcc.target/i386/avx512fp16-dest-false-dependency.c > > > new file mode 100644 > > > index 00000000000..793bb66201b > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-dest-false-dependency.= c > > > @@ -0,0 +1,45 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-mavx512fp16 -mavx512vl -mtune-ctrl=3Ddest_false_de= pendency -O2" } */ > > > + > > > +#include > > > + > > > +extern __m512h h1; > > > +extern __m256h h2; > > > +extern __m128h h3; > > > + > > > +__mmask32 m32; > > > +__mmask16 m16; > > > +__mmask8 m8; > > > + > > > +void complex_mul_test (void) > > > +{ > > > + h1 =3D _mm512_fmul_pch (h1, h1); > > > + h1 =3D _mm512_fmul_round_pch (h1, h1, 8); > > > + h1 =3D _mm512_mask_fmul_pch (h1, m32, h1, h1); > > > + h1 =3D _mm512_mask_fmul_round_pch (h1, m32, h1, h1, 8); > > > + h1 =3D _mm512_maskz_fmul_pch (m32, h1, h1); > > > + h1 =3D _mm512_maskz_fmul_round_pch (m32, h1, h1, 11); > > > + > > > + h3 =3D _mm_fmul_sch (h3, h3); > > > + h3 =3D _mm_fmul_round_sch (h3, h3, 8); > > > + h3 =3D _mm_mask_fmul_sch (h3, m8, h3, h3); > > > + h3 =3D _mm_mask_fmul_round_sch (h3, m8, h3, h3, 8); > > > + h3 =3D _mm_maskz_fmul_sch (m8, h3, h3); > > > + h3 =3D _mm_maskz_fmul_round_sch (m8, h3, h3, 11); > > > +} > > > + > > > +void vgetmant_test (void) > > > +{ > > > + h3 =3D _mm_getmant_sh (h3, h3, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + h3 =3D _mm_mask_getmant_sh (h3, m8, h3, h3, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + h3 =3D _mm_maskz_getmant_sh (m8, h3, h3, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > +} > > > + > > > +/* { dg-final { scan-assembler-times "vxorps" 10 } } */ > > > +/* { dg-final { scan-assembler-times "vfmulcph" 6 } } */ > > > +/* { dg-final { scan-assembler-times "vfmulcsh" 6 } } */ > > > +/* { dg-final { scan-assembler-times "vgetmantsh" 3 } } */ > > > + > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-dest-false-de= pendency.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-dest-false-dependen= cy.c > > > new file mode 100644 > > > index 00000000000..09658905d2d > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-dest-false-dependenc= y.c > > > @@ -0,0 +1,24 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-mavx512fp16 -mavx512vl -mtune-ctrl=3Ddest_false_de= pendency -O2" } */ > > > + > > > +#include > > > + > > > +extern __m256h h1; > > > +extern __m128h h2; > > > + > > > +__mmask16 m16; > > > +__mmask8 m8; > > > + > > > +void complex_mul_test (void) > > > +{ > > > + h1 =3D _mm256_fmul_pch (h1, h1); > > > + h1 =3D _mm256_mask_fmul_pch (h1, m16, h1, h1); > > > + h1 =3D _mm256_maskz_fmul_pch (m16, h1, h1); > > > + h2 =3D _mm_fmul_pch (h2, h2); > > > + h2 =3D _mm_mask_fmul_pch (h2, m16, h2, h2); > > > + h2 =3D _mm_maskz_fmul_pch (m16, h2, h2); > > > +} > > > + > > > +/* { dg-final { scan-assembler-times "vxorps" 4 } } */ > > > +/* { dg-final { scan-assembler-times "vfmulcph" 6 } } */ > > > + > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-dest-false-depend= ency.c b/gcc/testsuite/gcc.target/i386/avx512vl-dest-false-dependency.c > > > new file mode 100644 > > > index 00000000000..92717a99837 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-dest-false-dependency.c > > > @@ -0,0 +1,76 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-mavx512f -mavx512vl -mtune-ctrl=3Ddest_false_depen= dency -O2" } */ > > > + > > > + > > > +#include > > > + > > > +extern __m256i i1; > > > +extern __m256d d1, *pd1; > > > +extern __m128d d2, *pd2; > > > +extern __m256 f1, *pf1; > > > +extern __m128 f2, *pf2; > > > + > > > +__mmask16 m16; > > > +__mmask8 m8; > > > + > > > +void vperm_test (void) > > > +{ > > > + d1 =3D _mm256_permutex_pd (d1, 12); > > > + d1 =3D _mm256_mask_permutex_pd (d1, m8, d1, 12); > > > + d1 =3D _mm256_maskz_permutex_pd (m8, d1, 12); > > > + d1 =3D _mm256_permutexvar_pd (i1, d1); > > > + d1 =3D _mm256_mask_permutexvar_pd (d1, m8, i1, d1); > > > + d1 =3D _mm256_maskz_permutexvar_pd (m8, i1, d1); > > > + > > > + f1 =3D _mm256_permutexvar_ps (i1, f1); > > > + f1 =3D _mm256_mask_permutexvar_ps (f1, m8, i1, f1); > > > + f1 =3D _mm256_maskz_permutexvar_ps (m8, i1, f1); > > > + > > > + i1 =3D _mm256_permutexvar_epi64 (i1, i1); > > > + i1 =3D _mm256_mask_permutexvar_epi64 (i1, m8, i1, i1); > > > + i1 =3D _mm256_maskz_permutexvar_epi64 (m8, i1, i1); > > > + i1 =3D _mm256_permutex_epi64 (i1, 12); > > > + i1 =3D _mm256_mask_permutex_epi64 (i1, m8, i1, 12); > > > + i1 =3D _mm256_maskz_permutex_epi64 (m8, i1, 12); > > > + > > > + i1 =3D _mm256_permutexvar_epi32 (i1, i1); > > > + i1 =3D _mm256_mask_permutexvar_epi32 (i1, m8, i1, i1); > > > + i1 =3D _mm256_maskz_permutexvar_epi32 (m8, i1, i1); > > > +} > > > + > > > +void getmant_test (void) > > > +{ > > > + d1 =3D _mm256_getmant_pd (*pd1, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + d1 =3D _mm256_mask_getmant_pd (d1, m8, *pd1, _MM_MANT_NORM_p75_1p5= , > > > + _MM_MANT_SIGN_src); > > > + d1 =3D _mm256_maskz_getmant_pd (m8, *pd1, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + d2 =3D _mm_getmant_pd (*pd2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + d2 =3D _mm_mask_getmant_pd (d2, m8, *pd2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + d2 =3D _mm_maskz_getmant_pd (m8, *pd2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + f1 =3D _mm256_getmant_ps (*pf1, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + f1 =3D _mm256_mask_getmant_ps (f1, m8, *pf1, _MM_MANT_NORM_p75_1p5= , > > > + _MM_MANT_SIGN_src); > > > + f1 =3D _mm256_maskz_getmant_ps (m8, *pf1, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + f2 =3D _mm_getmant_ps (*pf2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + f2 =3D _mm_mask_getmant_ps (f2, m8, *pf2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > + f2 =3D _mm_maskz_getmant_ps (m8, *pf2, _MM_MANT_NORM_p75_1p5, > > > + _MM_MANT_SIGN_src); > > > +} > > > + > > > +/* { dg-final { scan-assembler-times "vxorps" 20 } } */ > > > +/* { dg-final { scan-assembler-times "vpermpd" 6 } } */ > > > +/* { dg-final { scan-assembler-times "vpermps" 3 } } */ > > > +/* { dg-final { scan-assembler-times "vpermq" 6 } } */ > > > +/* { dg-final { scan-assembler-times "vpermd" 3 } } */ > > > +/* { dg-final { scan-assembler-times "vgetmantpd" 6 } } */ > > > +/* { dg-final { scan-assembler-times "vgetmantps" 6 } } */ > > > + > > > -- > > > 2.18.1 > > >