From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by sourceware.org (Postfix) with ESMTPS id 012173858C30 for ; Mon, 10 Jul 2023 01:19:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 012173858C30 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1688951958; x=1720487958; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cuFQhMLMT7yDcDEw8YzVKq0q7qfewr+tSjVVDpPqH6c=; b=Z8E85U4oTwqxEGNXhursuC7sORsgkLeobwIFbK+dZE00bAInTyoesg3q O2bJbwXVim9Jiz4opVz62mUIIeKikLO1VidPl8Q4x5jbfqr4WTDtpYfGg bkDjgHmB8arVUpLB+atFmBQDkPHeFbQjqrPpmLYW420pUqjvnTnVodSPV kgq4WXai9JwNXjMuB9K9ypfDQo30bfMiLCqhiJNl+8IxH/Sd4cDGgtpzk kTh481xjiVmUXK+CjWjoTBdfrn8FUZLSzJkm5N+MsLTdD+FxORNwoz7wW 0vo+jLKZPv/4LOyE9corrNOaKzCkFeJI/66n/ETK1zUa23uM41mkWcgBI Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10766"; a="427923608" X-IronPort-AV: E=Sophos;i="6.01,193,1684825200"; d="scan'208";a="427923608" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jul 2023 18:19:17 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10766"; a="790624165" X-IronPort-AV: E=Sophos;i="6.01,193,1684825200"; d="scan'208";a="790624165" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by fmsmga004.fm.intel.com with ESMTP; 09 Jul 2023 18:19:15 -0700 Received: from shliclel4217.sh.intel.com (shliclel4217.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 34AC81005696; Mon, 10 Jul 2023 09:19:14 +0800 (CST) From: liuhongt To: gcc-patches@gcc.gnu.org Cc: simonaytes.yan@ispras.ru Subject: [PATCH] Break false dependence for vpternlog by inserting vpxor or setting constraint of input operand to '0' Date: Mon, 10 Jul 2023 09:17:14 +0800 Message-Id: <20230710011714.3615931-1-hongtao.liu@intel.com> X-Mailer: git-send-email 2.39.1.388.g2fc9e9ca3c In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: False dependency happens when destination is only updated by pternlog. There is no false dependency when destination is also used in source. So either a pxor should be inserted, or input operand should be set with constraint '0'. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: PR target/110438 PR target/110202 * config/i386/predicates.md (int_float_vector_all_ones_operand): New predicate. * config/i386/sse.md (*vmov_constm1_pternlog_false_dep): New define_insn. (*_cvtmask2_pternlog_false_dep): Ditto. (*_cvtmask2_pternlog_false_dep): Ditto. (*_cvtmask2): Adjust to define_insn_and_split to avoid false dependence. (*_cvtmask2): Ditto. (one_cmpl2): Adjust constraint of operands 1 to '0' to avoid false dependence. (*andnot3): Ditto. (iornot3): Ditto. (*3): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110438.c: New test. --- gcc/config/i386/predicates.md | 8 +- gcc/config/i386/sse.md | 113 ++++++++++++++++++++--- gcc/testsuite/gcc.target/i386/pr110438.c | 30 ++++++ 3 files changed, 135 insertions(+), 16 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr110438.c diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 7ddbe01a6f9..37d20c6303a 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -1192,12 +1192,18 @@ (define_predicate "float_vector_all_ones_operand" return false; }) -/* Return true if operand is a vector constant that is all ones. */ +/* Return true if operand is an integral vector constant that is all ones. */ (define_predicate "vector_all_ones_operand" (and (match_code "const_vector") (match_test "INTEGRAL_MODE_P (GET_MODE (op))") (match_test "op == CONSTM1_RTX (GET_MODE (op))"))) +/* Return true if operand is a vector constant that is all ones. */ +(define_predicate "int_float_vector_all_ones_operand" + (ior (match_operand 0 "vector_all_ones_operand") + (match_operand 0 "float_vector_all_ones_operand") + (match_test "op == constm1_rtx"))) + /* Return true if operand is an 128/256bit all ones vector that zero-extends to 256/512bit. */ (define_predicate "vector_all_ones_zero_extend_half_operand" diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 418c337a775..56920a3e1d3 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1382,6 +1382,29 @@ (define_insn "mov_internal" ] (symbol_ref "true")))]) +; False dependency happens on destination register which is not really +; used when moving all ones to vector register +(define_split + [(set (match_operand:VMOVE 0 "register_operand") + (match_operand:VMOVE 1 "int_float_vector_all_ones_operand"))] + "TARGET_AVX512F && reload_completed + && ( == 64 || EXT_REX_SSE_REG_P (operands[0])) + && optimize_function_for_speed_p (cfun)" + [(set (match_dup 0) (match_dup 2)) + (parallel + [(set (match_dup 0) (match_dup 1)) + (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])] + "operands[2] = CONST0_RTX (mode);") + +(define_insn "*vmov_constm1_pternlog_false_dep" + [(set (match_operand:VMOVE 0 "register_operand" "=v") + (match_operand:VMOVE 1 "int_float_vector_all_ones_operand" "")) + (unspec [(match_operand:VMOVE 2 "register_operand" "0")] UNSPEC_INSN_FALSE_DEP)] + "TARGET_AVX512VL || == 64" + "vpternlogd\t{$0xFF, %0, %0, %0|%0, %0, %0, 0xFF}" + [(set_attr "type" "sselog1") + (set_attr "prefix" "evex")]) + ;; If mem_addr points to a memory region with less than whole vector size bytes ;; of accessible memory and k is a mask that would prevent reading the inaccessible ;; bytes from mem_addr, add UNSPEC_MASKLOAD to prevent it to be transformed to vpblendd @@ -9336,7 +9359,7 @@ (define_expand "_cvtmask2" operands[3] = CONST0_RTX (mode); }") -(define_insn "*_cvtmask2" +(define_insn_and_split "*_cvtmask2" [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v,v") (vec_merge:VI48_AVX512VL (match_operand:VI48_AVX512VL 2 "vector_all_ones_operand") @@ -9346,11 +9369,35 @@ (define_insn "*_cvtmask2" "@ vpmovm2\t{%1, %0|%0, %1} vpternlog\t{$0x81, %0, %0, %0%{%1%}%{z%}|%0%{%1%}%{z%}, %0, %0, 0x81}" + "&& !TARGET_AVX512DQ && reload_completed + && optimize_function_for_speed_p (cfun)" + [(set (match_dup 0) (match_dup 4)) + (parallel + [(set (match_dup 0) + (vec_merge:VI48_AVX512VL + (match_dup 2) + (match_dup 3) + (match_dup 1))) + (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])] + "operands[4] = CONST0_RTX (mode);" [(set_attr "isa" "avx512dq,*") (set_attr "length_immediate" "0,1") (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "*_cvtmask2_pternlog_false_dep" + [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v") + (vec_merge:VI48_AVX512VL + (match_operand:VI48_AVX512VL 2 "vector_all_ones_operand") + (match_operand:VI48_AVX512VL 3 "const0_operand") + (match_operand: 1 "register_operand" "Yk"))) + (unspec [(match_operand:VI48_AVX512VL 4 "register_operand" "0")] UNSPEC_INSN_FALSE_DEP)] + "TARGET_AVX512F && !TARGET_AVX512DQ" + "vpternlog\t{$0x81, %0, %0, %0%{%1%}%{z%}|%0%{%1%}%{z%}, %0, %0, 0x81}" + [(set_attr "length_immediate" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_expand "extendv2sfv2df2" [(set (match_operand:V2DF 0 "register_operand") (float_extend:V2DF @@ -17166,20 +17213,32 @@ (define_expand "one_cmpl2" operands[2] = force_reg (mode, operands[2]); }) -(define_insn "one_cmpl2" - [(set (match_operand:VI 0 "register_operand" "=v,v") - (xor:VI (match_operand:VI 1 "bcst_vector_operand" "vBr,m") - (match_operand:VI 2 "vector_all_ones_operand" "BC,BC")))] +(define_insn_and_split "one_cmpl2" + [(set (match_operand:VI 0 "register_operand" "=v,v,v") + (xor:VI (match_operand:VI 1 "bcst_vector_operand" " 0, m,Br") + (match_operand:VI 2 "vector_all_ones_operand" "BC,BC,BC")))] "TARGET_AVX512F && (! || mode == SImode || mode == DImode)" { + if (! && which_alternative + && optimize_function_for_speed_p (cfun)) + return "#"; + if (TARGET_AVX512VL) return "vpternlog\t{$0x55, %1, %0, %0|%0, %0, %1, 0x55}"; else return "vpternlog\t{$0x55, %g1, %g0, %g0|%g0, %g0, %g1, 0x55}"; } + "&& reload_completed && !REG_P (operands[1]) && ! + && optimize_function_for_speed_p (cfun)" + [(set (match_dup 0) (match_dup 3)) + (parallel + [(set (match_dup 0) + (xor:VI (match_dup 1) (match_dup 2))) + (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])] + "operands[3] = CONST0_RTX (mode);" [(set_attr "type" "sselog") (set_attr "prefix" "evex") (set (attr "mode") @@ -17191,6 +17250,30 @@ (define_insn "one_cmpl2" (symbol_ref " == 64 || TARGET_AVX512VL") (const_int 1)))]) +(define_insn "*one_cmpl2_pternlog_false_dep" + [(set (match_operand:VI 0 "register_operand" "=v,v") + (xor:VI (match_operand:VI 1 "bcst_vector_operand" "m, Br") + (match_operand:VI 2 "vector_all_ones_operand" "BC,BC"))) + (unspec [(match_operand:VI 3 "register_operand" "0,0")] + UNSPEC_INSN_FALSE_DEP)] + "TARGET_AVX512F" +{ + if (TARGET_AVX512VL) + return "vpternlog\t{$0x55, %1, %0, %0|%0, %0, %1, 0x55}"; + else + return "vpternlog\t{$0x55, %g1, %g0, %g0|%g0, %g0, %g1, 0x55}"; +} + [(set_attr "type" "sselog") + (set_attr "prefix" "evex") + (set (attr "mode") + (if_then_else (match_test "TARGET_AVX512VL") + (const_string "") + (const_string "XI"))) + (set (attr "enabled") + (if_then_else (eq_attr "alternative" "0") + (symbol_ref " == 64 || TARGET_AVX512VL") + (const_int 1)))]) + (define_split [(set (match_operand:VI48_AVX512F 0 "register_operand") (vec_duplicate:VI48_AVX512F @@ -17226,7 +17309,7 @@ (define_insn "*andnot3" [(set (match_operand:VI 0 "register_operand" "=x,x,v,v,v") (and:VI (not:VI (match_operand:VI 1 "bcst_vector_operand" "0,x,v,m,Br")) - (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr,v,v")))] + (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr,0,0")))] "TARGET_SSE && (register_operand (operands[1], mode) || register_operand (operands[2], mode))" @@ -17685,8 +17768,8 @@ (define_insn "*iornot3" [(set (match_operand:VI 0 "register_operand" "=v,v,v,v") (ior:VI (not:VI - (match_operand:VI 1 "bcst_vector_operand" "v,Br,v,m")) - (match_operand:VI 2 "bcst_vector_operand" "vBr,v,m,v")))] + (match_operand:VI 1 "bcst_vector_operand" "0,m, 0,vBr")) + (match_operand:VI 2 "bcst_vector_operand" "m,0,vBr, 0")))] "( == 64 || TARGET_AVX512VL || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) && (register_operand (operands[1], mode) @@ -17710,7 +17793,7 @@ (define_insn "*iornot3" (const_string "") (const_string "XI"))) (set (attr "enabled") - (if_then_else (eq_attr "alternative" "2,3") + (if_then_else (eq_attr "alternative" "0,1") (symbol_ref " == 64 || TARGET_AVX512VL") (const_string "*")))]) @@ -17718,8 +17801,8 @@ (define_insn "*xnor3" [(set (match_operand:VI 0 "register_operand" "=v,v") (not:VI (xor:VI - (match_operand:VI 1 "bcst_vector_operand" "%v,v") - (match_operand:VI 2 "bcst_vector_operand" "vBr,m"))))] + (match_operand:VI 1 "bcst_vector_operand" "%0, 0") + (match_operand:VI 2 "bcst_vector_operand" " m,vBr"))))] "( == 64 || TARGET_AVX512VL || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) && (register_operand (operands[1], mode) @@ -17738,7 +17821,7 @@ (define_insn "*xnor3" (const_string "") (const_string "XI"))) (set (attr "enabled") - (if_then_else (eq_attr "alternative" "1") + (if_then_else (eq_attr "alternative" "0") (symbol_ref " == 64 || TARGET_AVX512VL") (const_string "*")))]) @@ -17749,8 +17832,8 @@ (define_code_attr ternlog_nlogic [(and "0x11") (ior "0x77")]) (define_insn "*3" [(set (match_operand:VI 0 "register_operand" "=v,v") (andor:VI - (not:VI (match_operand:VI 1 "bcst_vector_operand" "%v,v")) - (not:VI (match_operand:VI 2 "bcst_vector_operand" "vBr,m"))))] + (not:VI (match_operand:VI 1 "bcst_vector_operand" "%0, 0")) + (not:VI (match_operand:VI 2 "bcst_vector_operand" "m,vBr"))))] "( == 64 || TARGET_AVX512VL || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) && (register_operand (operands[1], mode) @@ -17769,7 +17852,7 @@ (define_insn "*3" (const_string "") (const_string "XI"))) (set (attr "enabled") - (if_then_else (eq_attr "alternative" "1") + (if_then_else (eq_attr "alternative" "0") (symbol_ref " == 64 || TARGET_AVX512VL") (const_string "*")))]) diff --git a/gcc/testsuite/gcc.target/i386/pr110438.c b/gcc/testsuite/gcc.target/i386/pr110438.c new file mode 100644 index 00000000000..11b8cc59fd2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr110438.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2 -ftree-vectorize -mno-avx512dq -dp -mprefer-vector-width=512" } */ +/* { dg-final { scan-assembler-times {cvtmask2.*_pternlog} "1" } } */ +/* { dg-final { scan-assembler-times {constm1_pternlog} "1" } } */ +/* { dg-final { scan-assembler-not {(?n)vpternlogd.*\(} } } */ + + +#include + +__m512i g(void) +{ + return (__m512i){ 0 } - 1; +} + +__m512i g1(__m512i* a) +{ + return ~(*a); +} + +void +foo (int* a, int* __restrict b) +{ + for (int i = 0; i != 16; i++) + { + if (b[i]) + a[i] = -1; + else + a[i] = 0; + } +} -- 2.39.1.388.g2fc9e9ca3c