From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.7]) by sourceware.org (Postfix) with ESMTPS id 1582F384CB97 for ; Thu, 16 May 2024 04:05:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1582F384CB97 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 1582F384CB97 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=192.198.163.7 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715832351; cv=none; b=JqAk9d3fiP4j8n8EQXkweVtF4/arAPofvpzMuVa/7HOOfN4oIa1m1JcuvIyzL84M7Z+1Smc/VqVk+K21U5FsDiLhcz1St5GgxdqNOgGHEfQYs+ByLJyQgs/aqTnCGIs09kxG5lI/SZP8RBgk98WlDXEMoZjAsNmjnzrxNeOl/n0= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715832351; c=relaxed/simple; bh=/GVW5S01UQkVLewpSI50IYqMJGhZ5pRv72RYPUCzBIo=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=f2MYQFCvre3rudVc5rP6gRfpYUA7732S4cJiuJvSi9BACtUlu7vrh43X7nENxRlkobvzHUvuBzDeZughwr1RsHi2R53JfZ8lm4nFsgtucxFyRwtJh+veYb1RjPDcjWVpEco9KlzwRQzvbsQdwTDtuaT6o+bxFQ4awSS/YfC5puM= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715832349; x=1747368349; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=/GVW5S01UQkVLewpSI50IYqMJGhZ5pRv72RYPUCzBIo=; b=M6wu8U6PfyjVq6L6FtbGBqE8QAPYMOC3MwMR3kxLKsgqaNukZY3j42b7 +kacQN/s6s3zCkIG6Wcd7ScPuuX4GRH7AR42iTxBPQUiZvIwRMPcCriTO z1ezS6fV9a2EUemSLChfFkbkY2Qubdv4jmwQFtlZGYXtPl89Vb9PQ3QWc uVH2WkMyJlbAivct7OB5LAleP72tCk//pDFdQBkWjjD27j1zoBdXFFFUE vjfjCI/4iNwtQmT/xPbxXYKF+p/+W4A43EAl8ARvT94yWfQPIK7+wWKS3 u8XsSzhu3jUX0GaY+vopNnXQogjvL5a7cxOrfJj/ZN2bFIj/YeIn/X/11 w==; X-CSE-ConnectionGUID: VF9hczsCRt2buUmMKhxl3Q== X-CSE-MsgGUID: 3zDzCBQcTqqN7s4bDZuh6Q== X-IronPort-AV: E=McAfee;i="6600,9927,11074"; a="37297984" X-IronPort-AV: E=Sophos;i="6.08,163,1712646000"; d="scan'208";a="37297984" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by fmvoesa101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 May 2024 21:05:47 -0700 X-CSE-ConnectionGUID: hTt0OgcVSESJhmDq13RFGw== X-CSE-MsgGUID: 63qNx/nlTBOOrIDGQKL3UQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,163,1712646000"; d="scan'208";a="68736863" Received: from shvmail02.sh.intel.com ([10.239.244.9]) by orviesa001.jf.intel.com with ESMTP; 15 May 2024 21:05:45 -0700 Received: from pli-ubuntu.sh.intel.com (pli-ubuntu.sh.intel.com [10.239.159.47]) by shvmail02.sh.intel.com (Postfix) with ESMTP id 14DD41005666; Thu, 16 May 2024 12:05:44 +0800 (CST) From: pan2.li@intel.com To: gcc-patches@gcc.gnu.org Cc: juzhe.zhong@rivai.ai, kito.cheng@gmail.com, tamar.christina@arm.com, richard.guenther@gmail.com, Richard.Sandiford@arm.com, Pan Li Subject: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit Date: Thu, 16 May 2024 12:05:40 +0800 Message-Id: <20240516040542.2734412-1-pan2.li@intel.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: From: Pan Li This patch adds early break auto-vectorization support for target which use length on partial vectorization. Consider this following example: unsigned vect_a[802]; unsigned vect_b[802]; void test (unsigned x, int n) { for (int i = 0; i < n; i++) { vect_b[i] = x + i; if (vect_a[i] > x) break; vect_a[i] = x; } } We use VCOND_MASK_LEN to simulate the generate (mask && i < len + bias). And then the IR of RVV looks like below: ... _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [32, 32]); _55 = (int) _87; ... mask_patt_6.13_69 = vect_cst__62 < vect__3.12_67; vec_len_mask_72 = .VCOND_MASK_LEN (mask_patt_6.13_69, { -1, ... }, \ {0, ... }, _87, 0); if (vec_len_mask_72 != { 0, ... }) goto ; [5.50%] else goto ; [94.50%] The below tests are passed for this patch: 1. The riscv fully regression tests. 2. The x86 bootstrap tests. 3. The x86 fully regression tests. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_early_exit): Add loop len handling for one or multiple stmt. gcc/ChangeLog: * tree-vect-loop.cc (vect_gen_loop_len_mask): New func to gen the loop len mask. * tree-vect-stmts.cc (vectorizable_early_exit): Invoke the vect_gen_loop_len_mask for 1 or more stmt(s). * tree-vectorizer.h (vect_gen_loop_len_mask): New func decl for vect_gen_loop_len_mask. Signed-off-by: Pan Li --- gcc/tree-vect-loop.cc | 27 +++++++++++++++++++++++++++ gcc/tree-vect-stmts.cc | 17 +++++++++++++++-- gcc/tree-vectorizer.h | 4 ++++ 3 files changed, 46 insertions(+), 2 deletions(-) diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 361aec06488..83c0544b6aa 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -11416,6 +11416,33 @@ vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi, return loop_len; } +/* Generate the tree for the loop len mask and return it. Given the lens, + nvectors, vectype, index and factor to gen the len mask as below. + + tree len_mask = VCOND_MASK_LEN (compare_mask, ones, zero, len, bias) +*/ +tree +vect_gen_loop_len_mask (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi, + gimple_stmt_iterator *cond_gsi, vec_loop_lens *lens, + unsigned int nvectors, tree vectype, tree stmt, + unsigned int index, unsigned int factor) +{ + tree all_one_mask = build_all_ones_cst (vectype); + tree all_zero_mask = build_zero_cst (vectype); + tree len = vect_get_loop_len (loop_vinfo, gsi, lens, nvectors, vectype, index, + factor); + tree bias = build_int_cst (intQI_type_node, + LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo)); + tree len_mask = make_temp_ssa_name (TREE_TYPE (stmt), NULL, "vec_len_mask"); + gcall *call = gimple_build_call_internal (IFN_VCOND_MASK_LEN, 5, stmt, + all_one_mask, all_zero_mask, len, + bias); + gimple_call_set_lhs (call, len_mask); + gsi_insert_before (cond_gsi, call, GSI_SAME_STMT); + + return len_mask; +} + /* Scale profiling counters by estimation for LOOP which is vectorized by factor VF. If FLAT is true, the loop we started with had unrealistically flat diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index b8a71605f1b..672959501bb 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -12895,7 +12895,9 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info, ncopies = vect_get_num_copies (loop_vinfo, vectype); vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo); + vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo); bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo); + bool len_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo); /* Now build the new conditional. Pattern gimple_conds get dropped during codegen so we must replace the original insn. */ @@ -12959,12 +12961,11 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info, { if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype, OPTIMIZE_FOR_SPEED)) - return false; + vect_record_loop_len (loop_vinfo, lens, ncopies, vectype, 1); else vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL); } - return true; } @@ -13017,6 +13018,15 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info, stmts[i], &cond_gsi); workset.quick_push (stmt_mask); } + else if (len_loop_p) + for (unsigned i = 0; i < stmts.length (); i++) + { + tree len_mask = vect_gen_loop_len_mask (loop_vinfo, gsi, &cond_gsi, + lens, ncopies, vectype, + stmts[i], i, 1); + + workset.quick_push (len_mask); + } else workset.splice (stmts); @@ -13041,6 +13051,9 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info, new_temp = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, new_temp, &cond_gsi); } + else if (len_loop_p) + new_temp = vect_gen_loop_len_mask (loop_vinfo, gsi, &cond_gsi, lens, + ncopies, vectype, new_temp, 0, 1); } gcc_assert (new_temp); diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index db44d730b70..93bc30ef660 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -2408,6 +2408,10 @@ extern void vect_record_loop_len (loop_vec_info, vec_loop_lens *, unsigned int, extern tree vect_get_loop_len (loop_vec_info, gimple_stmt_iterator *, vec_loop_lens *, unsigned int, tree, unsigned int, unsigned int); +extern tree vect_gen_loop_len_mask (loop_vec_info, gimple_stmt_iterator *, + gimple_stmt_iterator *, vec_loop_lens *, + unsigned int, tree, tree, unsigned int, + unsigned int); extern gimple_seq vect_gen_len (tree, tree, tree, tree); extern stmt_vec_info info_for_reduction (vec_info *, stmt_vec_info); extern bool reduction_fn_for_scalar_code (code_helper, internal_fn *); -- 2.34.1