From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from smtpbguseast2.qq.com (smtpbguseast2.qq.com [54.204.34.130])
	by sourceware.org (Postfix) with ESMTPS id F37EE3858404
	for ; Wed, 9 Aug 2023 06:36:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F37EE3858404
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai
X-QQ-mid: bizesmtp87t1691562985tqx3y770
Received: from server1.localdomain ( [58.60.1.10])
	by bizesmtp.qq.com (ESMTP) with
	id ; Wed, 09 Aug 2023 14:36:24 +0800 (CST)
X-QQ-SSF: 01400000000000G0V000000A0000000
X-QQ-FEAT: pPKMqzLgSAT6jUdf+NMeHN7Vp0QqLSXVXxXfl2vWjw1xiZYmwANIbdzN+trh4
	UVBa7ftl6eBySgQXbfnUTYuMUtpwASvpqViyDsScuB1RW5kCJRYtJQqoDnXr3cpEZLy8vc5
	1qO2mIygSmc8chho1ho8waHIbzjCyOwYUhlld+BeYElpWp4bM5EUFdpyzrpSuDxIvPNPI+O
	gIO4hUPV44mtTM84iXWFBR7PpOfTycvM9T4Kp7KqAzkiFHWcdfyj307zceBNvyfz6XJyDXC
	S7Vwkc+v+4Qub5rLuUUiSup5XeyNdzbaiIb/Orbr6tMofivpne2uW/4yczi39ad6IiPFmMU
	hrhR/ABab38Cf9J54/133dNopV/RX1M76yyB5SCotXuoZhMex7BlwslXzoRBzTDsH/jBKzo
	HtwQgHFPg8xza6X6KTfQUQ==
X-QQ-GoodBg: 2
X-BIZMAIL-ID: 14944133363159421833
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization
Date: Wed, 9 Aug 2023 14:36:22 +0800
Message-Id: <20230809063622.316743-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-QQ-SENDSIZE: 520
Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0
X-Spam-Status: No, score=-10.2 required=5.0
	tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_STATUS,RCVD_IN_BARRACUDACENTRAL,
	RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_PASS,
	SPF_PASS,TXREP
	autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

From: Ju-Zhe Zhong

Hi, this patch
adds loop len control to extract_last autovectorization.

Consider the following case:

#include <stdint.h>

#define EXTRACT_LAST(TYPE)                      \
  TYPE __attribute__ ((noinline, noclone))      \
  test_##TYPE (TYPE *x, int n, TYPE value)      \
  {                                             \
    TYPE last;                                  \
    for (int j = 0; j < n; ++j)                 \
      {                                         \
        last = x[j];                            \
        x[j] = last * value;                    \
      }                                         \
    return last;                                \
  }

#define TEST_ALL(T)                             \
  T (uint8_t)

TEST_ALL (EXTRACT_LAST)

ARM SVE IR:

Preheader:
  max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });

Loop:
  ...
  # loop_mask_22 = PHI ...
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
  ...
  next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
  ...

Epilogue:
  _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);

For RVV, since we prefer len in loop control, after this patch we get:

Loop:
  ...
  loop_len_22 = SELECT_VL;
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
  ...

Epilogue:
  _25 = .EXTRACT_LAST (loop_len_22, vect_last_12.8_23);

This patch doesn't add a new pattern for length loop control of
extract_last.  Instead, it reuses the current extract_last.

Here is the code:

Step 1 - Enable length and record length for extract_last:

+         machine_mode vec_mode = TYPE_MODE (vectype);
+         if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode))
+           vect_record_loop_len (loop_vinfo,
+                                 &LOOP_VINFO_LENS (loop_vinfo), 1,
+                                 vectype, 1);
+         else
+           vect_record_loop_mask (loop_vinfo,
+                                  &LOOP_VINFO_MASKS (loop_vinfo), 1,
+                                  vectype, NULL);

We use 'get_len_load_store_mode' to check whether the target supports
loop len control.  If it does, we record a loop len; otherwise we record
a loop mask as before.
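
To make the mask/len distinction concrete, here is a small scalar model in
plain C.  It is only a sketch of how I read the semantics, not GCC code:
the function names, the 'fallback' argument and the assumption that a
length N means "lanes 0 .. N-1 are active" are all made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of EXTRACT_LAST under mask control:
   return the element of VEC in the last lane whose mask entry is
   nonzero.  FALLBACK is returned when no lane is active; that choice
   is an assumption of this sketch.  */
static uint8_t
model_extract_last_mask (const uint8_t *mask, const uint8_t *vec,
                         size_t nlanes, uint8_t fallback)
{
  assert (mask != NULL && vec != NULL);
  uint8_t res = fallback;
  for (size_t i = 0; i < nlanes; ++i)
    if (mask[i])
      res = vec[i];
  return res;
}

/* Hypothetical scalar model under length control: assuming lanes
   0 .. LEN - 1 are the active ones, the last active lane is simply
   LEN - 1.  */
static uint8_t
model_extract_last_len (size_t len, const uint8_t *vec, uint8_t fallback)
{
  assert (vec != NULL);
  return len ? vec[len - 1] : fallback;
}
```

Under these assumptions, a mask whose first three lanes are set and a
length of 3 select the same element, which is why the same EXTRACT_LAST
call can take either kind of control.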

Step 2 - Build EXTRACT_LAST with len:

-         tree mask = vect_get_loop_mask (loop_vinfo, gsi,
-                                         &LOOP_VINFO_MASKS (loop_vinfo),
-                                         1, vectype, 0);
+         tree control;
+         if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+           control = vect_get_loop_len (loop_vinfo, gsi,
+                                        &LOOP_VINFO_LENS (loop_vinfo), 1,
+                                        vectype, 0, 0);
+         else
+           control = vect_get_loop_mask (loop_vinfo, gsi,
+                                         &LOOP_VINFO_MASKS (loop_vinfo), 1,
+                                         vectype, 0);
          tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-                                         mask, vec_lhs_phi);
+                                         control, vec_lhs_phi);

We reuse the current code (which builds EXTRACT_LAST with a mask) and
pass the length instead if 'LOOP_VINFO_FULLY_WITH_LENGTH_P' is true.

This patch has been fully tested on the RISC-V port.

Bootstrap and regression testing on x86 passed.

Ok for trunk?

gcc/ChangeLog:

        * tree-vect-loop.cc (vectorizable_live_operation): Add length control.

---
 gcc/tree-vect-loop.cc | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 00058c3c13e..fde098cafde 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10311,9 +10311,15 @@ vectorizable_live_operation (vec_info *vinfo,
               else
                 {
                   gcc_assert (ncopies == 1 && !slp_node);
-                  vect_record_loop_mask (loop_vinfo,
-                                         &LOOP_VINFO_MASKS (loop_vinfo),
-                                         1, vectype, NULL);
+                  machine_mode vec_mode = TYPE_MODE (vectype);
+                  if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode))
+                    vect_record_loop_len (loop_vinfo,
+                                          &LOOP_VINFO_LENS (loop_vinfo), 1,
+                                          vectype, 1);
+                  else
+                    vect_record_loop_mask (loop_vinfo,
+                                           &LOOP_VINFO_MASKS (loop_vinfo), 1,
+                                           vectype, NULL);
                 }
             }
           /* ??? Enable for loop costing as well.  */
@@ -10339,7 +10345,9 @@ vectorizable_live_operation (vec_info *vinfo,
   gimple *vec_stmt;
   if (slp_node)
     {
-      gcc_assert (!loop_vinfo || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
+      gcc_assert (!loop_vinfo
+                  || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+                      && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)));
 
       /* Get the correct slp vectorized stmt.  */
       vec_lhs = SLP_TREE_VEC_DEFS (slp_node)[vec_entry];
@@ -10383,21 +10391,29 @@ vectorizable_live_operation (vec_info *vinfo,
 
       gimple_seq stmts = NULL;
       tree new_tree;
-      if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+      if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+          || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
         {
           /* Emit:
 
-               SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
+               SCALAR_RES = EXTRACT_LAST <VEC_LHS, CONTROL>
 
-             where VEC_LHS is the vectorized live-out result and MASK is
-             the loop mask for the final iteration.  */
+             where VEC_LHS is the vectorized live-out result and CONTROL can
+             be either the loop mask for the final iteration or the loop len
+             for the final iteration.  */
           gcc_assert (ncopies == 1 && !slp_node);
           tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
-          tree mask = vect_get_loop_mask (loop_vinfo, gsi,
-                                          &LOOP_VINFO_MASKS (loop_vinfo),
-                                          1, vectype, 0);
+          tree control;
+          if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+            control = vect_get_loop_len (loop_vinfo, gsi,
+                                         &LOOP_VINFO_LENS (loop_vinfo), 1,
+                                         vectype, 0, 0);
+          else
+            control = vect_get_loop_mask (loop_vinfo, gsi,
+                                          &LOOP_VINFO_MASKS (loop_vinfo), 1,
+                                          vectype, 0);
           tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-                                          mask, vec_lhs_phi);
+                                          control, vec_lhs_phi);
 
           /* Convert the extracted vector element to the scalar type.  */
           new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
-- 
2.36.1