From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id B395A3858D20 for ; Wed, 9 Aug 2023 11:00:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B395A3858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id D838821866; Wed, 9 Aug 2023 11:00:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1691578835; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Qus8Y2TGP7SLagu+buyCxj2JKu99rAZQwD3E4yMLAFM=; b=wkQJd5jyDkP55ED1wMCY1XxCYTyqJpEAFECjdJBSxtJvw2qZ7qjmfcuIDnMiJsBnI7zu23 2NIezZgPDvW08VM8UeKFn1sqO4+vCiDL5apRGI2/HR2HPPryu5XPtzaeO4KyKSGmdrRcsA 8BVftZymFslVz1AudvRuh7Wt/DLg8Ug= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1691578835; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Qus8Y2TGP7SLagu+buyCxj2JKu99rAZQwD3E4yMLAFM=; b=Y4AKYQNUc1FyX2/gItG6yRzCO9ft9DjcGjdaUVl5S5/rrPDmz2D36M78DcxB5YmsV87IkO YwJ+Fq5fbv5GcrBQ== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id C2DD12C143; Wed, 9 Aug 2023 11:00:35 +0000 (UTC) Date: Wed, 9 Aug 2023 11:00:35 +0000 (UTC) From: Richard Biener To: Ju-Zhe Zhong cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization In-Reply-To: <20230809063622.316743-1-juzhe.zhong@rivai.ai> Message-ID: References: <20230809063622.316743-1-juzhe.zhong@rivai.ai> User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, 9 Aug 2023, juzhe.zhong@rivai.ai wrote: > From: Ju-Zhe Zhong > > Hi, this patch is adding loop len control on extract_last autovectorization. > > Consider this following case: > > #include > > #define EXTRACT_LAST(TYPE) \ > TYPE __attribute__ ((noinline, noclone)) \ > test_##TYPE (TYPE *x, int n, TYPE value) \ > { \ > TYPE last; \ > for (int j = 0; j < n; ++j) \ > { \ > last = x[j]; \ > x[j] = last * value; \ > } \ > return last; \ > } > > #define TEST_ALL(T) \ > T (uint8_t) \ > > TEST_ALL (EXTRACT_LAST) > > ARM SVE IR: > > Preheader: > max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... }); > > Loop: > ... > # loop_mask_22 = PHI > ... > vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22); > vect__4.9_27 = vect_last_12.8_23 * vect_cst__26; > .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27); > ... > next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... }); > ... > > Epilogue: > _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23); > > For RVV since we prefer len in loop control, after this patch for RVV: > > Loop: > ... > loop_len_22 = SELECT_VL; > vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22); > vect__4.9_27 = vect_last_12.8_23 * vect_cst__26; > .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27); > ... > > Epilogue: > _25 = .EXTRACT_LAST (loop_len_22, vect_last_12.8_23); > > This patch didn't add a new pattern for length loop control of extract_last. > Instead we reuse current extract_last. > > Here is the code: > > Step 1 - Enable length and record length for extract_last: > > + machine_mode vec_mode = TYPE_MODE (vectype); > + if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode)) > + vect_record_loop_len (loop_vinfo, > + &LOOP_VINFO_LENS (loop_vinfo), 1, > + vectype, 1); > + else > + vect_record_loop_mask (loop_vinfo, > + &LOOP_VINFO_MASKS (loop_vinfo), 1, > + vectype, NULL); > > We use 'get_len_load_store_mode' to check whether targets support loop len control or not. > If yes, record a loop len. > > Step 2 - Build EXTRACT_LAST with len: > > - tree mask = vect_get_loop_mask (loop_vinfo, gsi, > - &LOOP_VINFO_MASKS (loop_vinfo), > - 1, vectype, 0); > + tree control; > + if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)) > + control = vect_get_loop_len (loop_vinfo, gsi, > + &LOOP_VINFO_LENS (loop_vinfo), 1, > + vectype, 0, 0); > + else > + control = vect_get_loop_mask (loop_vinfo, gsi, > + &LOOP_VINFO_MASKS (loop_vinfo), 1, > + vectype, 0); > tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type, > - mask, vec_lhs_phi); > + control, vec_lhs_phi); > > Reuse the current codes (build EXTRACT_LAST with mask), build length instead if > 'LOOP_VINFO_FULLY_WITH_LENGTH_P' is true. > > This patch has been fully tested in RISC-V port. > > Bootstrap and Regression on X86 passed. > > Ok for trunk ? > > gcc/ChangeLog: > > * tree-vect-loop.cc (vectorizable_live_operation): Add length control. > > --- > gcc/tree-vect-loop.cc | 40 ++++++++++++++++++++++++++++------------ > 1 file changed, 28 insertions(+), 12 deletions(-) > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index 00058c3c13e..fde098cafde 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -10311,9 +10311,15 @@ vectorizable_live_operation (vec_info *vinfo, > else > { > gcc_assert (ncopies == 1 && !slp_node); > - vect_record_loop_mask (loop_vinfo, > - &LOOP_VINFO_MASKS (loop_vinfo), > - 1, vectype, NULL); > + machine_mode vec_mode = TYPE_MODE (vectype); > + if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode)) > + vect_record_loop_len (loop_vinfo, > + &LOOP_VINFO_LENS (loop_vinfo), 1, > + vectype, 1); > + else > + vect_record_loop_mask (loop_vinfo, > + &LOOP_VINFO_MASKS (loop_vinfo), 1, > + vectype, NULL); > } > } > /* ??? Enable for loop costing as well. */ > @@ -10339,7 +10345,9 @@ vectorizable_live_operation (vec_info *vinfo, > gimple *vec_stmt; > if (slp_node) > { > - gcc_assert (!loop_vinfo || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)); > + gcc_assert (!loop_vinfo > + || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo) > + || !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)); that should be || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo) && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)) I think. It seems to imply that SLP isn't supported with masking/lengthing. > > /* Get the correct slp vectorized stmt. */ > vec_lhs = SLP_TREE_VEC_DEFS (slp_node)[vec_entry]; > @@ -10383,21 +10391,29 @@ vectorizable_live_operation (vec_info *vinfo, > > gimple_seq stmts = NULL; > tree new_tree; > - if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)) > + if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo) > + || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)) > { > /* Emit: > > - SCALAR_RES = EXTRACT_LAST > + SCALAR_RES = EXTRACT_LAST > > - where VEC_LHS is the vectorized live-out result and MASK is > - the loop mask for the final iteration. */ > + where VEC_LHS is the vectorized live-out result and CONTROL can > + be either the loop mask for the final iteration or the loop len > + for the final iteration. */ > gcc_assert (ncopies == 1 && !slp_node); > tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info)); > - tree mask = vect_get_loop_mask (loop_vinfo, gsi, > - &LOOP_VINFO_MASKS (loop_vinfo), > - 1, vectype, 0); > + tree control; > + if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)) > + control = vect_get_loop_len (loop_vinfo, gsi, > + &LOOP_VINFO_LENS (loop_vinfo), 1, > + vectype, 0, 0); > + else > + control = vect_get_loop_mask (loop_vinfo, gsi, > + &LOOP_VINFO_MASKS (loop_vinfo), 1, > + vectype, 0); > tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type, > - mask, vec_lhs_phi); > + control, vec_lhs_phi); Hum, how does CFN_EXTRACT_LAST handle both mask and length transparently? Don't you need some CFN_LEN_EXTRACT_LAST instead? > /* Convert the extracted vector element to the scalar type. */ > new_tree = gimple_convert (&stmts, lhs_type, scalar_res); > Otherwise looks OK to me. Thanks, Richard.