From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 5C9AF3858C2F for ; Thu, 12 Oct 2023 11:13:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5C9AF3858C2F Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 5F67F21869; Thu, 12 Oct 2023 11:13:34 +0000 (UTC) Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 2D5642C984; Thu, 12 Oct 2023 11:13:34 +0000 (UTC) Date: Thu, 12 Oct 2023 11:13:34 +0000 (UTC) From: Richard Biener To: "juzhe.zhong@rivai.ai" cc: gcc-patches , "richard.sandiford" Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721] In-Reply-To: Message-ID: References: <20231011122715.3017753-1-juzhe.zhong@rivai.ai>, , , , <10DAD51FB2CAF84A+202310121750281482051@rivai.ai>, User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Level: Authentication-Results: smtp-out1.suse.de; none X-Rspamd-Server: rspamd2 X-Spamd-Result: default: False [-4.00 / 50.00]; REPLY(-4.00)[] X-Spam-Score: -4.00 X-Rspamd-Queue-Id: 5F67F21869 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_STATUS,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote: > Oh. I see. > > Here make vect_constant_def failed to SLP: > > tree-vect-slp.cc: > vect_build_slp_tree_2 > line 2354: > > if (oprnd_info->first_dt == vect_external_def > || oprnd_info->first_dt == vect_constant_def) > { > slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops); > SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt; > oprnd_info->ops = vNULL; > children.safe_push (invnode); > continue; > } > > It seems that we handle vect_constant_def same as vect_external_def. > So failed to SLP ? Why? We _should_ see a SLP node for the all-true mask operand. > > > > juzhe.zhong@rivai.ai > > From: Richard Biener > Date: 2023-10-12 17:55 > To: juzhe.zhong@rivai.ai > CC: gcc-patches; richard.sandiford > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721] > On Thu, 12 Oct 2023, juzhe.zhong@rivai.ai wrote: > > > I tree-vect-slp.cc: > > vect_get_and_check_slp_defs > > 711: > > > > tree type = TREE_TYPE (oprnd); > > dt = dts[i]; > > if ((dt == vect_constant_def > > || dt == vect_external_def) > > && !GET_MODE_SIZE (vinfo->vector_mode).is_constant () > > && (TREE_CODE (type) == BOOLEAN_TYPE > > || !can_duplicate_and_interleave_p (vinfo, stmts.length (), > > type))) > > { > > if (dump_enabled_p ()) > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > "Build SLP failed: invalid type of def " > > "for variable-length SLP %T\n", oprnd); > > return -1; > > } > > > > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this condition, then SLP failed: > > Build SLP failed: invalid type of def > > I think this can be restricted to vect_external_def, but some history > might reveal the cases we put this code in for (we should be able to > materialize all constants?). At least uniform boolean constants > should be fine. > > > > > > > > juzhe.zhong@rivai.ai > > > > From: Richard Biener > > Date: 2023-10-12 17:44 > > To: ??? > > CC: gcc-patches; richard.sandiford > > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721] > > On Thu, 12 Oct 2023, ??? wrote: > > > > > Thanks Richi point it out. > > > > > > I found this patch can't make conditional gather load succeed on SLP. > > > > > > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization: > > > > > > If no condition mask, in tree-vect-patterns.cc, I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD. > > > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow naturally. > > > > > > If has condition mask, in tree-vect-patterns.cc, I build MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same as MASK_GATHER_LOAD. > > > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP flow naturally. > > > > > > Is it reasonable ? > > > > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments > > even when the mask is -1? > > > > > > > > juzhe.zhong@rivai.ai > > > > > > From: Richard Biener > > > Date: 2023-10-11 20:50 > > > To: Juzhe-Zhong > > > CC: gcc-patches; richard.sandiford > > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721] > > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote: > > > > > > > This patch fixes this following FAILs in RISC-V regression: > > > > > > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" > > > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts" > > > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts" > > > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts" > > > > > > > > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD. > > > > > > > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in tree-vect-patterns.cc if it is same > > > > situation as GATHER_LOAD (no conditional mask). > > > > > > > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask argument is a dummy mask. > > > > > > > > gcc/ChangeLog: > > > > > > > > * tree-vect-slp.cc (vect_get_operand_map): > > > > (vect_build_slp_tree_1): > > > > (vect_build_slp_tree_2): > > > > * tree-vect-stmts.cc (vectorizable_load): > > > > > > > > --- > > > > gcc/tree-vect-slp.cc | 18 ++++++++++++++++-- > > > > gcc/tree-vect-stmts.cc | 4 ++-- > > > > 2 files changed, 18 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > > > > index fa098f9ff4e..712c04ec278 100644 > > > > --- a/gcc/tree-vect-slp.cc > > > > +++ b/gcc/tree-vect-slp.cc > > > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0) > > > > case IFN_MASK_GATHER_LOAD: > > > > return arg1_arg4_map; > > > > > > > > + case IFN_MASK_LEN_GATHER_LOAD: > > > > + /* In tree-vect-patterns.cc, we will have these 2 situations: > > > > + > > > > + - Unconditional gather load transforms > > > > + into MASK_LEN_GATHER_LOAD with dummy mask which is -1. > > > > + > > > > + - Conditional gather load transforms > > > > + into MASK_LEN_GATHER_LOAD with real conditional mask.*/ > > > > + return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map > > > > + : nullptr; > > > > + > > > > case IFN_MASK_STORE: > > > > return arg3_arg2_map; > > > > > > > > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, > > > > > > > > if (cfn == CFN_MASK_LOAD > > > > || cfn == CFN_GATHER_LOAD > > > > - || cfn == CFN_MASK_GATHER_LOAD) > > > > + || cfn == CFN_MASK_GATHER_LOAD > > > > + || cfn == CFN_MASK_LEN_GATHER_LOAD) > > > > ldst_p = true; > > > > else if (cfn == CFN_MASK_STORE) > > > > { > > > > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, > > > > if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)) > > > > && rhs_code != CFN_GATHER_LOAD > > > > && rhs_code != CFN_MASK_GATHER_LOAD > > > > + && rhs_code != CFN_MASK_LEN_GATHER_LOAD > > > > /* Not grouped loads are handled as externals for BB > > > > vectorization. For loop vectorization we can handle > > > > splats the same we handle single element interleaving. */ > > > > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node, > > > > if (gcall *stmt = dyn_cast (stmt_info->stmt)) > > > > gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD) > > > > || gimple_call_internal_p (stmt, IFN_GATHER_LOAD) > > > > - || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)); > > > > + || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD) > > > > + || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD)); > > > > else > > > > { > > > > *max_nunits = this_max_nunits; > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > index cd7c1090d88..263acf5d3cd 100644 > > > > --- a/gcc/tree-vect-stmts.cc > > > > +++ b/gcc/tree-vect-stmts.cc > > > > @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo, > > > > return false; > > > > > > > > mask_index = internal_fn_mask_index (ifn); > > > > - if (mask_index >= 0 && slp_node) > > > > + if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0) > > > > mask_index = vect_slp_child_index_for_operand (call, mask_index); > > > > - if (mask_index >= 0 > > > > + if (mask_index >= 0 && internal_fn_len_index (ifn) < 0 > > > > && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index, > > > > &mask, NULL, &mask_dt, &mask_vectype)) > > > > return false; > > > > > > You are ignoring the mask argument and here only handle it when the > > > IFN doesn't have a _LEN. This doesn't seem to be forward looking > > > to the point where you want to actually handle masked (aka conditional) > > > gather. > > > > > > Did you check that SLP is actually used to vectorize this? > > > > > > Richard. > > > > > > > > > > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)