From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 27 Sep 2021 08:25:21 +0200 (CEST)
From: Richard Biener
To: Richard Sandiford
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Allow different vector types for stmt groups
Message-ID: <839254rr-2sns-3p81-1026-2qqnrn13q35q@fhfr.qr>
List-Id: Gcc-patches mailing list

On Fri, 24 Sep 2021, Richard Sandiford wrote:

> Richard Biener writes:
> > This allows vectorization (in practice non-loop vectorization) to
> > have a stmt participate in different vector type vectorizations.
> > It allows us to remove vect_update_shared_vectype and replace it
> > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> > vect_analyze_stmt and vect_transform_stmt.
> >
> > For data-refs the situation is a bit more complicated since we
> > analyze alignment info with a specific vector type in mind, which
> > doesn't play well when that changes.
> >
> > So the bulk of the change is passing down the actual vector type
> > used for a vectorized access to the various accessors of alignment
> > info, first and foremost dr_misalignment but also aligned_access_p,
> > known_alignment_for_access_p, vect_known_alignment_in_bytes and
> > vect_supportable_dr_alignment.  I took the liberty to replace the
> > ALL_CAPS macro accessors with the lower-case function invocations.
> >
> > The actual changes to the behavior are in dr_misalignment, which is
> > now the place factoring in the negative step adjustment as well as
> > handling alignment queries for a vector type with bigger alignment
> > requirements than what we can (or have) analyzed.
> >
> > vect_slp_analyze_node_alignment makes use of this and, upon receiving
> > a vector type with a bigger alignment requirement, re-analyzes the DR
> > with respect to it but keeps an older, more precise result if possible.
> > In this context it might be possible to do the analysis just once,
> > but instead of analyzing with respect to a specific desired alignment
> > look for the biggest alignment for which we can compute a not unknown
> > alignment.
> >
> > The ChangeLog includes the functional changes but not the bulk due
> > to the alignment accessor API changes - I hope that's acceptable.
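
To make the push/pop idea described above concrete, here is a minimal
self-contained C++ sketch of the save/restore pattern, not the actual
patch: the types and functions below (stmt_info_t, slp_node_t,
analyze_stmt, analyze_with_node_vectype) are illustrative stand-ins,
while STMT_VINFO_VECTYPE, SLP_TREE_VECTYPE and vect_analyze_stmt named
in the description are the real GCC entities the patch touches.

#include <cassert>

/* Illustrative stand-ins for GCC's stmt_vec_info and slp_tree; the
   real structures carry far more state.  */
struct stmt_info_t { const char *vectype; };
struct slp_node_t { const char *vectype; stmt_info_t *stmt; };

/* Stand-in for the analysis step (vect_analyze_stmt in GCC): it only
   looks at whatever vector type is currently installed on the stmt.  */
static bool
analyze_stmt (stmt_info_t *stmt)
{
  return stmt->vectype != nullptr;
}

/* The push/pop pattern the description refers to: temporarily install
   the SLP node's vector type on the stmt, run the analysis, and restore
   the previous value so another SLP instance can later analyze the same
   stmt with a different vector type.  */
static bool
analyze_with_node_vectype (slp_node_t *node)
{
  stmt_info_t *stmt = node->stmt;
  const char *saved_vectype = stmt->vectype;   /* push */
  stmt->vectype = node->vectype;
  bool ok = analyze_stmt (stmt);
  stmt->vectype = saved_vectype;               /* pop */
  return ok;
}

int
main ()
{
  stmt_info_t stmt = { nullptr };
  slp_node_t node = { "V4SI", &stmt };
  assert (analyze_with_node_vectype (&node));
  assert (stmt.vectype == nullptr);  /* original value restored */
  return 0;
}

Presumably the same wrapping would apply around the transform step
(vect_transform_stmt) in the same way.
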
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, testing on SPEC
> > CPU 2017 in progress (for stats and correctness).
> >
> > Any comments?
>
> Sorry for the super-slow response, some comments below.
>
> > […]
> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> > index a57700f2c1b..c42fc2fb272 100644
> > --- a/gcc/tree-vect-data-refs.c
> > +++ b/gcc/tree-vect-data-refs.c
> > @@ -887,37 +887,53 @@ vect_slp_analyze_instance_dependence (vec_info *vinfo, slp_instance instance)
> >    return res;
> >  }
> >
> > -/* Return the misalignment of DR_INFO.  */
> > +/* Return the misalignment of DR_INFO accessed in VECTYPE.  */
> >
> >  int
> > -dr_misalignment (dr_vec_info *dr_info)
> > +dr_misalignment (dr_vec_info *dr_info, tree vectype)
> >  {
> > +  HOST_WIDE_INT diff = 0;
> > +  /* Alignment is only analyzed for the first element of a DR group,
> > +     use that but adjust misalignment by the offset of the access.  */
> >    if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
> >      {
> >        dr_vec_info *first_dr
> >          = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
> > -      int misalign = first_dr->misalignment;
> > -      gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > -      if (misalign == DR_MISALIGNMENT_UNKNOWN)
> > -        return misalign;
> >        /* vect_analyze_data_ref_accesses guarantees that DR_INIT are
> >           INTEGER_CSTs and the first element in the group has the lowest
> >           address.  Likewise vect_compute_data_ref_alignment will
> >           have ensured that target_alignment is constant and otherwise
> >           set misalign to DR_MISALIGNMENT_UNKNOWN.  */
>
> Can you move the second sentence down so that it stays with the
> to_constant?
>
> > -      HOST_WIDE_INT diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
> > -                            - TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
> > +      diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
> > +              - TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
> >        gcc_assert (diff >= 0);
> > -      unsigned HOST_WIDE_INT target_alignment_c
> > -        = first_dr->target_alignment.to_constant ();
> > -      return (misalign + diff) % target_alignment_c;
> > +      dr_info = first_dr;
> >      }
> > -  else
> > +
> > +  int misalign = dr_info->misalignment;
> > +  gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > +  if (misalign == DR_MISALIGNMENT_UNKNOWN)
> > +    return misalign;
> > +
> > +  /* If the access is only aligned for a vector type with smaller alignment
> > +     requirement the access has unknown misalignment.  */
> > +  if (maybe_lt (dr_info->target_alignment * BITS_PER_UNIT,
> > +                targetm.vectorize.preferred_vector_alignment (vectype)))
> > +    return DR_MISALIGNMENT_UNKNOWN;
> > +
> > +  /* If this is a backward running DR then first access in the larger
> > +     vectype actually is N-1 elements before the address in the DR.
> > +     Adjust misalign accordingly.  */
> > +  if (tree_int_cst_sgn (DR_STEP (dr_info->dr)) < 0)
> >      {
> > -      int misalign = dr_info->misalignment;
> > -      gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > -      return misalign;
> > +      if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant ())
> > +        return DR_MISALIGNMENT_UNKNOWN;
> > +      diff += ((TYPE_VECTOR_SUBPARTS (vectype).to_constant () - 1)
> > +               * -TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype))));
>
> The target alignment might only be element alignment.  If so, I think we
> should either skip this block or return early above.  This would avoid
> returning DR_MISALIGNMENT_UNKNOWN unnecessarily for VLA.
>
> Or, more generally, I think this is a case where known_misalignment
> does work.  I.e.
diff can be a poly_int64 and:
>
> >      }
> > +  unsigned HOST_WIDE_INT target_alignment_c
> > +    = dr_info->target_alignment.to_constant ();
> > +  return (misalign + diff) % target_alignment_c;
>
> the last line can become:
>
>   if (!known_misalignment (misalign + diff, target_alignment_c, &misalign))
>     return DR_MISALIGNMENT_UNKNOWN;
>   return misalign;
>
> avoiding the need for the is_constant above.

OK, done - this then matches more closely what
vect_compute_data_ref_alignment did.

> > […]
> > @@ -1896,10 +1902,9 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
> >            unsigned int target_align =
> >              DR_TARGET_ALIGNMENT (dr_info).to_constant ();
> >            unsigned int dr_size = vect_get_scalar_dr_size (dr_info);
> > -          mis = (negative
> > -                 ? DR_MISALIGNMENT (dr_info)
> > -                 : -DR_MISALIGNMENT (dr_info));
> > +          unsigned int mis = dr_misalignment (dr_info, vectype);
> > +          mis = negative ? mis : -mis;
>
> Just checking: is this still correct?  It probably is, but it looked a
> bit like we're handling negative steps twice.  Same further down.

Yeah, it's done the same number of times as before.  But I didn't try
to understand this instance; I hope to follow up with a change that
moves the negative step adjustment to where we actually vectorize with
the computed address offset ... (there's a PR which shows we currently
generate wrong-code for negative stride SLP where this adjustment is
off and thus we get wrong aligned/misaligned accesses)

Richard.
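
As a postscript to the dr_misalignment discussion above, the following
self-contained C++ sketch models the arithmetic under review - the group
offset, the backward-step adjustment of (nunits - 1) elements, and an
"unknown" sentinel - using plain integers instead of GCC's poly_int /
known_misalignment machinery.  All names here are illustrative;
MISALIGN_UNKNOWN and model_misalignment are not GCC functions.

#include <cassert>

/* Sentinel mirroring the role of DR_MISALIGNMENT_UNKNOWN.  */
constexpr int MISALIGN_UNKNOWN = -1;

/* Simplified model of the misalignment computation: MISALIGN is the
   misalignment of the first element of the group, GROUP_OFFSET the byte
   offset of this access within the group, NUNITS/ELEM_SIZE describe the
   vector type, NEGATIVE_STEP whether the data-ref runs backwards, and
   TARGET_ALIGN the (constant) target alignment in bytes.  */
static int
model_misalignment (int misalign, long group_offset, int nunits,
                    int elem_size, bool negative_step, int target_align)
{
  if (misalign == MISALIGN_UNKNOWN)
    return MISALIGN_UNKNOWN;

  long diff = group_offset;
  /* For a backward-running access the first vector element read is
     (nunits - 1) elements before the DR's address.  */
  if (negative_step)
    diff -= (long) (nunits - 1) * elem_size;

  /* Like the known_misalignment idiom: fold the offset into the
     misalignment, keeping the result in [0, target_align).  */
  long m = ((long) misalign + diff) % target_align;
  if (m < 0)
    m += target_align;
  return (int) m;
}

int
main ()
{
  /* Forward access, second 4-byte element of a group, 16-byte target
     alignment: 0 + 4 -> misalignment 4.  */
  assert (model_misalignment (0, 4, 4, 4, false, 16) == 4);
  /* Backward access with a 4 x 4-byte vector: the vector starts 12
     bytes below the DR address, so an aligned DR address yields
     misalignment 4 with respect to 16.  */
  assert (model_misalignment (0, 0, 4, 4, true, 16) == 4);
  return 0;
}

The second assert is the case the review worries about: with a negative
step the adjustment shifts the access below the analyzed address, and
only the modular fold-in (or known_misalignment on poly_ints) keeps the
result well defined.
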