From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 13DD23858C3B for ; Wed, 15 Sep 2021 10:08:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 13DD23858C3B Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9839C6D; Wed, 15 Sep 2021 03:08:45 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 17C963F5A1; Wed, 15 Sep 2021 03:08:44 -0700 (PDT) From: Richard Sandiford To: Richard Biener via Gcc-patches Mail-Followup-To: Richard Biener via Gcc-patches , Richard Biener , richard.sandiford@arm.com Cc: Richard Biener Subject: Re: [PATCH] Maintain (mis-)alignment info in the first element of a group References: <94r9n910-qsp7-778q-5ooq-pno71s1sqn6o@fhfr.qr> Date: Wed, 15 Sep 2021 11:08:43 +0100 In-Reply-To: (Richard Biener via Gcc-patches's message of "Wed, 15 Sep 2021 11:47:13 +0200 (CEST)") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2021 10:08:48 -0000 Richard Biener via Gcc-patches writes: > On Wed, 15 Sep 2021, Richard Sandiford wrote: >> Richard Biener writes: >> > On Tue, 14 Sep 2021, Richard Sandiford wrote: >> > >> >> Richard Biener via Gcc-patches writes: >> >> > This changes us to maintain and compute (mis-)alignment info for >> >> > the first element of a group only rather than for each DR when >> >> > doing interleaving and for the earliest, first, or first in the SLP >> >> > node (or any pair or all three of those) when SLP vectorizing. >> >> > >> >> > For this to work out the easiest way I have changed the accessors >> >> > DR_MISALIGNMENT and DR_TARGET_ALIGNMENT to do the indirection to >> >> > the first element rather than adjusting all callers. >> >> > dr_misalignment is moved out-of-line and I'm not too fond of the >> >> > poly-int dances there (any hints?), but basically we are now >> >> > adjusting the first elements misalignment based on the DR_INIT >> >> > difference. >> >> > >> >> > Bootstrap & regtest running on x86_64-unknown-linux-gnu. >> >> > >> >> > Richard. >> >> > >> >> > 2021-09-13 Richard Biener >> >> > >> >> > * tree-vectorizer.h (dr_misalignment): Move out of line. >> >> > (dr_target_alignment): New. >> >> > (DR_TARGET_ALIGNMENT): Wrap dr_target_alignment. >> >> > (set_dr_target_alignment): New. >> >> > (SET_DR_TARGET_ALIGNMENT): Wrap set_dr_target_alignment. >> >> > * tree-vect-data-refs.c (dr_misalignment): Compute and >> >> > return the group members misalignment. >> >> > (vect_compute_data_ref_alignment): Use SET_DR_TARGET_ALIGNMENT. >> >> > (vect_analyze_data_refs_alignment): Compute alignment only >> >> > for the first element of a DR group. >> >> > (vect_slp_analyze_node_alignment): Likewise. >> >> > --- >> >> > gcc/tree-vect-data-refs.c | 65 ++++++++++++++++++++++++--------------- >> >> > gcc/tree-vectorizer.h | 24 ++++++++++----- >> >> > 2 files changed, 57 insertions(+), 32 deletions(-) >> >> > >> >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c >> >> > index 66e76132d14..b53d6a0b3f1 100644 >> >> > --- a/gcc/tree-vect-data-refs.c >> >> > +++ b/gcc/tree-vect-data-refs.c >> >> > @@ -887,6 +887,36 @@ vect_slp_analyze_instance_dependence (vec_info *vinfo, slp_instance instance) >> >> > return res; >> >> > } >> >> > >> >> > +/* Return the misalignment of DR_INFO. */ >> >> > + >> >> > +int >> >> > +dr_misalignment (dr_vec_info *dr_info) >> >> > +{ >> >> > + if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)) >> >> > + { >> >> > + dr_vec_info *first_dr >> >> > + = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt)); >> >> > + int misalign = first_dr->misalignment; >> >> > + gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED); >> >> > + if (misalign == DR_MISALIGNMENT_UNKNOWN) >> >> > + return misalign; >> >> > + poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr)) >> >> > + - wi::to_poly_offset (DR_INIT (first_dr->dr))); >> >> > + poly_int64 mispoly = misalign + diff.to_constant ().to_shwi (); >> >> > + bool res = known_misalignment (mispoly, >> >> > + first_dr->target_alignment.to_constant (), >> >> > + &misalign); >> >> > + gcc_assert (res); >> >> > + return misalign; >> >> >> >> Yeah, not too keen on the to_constants here. The one on diff looks >> >> redundant -- you could just use diff.force_shwi () instead, and >> >> keep everything poly_int. >> >> >> >> For the known_misalignment I think we should use: >> >> >> >> if (!can_div_trunc_p (mispoly, first_dr->target_alignment, >> >> "ient, &misalign)) >> >> misalign = DR_MISALIGNMENT_UNKNOWN; >> >> return misalign; >> >> >> >> There are then no to_constant assumptions. >> > >> > OK, note that group analysis does >> > >> > /* Check that the DR_INITs are compile-time constants. */ >> > if (TREE_CODE (DR_INIT (dra)) != INTEGER_CST >> > || TREE_CODE (DR_INIT (drb)) != INTEGER_CST) >> > break; >> > >> > /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb). */ >> > HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra)); >> > HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb)); >> > >> > so I'm confident my variant was "correct", but it still was ugly. >> >> Ah, OK. In that case I don't mind the original version, but it would be >> good to have a comment above the to_constant saying where the condition >> is enforced. >> >> I'm just trying to avoid to_constant calls with no comment to explain >> them, and with no nearby is_constant call. Otherwise it could end up >> a bit like tree_to_uhwi, where sometimes tree_fits_uhwi_p really has >> been checked earlier (not always obvious where) and sometimes >> tree_to_uhwi is just used out of hope, to avoid having to think about >> the alternative. >> >> > There's also the issue that target_alignment is poly_uint64 but >> > misalignment is signed int. >> > >> > Note that can_div_trunc_p seems to require a poly_uint64 remainder, >> > I'm not sure what to do with that, so I used is_constant. >> >> Ah, yeah, forgot about that sorry. I guess in that case, using >> is_constant on first_dr->target_alignment and sticking with >> known_misalignment would make sense. >> >> > Btw, to what value do we want to align with variable sized vectors? >> > We are aligning globals according to target_alignment but only >> > support that for constant alignment. Most users only want to >> > distinguish between aligned or not aligned and the actual misalignment >> > is only used to seed the SSA alignment info, so I suppose being >> > too conservative in dr_misalignment for variable size vectors isn't >> > too bad if we correctly identify fully aligned accesses? >> >> In practice we don't require more than element alignment for VLA SVE >> as things stand. However, there's support for using fully-predicated >> loops in which the first loop iteration aligns by using leading >> inactive lanes where possible (e.g. start with 1 inactive lane >> for a misalignment of 1). In principle that would work for VLA too, >> we just haven't implemented it yet. >> >> > I'm now testing a variant that looks like >> > >> > int >> > dr_misalignment (dr_vec_info *dr_info) >> > { >> > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)) >> > { >> > dr_vec_info *first_dr >> > = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt)); >> > int misalign = first_dr->misalignment; >> > gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED); >> > if (misalign == DR_MISALIGNMENT_UNKNOWN) >> > return misalign; >> > /* vect_analyze_data_ref_accesses guarantees that DR_INIT are >> > INTEGER_CSTs and the first element in the group has the lowest >> > address. */ >> > poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr)) >> > - wi::to_poly_offset (DR_INIT (first_dr->dr))); >> > gcc_assert (known_ge (diff, 0)); >> > poly_uint64 mispoly = misalign + diff.force_uhwi (); >> > unsigned HOST_WIDE_INT quotient; >> > if (can_div_trunc_p (mispoly, first_dr->target_alignment, "ient, >> > &mispoly) >> > && mispoly.is_constant ()) >> > return mispoly.to_constant (); >> > return DR_MISALIGNMENT_UNKNOWN; >> > >> > I wonder why a non-poly works for the quotient here. I suppose it's >> > because the runtime factor is the same for all poly-ints and thus >> > it cancels out? But then the remainder likely will never be constant >> > unless it is zero? >> >> It's not so much that the quotient is guaranteed to be constant, >> but that that particular overload is designed for the case in which >> the caller only cares about constant quotients, and wants can_div_trunc_p >> to fail otherwise. In other words, it's really just a way of cutting down >> on is_constant calls. >> >> poly-int.h doesn't provide every possible overload of can_div_trunc_p. >> In principle it could have a fully-general 4-poly_int version, but we >> haven't needed that yet. It could also have a new overload for the >> case we care about here, avoiding the separate is_constant. > > Hmm. I guess I'll see to somehow unify this with the logic in > vect_compute_data_ref_alignment which at the moment forces > DR_MISALIGNMENT to unknown if DR_TARGET_ALIGNMENT of the vector type > isn't constant (so for SVE all accesses appear unaligned?). In the > end vect_compute_data_ref_alignment does > > poly_int64 misalignment > = base_misalignment + wi::to_poly_offset (drb->init).force_shwi (); > > /* If this is a backward running DR then first access in the larger > vectype actually is N-1 elements before the address in the DR. > Adjust misalign accordingly. */ > if (tree_int_cst_sgn (drb->step) < 0) > /* PLUS because STEP is negative. */ > misalignment += ((TYPE_VECTOR_SUBPARTS (vectype) - 1) > * -TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE > (vectype)))); > > unsigned int const_misalignment; > if (!known_misalignment (misalignment, vect_align_c, > &const_misalignment)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > "Non-constant misalignment for access: %T\n", > ref); > return; > } > > SET_DR_MISALIGNMENT (dr_info, const_misalignment); > > ignoring the negative step offset (where poly-int TYPE_VECTOR_SUBPARTS > sneaks in) this combines integer base_misaligned with DR_INIT > (known INTEGER_CST) and then with the known constant vect_align_c. > > So the most simplest version is > > if (misalign == DR_MISALIGNMENT_UNKNOWN) > return misalign; > /* vect_analyze_data_ref_accesses guarantees that DR_INIT are > INTEGER_CSTs and the first element in the group has the lowest > address. Likewise vect_compute_data_ref_alignment will > have ensured that target_alignment is constant and otherwise > set misalign to DR_MISALIGNMENT_UNKNOWN. */ > HOST_WIDE_INT diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr)) > - TREE_INT_CST_LOW (DR_INIT (first_dr->dr))); > gcc_assert (diff >= 0); > unsigned HOST_WIDE_INT target_alignment_c > = first_dr->target_alignment.to_constant (); > return (misalign + diff) % target_alignment_c; LGTM, thanks. Richard > Note my intent is to lift the restriction of having only one > vector type for vectorizing a DR group and alignment is where > currently correctness issues pop up so in the end > dr_misalignment () would get a vectype parameter (and the > odd negative STEP handling would move to the caller, eventually > with another 'offset' parameter to dr_misalignment () / aligned_access_p > ()). > > So if you are fine with improving the situation for poly-ints > after all this then I'd go with the most simplest variant above. > > OK? > > Thanks, > Richard.