From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 13DD23858C3B
 for <gcc-patches@gcc.gnu.org>; Wed, 15 Sep 2021 10:08:46 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 13DD23858C3B
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9839C6D;
 Wed, 15 Sep 2021 03:08:45 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 17C963F5A1;
 Wed, 15 Sep 2021 03:08:44 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>
Mail-Followup-To: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>,
 Richard Biener <rguenther@suse.de>, richard.sandiford@arm.com
Cc: Richard Biener <rguenther@suse.de>
Subject: Re: [PATCH] Maintain (mis-)alignment info in the first element of a
 group
References: <94r9n910-qsp7-778q-5ooq-pno71s1sqn6o@fhfr.qr>
 <mpth7encszj.fsf@arm.com> <n25qnr9-463-q432-p12q-31s6142rp9pq@fhfr.qr>
 <mpt5yv2cmzs.fsf@arm.com>
 <s7p2933o-4058-28nq-351r-8224182oon96@fhfr.qr>
Date: Wed, 15 Sep 2021 11:08:43 +0100
In-Reply-To: <s7p2933o-4058-28nq-351r-8224182oon96@fhfr.qr> (Richard Biener
 via Gcc-patches's message of "Wed, 15 Sep 2021 11:47:13 +0200 (CEST)")
Message-ID: <mptzgseb178.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 15 Sep 2021 10:08:48 -0000

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Wed, 15 Sep 2021, Richard Sandiford wrote:
>> Richard Biener <rguenther@suse.de> writes:
>> > On Tue, 14 Sep 2021, Richard Sandiford wrote:
>> >
>> >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> >> > This changes us to maintain and compute (mis-)alignment info for
>> >> > the first element of a group only rather than for each DR when
>> >> > doing interleaving and for the earliest, first, or first in the SLP
>> >> > node (or any pair or all three of those) when SLP vectorizing.
>> >> >
>> >> > For this to work out the easiest way I have changed the accessors
>> >> > DR_MISALIGNMENT and DR_TARGET_ALIGNMENT to do the indirection to
>> >> > the first element rather than adjusting all callers.
>> >> > dr_misalignment is moved out-of-line and I'm not too fond of the
>> >> > poly-int dances there (any hints?), but basically we are now
>> >> > adjusting the first elements misalignment based on the DR_INIT
>> >> > difference.
>> >> >
>> >> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>> >> >
>> >> > Richard.
>> >> >
>> >> > 2021-09-13  Richard Biener  <rguenther@suse.de>
>> >> >
>> >> > 	* tree-vectorizer.h (dr_misalignment): Move out of line.
>> >> > 	(dr_target_alignment): New.
>> >> > 	(DR_TARGET_ALIGNMENT): Wrap dr_target_alignment.
>> >> > 	(set_dr_target_alignment): New.
>> >> > 	(SET_DR_TARGET_ALIGNMENT): Wrap set_dr_target_alignment.
>> >> > 	* tree-vect-data-refs.c (dr_misalignment): Compute and
>> >> > 	return the group members misalignment.
>> >> > 	(vect_compute_data_ref_alignment): Use SET_DR_TARGET_ALIGNMENT.
>> >> > 	(vect_analyze_data_refs_alignment): Compute alignment only
>> >> > 	for the first element of a DR group.
>> >> > 	(vect_slp_analyze_node_alignment): Likewise.
>> >> > ---
>> >> >  gcc/tree-vect-data-refs.c | 65 ++++++++++++++++++++++++---------------
>> >> >  gcc/tree-vectorizer.h     | 24 ++++++++++-----
>> >> >  2 files changed, 57 insertions(+), 32 deletions(-)
>> >> >
>> >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> >> > index 66e76132d14..b53d6a0b3f1 100644
>> >> > --- a/gcc/tree-vect-data-refs.c
>> >> > +++ b/gcc/tree-vect-data-refs.c
>> >> > @@ -887,6 +887,36 @@ vect_slp_analyze_instance_dependence (vec_info *vinfo, slp_instance instance)
>> >> >    return res;
>> >> >  }
>> >> >  
>> >> > +/* Return the misalignment of DR_INFO.  */
>> >> > +
>> >> > +int
>> >> > +dr_misalignment (dr_vec_info *dr_info)
>> >> > +{
>> >> > +  if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
>> >> > +    {
>> >> > +      dr_vec_info *first_dr
>> >> > +	= STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
>> >> > +      int misalign = first_dr->misalignment;
>> >> > +      gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
>> >> > +      if (misalign == DR_MISALIGNMENT_UNKNOWN)
>> >> > +	return misalign;
>> >> > +      poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr))
>> >> > +			      - wi::to_poly_offset (DR_INIT (first_dr->dr)));
>> >> > +      poly_int64 mispoly = misalign + diff.to_constant ().to_shwi ();
>> >> > +      bool res = known_misalignment (mispoly,
>> >> > +				     first_dr->target_alignment.to_constant (),
>> >> > +				     &misalign);
>> >> > +      gcc_assert (res);
>> >> > +      return misalign;
>> >> 
>> >> Yeah, not too keen on the to_constants here.  The one on diff looks
>> >> redundant -- you could just use diff.force_shwi () instead, and
>> >> keep everything poly_int.
>> >>
>> >> For the known_misalignment I think we should use:
>> >> 
>> >>    if (!can_div_trunc_p (mispoly, first_dr->target_alignment,
>> >> 		         &quotient, &misalign))
>> >>      misalign = DR_MISALIGNMENT_UNKNOWN;
>> >>    return misalign;
>> >> 
>> >> There are then no to_constant assumptions.
>> >
>> > OK, note that group analysis does
>> >
>> >           /* Check that the DR_INITs are compile-time constants.  */
>> >           if (TREE_CODE (DR_INIT (dra)) != INTEGER_CST
>> >               || TREE_CODE (DR_INIT (drb)) != INTEGER_CST)
>> >             break;
>> >
>> >           /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
>> >           HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
>> >           HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
>> >
>> > so I'm confident my variant was "correct", but it still was ugly.
>> 
>> Ah, OK.  In that case I don't mind the original version, but it would be
>> good to have a comment above the to_constant saying where the condition
>> is enforced.
>> 
>> I'm just trying to avoid to_constant calls with no comment to explain
>> them, and with no nearby is_constant call.  Otherwise it could end up
>> a bit like tree_to_uhwi, where sometimes tree_fits_uhwi_p really has
>> been checked earlier (not always obvious where) and sometimes
>> tree_to_uhwi is just used out of hope, to avoid having to think about
>> the alternative.
>> 
>> > There's also the issue that target_alignment is poly_uint64 but
>> > misalignment is signed int.
>> >
>> > Note that can_div_trunc_p seems to require a poly_uint64 remainder,
>> > I'm not sure what to do with that, so I used is_constant.
>> 
>> Ah, yeah, forgot about that sorry.  I guess in that case, using
>> is_constant on first_dr->target_alignment and sticking with
>> known_misalignment would make sense.
>> 
>> > Btw, to what value do we want to align with variable sized vectors?
>> > We are aligning globals according to target_alignment but only
>> > support that for constant alignment.  Most users only want to
>> > distinguish between aligned or not aligned and the actual misalignment
>> > is only used to seed the SSA alignment info, so I suppose being
>> > too conservative in dr_misalignment for variable size vectors isn't
>> > too bad if we correctly identify fully aligned accesses?
>> 
>> In practice we don't require more than element alignment for VLA SVE
>> as things stand.  However, there's support for using fully-predicated
>> loops in which the first loop iteration aligns by using leading
>> inactive lanes where possible (e.g. start with 1 inactive lane
>> for a misalignment of 1).  In principle that would work for VLA too,
>> we just haven't implemented it yet.
>> 
>> > I'm now testing a variant that looks like
>> >
>> > int
>> > dr_misalignment (dr_vec_info *dr_info)
>> > {
>> >   if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
>> >     {
>> >       dr_vec_info *first_dr
>> >         = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
>> >       int misalign = first_dr->misalignment;
>> >       gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
>> >       if (misalign == DR_MISALIGNMENT_UNKNOWN)
>> >         return misalign;
>> >       /* vect_analyze_data_ref_accesses guarantees that DR_INIT are
>> >          INTEGER_CSTs and the first element in the group has the lowest
>> >          address.  */
>> >       poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr))
>> >                               - wi::to_poly_offset (DR_INIT (first_dr->dr)));
>> >       gcc_assert (known_ge (diff, 0));
>> >       poly_uint64 mispoly = misalign + diff.force_uhwi ();
>> >       unsigned HOST_WIDE_INT quotient;
>> >       if (can_div_trunc_p (mispoly, first_dr->target_alignment, &quotient,
>> >                            &mispoly)
>> >           && mispoly.is_constant ())
>> >         return mispoly.to_constant ();
>> >       return DR_MISALIGNMENT_UNKNOWN;
>> >
>> > I wonder why a non-poly works for the quotient here.  I suppose it's
>> > because the runtime factor is the same for all poly-ints and thus
>> > it cancels out?  But then the remainder likely will never be constant
>> > unless it is zero?
>> 
>> It's not so much that the quotient is guaranteed to be constant,
>> but that that particular overload is designed for the case in which
>> the caller only cares about constant quotients, and wants can_div_trunc_p
>> to fail otherwise.  In other words, it's really just a way of cutting down
>> on is_constant calls.
>> 
>> poly-int.h doesn't provide every possible overload of can_div_trunc_p.
>> In principle it could have a fully-general 4-poly_int version, but we
>> haven't needed that yet.  It could also have a new overload for the
>> case we care about here, avoiding the separate is_constant.
>
> Hmm.  I guess I'll see to somehow unify this with the logic in
> vect_compute_data_ref_alignment which at the moment forces
> DR_MISALIGNMENT to unknown if DR_TARGET_ALIGNMENT of the vector type
> isn't constant (so for SVE all accesses appear unaligned?).  In the
> end vect_compute_data_ref_alignment does
>
>   poly_int64 misalignment
>     = base_misalignment + wi::to_poly_offset (drb->init).force_shwi ();
>
>   /* If this is a backward running DR then first access in the larger
>      vectype actually is N-1 elements before the address in the DR.
>      Adjust misalign accordingly.  */
>   if (tree_int_cst_sgn (drb->step) < 0)
>     /* PLUS because STEP is negative.  */
>     misalignment += ((TYPE_VECTOR_SUBPARTS (vectype) - 1)
>                      * -TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE 
> (vectype))));
>
>   unsigned int const_misalignment;
>   if (!known_misalignment (misalignment, vect_align_c, 
> &const_misalignment))
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "Non-constant misalignment for access: %T\n", 
> ref);
>       return;
>     }
>
>   SET_DR_MISALIGNMENT (dr_info, const_misalignment);
>
> ignoring the negative step offset (where poly-int TYPE_VECTOR_SUBPARTS
> sneaks in) this combines integer base_misaligned with DR_INIT
> (known INTEGER_CST) and then with the known constant vect_align_c.
>
> So the most simplest version is
>
>       if (misalign == DR_MISALIGNMENT_UNKNOWN)
>         return misalign;
>       /* vect_analyze_data_ref_accesses guarantees that DR_INIT are
>          INTEGER_CSTs and the first element in the group has the lowest
>          address.  Likewise vect_compute_data_ref_alignment will
>          have ensured that target_alignment is constant and otherwise
>          set misalign to DR_MISALIGNMENT_UNKNOWN.  */
>       HOST_WIDE_INT diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
>                             - TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
>       gcc_assert (diff >= 0);
>       unsigned HOST_WIDE_INT target_alignment_c 
>         = first_dr->target_alignment.to_constant ();
>       return (misalign + diff) % target_alignment_c;

LGTM, thanks.

Richard

> Note my intent is to lift the restriction of having only one
> vector type for vectorizing a DR group and alignment is where
> currently correctness issues pop up so in the end
> dr_misalignment () would get a vectype parameter (and the
> odd negative STEP handling would move to the caller, eventually
> with another 'offset' parameter to dr_misalignment () / aligned_access_p 
> ()).
>
> So if you are fine with improving the situation for poly-ints
> after all this then I'd go with the most simplest variant above.
>
> OK?
>
> Thanks,
> Richard.