From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 27 Sep 2021 08:25:21 +0200 (CEST)
From: Richard Biener
To: Richard Sandiford
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Allow different vector types for stmt groups
Message-ID: <839254rr-2sns-3p81-1026-2qqnrn13q35q@fhfr.qr>
List-Id: Gcc-patches mailing list

On Fri, 24 Sep 2021, Richard Sandiford wrote:

> Richard Biener writes:
> > This allows vectorization (in practice non-loop vectorization) to
> > have a stmt participate in different vector type vectorizations.
> > It allows us to remove vect_update_shared_vectype and replace it
> > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> > vect_analyze_stmt and vect_transform_stmt.
> >
> > For data-refs the situation is a bit more complicated since we
> > analyze alignment info with a specific vector type in mind, which
> > doesn't play well when that changes.
> >
> > So the bulk of the change is passing down the actual vector type
> > used for a vectorized access to the various accessors of alignment
> > info, first and foremost dr_misalignment but also aligned_access_p,
> > known_alignment_for_access_p, vect_known_alignment_in_bytes and
> > vect_supportable_dr_alignment.  I took the liberty to replace the
> > ALL_CAPS macro accessors with the lower-case function invocations.
> >
> > The actual changes to the behavior are in dr_misalignment, which is
> > now the place factoring in the negative step adjustment as well as
> > handling alignment queries for a vector type with bigger alignment
> > requirements than what we can (or have) analyzed.
> >
> > vect_slp_analyze_node_alignment makes use of this and, upon receiving
> > a vector type with a bigger alignment requirement, re-analyzes the DR
> > with respect to it but keeps an older, more precise result if possible.
> > In this context it might be possible to do the analysis just once,
> > but instead of analyzing with respect to a specific desired alignment
> > look for the biggest alignment for which we can compute a not unknown
> > alignment.
> >
> > The ChangeLog includes the functional changes but not the bulk due
> > to the alignment accessor API changes - I hope that's acceptable.
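
To make the push/pop idea described above concrete, here is a minimal
self-contained C++ sketch of the save/restore pattern, not the actual
patch: the types and functions below (stmt_info_t, slp_node_t,
analyze_stmt, analyze_with_node_vectype) are illustrative stand-ins,
while STMT_VINFO_VECTYPE, SLP_TREE_VECTYPE and vect_analyze_stmt named
in the description are the real GCC entities the patch touches.

#include <cassert>

/* Illustrative stand-ins for GCC's stmt_vec_info and slp_tree; the
   real structures carry far more state.  */
struct stmt_info_t { const char *vectype; };
struct slp_node_t { const char *vectype; stmt_info_t *stmt; };

/* Stand-in for the analysis step (vect_analyze_stmt in GCC): it only
   looks at whatever vector type is currently installed on the stmt.  */
static bool
analyze_stmt (stmt_info_t *stmt)
{
  return stmt->vectype != nullptr;
}

/* The push/pop pattern the description refers to: temporarily install
   the SLP node's vector type on the stmt, run the analysis, and restore
   the previous value so another SLP instance can later analyze the same
   stmt with a different vector type.  */
static bool
analyze_with_node_vectype (slp_node_t *node)
{
  stmt_info_t *stmt = node->stmt;
  const char *saved_vectype = stmt->vectype;   /* push */
  stmt->vectype = node->vectype;
  bool ok = analyze_stmt (stmt);
  stmt->vectype = saved_vectype;               /* pop */
  return ok;
}

int
main ()
{
  stmt_info_t stmt = { nullptr };
  slp_node_t node = { "V4SI", &stmt };
  assert (analyze_with_node_vectype (&node));
  assert (stmt.vectype == nullptr);  /* original value restored */
  return 0;
}

Presumably the same wrapping would apply around the transform step
(vect_transform_stmt) in the same way.
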
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, testing on SPEC
> > CPU 2017 in progress (for stats and correctness).
> >
> > Any comments?
>
> Sorry for the super-slow response, some comments below.
>
> > […]
> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> > index a57700f2c1b..c42fc2fb272 100644
> > --- a/gcc/tree-vect-data-refs.c
> > +++ b/gcc/tree-vect-data-refs.c
> > @@ -887,37 +887,53 @@ vect_slp_analyze_instance_dependence (vec_info *vinfo, slp_instance instance)
> >    return res;
> >  }
> >
> > -/* Return the misalignment of DR_INFO.  */
> > +/* Return the misalignment of DR_INFO accessed in VECTYPE.  */
> >
> >  int
> > -dr_misalignment (dr_vec_info *dr_info)
> > +dr_misalignment (dr_vec_info *dr_info, tree vectype)
> >  {
> > +  HOST_WIDE_INT diff = 0;
> > +  /* Alignment is only analyzed for the first element of a DR group,
> > +     use that but adjust misalignment by the offset of the access.  */
> >    if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
> >      {
> >        dr_vec_info *first_dr
> >          = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
> > -      int misalign = first_dr->misalignment;
> > -      gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > -      if (misalign == DR_MISALIGNMENT_UNKNOWN)
> > -        return misalign;
> >        /* vect_analyze_data_ref_accesses guarantees that DR_INIT are
> >           INTEGER_CSTs and the first element in the group has the lowest
> >           address.  Likewise vect_compute_data_ref_alignment will
> >           have ensured that target_alignment is constant and otherwise
> >           set misalign to DR_MISALIGNMENT_UNKNOWN.  */
>
> Can you move the second sentence down so that it stays with the
> to_constant?
>
> > -      HOST_WIDE_INT diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
> > -                            - TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
> > +      diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
> > +              - TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
> >        gcc_assert (diff >= 0);
> > -      unsigned HOST_WIDE_INT target_alignment_c
> > -        = first_dr->target_alignment.to_constant ();
> > -      return (misalign + diff) % target_alignment_c;
> > +      dr_info = first_dr;
> >      }
> > -  else
> > +
> > +  int misalign = dr_info->misalignment;
> > +  gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > +  if (misalign == DR_MISALIGNMENT_UNKNOWN)
> > +    return misalign;
> > +
> > +  /* If the access is only aligned for a vector type with smaller alignment
> > +     requirement the access has unknown misalignment.  */
> > +  if (maybe_lt (dr_info->target_alignment * BITS_PER_UNIT,
> > +                targetm.vectorize.preferred_vector_alignment (vectype)))
> > +    return DR_MISALIGNMENT_UNKNOWN;
> > +
> > +  /* If this is a backward running DR then first access in the larger
> > +     vectype actually is N-1 elements before the address in the DR.
> > +     Adjust misalign accordingly.  */
> > +  if (tree_int_cst_sgn (DR_STEP (dr_info->dr)) < 0)
> >      {
> > -      int misalign = dr_info->misalignment;
> > -      gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > -      return misalign;
> > +      if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant ())
> > +        return DR_MISALIGNMENT_UNKNOWN;
> > +      diff += ((TYPE_VECTOR_SUBPARTS (vectype).to_constant () - 1)
> > +               * -TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype))));
>
> The target alignment might only be element alignment.  If so, I think we
> should either skip this block or return early above.  This would avoid
> returning DR_MISALIGNMENT_UNKNOWN unnecessarily for VLA.
>
> Or, more generally, I think this is a case where known_misalignment
> does work.  I.e.
diff can be a poly_int64 and:
>
> >      }
> > +  unsigned HOST_WIDE_INT target_alignment_c
> > +    = dr_info->target_alignment.to_constant ();
> > +  return (misalign + diff) % target_alignment_c;
>
> the last line can become:
>
>   if (!known_misalignment (misalign + diff, target_alignment_c, &misalign))
>     return DR_MISALIGNMENT_UNKNOWN;
>   return misalign;
>
> avoiding the need for the is_constant above.

OK, done - this then matches more closely what
vect_compute_data_ref_alignment did.

> > […]
> > @@ -1896,10 +1902,9 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
> >            unsigned int target_align =
> >              DR_TARGET_ALIGNMENT (dr_info).to_constant ();
> >            unsigned int dr_size = vect_get_scalar_dr_size (dr_info);
> > -          mis = (negative
> > -                 ? DR_MISALIGNMENT (dr_info)
> > -                 : -DR_MISALIGNMENT (dr_info));
> > +          unsigned int mis = dr_misalignment (dr_info, vectype);
> > +          mis = negative ? mis : -mis;
>
> Just checking: is this still correct?  It probably is, but it looked a
> bit like we're handling negative steps twice.  Same further down.

Yeah, it's done the same number of times as before.  But I didn't try
to understand this instance; I hope to follow up with a change that
moves the negative step adjustment to where we actually vectorize with
the computed address offset ... (there's a PR which shows we currently
generate wrong-code for negative stride SLP where this adjustment is
off and thus we get wrong aligned/misaligned accesses)

Richard.
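
As a postscript to the dr_misalignment discussion above, the following
self-contained C++ sketch models the arithmetic under review - the group
offset, the backward-step adjustment of (nunits - 1) elements, and an
"unknown" sentinel - using plain integers instead of GCC's poly_int /
known_misalignment machinery.  All names here are illustrative;
MISALIGN_UNKNOWN and model_misalignment are not GCC functions.

#include <cassert>

/* Sentinel mirroring the role of DR_MISALIGNMENT_UNKNOWN.  */
constexpr int MISALIGN_UNKNOWN = -1;

/* Simplified model of the misalignment computation: MISALIGN is the
   misalignment of the first element of the group, GROUP_OFFSET the byte
   offset of this access within the group, NUNITS/ELEM_SIZE describe the
   vector type, NEGATIVE_STEP whether the data-ref runs backwards, and
   TARGET_ALIGN the (constant) target alignment in bytes.  */
static int
model_misalignment (int misalign, long group_offset, int nunits,
                    int elem_size, bool negative_step, int target_align)
{
  if (misalign == MISALIGN_UNKNOWN)
    return MISALIGN_UNKNOWN;

  long diff = group_offset;
  /* For a backward-running access the first vector element read is
     (nunits - 1) elements before the DR's address.  */
  if (negative_step)
    diff -= (long) (nunits - 1) * elem_size;

  /* Like the known_misalignment idiom: fold the offset into the
     misalignment, keeping the result in [0, target_align).  */
  long m = ((long) misalign + diff) % target_align;
  if (m < 0)
    m += target_align;
  return (int) m;
}

int
main ()
{
  /* Forward access, second 4-byte element of a group, 16-byte target
     alignment: 0 + 4 -> misalignment 4.  */
  assert (model_misalignment (0, 4, 4, 4, false, 16) == 4);
  /* Backward access with a 4 x 4-byte vector: the vector starts 12
     bytes below the DR address, so an aligned DR address yields
     misalignment 4 with respect to 16.  */
  assert (model_misalignment (0, 0, 4, 4, true, 16) == 4);
  return 0;
}

The second assert is the case the review worries about: with a negative
step the adjustment shifts the access below the analyzed address, and
only the modular fold-in (or known_misalignment on poly_ints) keeps the
result well defined.
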