public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] Avoid peeling for gaps if accesses are aligned
Date: Wed, 01 Mar 2017 19:59:00 -0000	[thread overview]
Message-ID: <87shmwepju.fsf@e105548-lin.cambridge.arm.com> (raw)
In-Reply-To: <alpine.LSU.2.20.1703011534340.30051@zhemvz.fhfr.qr> (Richard	Biener's message of "Wed, 1 Mar 2017 15:37:30 +0100")

Richard Biener <rguenther@suse.de> writes:
> On Wed, 1 Mar 2017, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >
>> >> Richard Biener <rguenther@suse.de> writes:
>> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >> >
>> >> >> Sorry for the late reply, but:
>> >> >> 
>> >> >> Richard Biener <rguenther@suse.de> writes:
>> >> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
>> >> >> >
>> >> >> >> 
>> >> >> >> Currently we force peeling for gaps whenever element overrun can occur
>> >> >> >> but for aligned accesses we know that the loads won't trap and thus
>> >> >> >> we can avoid this.
>> >> >> >> 
>> >> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
>> >> >> >> some testsuite fallout here so didn't bother to invent a new testcase).
>> >> >> >> 
>> >> >> >> Just in case somebody thinks the overrun is a bad idea in general
>> >> >> >> (even when not trapping).  Like for ASAN or valgrind.
>> >> >> >
>> >> >> > This is what I applied.
>> >> >> >
>> >> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >> >> >
>> >> >> > Richard.
>> >> >> [...]
>> >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> >> >> > index 15aec21..c29e73d 100644
>> >> >> > --- a/gcc/tree-vect-stmts.c
>> >> >> > +++ b/gcc/tree-vect-stmts.c
>> >> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
>> >> >> >        /* If there is a gap at the end of the group then these optimizations
>> >> >> >  	 would access excess elements in the last iteration.  */
>> >> >> >        bool would_overrun_p = (gap != 0);
>> >> >> > +      /* If the access is aligned an overrun is fine.  */
>> >> >> > +      if (would_overrun_p
>> >> >> > +	  && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
>> >> >> > +	would_overrun_p = false;
>> >> >> >        if (!STMT_VINFO_STRIDED_P (stmt_info)
>> >> >> >  	  && (can_overrun_p || !would_overrun_p)
>> >> >> >  	  && compare_step_with_zero (stmt) > 0)
>> >> >> 
>> >> >> ...is this right for all cases?  I think it only looks for single-vector
>> >> >> alignment, but the gap can in principle be vector-sized or larger,
>> >> >> at least for load-lanes.
>> >> >>
>> >> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
>> >> >> and a gap of 2 or 3.  Even if the access itself is aligned, the group
>> >> >> spans two vectors and we have no guarantee that the second one
>> >> >> is mapped.
>> >> >
>> >> > The check assumes that if aligned_access_p () returns true then the
>> >> > whole access is aligned in a way that it can't cross page boundaries.
>> >> > That's of course not the case if alignment is 16 bytes but the access
>> >> > will be a multiple of that.
>> >> >  
>> >> >> I haven't been able to come up with a testcase though.  We seem to be
>> >> >> overly conservative when computing alignments.
>> >> >
>> >> > Not sure if we can run into this with load-lanes given that bumps the
>> >> > vectorization factor.  Also does load-lane work with gaps?
>> >> >
>> >> > I think that gap can never be larger than nunits-1 so it is by definition
>> >> > in the last "vector" independent of the VF.
>> >> >
>> >> > Classical gap case is
>> >> >
>> >> > for (i=0; i<n; ++i)
>> >> >  {
>> >> >    y[3*i + 0] = x[4*i + 0];
>> >> >    y[3*i + 1] = x[4*i + 1];
>> >> >    y[3*i + 2] = x[4*i + 2];
>> >> >  }
>> >> >
>> >> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
>> >> > the y's different streams and you should get the perfect case for
>> >> > load-lane:
>> >> >
>> >> > for (i=0; i<n; ++i)
>> >> >  {
>> >> >    y[i] = x[4*i + 0];
>> >> >    z[i] = x[4*i + 1];
>> >> >    w[i] = x[4*i + 2];
>> >> >  } 
>> >> >
>> >> > previously we'd peel at least 4 iterations into the epilogue for
>> >> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
>> >> > ok.
>> >> 
>> >> The case I was thinking of was like the second, but with the
>> >> element type being DI or DF and with the + 2 statement removed.
>> >> E.g.:
>> >> 
>> >> double __attribute__((noinline))
>> >> foo (double *a)
>> >> {
>> >>   double res = 0.0;
>> >>   for (int n = 0; n < 256; n += 4)
>> >>     res += a[n] + a[n + 1];
>> >>   return res;
>> >> }
>> >> 
>> >> (with -ffast-math).  We do use LD4 for this, and having "a" aligned
>> >> to V2DF isn't enough to guarantee that we can access a[n + 2]
>> >> and a[n + 3].
>> >
>> > Yes, indeed.  It's safe when peeling for gaps would remove
>> > N < alignof (ref) / sizeof (ref) scalar iterations.
>> >
>> > Peeling for gaps simply subtracts one from the niter of the vectorized 
>> > loop.
>> 
>> I think subtracting one is enough in all cases.  It's only the final
>> iteration of the scalar loop that can't access a[n + 2] and a[n + 3].
>> 
>> (Of course, subtracting one happens before peeling for niters, so it
>> only makes a difference if the original niters was a multiple of the VF,
>> in which case we peel a full vector's worth of iterations instead of
>> peeling none.)
>
> I think one could extend the gcc.dg/vect/group-no-gaps-1.c testcase
> to covert the case with bigger VF, for example by having different
> types for a and b.
>
> I can try playing with this later this week but if you can come up
> with a testcase that exercises load-/store-lanes that would be great.
> See also gcc.dg/vect/pr49038.c for a less convoluted testcase to copy
> from.

Yeah, that's what I'd tried originally, but couldn't convince
vect_compute_data_ref_alignment to realise that the base was aligned.

On that topic, the same patch had:

      if (DECL_USER_ALIGN (base))
	{
	  if (dump_enabled_p ())
	    {
	      dump_printf_loc (MSG_NOTE, vect_location,
			       "not forcing alignment of user-aligned "
			       "variable: ");
	      dump_generic_expr (MSG_NOTE, TDF_SLIM, base);
	      dump_printf (MSG_NOTE, "\n");
	    }
	  return true;
	}

But shouldn't this also be testing whether decl is packed?  The docs
say that the aligned attribute only specifies a minimum, so increasing
it should be fine.

Thanks,
Richard

  reply	other threads:[~2017-03-01 19:59 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-07 15:02 Richard Biener
2016-11-08 10:20 ` Richard Biener
2017-03-01 11:54   ` Richard Sandiford
2017-03-01 12:06     ` Richard Biener
2017-03-01 12:18       ` Richard Sandiford
2017-03-01 12:23         ` Richard Biener
2017-03-01 12:40           ` Richard Sandiford
2017-03-01 14:37             ` Richard Biener
2017-03-01 19:59               ` Richard Sandiford [this message]
2017-03-02 12:05                 ` Richard Biener
2017-03-03  9:39                   ` Richard Sandiford
2017-03-03 10:19                     ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87shmwepju.fsf@e105548-lin.cambridge.arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).