public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>,
	Richard Biener <rguenther@suse.de>
Subject: Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling
Date: Mon, 04 Oct 2021 17:30:09 +0100	[thread overview]
Message-ID: <mptee9069da.fsf@arm.com> (raw)
In-Reply-To: <4272814n-8538-p793-157q-5n6q16r48n51@fhfr.qr> (Richard Biener via Gcc-patches's message of "Fri, 1 Oct 2021 10:19:23 +0200 (CEST)")

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Thu, 30 Sep 2021, Andre Vieira (lists) wrote:
>
>> Hi,
>> 
>> 
>> >> That just forces trying the vector modes we've tried before. Though I might
>> >> need to revisit this now I think about it. I'm afraid it might be possible
>> >> for
>> >> this to generate an epilogue with a vf that is not lower than that of the
>> >> main
>> >> loop, but I'd need to think about this again.
>> >>
>> >> Either way I don't think this changes the vector modes used for the
>> >> epilogue.
>> >> But maybe I'm just missing your point here.
>> > Yes, I was refering to the above which suggests that when we vectorize
>> > the main loop with V4SF but unroll then we try vectorizing the
>> > epilogue with V4SF as well (but not unrolled).  I think that's
>> > premature (not sure if you try V8SF if the main loop was V4SF but
>> > unrolled 4 times).
>> 
>> My main motivation for this was because I had a SVE loop that vectorized with
>> both VNx8HI, then V8HI which beat VNx8HI on cost, then it decided to unroll
>> V8HI by two and skipped using VNx8HI as a predicated epilogue which would've
>> been the best choice.
>
> I see, yes - for fully predicated epilogues it makes sense to consider
> the same vector mode as for the main loop anyways (independent on
> whether we're unrolling or not).  One could argue that with an
> unrolled V4SImode main loop a predicated V8SImode epilogue would also
> be a good match (but then somehow costing favored the unrolled V4SI
> over the V8SI for the main loop...).
>
>> So that is why I decided to just 'reset' the vector_mode selection. In a
>> scenario where you only have the traditional vector modes it might make less
>> sense.
>> 
>> Just realized I still didn't add any check to make sure the epilogue has a
>> lower VF than the previous loop, though I'm still not sure that could happen.
>> I'll go look at where to add that if you agree with this.
>
> As said above, it only needs a lower VF in case the epilogue is not
> fully masked - otherwise the same VF would be OK.
>
>> >> I can move it there, it would indeed remove the need for the change to
>> >> vect_update_vf_for_slp, the change to
>> >> vect_determine_partial_vectors_and_peeling would still be required I think.
>> >> It
>> >> is meant to disable using partial vectors in an unrolled loop.
>> > Why would we disable the use of partial vectors in an unrolled loop?
>> The motivation behind that is that the overhead caused by generating
>> predicates for each iteration will likely be too much for it to be profitable
>> to unroll. On top of that, when dealing with low iteration count loops, if
>> executing one predicated iteration would be enough we now still need to
>> execute all other unrolled predicated iterations, whereas if we keep them
>> unrolled we skip the unrolled loops.
>
> OK, I guess we're not factoring in costs when deciding on predication
> but go for it if it's gernally enabled and possible.

Yeah.  That's mostly be design in SVE's case, but I can see that it might
need tweaking for other targets.

I don't think that's the really the “problem” (if it is a problem)
for the unroll decision though.  The “correct” way to unroll SVE loops,
which we're hoping to add at some point, is to predicate only the final
vector in each unrolled iteration.  So if the unroll factor is 4, say,
the unrolled iterations would be:

  - unrolled 4x
    - unpredicated vector 0
    - unpredicated vector 1
    - unpredicated vector 2
    - predicated vector 3
 
The epilogue loop would then be predicated and execute at most
3 times.  Alternatively, there could be two non-looping epilogues:

  - unrolled 2x
    - unpredicated vector 0
    - predicated vector 1
  
  - not unrolled
    - predicated vector 0

This ensures that every executed vector operation does at least
some useful work.

Like Andre says, if we predicate as things stand, we'd have:

  - unrolled 4x
    - predicated vector 0
    - predicated vector 1
    - predicated vector 2
    - predicated vector 3

where the code could end up executing 4 vector operations per
scalar operation even if the final vector iteration only needs
to process 2 elements.  I don't think that's a costing issue,
it's just building in redundancy that doesn't need to be there.

Thanks,
Richard

  reply	other threads:[~2021-10-04 16:30 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-17 15:27 [PATCH 0/3][vect] Enable vector unrolling of main loop Andre Vieira (lists)
2021-09-17 15:31 ` [PATCH 1/3][vect] Add main vectorized loop unrolling Andre Vieira (lists)
2021-09-21 12:30   ` Richard Biener
2021-09-21 16:34     ` Andre Vieira (lists)
2021-09-22  6:14       ` Richard Biener
2021-09-30  8:52         ` [PATCH 1v2/3][vect] " Andre Vieira (lists)
2021-10-01  8:19           ` Richard Biener
2021-10-04 16:30             ` Richard Sandiford [this message]
2021-10-12 10:35             ` Andre Vieira (lists)
2021-10-15  8:48               ` Richard Biener
2021-10-20 13:29                 ` Andre Vieira (lists)
2021-10-21 12:14                   ` Richard Biener
2021-10-22 10:18                     ` Richard Sandiford
2021-11-11 16:02                       ` Andre Vieira (lists)
2021-11-12 13:12                         ` Richard Biener
2021-11-22 11:41                           ` Andre Vieira (lists)
2021-11-22 12:39                             ` Richard Biener
2021-11-24  9:46                               ` Andre Vieira (lists)
2021-11-24 11:00                                 ` Richard Biener
2021-11-25 10:40                                   ` Andre Vieira (lists)
2021-11-25 12:46                                     ` Richard Biener
2021-11-30 11:36                                       ` Andre Vieira (lists)
2021-11-30 13:56                                         ` Richard Biener
2021-12-07 11:27                                           ` [vect] Re-analyze all modes for epilogues Andre Vieira (lists)
2021-12-07 11:31                                             ` Andre Vieira (lists)
2021-12-07 11:48                                               ` Richard Biener
2021-12-07 13:31                                                 ` Richard Sandiford
2021-12-07 13:33                                                   ` Richard Biener
2021-12-07 11:45                                             ` Richard Biener
2021-12-07 15:17                                               ` Andre Vieira (lists)
2021-12-13 16:41                                                 ` Andre Vieira (lists)
2021-12-14 11:39                                                   ` Richard Sandiford
2021-12-17 16:33                                                     ` Andre Vieira (lists)
2022-01-07 12:39                                                       ` Richard Sandiford
2022-01-10 18:31                                           ` [PATCH 1v2/3][vect] Add main vectorized loop unrolling Andre Vieira (lists)
2022-01-11  7:14                                             ` Richard Biener
2021-10-22 10:12                   ` Richard Sandiford
2021-09-17 15:32 ` [PATCH 2/3][vect] Consider outside costs earlier for epilogue loops Andre Vieira (lists)
2021-10-14 13:44   ` Andre Vieira (lists)
2021-10-22 15:33   ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptee9069da.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=andre.simoesdiasvieira@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).