public inbox for gcc-patches@gcc.gnu.org
From: 钟居哲 <juzhe.zhong@rivai.ai>
To: "Andrew Stubbs" <ams@codesourcery.com>,
	 rguenther <rguenther@suse.de>,
	 gcc-patches <gcc-patches@gcc.gnu.org>
Cc: richard.sandiford <richard.sandiford@arm.com>
Subject: Re: Re: [PATCH][RFT] Vectorization of first-order recurrences
Date: Mon, 10 Oct 2022 22:08:19 +0800	[thread overview]
Message-ID: <F7A2D6C9578C4101+2022101022081852194915@rivai.ai> (raw)
In-Reply-To: <49b46c57-70b6-d9c0-a267-5e2f8315382b@codesourcery.com>


RVV also doesn't have a two-input permutation instruction (unlike ARM SVE, which has the tbl instruction), and
RVV needs about 4 instructions to handle this permutation; even so, it still improves performance a lot.
I think the backend should handle this, because for a first-order recurrence loop the vectorizer always generates the
special permutation index = [vl-1, vl, vl+1, ......]  (this index sequence pattern just follows LLVM).
If the backend doesn't want this permutation to happen, it can just recognize this index pattern and disable it.
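To make the pattern concrete, here is a sketch (hypothetical function, not code from the patch) of the kind of scalar loop involved and the permute index the vectorizer emits for it:

```c
/* A first-order recurrence: each iteration reads the value loaded in
   the previous iteration.  */
void
first_order_recurrence (int *restrict out, const int *restrict in,
                        int n, int init)
{
  int prev = init;
  for (int i = 0; i < n; i++)
    {
      out[i] = in[i] + prev;   /* uses the previous iteration's load */
      prev = in[i];
    }
}

/* With vector length vl, the vectorized body keeps the previous
   iteration's vector and forms the shifted operand with a two-input
   permute:

     shifted = VEC_PERM <prev_vec, cur_vec, {vl-1, vl, vl+1, ..., 2*vl-2}>

   i.e. the last lane of prev_vec followed by the first vl-1 lanes of
   cur_vec -- exactly the index sequence described above.  */
```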


juzhe.zhong@rivai.ai
 
From: Andrew Stubbs
Date: 2022-10-10 21:57
To: Richard Biener; gcc-patches@gcc.gnu.org
CC: richard.sandiford@arm.com; juzhe.zhong@rivai.ai
Subject: Re: [PATCH][RFT] Vectorization of first-order recurrences
On 10/10/2022 12:03, Richard Biener wrote:
> The following picks up the prototype by Ju-Zhe Zhong for vectorizing
> first order recurrences.  That solves two TSVC missed optimization PRs.
> 
> There's a new scalar cycle def kind, vect_first_order_recurrence
> and its handling of the backedge value vectorization is complicated
> by the fact that the vectorized value isn't the PHI but instead
> a (series of) permute(s) shifting in the recurring value from the
> previous iteration.  I've implemented this by creating both the
> single vectorized PHI and the series of permutes when vectorizing
> the scalar PHI but leave the backedge values in both unassigned.
> The backedge values are (for the testcases) computed by a load
> which is also the place after which the permutes are inserted.
> That placement also restricts the cases we can handle (without
> resorting to code motion).
> 
> I added both costing and SLP handling though SLP handling is
> restricted to the case where a single vectorized PHI is enough.
> 
> Missing is epilogue handling - while prologue peeling would
> be handled transparently by adjusting iv_phi_p the epilogue
> case doesn't work with just inserting a scalar LC PHI since
> that a) keeps the scalar load live and b) that load is the
> wrong one; it has to be the last, much like when we'd vectorize
> the LC PHI as a live operation.  Unfortunately LIVE
> compute/analysis happens too early before we decide on
> peeling.  When using fully masked loop vectorization,
> vect-recurr-6.c works as expected though.
> 
> I have tested this on x86_64 for now, but since epilogue
> handling is missing there are probably no practical cases.
> My prototype WHILE_ULT AVX512 patch can handle vect-recurr-6.c
> just fine but I didn't feel like running SPEC within SDE nor
> is the WHILE_ULT patch complete enough.  Builds of SPEC 2k7
> with fully masked loops succeed (minus three cases of
> PR107096, caused by my WHILE_ULT prototype).
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> Testing with SVE, GCN or RVV appreciated, ideas how to cleanly
> handle epilogues welcome.
 
The testcases all produce correct code on GCN and pass the execution tests.
 
The code isn't terribly optimal because we don't have a two-input 
permutation instruction, so we permute each half separately and 
vec_merge the results. In this case the first vector is always a no-op 
permutation so that's wasted cycles. We'd really want a vector rotate 
and write-lane (or the other way around). I think the special-case 
permutations can be recognised and coded into the backend, but I don't 
know if we can easily tell that the first vector is just a bunch of 
duplicates, when it's not constant.
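A rough Python model of that decomposition (lane count illustrative, names hypothetical): permute each input separately, then vec_merge the results with a lane mask.

```python
VL = 4  # illustrative lane count

def two_input_permute(a, b, sel):
    """Emulate a two-input permute on a target with only single-input
    permutation: permute each input separately, then merge the results
    with a lane mask (vec_merge)."""
    perm_a = [a[s % VL] for s in sel]   # permute of the first input
    perm_b = [b[s % VL] for s in sel]   # permute of the second input
    mask = [s < VL for s in sel]        # lane i comes from a iff sel[i] < VL
    return [pa if m else pb for pa, pb, m in zip(perm_a, perm_b, mask)]

# The recurrence permute: the last lane of prev followed by lanes
# 0..VL-2 of cur, i.e. selector [VL-1, VL, VL+1, ..., 2*VL-2].
prev = [10, 11, 12, 13]
cur = [20, 21, 22, 23]
sel = [VL - 1] + list(range(VL, 2 * VL - 1))
print(two_input_permute(prev, cur, sel))  # [13, 20, 21, 22]
# Only one lane of the first input's permutation survives the merge,
# which is the wasted work a rotate + write-lane pair would avoid.
```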
 
Andrew
 

  reply	other threads:[~2022-10-10 14:08 UTC|newest]

Thread overview: 14+ messages
2022-10-10 11:03 Richard Biener
2022-10-10 11:13 ` juzhe.zhong
2022-10-10 13:57 ` Andrew Stubbs
2022-10-10 14:08   ` 钟居哲 [this message]
2022-10-11  7:01   ` Richard Biener
2022-10-11  8:42     ` Richard Sandiford
2022-10-14  7:07       ` Richard Biener
2022-10-14  7:20         ` juzhe.zhong
2022-10-14  9:42         ` Andrew Stubbs
2022-10-14  9:46           ` Richard Biener
2022-10-17  8:48         ` Richard Sandiford
2022-10-11  8:34 ` juzhe.zhong
2022-10-17 12:14   ` Richard Biener
2022-10-12  9:48 ` Richard Sandiford
