Re: [PATCH 00/10] vect: Reuse reduction accumulators between loops

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Kewen.Lin" <linkw@linux.ibm.com>
To: richard.sandiford@arm.com
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 00/10] vect: Reuse reduction accumulators between loops
Date: Sat, 10 Jul 2021 10:11:35 +0800	[thread overview]
Message-ID: <7b5ba7c1-4e8b-9aab-c0f3-ce05a701c899@linux.ibm.com> (raw)
In-Reply-To: <mptpmvtrner.fsf@arm.com>

Hi Richard,

on 2021/7/8 下午8:38, Richard Sandiford via Gcc-patches wrote:
> Quoting from the final patch in the series:
> 
> ------------------------------------------------------------------------
> This patch adds support for reusing a main loop's reduction accumulator
> in an epilogue loop.  This in turn lets the loops share a single piece
> of vector->scalar reduction code.
> 
> The patch has the following restrictions:
> 
> (1) The epilogue reduction can only operate on a single vector
>     (e.g. ncopies must be 1 for non-SLP reductions, and the group size
>     must be <= the element count for SLP reductions).
> 
> (2) Both loops must use the same vector mode for their accumulators.
>     This means that the patch is restricted to targets that support
>     --param vect-partial-vector-usage=1.
> 
> (3) The reduction must be a standard “tree code” reduction.
> 
> However, these restrictions could be lifted in future.  For example,
> if the main loop operates on 128-bit vectors and the epilogue loop
> operates on 64-bit vectors, we could in future reduce the 128-bit
> vector by one stage and use the 64-bit result as the starting point
> for the epilogue result.
> 
> The patch tries to handle chained SLP reductions, unchained SLP
> reductions and non-SLP reductions.  It also handles cases in which
> the epilogue loop is entered directly (rather than via the main loop)
> and cases in which the epilogue loop can be skipped.
> ------------------------------------------------------------------------
> 
> However, it ended up being difficult to do that without some preparatory
> clean-ups.  Some of them could probably stand on their own, but others
> are a bit “meh” without the final patch to justify them.
> 
> The diff below shows the effect of the patch when compiling:
> 
>   unsigned short __attribute__((noipa))
>   add_loop (unsigned short *x, int n)
>   {
>     unsigned short res = 0;
>     for (int i = 0; i < n; ++i)
>       res += x[i];
>     return res;
>   }
> 
> with -O3 --param vect-partial-vector-usage=1 on an SVE target:
> 
> add_loop:				add_loop:
> .LFB0:					.LFB0:
> 	.cfi_startproc				.cfi_startproc
> 	mov	x4, x0		      <
> 	cmp	w1, 0				cmp	w1, 0
> 	ble	.L7				ble	.L7
> 	cnth	x0		      |		cnth	x4
> 	sub	w2, w1, #1			sub	w2, w1, #1
> 	sub	w3, w0, #1	      |		sub	w3, w4, #1
> 	cmp	w2, w3				cmp	w2, w3
> 	bcc	.L8				bcc	.L8
> 	sub	w0, w1, w0	      |		sub	w4, w1, w4
> 	mov	x3, 0				mov	x3, 0
> 	cnth	x5				cnth	x5
> 	mov	z0.b, #0			mov	z0.b, #0
> 	ptrue	p0.b, all			ptrue	p0.b, all
> 	.p2align 3,,7				.p2align 3,,7
> .L4:					.L4:
> 	ld1h	z1.h, p0/z, [x4, x3,  |		ld1h	z1.h, p0/z, [x0, x3, 
> 	mov	x2, x3				mov	x2, x3
> 	add	x3, x3, x5			add	x3, x3, x5
> 	add	z0.h, z0.h, z1.h		add	z0.h, z0.h, z1.h
> 	cmp	w0, w3		      |		cmp	w4, w3
> 	bcs	.L4				bcs	.L4
> 	uaddv	d0, p0, z0.h	      <
> 	umov	w0, v0.h[0]	      <
> 	inch	x2				inch	x2
> 	and	w0, w0, 65535	      <
> 	cmp	w1, w2				cmp	w1, w2
> 	beq	.L2		      |		beq	.L6
> .L3:					.L3:
> 	sub	w1, w1, w2			sub	w1, w1, w2
> 	mov	z1.b, #0	      |		add	x2, x0, w2, uxtw 1
> 	whilelo	p0.h, wzr, w1			whilelo	p0.h, wzr, w1
> 	add	x2, x4, w2, uxtw 1    |		ld1h	z1.h, p0/z, [x2]
> 	ptrue	p1.b, all	      |		add	z0.h, p0/m, z0.h, z1.
> 	ld1h	z0.h, p0/z, [x2]      |	.L6:
> 	sel	z0.h, p0, z0.h, z1.h  |		ptrue	p0.b, all
> 	uaddv	d0, p1, z0.h	      |		uaddv	d0, p0, z0.h
> 	fmov	x1, d0		      |		umov	w0, v0.h[0]
> 	add	w0, w0, w1, uxth      <
> 	and	w0, w0, 65535			and	w0, w0, 65535
> .L2:				      <
> 	ret					ret
> 	.p2align 2,,3				.p2align 2,,3
> .L7:					.L7:
> 	mov	w0, 0				mov	w0, 0
> 	ret					ret
> .L8:					.L8:
> 	mov	w2, 0				mov	w2, 0
> 	mov	w0, 0		      |		mov	z0.b, #0
> 	b	.L3				b	.L3
> 	.cfi_endproc				.cfi_endproc
> 
> Kewen, could you give this a spin on Power 10 to see whether it
> works/helps there?  I've attached a combined diff.
> 

Thanks for the combined diff file.

I'm sorry that the current length based partial vector doesn't support
reduction, there are no conditional operations for length, we have to
preprocess the inactive lanes for the intermediate operations or final
reduction operations as operation types since the inactive lane value
is supposed to be undefined, this seems to require an efficient way to
turn length to a mask vector, Power10 doesn't have the corresponding
instruction so we have to do some tricks, it's still on my TODO list.
I did a hacking to relax the check in vectorizable_operation for
operations involved for reduction, I can see this patch series takes
effect for length based partial vector, so I believe it will help
length based partial vector once we enable it for reduction later.
Thanks for improving this!

This patch series was bootstrapped and regress-tested on Power10, also
benchmarked with SPEC2017 based on r12-2179 at Ofast unroll, no
remarkable regression and improvement was observed.

BR,
Kewen

next prev parent reply	other threads:[~2021-07-10  2:11 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-08 12:38 Richard Sandiford
2021-07-08 12:39 ` [PATCH 01/10] vect: Simplify epilogue reduction code Richard Sandiford
2021-07-08 12:58   ` Richard Biener
2021-07-08 12:39 ` [PATCH 02/10] vect: Create array_slice of live-out stmts Richard Sandiford
2021-07-08 12:58   ` Richard Biener
2021-07-08 12:39 ` [PATCH 03/10] vect: Remove new_phis from Richard Sandiford
2021-07-08 12:59   ` Richard Biener
2021-07-08 12:40 ` [PATCH 04/10] vect: Ensure reduc_inputs always have vectype Richard Sandiford
2021-07-08 13:01   ` Richard Biener
2021-07-13  9:26     ` Richard Sandiford
2021-07-08 12:40 ` [PATCH 05/10] vect: Add a vect_phi_initial_value helper function Richard Sandiford
2021-07-08 13:05   ` Richard Biener
2021-07-08 13:12     ` Richard Sandiford
2021-07-08 12:40 ` [PATCH 06/10] vect: Pass reduc_info to get_initial_defs_for_reduction Richard Sandiford
2021-07-08 13:10   ` Richard Biener
2021-07-08 16:48     ` Richard Sandiford
2021-07-09 11:33       ` Richard Biener
2021-07-08 12:41 ` [PATCH 07/10] vect: Pass reduc_info to get_initial_def_for_reduction Richard Sandiford
2021-07-08 12:41 ` [PATCH 08/10] vect: Generalise neutral_op_for_slp_reduction Richard Sandiford
2021-07-08 13:13   ` Richard Biener
2021-07-08 12:41 ` [PATCH 09/10] vect: Simplify get_initial_def_for_reduction Richard Sandiford
2021-07-08 13:14   ` Richard Biener
2021-07-08 12:43 ` [PATCH 10/10] vect: Reuse reduction accumulators between loops Richard Sandiford
2021-07-09 11:58   ` Richard Biener
2021-07-09 13:12     ` Richard Sandiford
2021-07-12  6:32       ` Richard Biener
2021-07-12 17:55         ` Richard Sandiford
2021-07-13  6:09           ` Richard Biener
2021-07-10  2:11 ` Kewen.Lin [this message]
2021-07-13  9:27   ` [PATCH 00/10] " Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7b5ba7c1-4e8b-9aab-c0f3-ce05a701c899@linux.ibm.com \
    --to=linkw@linux.ibm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).