public inbox for gcc-patches@gcc.gnu.org
From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Richard Sandiford <richard.sandiford@arm.com>,
	bin.cheng@linux.alibaba.com, Richard Biener <rguenther@suse.de>
Subject: [RFC][ivopts] Generate better code for IVs with uses outside the loop (was Re: [RFC] Implementing detection of saturation and rounding arithmetic)
Date: Thu, 3 Jun 2021 17:41:31 +0100
Message-ID: <97def54f-e6da-935d-f6ca-21994ea4e286@arm.com>
In-Reply-To: <89750da9-54d1-6a21-ecff-0e10d3236b40@arm.com>

Streams got crossed there and I used the wrong subject...

On 03/06/2021 17:34, Andre Vieira (lists) via Gcc-patches wrote:
> Hi,
>
> This RFC is motivated by the IV sharing RFC in
> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569502.html and the
> need for the IVOPTs pass to be able to clean up IVs shared between
> multiple loops. When reproducing a similar problem in C, I noticed
> that IVOPTs treats IVs with uses outside the loop differently. This
> doesn't even require multiple loops; take, for instance, the following
> example using SVE intrinsics:
>
> #include <arm_sve.h>
> #include <limits.h>
> extern void use (char *);
> void bar (char * __restrict__ a, char * __restrict__ b,
>           char * __restrict__ c, unsigned n)
> {
>   svbool_t all_true = svptrue_b8 ();
>   unsigned i = 0;
>   if (n < (UINT_MAX - svcntb () - 1))
>     {
>       for (; i < n; i += svcntb ())
>         {
>           svuint8_t va = svld1 (all_true, (uint8_t *) a);
>           svuint8_t vb = svld1 (all_true, (uint8_t *) b);
>           svst1 (all_true, (uint8_t *) c, svadd_z (all_true, va, vb));
>           a += svcntb ();
>           b += svcntb ();
>           c += svcntb ();
>         }
>     }
>   use (a);
> }
>
> IVOPTs tends to generate a shared IV for SVE memory accesses, as we
> don't have a post-increment addressing mode for SVE loads/stores. If we
> had not included 'use (a);' in this example, IVOPTs would have replaced
> the IVs for a, b and c with a single one (also used for loop control).
> See:
>
>   <bb 4> [local count: 955630225]:
>   # ivtmp.7_8 = PHI <ivtmp.7_25(7), 0(6)>
>   va_14 = MEM <svuint8_t> [(unsigned char *)a_10(D) + ivtmp.7_8 * 1];
>   vb_15 = MEM <svuint8_t> [(unsigned char *)b_11(D) + ivtmp.7_8 * 1];
>   _2 = svadd_u8_z ({ -1, ... }, va_14, vb_15);
>   MEM <__SVUint8_t> [(unsigned char *)c_12(D) + ivtmp.7_8 * 1] = _2;
>   ivtmp.7_25 = ivtmp.7_8 + POLY_INT_CST [16, 16];
>   i_23 = (unsigned int) ivtmp.7_25;
>   if (n_9(D) > i_23)
>     goto <bb 7>; [89.00%]
>   else
>     goto <bb 5>; [11.00%]
>
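A rough scalar C model of the single-IV form above may help when reading the dump. It is a sketch under stated assumptions, not what GCC emits: `vl` is a hypothetical parameter standing in for svcntb(), the inner byte loop stands in for one SVE vector add, and n is assumed to be a multiple of vl.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the single-IV form: one byte offset `off`
   (ivtmp.7 in the dump) addresses a, b and c and also controls the exit.
   `vl` stands in for svcntb(); n is assumed to be a multiple of vl.  */
void bar_shared_iv (const uint8_t *a, const uint8_t *b, uint8_t *c,
                    size_t n, size_t vl)
{
  for (size_t off = 0; off < n; off += vl)
    for (size_t k = 0; k < vl; k++)   /* one "vector" add of vl bytes */
      c[off + k] = (uint8_t) (a[off + k] + b[off + k]);
}
```

The addition wraps modulo 256, matching an all-true predicated svadd_z on bytes.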
> However, because of the 'use (a);', it creates two IVs: one shared by
> loop control, b and c, and one for a. See:
>
>   <bb 4> [local count: 955630225]:
>   # a_28 = PHI <a_18(7), a_11(D)(6)>
>   # ivtmp.7_25 = PHI <ivtmp.7_24(7), 0(6)>
>   va_15 = MEM <svuint8_t> [(unsigned char *)a_28];
>   vb_16 = MEM <svuint8_t> [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
>   _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
>   MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
>   a_18 = a_28 + POLY_INT_CST [16, 16];
>   ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
>   i_8 = (unsigned int) ivtmp.7_24;
>   if (n_10(D) > i_8)
>     goto <bb 7>; [89.00%]
>   else
>     goto <bb 10>; [11.00%]
>
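Under the same assumptions (`vl` a hypothetical stand-in for svcntb(), n a multiple of vl), the two-IV form corresponds roughly to the model below. Returning `a` stands in for the call use (a); the function name is illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the two-IV form.  `a` keeps its own
   pointer IV (a_28/a_18 in the dump) because its final value is needed
   after the loop; b, c and the exit test share the offset IV `off`.
   Returning `a` stands in for use (a); `vl` stands in for svcntb().  */
const uint8_t *bar_two_ivs (const uint8_t *a, const uint8_t *b, uint8_t *c,
                            size_t n, size_t vl)
{
  size_t off = 0;
  if (n == 0)
    return a;
  do
    {
      for (size_t k = 0; k < vl; k++)
        c[off + k] = (uint8_t) (a[k] + b[off + k]);
      a += vl;     /* a_18 = a_28 + VL: the extra IV update */
      off += vl;   /* ivtmp.7_24 = ivtmp.7_25 + VL */
    }
  while (off < n);
  return a;
}
```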
> With the first patch attached to this RFC, 'no_cost.patch', I tell
> IVOPTs not to cost uses outside of the loop. This makes IVOPTs
> generate a single IV, but unfortunately it computes the variable for
> the outside use inside the loop, and it also uses the pre-increment
> value of the shared IV and adds POLY_INT_CST [16, 16] to it. See:
>
>   <bb 4> [local count: 955630225]:
>   # ivtmp.7_25 = PHI <ivtmp.7_24(7), 0(6)>
>   va_15 = MEM <svuint8_t> [(unsigned char *)a_11(D) + ivtmp.7_25 * 1];
>   vb_16 = MEM <svuint8_t> [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
>   _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
>   MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
>   _8 = (unsigned long) a_11(D);
>   _7 = _8 + ivtmp.7_25;
>   _6 = _7 + POLY_INT_CST [16, 16];
>   a_18 = (char * restrict) _6;
>   ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
>   i_5 = (unsigned int) ivtmp.7_24;
>   if (n_10(D) > i_5)
>     goto <bb 7>; [89.00%]
>   else
>     goto <bb 10>; [11.00%]
>
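The sketch below models only how the outside-use value of `a` is rebuilt in this form (the loads and stores are omitted). As before, `vl` is a hypothetical stand-in for svcntb(), n is assumed to be a multiple of vl, and the function name is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the no_cost.patch result: the value of `a` needed
   after the loop is recomputed on every iteration from the pre-increment
   IV, so an extra add of VL is needed (_6 = _7 + POLY_INT_CST [16, 16]).  */
const uint8_t *final_a_pre_increment (const uint8_t *a0, size_t n, size_t vl)
{
  const uint8_t *a = a0;
  size_t off = 0;
  if (n == 0)
    return a;
  do
    {
      a = a0 + off + vl;   /* built from ivtmp.7_25, the pre-increment IV */
      off += vl;           /* ivtmp.7_24 = ivtmp.7_25 + VL */
    }
  while (off < n);
  return a;
}
```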
> With the patch 'var_after.patch', I make get_computation_aff_1 use
> 'cand->var_after' for outside uses, thus using the post-increment
> variable of the candidate IV. This means I have to insert the
> computation in a different place and make sure to delete the old
> use->stmt. I'm sure there is a better way to do this using IVOPTs'
> current framework, but I haven't found one yet. See the result:
>
>   <bb 4> [local count: 955630225]:
>   # ivtmp.7_25 = PHI <ivtmp.7_24(7), 0(6)>
>   va_15 = MEM <svuint8_t> [(unsigned char *)a_11(D) + ivtmp.7_25 * 1];
>   vb_16 = MEM <svuint8_t> [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
>   _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
>   MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
>   ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
>   _8 = (unsigned long) a_11(D);
>   _7 = _8 + ivtmp.7_24;
>   a_18 = (char * restrict) _7;
>   i_6 = (unsigned int) ivtmp.7_24;
>   if (n_10(D) > i_6)
>     goto <bb 7>; [89.00%]
>   else
>     goto <bb 10>; [11.00%]
>
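Modelled the same way (hypothetical `vl` for svcntb(), loads and stores omitted, n a multiple of vl), building the outside-use value from the post-increment IV removes the extra add of VL, although the computation still runs on every iteration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the var_after.patch result: the outside-use value
   is built from the post-increment IV (cand->var_after, ivtmp.7_24), so
   no extra add of VL is needed, but `a` is still recomputed per iteration.  */
const uint8_t *final_a_post_increment (const uint8_t *a0, size_t n, size_t vl)
{
  const uint8_t *a = a0;
  size_t off = 0;
  if (n == 0)
    return a;
  do
    {
      off += vl;       /* ivtmp.7_24 = ivtmp.7_25 + VL */
      a = a0 + off;    /* a_18 built directly from ivtmp.7_24 */
    }
  while (off < n);
  return a;
}
```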
>
> This is still not optimal, as we are still doing the update inside the
> loop and there is absolutely no need for that. I found that running
> sink would solve it, and it seems someone has added a second sink pass,
> so that saves me a third patch :) See the result after sink2:
>
>   <bb 4> [local count: 955630225]:
>   # ivtmp.7_25 = PHI <ivtmp.7_24(7), 0(6)>
>   va_15 = MEM <svuint8_t> [(unsigned char *)a_11(D) + ivtmp.7_25 * 1];
>   vb_16 = MEM <svuint8_t> [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
>   _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
>   MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
>   ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
>   i_6 = (unsigned int) ivtmp.7_24;
>   if (i_6 < n_10(D))
>     goto <bb 7>; [89.00%]
>   else
>     goto <bb 10>; [11.00%]
>
>   <bb 10> [local count: 105119324]:
>   _8 = (unsigned long) a_11(D);
>   _7 = _8 + ivtmp.7_24;
>   a_18 = (char * restrict) _7;
>   goto <bb 5>; [100.00%]
>
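In the same hypothetical scalar model (`vl` standing in for svcntb(), n a multiple of vl), the code after sink2 keeps only the shared IV live in the loop and computes `a` once on the exit path:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the code after sink2: only the shared IV is
   updated inside the loop, and the value of `a` needed by use (a) is
   computed once, on the exit path (bb 10 in the dump).  */
const uint8_t *final_a_sunk (const uint8_t *a0, size_t n, size_t vl)
{
  size_t off = 0;
  if (n == 0)
    return a0;
  do
    off += vl;         /* ivtmp.7_24 = ivtmp.7_25 + VL */
  while (off < n);
  return a0 + off;     /* a_18 = a_11(D) + ivtmp.7_24, once per exit */
}
```

All three models return the same pointer for the same inputs; they differ only in how much work the loop body does to produce it.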
>
> I haven't tested this at all, but I wanted to get the opinion of
> someone more knowledgeable in IVOPTs before continuing down this
> avenue. I have two main questions:
> 1) How should we be costing outside uses? Right now I use no cost,
> but that's not entirely accurate. Should we use a constant multiplying
> factor for inside-loop uses to make them outweigh outside uses? Should
> we use the iteration count if available? Do we want a backend hook
> to let targets provide their own costing for these?
> 2) Is there a cleaner way to generate the optimal 'post-increment' use
> for the outside-use variable? I first thought the position in the
> candidate might be something I could use, or even the var_at_stmt
> functionality, but the outside IV has the actual increment of the
> variable as its use, rather than the outside uses. I find this to be
> this RFC's main weakness.
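One way to picture the first two options in question 1 is a weighting function like the one below. It is purely a hypothetical sketch: none of these names exist in GCC, and the constant factor is arbitrary. It assumes an inside-loop use executes once per iteration and an outside use once per loop exit.

```c
#include <stdint.h>

/* Hypothetical illustration of question 1; not GCC code.  An inside-loop
   use is scaled by the iteration count when it is known, or by an assumed
   constant factor otherwise.  An outside use is not scaled.  */
enum use_place { USE_INSIDE_LOOP, USE_OUTSIDE_LOOP };

#define ASSUMED_INSIDE_FACTOR 8u   /* arbitrary, for illustration only */

uint64_t weighted_use_cost (uint64_t cost, enum use_place place,
                            int niters_known, uint64_t niters)
{
  if (place == USE_OUTSIDE_LOOP)
    return cost;
  return cost * (niters_known ? niters : ASSUMED_INSIDE_FACTOR);
}
```

A backend hook, the third option, would amount to letting the target supply the factor instead of a fixed constant.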
>
> Kind regards,
> Andre
>


Thread overview: 5+ messages
2021-06-03 16:34 [RFC] Implementing detection of saturation and rounding arithmetic Andre Vieira (lists)
2021-06-03 16:41 ` Andre Vieira (lists) [this message]
2021-06-07 11:28 ` Bin.Cheng
2021-06-08 15:00   ` Andre Simoes Dias Vieira
2021-06-10 11:51     ` [RFC][ivopts] Generate better code for IVs with uses outside the loop Andre Vieira (lists)
