Re: [RFC][PATCH]Merge VEC_COND_EXPR into MASK_STORE after loop vectorization

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <richard.guenther@gmail.com>
To: renlin.li@foss.arm.com
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	Richard Sandiford <richard.sandiford@arm.com>,
		Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>,
	James Greenhalgh <James.Greenhalgh@arm.com>
Subject: Re: [RFC][PATCH]Merge VEC_COND_EXPR into MASK_STORE after loop vectorization
Date: Fri, 09 Nov 2018 11:48:00 -0000	[thread overview]
Message-ID: <CAFiYyc0cSGYvP0sLJgxfEBYKicPrPSHUSGOpcr-osS8eLrMQUQ@mail.gmail.com> (raw)
In-Reply-To: <afaa665f-3fa0-38cd-28ba-bd1ed1630b0b@foss.arm.com>

On Thu, Nov 8, 2018 at 5:55 PM Renlin Li <renlin.li@foss.arm.com> wrote:
>
> Hi Richard,
>
> On 11/08/2018 12:09 PM, Richard Biener wrote:
> > On Thu, Nov 8, 2018 at 12:02 PM Renlin Li <renlin.li@foss.arm.com> wrote:
> >>
> >> Hi all,
> >>
> >> When allow-store-data-races is enabled, ifcvt would prefer to generated
> >> conditional select and unconditional store to convert certain if statement
> >> into:
> >>
> >> _ifc_1 = val
> >> _ifc_2 = A[i]
> >> val = cond? _ifc_1 : _ifc_2
> >> A[i] = val
> >>
> >> When the loop got vectorized, this pattern will be turned into
> >> MASK_LOAD, VEC_COND_EXPR and MASK_STORE. This could be improved.
> >
> > I'm somewhat confused - the vectorizer doesn't generate a masked store when
> > if-conversion didn't create one in the first place
> >
> > In particular with allow-store-data-races=1 (what your testcase uses)
> > there are no
> > masked loads/stores generated at all.   So at least you need a better testcase
> > to motivate (one that doesn't load from array[i] so that we know the conditional
> > stores might trap).
>
> Thanks for trying this. The test case is a little bit simple and artificial.
> ifcvt won't generate mask_store, instead it will generate unconditional store with allow-store-data-races=1.
>
> My build is based on 25th Oct. I got the following IR from ifcvt with
> aarch64-none-elf-gcc -S -march=armv8-a+sve -O2 -ftree-vectorize --param allow-store-data-races=1
>
>    <bb 3> [local count: 1006632961]:
>    # i_20 = PHI <i_13(9), 1(19)>
>    # ivtmp_18 = PHI <ivtmp_5(9), 15(19)>
>    a_10 = array[i_20];
>    _1 = a_10 & 1;
>    _2 = a_10 + 1;
>    _ifc__32 = array[i_20];
>    _ifc__33 = _2;
>    _ifc__34 = _1 != 0 ? _ifc__33 : _ifc__32;
>    array[i_20] = _ifc__34;
>    prephitmp_8 = _1 != 0 ? _2 : a_10;
>    _4 = a_10 + 2;
>    _ifc__35 = array[i_20];
>    _ifc__36 = _4;
>    _ifc__37 = prephitmp_8 > 10 ? _ifc__36 : _ifc__35;
>    array[i_20] = _ifc__37;
>    i_13 = i_20 + 1;
>    ivtmp_5 = ivtmp_18 - 1;
>    if (ivtmp_5 != 0)
>      goto <bb 9>; [93.33%]
>    else
>      goto <bb 8>; [6.67%]
>
> *However*, after I rebased my patch on the latest trunk.
> Got the following dump from ifcvt:
>    <bb 3> [local count: 1006632961]:
>    # i_20 = PHI <i_13(9), 1(21)>
>    # ivtmp_18 = PHI <ivtmp_5(9), 15(21)>
>    a_10 = array[i_20];
>    _1 = a_10 & 1;
>    _2 = a_10 + 1;
>    _ifc__34 = _1 != 0 ? _2 : a_10;
>    array[i_20] = _ifc__34;
>    _4 = a_10 + 2;
>    _ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
>    array[i_20] = _ifc__37;
>    i_13 = i_20 + 1;
>    ivtmp_5 = ivtmp_18 - 1;
>    if (ivtmp_5 != 0)
>      goto <bb 9>; [93.33%]
>    else
>      goto <bb 8>; [6.67%]
>
> the redundant load is not generated, but you could still see the unconditional store.

Yes, I fixed the redundant loads recently and indeed dead stores
remain (for the particular
testcase they would be easy to remove).

> After loop vectorization, the following is generated (without my change):

Huh.  But that's not because of if-conversion but because SVE needs to
mask _all_
loop operations that are not safe to execute with the loop_mask!

>    vect_a_10.6_6 = .MASK_LOAD (vectp_array.4_35, 4B, loop_mask_7);
>    a_10 = array[i_20];
>    vect__1.7_39 = vect_a_10.6_6 & vect_cst__38;
>    _1 = a_10 & 1;
>    vect__2.8_41 = vect_a_10.6_6 + vect_cst__40;
>    _2 = a_10 + 1;
>    vect__ifc__34.9_43 = VEC_COND_EXPR <vect__1.7_39 != vect_cst__42, vect__2.8_41, vect_a_10.6_6>;
>    _ifc__34 = _1 != 0 ? _2 : a_10;
>    .MASK_STORE (vectp_array.10_45, 4B, loop_mask_7, vect__ifc__34.9_43);
>    vect__4.12_49 = vect_a_10.6_6 + vect_cst__48;
>    _4 = a_10 + 2;
>    vect__ifc__37.13_51 = VEC_COND_EXPR <vect__ifc__34.9_43 > vect_cst__50, vect__4.12_49, vect__ifc__34.9_43>;
>    _ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
>    .MASK_STORE (vectp_array.14_53, 4B, loop_mask_7, vect__ifc__37.13_51);
>
> With the old ifcvt code, my change here could improve it a little bit, eliminate some redundant load.
> With the new code, it could not improved it further. I'll adjust the patch based on the latest trunk.

So what does the patch change the above to?  The code has little to no
comments apart from a
small picture with code _before_ the transform...

I was wondering whether we can implement

  l = [masked]load;
  tem = cond ? x : l;
  masked-store = tem;

pattern matching in a regular pass - forwprop for example.  Note the
load doesn't need to be masked,
correct?  In fact if it is masked you need to make sure the
conditional never accesses parts that
are masked in the load, no?  Or require the mask to be the same as
that used by the store.  But then
you still cannot simply replace the store mask with a new mask
generated from the conditional?

Richard.

>
> >
> > So what I see (with store data races not allowed) from ifcvt is
>
> when store data races is not allowed, we won't generate unconditional store. Instead ifcvt
> generates predicated store. That's what you showed here.
>
> As I mentioned, we could always make ifcvt generate mask_store as it should be always safe.
> But I don't know the performance implication on other targets (I assume there must be reasons why
> people write code to generate unconditional store when data-race is allowed? What I understand is that,
> this option allows the compiler to be more aggressive on optimization).
>
> The other reason is the data reference analysis. There might be versioned loop created with a
> more complexer test case.
>
> Again, I need to rebase and check my patch with the latest trunk, and need to come up with a better test case.
>
> >
> >    <bb 3> [local count: 1006632961]:
> >    # i_20 = PHI <i_13(9), 1(21)>
> >    # ivtmp_18 = PHI <ivtmp_5(9), 15(21)>
> >    a_10 = array[i_20];
> >    _1 = a_10 & 1;
> >    _2 = a_10 + 1;
> >    _32 = _1 != 0;
> >    _33 = &array[i_20];
> >    .MASK_STORE (_33, 32B, _32, _2);
> >    prephitmp_8 = _1 != 0 ? _2 : a_10;
> >    _4 = a_10 + 2;
> >    _34 = prephitmp_8 > 10;
> >    .MASK_STORE (_33, 32B, _34, _4);
> >    i_13 = i_20 + 1;
> >    ivtmp_5 = ivtmp_18 - 1;
> >    if (ivtmp_5 != 0)
> >
> > and what you want to do is merge
> >
> >    prephitmp_8 = _1 != 0 ? _2 : a_10;
> >    _34 = prephitmp_8 > 10;
> >
> > somehow?  But your patch mentions that _4 should be prephitmp_8 so
> > it wouldn't do anything here?
> >
> >> The change here add a post processing function to combine the VEC_COND_EXPR
> >> expression into MASK_STORE, and delete related dead code.
> >>
> >> I am a little bit conservative here.
> >> I didn't change the default behavior of ifcvt to always generate MASK_STORE,
> >> although it should be safe in all cases (allow or dis-allow store data race).
> >>
> >> MASK_STORE might not well handled in vectorization pass compared with
> >> conditional select. It might be too early and aggressive to do that in ifcvt.
> >> And the performance of MASK_STORE might not good for some platforms.
> >> (We could add --param or target hook to differentiate this ifcvt behavior
> >> on different platforms)
> >>
> >> Another reason I did not do that in ifcvt is the data reference
> >> analysis. To create a MASK_STORE, a pointer is created as the first
> >> argument to the internal function call. If the pointer is created out of
> >> array references, e.g. x = &A[i], data reference analysis could not properly
> >> analysis the relationship between MEM_REF (x) and ARRAY_REF (A, i). This
> >> will create a versioned loop beside the vectorized one.
> >
> > Actually for your testcase it won't vectorize because there's compile-time
> > aliasing (somehow we miss handling of dependence distance zero?!)
> >
> >> (I have hacks to look through the MEM_REF, and restore the reference back to
> >> ARRAY_REF (A, i).  Maybe we could do analysis on lowered address expression?
> >> I saw we have gimple laddress pass to aid the vectorizer)
> >
> > An old idea of mine is to have dependence analysis fall back to lowered address
> > form when it fails to match two references.  This would require re-analysis and
> > eventually storing an alternate "inner reference" in the data-ref structure.
>
> Yes, that makes sense.
>
> My hack is in get_references_in_stmt. Look through the pointer to see where it is generated.
> This is only for mask_store as we know the first operand is original a pointer or a new pointer from something.
>
>
> Thanks,
> Renlin
>
>
> >
> >> The approach here comes a little bit late, on the condition that vector
> >> MASK_STORE is generated by loop vectorizer. In this case, it is definitely
> >> beneficial to do the code transformation.
> >>
> >> Any thoughts on the best way to fix the issue?
> >>
> >>
> >> This patch has been tested with aarch64-none-elf, no regressions.
> >>
> >> Regards,
> >> Renlin
> >>
> >> gcc/ChangeLog:
> >>
> >> 2018-11-08  Renlin Li  <renlin.li@arm.com>
> >>
> >>          * tree-vectorizer.h (combine_sel_mask_store): Declare new function.
> >>          * tree-vect-loop.c (combine_sel_mask_store): Define new function.
> >>          * tree-vectorizer.c (vectorize_loops): Call it.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2018-11-08  Renlin Li  <renlin.li@arm.com>
> >>
> >>          * gcc.target/aarch64/sve/combine_vcond_mask_store_1.c: New.
> >>

next prev parent reply	other threads:[~2018-11-09 11:48 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-08 11:02 Renlin Li
2018-11-08 12:09 ` Richard Biener
2018-11-08 16:56   ` Renlin Li
2018-11-09 11:48     ` Richard Biener [this message]
2018-11-09 15:50       ` Renlin Li
2018-11-14 14:59         ` Richard Biener
2018-11-20 14:57           ` Renlin Li
2018-12-03 19:39             ` Jeff Law
2018-12-04 16:34               ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFiYyc0cSGYvP0sLJgxfEBYKicPrPSHUSGOpcr-osS8eLrMQUQ@mail.gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=James.Greenhalgh@arm.com \
    --cc=Ramana.Radhakrishnan@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=renlin.li@foss.arm.com \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).