RE: [PATCH]AArch64 Add SVE implementation for cond_copysign.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Sandiford <Richard.Sandiford@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>, Richard Earnshaw <Richard.Earnshaw@arm.com>,
	Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
	Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Subject: RE: [PATCH]AArch64 Add SVE implementation for cond_copysign.
Date: Thu, 5 Oct 2023 20:45:10 +0000	[thread overview]
Message-ID: <VI1PR08MB53255672DF89DF1ADC094412FFCAA@VI1PR08MB5325.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <mpt1qe8n7ef.fsf@arm.com>

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Thursday, October 5, 2023 9:26 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> Sent: Thursday, October 5, 2023 8:29 PM
> >> To: Tamar Christina <Tamar.Christina@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> >> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> >> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> >> Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
> >>
> >> Tamar Christina <tamar.christina@arm.com> writes:
> >> > Hi All,
> >> >
> >> > This adds an implementation for masked copysign along with an
> >> > optimized pattern for masked copysign (x, -1).
> >>
> >> It feels like we're ending up with a lot of AArch64-specific code
> >> that just hard- codes the observation that changing the sign is
> >> equivalent to changing the top bit.  We then need to make sure that
> >> we choose the best way of changing the top bit for any given situation.
> >>
> >> Hard-coding the -1/negative case is one instance of that.  But it
> >> looks like we also fail to use the best sequence for SVE2.  E.g.
> >> [https://godbolt.org/z/ajh3MM5jv]:
> >>
> >> #include <stdint.h>
> >>
> >> void f(double *restrict a, double *restrict b) {
> >>     for (int i = 0; i < 100; ++i)
> >>         a[i] = __builtin_copysign(a[i], b[i]); }
> >>
> >> void g(uint64_t *restrict a, uint64_t *restrict b, uint64_t c) {
> >>     for (int i = 0; i < 100; ++i)
> >>         a[i] = (a[i] & ~c) | (b[i] & c); }
> >>
> >> gives:
> >>
> >> f:
> >>         mov     x2, 0
> >>         mov     w3, 100
> >>         whilelo p7.d, wzr, w3
> >> .L2:
> >>         ld1d    z30.d, p7/z, [x0, x2, lsl 3]
> >>         ld1d    z31.d, p7/z, [x1, x2, lsl 3]
> >>         and     z30.d, z30.d, #0x7fffffffffffffff
> >>         and     z31.d, z31.d, #0x8000000000000000
> >>         orr     z31.d, z31.d, z30.d
> >>         st1d    z31.d, p7, [x0, x2, lsl 3]
> >>         incd    x2
> >>         whilelo p7.d, w2, w3
> >>         b.any   .L2
> >>         ret
> >> g:
> >>         mov     x3, 0
> >>         mov     w4, 100
> >>         mov     z29.d, x2
> >>         whilelo p7.d, wzr, w4
> >> .L6:
> >>         ld1d    z30.d, p7/z, [x0, x3, lsl 3]
> >>         ld1d    z31.d, p7/z, [x1, x3, lsl 3]
> >>         bsl     z31.d, z31.d, z30.d, z29.d
> >>         st1d    z31.d, p7, [x0, x3, lsl 3]
> >>         incd    x3
> >>         whilelo p7.d, w3, w4
> >>         b.any   .L6
> >>         ret
> >>
> >> I saw that you originally tried to do this in match.pd and that the
> >> decision was to fold to copysign instead.  But perhaps there's a
> >> compromise where isel does something with the (new) copysign canonical
> form?
> >> I.e. could we go with your new version of the match.pd patch, and add
> >> some isel stuff as a follow-on?
> >>
> >
> > Sure if that's what's desired.... But..
> >
> > The example you posted above is for instance worse for x86
> > https://godbolt.org/z/x9ccqxW6T where the first operation has a
> > dependency chain of 2 and the latter of 3.  It's likely any open coding of this
> operation is going to hurt a target.
> >
> > So I'm unsure what isel transform this into...
> 
> I didn't mean that we should go straight to using isel for the general case, just
> for the new case.  The example above was instead trying to show the general
> point that hiding the logic ops in target code is a double-edged sword.

I see.. but the problem here is that transforming copysign (x, -1) into
(x | 0x8000000) would require an integer operation on an FP value.  I'm happy to
do it but it seems like it'll be an AArch64 only thing anyway.

If we want to do this we need to check can_change_mode_class or a hook.
Most targets including x86 reject the conversion.  So it'll just be effectively an AArch64
thing.

You're right that the actual equivalent transformation is this https://godbolt.org/z/KesfrMv5z
But the target won't allow it.

> 
> The x86_64 example for the -1 case would be
> https://godbolt.org/z/b9s6MaKs8 where the isel change would be an
> improvement.  Without that, I guess
> x86_64 will need to have a similar patch to the AArch64 one.
> 

I think that's to be expected.  I think it's logical that every target just needs to implement
their optabs optimally.

> That said, https://godbolt.org/z/e6nqoqbMh suggests that powerpc64 is
> probably relying on the current copysign -> neg/abs transform.
> (Not sure why the second function uses different IVs from the first.)
> 
> Personally, I wouldn't be against a target hook that indicated whether float bit
> manipulation is "free" for a given mode, if it comes to that.

I'm really sure Richi would agree there.  Generally speaking I don't think people see
doing FP operations on Int as beneficial.  But it's always the case on AArch64.

But sure, if you believe it to be beneficial I can follow up with a patch, but I'd still
need this one to allow the folding of the VEC_PERM_EXPR away.

Thanks,
Tamar

> 
> Thanks,
> Richard

next prev parent reply	other threads:[~2023-10-05 20:45 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-05 18:21 Tamar Christina
2023-10-05 19:28 ` Richard Sandiford
2023-10-05 19:47   ` Tamar Christina
2023-10-05 20:25     ` Richard Sandiford
2023-10-05 20:45       ` Tamar Christina [this message]
2023-10-06  7:32         ` Richard Biener
2023-10-07  9:57           ` Richard Sandiford
2023-10-09  9:38             ` Tamar Christina
2023-10-09  9:45               ` Richard Biener
2023-10-09  9:55                 ` Tamar Christina
2023-10-09  9:56               ` Richard Sandiford
2023-10-09 10:09                 ` Tamar Christina
2023-10-09 10:17                   ` Richard Sandiford
2023-10-09 11:30                     ` Richard Biener
2023-10-05 20:34     ` Andrew Pinski
2023-10-19 21:29 ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=VI1PR08MB53255672DF89DF1ADC094412FFCAA@VI1PR08MB5325.eurprd08.prod.outlook.com \
    --to=tamar.christina@arm.com \
    --cc=Kyrylo.Tkachov@arm.com \
    --cc=Marcus.Shawcroft@arm.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=Richard.Sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=nd@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).