RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: Richard Sandiford <Richard.Sandiford@arm.com>,
	Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>, "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
Subject: RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.
Date: Fri, 23 Sep 2022 13:13:53 +0000	[thread overview]
Message-ID: <VI1PR08MB5325F786B64431564DA3D6E1FF519@VI1PR08MB5325.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2209231245260.6652@jbgna.fhfr.qr>

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, September 23, 2022 8:54 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina via
> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>;
> juzhe.zhong@rivai.ai
> Subject: RE: [PATCH]middle-end Add optimized float addsub without
> needing VEC_PERM_EXPR.
> 
> On Fri, 23 Sep 2022, Tamar Christina wrote:
> 
> > Hi,
> >
> > Attached is the respun version of the patch,
> >
> > > >>
> > > >> Wouldn't a target need to re-check if lanes are NaN or denormal
> > > >> if after a SFmode lane operation a DFmode lane operation follows?
> > > >> IIRC that is what usually makes punning "integer" vectors as FP vectors
> costly.
> >
> > I don't believe this is a problem, due to NANs not being a single
> > value and according to the standard the sign bit doesn't change the
> meaning of a NAN.
> >
> > That's why specifically for negates generally no check is performed
> > and it's Assumed that if a value is a NaN going in, it's a NaN coming
> > out, and this Optimization doesn't change that.  Also under fast-math
> > we don't guarantee a stable representation for NaN (or zeros, etc) afaik.
> >
> > So if that is still a concern I could add && !HONORS_NAN () to the
> constraints.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* match.pd: Add fneg/fadd rule.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.target/aarch64/simd/addsub_1.c: New test.
> > 	* gcc.target/aarch64/sve/addsub_1.c: New test.
> >
> > --- inline version of patch ---
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd index
> >
> 1bb936fc4010f98f24bb97671350e8432c55b347..2617d56091dfbd41ae49f980e
> e0a
> > f3757f5ec1cf 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -7916,6 +7916,59 @@ and,
> >    (simplify (reduc (op @0 VECTOR_CST@1))
> >      (op (reduc:type @0) (reduc:type @1))))
> >
> > +/* Simplify vector floating point operations of alternating sub/add pairs
> > +   into using an fneg of a wider element type followed by a normal add.
> > +   under IEEE 754 the fneg of the wider type will negate every even entry
> > +   and when doing an add we get a sub of the even and add of every odd
> > +   elements.  */
> > +(simplify
> > + (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)  (if
> > +(!VECTOR_INTEGER_TYPE_P (type) && !BYTES_BIG_ENDIAN)
> 
> shouldn't this be FLOAT_WORDS_BIG_ENDIAN instead?
> 
> I'm still concerned what
> 
>  (neg:V2DF (subreg:V2DF (reg:V4SF) 0))
> 
> means for architectures like RISC-V.  Can one "reformat" FP values in vector
> registers so that two floats overlap a double (and then back)?
> 
> I suppose you rely on target_can_change_mode_class to tell you that.

Indeed, the documentation says:

"This hook returns true if it is possible to bitcast values held in registers of class rclass
from mode from to mode to and if doing so preserves the low-order bits that are
common to both modes. The result is only meaningful if rclass has registers that can
hold both from and to."

This implies to me that if the bitcast shouldn't be possible the hook should reject it.
Of course you always where something is possible, but perhaps not cheap to do.

The specific implementation for RISC-V seem to imply to me that they disallow any FP
conversions. So seems to be ok.

> 
> 
> > +  (with
> > +   {
> > +     /* Build a vector of integers from the tree mask.  */
> > +     vec_perm_builder builder;
> > +     if (!tree_to_vec_perm_builder (&builder, @2))
> > +       return NULL_TREE;
> > +
> > +     /* Create a vec_perm_indices for the integer vector.  */
> > +     poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
> > +     vec_perm_indices sel (builder, 2, nelts);
> > +   }
> > +   (if (sel.series_p (0, 2, 0, 2))
> > +    (with
> > +     {
> > +       machine_mode vec_mode = TYPE_MODE (type);
> > +       auto elem_mode = GET_MODE_INNER (vec_mode);
> > +       auto nunits = exact_div (GET_MODE_NUNITS (vec_mode), 2);
> > +       tree stype;
> > +       switch (elem_mode)
> > +	 {
> > +	 case E_HFmode:
> > +	   stype = float_type_node;
> > +	   break;
> > +	 case E_SFmode:
> > +	   stype = double_type_node;
> > +	   break;
> > +	 default:
> > +	   return NULL_TREE;
> > +	 }
> 
> Can't you use GET_MODE_WIDER_MODE and double-check the mode-size
> doubles?  I mean you obviously miss DFmode -> TFmode.

Problem is I need the type, not the mode, but all even build_pointer_type_for_mode
requires the new scalar type.  So I couldn't find anything to help here given that there's
no inverse relationship between modes and types.

> 
> > +       tree ntype = build_vector_type (stype, nunits);
> > +       if (!ntype)
> 
> You want to check that the above results in a vector mode.

Does it? Technically you can cast a V2SF to both a V1DF or DF can't you?
Both seem equally valid here.

> > +	 return NULL_TREE;
> > +
> > +       /* The format has to have a simple sign bit.  */
> > +       const struct real_format *fmt = FLOAT_MODE_FORMAT (vec_mode);
> > +       if (fmt == NULL)
> > +	 return NULL_TREE;
> > +     }
> > +     (if (fmt->signbit_rw == GET_MODE_UNIT_BITSIZE (vec_mode) - 1
> 
> shouldn't this be a check on the component mode?  I think you'd want to
> check that the bigger format signbit_rw is equal to the smaller format mode
> size plus its signbit_rw or so?

Tbh, both are somewhat weak guarantees.  In a previous patch of mine I'd added a new field "is_ieee"
to the real formats to denote that they are an IEEE type.  Maybe I should revive that instead?

Regards,
Tamar

> 
> > +	  && fmt->signbit_rw == fmt->signbit_ro
> > +	  && targetm.can_change_mode_class (TYPE_MODE (ntype),
> TYPE_MODE (type), ALL_REGS)
> > +	  && (optimize_vectors_before_lowering_p ()
> > +	      || target_supports_op_p (ntype, NEGATE_EXPR, optab_vector)))
> > +      (plus (view_convert:type (negate (view_convert:ntype @1)))
> > +@0)))))))
> > +
> >  (simplify
> >   (vec_perm @0 @1 VECTOR_CST@2)
> >   (with
> > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> > b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> > new file mode 100644
> > index
> >
> 0000000000000000000000000000000000000000..1fb91a34c421bbd2894faa0db
> bf1
> > b47ad43310c4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> > @@ -0,0 +1,56 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
> > +/* { dg-options "-Ofast" } */
> > +/* { dg-add-options arm_v8_2a_fp16_neon } */
> > +/* { dg-final { check-function-bodies "**" "" "" { target { le } } }
> > +} */
> > +
> > +#pragma GCC target "+nosve"
> > +
> > +/*
> > +** f1:
> > +** ...
> > +**	fneg	v[0-9]+.2d, v[0-9]+.2d
> > +**	fadd	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> > +** ...
> > +*/
> > +void f1 (float *restrict a, float *restrict b, float *res, int n) {
> > +   for (int i = 0; i < (n & -4); i+=2)
> > +    {
> > +      res[i+0] = a[i+0] + b[i+0];
> > +      res[i+1] = a[i+1] - b[i+1];
> > +    }
> > +}
> > +
> > +/*
> > +** d1:
> > +** ...
> > +** 	fneg	v[0-9]+.4s, v[0-9]+.4s
> > +** 	fadd	v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h
> > +** ...
> > +*/
> > +void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res,
> > +int n) {
> > +   for (int i = 0; i < (n & -8); i+=2)
> > +    {
> > +      res[i+0] = a[i+0] + b[i+0];
> > +      res[i+1] = a[i+1] - b[i+1];
> > +    }
> > +}
> > +
> > +/*
> > +** e1:
> > +** ...
> > +** 	fadd	v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
> > +** 	fsub	v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
> > +** 	ins	v[0-9]+.d\[1\], v[0-9]+.d\[1\]
> > +** ...
> > +*/
> > +void e1 (double *restrict a, double *restrict b, double *res, int n)
> > +{
> > +   for (int i = 0; i < (n & -4); i+=2)
> > +    {
> > +      res[i+0] = a[i+0] + b[i+0];
> > +      res[i+1] = a[i+1] - b[i+1];
> > +    }
> > +}
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> > b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> > new file mode 100644
> > index
> >
> 0000000000000000000000000000000000000000..ea7f9d9db2c8c9a3efe5c7951a
> 31
> > 4a29b7a7a922
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> > @@ -0,0 +1,52 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Ofast" } */
> > +/* { dg-final { check-function-bodies "**" "" "" { target { le } } }
> > +} */
> > +
> > +/*
> > +** f1:
> > +** ...
> > +** 	fneg	z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
> > +** 	fadd	z[0-9]+.s, z[0-9]+.s, z[0-9]+.s
> > +** ...
> > +*/
> > +void f1 (float *restrict a, float *restrict b, float *res, int n) {
> > +   for (int i = 0; i < (n & -4); i+=2)
> > +    {
> > +      res[i+0] = a[i+0] + b[i+0];
> > +      res[i+1] = a[i+1] - b[i+1];
> > +    }
> > +}
> > +
> > +/*
> > +** d1:
> > +** ...
> > +** 	fneg	z[0-9]+.s, p[0-9]+/m, z[0-9]+.s
> > +** 	fadd	z[0-9]+.h, z[0-9]+.h, z[0-9]+.h
> > +** ...
> > +*/
> > +void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res,
> > +int n) {
> > +   for (int i = 0; i < (n & -8); i+=2)
> > +    {
> > +      res[i+0] = a[i+0] + b[i+0];
> > +      res[i+1] = a[i+1] - b[i+1];
> > +    }
> > +}
> > +
> > +/*
> > +** e1:
> > +** ...
> > +** 	fsub	z[0-9]+.d, z[0-9]+.d, z[0-9]+.d
> > +** 	movprfx	z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
> > +** 	fadd	z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
> > +** ...
> > +*/
> > +void e1 (double *restrict a, double *restrict b, double *res, int n)
> > +{
> > +   for (int i = 0; i < (n & -4); i+=2)
> > +    {
> > +      res[i+0] = a[i+0] + b[i+0];
> > +      res[i+1] = a[i+1] - b[i+1];
> > +    }
> > +}
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> Boudien Moerman; HRB 36809 (AG Nuernberg)

next prev parent reply	other threads:[~2022-09-23 13:14 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-16 10:58 Tamar Christina
2022-06-17 20:33 ` Andrew Pinski
2022-06-18 10:49   ` Richard Biener
2022-06-20 10:00     ` Tamar Christina
2022-06-20 11:56       ` Richard Biener
2022-06-20 12:05         ` Tamar Christina
2022-06-20 13:10           ` Richard Sandiford
2022-09-23  9:11             ` Tamar Christina
2022-09-23 12:54               ` Richard Biener
2022-09-23 13:07                 ` 钟居哲
2022-09-23 13:13                 ` Tamar Christina [this message]
2022-09-23 13:54                   ` Tamar Christina
2022-09-26 11:10                     ` Richard Biener
2022-10-31 11:38                       ` Tamar Christina
2022-10-31 15:49                         ` Jeff Law
2022-07-01 23:07         ` Jeff Law
2022-07-01 22:57   ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=VI1PR08MB5325F786B64431564DA3D6E1FF519@VI1PR08MB5325.eurprd08.prod.outlook.com \
    --to=tamar.christina@arm.com \
    --cc=Richard.Sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=juzhe.zhong@rivai.ai \
    --cc=nd@arm.com \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).