From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenther@suse.de>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
	by sourceware.org (Postfix) with ESMTPS id 2C05C3858D32
	for <gcc-patches@gcc.gnu.org>; Mon, 26 Sep 2022 11:10:30 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2C05C3858D32
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out1.suse.de (Postfix) with ESMTP id DD27F21E8E;
	Mon, 26 Sep 2022 11:10:28 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1664190628; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=rkbZGfx7w+D/Fo+dHQSOirEa8PyyvEqXYgr1t8Cj4jc=;
	b=Ylu6ey37I7r3ar3bBuzbo9M7Kt45uNNguwciSAxCo7hWCw+TPAkxvQReCAwi8AYL//5FBA
	lK/wn704PrcpP+NPIUOUMif4eSH3XY+Sfoc4vu+KNlEkMq6rAgFwdqDZDsHW3dKbFmQoCD
	MazKGKWOQvvUfW+L/Iso14UZ99xEHbs=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1664190628;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=rkbZGfx7w+D/Fo+dHQSOirEa8PyyvEqXYgr1t8Cj4jc=;
	b=l/u8jtNgatL6PiorZh2rHRHxo/qVC7YSClA7c4hN5v145nwH7fPILuSkHTJjl/nHuHo6gr
	EYHOoDthmgXHBJDA==
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by relay2.suse.de (Postfix) with ESMTPS id CB49B2C152;
	Mon, 26 Sep 2022 11:10:28 +0000 (UTC)
Date: Mon, 26 Sep 2022 11:10:28 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: Tamar Christina <Tamar.Christina@arm.com>
cc: Richard Sandiford <Richard.Sandiford@arm.com>, nd <nd@arm.com>, 
    Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>, 
    "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
Subject: RE: [PATCH]middle-end Add optimized float addsub without needing
 VEC_PERM_EXPR.
In-Reply-To:  <VI1PR08MB5325E0C89FD6FEAFFF6A1660FF519@VI1PR08MB5325.eurprd08.prod.outlook.com>
Message-ID: <nycvar.YFH.7.77.849.2209261103180.6652@jbgna.fhfr.qr>
References: <CA+=Sn1=+BC0FpE0X6-ABOf2+1sCMy3Z=KAisSrpAnKfCcmv+NQ@mail.gmail.com> <1C4185AB-6EE6-4B8B-838C-465098DAFD3B@suse.de> <VI1PR08MB53257327F742E0B84F00C1C9FFB09@VI1PR08MB5325.eurprd08.prod.outlook.com> <997q6no-qsqp-1oro-52sp-899sr075p4po@fhfr.qr>
 <VI1PR08MB5325B83631F6DCA224E515D0FFB09@VI1PR08MB5325.eurprd08.prod.outlook.com> <mptletr7a5l.fsf@arm.com> <VI1PR08MB5325135B9198392717AB1616FF519@VI1PR08MB5325.eurprd08.prod.outlook.com> <nycvar.YFH.7.77.849.2209231245260.6652@jbgna.fhfr.qr>
 <VI1PR08MB5325F786B64431564DA3D6E1FF519@VI1PR08MB5325.eurprd08.prod.outlook.com>  <VI1PR08MB5325E0C89FD6FEAFFF6A1660FF519@VI1PR08MB5325.eurprd08.prod.outlook.com>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Fri, 23 Sep 2022, Tamar Christina wrote:

> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-
> > bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Tamar
> > Christina via Gcc-patches
> > Sent: Friday, September 23, 2022 9:14 AM
> > To: Richard Biener <rguenther@suse.de>
> > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd <nd@arm.com>;
> > Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>;
> > juzhe.zhong@rivai.ai
> > Subject: RE: [PATCH]middle-end Add optimized float addsub without
> > needing VEC_PERM_EXPR.
> > 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Friday, September 23, 2022 8:54 AM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina via
> > > Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>;
> > > juzhe.zhong@rivai.ai
> > > Subject: RE: [PATCH]middle-end Add optimized float addsub without
> > > needing VEC_PERM_EXPR.
> > >
> > > On Fri, 23 Sep 2022, Tamar Christina wrote:
> > >
> > > > Hi,
> > > >
> > > > Attached is the respun version of the patch,
> > > >
> > > > > >>
> > > > > >> Wouldn't a target need to re-check if lanes are NaN or denormal
> > > > > >> if after a SFmode lane operation a DFmode lane operation follows?
> > > > > >> IIRC that is what usually makes punning "integer" vectors as FP
> > > > > >> vectors
> > > costly.
> > > >
> > > > I don't believe this is a problem, due to NANs not being a single
> > > > value and according to the standard the sign bit doesn't change the
> > > meaning of a NAN.
> > > >
> > > > That's why specifically for negates generally no check is performed
> > > > and it's Assumed that if a value is a NaN going in, it's a NaN
> > > > coming out, and this Optimization doesn't change that.  Also under
> > > > fast-math we don't guarantee a stable representation for NaN (or zeros,
> > etc) afaik.
> > > >
> > > > So if that is still a concern I could add && !HONORS_NAN () to the
> > > constraints.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* match.pd: Add fneg/fadd rule.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > 	* gcc.target/aarch64/simd/addsub_1.c: New test.
> > > > 	* gcc.target/aarch64/sve/addsub_1.c: New test.
> > > >
> > > > --- inline version of patch ---
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd index
> > > >
> > >
> > 1bb936fc4010f98f24bb97671350e8432c55b347..2617d56091dfbd41ae49f980e
> > > e0a
> > > > f3757f5ec1cf 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -7916,6 +7916,59 @@ and,
> > > >    (simplify (reduc (op @0 VECTOR_CST@1))
> > > >      (op (reduc:type @0) (reduc:type @1))))
> > > >
> > > > +/* Simplify vector floating point operations of alternating sub/add pairs
> > > > +   into using an fneg of a wider element type followed by a normal add.
> > > > +   under IEEE 754 the fneg of the wider type will negate every even entry
> > > > +   and when doing an add we get a sub of the even and add of every odd
> > > > +   elements.  */
> > > > +(simplify
> > > > + (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)  (if
> > > > +(!VECTOR_INTEGER_TYPE_P (type) && !BYTES_BIG_ENDIAN)
> > >
> > > shouldn't this be FLOAT_WORDS_BIG_ENDIAN instead?
> > >
> > > I'm still concerned what
> > >
> > >  (neg:V2DF (subreg:V2DF (reg:V4SF) 0))
> > >
> > > means for architectures like RISC-V.  Can one "reformat" FP values in
> > > vector registers so that two floats overlap a double (and then back)?
> > >
> > > I suppose you rely on target_can_change_mode_class to tell you that.
> > 
> > Indeed, the documentation says:
> > 
> > "This hook returns true if it is possible to bitcast values held in registers of
> > class rclass from mode from to mode to and if doing so preserves the low-
> > order bits that are common to both modes. The result is only meaningful if
> > rclass has registers that can hold both from and to."
> > 
> > This implies to me that if the bitcast shouldn't be possible the hook should
> > reject it.
> > Of course you always where something is possible, but perhaps not cheap to
> > do.
> > 
> > The specific implementation for RISC-V seem to imply to me that they
> > disallow any FP conversions. So seems to be ok.

Ok, I see.

> > >
> > >
> > > > +  (with
> > > > +   {
> > > > +     /* Build a vector of integers from the tree mask.  */
> > > > +     vec_perm_builder builder;
> > > > +     if (!tree_to_vec_perm_builder (&builder, @2))
> > > > +       return NULL_TREE;
> > > > +
> > > > +     /* Create a vec_perm_indices for the integer vector.  */
> > > > +     poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
> > > > +     vec_perm_indices sel (builder, 2, nelts);
> > > > +   }
> > > > +   (if (sel.series_p (0, 2, 0, 2))
> > > > +    (with
> > > > +     {
> > > > +       machine_mode vec_mode = TYPE_MODE (type);
> > > > +       auto elem_mode = GET_MODE_INNER (vec_mode);
> > > > +       auto nunits = exact_div (GET_MODE_NUNITS (vec_mode), 2);
> > > > +       tree stype;
> > > > +       switch (elem_mode)
> > > > +	 {
> > > > +	 case E_HFmode:
> > > > +	   stype = float_type_node;
> > > > +	   break;
> > > > +	 case E_SFmode:
> > > > +	   stype = double_type_node;
> > > > +	   break;
> > > > +	 default:
> > > > +	   return NULL_TREE;
> > > > +	 }
> > >
> > > Can't you use GET_MODE_WIDER_MODE and double-check the mode-size
> > > doubles?  I mean you obviously miss DFmode -> TFmode.
> > 
> > Problem is I need the type, not the mode, but all even
> > build_pointer_type_for_mode requires the new scalar type.  So I couldn't
> > find anything to help here given that there's no inverse relationship between
> > modes and types.
> > 
> 
> I meant build_vector_type_for_mode here.

I think you should be able to use type_for_mode on the scalar wider
mode and build the vector type with the wider elements from that
though?

> > >
> > > > +       tree ntype = build_vector_type (stype, nunits);
> > > > +       if (!ntype)
> > >
> > > You want to check that the above results in a vector mode.
> > 
> > Does it? Technically you can cast a V2SF to both a V1DF or DF can't you?
> > Both seem equally valid here.

If the target doesn't support, say V2DF but only V4SF then you
could get BLKmode as result, so you can also check != BLKmode.
You could eventually check
mode_for_vector (scalar_mode, nunits).exists ().  build_vector_type
will never return NULL_TREE.

> > > > +	 return NULL_TREE;
> > > > +
> > > > +       /* The format has to have a simple sign bit.  */
> > > > +       const struct real_format *fmt = FLOAT_MODE_FORMAT
> > (vec_mode);
> > > > +       if (fmt == NULL)
> > > > +	 return NULL_TREE;
> > > > +     }
> > > > +     (if (fmt->signbit_rw == GET_MODE_UNIT_BITSIZE (vec_mode) - 1
> > >
> > > shouldn't this be a check on the component mode?  I think you'd want
> > > to check that the bigger format signbit_rw is equal to the smaller
> > > format mode size plus its signbit_rw or so?
> > 
> > Tbh, both are somewhat weak guarantees.  In a previous patch of mine I'd
> > added a new field "is_ieee"
> > to the real formats to denote that they are an IEEE type.  Maybe I should
> > revive that instead?

Not sure, isn't the only important part whether the larger mode signbit
is in the same position as the correct smaller mode element signbit?
And yes, that (negate ..) will actually just flip that bit and not
do anything else.

Richard.

> > 
> > Regards,
> > Tamar
> > 
> > >
> > > > +	  && fmt->signbit_rw == fmt->signbit_ro
> > > > +	  && targetm.can_change_mode_class (TYPE_MODE (ntype),
> > > TYPE_MODE (type), ALL_REGS)
> > > > +	  && (optimize_vectors_before_lowering_p ()
> > > > +	      || target_supports_op_p (ntype, NEGATE_EXPR, optab_vector)))
> > > > +      (plus (view_convert:type (negate (view_convert:ntype @1)))
> > > > +@0)))))))
> > > > +
> > > >  (simplify
> > > >   (vec_perm @0 @1 VECTOR_CST@2)
> > > >   (with
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> > > > b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> > > > new file mode 100644
> > > > index
> > > >
> > >
> > 0000000000000000000000000000000000000000..1fb91a34c421bbd2894faa0db
> > > bf1
> > > > b47ad43310c4
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> > > > @@ -0,0 +1,56 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
> > > > +/* { dg-options "-Ofast" } */
> > > > +/* { dg-add-options arm_v8_2a_fp16_neon } */
> > > > +/* { dg-final { check-function-bodies "**" "" "" { target { le } }
> > > > +} } */
> > > > +
> > > > +#pragma GCC target "+nosve"
> > > > +
> > > > +/*
> > > > +** f1:
> > > > +** ...
> > > > +**	fneg	v[0-9]+.2d, v[0-9]+.2d
> > > > +**	fadd	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> > > > +** ...
> > > > +*/
> > > > +void f1 (float *restrict a, float *restrict b, float *res, int n) {
> > > > +   for (int i = 0; i < (n & -4); i+=2)
> > > > +    {
> > > > +      res[i+0] = a[i+0] + b[i+0];
> > > > +      res[i+1] = a[i+1] - b[i+1];
> > > > +    }
> > > > +}
> > > > +
> > > > +/*
> > > > +** d1:
> > > > +** ...
> > > > +** 	fneg	v[0-9]+.4s, v[0-9]+.4s
> > > > +** 	fadd	v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h
> > > > +** ...
> > > > +*/
> > > > +void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res,
> > > > +int n) {
> > > > +   for (int i = 0; i < (n & -8); i+=2)
> > > > +    {
> > > > +      res[i+0] = a[i+0] + b[i+0];
> > > > +      res[i+1] = a[i+1] - b[i+1];
> > > > +    }
> > > > +}
> > > > +
> > > > +/*
> > > > +** e1:
> > > > +** ...
> > > > +** 	fadd	v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
> > > > +** 	fsub	v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
> > > > +** 	ins	v[0-9]+.d\[1\], v[0-9]+.d\[1\]
> > > > +** ...
> > > > +*/
> > > > +void e1 (double *restrict a, double *restrict b, double *res, int
> > > > +n) {
> > > > +   for (int i = 0; i < (n & -4); i+=2)
> > > > +    {
> > > > +      res[i+0] = a[i+0] + b[i+0];
> > > > +      res[i+1] = a[i+1] - b[i+1];
> > > > +    }
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> > > > b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> > > > new file mode 100644
> > > > index
> > > >
> > >
> > 0000000000000000000000000000000000000000..ea7f9d9db2c8c9a3efe5c7951a
> > > 31
> > > > 4a29b7a7a922
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> > > > @@ -0,0 +1,52 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-Ofast" } */
> > > > +/* { dg-final { check-function-bodies "**" "" "" { target { le } }
> > > > +} } */
> > > > +
> > > > +/*
> > > > +** f1:
> > > > +** ...
> > > > +** 	fneg	z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
> > > > +** 	fadd	z[0-9]+.s, z[0-9]+.s, z[0-9]+.s
> > > > +** ...
> > > > +*/
> > > > +void f1 (float *restrict a, float *restrict b, float *res, int n) {
> > > > +   for (int i = 0; i < (n & -4); i+=2)
> > > > +    {
> > > > +      res[i+0] = a[i+0] + b[i+0];
> > > > +      res[i+1] = a[i+1] - b[i+1];
> > > > +    }
> > > > +}
> > > > +
> > > > +/*
> > > > +** d1:
> > > > +** ...
> > > > +** 	fneg	z[0-9]+.s, p[0-9]+/m, z[0-9]+.s
> > > > +** 	fadd	z[0-9]+.h, z[0-9]+.h, z[0-9]+.h
> > > > +** ...
> > > > +*/
> > > > +void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res,
> > > > +int n) {
> > > > +   for (int i = 0; i < (n & -8); i+=2)
> > > > +    {
> > > > +      res[i+0] = a[i+0] + b[i+0];
> > > > +      res[i+1] = a[i+1] - b[i+1];
> > > > +    }
> > > > +}
> > > > +
> > > > +/*
> > > > +** e1:
> > > > +** ...
> > > > +** 	fsub	z[0-9]+.d, z[0-9]+.d, z[0-9]+.d
> > > > +** 	movprfx	z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
> > > > +** 	fadd	z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
> > > > +** ...
> > > > +*/
> > > > +void e1 (double *restrict a, double *restrict b, double *res, int
> > > > +n) {
> > > > +   for (int i = 0; i < (n & -4); i+=2)
> > > > +    {
> > > > +      res[i+0] = a[i+0] + b[i+0];
> > > > +      res[i+1] = a[i+1] - b[i+1];
> > > > +    }
> > > > +}
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> > > Boudien Moerman; HRB 36809 (AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)