Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]


From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: Tamar Christina <Tamar.Christina@arm.com>,
	 Andrew Pinski <pinskia@gmail.com>,
	 "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	 nd <nd@arm.com>,
	 "jlaw\@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]
Date: Sat, 07 Oct 2023 10:22:48 +0100
Message-ID: <mpt1qe6lrbr.fsf@arm.com>
In-Reply-To: <nycvar.YFH.7.77.849.2310060558560.5561@jbgna.fhfr.qr> (Richard Biener's message of "Fri, 6 Oct 2023 06:24:31 +0000 (UTC)")

Richard Biener <rguenther@suse.de> writes:
> On Thu, 5 Oct 2023, Tamar Christina wrote:
>
>> > I suppose the idea is that -abs(x) might be easier to optimize with other
>> > patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
>> > 
>> > For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
>> > canonical than copysign.
>> > 
>> > > Should I try removing this?
>> > 
>> > I'd say yes (and put the reverse canonicalization next to this pattern).
>> > 
>> 
>> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
>> canonical and allows a target to expand this sequence efficiently.  Such
>> sequences are common in scientific code working with gradients.
>> 
>> Various optimizations in match.pd only happened on COPYSIGN but not COPYSIGN_ALL
>> which means they exclude IFN_COPYSIGN.  COPYSIGN however is restricted to only
>
> That's not true:
>
> (define_operator_list COPYSIGN
>     BUILT_IN_COPYSIGNF
>     BUILT_IN_COPYSIGN
>     BUILT_IN_COPYSIGNL
>     IFN_COPYSIGN)
>
> but they miss the extended float builtin variants like
> __builtin_copysignf16.  Also see below
>
>> the C99 builtins and so doesn't work for vectors.
>> 
>> The patch expands these optimizations to work on COPYSIGN_ALL.
>> 
>> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
>> which I remove since this is a less efficient form.  The testsuite is also
>> updated in light of this.
>> 
>> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>> 
>> Ok for master?
>> 
>> Thanks,
>> Tamar
>> 
>> gcc/ChangeLog:
>> 
>> 	PR tree-optimization/109154
>> 	* match.pd: Add new neg+abs rule, remove inverse copysign rule and
>> 	expand existing copysign optimizations.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 	PR tree-optimization/109154
>> 	* gcc.dg/fold-copysign-1.c: Updated.
>> 	* gcc.dg/pr55152-2.c: Updated.
>> 	* gcc.dg/tree-ssa/abs-4.c: Updated.
>> 	* gcc.dg/tree-ssa/backprop-6.c: Updated.
>> 	* gcc.dg/tree-ssa/copy-sign-2.c: Updated.
>> 	* gcc.dg/tree-ssa/mult-abs-2.c: Updated.
>> 	* gcc.target/aarch64/fneg-abs_1.c: New test.
>> 	* gcc.target/aarch64/fneg-abs_2.c: New test.
>> 	* gcc.target/aarch64/fneg-abs_3.c: New test.
>> 	* gcc.target/aarch64/fneg-abs_4.c: New test.
>> 	* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
>> 	* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
>> 	* gcc.target/aarch64/sve/fneg-abs_3.c: New test.
>> 	* gcc.target/aarch64/sve/fneg-abs_4.c: New test.
>> 
>> --- inline copy of patch ---
>> 
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>  
>>  /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
>>  (for coss (COS COSH)
>> -     copysigns (COPYSIGN)
>> - (simplify
>> -  (coss (copysigns @0 @1))
>> -   (coss @0)))
>> + (for copysigns (COPYSIGN_ALL)
>
> So this ends up generating for example the match
> (cosf (copysignl ...)) which doesn't make much sense.
>
> The lock-step iteration did
> (cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...))
> which is leaner but misses the case of
> (cosf (ifn_copysign ..)) - that's probably what you are
> after with this change.
>
> That said, there isn't a nice solution (without altering the match.pd
> IL).  There's the explicit solution, spelling out all combinations.
>
> So if we want to go with your pragmatic solution, changing this
> to use COPYSIGN_ALL isn't necessary; only changing the lock-step
> for iteration to a cross-product for iteration is.
>
> Changing just this pattern to
>
> (for coss (COS COSH)
>  (for copysigns (COPYSIGN)
>   (simplify
>    (coss (copysigns @0 @1))
>    (coss @0))))
>
> increases the total number of gimple-match-x.cc lines from
> 234988 to 235324.

I guess the difference between this and the later suggestions is that
this one allows builtin copysign to be paired with ifn cos, which would
be potentially useful in other situations.  (It isn't here because
ifn_cos is rarely provided.)  How much of the growth is due to that,
and how much of it is from nonsensical combinations like
(builtin_cosf (builtin_copysignl ...))?

If it's mostly from nonsensical combinations then would it be possible
to make genmatch drop them?
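
To make the question concrete, here is a rough C model of the two
iteration styles.  It is only a sketch: the operator tables, the "kind"
tags and the sensible() predicate are assumptions made up for
illustration, not genmatch code, and the real COS and COPYSIGN lists
contain whatever the generated operator lists provide.  It counts how
many (cos, copysign) pairs each style visits and how many of the
cross-product pairs avoid mixing two different C types.

```c
#include <stddef.h>

/* Hypothetical model of an operator: a name plus a tag for its type.
   'f', 'd' and 'l' stand for the float, double and long double
   builtins; 'i' stands for a type-generic internal function.  */
struct op { const char *name; char kind; };

static const struct op cos_ops[] = {
  { "cosf", 'f' }, { "cos", 'd' }, { "cosl", 'l' }, { "ifn_cos", 'i' }
};
static const struct op copysign_ops[] = {
  { "copysignf", 'f' }, { "copysign", 'd' }, { "copysignl", 'l' },
  { "ifn_copysign", 'i' }
};

#define N_OPS (sizeof cos_ops / sizeof cos_ops[0])

/* Assumed rule: a pair is worth matching if both operators work on the
   same C type, or if either one is an internal function.  */
int
sensible (struct op outer, struct op inner)
{
  return outer.kind == inner.kind || outer.kind == 'i' || inner.kind == 'i';
}

/* Lock-step iteration pairs element I of one list with element I of
   the other, so it visits one pair per list position.  */
size_t
count_lockstep (void)
{
  return N_OPS;
}

/* Cross-product iteration visits every (outer, inner) combination.
   With ONLY_SENSIBLE set, combinations rejected by sensible () are
   skipped, which models genmatch dropping them.  */
size_t
count_cross (int only_sensible)
{
  size_t n = 0;
  for (size_t i = 0; i < N_OPS; i++)
    for (size_t j = 0; j < N_OPS; j++)
      if (!only_sensible || sensible (cos_ops[i], copysign_ops[j]))
	n++;
  return n;
}
```

In this toy model the cross product visits 16 pairs against 4 for
lock-step, and 6 of the 16 pair two different C types.  The real
proportions depend on the actual operator lists, which is why the
statistics would be useful.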

> The alternative is to do
>
> (for coss (COS COSH)
>      copysigns (COPYSIGN)
>  (simplify
>   (coss (copysigns @0 @1))
>    (coss @0))
>  (simplify
>   (coss (IFN_COPYSIGN @0 @1))
>    (coss @0)))
>
> which properly will diagnose a duplicate pattern.  There are
> currently no operator lists with just builtins defined (that
> could be fixed, see gencfn-macros.cc); supposing we had
> COS_C we could do
>
> (for coss (COS_C COSH_C IFN_COS IFN_COSH)
>      copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN
>                 IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN
>                 IFN_COPYSIGN IFN_COPYSIGN)
>  (simplify
>   (coss (copysigns @0 @1))
>    (coss @0)))
>
> which of course still looks ugly ;) (some syntax extension like
> allowing to specify IFN_COPYSIGN*8 would be nice here and easy
> enough to do)
>
> Can you split out the part changing COPYSIGN to COPYSIGN_ALL,
> re-do it to only split the fors, keeping COPYSIGN and provide
> some statistics on the gimple-match-* size?  I think this might
> be the pragmatic solution for now.
>
> Richard - can you think of a clever way to express the desired
> iteration?  How do RTL macro iterations address cases like this?

I don't think .md files have an equivalent construct, unfortunately.
(I also regret some of the choices I made for .md iterators, but that's
another story.)

Perhaps an alternative to the *8 thing would be "IFN_COPYSIGN...",
with the "..." meaning "fill to match the longest operator list
in the loop".
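
As a rough sketch of the semantics I have in mind (the helper names
below are hypothetical and nothing like this exists in genmatch today):
a list whose last entry is followed by "..." behaves as if that entry
were repeated for every lock-step position past the end of the list.

```c
#include <stddef.h>

/* Hypothetical helper: return the operator used at lock-step position
   POS for a list OPS with N_OPS explicit entries whose last entry is
   written with a trailing "...".  Positions beyond the explicit
   entries reuse the last one.  N_OPS must be nonzero.  */
const char *
fill_op (const char *const *ops, size_t n_ops, size_t pos)
{
  return ops[pos < n_ops ? pos : n_ops - 1];
}

/* The lock-step loop then runs for as many positions as the longest
   operator list in the "for" has entries.  */
size_t
lockstep_length (size_t n_a, size_t n_b)
{
  return n_a > n_b ? n_a : n_b;
}
```

With that, the copysigns list in the example above could be written as
(COPYSIGN_C COPYSIGN_C IFN_COPYSIGN...) without counting the repeats by
hand.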

Thanks,
Richard

> Richard.
>
>> +  (simplify
>> +   (coss (copysigns @0 @1))
>> +    (coss @0))))
>>  
>>  /* pow(copysign(x, y), z) -> pow(x, z) if z is an even integer.  */
>>  (for pows (POW)
>> -     copysigns (COPYSIGN)
>> - (simplify
>> -  (pows (copysigns @0 @2) REAL_CST@1)
>> -  (with { HOST_WIDE_INT n; }
>> -   (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
>> -    (pows @0 @1)))))
>> + (for copysigns (COPYSIGN_ALL)
>> +  (simplify
>> +   (pows (copysigns @0 @2) REAL_CST@1)
>> +   (with { HOST_WIDE_INT n; }
>> +    (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
>> +     (pows @0 @1))))))
>>  /* Likewise for powi.  */
>>  (for pows (POWI)
>> -     copysigns (COPYSIGN)
>> - (simplify
>> -  (pows (copysigns @0 @2) INTEGER_CST@1)
>> -  (if ((wi::to_wide (@1) & 1) == 0)
>> -   (pows @0 @1))))
>> + (for copysigns (COPYSIGN_ALL)
>> +  (simplify
>> +   (pows (copysigns @0 @2) INTEGER_CST@1)
>> +   (if ((wi::to_wide (@1) & 1) == 0)
>> +    (pows @0 @1)))))
>>  
>>  (for hypots (HYPOT)
>> -     copysigns (COPYSIGN)
>> - /* hypot(copysign(x, y), z) -> hypot(x, z).  */
>> - (simplify
>> -  (hypots (copysigns @0 @1) @2)
>> -  (hypots @0 @2))
>> - /* hypot(x, copysign(y, z)) -> hypot(x, y).  */
>> - (simplify
>> -  (hypots @0 (copysigns @1 @2))
>> -  (hypots @0 @1)))
>> + (for copysigns (COPYSIGN)
>> +  /* hypot(copysign(x, y), z) -> hypot(x, z).  */
>> +  (simplify
>> +   (hypots (copysigns @0 @1) @2)
>> +   (hypots @0 @2))
>> +  /* hypot(x, copysign(y, z)) -> hypot(x, y).  */
>> +  (simplify
>> +   (hypots @0 (copysigns @1 @2))
>> +   (hypots @0 @1))))
>>  
>> -/* copysign(x, CST) -> [-]abs (x).  */
>> -(for copysigns (COPYSIGN_ALL)
>> - (simplify
>> -  (copysigns @0 REAL_CST@1)
>> -  (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
>> -   (negate (abs @0))
>> -   (abs @0))))
>> +/* Transform fneg (fabs (X)) -> copysign (X, -1).  */
>> +
>> +(simplify
>> + (negate (abs @0))
>> + (IFN_COPYSIGN @0 { build_minus_one_cst (type); }))
>>  
>>  /* copysign(copysign(x, y), z) -> copysign(x, z).  */
>>  (for copysigns (COPYSIGN_ALL)
>> diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c b/gcc/testsuite/gcc.dg/fold-copysign-1.c
>> index f17d65c24ee4dca9867827d040fe0a404c515e7b..f9cafd14ab05f5e8ab2f6f68e62801d21c2df6a6 100644
>> --- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
>> +++ b/gcc/testsuite/gcc.dg/fold-copysign-1.c
>> @@ -12,5 +12,5 @@ double bar (double x)
>>    return __builtin_copysign (x, minuszero);
>>  }
>>  
>> -/* { dg-final { scan-tree-dump-times "= -" 1 "cddce1" } } */
>> -/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 2 "cddce1" } } */
>> +/* { dg-final { scan-tree-dump-times "__builtin_copysign" 1 "cddce1" } } */
>> +/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "cddce1" } } */
>> diff --git a/gcc/testsuite/gcc.dg/pr55152-2.c b/gcc/testsuite/gcc.dg/pr55152-2.c
>> index 54db0f2062da105a829d6690ac8ed9891fe2b588..605f202ed6bc7aa8fe921457b02ff0b88cc63ce6 100644
>> --- a/gcc/testsuite/gcc.dg/pr55152-2.c
>> +++ b/gcc/testsuite/gcc.dg/pr55152-2.c
>> @@ -10,4 +10,5 @@ int f(int a)
>>    return (a<-a)?a:-a;
>>  }
>>  
>> -/* { dg-final { scan-tree-dump-times "ABS_EXPR" 2 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "\.COPYSIGN" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "ABS_EXPR" 1 "optimized" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
>> index 6197519faf7b55aed7bc162cd0a14dd2145210ca..e1b825f37f69ac3c4666b3a52d733368805ad31d 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
>> @@ -9,5 +9,6 @@ long double abs_ld(long double x) { return __builtin_signbit(x) ? x : -x; }
>>  
>>  /* __builtin_signbit(x) ? x : -x. Should be convert into - ABS_EXP<x> */
>>  /* { dg-final { scan-tree-dump-not "signbit" "optimized"} } */
>> -/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 3 "optimized"} } */
>> -/* { dg-final { scan-tree-dump-times "= -" 3 "optimized"} } */
>> +/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "optimized"} } */
>> +/* { dg-final { scan-tree-dump-times "= -" 1 "optimized"} } */
>> +/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 2 "optimized"} } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
>> index 31f05716f1498dc709cac95fa20fb5796642c77e..c3a138642d6ff7be984e91fa1343cb2718db7ae1 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
>> @@ -26,5 +26,6 @@ TEST_FUNCTION (float, f)
>>  TEST_FUNCTION (double, )
>>  TEST_FUNCTION (long double, l)
>>  
>> -/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop" } } */
>> -/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3 "backprop" } } */
>> +/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 4 "backprop" } } */
>> +/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 2 "backprop" } } */
>> +/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 1 "backprop" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
>> index de52c5f7c8062958353d91f5031193defc9f3f91..e5d565c4b9832c00106588ef411fbd8c292a5cad 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
>> @@ -10,4 +10,5 @@ float f1(float x)
>>    float t = __builtin_copysignf (1.0f, -x);
>>    return x * t;
>>  }
>> -/* { dg-final { scan-tree-dump-times "ABS" 2 "optimized"} } */
>> +/* { dg-final { scan-tree-dump-times "ABS" 1 "optimized"} } */
>> +/* { dg-final { scan-tree-dump-times ".COPYSIGN" 1 "optimized"} } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c b/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
>> index a41f1baf25669a4fd301a586a49ba5e3c5b966ab..a22896b21c8b5a4d5d8e28bd8ae0db896e63ade0 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
>> @@ -34,4 +34,5 @@ float i1(float x)
>>  {
>>    return x * (x <= 0.f ? 1.f : -1.f);
>>  }
>> -/* { dg-final { scan-tree-dump-times "ABS" 8 "gimple"} } */
>> +/* { dg-final { scan-tree-dump-times "ABS" 4 "gimple"} } */
>> +/* { dg-final { scan-tree-dump-times "\.COPYSIGN" 4 "gimple"} } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c b/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..f823013c3ddf6b3a266c3abfcbf2642fc2a75fa6
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c
>> @@ -0,0 +1,39 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#pragma GCC target "+nosve"
>> +
>> +#include <arm_neon.h>
>> +
>> +/*
>> +** t1:
>> +**	orr	v[0-9]+.2s, #128, lsl #24
>> +**	ret
>> +*/
>> +float32x2_t t1 (float32x2_t a)
>> +{
>> +  return vneg_f32 (vabs_f32 (a));
>> +}
>> +
>> +/*
>> +** t2:
>> +**	orr	v[0-9]+.4s, #128, lsl #24
>> +**	ret
>> +*/
>> +float32x4_t t2 (float32x4_t a)
>> +{
>> +  return vnegq_f32 (vabsq_f32 (a));
>> +}
>> +
>> +/*
>> +** t3:
>> +**	adrp	x0, .LC[0-9]+
>> +**	ldr	q[0-9]+, \[x0, #:lo12:.LC0\]
>> +**	orr	v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
>> +**	ret
>> +*/
>> +float64x2_t t3 (float64x2_t a)
>> +{
>> +  return vnegq_f64 (vabsq_f64 (a));
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c b/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..141121176b309e4b2aa413dc55271a6e3c93d5e1
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
>> @@ -0,0 +1,31 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#pragma GCC target "+nosve"
>> +
>> +#include <arm_neon.h>
>> +#include <math.h>
>> +
>> +/*
>> +** f1:
>> +**	movi	v[0-9]+.2s, 0x80, lsl 24
>> +**	orr	v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**	ret
>> +*/
>> +float32_t f1 (float32_t a)
>> +{
>> +  return -fabsf (a);
>> +}
>> +
>> +/*
>> +** f2:
>> +**	mov	x0, -9223372036854775808
>> +**	fmov	d[0-9]+, x0
>> +**	orr	v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**	ret
>> +*/
>> +float64_t f2 (float64_t a)
>> +{
>> +  return -fabs (a);
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c b/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..b4652173a95d104ddfa70c497f0627a61ea89d3b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
>> @@ -0,0 +1,36 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#pragma GCC target "+nosve"
>> +
>> +#include <arm_neon.h>
>> +#include <math.h>
>> +
>> +/*
>> +** f1:
>> +**	...
>> +**	ldr	q[0-9]+, \[x0\]
>> +**	orr	v[0-9]+.4s, #128, lsl #24
>> +**	str	q[0-9]+, \[x0\], 16
>> +**	...
>> +*/
>> +void f1 (float32_t *a, int n)
>> +{
>> +  for (int i = 0; i < (n & -8); i++)
>> +   a[i] = -fabsf (a[i]);
>> +}
>> +
>> +/*
>> +** f2:
>> +**	...
>> +**	ldr	q[0-9]+, \[x0\]
>> +**	orr	v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
>> +**	str	q[0-9]+, \[x0\], 16
>> +**	...
>> +*/
>> +void f2 (float64_t *a, int n)
>> +{
>> +  for (int i = 0; i < (n & -8); i++)
>> +   a[i] = -fabs (a[i]);
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..10879dea74462d34b26160eeb0bd54ead063166b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
>> @@ -0,0 +1,39 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#pragma GCC target "+nosve"
>> +
>> +#include <string.h>
>> +
>> +/*
>> +** negabs:
>> +**	mov	x0, -9223372036854775808
>> +**	fmov	d[0-9]+, x0
>> +**	orr	v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**	ret
>> +*/
>> +double negabs (double x)
>> +{
>> +   unsigned long long y;
>> +   memcpy (&y, &x, sizeof(double));
>> +   y = y | (1UL << 63);
>> +   memcpy (&x, &y, sizeof(double));
>> +   return x;
>> +}
>> +
>> +/*
>> +** negabsf:
>> +**	movi	v[0-9]+.2s, 0x80, lsl 24
>> +**	orr	v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**	ret
>> +*/
>> +float negabsf (float x)
>> +{
>> +   unsigned int y;
>> +   memcpy (&y, &x, sizeof(float));
>> +   y = y | (1U << 31);
>> +   memcpy (&x, &y, sizeof(float));
>> +   return x;
>> +}
>> +
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..0c7664e6de77a497682952653ffd417453854d52
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
>> @@ -0,0 +1,37 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#include <arm_neon.h>
>> +
>> +/*
>> +** t1:
>> +**	orr	v[0-9]+.2s, #128, lsl #24
>> +**	ret
>> +*/
>> +float32x2_t t1 (float32x2_t a)
>> +{
>> +  return vneg_f32 (vabs_f32 (a));
>> +}
>> +
>> +/*
>> +** t2:
>> +**	orr	v[0-9]+.4s, #128, lsl #24
>> +**	ret
>> +*/
>> +float32x4_t t2 (float32x4_t a)
>> +{
>> +  return vnegq_f32 (vabsq_f32 (a));
>> +}
>> +
>> +/*
>> +** t3:
>> +**	adrp	x0, .LC[0-9]+
>> +**	ldr	q[0-9]+, \[x0, #:lo12:.LC0\]
>> +**	orr	v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
>> +**	ret
>> +*/
>> +float64x2_t t3 (float64x2_t a)
>> +{
>> +  return vnegq_f64 (vabsq_f64 (a));
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..a60cd31b9294af2dac69eed1c93f899bd5c78fca
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
>> @@ -0,0 +1,29 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#include <arm_neon.h>
>> +#include <math.h>
>> +
>> +/*
>> +** f1:
>> +**	movi	v[0-9]+.2s, 0x80, lsl 24
>> +**	orr	v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**	ret
>> +*/
>> +float32_t f1 (float32_t a)
>> +{
>> +  return -fabsf (a);
>> +}
>> +
>> +/*
>> +** f2:
>> +**	mov	x0, -9223372036854775808
>> +**	fmov	d[0-9]+, x0
>> +**	orr	v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**	ret
>> +*/
>> +float64_t f2 (float64_t a)
>> +{
>> +  return -fabs (a);
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..1bf34328d8841de8e6b0a5458562a9f00e31c275
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
>> @@ -0,0 +1,34 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#include <arm_neon.h>
>> +#include <math.h>
>> +
>> +/*
>> +** f1:
>> +**	...
>> +**	ld1w	z[0-9]+.s, p[0-9]+/z, \[x0, x2, lsl 2\]
>> +**	orr	z[0-9]+.s, z[0-9]+.s, #0x80000000
>> +**	st1w	z[0-9]+.s, p[0-9]+, \[x0, x2, lsl 2\]
>> +**	...
>> +*/
>> +void f1 (float32_t *a, int n)
>> +{
>> +  for (int i = 0; i < (n & -8); i++)
>> +   a[i] = -fabsf (a[i]);
>> +}
>> +
>> +/*
>> +** f2:
>> +**	...
>> +**	ld1d	z[0-9]+.d, p[0-9]+/z, \[x0, x2, lsl 3\]
>> +**	orr	z[0-9]+.d, z[0-9]+.d, #0x8000000000000000
>> +**	st1d	z[0-9]+.d, p[0-9]+, \[x0, x2, lsl 3\]
>> +**	...
>> +*/
>> +void f2 (float64_t *a, int n)
>> +{
>> +  for (int i = 0; i < (n & -8); i++)
>> +   a[i] = -fabs (a[i]);
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..21f2a8da2a5d44e3d01f6604ca7be87e3744d494
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
>> @@ -0,0 +1,37 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#include <string.h>
>> +
>> +/*
>> +** negabs:
>> +**	mov	x0, -9223372036854775808
>> +**	fmov	d[0-9]+, x0
>> +**	orr	v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**	ret
>> +*/
>> +double negabs (double x)
>> +{
>> +   unsigned long long y;
>> +   memcpy (&y, &x, sizeof(double));
>> +   y = y | (1UL << 63);
>> +   memcpy (&x, &y, sizeof(double));
>> +   return x;
>> +}
>> +
>> +/*
>> +** negabsf:
>> +**	movi	v[0-9]+.2s, 0x80, lsl 24
>> +**	orr	v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**	ret
>> +*/
>> +float negabsf (float x)
>> +{
>> +   unsigned int y;
>> +   memcpy (&y, &x, sizeof(float));
>> +   y = y | (1U << 31);
>> +   memcpy (&x, &y, sizeof(float));
>> +   return x;
>> +}
>> +
>>
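
For reference, the identity behind the new pattern and the negabs tests
in the quoted patch can be checked in plain C.  This is a minimal
sketch, assuming double is IEEE 754 binary64 with the sign in bit 63;
the function names are made up for illustration and are not part of
the patch.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Return the object representation of X as a 64-bit integer.
   Assumes double is IEEE 754 binary64.  */
uint64_t
bits_of (double x)
{
  uint64_t b;
  memcpy (&b, &x, sizeof b);
  return b;
}

/* The form the source code uses: fneg (fabs (x)).  */
double
neg_abs (double x)
{
  return -fabs (x);
}

/* The form the patch canonicalizes to: copysign (x, -1).  */
double
copysign_minus_one (double x)
{
  return copysign (x, -1.0);
}

/* The bitwise form a target can expand to: set the sign bit.  */
double
or_sign_bit (double x)
{
  uint64_t b = bits_of (x) | (UINT64_C (1) << 63);
  memcpy (&x, &b, sizeof x);
  return x;
}
```

All three leave the magnitude bits untouched and force the sign bit on,
so for ordinary values, zeros and infinities they should produce the
same bit pattern.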

