public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][AArch64] Implement vsqrt_f64 intrinsic
@ 2014-11-17 17:47 Kyrill Tkachov
  2014-11-21 16:12 ` Marcus Shawcroft
  2014-12-15 12:15 ` James Greenhalgh
  0 siblings, 2 replies; 10+ messages in thread
From: Kyrill Tkachov @ 2014-11-17 17:47 UTC (permalink / raw)
  To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 523 bytes --]

Hi all,

This patch implements the vsqrt_f64 intrinsic in arm_neon.h.
There's not much to it, we can reuse __builtin_sqrt.
It's a fairly straightforward and self-contained patch,
do we still want it at this stage?

A new execute test is added.

Tested aarch64-none-elf.


Thanks,
Kyrill

2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.

2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/simd/vsqrt_f64_1.c

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: aarch64-vsqrt_f64.patch --]
[-- Type: text/x-patch; name=aarch64-vsqrt_f64.patch, Size: 1567 bytes --]

commit d9e42debe2655287eef7b8c3ecf29bbdd11e6425
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Mon Nov 17 15:02:01 2014 +0000

    [AArch64] Implement vsqrt_f64 intrinsic

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index b3b80b8..c58213a 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
   return __builtin_aarch64_sqrtv4sf (a);
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vsqrt_f64 (float64x1_t a)
+{
+  return (float64x1_t) { __builtin_sqrt (a[0]) };
+}
+
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vsqrtq_f64 (float64x2_t a)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c
new file mode 100644
index 0000000..57fb6bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c
@@ -0,0 +1,25 @@
+/* Test the vsqrt_f64 AArch64 SIMD intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-options "-save-temps -O3" } */
+
+#include "arm_neon.h"
+
+extern void abort (void);
+
+
+int
+main (void)
+{
+  float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL);
+  float64x1_t result = vsqrt_f64 (in);
+  float64_t expected = 0.9325321502142351;
+
+  if (result[0] != expected)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler "fsqrt\[ \t\]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\n" } } */
+/* { dg-final { cleanup-saved-temps } } */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
  2014-11-17 17:47 [PATCH][AArch64] Implement vsqrt_f64 intrinsic Kyrill Tkachov
@ 2014-11-21 16:12 ` Marcus Shawcroft
  2014-11-26 11:27   ` Christophe Lyon
  2014-12-15 12:15 ` James Greenhalgh
  1 sibling, 1 reply; 10+ messages in thread
From: Marcus Shawcroft @ 2014-11-21 16:12 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches

On 17 November 2014 17:35, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.
>
> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * gcc.target/aarch64/simd/vsqrt_f64_1.c

OK /Marcus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
  2014-11-21 16:12 ` Marcus Shawcroft
@ 2014-11-26 11:27   ` Christophe Lyon
  2014-11-27 10:10     ` Kyrill Tkachov
  0 siblings, 1 reply; 10+ messages in thread
From: Christophe Lyon @ 2014-11-26 11:27 UTC (permalink / raw)
  To: Marcus Shawcroft; +Cc: Kyrill Tkachov, GCC Patches

Hi Kyrill,


On 21 November 2014 at 16:52, Marcus Shawcroft
<marcus.shawcroft@gmail.com> wrote:
> On 17 November 2014 17:35, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
>> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>     * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.
>>
>> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>     * gcc.target/aarch64/simd/vsqrt_f64_1.c
>
> OK /Marcus

Your new test fails at the scan-assembly step because all the code is
optimized away (even at -O1).

Christophe.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
  2014-11-26 11:27   ` Christophe Lyon
@ 2014-11-27 10:10     ` Kyrill Tkachov
  2014-11-27 10:14       ` Kyrill Tkachov
  0 siblings, 1 reply; 10+ messages in thread
From: Kyrill Tkachov @ 2014-11-27 10:10 UTC (permalink / raw)
  To: Christophe Lyon, Marcus Shawcroft; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 984 bytes --]


On 26/11/14 10:14, Christophe Lyon wrote:
> Hi Kyrill,
Hi Christophe,

>
>
> On 21 November 2014 at 16:52, Marcus Shawcroft
> <marcus.shawcroft@gmail.com> wrote:
>> On 17 November 2014 17:35, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>
>>> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.
>>>
>>> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      * gcc.target/aarch64/simd/vsqrt_f64_1.c
>> OK /Marcus
> Your new test fails at the scan-assembly step because all the code is
> optimized away (even at -O1).

Sorry about that, I could have sworn I saw it pass when I initially 
wrote it...
In any case, I've committed this patch (r218117) as obvious to mark one 
of the variables as volatile
to make sure it's not optimised away.
I've confirmed that the scan-assembler test fails without this patch and 
passes with it.

Thanks,
Kyrill

>
> Christophe.
>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: aarch64-vsqrt-test.patch --]
[-- Type: text/x-patch; name=aarch64-vsqrt-test.patch, Size: 548 bytes --]

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c
index 57fb6bb..7f99bd5 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c
@@ -11,7 +11,7 @@ extern void abort (void);
 int
 main (void)
 {
-  float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL);
+  volatile float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL);
   float64x1_t result = vsqrt_f64 (in);
   float64_t expected = 0.9325321502142351;
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
  2014-11-27 10:10     ` Kyrill Tkachov
@ 2014-11-27 10:14       ` Kyrill Tkachov
  0 siblings, 0 replies; 10+ messages in thread
From: Kyrill Tkachov @ 2014-11-27 10:14 UTC (permalink / raw)
  To: Christophe Lyon, Marcus Shawcroft; +Cc: GCC Patches


On 27/11/14 10:03, Kyrill Tkachov wrote:
> On 26/11/14 10:14, Christophe Lyon wrote:
>> Hi Kyrill,
> Hi Christophe,
>
>>
>> On 21 November 2014 at 16:52, Marcus Shawcroft
>> <marcus.shawcroft@gmail.com> wrote:
>>> On 17 November 2014 17:35, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>>
>>>> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.
>>>>
>>>> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       * gcc.target/aarch64/simd/vsqrt_f64_1.c
>>> OK /Marcus
>> Your new test fails at the scan-assembly step because all the code is
>> optimized away (even at -O1).
> Sorry about that, I could have sworn I saw it pass when I initially
> wrote it...
> In any case, I've committed this patch (r218117) as obvious to mark one
> of the variables as volatile
> to make sure it's not optimised away.
> I've confirmed that the scan-assembler test fails without this patch and
> passes with it.

With the following ChangeLog for completeness sake:

2014-11-27  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/simd/vsqrt_f64_1.c: Mark variable volatile.

>
> Thanks,
> Kyrill
>
>> Christophe.
> >


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
  2014-11-17 17:47 [PATCH][AArch64] Implement vsqrt_f64 intrinsic Kyrill Tkachov
  2014-11-21 16:12 ` Marcus Shawcroft
@ 2014-12-15 12:15 ` James Greenhalgh
  2014-12-17  0:19   ` Joseph Myers
  1 sibling, 1 reply; 10+ messages in thread
From: James Greenhalgh @ 2014-12-15 12:15 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

On Mon, Nov 17, 2014 at 05:35:23PM +0000, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch implements the vsqrt_f64 intrinsic in arm_neon.h.
> There's not much to it, we can reuse __builtin_sqrt.
> It's a fairly straightforward and self-contained patch,
> do we still want it at this stage?
> 
> A new execute test is added.
> 
> Tested aarch64-none-elf.
> 
> 
> Thanks,
> Kyrill
> 
> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>      * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.
> 
> 2014-11-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>      * gcc.target/aarch64/simd/vsqrt_f64_1.c

> commit d9e42debe2655287eef7b8c3ecf29bbdd11e6425
> Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
> Date:   Mon Nov 17 15:02:01 2014 +0000
> 
>     [AArch64] Implement vsqrt_f64 intrinsic
> 
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index b3b80b8..c58213a 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
>    return __builtin_aarch64_sqrtv4sf (a);
>  }
>  
> +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
> +vsqrt_f64 (float64x1_t a)
> +{
> +  return (float64x1_t) { __builtin_sqrt (a[0]) };
> +}

Hi Kyrill,

Does this introduce an implicit need to link against a maths library if
we want arm_neon.h to work correctly? If so, I think we need to take a
different approach.

At O0 I've started to see:

  " undefined reference to `sqrt' "

When checking a large arm_neon.h testcase.

It does seem strange that the mid-end would convert a __builtin_sqrt back
to a library call at O0 when the target has an optab for it, so perhaps
there is a bug there to go hunt?

Thanks,
James

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
  2014-12-15 12:15 ` James Greenhalgh
@ 2014-12-17  0:19   ` Joseph Myers
  2014-12-18 12:02     ` Kyrill Tkachov
  2015-01-09 11:45     ` Kyrill Tkachov
  0 siblings, 2 replies; 10+ messages in thread
From: Joseph Myers @ 2014-12-17  0:19 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Kyrill Tkachov, GCC Patches, Marcus Shawcroft, Richard Earnshaw

On Mon, 15 Dec 2014, James Greenhalgh wrote:

> > @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
> >    return __builtin_aarch64_sqrtv4sf (a);
> >  }
> >  
> > +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
> > +vsqrt_f64 (float64x1_t a)
> > +{
> > +  return (float64x1_t) { __builtin_sqrt (a[0]) };
> > +}
> 
> Hi Kyrill,
> 
> Does this introduce an implicit need to link against a maths library if
> we want arm_neon.h to work correctly? If so, I think we need to take a
> different approach.
> 
> At O0 I've started to see:
> 
>   " undefined reference to `sqrt' "
> 
> When checking a large arm_neon.h testcase.
> 
> It does seem strange that the mid-end would convert a __builtin_sqrt back
> to a library call at O0 when the target has an optab for it, so perhaps
> there is a bug there to go hunt?

__builtin_sqrt has the same semantics as the sqrt library function.  This 
includes setting errno for negative arguments (other than -0 and -NaN).  
The semantics also include that it's always OK to expand to a call to that 
library function (generally, __builtin_foo may always expand to a call to 
foo, if there is such a library function).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
  2014-12-17  0:19   ` Joseph Myers
@ 2014-12-18 12:02     ` Kyrill Tkachov
  2014-12-18 13:29       ` Joseph Myers
  2015-01-09 11:45     ` Kyrill Tkachov
  1 sibling, 1 reply; 10+ messages in thread
From: Kyrill Tkachov @ 2014-12-18 12:02 UTC (permalink / raw)
  To: Joseph Myers, James Greenhalgh
  Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw


On 17/12/14 00:04, Joseph Myers wrote:
> On Mon, 15 Dec 2014, James Greenhalgh wrote:
>
>>> @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
>>>     return __builtin_aarch64_sqrtv4sf (a);
>>>   }
>>>   
>>> +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
>>> +vsqrt_f64 (float64x1_t a)
>>> +{
>>> +  return (float64x1_t) { __builtin_sqrt (a[0]) };
>>> +}
>> Hi Kyrill,
>>
>> Does this introduce an implicit need to link against a maths library if
>> we want arm_neon.h to work correctly? If so, I think we need to take a
>> different approach.
>>
>> At O0 I've started to see:
>>
>>    " undefined reference to `sqrt' "
>>
>> When checking a large arm_neon.h testcase.
>>
>> It does seem strange that the mid-end would convert a __builtin_sqrt back
>> to a library call at O0 when the target has an optab for it, so perhaps
>> there is a bug there to go hunt?
> __builtin_sqrt has the same semantics as the sqrt library function.  This
> includes setting errno for negative arguments (other than -0 and -NaN).
> The semantics also include that it's always OK to expand to a call to that
> library function (generally, __builtin_foo may always expand to a call to
> foo, if there is such a library function).

Thanks for the explanation.
I see that there are some intrinsics implemented in terms of 
__builtin_fabsf, presumable they can be at 'risk' too of having a 
library call emitted?

Kyrill

>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
  2014-12-18 12:02     ` Kyrill Tkachov
@ 2014-12-18 13:29       ` Joseph Myers
  0 siblings, 0 replies; 10+ messages in thread
From: Joseph Myers @ 2014-12-18 13:29 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: James Greenhalgh, GCC Patches, Marcus Shawcroft, Richard Earnshaw

On Thu, 18 Dec 2014, Kyrill Tkachov wrote:

> I see that there are some intrinsics implemented in terms of __builtin_fabsf,
> presumable they can be at 'risk' too of having a library call emitted?

The semantics of fabsf/fabs/fabsl are to clear the sign bit, never raise 
any exceptions (even for signaling NaN arguments, subject to any 
limitations imposed by calling conventions / floating-point load 
conventions on the processor) and never set errno, as per IEEE 754-2008 
(and the C bindings thereto, TS 18661-1:2014).

So there is certainly no need to call a library function for 
__builtin_fabsf, but I don't know whether the implementation guarantees 
never to generate such a call.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
  2014-12-17  0:19   ` Joseph Myers
  2014-12-18 12:02     ` Kyrill Tkachov
@ 2015-01-09 11:45     ` Kyrill Tkachov
  1 sibling, 0 replies; 10+ messages in thread
From: Kyrill Tkachov @ 2015-01-09 11:45 UTC (permalink / raw)
  To: Joseph Myers, James Greenhalgh
  Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw


On 17/12/14 00:04, Joseph Myers wrote:
> On Mon, 15 Dec 2014, James Greenhalgh wrote:
>
>>> @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
>>>     return __builtin_aarch64_sqrtv4sf (a);
>>>   }
>>>   
>>> +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
>>> +vsqrt_f64 (float64x1_t a)
>>> +{
>>> +  return (float64x1_t) { __builtin_sqrt (a[0]) };
>>> +}
>> Hi Kyrill,
>>
>> Does this introduce an implicit need to link against a maths library if
>> we want arm_neon.h to work correctly? If so, I think we need to take a
>> different approach.
>>
>> At O0 I've started to see:
>>
>>    " undefined reference to `sqrt' "
>>
>> When checking a large arm_neon.h testcase.
>>
>> It does seem strange that the mid-end would convert a __builtin_sqrt back
>> to a library call at O0 when the target has an optab for it, so perhaps
>> there is a bug there to go hunt?
> __builtin_sqrt has the same semantics as the sqrt library function.  This
> includes setting errno for negative arguments (other than -0 and -NaN).
> The semantics also include that it's always OK to expand to a call to that
> library function (generally, __builtin_foo may always expand to a call to
> foo, if there is such a library function).

So my first attempt at this patch had created a target builtin 
(__builtin_aarch64_sqrtdf) and used that. Eventually though I went for 
the shorter __builtin_sqrt because I thought we could benefit from the 
tree-level information about the semantics rather than the RTL-level 
expansion that the target-specific builtin would provide.
But if there's a risk for it to expand to a library function call, I 
guess it's better to go with the target builtin. I'll prepare a patch.

Thanks for the explanations,
Kyrill

>


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-01-09 11:45 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-17 17:47 [PATCH][AArch64] Implement vsqrt_f64 intrinsic Kyrill Tkachov
2014-11-21 16:12 ` Marcus Shawcroft
2014-11-26 11:27   ` Christophe Lyon
2014-11-27 10:10     ` Kyrill Tkachov
2014-11-27 10:14       ` Kyrill Tkachov
2014-12-15 12:15 ` James Greenhalgh
2014-12-17  0:19   ` Joseph Myers
2014-12-18 12:02     ` Kyrill Tkachov
2014-12-18 13:29       ` Joseph Myers
2015-01-09 11:45     ` Kyrill Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).