public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c)
@ 2020-09-02 21:10 Jochen Barth
  2020-09-02 21:11 ` Jochen Barth
       [not found] ` <VI1PR08MB53253664D3E7644526965E27FF260@VI1PR08MB5325.eurprd08.prod.outlook.com>
  0 siblings, 2 replies; 6+ messages in thread
From: Jochen Barth @ 2020-09-02 21:10 UTC (permalink / raw)
  To: gcc-help

Dear reader,

the definition of aarch64/arm_neon.h (gcc 10.2) is

__extension__ extern __inline uint64x1_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c)
{
   __AARCH64_LANE_CHECK (__a, __c);
   /* The only possible index to the assembler instruction returns 
element 0.  */
   return __a;
}

So this function does essentially »return __a«.

If the function name »vext_...« has, as the name suggests, something to 
do with the »ext« neon simd instruction,

then I do not understand where the asm-equivalent »ext« neon instrinct 
is, because in the »Arm Architecture Reference Manual«, chapter C7.2.543 
states: »<index> Is the lowest numbered byte element to be 
extracted...«, ranging from 0..7 for Q=8 and 0..15 for Q=16 (extraction 
over the whole 128 bit register).

PS: gcc with vector expressions does not (?) use »ext« for y=(x<<(c*8)) 
| (x>>(64-c*8)); // for Q=8

Kind regards, Jochen


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c)
  2020-09-02 21:10 arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c) Jochen Barth
@ 2020-09-02 21:11 ` Jochen Barth
       [not found] ` <VI1PR08MB53253664D3E7644526965E27FF260@VI1PR08MB5325.eurprd08.prod.outlook.com>
  1 sibling, 0 replies; 6+ messages in thread
From: Jochen Barth @ 2020-09-02 21:11 UTC (permalink / raw)
  To: gcc-help

Ooops... "Arm Architecture Reference Manual Armv8, for Armv8-A 
architecture profile" chapter C7.2.43 states:...



^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c)
       [not found]   ` <e06ec9ce-78ec-8eb2-83c4-4d3eda9e18f4@gmail.com>
@ 2020-09-10  7:54     ` Tamar Christina
  2020-09-10  8:14       ` Jochen Barth
  0 siblings, 1 reply; 6+ messages in thread
From: Tamar Christina @ 2020-09-10  7:54 UTC (permalink / raw)
  To: Jochen Barth; +Cc: gcc-help

Hi Jochen,

> 
> according to the manual, for the lower 8 Byte of the 16 byte arm simd
> register c is range 0..7,
> 
> for the complete 16 byte of arm simd register c is range 0..15;
> 

Yes, and the point is that index #0 is useless on an x1_t type. See your own example below.

> example:
> 
> #include <stdio.h>
> #include <arm_neon.h>
> 
> uint64x1_t ext(uint64x1_t a, uint64x1_t b, int c) {
>    uint64x1_t result;
>    switch(c) {
>      case 0: asm("ext %0.8B, %1.8B, %2.8B, #0" : "=w" (result) : "w"
> (a), "w" (b)); break;
>      case 1: asm("ext %0.8B, %1.8B, %2.8B, #1" : "=w" (result) : "w"
> (a), "w" (b)); break;
>      case 2: asm("ext %0.8B, %1.8B, %2.8B, #2" : "=w" (result) : "w"
> (a), "w" (b)); break;
>      case 3: asm("ext %0.8B, %1.8B, %2.8B, #3" : "=w" (result) : "w"
> (a), "w" (b)); break;
>      case 4: asm("ext %0.8B, %1.8B, %2.8B, #4" : "=w" (result) : "w"
> (a), "w" (b)); break;
>      case 5: asm("ext %0.8B, %1.8B, %2.8B, #5" : "=w" (result) : "w"
> (a), "w" (b)); break;
>      case 6: asm("ext %0.8B, %1.8B, %2.8B, #6" : "=w" (result) : "w"
> (a), "w" (b)); break;
>      case 7: asm("ext %0.8B, %1.8B, %2.8B, #7" : "=w" (result) : "w"
> (a), "w" (b)); break;
>    }
>    return result;
> }
> 
> int main(int argc, char **argv) {
>    uint64x1_t a, b, result;
>    a[0]=0x0011223344556677;
>    b[0]=0x8899aabbccddeeff;
>    for(int c=0; c<8; c++) {
>      result=ext(a, b, c);
>      printf("%d %016lx\n", c, result[0]);
>    }
>    return 0;
> }
> 
> output:
> 
> 0 0011223344556677

For index 0 you have the same number back as was in a[0].
There is no point in the compiler emitting an instruction to get the same number back that it had as the input.

Regards,
Tamar

> 1 ff00112233445566
> 2 eeff001122334455
> 3 ddeeff0011223344
> 4 ccddeeff00112233
> 5 bbccddeeff001122
> 6 aabbccddeeff0011
> 7 99aabbccddeeff00
> 
> Kind regards, Jochen
> 
> Am 09.09.20 um 11:17 schrieb Tamar Christina:
> > Hi Jochen,
> >
> > EXT is a byte level extract, if you have a 64 bit vector and a 64-bit
> > type like uint64x1_t then the only possible index for n is 0.
> >
> > While the compiler could have emitted
> >
> > ext     v0.8b, v0.8b, v1.8b, #0
> >
> > this is pointless as this essentially means to return v0.
> >
> > As such the compiler just uses return __a; as there's no point in emitting an
> instruction.
> >
> > Regards,
> > Tamar
> >
> >> -----Original Message-----
> >> From: Gcc-help <gcc-help-bounces@gcc.gnu.org> On Behalf Of Jochen
> >> Barth via Gcc-help
> >> Sent: Wednesday, September 2, 2020 10:10 PM
> >> To: gcc-help@gcc.gnu.org
> >> Subject: arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b,
> >> __const int __c)
> >>
> >> Dear reader,
> >>
> >> the definition of aarch64/arm_neon.h (gcc 10.2) is
> >>
> >> __extension__ extern __inline uint64x1_t __attribute__
> >> ((__always_inline__, __gnu_inline__, __artificial__))
> >> vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c) {
> >>     __AARCH64_LANE_CHECK (__a, __c);
> >>     /* The only possible index to the assembler instruction returns
> >> element 0.  */
> >>     return __a;
> >> }
> >>
> >> So this function does essentially »return __a«.
> >>
> >> If the function name »vext_...« has, as the name suggests, something
> >> to do with the »ext« neon simd instruction,
> >>
> >> then I do not understand where the asm-equivalent »ext« neon
> >> instrinct is, because in the »Arm Architecture Reference Manual«,
> >> chapter C7.2.543
> >> states: »<index> Is the lowest numbered byte element to be
> >> extracted...«, ranging from 0..7 for Q=8 and 0..15 for Q=16
> >> (extraction over the whole 128 bit register).
> >>
> >> PS: gcc with vector expressions does not (?) use »ext« for
> >> y=(x<<(c*8))
> >> | (x>>(64-c*8)); // for Q=8
> >>
> >> Kind regards, Jochen
> > IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended recipient,
> please notify the sender immediately and do not disclose the contents to any
> other person, use it for any purpose, or store or copy the information in any
> medium. Thank you.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c)
  2020-09-10  7:54     ` Tamar Christina
@ 2020-09-10  8:14       ` Jochen Barth
  2020-09-10  8:34         ` Tamar Christina
  0 siblings, 1 reply; 6+ messages in thread
From: Jochen Barth @ 2020-09-10  8:14 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-help

Dear Tamar,

Sorry, I do no get the point:

>>> EXT is a byte level extract, if you have a 64 bit vector and a 64-bit
>>> type like uint64x1_t then the only possible index for n is 0.

But my previous examples with n=c=1..7 showed that different (n=c)'s are 
possible,

why is "the only possible index for n=0" ?

Kind regards, Jochen

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c)
  2020-09-10  8:14       ` Jochen Barth
@ 2020-09-10  8:34         ` Tamar Christina
  2020-09-10 10:35           ` Jochen Barth
  0 siblings, 1 reply; 6+ messages in thread
From: Tamar Christina @ 2020-09-10  8:34 UTC (permalink / raw)
  To: Jochen Barth; +Cc: gcc-help

Hi Jochen,

> -----Original Message-----
> From: Jochen Barth <jpunktbarth@gmail.com>
> Sent: Thursday, September 10, 2020 9:14 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-help <gcc-help@gcc.gnu.org>
> Subject: Re: arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b,
> __const int __c)
> 
> Dear Tamar,
> 
> Sorry, I do no get the point:
> 
> >>> EXT is a byte level extract, if you have a 64 bit vector and a
> >>> 64-bit type like uint64x1_t then the only possible index for n is 0.
> 

Because those intrinsics are not doing byte level extraction. They are convenience functions that
do not allow partial extraction of a type. For instance vext_s16 which takes an int16x4_t as input
restricts the values of n to 0 to 3 because when used with the EXT instruction it always
makes sure they're a multiple of 2 bytes since a int16 is two bytes.

A uint64x1_t is a vector of 8 bytes which the intrinsic does as a group of 8 bytes since it
Always wants to extract whole numbers. As such the only possible index is 0.

To get the behavior you have in your example you need to do the extraction on bytes using
vext_u8 which will allow you to corrupt the number. i.e.

what you want is

 vreinterpret_u64_u8 (vext_u8 (vreinterpret_u8_u64 (a), vreinterpret_u8_u64 (b), <number>))

where your extraction happens on bytes. In this case n has the range 0-7.

Instead of looking at the Arm ARM you should look at the definition of the intrinsics
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics

Regards,
Tamar

> But my previous examples with n=c=1..7 showed that different (n=c)'s are
> possible,
> 
> why is "the only possible index for n=0" ?
> 
> Kind regards, Jochen

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c)
  2020-09-10  8:34         ` Tamar Christina
@ 2020-09-10 10:35           ` Jochen Barth
  0 siblings, 0 replies; 6+ messages in thread
From: Jochen Barth @ 2020-09-10 10:35 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-help

Thanks a lot! Regards, Jochen

Von meinem iPhone gesendet

> Am 10.09.2020 um 10:35 schrieb Tamar Christina <Tamar.Christina@arm.com>:
> 
> Hi Jochen,
> 
>> -----Original Message-----
>> From: Jochen Barth <jpunktbarth@gmail.com>
>> Sent: Thursday, September 10, 2020 9:14 AM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-help <gcc-help@gcc.gnu.org>
>> Subject: Re: arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b,
>> __const int __c)
>> 
>> Dear Tamar,
>> 
>> Sorry, I do no get the point:
>> 
>>>>> EXT is a byte level extract, if you have a 64 bit vector and a
>>>>> 64-bit type like uint64x1_t then the only possible index for n is 0.
>> 
> 
> Because those intrinsics are not doing byte level extraction. They are convenience functions that
> do not allow partial extraction of a type. For instance vext_s16 which takes an int16x4_t as input
> restricts the values of n to 0 to 3 because when used with the EXT instruction it always
> makes sure they're a multiple of 2 bytes since a int16 is two bytes.
> 
> A uint64x1_t is a vector of 8 bytes which the intrinsic does as a group of 8 bytes since it
> Always wants to extract whole numbers. As such the only possible index is 0.
> 
> To get the behavior you have in your example you need to do the extraction on bytes using
> vext_u8 which will allow you to corrupt the number. i.e.
> 
> what you want is
> 
> vreinterpret_u64_u8 (vext_u8 (vreinterpret_u8_u64 (a), vreinterpret_u8_u64 (b), <number>))
> 
> where your extraction happens on bytes. In this case n has the range 0-7.
> 
> Instead of looking at the Arm ARM you should look at the definition of the intrinsics
> https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics
> 
> Regards,
> Tamar
> 
>> But my previous examples with n=c=1..7 showed that different (n=c)'s are
>> possible,
>> 
>> why is "the only possible index for n=0" ?
>> 
>> Kind regards, Jochen

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-09-10 10:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-02 21:10 arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c) Jochen Barth
2020-09-02 21:11 ` Jochen Barth
     [not found] ` <VI1PR08MB53253664D3E7644526965E27FF260@VI1PR08MB5325.eurprd08.prod.outlook.com>
     [not found]   ` <e06ec9ce-78ec-8eb2-83c4-4d3eda9e18f4@gmail.com>
2020-09-10  7:54     ` Tamar Christina
2020-09-10  8:14       ` Jochen Barth
2020-09-10  8:34         ` Tamar Christina
2020-09-10 10:35           ` Jochen Barth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).