public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* Use-case for _addcarryx_u64() wrapper
@ 2023-06-01  7:42 Mason
  2023-06-01  8:40 ` Uros Bizjak
  2023-06-03 11:37 ` Mason
  0 siblings, 2 replies; 11+ messages in thread
From: Mason @ 2023-06-01  7:42 UTC (permalink / raw)
  To: gcc-help; +Cc: Uros Bizjak, Jakub Jelinek, Jeffrey Walton, Marc Glisse

Hello,

As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
plain wrappers around the same __builtin_ia32_addcarryx_u64() function.

https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h

Thus, I wonder: what is the use-case for the wrappers?
Why would a programmer not call the builtin directly?
Is it for compatibility with Intel compilers?

Also, based on the names, I would have assumed that
_addcarry_u64 generates adc
while
_addcarryx_u64 generates adcx/adox ?

Relevant past discussion:
https://gcc.gnu.org/legacy-ml/gcc-help/2017-08/msg00100.html

Regards


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Use-case for _addcarryx_u64() wrapper
  2023-06-01  7:42 Use-case for _addcarryx_u64() wrapper Mason
@ 2023-06-01  8:40 ` Uros Bizjak
  2023-06-02 12:45   ` Mason
  2023-06-03 11:37 ` Mason
  1 sibling, 1 reply; 11+ messages in thread
From: Uros Bizjak @ 2023-06-01  8:40 UTC (permalink / raw)
  To: Mason; +Cc: gcc-help, Jakub Jelinek, Jeffrey Walton, Marc Glisse

On Thu, Jun 1, 2023 at 9:42 AM Mason <slash.tmp@free.fr> wrote:
>
> Hello,
>
> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
>
> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
>
> Thus, I wonder: what is the use-case for the wrappers?
> Why would a programmer not call the builtin directly?
> Is it for compatibility with Intel compilers?

Builtins are internal implementation detail, it is not published API.
Although rarely, builtins can be changed for some reason or another,
while intrinsic functions from adxintrin.h follow published API.

> Also, based on the names, I would have assumed that
> _addcarry_u64 generates adc
> while
> _addcarryx_u64 generates adcx/adox ?

No, they all generate add/adc. There is no use case to maintain two
interleaved carry chains, IOW rewriting:

--cut here--
#include <immintrin.h>

int foo (int A, int B, int D, int E)
{
  _Bool carry1 = 0, carry2 = 0;
  int C, F;

  carry1 = _addcarryx_u32 (carry1, A, B, &C);
  carry2 = _addcarryx_u32 (carry2, D, E, &F);
  carry1 = _addcarryx_u32 (carry1, A, B, &C);
  carry2 = _addcarryx_u32 (carry2, D, E, &F);

  return C + F;
}
--cut here--

to:

--cut here--
#include <immintrin.h>

int foo (int A, int B, int D, int E)
{
  _Bool carry1 = 0, carry2 = 0;
  int C, F;

  carry1 = _addcarryx_u32 (carry1, A, B, &C);
  carry1 = _addcarryx_u32 (carry1, A, B, &C);
  carry2 = _addcarryx_u32 (carry2, D, E, &F);
  carry2 = _addcarryx_u32 (carry2, D, E, &F);

  return C + F;
}
--cut here--

will give you:

       movl    %edi, %eax
       addl    %esi, %eax
       movl    %edx, %eax
       adcl    %esi, %edi
       addl    %ecx, %eax
       adcl    %ecx, %edx
       leal    (%rdi,%rdx), %eax
       ret

Uros.

> Relevant past discussion:
> https://gcc.gnu.org/legacy-ml/gcc-help/2017-08/msg00100.html
>
> Regards
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Use-case for _addcarryx_u64() wrapper
  2023-06-01  8:40 ` Uros Bizjak
@ 2023-06-02 12:45   ` Mason
  2023-06-02 12:50     ` Jeffrey Walton
  2023-06-02 12:59     ` Jakub Jelinek
  0 siblings, 2 replies; 11+ messages in thread
From: Mason @ 2023-06-02 12:45 UTC (permalink / raw)
  To: gcc-help, Uros Bizjak; +Cc: Jakub Jelinek, Jeffrey Walton, Marc Glisse

Hello Uros :)

On 01/06/2023 10:40, Uros Bizjak wrote:

> On Thu, Jun 1, 2023 at 9:42 AM Mason wrote:
>
>> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
>> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
>>
>> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
>>
>> Thus, I wonder: what is the use-case for the wrappers?
>> Why would a programmer not call the builtin directly?
>> Is it for compatibility with Intel compilers?
> 
> Builtins are internal implementation detail, it is not published API.
> Although rarely, builtins can be changed for some reason or another,
> while intrinsic functions from adxintrin.h follow published API.

I'm confused.
Built-ins are officially documented:
https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html

Using Vector Instructions through Built-in Functions
Legacy __sync Built-in Functions for Atomic Memory Access
Built-in Functions for Memory Model Aware Atomic Operations
Built-in Functions to Perform Arithmetic with Overflow Checking
Other Built-in Functions Provided by GCC
Built-in Functions Specific to Particular Target Machines

What do you mean by "not published API" ?
Or perhaps you meant __builtin_ia32_addcarryx_u64 specifically?


>> Also, based on the names, I would have assumed that
>> _addcarry_u64 generates adc
>> while
>> _addcarryx_u64 generates adcx/adox ?
> 
> No, they all generate add/adc.

Why are there two wrappers for the same function?
Is it because the API was not designed by GCC?
(Intel ICC intrinsics perhaps?)


> There is no use case to maintain two
> interleaved carry chains,

What do you mean by "there is no use-case" ?
Are you saying ADCX/ADOX are useless?

Regards



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Use-case for _addcarryx_u64() wrapper
  2023-06-02 12:45   ` Mason
@ 2023-06-02 12:50     ` Jeffrey Walton
  2023-06-03  9:10       ` Mason
  2023-06-02 12:59     ` Jakub Jelinek
  1 sibling, 1 reply; 11+ messages in thread
From: Jeffrey Walton @ 2023-06-02 12:50 UTC (permalink / raw)
  To: Mason; +Cc: gcc-help, Uros Bizjak, Jakub Jelinek, Marc Glisse

On Fri, Jun 2, 2023 at 8:45 AM Mason <slash.tmp@free.fr> wrote:
>
> On 01/06/2023 10:40, Uros Bizjak wrote:
> [...]
> > There is no use case to maintain two
> > interleaved carry chains,
>
> What do you mean by "there is no use-case" ?
> Are you saying ADCX/ADOX are useless?

The advertised use case for dual addc instructions is big integer
operations. I read that somewhere in the Intel docs several years ago.

When I benchmarked ADCX/ADOX several years ago, ADCX/ADOX were slower
than using a single ADDC. So I stayed with ADDC. Maybe cpu
manufacturers have improved things nowadays.

Jeff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Use-case for _addcarryx_u64() wrapper
  2023-06-02 12:45   ` Mason
  2023-06-02 12:50     ` Jeffrey Walton
@ 2023-06-02 12:59     ` Jakub Jelinek
  2023-06-02 22:53       ` Gabriel Ravier
  1 sibling, 1 reply; 11+ messages in thread
From: Jakub Jelinek @ 2023-06-02 12:59 UTC (permalink / raw)
  To: Mason; +Cc: gcc-help, Uros Bizjak, Jeffrey Walton, Marc Glisse

On Fri, Jun 02, 2023 at 02:45:40PM +0200, Mason wrote:
> On 01/06/2023 10:40, Uros Bizjak wrote:
> 
> > On Thu, Jun 1, 2023 at 9:42 AM Mason wrote:
> >
> >> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
> >> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
> >>
> >> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
> >>
> >> Thus, I wonder: what is the use-case for the wrappers?
> >> Why would a programmer not call the builtin directly?
> >> Is it for compatibility with Intel compilers?
> > 
> > Builtins are internal implementation detail, it is not published API.
> > Although rarely, builtins can be changed for some reason or another,
> > while intrinsic functions from adxintrin.h follow published API.
> 
> I'm confused.
> Built-ins are officially documented:
> https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html

Sure, some builtins are officially supported.
Those are meant to be used directly by users.

Then there are many builtins which are implementation detail for some
other API that users should use instead.
That includes e.g. builtins used under the hood for <*intrin.h>
implementation - users should use the intrinsics from those headers,
that is documented interface which is supported by multiple compilers,
or builtins used under the hood inside of libstdc++ headers (again,
users should use standard C++ APIs which are supported by multiple
compilers instead of the builtins directly) etc.
E.g. between GCC 3.4 and current trunk 62 __builtin_ia32_* builtins
which were implementation details of the x86 intrinsic headers
have been removed as the intrinsics got implemented some other way
(e.g. using generic vectors etc.).

Some builtins are in both categories, e.g. __atomic_* builtins
are both used in C++ <atomic> APIs, when using C++ one should use those,
or in C <stdatomic.h> APIs, but one can use them directly as well.

	Jakub


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Use-case for _addcarryx_u64() wrapper
  2023-06-02 12:59     ` Jakub Jelinek
@ 2023-06-02 22:53       ` Gabriel Ravier
  2023-06-03  8:53         ` Mason
  0 siblings, 1 reply; 11+ messages in thread
From: Gabriel Ravier @ 2023-06-02 22:53 UTC (permalink / raw)
  To: Jakub Jelinek, Mason; +Cc: gcc-help, Uros Bizjak, Jeffrey Walton, Marc Glisse

On 6/2/23 14:59, Jakub Jelinek via Gcc-help wrote:
> On Fri, Jun 02, 2023 at 02:45:40PM +0200, Mason wrote:
>> On 01/06/2023 10:40, Uros Bizjak wrote:
>>
>>> On Thu, Jun 1, 2023 at 9:42 AM Mason wrote:
>>>
>>>> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
>>>> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
>>>>
>>>> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
>>>>
>>>> Thus, I wonder: what is the use-case for the wrappers?
>>>> Why would a programmer not call the builtin directly?
>>>> Is it for compatibility with Intel compilers?
>>> Builtins are internal implementation detail, it is not published API.
>>> Although rarely, builtins can be changed for some reason or another,
>>> while intrinsic functions from adxintrin.h follow published API.
>> I'm confused.
>> Built-ins are officially documented:
>> https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
> Sure, some builtins are officially supported.
> Those are meant to be used directly by users.
>
> Then there are many builtins which are implementation detail for some
> other API that users should use instead.
> That includes e.g. builtins used under the hood for <*intrin.h>
> implementation - users should use the intrinsics from those headers,
> that is documented interface which is supported by multiple compilers,
> or builtins used under the hood inside of libstdc++ headers (again,
> users should use standard C++ APIs which are supported by multiple
> compilers instead of the builtins directly) etc.
> E.g. between GCC 3.4 and current trunk 62 __builtin_ia32_* builtins
> which were implementation details of the x86 intrinsic headers
> have been removed as the intrinsics got implemented some other way
> (e.g. using generic vectors etc.).
Does it matter whether or not those builtins are documented ? It seems 
like most of the __builtin_ia32_ builtins are explicitly documented in 
the manual, despite the fact that these seem like those you're referring 
to as being potentially removable at will whenever - and I see no 
indication in the documentation that they are implementation details/may 
be removed at any time for any reason.
>
> Some builtins are in both categories, e.g. __atomic_* builtins
> are both used in C++ <atomic> APIs, when using C++ one should use those,
> or in C <stdatomic.h> APIs, but one can use them directly as well.
>
> 	Jakub
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Use-case for _addcarryx_u64() wrapper
  2023-06-02 22:53       ` Gabriel Ravier
@ 2023-06-03  8:53         ` Mason
  2023-06-03  9:09           ` Jakub Jelinek
  0 siblings, 1 reply; 11+ messages in thread
From: Mason @ 2023-06-03  8:53 UTC (permalink / raw)
  To: Gabriel Ravier, Jakub Jelinek
  Cc: gcc-help, Uros Bizjak, Jeffrey Walton, Marc Glisse

On 3/06/2023 00:53, Gabriel Ravier wrote:

> On 2/06/2023 14:59, Jakub Jelinek wrote:
> 
>> Sure, some builtins are officially supported.
>> Those are meant to be used directly by users.
>>
>> Then there are many builtins which are implementation detail for some
>> other API that users should use instead.
>> That includes e.g. builtins used under the hood for <*intrin.h>
>> implementation - users should use the intrinsics from those headers,
>> that is documented interface which is supported by multiple compilers,
>> or builtins used under the hood inside of libstdc++ headers (again,
>> users should use standard C++ APIs which are supported by multiple
>> compilers instead of the builtins directly) etc.
>> E.g. between GCC 3.4 and current trunk 62 __builtin_ia32_* builtins
>> which were implementation details of the x86 intrinsic headers
>> have been removed as the intrinsics got implemented some other way
>> (e.g. using generic vectors etc.).
>
> Does it matter whether or not those builtins are documented ? It seems 
> like most of the __builtin_ia32_ builtins are explicitly documented in 
> the manual, despite the fact that these seem like those you're referring 
> to as being potentially removable at will whenever - and I see no 
> indication in the documentation that they are implementation details/may 
> be removed at any time for any reason.

Hello Gabriel,

As far as I understand, Jakub is merely saying:
If a builtin is NOT documented, then it is NOT part of the API.

https://gcc.gnu.org/onlinedocs/gcc.pdf
*builtin_ia32_addcarry* appears nowhere in the manual,
thus it is NOT documented, thus it is NOT part of the API.

Regards


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Use-case for _addcarryx_u64() wrapper
  2023-06-03  8:53         ` Mason
@ 2023-06-03  9:09           ` Jakub Jelinek
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Jelinek @ 2023-06-03  9:09 UTC (permalink / raw)
  To: Mason; +Cc: Gabriel Ravier, gcc-help, Uros Bizjak, Jeffrey Walton, Marc Glisse

On Sat, Jun 03, 2023 at 10:53:04AM +0200, Mason wrote:
> >> Sure, some builtins are officially supported.
> >> Those are meant to be used directly by users.
> >>
> >> Then there are many builtins which are implementation detail for some
> >> other API that users should use instead.
> >> That includes e.g. builtins used under the hood for <*intrin.h>
> >> implementation - users should use the intrinsics from those headers,
> >> that is documented interface which is supported by multiple compilers,
> >> or builtins used under the hood inside of libstdc++ headers (again,
> >> users should use standard C++ APIs which are supported by multiple
> >> compilers instead of the builtins directly) etc.
> >> E.g. between GCC 3.4 and current trunk 62 __builtin_ia32_* builtins
> >> which were implementation details of the x86 intrinsic headers
> >> have been removed as the intrinsics got implemented some other way
> >> (e.g. using generic vectors etc.).
> >
> > Does it matter whether or not those builtins are documented ? It seems 
> > like most of the __builtin_ia32_ builtins are explicitly documented in 
> > the manual, despite the fact that these seem like those you're referring 
> > to as being potentially removable at will whenever - and I see no 
> > indication in the documentation that they are implementation details/may 
> > be removed at any time for any reason.
> 
> Hello Gabriel,
> 
> As far as I understand, Jakub is merely saying:
> If a builtin is NOT documented, then it is NOT part of the API.
> 
> https://gcc.gnu.org/onlinedocs/gcc.pdf
> *builtin_ia32_addcarry* appears nowhere in the manual,
> thus it is NOT documented, thus it is NOT part of the API.

Roughly, except that some builtins are mistakenly documented when they
shouldn't be.  I think e.g. none of the __builtin_ia32_* builtins are actually
supported.
The documentation even mentions some __builtin_ia32_* builtins which were
removed years ago.  E.g. __builtin_ia32_paddw128 but lots of similar ones in GCC 5
(2015).

	Jakub


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Use-case for _addcarryx_u64() wrapper
  2023-06-02 12:50     ` Jeffrey Walton
@ 2023-06-03  9:10       ` Mason
  0 siblings, 0 replies; 11+ messages in thread
From: Mason @ 2023-06-03  9:10 UTC (permalink / raw)
  To: Jeffrey Walton; +Cc: gcc-help, Uros Bizjak, Jakub Jelinek, Marc Glisse

On 2/06/2023 14:50, Jeffrey Walton wrote:

> The advertised use case for dual addc instructions is big integer
> operations. I read that somewhere in the Intel docs several years ago.

I know ;)
That's why I added you to CC:
https://gcc.gnu.org/legacy-ml/gcc-help/2017-08/msg00085.html

https://stackoverflow.com/questions/29747508/what-is-the-difference-between-the-adc-and-adcx-instructions-on-ia32-ia64


> When I benchmarked ADCX/ADOX several years ago, ADCX/ADOX were slower
> than using a single ADDC. So I stayed with ADDC. Maybe CPU
> manufacturers have improved things nowadays.

That's what I'm working on right now.
Thanks for sharing.

Regards


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Use-case for _addcarryx_u64() wrapper
  2023-06-01  7:42 Use-case for _addcarryx_u64() wrapper Mason
  2023-06-01  8:40 ` Uros Bizjak
@ 2023-06-03 11:37 ` Mason
  2023-06-03 11:49   ` Jakub Jelinek
  1 sibling, 1 reply; 11+ messages in thread
From: Mason @ 2023-06-03 11:37 UTC (permalink / raw)
  To: Uros Bizjak, Jakub Jelinek; +Cc: gcc-help, Jeffrey Walton, Marc Glisse

On 01/06/2023 09:42, Mason wrote:

> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
> 
> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h

Hello Uros, Jakub,

I want to report a missed-optimization bug with _addcarry_u64().
(I can file an issue on Bugzilla, if you deem it appropriate.)


#include <x86intrin.h>
typedef unsigned long long u64;
typedef unsigned __int128 u128;
void testcase1(u64 *acc, u64 a, u64 b)
{
  u128 res = (u128)a*b;
  u64 lo = res, hi = res >> 64;
  unsigned char cf = 0;
  cf = _addcarry_u64(cf, lo, acc[0], acc+0);
  cf = _addcarry_u64(cf, hi, acc[1], acc+1);
  cf = _addcarry_u64(cf,  0, acc[2], acc+2);
}
void testcase2(u64 *acc, u64 a, u64 b)
{
  u128 res = (u128)a * b;
  u64 lo = res, hi = res >> 64;
  asm("add %[LO], %[D0]\n\t" "adc %[HI], %[D1]\n\t" "adc $0, %[D2]" :
  [D0] "+m" (acc[0]), [D1] "+m" (acc[1]), [D2] "+m" (acc[2]) :
  [LO] "r" (lo), [HI] "r" (hi) : "cc");
}

gcc-trunk -Wall -Wextra -O3 -S testcase.c
(Same code generated with -Os)

/*** rdi = acc, rsi = a, rdx = b ***/

testcase1:
  movq %rsi, %rax
  mulq %rdx
  addq %rax, (%rdi)
  movq %rdx, %rax
  adcq 8(%rdi), %rax
  adcq $0, 16(%rdi)
  movq %rax, 8(%rdi)
  ret

testcase2:
  movq %rsi, %rax	; rax = rsi = a
  mulq %rdx		; rdx:rax = rax*rdx = a*b
  add %rax, (%rdi)	; acc[0] += lo
  adc %rdx, 8(%rdi)	; acc[1] += hi + cf
  adc $0, 16(%rdi)	; acc[2] += cf
  ret


As you can see, gcc generates the expected code for testcase2,
but it generates sub-optimal code for testcase1:

  movq %rdx, %rax
  adcq 8(%rdi), %rax
  movq %rax, 8(%rdi)

instead of

  adc %rdx, 8(%rdi)	; acc[1] += hi + cf


Do you know why it's missing the optimization?

Regards


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Use-case for _addcarryx_u64() wrapper
  2023-06-03 11:37 ` Mason
@ 2023-06-03 11:49   ` Jakub Jelinek
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Jelinek @ 2023-06-03 11:49 UTC (permalink / raw)
  To: Mason; +Cc: Uros Bizjak, gcc-help, Jeffrey Walton, Marc Glisse

On Sat, Jun 03, 2023 at 01:37:53PM +0200, Mason wrote:
> On 01/06/2023 09:42, Mason wrote:
> 
> > As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
> > plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
> > 
> > https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
> 
> Hello Uros, Jakub,
> 
> I want to report a missed-optimization bug with _addcarry_u64().
> (I can file an issue on Bugzilla, if you deem it appropriate.)

Filing this in bugzilla is the right way to go.
I think we'll need to do something about this stuff urgently on
most of the arches anyway, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989#c56
But what your testcase shows is a separate issue, so should be
filed separately.
Thanks.

	Jakub


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-06-03 11:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-01  7:42 Use-case for _addcarryx_u64() wrapper Mason
2023-06-01  8:40 ` Uros Bizjak
2023-06-02 12:45   ` Mason
2023-06-02 12:50     ` Jeffrey Walton
2023-06-03  9:10       ` Mason
2023-06-02 12:59     ` Jakub Jelinek
2023-06-02 22:53       ` Gabriel Ravier
2023-06-03  8:53         ` Mason
2023-06-03  9:09           ` Jakub Jelinek
2023-06-03 11:37 ` Mason
2023-06-03 11:49   ` Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).