* Use-case for _addcarryx_u64() wrapper
@ 2023-06-01 7:42 Mason
2023-06-01 8:40 ` Uros Bizjak
2023-06-03 11:37 ` Mason
0 siblings, 2 replies; 11+ messages in thread
From: Mason @ 2023-06-01 7:42 UTC (permalink / raw)
To: gcc-help; +Cc: Uros Bizjak, Jakub Jelinek, Jeffrey Walton, Marc Glisse
Hello,
As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
Thus, I wonder: what is the use-case for the wrappers?
Why would a programmer not call the builtin directly?
Is it for compatibility with Intel compilers?
Also, based on the names, I would have assumed that
_addcarry_u64 generates adc
while
_addcarryx_u64 generates adcx/adox ?
Relevant past discussion:
https://gcc.gnu.org/legacy-ml/gcc-help/2017-08/msg00100.html
Regards
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Use-case for _addcarryx_u64() wrapper
2023-06-01 7:42 Use-case for _addcarryx_u64() wrapper Mason
@ 2023-06-01 8:40 ` Uros Bizjak
2023-06-02 12:45 ` Mason
2023-06-03 11:37 ` Mason
1 sibling, 1 reply; 11+ messages in thread
From: Uros Bizjak @ 2023-06-01 8:40 UTC (permalink / raw)
To: Mason; +Cc: gcc-help, Jakub Jelinek, Jeffrey Walton, Marc Glisse
On Thu, Jun 1, 2023 at 9:42 AM Mason <slash.tmp@free.fr> wrote:
>
> Hello,
>
> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
>
> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
>
> Thus, I wonder: what is the use-case for the wrappers?
> Why would a programmer not call the builtin directly?
> Is it for compatibility with Intel compilers?
Builtins are internal implementation detail, it is not published API.
Although rarely, builtins can be changed for some reason or another,
while intrinsic functions from adxintrin.h follow published API.
> Also, based on the names, I would have assumed that
> _addcarry_u64 generates adc
> while
> _addcarryx_u64 generates adcx/adox ?
No, they all generate add/adc. There is no use case to maintain two
interleaved carry chains, IOW rewriting:
--cut here--
#include <immintrin.h>
int foo (int A, int B, int D, int E)
{
_Bool carry1 = 0, carry2 = 0;
int C, F;
carry1 = _addcarryx_u32 (carry1, A, B, &C);
carry2 = _addcarryx_u32 (carry2, D, E, &F);
carry1 = _addcarryx_u32 (carry1, A, B, &C);
carry2 = _addcarryx_u32 (carry2, D, E, &F);
return C + F;
}
--cut here--
to:
--cut here--
#include <immintrin.h>
int foo (int A, int B, int D, int E)
{
_Bool carry1 = 0, carry2 = 0;
int C, F;
carry1 = _addcarryx_u32 (carry1, A, B, &C);
carry1 = _addcarryx_u32 (carry1, A, B, &C);
carry2 = _addcarryx_u32 (carry2, D, E, &F);
carry2 = _addcarryx_u32 (carry2, D, E, &F);
return C + F;
}
--cut here--
will give you:
movl %edi, %eax
addl %esi, %eax
movl %edx, %eax
adcl %esi, %edi
addl %ecx, %eax
adcl %ecx, %edx
leal (%rdi,%rdx), %eax
ret
Uros.
> Relevant past discussion:
> https://gcc.gnu.org/legacy-ml/gcc-help/2017-08/msg00100.html
>
> Regards
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Use-case for _addcarryx_u64() wrapper
2023-06-01 8:40 ` Uros Bizjak
@ 2023-06-02 12:45 ` Mason
2023-06-02 12:50 ` Jeffrey Walton
2023-06-02 12:59 ` Jakub Jelinek
0 siblings, 2 replies; 11+ messages in thread
From: Mason @ 2023-06-02 12:45 UTC (permalink / raw)
To: gcc-help, Uros Bizjak; +Cc: Jakub Jelinek, Jeffrey Walton, Marc Glisse
Hello Uros :)
On 01/06/2023 10:40, Uros Bizjak wrote:
> On Thu, Jun 1, 2023 at 9:42 AM Mason wrote:
>
>> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
>> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
>>
>> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
>>
>> Thus, I wonder: what is the use-case for the wrappers?
>> Why would a programmer not call the builtin directly?
>> Is it for compatibility with Intel compilers?
>
> Builtins are internal implementation detail, it is not published API.
> Although rarely, builtins can be changed for some reason or another,
> while intrinsic functions from adxintrin.h follow published API.
I'm confused.
Built-ins are officially documented:
https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
Using Vector Instructions through Built-in Functions
Legacy __sync Built-in Functions for Atomic Memory Access
Built-in Functions for Memory Model Aware Atomic Operations
Built-in Functions to Perform Arithmetic with Overflow Checking
Other Built-in Functions Provided by GCC
Built-in Functions Specific to Particular Target Machines
What do you mean by "not published API" ?
Or perhaps you meant __builtin_ia32_addcarryx_u64 specifically?
>> Also, based on the names, I would have assumed that
>> _addcarry_u64 generates adc
>> while
>> _addcarryx_u64 generates adcx/adox ?
>
> No, they all generate add/adc.
Why are there two wrappers for the same function?
Is it because the API was not designed by GCC?
(Intel ICC intrinsics perhaps?)
> There is no use case to maintain two
> interleaved carry chains,
What do you mean by "there is no use-case" ?
Are you saying ADCX/ADOX are useless?
Regards
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Use-case for _addcarryx_u64() wrapper
2023-06-02 12:45 ` Mason
@ 2023-06-02 12:50 ` Jeffrey Walton
2023-06-03 9:10 ` Mason
2023-06-02 12:59 ` Jakub Jelinek
1 sibling, 1 reply; 11+ messages in thread
From: Jeffrey Walton @ 2023-06-02 12:50 UTC (permalink / raw)
To: Mason; +Cc: gcc-help, Uros Bizjak, Jakub Jelinek, Marc Glisse
On Fri, Jun 2, 2023 at 8:45 AM Mason <slash.tmp@free.fr> wrote:
>
> On 01/06/2023 10:40, Uros Bizjak wrote:
> [...]
> > There is no use case to maintain two
> > interleaved carry chains,
>
> What do you mean by "there is no use-case" ?
> Are you saying ADCX/ADOX are useless?
The advertised use case for dual addc instructions is big integer
operations. I read that somewhere in the Intel docs several years ago.
When I benchmarked ADCX/ADOX several years ago, ADCX/ADOX were slower
than using a single ADDC. So I stayed with ADDC. Maybe cpu
manufacturers have improved things nowadays.
Jeff
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Use-case for _addcarryx_u64() wrapper
2023-06-02 12:45 ` Mason
2023-06-02 12:50 ` Jeffrey Walton
@ 2023-06-02 12:59 ` Jakub Jelinek
2023-06-02 22:53 ` Gabriel Ravier
1 sibling, 1 reply; 11+ messages in thread
From: Jakub Jelinek @ 2023-06-02 12:59 UTC (permalink / raw)
To: Mason; +Cc: gcc-help, Uros Bizjak, Jeffrey Walton, Marc Glisse
On Fri, Jun 02, 2023 at 02:45:40PM +0200, Mason wrote:
> On 01/06/2023 10:40, Uros Bizjak wrote:
>
> > On Thu, Jun 1, 2023 at 9:42 AM Mason wrote:
> >
> >> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
> >> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
> >>
> >> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
> >>
> >> Thus, I wonder: what is the use-case for the wrappers?
> >> Why would a programmer not call the builtin directly?
> >> Is it for compatibility with Intel compilers?
> >
> > Builtins are internal implementation detail, it is not published API.
> > Although rarely, builtins can be changed for some reason or another,
> > while intrinsic functions from adxintrin.h follow published API.
>
> I'm confused.
> Built-ins are officially documented:
> https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
Sure, some builtins are officially supported.
Those are meant to be used directly by users.
Then there are many builtins which are implementation detail for some
other API that users should use instead.
That includes e.g. builtins used under the hood for <*intrin.h>
implementation - users should use the intrinsics from those headers,
that is documented interface which is supported by multiple compilers,
or builtins used under the hood inside of libstdc++ headers (again,
users should use standard C++ APIs which are supported by multiple
compilers instead of the builtins directly) etc.
E.g. between GCC 3.4 and current trunk 62 __builtin_ia32_* builtins
which were implementation details of the x86 intrinsic headers
have been removed as the intrinsics got implemented some other way
(e.g. using generic vectors etc.).
Some builtins are in both categories, e.g. __atomic_* builtins
are both used in C++ <atomic> APIs, when using C++ one should use those,
or in C <stdatomic.h> APIs, but one can use them directly as well.
Jakub
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Use-case for _addcarryx_u64() wrapper
2023-06-02 12:59 ` Jakub Jelinek
@ 2023-06-02 22:53 ` Gabriel Ravier
2023-06-03 8:53 ` Mason
0 siblings, 1 reply; 11+ messages in thread
From: Gabriel Ravier @ 2023-06-02 22:53 UTC (permalink / raw)
To: Jakub Jelinek, Mason; +Cc: gcc-help, Uros Bizjak, Jeffrey Walton, Marc Glisse
On 6/2/23 14:59, Jakub Jelinek via Gcc-help wrote:
> On Fri, Jun 02, 2023 at 02:45:40PM +0200, Mason wrote:
>> On 01/06/2023 10:40, Uros Bizjak wrote:
>>
>>> On Thu, Jun 1, 2023 at 9:42 AM Mason wrote:
>>>
>>>> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
>>>> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
>>>>
>>>> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
>>>>
>>>> Thus, I wonder: what is the use-case for the wrappers?
>>>> Why would a programmer not call the builtin directly?
>>>> Is it for compatibility with Intel compilers?
>>> Builtins are internal implementation detail, it is not published API.
>>> Although rarely, builtins can be changed for some reason or another,
>>> while intrinsic functions from adxintrin.h follow published API.
>> I'm confused.
>> Built-ins are officially documented:
>> https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
> Sure, some builtins are officially supported.
> Those are meant to be used directly by users.
>
> Then there are many builtins which are implementation detail for some
> other API that users should use instead.
> That includes e.g. builtins used under the hood for <*intrin.h>
> implementation - users should use the intrinsics from those headers,
> that is documented interface which is supported by multiple compilers,
> or builtins used under the hood inside of libstdc++ headers (again,
> users should use standard C++ APIs which are supported by multiple
> compilers instead of the builtins directly) etc.
> E.g. between GCC 3.4 and current trunk 62 __builtin_ia32_* builtins
> which were implementation details of the x86 intrinsic headers
> have been removed as the intrinsics got implemented some other way
> (e.g. using generic vectors etc.).
Does it matter whether or not those builtins are documented ? It seems
like most of the __builtin_ia32_ builtins are explicitly documented in
the manual, despite the fact that these seem like those you're referring
to as being potentially removable at will whenever - and I see no
indication in the documentation that they are implementation details/may
be removed at any time for any reason.
>
> Some builtins are in both categories, e.g. __atomic_* builtins
> are both used in C++ <atomic> APIs, when using C++ one should use those,
> or in C <stdatomic.h> APIs, but one can use them directly as well.
>
> Jakub
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Use-case for _addcarryx_u64() wrapper
2023-06-02 22:53 ` Gabriel Ravier
@ 2023-06-03 8:53 ` Mason
2023-06-03 9:09 ` Jakub Jelinek
0 siblings, 1 reply; 11+ messages in thread
From: Mason @ 2023-06-03 8:53 UTC (permalink / raw)
To: Gabriel Ravier, Jakub Jelinek
Cc: gcc-help, Uros Bizjak, Jeffrey Walton, Marc Glisse
On 3/06/2023 00:53, Gabriel Ravier wrote:
> On 2/06/2023 14:59, Jakub Jelinek wrote:
>
>> Sure, some builtins are officially supported.
>> Those are meant to be used directly by users.
>>
>> Then there are many builtins which are implementation detail for some
>> other API that users should use instead.
>> That includes e.g. builtins used under the hood for <*intrin.h>
>> implementation - users should use the intrinsics from those headers,
>> that is documented interface which is supported by multiple compilers,
>> or builtins used under the hood inside of libstdc++ headers (again,
>> users should use standard C++ APIs which are supported by multiple
>> compilers instead of the builtins directly) etc.
>> E.g. between GCC 3.4 and current trunk 62 __builtin_ia32_* builtins
>> which were implementation details of the x86 intrinsic headers
>> have been removed as the intrinsics got implemented some other way
>> (e.g. using generic vectors etc.).
>
> Does it matter whether or not those builtins are documented ? It seems
> like most of the __builtin_ia32_ builtins are explicitly documented in
> the manual, despite the fact that these seem like those you're referring
> to as being potentially removable at will whenever - and I see no
> indication in the documentation that they are implementation details/may
> be removed at any time for any reason.
Hello Gabriel,
As far as I understand, Jakub is merely saying:
If a builtin is NOT documented, then it is NOT part of the API.
https://gcc.gnu.org/onlinedocs/gcc.pdf
*builtin_ia32_addcarry* appears nowhere in the manual,
thus it is NOT documented, thus it is NOT part of the API.
Regards
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Use-case for _addcarryx_u64() wrapper
2023-06-03 8:53 ` Mason
@ 2023-06-03 9:09 ` Jakub Jelinek
0 siblings, 0 replies; 11+ messages in thread
From: Jakub Jelinek @ 2023-06-03 9:09 UTC (permalink / raw)
To: Mason; +Cc: Gabriel Ravier, gcc-help, Uros Bizjak, Jeffrey Walton, Marc Glisse
On Sat, Jun 03, 2023 at 10:53:04AM +0200, Mason wrote:
> >> Sure, some builtins are officially supported.
> >> Those are meant to be used directly by users.
> >>
> >> Then there are many builtins which are implementation detail for some
> >> other API that users should use instead.
> >> That includes e.g. builtins used under the hood for <*intrin.h>
> >> implementation - users should use the intrinsics from those headers,
> >> that is documented interface which is supported by multiple compilers,
> >> or builtins used under the hood inside of libstdc++ headers (again,
> >> users should use standard C++ APIs which are supported by multiple
> >> compilers instead of the builtins directly) etc.
> >> E.g. between GCC 3.4 and current trunk 62 __builtin_ia32_* builtins
> >> which were implementation details of the x86 intrinsic headers
> >> have been removed as the intrinsics got implemented some other way
> >> (e.g. using generic vectors etc.).
> >
> > Does it matter whether or not those builtins are documented ? It seems
> > like most of the __builtin_ia32_ builtins are explicitly documented in
> > the manual, despite the fact that these seem like those you're referring
> > to as being potentially removable at will whenever - and I see no
> > indication in the documentation that they are implementation details/may
> > be removed at any time for any reason.
>
> Hello Gabriel,
>
> As far as I understand, Jakub is merely saying:
> If a builtin is NOT documented, then it is NOT part of the API.
>
> https://gcc.gnu.org/onlinedocs/gcc.pdf
> *builtin_ia32_addcarry* appears nowhere in the manual,
> thus it is NOT documented, thus it is NOT part of the API.
Roughly, except that some builtins are mistakenly documented when they
shouldn't be. I think e.g. none of the __builtin_ia32_* builtins are actually
supported.
The documentation even mentions some __builtin_ia32_* builtins which were
removed years ago. E.g. __builtin_ia32_paddw128 but lots of similar ones in GCC 5
(2015).
Jakub
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Use-case for _addcarryx_u64() wrapper
2023-06-02 12:50 ` Jeffrey Walton
@ 2023-06-03 9:10 ` Mason
0 siblings, 0 replies; 11+ messages in thread
From: Mason @ 2023-06-03 9:10 UTC (permalink / raw)
To: Jeffrey Walton; +Cc: gcc-help, Uros Bizjak, Jakub Jelinek, Marc Glisse
On 2/06/2023 14:50, Jeffrey Walton wrote:
> The advertised use case for dual addc instructions is big integer
> operations. I read that somewhere in the Intel docs several years ago.
I know ;)
That's why I added you to CC:
https://gcc.gnu.org/legacy-ml/gcc-help/2017-08/msg00085.html
https://stackoverflow.com/questions/29747508/what-is-the-difference-between-the-adc-and-adcx-instructions-on-ia32-ia64
> When I benchmarked ADCX/ADOX several years ago, ADCX/ADOX were slower
> than using a single ADDC. So I stayed with ADDC. Maybe CPU
> manufacturers have improved things nowadays.
That's what I'm working on right now.
Thanks for sharing.
Regards
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Use-case for _addcarryx_u64() wrapper
2023-06-01 7:42 Use-case for _addcarryx_u64() wrapper Mason
2023-06-01 8:40 ` Uros Bizjak
@ 2023-06-03 11:37 ` Mason
2023-06-03 11:49 ` Jakub Jelinek
1 sibling, 1 reply; 11+ messages in thread
From: Mason @ 2023-06-03 11:37 UTC (permalink / raw)
To: Uros Bizjak, Jakub Jelinek; +Cc: gcc-help, Jeffrey Walton, Marc Glisse
On 01/06/2023 09:42, Mason wrote:
> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
>
> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
Hello Uros, Jakub,
I want to report a missed-optimization bug with _addcarry_u64().
(I can file an issue on Bugzilla, if you deem it appropriate.)
#include <x86intrin.h>
typedef unsigned long long u64;
typedef unsigned __int128 u128;
void testcase1(u64 *acc, u64 a, u64 b)
{
u128 res = (u128)a*b;
u64 lo = res, hi = res >> 64;
unsigned char cf = 0;
cf = _addcarry_u64(cf, lo, acc[0], acc+0);
cf = _addcarry_u64(cf, hi, acc[1], acc+1);
cf = _addcarry_u64(cf, 0, acc[2], acc+2);
}
void testcase2(u64 *acc, u64 a, u64 b)
{
u128 res = (u128)a * b;
u64 lo = res, hi = res >> 64;
asm("add %[LO], %[D0]\n\t" "adc %[HI], %[D1]\n\t" "adc $0, %[D2]" :
[D0] "+m" (acc[0]), [D1] "+m" (acc[1]), [D2] "+m" (acc[2]) :
[LO] "r" (lo), [HI] "r" (hi) : "cc");
}
gcc-trunk -Wall -Wextra -O3 -S testcase.c
(Same code generated with -Os)
/*** rdi = acc, rsi = a, rdx = b ***/
testcase1:
movq %rsi, %rax
mulq %rdx
addq %rax, (%rdi)
movq %rdx, %rax
adcq 8(%rdi), %rax
adcq $0, 16(%rdi)
movq %rax, 8(%rdi)
ret
testcase2:
movq %rsi, %rax ; rax = rsi = a
mulq %rdx ; rdx:rax = rax*rdx = a*b
add %rax, (%rdi) ; acc[0] += lo
adc %rdx, 8(%rdi) ; acc[1] += hi + cf
adc $0, 16(%rdi) ; acc[2] += cf
ret
As you can see, gcc generates the expected code for testcase2,
but it generates sub-optimal code for testcase1:
movq %rdx, %rax
adcq 8(%rdi), %rax
movq %rax, 8(%rdi)
instead of
adc %rdx, 8(%rdi) ; acc[1] += hi + cf
Do you know why it's missing the optimization?
Regards
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Use-case for _addcarryx_u64() wrapper
2023-06-03 11:37 ` Mason
@ 2023-06-03 11:49 ` Jakub Jelinek
0 siblings, 0 replies; 11+ messages in thread
From: Jakub Jelinek @ 2023-06-03 11:49 UTC (permalink / raw)
To: Mason; +Cc: Uros Bizjak, gcc-help, Jeffrey Walton, Marc Glisse
On Sat, Jun 03, 2023 at 01:37:53PM +0200, Mason wrote:
> On 01/06/2023 09:42, Mason wrote:
>
> > As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
> > plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
> >
> > https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
>
> Hello Uros, Jakub,
>
> I want to report a missed-optimization bug with _addcarry_u64().
> (I can file an issue on Bugzilla, if you deem it appropriate.)
Filing this in bugzilla is the right way to go.
I think we'll need to do something about this stuff urgently on
most of the arches anyway, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989#c56
But what your testcase shows is a separate issue, so should be
filed separately.
Thanks.
Jakub
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-06-03 11:51 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-01 7:42 Use-case for _addcarryx_u64() wrapper Mason
2023-06-01 8:40 ` Uros Bizjak
2023-06-02 12:45 ` Mason
2023-06-02 12:50 ` Jeffrey Walton
2023-06-03 9:10 ` Mason
2023-06-02 12:59 ` Jakub Jelinek
2023-06-02 22:53 ` Gabriel Ravier
2023-06-03 8:53 ` Mason
2023-06-03 9:09 ` Jakub Jelinek
2023-06-03 11:37 ` Mason
2023-06-03 11:49 ` Jakub Jelinek
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).