* Use-case for _addcarryx_u64() wrapper
From: Mason @ 2023-06-01 7:42 UTC
To: gcc-help; +Cc: Uros Bizjak, Jakub Jelinek, Jeffrey Walton, Marc Glisse

Hello,

As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
plain wrappers around the same __builtin_ia32_addcarryx_u64() function.

https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h

Thus, I wonder: what is the use-case for the wrappers?
Why would a programmer not call the builtin directly?
Is it for compatibility with Intel compilers?

Also, based on the names, I would have assumed that
_addcarry_u64 generates adc
while
_addcarryx_u64 generates adcx/adox ?

Relevant past discussion:
https://gcc.gnu.org/legacy-ml/gcc-help/2017-08/msg00100.html

Regards

* Re: Use-case for _addcarryx_u64() wrapper
From: Uros Bizjak @ 2023-06-01 8:40 UTC
To: Mason; +Cc: gcc-help, Jakub Jelinek, Jeffrey Walton, Marc Glisse

On Thu, Jun 1, 2023 at 9:42 AM Mason <slash.tmp@free.fr> wrote:
>
> Hello,
>
> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
>
> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
>
> Thus, I wonder: what is the use-case for the wrappers?
> Why would a programmer not call the builtin directly?
> Is it for compatibility with Intel compilers?

Builtins are internal implementation detail, it is not published API.
Although rarely, builtins can be changed for some reason or another,
while intrinsic functions from adxintrin.h follow published API.

> Also, based on the names, I would have assumed that
> _addcarry_u64 generates adc
> while
> _addcarryx_u64 generates adcx/adox ?

No, they all generate add/adc. There is no use case to maintain two
interleaved carry chains, IOW rewriting:

--cut here--
#include <immintrin.h>

int foo (int A, int B, int D, int E)
{
  _Bool carry1 = 0, carry2 = 0;
  int C, F;

  carry1 = _addcarryx_u32 (carry1, A, B, &C);
  carry2 = _addcarryx_u32 (carry2, D, E, &F);
  carry1 = _addcarryx_u32 (carry1, A, B, &C);
  carry2 = _addcarryx_u32 (carry2, D, E, &F);

  return C + F;
}
--cut here--

to:

--cut here--
#include <immintrin.h>

int foo (int A, int B, int D, int E)
{
  _Bool carry1 = 0, carry2 = 0;
  int C, F;

  carry1 = _addcarryx_u32 (carry1, A, B, &C);
  carry1 = _addcarryx_u32 (carry1, A, B, &C);
  carry2 = _addcarryx_u32 (carry2, D, E, &F);
  carry2 = _addcarryx_u32 (carry2, D, E, &F);

  return C + F;
}
--cut here--

will give you:

        movl    %edi, %eax
        addl    %esi, %eax
        movl    %edx, %eax
        adcl    %esi, %edi
        addl    %ecx, %eax
        adcl    %ecx, %edx
        leal    (%rdi,%rdx), %eax
        ret

Uros.

> Relevant past discussion:
> https://gcc.gnu.org/legacy-ml/gcc-help/2017-08/msg00100.html
>
> Regards
>

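To make the single-carry-chain case concrete, here is a minimal sketch of a
multi-limb addition written against the published intrinsic rather than the
builtin. The helper name add_limbs and the limb layout (least-significant limb
first) are assumptions made for illustration and do not come from the thread.
The non-x86 branch is only there so the sketch builds elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not from the thread): r = a + b over n 64-bit limbs,
   least-significant limb first. Returns the carry out of the top limb. */
static unsigned char add_limbs(uint64_t *r, const uint64_t *a,
                               const uint64_t *b, size_t n)
{
    unsigned char cf = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__x86_64__)
        /* The intrinsic takes unsigned long long *, so use a local of
           exactly that type instead of casting the uint64_t pointer. */
        unsigned long long sum;
        cf = _addcarry_u64(cf, a[i], b[i], &sum);
        r[i] = sum;
#else
        /* Portable fallback: detect wrap-around of each partial sum. */
        uint64_t s = a[i] + b[i];
        unsigned char c1 = s < a[i];
        uint64_t t = s + cf;
        unsigned char c2 = t < s;
        r[i] = t;
        cf = (unsigned char)(c1 | c2);
#endif
    }
    return cf;
}
```

Whether the loop ends up as an add/adc sequence depends on the compiler
version and optimization level, so it is worth checking the output with -S.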
* Re: Use-case for _addcarryx_u64() wrapper
From: Mason @ 2023-06-02 12:45 UTC
To: gcc-help, Uros Bizjak; +Cc: Jakub Jelinek, Jeffrey Walton, Marc Glisse

Hello Uros :)

On 01/06/2023 10:40, Uros Bizjak wrote:

> On Thu, Jun 1, 2023 at 9:42 AM Mason wrote:
>
>> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
>> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
>>
>> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
>>
>> Thus, I wonder: what is the use-case for the wrappers?
>> Why would a programmer not call the builtin directly?
>> Is it for compatibility with Intel compilers?
>
> Builtins are internal implementation detail, it is not published API.
> Although rarely, builtins can be changed for some reason or another,
> while intrinsic functions from adxintrin.h follow published API.

I'm confused.
Built-ins are officially documented:
https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html

    Using Vector Instructions through Built-in Functions
    Legacy __sync Built-in Functions for Atomic Memory Access
    Built-in Functions for Memory Model Aware Atomic Operations
    Built-in Functions to Perform Arithmetic with Overflow Checking
    Other Built-in Functions Provided by GCC
    Built-in Functions Specific to Particular Target Machines

What do you mean by "not published API" ?

Or perhaps you meant __builtin_ia32_addcarryx_u64 specifically?

>> Also, based on the names, I would have assumed that
>> _addcarry_u64 generates adc
>> while
>> _addcarryx_u64 generates adcx/adox ?
>
> No, they all generate add/adc.

Why are there two wrappers for the same function?
Is it because the API was not designed by GCC?
(Intel ICC intrinsics perhaps?)

> There is no use case to maintain two
> interleaved carry chains,

What do you mean by "there is no use-case" ?
Are you saying ADCX/ADOX are useless?

Regards

* Re: Use-case for _addcarryx_u64() wrapper
From: Jeffrey Walton @ 2023-06-02 12:50 UTC
To: Mason; +Cc: gcc-help, Uros Bizjak, Jakub Jelinek, Marc Glisse

On Fri, Jun 2, 2023 at 8:45 AM Mason <slash.tmp@free.fr> wrote:
>
> On 01/06/2023 10:40, Uros Bizjak wrote:
> [...]
> > There is no use case to maintain two
> > interleaved carry chains,
>
> What do you mean by "there is no use-case" ?
> Are you saying ADCX/ADOX are useless?

The advertised use case for dual addc instructions is big integer
operations. I read that somewhere in the Intel docs several years ago.

When I benchmarked ADCX/ADOX several years ago, ADCX/ADOX were slower
than using a single ADDC. So I stayed with ADDC. Maybe cpu
manufacturers have improved things nowadays.

Jeff

* Re: Use-case for _addcarryx_u64() wrapper
From: Mason @ 2023-06-03 9:10 UTC
To: Jeffrey Walton; +Cc: gcc-help, Uros Bizjak, Jakub Jelinek, Marc Glisse

On 2/06/2023 14:50, Jeffrey Walton wrote:

> The advertised use case for dual addc instructions is big integer
> operations. I read that somewhere in the Intel docs several years ago.

I know ;)
That's why I added you to CC:
https://gcc.gnu.org/legacy-ml/gcc-help/2017-08/msg00085.html
https://stackoverflow.com/questions/29747508/what-is-the-difference-between-the-adc-and-adcx-instructions-on-ia32-ia64

> When I benchmarked ADCX/ADOX several years ago, ADCX/ADOX were slower
> than using a single ADDC. So I stayed with ADDC. Maybe CPU
> manufacturers have improved things nowadays.

That's what I'm working on right now. Thanks for sharing.

Regards

* Re: Use-case for _addcarryx_u64() wrapper
From: Jakub Jelinek @ 2023-06-02 12:59 UTC
To: Mason; +Cc: gcc-help, Uros Bizjak, Jeffrey Walton, Marc Glisse

On Fri, Jun 02, 2023 at 02:45:40PM +0200, Mason wrote:
> On 01/06/2023 10:40, Uros Bizjak wrote:
>
> > On Thu, Jun 1, 2023 at 9:42 AM Mason wrote:
> >
> >> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
> >> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
> >>
> >> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
> >>
> >> Thus, I wonder: what is the use-case for the wrappers?
> >> Why would a programmer not call the builtin directly?
> >> Is it for compatibility with Intel compilers?
> >
> > Builtins are internal implementation detail, it is not published API.
> > Although rarely, builtins can be changed for some reason or another,
> > while intrinsic functions from adxintrin.h follow published API.
>
> I'm confused.
> Built-ins are officially documented:
> https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html

Sure, some builtins are officially supported.
Those are meant to be used directly by users.

Then there are many builtins which are implementation detail for some
other API that users should use instead.
That includes e.g. builtins used under the hood for <*intrin.h>
implementation - users should use the intrinsics from those headers,
that is documented interface which is supported by multiple compilers,
or builtins used under the hood inside of libstdc++ headers (again,
users should use standard C++ APIs which are supported by multiple
compilers instead of the builtins directly) etc.
E.g. between GCC 3.4 and current trunk 62 __builtin_ia32_* builtins
which were implementation details of the x86 intrinsic headers
have been removed as the intrinsics got implemented some other way
(e.g. using generic vectors etc.).

Some builtins are in both categories, e.g. __atomic_* builtins
are both used in C++ <atomic> APIs, when using C++ one should use those,
or in C <stdatomic.h> APIs, but one can use them directly as well.

        Jakub

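As an illustration of a builtin in the "meant to be used directly" category,
the sketch below builds one add-with-carry step from __builtin_add_overflow,
which is described in the "Built-in Functions to Perform Arithmetic with
Overflow Checking" section of the GCC manual. The helper name add_with_carry
is an assumption made for this example and does not appear in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread):
   *sum = a + b + carry_in, returns the carry out. */
static bool add_with_carry(uint64_t a, uint64_t b, bool carry_in,
                           uint64_t *sum)
{
    uint64_t partial;
    /* Each call reports whether the 64-bit result wrapped around. */
    bool c1 = __builtin_add_overflow(a, b, &partial);
    bool c2 = __builtin_add_overflow(partial, (uint64_t)carry_in, sum);
    /* At most one of the two partial additions can wrap. */
    return c1 | c2;
}
```

This avoids both the target-specific header and the internal
__builtin_ia32_* name. It makes no promise about the instructions emitted,
which may or may not be add/adc depending on the compiler.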
* Re: Use-case for _addcarryx_u64() wrapper
From: Gabriel Ravier @ 2023-06-02 22:53 UTC
To: Jakub Jelinek, Mason; +Cc: gcc-help, Uros Bizjak, Jeffrey Walton, Marc Glisse

On 6/2/23 14:59, Jakub Jelinek via Gcc-help wrote:
> On Fri, Jun 02, 2023 at 02:45:40PM +0200, Mason wrote:
>> On 01/06/2023 10:40, Uros Bizjak wrote:
>>
>>> On Thu, Jun 1, 2023 at 9:42 AM Mason wrote:
>>>
>>>> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
>>>> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
>>>>
>>>> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
>>>>
>>>> Thus, I wonder: what is the use-case for the wrappers?
>>>> Why would a programmer not call the builtin directly?
>>>> Is it for compatibility with Intel compilers?
>>> Builtins are internal implementation detail, it is not published API.
>>> Although rarely, builtins can be changed for some reason or another,
>>> while intrinsic functions from adxintrin.h follow published API.
>> I'm confused.
>> Built-ins are officially documented:
>> https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
> Sure, some builtins are officially supported.
> Those are meant to be used directly by users.
>
> Then there are many builtins which are implementation detail for some
> other API that users should use instead.
> That includes e.g. builtins used under the hood for <*intrin.h>
> implementation - users should use the intrinsics from those headers,
> that is documented interface which is supported by multiple compilers,
> or builtins used under the hood inside of libstdc++ headers (again,
> users should use standard C++ APIs which are supported by multiple
> compilers instead of the builtins directly) etc.
> E.g. between GCC 3.4 and current trunk 62 __builtin_ia32_* builtins
> which were implementation details of the x86 intrinsic headers
> have been removed as the intrinsics got implemented some other way
> (e.g. using generic vectors etc.).

Does it matter whether or not those builtins are documented ? It seems
like most of the __builtin_ia32_ builtins are explicitly documented in
the manual, despite the fact that these seem like those you're referring
to as being potentially removable at will whenever - and I see no
indication in the documentation that they are implementation details/may
be removed at any time for any reason.

>
> Some builtins are in both categories, e.g. __atomic_* builtins
> are both used in C++ <atomic> APIs, when using C++ one should use those,
> or in C <stdatomic.h> APIs, but one can use them directly as well.
>
> Jakub
>

* Re: Use-case for _addcarryx_u64() wrapper
From: Mason @ 2023-06-03 8:53 UTC
To: Gabriel Ravier, Jakub Jelinek
Cc: gcc-help, Uros Bizjak, Jeffrey Walton, Marc Glisse

On 3/06/2023 00:53, Gabriel Ravier wrote:

> On 2/06/2023 14:59, Jakub Jelinek wrote:
>
>> Sure, some builtins are officially supported.
>> Those are meant to be used directly by users.
>>
>> Then there are many builtins which are implementation detail for some
>> other API that users should use instead.
>> That includes e.g. builtins used under the hood for <*intrin.h>
>> implementation - users should use the intrinsics from those headers,
>> that is documented interface which is supported by multiple compilers,
>> or builtins used under the hood inside of libstdc++ headers (again,
>> users should use standard C++ APIs which are supported by multiple
>> compilers instead of the builtins directly) etc.
>> E.g. between GCC 3.4 and current trunk 62 __builtin_ia32_* builtins
>> which were implementation details of the x86 intrinsic headers
>> have been removed as the intrinsics got implemented some other way
>> (e.g. using generic vectors etc.).
>
> Does it matter whether or not those builtins are documented ? It seems
> like most of the __builtin_ia32_ builtins are explicitly documented in
> the manual, despite the fact that these seem like those you're referring
> to as being potentially removable at will whenever - and I see no
> indication in the documentation that they are implementation details/may
> be removed at any time for any reason.

Hello Gabriel,

As far as I understand, Jakub is merely saying:
If a builtin is NOT documented, then it is NOT part of the API.

https://gcc.gnu.org/onlinedocs/gcc.pdf
*builtin_ia32_addcarry* appears nowhere in the manual,
thus it is NOT documented, thus it is NOT part of the API.

Regards

* Re: Use-case for _addcarryx_u64() wrapper
From: Jakub Jelinek @ 2023-06-03 9:09 UTC
To: Mason; +Cc: Gabriel Ravier, gcc-help, Uros Bizjak, Jeffrey Walton, Marc Glisse

On Sat, Jun 03, 2023 at 10:53:04AM +0200, Mason wrote:
> >> Sure, some builtins are officially supported.
> >> Those are meant to be used directly by users.
> >>
> >> Then there are many builtins which are implementation detail for some
> >> other API that users should use instead.
> >> That includes e.g. builtins used under the hood for <*intrin.h>
> >> implementation - users should use the intrinsics from those headers,
> >> that is documented interface which is supported by multiple compilers,
> >> or builtins used under the hood inside of libstdc++ headers (again,
> >> users should use standard C++ APIs which are supported by multiple
> >> compilers instead of the builtins directly) etc.
> >> E.g. between GCC 3.4 and current trunk 62 __builtin_ia32_* builtins
> >> which were implementation details of the x86 intrinsic headers
> >> have been removed as the intrinsics got implemented some other way
> >> (e.g. using generic vectors etc.).
> >
> > Does it matter whether or not those builtins are documented ? It seems
> > like most of the __builtin_ia32_ builtins are explicitly documented in
> > the manual, despite the fact that these seem like those you're referring
> > to as being potentially removable at will whenever - and I see no
> > indication in the documentation that they are implementation details/may
> > be removed at any time for any reason.
>
> Hello Gabriel,
>
> As far as I understand, Jakub is merely saying:
> If a builtin is NOT documented, then it is NOT part of the API.
>
> https://gcc.gnu.org/onlinedocs/gcc.pdf
> *builtin_ia32_addcarry* appears nowhere in the manual,
> thus it is NOT documented, thus it is NOT part of the API.

Roughly, except that some builtins are mistakenly documented when
they shouldn't be.
I think e.g. none of the __builtin_ia32_* builtins are actually supported.
The documentation even mentions some __builtin_ia32_* builtins which were
removed years ago. E.g. __builtin_ia32_paddw128 but lots of similar ones
in GCC 5 (2015).

        Jakub

* Re: Use-case for _addcarryx_u64() wrapper
From: Mason @ 2023-06-03 11:37 UTC
To: Uros Bizjak, Jakub Jelinek; +Cc: gcc-help, Jeffrey Walton, Marc Glisse

On 01/06/2023 09:42, Mason wrote:

> As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
> plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
>
> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h

Hello Uros, Jakub,

I want to report a missed-optimization bug with _addcarry_u64().
(I can file an issue on Bugzilla, if you deem it appropriate.)

#include <x86intrin.h>
typedef unsigned long long u64;
typedef unsigned __int128 u128;

void testcase1(u64 *acc, u64 a, u64 b)
{
  u128 res = (u128)a*b;
  u64 lo = res, hi = res >> 64;
  unsigned char cf = 0;
  cf = _addcarry_u64(cf, lo, acc[0], acc+0);
  cf = _addcarry_u64(cf, hi, acc[1], acc+1);
  cf = _addcarry_u64(cf,  0, acc[2], acc+2);
}

void testcase2(u64 *acc, u64 a, u64 b)
{
  u128 res = (u128)a * b;
  u64 lo = res, hi = res >> 64;
  asm("add %[LO], %[D0]\n\t"
      "adc %[HI], %[D1]\n\t"
      "adc $0, %[D2]"
      : [D0] "+m" (acc[0]), [D1] "+m" (acc[1]), [D2] "+m" (acc[2])
      : [LO] "r" (lo), [HI] "r" (hi)
      : "cc");
}

gcc-trunk -Wall -Wextra -O3 -S testcase.c
(Same code generated with -Os)

/*** rdi = acc, rsi = a, rdx = b ***/

testcase1:
  movq %rsi, %rax
  mulq %rdx
  addq %rax, (%rdi)
  movq %rdx, %rax
  adcq 8(%rdi), %rax
  adcq $0, 16(%rdi)
  movq %rax, 8(%rdi)
  ret

testcase2:
  movq %rsi, %rax     ; rax = rsi = a
  mulq %rdx           ; rdx:rax = rax*rdx = a*b
  add %rax, (%rdi)    ; acc[0] += lo
  adc %rdx, 8(%rdi)   ; acc[1] += hi + cf
  adc $0, 16(%rdi)    ; acc[2] += cf
  ret

As you can see, gcc generates the expected code for testcase2,
but it generates sub-optimal code for testcase1:

  movq %rdx, %rax
  adcq 8(%rdi), %rax
  movq %rax, 8(%rdi)

instead of

  adc %rdx, 8(%rdi)   ; acc[1] += hi + cf

Do you know why it's missing the optimization?

Regards

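When comparing code generation for variants like these, a plain C reference
can help confirm that they all compute the same 192-bit accumulate. The sketch
below is one possible reference. The name mul_acc_ref is made up for
illustration and is not part of the testcase, and it assumes a compiler that
provides unsigned __int128, as the testcase already does.

```c
typedef unsigned long long u64;
typedef unsigned __int128 u128;

/* Hypothetical reference (not from the thread): acc[0..2] += a*b,
   using 128-bit temporaries to carry between the 64-bit limbs. */
static void mul_acc_ref(u64 *acc, u64 a, u64 b)
{
    u128 res = (u128)a * b;

    /* Low limb: acc[0] + lo, carry ends up in bit 64 of t. */
    u128 t = (u128)acc[0] + (u64)res;
    acc[0] = (u64)t;

    /* Middle limb: acc[1] + hi + carry from the low limb. */
    t = (u128)acc[1] + (u64)(res >> 64) + (u64)(t >> 64);
    acc[1] = (u64)t;

    /* Top limb: only absorbs the carry; overflow past it is dropped,
       as in the testcase. */
    acc[2] += (u64)(t >> 64);
}
```

Running the reference and a candidate on the same inputs and comparing the
three limbs is enough to catch a wrong carry. It says nothing about which
instructions either version compiles to.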
* Re: Use-case for _addcarryx_u64() wrapper
From: Jakub Jelinek @ 2023-06-03 11:49 UTC
To: Mason; +Cc: Uros Bizjak, gcc-help, Jeffrey Walton, Marc Glisse

On Sat, Jun 03, 2023 at 01:37:53PM +0200, Mason wrote:
> On 01/06/2023 09:42, Mason wrote:
>
> > As far as I can tell, intrinsics _addcarry_u64() and _addcarryx_u64() are
> > plain wrappers around the same __builtin_ia32_addcarryx_u64() function.
> >
> > https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/adxintrin.h
>
> Hello Uros, Jakub,
>
> I want to report a missed-optimization bug with _addcarry_u64().
> (I can file an issue on Bugzilla, if you deem it appropriate.)

Filing this in bugzilla is the right way to go.
I think we'll need to do something about this stuff urgently on most of the
arches anyway, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989#c56
But what your testcase shows is a separate issue, so should be filed
separately. Thanks.

        Jakub
