public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/111209] New: GCC fails to understand adc pattern
@ 2023-08-28 6:47 unlvsur at live dot com
2023-08-28 18:56 ` [Bug target/111209] GCC fails to understand adc pattern what its document describes jakub at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: unlvsur at live dot com @ 2023-08-28 6:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111209
Bug ID: 111209
Summary: GCC fails to understand adc pattern
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: unlvsur at live dot com
Target Milestone: ---
Created attachment 55805
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55805&action=edit
bug example
template<typename T>
inline constexpr T addc(T a,T b,bool carryin,bool& carryout) noexcept
{
T s;
auto c1 = __builtin_add_overflow(a, b, __builtin_addressof(s));
auto c2 = __builtin_add_overflow(s, carryin, __builtin_addressof(s));
carryout = c1 | c2;
return s;
}
void test_addc(unsigned long long* a,unsigned long long* b,unsigned long long*
r)
{
bool carry{};
r[0]=addc(a[0],b[0],carry,carry);
r[1]=addc(a[1],b[1],carry,carry);
}
I copied the example from gcc documentation
https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
But gcc seems to fail to understand the pattern correctly even with what the
document describes.
gcc:
https://godbolt.org/z/Whaaahn41
clang:
https://godbolt.org/z/Ma6rvaYd6
Looks like a bug here.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/111209] GCC fails to understand adc pattern what its document describes
2023-08-28 6:47 [Bug tree-optimization/111209] New: GCC fails to understand adc pattern unlvsur at live dot com
@ 2023-08-28 18:56 ` jakub at gcc dot gnu.org
2023-08-28 18:58 ` [Bug middle-end/111209] " jakub at gcc dot gnu.org
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-08-28 18:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111209
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Just use __int128 addition if all you want is double-word addition (or long
long for 32-bit arches)?
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/111209] GCC fails to understand adc pattern what its document describes
2023-08-28 6:47 [Bug tree-optimization/111209] New: GCC fails to understand adc pattern unlvsur at live dot com
2023-08-28 18:56 ` [Bug target/111209] GCC fails to understand adc pattern what its document describes jakub at gcc dot gnu.org
@ 2023-08-28 18:58 ` jakub at gcc dot gnu.org
2023-08-28 19:02 ` unlvsur at live dot com
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-08-28 18:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111209
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2023-08-28
Ever confirmed|0 |1
Status|UNCONFIRMED |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 55809
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55809&action=edit
gcc14-pr111209.patch
Anyway, here is a patch that makes it match, but it is getting ugly to avoid
making it match prematurely and break other matching.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/111209] GCC fails to understand adc pattern what its document describes
2023-08-28 6:47 [Bug tree-optimization/111209] New: GCC fails to understand adc pattern unlvsur at live dot com
2023-08-28 18:56 ` [Bug target/111209] GCC fails to understand adc pattern what its document describes jakub at gcc dot gnu.org
2023-08-28 18:58 ` [Bug middle-end/111209] " jakub at gcc dot gnu.org
@ 2023-08-28 19:02 ` unlvsur at live dot com
2023-08-28 19:04 ` jakub at gcc dot gnu.org
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: unlvsur at live dot com @ 2023-08-28 19:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111209
--- Comment #3 from cqwrteur <unlvsur at live dot com> ---
(In reply to Jakub Jelinek from comment #1)
> Just use __int128 addition if all you want is double-word addition (or long
> long for 32-bit arches)?
Well, I've presented this merely as an illustrative example. The length can
actually be arbitrary. I've directly taken the code from the GCC documentation,
but it doesn't appear to perform as the document asserts.
"
Built-in Function: unsigned int __builtin_addc (unsigned int a, unsigned int b,
unsigned int carry_in, unsigned int *carry_out)
Built-in Function: unsigned long int __builtin_addcl (unsigned long int a,
unsigned long int b, unsigned int carry_in, unsigned long int *carry_out)
Built-in Function: unsigned long long int __builtin_addcll (unsigned long long
int a, unsigned long long int b, unsigned long long int carry_in, unsigned long
long int *carry_out)
These built-in functions are equivalent to:
({ __typeof__ (a) s; \
__typeof__ (a) c1 = __builtin_add_overflow (a, b, &s); \
__typeof__ (a) c2 = __builtin_add_overflow (s, carry_in, &s); \
*(carry_out) = c1 | c2; \
s; })
i.e. they add 3 unsigned values, set what the last argument points to to 1 if
any of the two additions overflowed (otherwise 0) and return the sum of those 3
unsigned values. Note, while all the first 3 arguments can have arbitrary
values, better code will be emitted if one of them (preferrably the third one)
has only values 0 or 1 (i.e. carry-in).
"
Additionally, it's advisable to steer clear of using __uint128_t in certain
situations. This data type is not compatible with the Microsoft compiler and
32-bit machines. Moreover, the compiler does not effectively optimize the
associated costs.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/111209] GCC fails to understand adc pattern what its document describes
2023-08-28 6:47 [Bug tree-optimization/111209] New: GCC fails to understand adc pattern unlvsur at live dot com
` (2 preceding siblings ...)
2023-08-28 19:02 ` unlvsur at live dot com
@ 2023-08-28 19:04 ` jakub at gcc dot gnu.org
2023-08-28 19:08 ` unlvsur at live dot com
2023-08-29 8:47 ` cvs-commit at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-08-28 19:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111209
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to cqwrteur from comment #3)
> (In reply to Jakub Jelinek from comment #1)
> > Just use __int128 addition if all you want is double-word addition (or long
> > long for 32-bit arches)?
>
> Well, I've presented this merely as an illustrative example. The length can
> actually be arbitrary.
No, it was working with all the other lengths.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/111209] GCC fails to understand adc pattern what its document describes
2023-08-28 6:47 [Bug tree-optimization/111209] New: GCC fails to understand adc pattern unlvsur at live dot com
` (3 preceding siblings ...)
2023-08-28 19:04 ` jakub at gcc dot gnu.org
@ 2023-08-28 19:08 ` unlvsur at live dot com
2023-08-29 8:47 ` cvs-commit at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: unlvsur at live dot com @ 2023-08-28 19:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111209
--- Comment #5 from cqwrteur <unlvsur at live dot com> ---
(In reply to Jakub Jelinek from comment #4)
> (In reply to cqwrteur from comment #3)
> > (In reply to Jakub Jelinek from comment #1)
> > > Just use __int128 addition if all you want is double-word addition (or long
> > > long for 32-bit arches)?
> >
> > Well, I've presented this merely as an illustrative example. The length can
> > actually be arbitrary.
>
> No, it was working with all the other lengths.
This might come across as unusual. I frequently engage in manipulations
involving the carry flag.
like this implementation for 128 bit division (for 32 bit machine and Microsoft
compiler)
auto shift = static_cast<unsigned>(::std::countl_zero(divisorhigh) -
::std::countl_zero(dividendhigh));
divisorhigh =
::fast_io::intrinsics::shiftleft(divisorlow,divisorhigh,shift);
divisorlow <<= shift;
quotientlow = 0;
bool carry;
do
{
carry=0;
dividendlow=intrinsics::subc(dividendlow,divisorlow,carry,carry);
dividendhigh=intrinsics::subc(dividendhigh,divisorhigh,carry,carry);
constexpr T zero{};
T mask{zero-carry};
T templow{divisorlow&mask},temphigh{divisorhigh&mask};
carry=!carry;
quotientlow=intrinsics::addc(quotientlow,quotientlow,carry,carry);
carry=0;
dividendlow=intrinsics::addc(dividendlow,templow,carry,carry);
dividendhigh=intrinsics::addc(dividendhigh,temphigh,carry,carry);
divisorlow = intrinsics::shiftright(divisorlow,divisorhigh,1u);
divisorhigh >>= 1u;
}
while(shift--);
return {quotientlow,0,dividendlow,dividendhigh};
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/111209] GCC fails to understand adc pattern what its document describes
2023-08-28 6:47 [Bug tree-optimization/111209] New: GCC fails to understand adc pattern unlvsur at live dot com
` (4 preceding siblings ...)
2023-08-28 19:08 ` unlvsur at live dot com
@ 2023-08-29 8:47 ` cvs-commit at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-08-29 8:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111209
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:a7aec76a74dd38524be325343158d3049b6ab3ac
commit r14-3541-ga7aec76a74dd38524be325343158d3049b6ab3ac
Author: Jakub Jelinek <jakub@redhat.com>
Date: Tue Aug 29 10:46:01 2023 +0200
tree-ssa-math-opts: Improve uaddc/usubc pattern matching [PR111209]
The uaddc/usubc usual matching is of the .{ADD,SUB}_OVERFLOW pair in the
middle, which adds/subtracts carry-in (from lower limbs) and computes
carry-out (to higher limbs). Before optimizations (unless user writes
it intentionally that way already), all the steps look the same, but
optimizations simplify the handling of the least significant limb
(one which adds/subtracts 0 carry-in) to just a single
.{ADD,SUB}_OVERFLOW and the handling of the most significant limb
if the computed carry-out is ignored to normal addition/subtraction
of multiple operands.
Now, match_uaddc_usubc has code to turn that least significant
.{ADD,SUB}_OVERFLOW call into .U{ADD,SUB}C call with 0 carry-in if
a more significant limb above it is matched into .U{ADD,SUB}C; this
isn't necessary for functionality, as .ADD_OVERFLOW (x, y) is
functionally equal to .UADDC (x, y, 0) (provided the types of operands
are the same and result is complex type with that type element), and
it also has code to match the most significant limb with ignored carry-out
(in that case one pattern match turns both the penultimate limb pair of
.{ADD,SUB}_OVERFLOW into .U{ADD,SUB}C and the addition/subtraction
of the 4 values (2 carries) into another .U{ADD,SUB}C.
As the following patch shows, what we weren't handling is the case when
one uses either the __builtin_{add,sub}c builtins or hand written forms
thereof (either __builtin_*_overflow or even that written by hand) for
just 2 limbs, where the least significant has 0 carry-in and the most
significant ignores carry-out. The following patch matches that, e.g.
_16 = .ADD_OVERFLOW (_1, _2);
_17 = REALPART_EXPR <_16>;
_18 = IMAGPART_EXPR <_16>;
_15 = _3 + _4;
_12 = _15 + _18;
into
_16 = .UADDC (_1, _2, 0);
_17 = REALPART_EXPR <_16>;
_18 = IMAGPART_EXPR <_16>;
_19 = .UADDC (_3, _4, _18);
_12 = IMAGPART_EXPR <_19>;
so that we can emit better code.
As the 2 later comments show, we must do that carefully, because the
pass walks the IL from first to last stmt in a bb and we must avoid
pattern matching this way something that should be matched on a later
instruction differently.
2023-08-29 Jakub Jelinek <jakub@redhat.com>
PR middle-end/79173
PR middle-end/111209
* tree-ssa-math-opts.cc (match_uaddc_usubc): Match also
just 2 limb uaddc/usubc with 0 carry-in on lower limb and ignored
carry-out on higher limb. Don't match it though if it could be
matched later on 4 argument addition/subtraction.
* gcc.target/i386/pr79173-12.c: New test.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-08-29 8:47 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-28 6:47 [Bug tree-optimization/111209] New: GCC fails to understand adc pattern unlvsur at live dot com
2023-08-28 18:56 ` [Bug target/111209] GCC fails to understand adc pattern what its document describes jakub at gcc dot gnu.org
2023-08-28 18:58 ` [Bug middle-end/111209] " jakub at gcc dot gnu.org
2023-08-28 19:02 ` unlvsur at live dot com
2023-08-28 19:04 ` jakub at gcc dot gnu.org
2023-08-28 19:08 ` unlvsur at live dot com
2023-08-29 8:47 ` cvs-commit at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).