public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/94945] New: Missed optimization: Carry chain not recognized in manually unrolled loop
@ 2020-05-04 16:44 madhur4127 at gmail dot com
2020-05-22 11:27 ` [Bug tree-optimization/94945] " madhur4127 at gmail dot com
2021-08-20 1:22 ` pinskia at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: madhur4127 at gmail dot com @ 2020-05-04 16:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94945
Bug ID: 94945
Summary: Missed optimization: Carry chain not recognized in
manually unrolled loop
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: madhur4127 at gmail dot com
Target Milestone: ---
Context: Big integer addition using ADC (_addcarry_u64).
See Godbolt link: https://godbolt.org/z/rMxe6W
Example:
Suppose the case of big integer addition:
// pa, pb: pointer to big integer A, B
// n: size of big integer A, B
// pr: pointer to result
void add(const uint64_t * __restrict__ pa, const uint64_t * __restrict__ pb,
uint64_t * __restrict__ pr, unsigned n) {
unsigned char carry = 0;
unsigned i;
for(i = 0; i<n; i += 4) {
carry = _addcarry_u64(carry, pa[i+0], pb[i+0], &pr[i+0]);
carry = _addcarry_u64(carry, pa[i+1], pb[i+1], &pr[i+1]);
carry = _addcarry_u64(carry, pa[i+2], pb[i+2], &pr[i+2]);
carry = _addcarry_u64(carry, pa[i+3], pb[i+3], &pr[i+3]);
}
}
Without loop unrolling GCC saves the Carry Flag at the end of the loop and
again sets the saved carry flag in the next iteration. GCC doesn't recognize
the propagation of Carry Flag across loop iterations even when manually
unrolling the loop (while Clang does). GCC saves the carry and triggers it
again in this fashion (2 iterations shown):
mov ecx, eax # i, i
add r9b, -1 # carry,
mov rdx, QWORD PTR [rdi+rcx*8] # tmp132, *_6
adc rdx, QWORD PTR [rsi+rcx*8] # tmp132, *_4
mov QWORD PTR [r8+rcx*8], rdx #* pr, tmp132
setc r9b #, _48 <--------SAVING CARRY
lea ecx, [rax+1] # tmp134,
mov rdx, QWORD PTR [rdi+rcx*8] # tmp140, *_15
add r9b, -1 # _48, <--------SETTING CARRY
adc rdx, QWORD PTR [rsi+rcx*8] # tmp140, *_13
mov QWORD PTR [r8+rcx*8], rdx #* pr, tmp140
setc r9b #, _47
Another optimization:
Trigger loop unrolling (without the need to manually unrolling) and propagate
the carry without the need to save/set it in between.
Side Note:
Is this the fastest and optimal way to add two big integers? Considering ASM to
be the last resort?
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Bug tree-optimization/94945] Missed optimization: Carry chain not recognized in manually unrolled loop
2020-05-04 16:44 [Bug rtl-optimization/94945] New: Missed optimization: Carry chain not recognized in manually unrolled loop madhur4127 at gmail dot com
@ 2020-05-22 11:27 ` madhur4127 at gmail dot com
2021-08-20 1:22 ` pinskia at gcc dot gnu.org
1 sibling, 0 replies; 3+ messages in thread
From: madhur4127 at gmail dot com @ 2020-05-22 11:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94945
Madhur Chauhan <madhur4127 at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Version|10.0 |10.1.1
Component|rtl-optimization |tree-optimization
--- Comment #1 from Madhur Chauhan <madhur4127 at gmail dot com> ---
Is the scope of this optimization too narrow?
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Bug tree-optimization/94945] Missed optimization: Carry chain not recognized in manually unrolled loop
2020-05-04 16:44 [Bug rtl-optimization/94945] New: Missed optimization: Carry chain not recognized in manually unrolled loop madhur4127 at gmail dot com
2020-05-22 11:27 ` [Bug tree-optimization/94945] " madhur4127 at gmail dot com
@ 2021-08-20 1:22 ` pinskia at gcc dot gnu.org
1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-20 1:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94945
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|UNCONFIRMED |RESOLVED
See Also| |https://gcc.gnu.org/bugzill
| |a/show_bug.cgi?id=94913,
| |https://gcc.gnu.org/bugzill
| |a/show_bug.cgi?id=97387
Target Milestone|--- |11.0
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC 11+ can produce:
.L3:
movl %eax, %r10d
addb $-1, %r9b
leal 1(%rax), %r9d
movq (%rdi,%r10,8), %rdx
adcq (%rsi,%r10,8), %rdx
movq %rdx, (%r8,%r10,8)
movq (%rdi,%r9,8), %rdx
leal 3(%rax), %r10d
adcq (%rsi,%r9,8), %rdx
movq %rdx, (%r8,%r9,8)
leal 2(%rax), %r9d
movq (%rdi,%r9,8), %rdx
adcq (%rsi,%r9,8), %rdx
movq %rdx, (%r8,%r9,8)
movq (%rdi,%r10,8), %rdx
adcq (%rsi,%r10,8), %rdx
setc %r9b
addl $4, %eax
movq %rdx, (%r8,%r10,8)
cmpl %eax, %ecx
ja .L3
With a single iteration and -funroll-loops GCC 11 gives this for the inner
loop:
.L12:
movq (%r8,%r10), %r11
addb $-1, %dl
adcq (%rdi,%r10), %r11
movq %r11, (%rsi,%r10)
movq 8(%r8,%r10), %r9
adcq 8(%rdi,%r10), %r9
movq %r9, 8(%rsi,%r10)
movq 16(%r8,%r10), %rax
adcq 16(%rdi,%r10), %rax
movq %rax, 16(%rsi,%r10)
movq 24(%r8,%r10), %rdx
adcq 24(%rdi,%r10), %rdx
movq %rdx, 24(%rsi,%r10)
movq 32(%r8,%r10), %r11
adcq 32(%rdi,%r10), %r11
movq %r11, 32(%rsi,%r10)
movq 40(%r8,%r10), %r9
adcq 40(%rdi,%r10), %r9
movq %r9, 40(%rsi,%r10)
movq 48(%r8,%r10), %rax
adcq 48(%rdi,%r10), %rax
movq %rax, 48(%rsi,%r10)
movq 56(%r8,%r10), %r11
adcq 56(%rdi,%r10), %r11
movq %r11, 56(%rsi,%r10)
setc %dl
addq $64, %r10
cmpq %rcx, %r10
jne .L12
So Fixed in GCC 11.
Note was most likely fixed by r11-145 (which was pushed around the time you
filed the bug) and r11-3882 (later on last year).
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-08-20 1:22 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-04 16:44 [Bug rtl-optimization/94945] New: Missed optimization: Carry chain not recognized in manually unrolled loop madhur4127 at gmail dot com
2020-05-22 11:27 ` [Bug tree-optimization/94945] " madhur4127 at gmail dot com
2021-08-20 1:22 ` pinskia at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).