* [Bug tree-optimization/98673] New: pass fre4 inhibit pass dom3 to create much more optimized code
From: rjiejie at me dot com @ 2021-01-14 9:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98673
Bug ID: 98673
Summary: pass fre4 inhibit pass dom3 to create much more
optimized code
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rjiejie at me dot com
Target Milestone: ---
Created attachment 49962
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49962&action=edit
bug test file
a. Compiler options:
cc1 -mabi=lp64d -march=rv64gc -O2 -S
b. Hot loop in function t_run_test:
j .L30
.L39:
mv a4,a3
.L30:
ld a2,8(a5)
addi a3,a4,1
slli t3,a4,3
ble a2,a1,.L28
ld t5,0(a5)
bge a1,t5,.L50
.L28:
addi a5,a5,8
bne a3,a0,.L39 # hot loop branches back to .L39
Better code from GCC 8.4 with the same compiler options:
========================================================
.L30:
ld t1,8(a4)
slli a7,a5,3
ble t1,a3,.L28
ld t4,0(a4)
bge a3,t4,.L50
.L28:
addi a5,a5,1
addi a4,a4,8
bne a5,t3,.L30 # hot loop branches back to .L30
GCC 10.2.0 emits one more instruction in the loop than GCC 8.4.0 (the 'mv a4,a3' at .L39).
Analysis of the GCC passes on this code in v10.2.0:
===================================================
before pass fre4:
-----------------
<bb 8> [local count: 82176881]:
engLoad.11_20 = engLoad;
loadValue.13_26 = loadValue;
_410 = (unsigned long) numXEntries.17_218;
_409 = _410 + 18446744073709551615;
_408 = (long int) _409;
... ...
<bb 12> [local count: 986782143]:
i1_174 = i1_6 + 1;
if (i1_174 != _408)
goto <bb 9>; [94.50%]
else
goto <bb 13>; [5.50%]
<bb 13> [local count: 54273018]:
# i1_420 = PHI <i1_174(12)>
_433 = (long unsigned int) i1_420;
_434 = _433 + 1;
_435 = _434 * 8;
_436 = i1_420 + 1;
_440 = _435 - 8;
_442 = engLoad.11_20 + _440;
goto <bb 15>; [100.00%]
after pass fre4:
----------------
<bb 8> [local count: 82176881]:
engLoad.11_20 = engLoad;
loadValue.13_26 = loadValue;
_410 = (unsigned long) numXEntries.17_218;
_409 = _410 + 18446744073709551615;
... ...
<bb 12> [local count: 986782143]:
i1_174 = i1_6 + 1;
if (i1_174 != _213)
goto <bb 9>; [94.50%]
else
goto <bb 13>; [5.50%]
<bb 13> [local count: 54273018]:
_433 = (long unsigned int) i1_174;
_434 = _433 + 1;
_435 = _434 * 8;
_436 = i1_174 + 1;
_440 = _435 - 8;
_442 = engLoad.11_20 + _440;
goto <bb 15>; [100.00%]
Pass fre4 logs 'Removing dead stmt _408 = (long int) _409;'. In the dump after
fre4, <bb 13> no longer has the PHI node i1_420 and uses i1_174 directly.
After that, pass dom3 cannot optimize '_433 = (long unsigned int) i1_174;' in
<bb 13>.
Because <bb 13> now uses the same SSA name i1_174 as <bb 12>, a conflict occurs
in pass expand when it processes the coalesced SSA/PHI nodes, and the edge is
then split.
Any help is appreciated :)
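For readers following the dumps, the address arithmetic in <bb 13> and the loop bound in <bb 8> can be checked by hand. The sketch below is only an illustration of that arithmetic, assuming a 64-bit unsigned long as on lp64d; the function names bb13_offset and bb8_bound are made up for this note and do not appear in GCC or in the test file.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors _433 .. _440 in <bb 13>: ((unsigned long) i + 1) * 8 - 8.
   In modulo-2^64 arithmetic this is simply i * 8, the byte offset of
   element i in an array of 8-byte longs. */
static uint64_t bb13_offset(int64_t i)
{
    uint64_t t433 = (uint64_t) i;   /* _433 = (long unsigned int) i1 */
    uint64_t t434 = t433 + 1;       /* _434 = _433 + 1 */
    uint64_t t435 = t434 * 8;       /* _435 = _434 * 8 */
    return t435 - 8;                /* _440 = _435 - 8 */
}

/* Mirrors _410, _409, _408 in <bb 8>: adding 18446744073709551615
   (2^64 - 1) to an unsigned 64-bit value is the same as subtracting 1,
   so the result is numXEntries - 1. */
static int64_t bb8_bound(int64_t n)
{
    assert(n >= 1);                 /* keeps the final conversion in range */
    uint64_t t410 = (uint64_t) n;
    uint64_t t409 = t410 + UINT64_C(18446744073709551615);
    return (int64_t) t409;          /* _408 = (long int) _409 */
}
```

For example, bb13_offset(5) yields 40 and bb8_bound(50) yields 49, which matches the source-level expressions engLoad[i1] and numXEntries - 1.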
* [Bug tree-optimization/98673] pass fre4 inhibit pass dom3 to create much more optimized code
From: rguenth at gcc dot gnu.org @ 2021-01-14 10:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98673
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
Target| |riscv
Keywords| |missed-optimization
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The analysis sounds a bit confused. What is the transform that DOM cannot do
after the transform that FRE does? There's some older bug about out-of-SSA
coalescing issues with loops and liveness of induction variables but it's
not clear if this is related (the assembly doesn't show the loop exit block).
Can you name the loop in the source that is problematic?
See PR86270 and PR70359
* [Bug tree-optimization/98673] pass fre4 inhibit pass dom3 to create much more optimized code
From: rjiejie at me dot com @ 2021-01-15 1:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98673
--- Comment #2 from jojo <rjiejie at me dot com> ---
(In reply to Richard Biener from comment #1)
> The analysis sounds a bit confused. What is the transform that DOM cannot
> do after the transform that FRE does? There's some older bug about
> out-of-SSA
> coalescing issues with loops and liveness of induction variables but it's
> not clear if this is related (the assembly doesn't show the loop exit block).
>
> Can you name the loop in the source that is problematic?
>
see this loop:
for( i1 = 0 ; i1 < ( numXEntries - 1 ) ; i1++ )
{
if( ( loadValue < engLoad[i1+1] ) && ( loadValue >= engLoad[i1] ) )
{
break ;
}
}
* [Bug tree-optimization/98673] pass fre4 inhibit pass dom3 to create much more optimized code
From: rjiejie at me dot com @ 2021-01-18 1:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98673
--- Comment #3 from jojo <rjiejie at me dot com> ---
(In reply to jojo from comment #2)
Richi: Can you please update Known to work?
* [Bug tree-optimization/98673] pass fre4 inhibit pass dom3 to create much more optimized code
From: rguenth at gcc dot gnu.org @ 2021-01-18 10:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98673
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to work| |8.4.0
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Please try simplifying the testcase, I've tried
long loadValue;
const long *engLoad;
float engLoadDelta1;
void foo()
{
long i1, numXEntries = 50;
for( i1 = 0 ; i1 < ( numXEntries - 1 ) ; i1++ )
{
if( ( loadValue < engLoad[i1+1] ) && ( loadValue >= engLoad[i1] ) )
{
break ;
}
}
if( i1 == ( numXEntries - 1 ) )
{
loadValue = engLoad[i1] ;
}
engLoadDelta1 = (float)( loadValue - engLoad[i1] ) /
(float)( engLoad[i1 + 1] - engLoad[i1] ) ;
}
which on x86 doesn't exhibit the issue (same code with GCC 8 and GCC 10).
* [Bug tree-optimization/98673] pass fre4 inhibit pass dom3 to create much more optimized code
From: rjiejie at me dot com @ 2021-02-03 9:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98673
--- Comment #5 from jojo <rjiejie at me dot com> ---
Sorry for the late reply :)
Please test with the following C test case:
long
YTableLookup (long xValue, long xEntries, const long *xAxis,
const long *yTable )
{
int i ;
long xDelta ;
long outValue ;
for (i=0; i<(xEntries - 1); i++)
{
if ((xValue < xAxis[i + 1]) && (xValue >= xAxis[i]))
break ;
}
if (i == (xEntries - 1))
xValue = xAxis[i] ;
xDelta = (long) ((xValue - xAxis[i]) * 1000) / (xAxis[i + 1] - xAxis[i]);
outValue = (long) ((((1000 - xDelta) * (long) yTable[i]) / 1000) +
((xDelta * (long) yTable[i+1]) / 1000)) ;
return outValue ;
}
RISC-V cc1 options: -O2 -march=rv32gc -mabi=ilp32d
==================================================
YTableLookup:
addi a1,a1,-1
ble a1,zero,.L2
li a7,4
li a4,0
sub a7,a7,a2
j .L5
.L6:
mv a4,a5
.L5:
lw a6,4(a2)
addi a5,a4,1
add t1,a7,a2
ble a6,a0,.L3
lw t3,0(a2)
slli t4,a4,2
ble t3,a0,.L9
.L3:
addi a2,a2,4
bne a1,a5,.L6
addi a4,a4,2
slli t1,a4,2
addi t4,t1,-4
add t4,a3,t4
li a1,0
li a5,1000
.L4:
x86 cc1 options: -O2 -march=i386
================================
YTableLookup:
.LFB0:
pushl %ebp
.LCFI0:
pushl %edi
.LCFI1:
pushl %esi
.LCFI2:
pushl %ebx
.LCFI3:
pushl %ecx
.LCFI4:
movl 24(%esp), %esi
movl 32(%esp), %edi
movl 28(%esp), %eax
decl %eax
testl %eax, %eax
jle .L2
movl $4, %ecx
xorl %edx, %edx
jmp .L5
.align 4
.L6:
movl %ebx, %edx
.L5:
movl (%edi,%ecx), %ebx
cmpl %esi, %ebx
jle .L3
leal -4(%ecx), %ebp
movl %ebp, (%esp)
movl (%edi,%edx,4), %ebp
cmpl %esi, %ebp
jle .L10
.L3:
leal 1(%edx), %ebx
addl $4, %ecx
cmpl %ebx, %eax
jne .L6
leal 8(,%edx,4), %ecx
movl 36(%esp), %eax
leal -4(%eax,%ecx), %esi
xorl %eax, %eax
movl $1000, %ebx
.L4:
Note the redundant move at .L6 in both listings ('mv a4,a5' on RISC-V, 'movl %ebx, %edx' on x86).
* [Bug tree-optimization/98673] pass fre4 inhibit pass dom3 to create much more optimized code
From: rguenth at gcc dot gnu.org @ 2021-02-03 10:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98673
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to fail| |10.2.1
See Also| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70359
Known to work| |11.0
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So with your testcase on trunk I see for RISCV
ble a1,zero,.L2
li a6,4
li a5,0
sub a6,a6,a2
.L5:
lw a4,4(a2)
slli a7,a5,2
add t1,a6,a2
addi a5,a5,1
ble a4,a0,.L3
lw t3,0(a2)
ble t3,a0,.L9
.L3:
addi a2,a2,4
bne a1,a5,.L5
which is fine, same for x86. This is usually a SSA coalescing issue where
a failed coalesce ends up splitting the backedge and emitting a move there.
I can see the issue on the branch where the problematic one is
;; basic block 4, loop depth 1
;; pred: 3
;; 7
# i_57 = PHI <0(3), i_41(7)>
...
;; basic block 7, loop depth 1
;; pred: 4
;; 5
i_41 = i_57 + 1;
ivtmp.14_90 = ivtmp.14_91 + 4;
if (_6 != i_41)
goto <bb 4>; [94.50%]
else
goto <bb 8>; [5.50%]
;; succ: 4
;; 8
;; basic block 8, loop depth 0
;; pred: 7
_87 = (sizetype) i_57;
_146 = _87 + 2;
which is a use of the pre-increment i_57 on the loop exit edge. This
inhibits coalescing of i_57 and i_41 causing the copy.
That's exactly the issue noted in the cited PRs. There have been patches
floating around re-materializing i_41 + 1 at the point of i_57 to make
the coalescing possible but I think nobody developed them in full.
See the thread starting at
https://gcc.gnu.org/pipermail/gcc-patches/2018-March/495843.html
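A source-level analogy may help to picture the liveness problem described in comment #6. The two functions below are a hedged sketch, not the patch from the cited thread and not GCC internals: the names exit_uses_old_i and exit_uses_new_i are hypothetical, and whether a given compiler actually emits the extra move for either form depends on version and target. In the first function the exit path reads the old counter after the incremented one has been computed (like `_87 = (sizetype) i_57; _146 = _87 + 2;`), so both values are live at the exit test. The second recomputes the same result from the incremented counter, which is the idea behind the rematerialization mentioned above.

```c
#include <assert.h>

/* Exit path uses the pre-increment value i (the role of i_57) after
   next (the role of i_41) already exists, so both are live at once. */
static long exit_uses_old_i(const long *axis, long last, long value)
{
    assert(last >= 1);              /* the real code guards this with ble */
    long i = 0;
    for (;;) {
        if (value < axis[i + 1] && value >= axis[i])
            return i;               /* early break, not the path of interest */
        long next = i + 1;
        if (next == last)
            return i + 2;           /* reads old i on the loop exit edge */
        i = next;                   /* copy on the backedge */
    }
}

/* Same result, but the exit path only needs next: i + 2 == next + 1.
   The old i is dead once next is computed. */
static long exit_uses_new_i(const long *axis, long last, long value)
{
    assert(last >= 1);
    long i = 0;
    for (;;) {
        if (value < axis[i + 1] && value >= axis[i])
            return i;
        long next = i + 1;
        if (next == last)
            return next + 1;        /* rematerialized from the new value */
        i = next;
    }
}
```

Both functions read axis[0] through axis[last] and return identical values for the same inputs; they differ only in which counter is still needed when the loop exits.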
* [Bug middle-end/98673] pass fre4 inhibit pass dom3 to create much more optimized code
From: rguenth at gcc dot gnu.org @ 2023-12-06 10:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98673
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |WORKSFORME
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
No updates, verified the code is still the same, assuming not an issue.