public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/112612] New: [Missed Optimization] Holding on the loop variable rather than a derived value which can replace it
@ 2023-11-18 21:04 eyalroz1 at gmx dot com
2023-11-18 21:41 ` [Bug tree-optimization/112612] " pinskia at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: eyalroz1 at gmx dot com @ 2023-11-18 21:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112612
Bug ID: 112612
Summary: [Missed Optimization] Holding on the loop variable
rather than a derived value which can replace it
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: eyalroz1 at gmx dot com
Target Milestone: ---
Consider the following function:
void foo(int* __restrict__ a) {
int i, val;
for (i = 0; i < 100; i++) {
val = 2 * i;
a[i] = val;
}
}
When compiling it for x86_64 with -O3 -fno-unroll-loops -fno-tree-vectorize,
GCC 7.2 used to give:
foo:
xor eax, eax
.L2:
mov DWORD PTR [rdi], eax
add eax, 2
add rdi, 4
cmp eax, 200
jne .L2
rep ret
which was rather wasteful, as eax and rdi - eax are linearly related. With GCC
13.2 or trunk on GodBolt as of today, this improves, but not really:
foo:
xor eax, eax
.L2:
lea edx, [rax+rax]
mov DWORD PTR [rdi+rax*4], edx
add rax, 1
cmp rax, 100
jne .L2
ret
So, we don't increment two things; but - we do have an addition-via-lea in each
iteration. Is that really necessary? I mean, instead of keeping the i variable
(in rax), we could keep v = 2 * i, and that's good enough for both addressing
and condition checking. Indeed, clang 17 emits:
foo: # @foo
xor eax, eax
.LBB0_1: # =>This Inner Loop Header: Depth=1
mov dword ptr [rdi + 2*rax], eax
add rax, 2
cmp rax, 200
jne .LBB0_1
ret
which is almost the same, except that it holds v = 2 * i rather than i. (clang
has produced this code since v3.0.0 at least.)
GodBolt link: https://gcc.godbolt.org/z/MjzTbr831
Originally discussed in this SO question:
https://stackoverflow.com/q/48354636/1593077
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug tree-optimization/112612] Holding on the loop variable rather than a derived value which can replace it
2023-11-18 21:04 [Bug c/112612] New: [Missed Optimization] Holding on the loop variable rather than a derived value which can replace it eyalroz1 at gmx dot com
@ 2023-11-18 21:41 ` pinskia at gcc dot gnu.org
2023-11-18 23:54 ` eyalroz1 at gmx dot com
2023-11-20 9:42 ` rguenth at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-18 21:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112612
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|middle-end |tree-optimization
Summary|[Missed Optimization] |Holding on the loop
|Holding on the loop |variable rather than a
|variable rather than a |derived value which can
|derived value which can |replace it
|replace it |
Target| |x86_64-linux-gnu
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
IV-OPTs selects these IVs and is very much target specific due to cost model.
It is a N complete problem after all too.
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug tree-optimization/112612] Holding on the loop variable rather than a derived value which can replace it
2023-11-18 21:04 [Bug c/112612] New: [Missed Optimization] Holding on the loop variable rather than a derived value which can replace it eyalroz1 at gmx dot com
2023-11-18 21:41 ` [Bug tree-optimization/112612] " pinskia at gcc dot gnu.org
@ 2023-11-18 23:54 ` eyalroz1 at gmx dot com
2023-11-20 9:42 ` rguenth at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: eyalroz1 at gmx dot com @ 2023-11-18 23:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112612
--- Comment #2 from Eyal Rozenberg <eyalroz1 at gmx dot com> ---
(In reply to Andrew Pinski from comment #1)
> IV-OPTs selects these IVs and is very much target specific due to cost model.
In this example, it seems that the missed optimization should be useful under
most/all cost models. Of course, I may be wrong, I'm no CPU expert.
> It is a N[P] complete problem after all too.
I wonder if the asymptotic nature of the general problem is really the issue
here.
Anyway - I'm just noting the behavior. It is of course up to you all to decide
whether you want to do something about it.
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug tree-optimization/112612] Holding on the loop variable rather than a derived value which can replace it
2023-11-18 21:04 [Bug c/112612] New: [Missed Optimization] Holding on the loop variable rather than a derived value which can replace it eyalroz1 at gmx dot com
2023-11-18 21:41 ` [Bug tree-optimization/112612] " pinskia at gcc dot gnu.org
2023-11-18 23:54 ` eyalroz1 at gmx dot com
@ 2023-11-20 9:42 ` rguenth at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-20 9:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112612
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Last reconfirmed| |2023-11-20
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.
Candidate 1:
Var befor: ivtmp.5
Var after: ivtmp.5
Incr POS: before exit test
IV struct:
Type: unsigned long
Base: 0
Step: 1
Biv: N
Overflowness wrto loop niter: No-overflow
...
Candidate 8:
Var befor: ivtmp.10
Var after: ivtmp.10
Incr POS: before exit test
IV struct:
Type: unsigned int
Base: 0
Step: 2
Biv: N
Overflowness wrto loop niter: No-overflow
Candidate 9:
Var befor: ivtmp.11
Var after: ivtmp.11
Incr POS: before exit test
IV struct:
Type: unsigned long
Base: (unsigned long) a_8(D)
Step: 4
Object: (void *) a_8(D)
Biv: N
Overflowness wrto loop niter: Overflow
so we do have this candidate.
Improved to:
cost: 16 (complexity 0)
reg_cost: 4
cand_cost: 10
cand_group_cost: 2 (complexity 0)
candidates: 8, 9
group:0 --> iv_cand:8, cost=(0,0)
group:1 --> iv_cand:9, cost=(2,0)
group:2 --> iv_cand:8, cost=(0,0)
invariant variables:
invariant expressions:
Initial set of candidates:
cost: 15 (complexity 2)
reg_cost: 3
cand_cost: 5
cand_group_cost: 7 (complexity 2)
candidates: 1
group:0 --> iv_cand:1, cost=(4,0)
group:1 --> iv_cand:1, cost=(3,2)
group:2 --> iv_cand:1, cost=(0,0)
invariant variables: 1
invariant expressions:
but somehow we fail to express(?) some of the uses with just candidate 8?
It "works" with -m32 added:
<bb 3> [local count: 1063004408]:
# ivtmp.10_12 = PHI <ivtmp.10_11(5), 0(2)>
val_7 = (int) ivtmp.10_12;
MEM[(int *)a_8(D) + ivtmp.10_12 * 2] = val_7;
ivtmp.10_11 = ivtmp.10_12 + 2;
if (ivtmp.10_11 != 200)
goto <bb 5>; [98.99%]
so it might be the 32->64bit promotion is what gets us off. We might
possibly want to consider a 'unsigned long' candidate with step 2.
.L2:
movl %eax, (%edx,%eax,2)
addl $2, %eax
cmpl $200, %eax
jne .L2
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-11-20 9:42 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-18 21:04 [Bug c/112612] New: [Missed Optimization] Holding on the loop variable rather than a derived value which can replace it eyalroz1 at gmx dot com
2023-11-18 21:41 ` [Bug tree-optimization/112612] " pinskia at gcc dot gnu.org
2023-11-18 23:54 ` eyalroz1 at gmx dot com
2023-11-20 9:42 ` rguenth at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).