public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/113903] New: sched1 should schedule across EBBS
@ 2024-02-13 10:40 tnfchris at gcc dot gnu.org
2024-02-13 11:26 ` [Bug rtl-optimization/113903] " amonakov at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-02-13 10:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113903
Bug ID: 113903
Summary: sched1 should schedule across EBBS
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---
The following testcase:
#define N 306
#define NEEDLE 136

int table[N];

int foo (int i, unsigned short parse_tables_n)
{
  parse_tables_n >>= 9;
  parse_tables_n += 11;
  while (i < N && parse_tables_n--)
    table[i++] = 0;
  return table[NEEDLE];
}
compiled at -O3 shows an issue we've started seeing with the support for early
break vectorization.
sched1 doesn't seem to be able to schedule across EBBs, which is understandable
since we never really needed it to before.
However, the above code generates:
.L10:
        st1w z28.s, p7, [x1, #1, mul vl]
        st1w z28.s, p7, [x1]
        add x1, x1, x5
        cmp w0, w2
        bcc .L17
.L8:
        cmpne p15.h, p7/z, z31.h, #0
        mov z29.d, z31.d
        not p15.b, p14/z, p15.b
        mov z27.d, z30.d
        add w2, w2, w4
        dech z31.h
        ptest p14, p15.b
        incw z30.s, all, mul #2
        b.none .L10
        umov w1, v29.h[0]
        umov w20, v27.s[0]
        and w3, w1, 65535
        b .L6
and, AArch64 codegen inefficiencies aside (which I will tackle myself), shows
that we're copying the old values of the induction variables in every loop
iteration to keep them for the reductions if we exit.
However, the new values are not live in L8, so the operations can be moved to
L10:
.L10:
        incw z30.s, all, mul #2
        dech z31.h
        st1w z28.s, p7, [x1, #1, mul vl]
        st1w z28.s, p7, [x1]
        add x1, x1, x5
        cmp w0, w2
        bcc .L17
.L8:
        cmpne p15.h, p7/z, z31.h, #0
        not p15.b, p14/z, p15.b
        add w2, w2, w4
        ptest p14, p15.b
        b.none .L10
        umov w1, v31.h[0]
        umov w20, v30.s[0]
        and w3, w1, 65535
        b .L6
which decreases the live ranges. The optimal codegen for this sequence is:
.L10:
        dech z31.h
        incw z30.s, all, mul #2
        st1w z28.s, p7, [x1, #1, mul vl]
        st1w z28.s, p7, [x1]
        add x1, x1, x5
        cmp w0, w2
        bcc .L17
.L8:
        cmpeq p15.h, p7/z, z31.h, #0
        add w2, w2, w4
        b.none .L10
        umov w1, v31.h[0]
        umov w20, v30.s[0]
        b .L6
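As a scalar analogy only (not taken from the report, and not what the vectorizer emits), the two loop shapes can be sketched in C. The names find_copy and find_sunk and the array-search setting are hypothetical; the point is just that when the increment sits before the early exit, the old value has to be copied on every iteration, whereas sinking the increment below the exit removes the copy without changing the result.

```c
#include <assert.h>

/* Shape 1: the induction variable is incremented before the early-exit
   test, so its old value must be copied on every iteration just in case
   the early exit is taken (compare the two "mov" insns in .L8). */
static int find_copy(const int *a, int n, int key)
{
    assert(n >= 1);             /* assumption: at least one element */
    int i = 0;
    for (;;) {
        int old_i = i;          /* copy kept alive only for the early exit */
        int hit = (a[i] == key);
        i = i + 1;              /* increment done before the exit */
        if (hit)
            return old_i;       /* early exit needs the old value */
        if (i >= n)
            return -1;          /* main exit */
    }
}

/* Shape 2: the increment is sunk below the early exit, so the exit can
   read the induction variable directly and no copy is needed. */
static int find_sunk(const int *a, int n, int key)
{
    assert(n >= 1);             /* same assumption as above */
    int i = 0;
    for (;;) {
        if (a[i] == key)
            return i;           /* early exit reads i itself */
        i = i + 1;              /* increment only on the path that continues */
        if (i >= n)
            return -1;          /* main exit */
    }
}
```

Both functions return the same index for any non-empty input; only the live range of the induction variable's old value differs.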
* [Bug rtl-optimization/113903] sched1 should schedule across EBBS
From: amonakov at gcc dot gnu.org @ 2024-02-13 11:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113903
Alexander Monakov <amonakov at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |amonakov at gcc dot gnu.org
--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Lifting those insns from the L8 BB to the L10 BB requires duplicating them on
all incoming edges targeting L8, doesn't it?
Why is decreasing live ranges important here?
* [Bug rtl-optimization/113903] sched1 should schedule across EBBS
From: tnfchris at gcc dot gnu.org @ 2024-02-13 13:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113903
--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #1)
> Lifting those insns from the L8 BB to the L10 BB requires duplicating them
> on all incoming edges targeting L8, doesn't it?
>
No, because they're unused before L10. If they were used then they couldn't be
moved. (Note that L10 is only reachable from L8, as it's a branch in the loop.)
> Why is decreasing live ranges important here?
Two reasons. First, we have to avoid prematurely creating the copies.
The loop has multiple exits, and the values are not relevant for all exits.
        mov z29.d, z31.d
        mov z27.d, z30.d
are being done because we increment the inductions in the same basic block. But
the incremented value is not needed in L8.
For loop induction variables I suppose we can change the materialization point
in the vectorizer to deal with them that way, but that only takes care of
inductions, and ideally we shouldn't perform operations before an exit if
they're not needed for that exit.
At the moment the vectorizer only deals with moving statements that are needed
for correctness.
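The condition described above (the target block is reachable only from the source block, and the moved value is not used elsewhere) can be sketched as a small C predicate. This is a simplified model under stated assumptions, not GCC's scheduler or its data structures: struct block, can_sink, and the 32-register bitmask are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a basic block; not GCC's basic_block. */
struct block {
    int n_preds;        /* number of incoming edges */
    unsigned live_in;   /* bit r set: register r is live on entry (r < 32) */
};

/* Can an insn that defines `reg` be moved from its block into the
   successor `dst` without duplicating it on other edges?
   Assumed conditions, following the discussion above:
   - nothing later in the source block reads the new value;
   - dst has exactly one predecessor (the source block);
   - the new value is not live into any other successor. */
static bool can_sink(const struct block *dst,
                     const struct block *const *other_succs, int n_other,
                     unsigned reg, bool used_later_in_src)
{
    unsigned bit = 1u << reg;

    if (used_later_in_src)
        return false;
    if (dst->n_preds != 1)
        return false;
    for (int k = 0; k < n_other; k++)
        if (other_succs[k]->live_in & bit)
            return false;
    return true;
}
```

In the loop above, dst would play the role of L10 and the only other successor would be the early-exit block, which reads the copies rather than the incremented registers.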
* [Bug rtl-optimization/113903] sched1 should schedule across EBBS
From: pinskia at gcc dot gnu.org @ 2024-02-13 19:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113903
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
CC| |pinskia at gcc dot gnu.org