public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss
@ 2022-09-15 15:30 tavianator at gmail dot com
2022-09-15 15:42 ` [Bug target/106952] " amonakov at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: tavianator at gmail dot com @ 2022-09-15 15:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952
Bug ID: 106952
Summary: Missed optimization: x < y ? x : y not lowered to
minss
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: tavianator at gmail dot com
Target Milestone: ---
Created attachment 53580
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53580&action=edit
Assembly from gcc -O3 -S bug.c
The following is an implementation of a ray/axis-aligned box intersection test:
struct ray {
float origin[3];
float dir_inv[3];
};
struct box {
float min[3];
float max[3];
};
static inline float min(float x, float y) {
return x < y ? x : y;
}
static inline float max(float x, float y) {
return x < y ? x : y;
}
_Bool intersection(const struct ray *ray, const struct box *box) {
float tmin = 0.0, tmax = 1.0 / 0.0;
for (int i = 0; i < 3; ++i) {
float t1 = (box->min[i] - ray->origin[i]) * ray->dir_inv[i];
float t2 = (box->max[i] - ray->origin[i]) * ray->dir_inv[i];
tmin = min(max(t1, tmin), max(t2, tmin));
tmax = max(min(t1, tmax), min(t2, tmax));
}
return tmin < tmax;
}
However, gcc -O3 doesn't use minss/maxss for every min()/max(). Instead, some
of them are lowered to conditional jumps which regresses performance
significantly since the branches are unpredictable.
Simpler variants like
tmin = max(tmin, min(t1, t2));
tmax = min(tmax, max(t1, t2));
get the desired codegen, but that behaves differently if t1 or t2 is NaN.
"Bisecting" with godbolt.org, it seems this is an old regression: 4.8.5 was
good, but 4.9.0 was bad.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss
2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
@ 2022-09-15 15:42 ` amonakov at gcc dot gnu.org
2022-09-15 15:54 ` tavianator at gmail dot com
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-09-15 15:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952
Alexander Monakov <amonakov at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |amonakov at gcc dot gnu.org
--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Note, your 'max' function is the same as 'min' (the issue remains with that
corrected).
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss
2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
2022-09-15 15:42 ` [Bug target/106952] " amonakov at gcc dot gnu.org
@ 2022-09-15 15:54 ` tavianator at gmail dot com
2022-09-17 4:47 ` rguenth at gcc dot gnu.org
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: tavianator at gmail dot com @ 2022-09-15 15:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952
--- Comment #2 from Tavian Barnes <tavianator at gmail dot com> ---
(In reply to Alexander Monakov from comment #1)
> Note, your 'max' function is the same as 'min' (the issue remains with that
> corrected).
Whoops, thanks.
Also I just noticed that GCC 12.2 does better (but not perfect) with
#define min(x, y) ((x) < (y) ? (x) : (y))
#define max(x, y) ((x) > (y) ? (x) : (y))
instead of the inline functions. Doesn't seem to help GCC trunk though.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss
2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
2022-09-15 15:42 ` [Bug target/106952] " amonakov at gcc dot gnu.org
2022-09-15 15:54 ` tavianator at gmail dot com
@ 2022-09-17 4:47 ` rguenth at gcc dot gnu.org
2023-07-18 10:39 ` rguenth at gcc dot gnu.org
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-09-17 4:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Last reconfirmed| |2022-09-17
Target| |x86_64-*-*
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The main issue is that combine is confused with the more complicated setup and
the x86 ieee min/max patterns and on the GIMPLE level we are not forming IEEE
compatible .FMIN and .FMAX plus the x86 backend doesn't announce any.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss
2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
` (2 preceding siblings ...)
2022-09-17 4:47 ` rguenth at gcc dot gnu.org
@ 2023-07-18 10:39 ` rguenth at gcc dot gnu.org
2023-07-20 8:39 ` rguenth at gcc dot gnu.org
2023-07-21 8:18 ` rguenth at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-18 10:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
With the proposed patches for PR88540 and PR105715 I get with -O3 -msse4.1
intersection:
.LFB2:
.cfi_startproc
movss .LC0(%rip), %xmm5
pxor %xmm2, %xmm2
movss (%rdi), %xmm4
movss 12(%rsi), %xmm1
movss 12(%rdi), %xmm0
divss %xmm2, %xmm5
movss (%rsi), %xmm3
subss %xmm4, %xmm1
subss %xmm4, %xmm3
pxor %xmm4, %xmm4
mulss %xmm0, %xmm1
mulss %xmm0, %xmm3
movaps %xmm1, %xmm0
cmpnltss %xmm2, %xmm0
blendvps %xmm0, %xmm1, %xmm4
movaps %xmm3, %xmm0
cmpnltss %xmm2, %xmm0
pxor %xmm2, %xmm2
blendvps %xmm0, %xmm3, %xmm2
movss 16(%rsi), %xmm0
minss %xmm5, %xmm3
minss %xmm5, %xmm1
movss 4(%rdi), %xmm5
minss %xmm4, %xmm2
movss 16(%rdi), %xmm4
subss %xmm5, %xmm0
maxss %xmm3, %xmm1
movss 4(%rsi), %xmm3
subss %xmm5, %xmm3
mulss %xmm4, %xmm0
movss 8(%rdi), %xmm5
mulss %xmm4, %xmm3
movaps %xmm2, %xmm4
maxss %xmm0, %xmm4
minss %xmm1, %xmm0
maxss %xmm3, %xmm2
minss %xmm1, %xmm3
movss 8(%rsi), %xmm1
subss %xmm5, %xmm1
maxss %xmm3, %xmm0
movss 20(%rsi), %xmm3
minss %xmm4, %xmm2
movss 20(%rdi), %xmm4
subss %xmm5, %xmm3
mulss %xmm4, %xmm1
movaps %xmm2, %xmm5
mulss %xmm4, %xmm3
movaps %xmm2, %xmm4
maxss %xmm1, %xmm4
minss %xmm0, %xmm1
movaps %xmm3, %xmm2
maxss %xmm3, %xmm5
minss %xmm0, %xmm2
minss %xmm5, %xmm4
maxss %xmm1, %xmm2
comiss %xmm4, %xmm2
seta %al
ret
there's the existing issue that RTL conditional move expansion doesn't
preserve the equality of constants for
_33 = t2_34 < 0.0;
_12 = _33 ? 0.0 : t2_34;
but it emits two loads from the constant pool for 0.0 here which in the x86
backend fail to be recognized as min/max.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss
2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
` (3 preceding siblings ...)
2023-07-18 10:39 ` rguenth at gcc dot gnu.org
@ 2023-07-20 8:39 ` rguenth at gcc dot gnu.org
2023-07-21 8:18 ` rguenth at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-20 8:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
Status|NEW |ASSIGNED
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss
2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
` (4 preceding siblings ...)
2023-07-20 8:39 ` rguenth at gcc dot gnu.org
@ 2023-07-21 8:18 ` rguenth at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-21 8:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org,
| |uros at gcc dot gnu.org
Status|ASSIGNED |NEW
Assignee|rguenth at gcc dot gnu.org |unassigned at gcc dot gnu.org
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
After the latest fixes we still fail to recognize min/max early for
float < 0.0 ? float : 0.0
because prepare_cmp_insn doesn't push the FP 0.0 constant to a reg
since RTX cost for this seems to be zero. We then call insn_operand_matches
which ultimatively fails in ix86_fp_comparison_operator as
ix86_fp_comparison_strategy is IX86_FPCMP_COMI here and
ix86_trivial_fp_comparison_operator for
(lt (reg/v:SF 110 [ t2 ])
(const_double:SF 0.0 [0x0.0p+0]))
returns false.
If I fix things so we try (gt (const_double:SF 0.0 [0x0.0p+0]) (reg:SF ..))
then maybe_legitimize_operands "breaks" things here since it forces
the cond operand to a register but not the comparison operand so
ix86_expand_fp_movcc again FAILs.
I'm not sure why the x86 backend allows any CONST_DOUBLE as part of
comparisons (during expansion only?). This and maybe special-handling
of rtx_cost with this special constant and LT/GT code makes the first
compares not recognized as MIN/MAX.
The rest is fixed now.
Patch for trying (gt ..):
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 32ff379ffc3..3ff8ba88bbb 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -4607,6 +4607,14 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code
comparison, rtx size,
break;
}
+ if (FLOAT_MODE_P (mode))
+ {
+ prepare_cmp_insn (y, x, swap_condition (comparison),
+ size, unsignedp, methods, ptest, pmode);
+ if (*ptest)
+ return;
+ }
+
if (methods != OPTAB_LIB_WIDEN)
goto fail;
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-07-21 8:18 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
2022-09-15 15:42 ` [Bug target/106952] " amonakov at gcc dot gnu.org
2022-09-15 15:54 ` tavianator at gmail dot com
2022-09-17 4:47 ` rguenth at gcc dot gnu.org
2023-07-18 10:39 ` rguenth at gcc dot gnu.org
2023-07-20 8:39 ` rguenth at gcc dot gnu.org
2023-07-21 8:18 ` rguenth at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).