public inbox for gcc-bugs@sourceware.org help / color / mirror / Atom feed
* [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss @ 2022-09-15 15:30 tavianator at gmail dot com 2022-09-15 15:42 ` [Bug target/106952] " amonakov at gcc dot gnu.org ` (5 more replies) 0 siblings, 6 replies; 7+ messages in thread From: tavianator at gmail dot com @ 2022-09-15 15:30 UTC (permalink / raw) To: gcc-bugs https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952 Bug ID: 106952 Summary: Missed optimization: x < y ? x : y not lowered to minss Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tavianator at gmail dot com Target Milestone: --- Created attachment 53580 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53580&action=edit Assembly from gcc -O3 -S bug.c The following is an implementation of a ray/axis-aligned box intersection test: struct ray { float origin[3]; float dir_inv[3]; }; struct box { float min[3]; float max[3]; }; static inline float min(float x, float y) { return x < y ? x : y; } static inline float max(float x, float y) { return x < y ? x : y; } _Bool intersection(const struct ray *ray, const struct box *box) { float tmin = 0.0, tmax = 1.0 / 0.0; for (int i = 0; i < 3; ++i) { float t1 = (box->min[i] - ray->origin[i]) * ray->dir_inv[i]; float t2 = (box->max[i] - ray->origin[i]) * ray->dir_inv[i]; tmin = min(max(t1, tmin), max(t2, tmin)); tmax = max(min(t1, tmax), min(t2, tmax)); } return tmin < tmax; } However, gcc -O3 doesn't use minss/maxss for every min()/max(). Instead, some of them are lowered to conditional jumps which regresses performance significantly since the branches are unpredictable. Simpler variants like tmin = max(tmin, min(t1, t2)); tmax = min(tmax, max(t1, t2)); get the desired codegen, but that behaves differently if t1 or t2 is NaN. "Bisecting" with godbolt.org, it seems this is an old regression: 4.8.5 was good, but 4.9.0 was bad. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss 2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com @ 2022-09-15 15:42 ` amonakov at gcc dot gnu.org 2022-09-15 15:54 ` tavianator at gmail dot com ` (4 subsequent siblings) 5 siblings, 0 replies; 7+ messages in thread From: amonakov at gcc dot gnu.org @ 2022-09-15 15:42 UTC (permalink / raw) To: gcc-bugs https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952 Alexander Monakov <amonakov at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |amonakov at gcc dot gnu.org --- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> --- Note, your 'max' function is the same as 'min' (the issue remains with that corrected). ^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss 2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com 2022-09-15 15:42 ` [Bug target/106952] " amonakov at gcc dot gnu.org @ 2022-09-15 15:54 ` tavianator at gmail dot com 2022-09-17 4:47 ` rguenth at gcc dot gnu.org ` (3 subsequent siblings) 5 siblings, 0 replies; 7+ messages in thread From: tavianator at gmail dot com @ 2022-09-15 15:54 UTC (permalink / raw) To: gcc-bugs https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952 --- Comment #2 from Tavian Barnes <tavianator at gmail dot com> --- (In reply to Alexander Monakov from comment #1) > Note, your 'max' function is the same as 'min' (the issue remains with that > corrected). Whoops, thanks. Also I just noticed that GCC 12.2 does better (but not perfect) with #define min(x, y) ((x) < (y) ? (x) : (y)) #define max(x, y) ((x) > (y) ? (x) : (y)) instead of the inline functions. Doesn't seem to help GCC trunk though. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss 2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com 2022-09-15 15:42 ` [Bug target/106952] " amonakov at gcc dot gnu.org 2022-09-15 15:54 ` tavianator at gmail dot com @ 2022-09-17 4:47 ` rguenth at gcc dot gnu.org 2023-07-18 10:39 ` rguenth at gcc dot gnu.org ` (2 subsequent siblings) 5 siblings, 0 replies; 7+ messages in thread From: rguenth at gcc dot gnu.org @ 2022-09-17 4:47 UTC (permalink / raw) To: gcc-bugs https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952 Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed| |2022-09-17 Target| |x86_64-*-* --- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> --- The main issue is that combine is confused with the more complicated setup and the x86 ieee min/max patterns and on the GIMPLE level we are not forming IEEE compatible .FMIN and .FMAX plus the x86 backend doesn't announce any. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss 2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com ` (2 preceding siblings ...) 2022-09-17 4:47 ` rguenth at gcc dot gnu.org @ 2023-07-18 10:39 ` rguenth at gcc dot gnu.org 2023-07-20 8:39 ` rguenth at gcc dot gnu.org 2023-07-21 8:18 ` rguenth at gcc dot gnu.org 5 siblings, 0 replies; 7+ messages in thread From: rguenth at gcc dot gnu.org @ 2023-07-18 10:39 UTC (permalink / raw) To: gcc-bugs https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952 --- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> --- With the proposed patches for PR88540 and PR105715 I get with -O3 -msse4.1 intersection: .LFB2: .cfi_startproc movss .LC0(%rip), %xmm5 pxor %xmm2, %xmm2 movss (%rdi), %xmm4 movss 12(%rsi), %xmm1 movss 12(%rdi), %xmm0 divss %xmm2, %xmm5 movss (%rsi), %xmm3 subss %xmm4, %xmm1 subss %xmm4, %xmm3 pxor %xmm4, %xmm4 mulss %xmm0, %xmm1 mulss %xmm0, %xmm3 movaps %xmm1, %xmm0 cmpnltss %xmm2, %xmm0 blendvps %xmm0, %xmm1, %xmm4 movaps %xmm3, %xmm0 cmpnltss %xmm2, %xmm0 pxor %xmm2, %xmm2 blendvps %xmm0, %xmm3, %xmm2 movss 16(%rsi), %xmm0 minss %xmm5, %xmm3 minss %xmm5, %xmm1 movss 4(%rdi), %xmm5 minss %xmm4, %xmm2 movss 16(%rdi), %xmm4 subss %xmm5, %xmm0 maxss %xmm3, %xmm1 movss 4(%rsi), %xmm3 subss %xmm5, %xmm3 mulss %xmm4, %xmm0 movss 8(%rdi), %xmm5 mulss %xmm4, %xmm3 movaps %xmm2, %xmm4 maxss %xmm0, %xmm4 minss %xmm1, %xmm0 maxss %xmm3, %xmm2 minss %xmm1, %xmm3 movss 8(%rsi), %xmm1 subss %xmm5, %xmm1 maxss %xmm3, %xmm0 movss 20(%rsi), %xmm3 minss %xmm4, %xmm2 movss 20(%rdi), %xmm4 subss %xmm5, %xmm3 mulss %xmm4, %xmm1 movaps %xmm2, %xmm5 mulss %xmm4, %xmm3 movaps %xmm2, %xmm4 maxss %xmm1, %xmm4 minss %xmm0, %xmm1 movaps %xmm3, %xmm2 maxss %xmm3, %xmm5 minss %xmm0, %xmm2 minss %xmm5, %xmm4 maxss %xmm1, %xmm2 comiss %xmm4, %xmm2 seta %al ret there's the existing issue that RTL conditional move expansion doesn't preserve the equality of constants for _33 = t2_34 < 0.0; _12 = _33 ? 0.0 : t2_34; but it emits two loads from the constant pool for 0.0 here which in the x86 backend fail to be recognized as min/max. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss 2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com ` (3 preceding siblings ...) 2023-07-18 10:39 ` rguenth at gcc dot gnu.org @ 2023-07-20 8:39 ` rguenth at gcc dot gnu.org 2023-07-21 8:18 ` rguenth at gcc dot gnu.org 5 siblings, 0 replies; 7+ messages in thread From: rguenth at gcc dot gnu.org @ 2023-07-20 8:39 UTC (permalink / raw) To: gcc-bugs https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952 Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED ^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss 2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com ` (4 preceding siblings ...) 2023-07-20 8:39 ` rguenth at gcc dot gnu.org @ 2023-07-21 8:18 ` rguenth at gcc dot gnu.org 5 siblings, 0 replies; 7+ messages in thread From: rguenth at gcc dot gnu.org @ 2023-07-21 8:18 UTC (permalink / raw) To: gcc-bugs https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952 Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |rguenth at gcc dot gnu.org, | |uros at gcc dot gnu.org Status|ASSIGNED |NEW Assignee|rguenth at gcc dot gnu.org |unassigned at gcc dot gnu.org --- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> --- After the latest fixes we still fail to recognize min/max early for float < 0.0 ? float : 0.0 because prepare_cmp_insn doesn't push the FP 0.0 constant to a reg since RTX cost for this seems to be zero. We then call insn_operand_matches which ultimatively fails in ix86_fp_comparison_operator as ix86_fp_comparison_strategy is IX86_FPCMP_COMI here and ix86_trivial_fp_comparison_operator for (lt (reg/v:SF 110 [ t2 ]) (const_double:SF 0.0 [0x0.0p+0])) returns false. If I fix things so we try (gt (const_double:SF 0.0 [0x0.0p+0]) (reg:SF ..)) then maybe_legitimize_operands "breaks" things here since it forces the cond operand to a register but not the comparison operand so ix86_expand_fp_movcc again FAILs. I'm not sure why the x86 backend allows any CONST_DOUBLE as part of comparisons (during expansion only?). This and maybe special-handling of rtx_cost with this special constant and LT/GT code makes the first compares not recognized as MIN/MAX. The rest is fixed now. Patch for trying (gt ..): diff --git a/gcc/optabs.cc b/gcc/optabs.cc index 32ff379ffc3..3ff8ba88bbb 100644 --- a/gcc/optabs.cc +++ b/gcc/optabs.cc @@ -4607,6 +4607,14 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size, break; } + if (FLOAT_MODE_P (mode)) + { + prepare_cmp_insn (y, x, swap_condition (comparison), + size, unsignedp, methods, ptest, pmode); + if (*ptest) + return; + } + if (methods != OPTAB_LIB_WIDEN) goto fail; ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-07-21 8:18 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com 2022-09-15 15:42 ` [Bug target/106952] " amonakov at gcc dot gnu.org 2022-09-15 15:54 ` tavianator at gmail dot com 2022-09-17 4:47 ` rguenth at gcc dot gnu.org 2023-07-18 10:39 ` rguenth at gcc dot gnu.org 2023-07-20 8:39 ` rguenth at gcc dot gnu.org 2023-07-21 8:18 ` rguenth at gcc dot gnu.org
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).