public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss
@ 2022-09-15 15:30 tavianator at gmail dot com
  2022-09-15 15:42 ` [Bug target/106952] " amonakov at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: tavianator at gmail dot com @ 2022-09-15 15:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952

            Bug ID: 106952
           Summary: Missed optimization: x < y ? x : y not lowered to
                    minss
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tavianator at gmail dot com
  Target Milestone: ---

Created attachment 53580
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53580&action=edit
Assembly from gcc -O3 -S bug.c

The following is an implementation of a ray/axis-aligned box intersection test:

struct ray {
    float origin[3];
    float dir_inv[3];
};

struct box {
    float min[3];
    float max[3];
};

static inline float min(float x, float y) {
    return x < y ? x : y;
}

static inline float max(float x, float y) {
    return x < y ? x : y;
}

_Bool intersection(const struct ray *ray, const struct box *box) {
    float tmin = 0.0, tmax = 1.0 / 0.0;

    for (int i = 0; i < 3; ++i) {
        float t1 = (box->min[i] - ray->origin[i]) * ray->dir_inv[i];
        float t2 = (box->max[i] - ray->origin[i]) * ray->dir_inv[i];

        tmin = min(max(t1, tmin), max(t2, tmin));
        tmax = max(min(t1, tmax), min(t2, tmax));
    }

    return tmin < tmax;
}

However, gcc -O3 doesn't use minss/maxss for every min()/max().  Instead, some
of them are lowered to conditional jumps which regresses performance
significantly since the branches are unpredictable.

Simpler variants like

        tmin = max(tmin, min(t1, t2));
        tmax = min(tmax, max(t1, t2));

get the desired codegen, but that behaves differently if t1 or t2 is NaN.

"Bisecting" with godbolt.org, it seems this is an old regression: 4.8.5 was
good, but 4.9.0 was bad.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss
  2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
@ 2022-09-15 15:42 ` amonakov at gcc dot gnu.org
  2022-09-15 15:54 ` tavianator at gmail dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-09-15 15:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Note, your 'max' function is the same as 'min' (the issue remains with that
corrected).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss
  2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
  2022-09-15 15:42 ` [Bug target/106952] " amonakov at gcc dot gnu.org
@ 2022-09-15 15:54 ` tavianator at gmail dot com
  2022-09-17  4:47 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: tavianator at gmail dot com @ 2022-09-15 15:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952

--- Comment #2 from Tavian Barnes <tavianator at gmail dot com> ---
(In reply to Alexander Monakov from comment #1)
> Note, your 'max' function is the same as 'min' (the issue remains with that
> corrected).

Whoops, thanks.

Also I just noticed that GCC 12.2 does better (but not perfect) with 

#define min(x, y) ((x) < (y) ? (x) : (y))
#define max(x, y) ((x) > (y) ? (x) : (y))

instead of the inline functions.  Doesn't seem to help GCC trunk though.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss
  2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
  2022-09-15 15:42 ` [Bug target/106952] " amonakov at gcc dot gnu.org
  2022-09-15 15:54 ` tavianator at gmail dot com
@ 2022-09-17  4:47 ` rguenth at gcc dot gnu.org
  2023-07-18 10:39 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-09-17  4:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-09-17
             Target|                            |x86_64-*-*

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The main issue is that combine is confused with the more complicated setup and
the x86 ieee min/max patterns and on the GIMPLE level we are not forming IEEE
compatible .FMIN and .FMAX plus the x86 backend doesn't announce any.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss
  2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
                   ` (2 preceding siblings ...)
  2022-09-17  4:47 ` rguenth at gcc dot gnu.org
@ 2023-07-18 10:39 ` rguenth at gcc dot gnu.org
  2023-07-20  8:39 ` rguenth at gcc dot gnu.org
  2023-07-21  8:18 ` rguenth at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-18 10:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
With the proposed patches for PR88540 and PR105715 I get with -O3 -msse4.1

intersection:
.LFB2:
        .cfi_startproc
        movss   .LC0(%rip), %xmm5
        pxor    %xmm2, %xmm2
        movss   (%rdi), %xmm4
        movss   12(%rsi), %xmm1
        movss   12(%rdi), %xmm0
        divss   %xmm2, %xmm5
        movss   (%rsi), %xmm3
        subss   %xmm4, %xmm1
        subss   %xmm4, %xmm3
        pxor    %xmm4, %xmm4
        mulss   %xmm0, %xmm1
        mulss   %xmm0, %xmm3
        movaps  %xmm1, %xmm0
        cmpnltss        %xmm2, %xmm0
        blendvps        %xmm0, %xmm1, %xmm4
        movaps  %xmm3, %xmm0
        cmpnltss        %xmm2, %xmm0
        pxor    %xmm2, %xmm2
        blendvps        %xmm0, %xmm3, %xmm2
        movss   16(%rsi), %xmm0
        minss   %xmm5, %xmm3
        minss   %xmm5, %xmm1
        movss   4(%rdi), %xmm5
        minss   %xmm4, %xmm2
        movss   16(%rdi), %xmm4
        subss   %xmm5, %xmm0
        maxss   %xmm3, %xmm1
        movss   4(%rsi), %xmm3
        subss   %xmm5, %xmm3
        mulss   %xmm4, %xmm0
        movss   8(%rdi), %xmm5
        mulss   %xmm4, %xmm3
        movaps  %xmm2, %xmm4
        maxss   %xmm0, %xmm4
        minss   %xmm1, %xmm0
        maxss   %xmm3, %xmm2
        minss   %xmm1, %xmm3
        movss   8(%rsi), %xmm1
        subss   %xmm5, %xmm1
        maxss   %xmm3, %xmm0
        movss   20(%rsi), %xmm3
        minss   %xmm4, %xmm2
        movss   20(%rdi), %xmm4
        subss   %xmm5, %xmm3
        mulss   %xmm4, %xmm1
        movaps  %xmm2, %xmm5
        mulss   %xmm4, %xmm3
        movaps  %xmm2, %xmm4
        maxss   %xmm1, %xmm4
        minss   %xmm0, %xmm1
        movaps  %xmm3, %xmm2
        maxss   %xmm3, %xmm5
        minss   %xmm0, %xmm2
        minss   %xmm5, %xmm4
        maxss   %xmm1, %xmm2
        comiss  %xmm4, %xmm2
        seta    %al
        ret

there's the existing issue that RTL conditional move expansion doesn't
preserve the equality of constants for

  _33 = t2_34 < 0.0;
  _12 = _33 ? 0.0 : t2_34;

but it emits two loads from the constant pool for 0.0 here which in the x86
backend fail to be recognized as min/max.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss
  2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
                   ` (3 preceding siblings ...)
  2023-07-18 10:39 ` rguenth at gcc dot gnu.org
@ 2023-07-20  8:39 ` rguenth at gcc dot gnu.org
  2023-07-21  8:18 ` rguenth at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-20  8:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss
  2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
                   ` (4 preceding siblings ...)
  2023-07-20  8:39 ` rguenth at gcc dot gnu.org
@ 2023-07-21  8:18 ` rguenth at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-21  8:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org
             Status|ASSIGNED                    |NEW
           Assignee|rguenth at gcc dot gnu.org         |unassigned at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
After the latest fixes we still fail to recognize min/max early for

  float < 0.0 ? float : 0.0

because prepare_cmp_insn doesn't push the FP 0.0 constant to a reg
since RTX cost for this seems to be zero.  We then call insn_operand_matches
which ultimatively fails in ix86_fp_comparison_operator as
ix86_fp_comparison_strategy is IX86_FPCMP_COMI here and
ix86_trivial_fp_comparison_operator for

(lt (reg/v:SF 110 [ t2 ])
    (const_double:SF 0.0 [0x0.0p+0]))

returns false.

If I fix things so we try (gt (const_double:SF 0.0 [0x0.0p+0]) (reg:SF ..))
then maybe_legitimize_operands "breaks" things here since it forces
the cond operand to a register but not the comparison operand so
ix86_expand_fp_movcc again FAILs.

I'm not sure why the x86 backend allows any CONST_DOUBLE as part of
comparisons (during expansion only?).  This and maybe special-handling
of rtx_cost with this special constant and LT/GT code makes the first
compares not recognized as MIN/MAX.

The rest is fixed now.

Patch for trying (gt ..):

diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 32ff379ffc3..3ff8ba88bbb 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -4607,6 +4607,14 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code
comparison, rtx size,
        break;
     }

+  if (FLOAT_MODE_P (mode))
+    {
+      prepare_cmp_insn (y, x, swap_condition (comparison),
+                       size, unsignedp, methods, ptest, pmode);
+      if (*ptest)
+       return;
+    }
+
   if (methods != OPTAB_LIB_WIDEN)
     goto fail;

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-07-21  8:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-15 15:30 [Bug target/106952] New: Missed optimization: x < y ? x : y not lowered to minss tavianator at gmail dot com
2022-09-15 15:42 ` [Bug target/106952] " amonakov at gcc dot gnu.org
2022-09-15 15:54 ` tavianator at gmail dot com
2022-09-17  4:47 ` rguenth at gcc dot gnu.org
2023-07-18 10:39 ` rguenth at gcc dot gnu.org
2023-07-20  8:39 ` rguenth at gcc dot gnu.org
2023-07-21  8:18 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).