public inbox for gcc-bugs@sourceware.org
* [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
@ 2021-04-06 10:38 core13 at gmx dot net
  2021-04-06 11:57 ` [Bug rtl-optimization/99930] " rguenth at gcc dot gnu.org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: core13 at gmx dot net @ 2021-04-06 10:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

            Bug ID: 99930
           Summary: Failure to optimize floating point -abs(x) in
                    nontrivial code at -O2/3
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: core13 at gmx dot net
  Target Milestone: ---

Expected compiler output for -abs(x) is an orps setting the sign bit.


It works as expected with trivial code at -O1/2/3 optimization levels:

#include <cmath>

float q(float p)
{
    return -std::abs(p);
}

orps    xmm0, XMMWORD PTR .LC1[rip]
ret
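
For reference, a minimal sketch of why a single OR is enough, assuming
float is IEEE-754 binary32 and that .LC1 holds the sign-bit mask
0x80000000 (the listing does not show the constant).  The helper names
bits_of, neg_abs_by_or and check_neg_abs_samples are made up for this
illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: view a float's bits as a 32-bit integer.
std::uint32_t bits_of(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Hypothetical helper: what one orps with the sign-bit mask computes.
float neg_abs_by_or(float p) {
    const std::uint32_t u = bits_of(p) | 0x80000000u;
    float r;
    std::memcpy(&r, &u, sizeof r);
    return r;
}

// Compare bit patterns so that -0.0f and +0.0f are told apart.
void check_neg_abs_samples() {
    const float samples[] = {0.0f, -0.0f, 1.5f, -1.5f, 3.0e38f, -1.0e-30f};
    for (float p : samples)
        assert(bits_of(-std::abs(p)) == bits_of(neg_abs_by_or(p)));
}
```

Calling check_neg_abs_samples() from main should pass silently if the
assumption about the float format holds.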


With more complex code the compiler uses orps at -O1 but andps + xorps at
-O2/3:

bool t(float n[2], float m)
{
    for (int i = 0; i < 2; i++)
        if (m > -std::abs(n[i]))
            return true;
    return false;
}

-O1
movss   xmm1, DWORD PTR [rdi]
orps    xmm1, XMMWORD PTR .LC1[rip]
comiss  xmm0, xmm1
ja      .L3
movss   xmm1, DWORD PTR [rdi+4]
orps    xmm1, XMMWORD PTR .LC1[rip]
comiss  xmm0, xmm1
seta    al
ret

-O2/3
movss   xmm1, DWORD PTR [rdi]
movss   xmm3, DWORD PTR .LC0[rip]
movss   xmm2, DWORD PTR .LC1[rip]
andps   xmm1, xmm3
xorps   xmm1, xmm2
comiss  xmm0, xmm1
ja      .L3
movss   xmm1, DWORD PTR [rdi+4]
andps   xmm1, xmm3
xorps   xmm1, xmm2
comiss  xmm0, xmm1
seta    al
ret

https://godbolt.org/z/5ch5ceEj7
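
The two sequences should compute the same bits; the -O2/3 one just
takes two operations to get there.  A small sketch on raw bit patterns,
assuming .LC0 is the abs mask 0x7fffffff and .LC1 is the sign mask
0x80000000 (neither constant is shown in the listings above; the
function names are made up):

```cpp
#include <cstdint>

// Bit pattern after the -O2/3 pair: andps clears the sign bit,
// then xorps turns it back on.
constexpr std::uint32_t neg_abs_and_xor(std::uint32_t bits) {
    return (bits & 0x7fffffffu) ^ 0x80000000u;
}

// Bit pattern after the single -O1 orps.
constexpr std::uint32_t neg_abs_or(std::uint32_t bits) {
    return bits | 0x80000000u;
}

// Spot checks on a few binary32 encodings.
static_assert(neg_abs_and_xor(0x3fc00000u) == neg_abs_or(0x3fc00000u), "+1.5f");
static_assert(neg_abs_and_xor(0xbfc00000u) == neg_abs_or(0xbfc00000u), "-1.5f");
static_assert(neg_abs_and_xor(0x00000000u) == neg_abs_or(0x00000000u), "+0.0f");
static_assert(neg_abs_and_xor(0x7fc00000u) == neg_abs_or(0x7fc00000u), "quiet NaN");
```

Since the and clears bit 31 and the xor then sets it, the result is
always the input with bit 31 set, which is exactly what the or gives.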


* [Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
  2021-04-06 10:38 [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3 core13 at gmx dot net
@ 2021-04-06 11:57 ` rguenth at gcc dot gnu.org
  2021-04-06 12:00 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-04-06 11:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-04-06
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
             Target|                            |x86_64-*-*
          Component|target                      |rtl-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  At -O1

Trying 10 -> 12:
   10: {r91:SF=abs(r92:SF);use [`*.LC0'];clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_DEAD r92:SF
   12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
      REG_DEAD r95:V4SF
      REG_DEAD r91:SF
      REG_UNUSED flags:CC
Failed to match this instruction:
(parallel [
        (set (reg:SF 94)
            (neg:SF (abs:SF (reg:SF 92 [ *n_9(D) ]))))
        (use (reg:V4SF 95))
        (clobber (reg:CC 17 flags))
    ])
Successfully matched this instruction:
(parallel [
        (set (reg:SF 94)
            (neg:SF (abs:SF (reg:SF 92 [ *n_9(D) ]))))
        (use (reg:V4SF 95))
    ])
allowing combination of insns 10 and 12
original costs 4 + 4 = 8
replacement cost 8

but with -O2:

Trying 10 -> 12:
   10: {r91:SF=abs(r92:SF);use r93:V4SF;clobber flags:CC;}
      REG_DEAD r92:SF
      REG_UNUSED flags:CC
   12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
      REG_DEAD r91:SF
      REG_UNUSED flags:CC
Can't combine i2 into i3

we're later trying

Trying 10, 12 -> 13:
   10: {r91:SF=abs(r92:SF);use r93:V4SF;clobber flags:CC;}
      REG_DEAD r92:SF
      REG_UNUSED flags:CC
   12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
      REG_DEAD r91:SF
      REG_UNUSED flags:CC
   13: flags:CCFP=cmp(r90:SF,r94:SF)
      REG_DEAD r94:SF
Failed to match this instruction:
(set (reg:CCFP 17 flags)
    (compare:CCFP (neg:SF (abs:SF (reg:SF 92 [ *n_9(D) ])))
        (reg/v:SF 90 [ m ])))
Failed to match this instruction:
(set (reg:SF 94)
    (abs:SF (reg:SF 92 [ *n_9(D) ])))


* [Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
  2021-04-06 10:38 [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3 core13 at gmx dot net
  2021-04-06 11:57 ` [Bug rtl-optimization/99930] " rguenth at gcc dot gnu.org
@ 2021-04-06 12:00 ` rguenth at gcc dot gnu.org
  2021-04-06 17:31 ` segher at gcc dot gnu.org
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-04-06 12:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Seems because of r93 being live:

insn_cost 8 for     9: r93:V4SF=[`*.LC0']
      REG_EQUAL const_vector
insn_cost 4 for    10: {r91:SF=abs(r92:SF);use r93:V4SF;clobber flags:CC;}
      REG_DEAD r92:SF
      REG_UNUSED flags:CC
insn_cost 8 for    11: r95:V4SF=[`*.LC1']
      REG_EQUAL const_vector
insn_cost 4 for    12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
      REG_DEAD r91:SF
      REG_UNUSED flags:CC
insn_cost 4 for    13: flags:CCFP=cmp(r90:SF,r94:SF)
      REG_DEAD r94:SF
insn_cost 12 for    14: pc={(flags:CCFP>0)?L35:pc}
      REG_DEAD flags:CCFP
      REG_BR_PROB 59055804
insn_cost 8 for    16: r97:SF=[r89:DI+0x4]
      REG_DEAD r89:DI
insn_cost 4 for    18: {r96:SF=abs(r97:SF);use r93:V4SF;clobber flags:CC;}
      REG_DEAD r97:SF
      REG_DEAD r93:V4SF
      REG_UNUSED flags:CC
insn_cost 4 for    20: {r99:SF=-r96:SF;use r95:V4SF;clobber flags:CC;}
      REG_DEAD r96:SF
      REG_DEAD r95:V4SF
      REG_UNUSED flags:CC

while at -O1 we have two loads of LC0 and r93 is dead after insn 10.


* [Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
  2021-04-06 10:38 [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3 core13 at gmx dot net
  2021-04-06 11:57 ` [Bug rtl-optimization/99930] " rguenth at gcc dot gnu.org
  2021-04-06 12:00 ` rguenth at gcc dot gnu.org
@ 2021-04-06 17:31 ` segher at gcc dot gnu.org
  2021-04-07 10:04 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2021-04-06 17:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #3 from Segher Boessenkool <segher at gcc dot gnu.org> ---
What happens here is
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/combine.c;h=3294575357bfcb19e589868da34364498a860dcf;hb=HEAD#l1884

"*<code><mode>2_1" for absneg:MODEF has a bare "use".  And then we trigger

  If the USE in INSN was for a pseudo register, the matching
  insn pattern will likely match any register; combining this
  with any other USE would only be safe if we knew that the
  used registers have identical values, or if there was
  something to tell them apart, e.g. different modes.  For
  now, we forgo such complicated tests and simply disallow
  combining of USES of pseudo registers with any other USE.

because both the abs and the neg have a bare use.

The patterns should be rewritten to not have such bare uses.  Alternatively
we can add some pretty-much-never-triggered code to combine to handle this
case.  Patches welcome.
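
As a rough illustration only, here is a toy model of the rule as the
quoted comment words it.  This is not the code in can_combine_p (the
real check also looks at register numbers and intervening sets), and
UseKind, ToyInsn and toy_can_combine are hypothetical names:

```cpp
// Toy description of the bare (use ...) inside an insn pattern.
enum class UseKind { None, PseudoReg, Constant };

struct ToyInsn {
    UseKind use;  // kind of operand in the insn's USE, if it has one
};

// Following the comment's wording: a USE of a pseudo register in INSN
// may not be combined with any other USE in I3.
constexpr bool toy_can_combine(const ToyInsn& insn, const ToyInsn& i3) {
    return !(insn.use == UseKind::PseudoReg && i3.use != UseKind::None);
}

// -O1 dump: insn 10 uses [`*.LC0'] (a constant), insn 12 uses r95.
static_assert(toy_can_combine(ToyInsn{UseKind::Constant},
                              ToyInsn{UseKind::PseudoReg}),
              "constant USE does not block the combination");

// -O2 dump: insn 10 uses r93, insn 12 uses r95.
static_assert(!toy_can_combine(ToyInsn{UseKind::PseudoReg},
                               ToyInsn{UseKind::PseudoReg}),
              "two pseudo USEs are rejected");
```

In this model the only difference between the -O1 and -O2 dumps is
whether insn 10 carries the mask as a constant or as a pseudo.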


* [Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
  2021-04-06 10:38 [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3 core13 at gmx dot net
                   ` (2 preceding siblings ...)
  2021-04-06 17:31 ` segher at gcc dot gnu.org
@ 2021-04-07 10:04 ` jakub at gcc dot gnu.org
  2021-04-07 10:06 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-04-07 10:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Is there some reason why the patterns are written that way rather than split
immediately into the AND or XOR?  Perhaps it could be done on SUBREGs to make
it valid RTL, but we split into those post reload already anyway.


* [Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
  2021-04-06 10:38 [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3 core13 at gmx dot net
                   ` (3 preceding siblings ...)
  2021-04-07 10:04 ` jakub at gcc dot gnu.org
@ 2021-04-07 10:06 ` jakub at gcc dot gnu.org
  2021-04-07 10:14 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-04-07 10:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Maybe because of the X alternatives, where we don't know the sign bit mask.


* [Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
  2021-04-06 10:38 [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3 core13 at gmx dot net
                   ` (4 preceding siblings ...)
  2021-04-07 10:06 ` jakub at gcc dot gnu.org
@ 2021-04-07 10:14 ` ubizjak at gmail dot com
  2021-04-07 10:27 ` crazylht at gmail dot com
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2021-04-07 10:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #4)
> Is there some reason why the patterns are written that way rather than split
> immediately into the AND or XOR?  Perhaps it could be done on SUBREGs to
> make it valid RTL, but we split into those post reload already anyway.

I don't know, since these patterns pre-date my involvement in gcc.


* [Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
  2021-04-06 10:38 [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3 core13 at gmx dot net
                   ` (5 preceding siblings ...)
  2021-04-07 10:14 ` ubizjak at gmail dot com
@ 2021-04-07 10:27 ` crazylht at gmail dot com
  2021-04-07 16:32 ` segher at gcc dot gnu.org
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: crazylht at gmail dot com @ 2021-04-07 10:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
I'm testing

1 file changed, 30 insertions(+)
gcc/combine.c | 30 ++++++++++++++++++++++++++++++

modified   gcc/combine.c
@@ -1811,6 +1811,33 @@ set_nonzero_bits_and_sign_copies (rtx x, const_rtx set, void *data)
        }
     }
 }
+
+/* Return true if REG is only defined by loading from constant pool.  */
+static int
+single_ref_from_constant_pool (rtx reg)
+{
+  gcc_assert (REG_P (reg));
+  rtx_insn* insn;
+  rtx src, set;
+
+  if (DF_REG_DEF_COUNT (REGNO (reg)) != 1)
+    return 0;
+  insn = DF_REF_INSN (DF_REG_DEF_CHAIN (REGNO (reg)));
+  if (!insn)
+    return 0;
+  set = single_set (insn);
+  if (!set)
+    return 0;
+  src = SET_SRC (set);
+
+  /* Constant pool.  */
+  if (!MEM_P (src)
+      || !SYMBOL_REF_P (XEXP (src, 0))
+      || !CONSTANT_POOL_ADDRESS_P (XEXP (src, 0)))
+    return 0;
+
+  return 1;
+}

 /* See if INSN can be combined into I3.  PRED, PRED2, SUCC and SUCC2 are
    optionally insns that were previously combined into I3 or that will be
@@ -1895,7 +1922,10 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_insn *pred ATTRIBUTE_UNUSED,
                 something to tell them apart, e.g. different modes.  For
                 now, we forgo such complicated tests and simply disallow
                 combining of USES of pseudo registers with any other USE.  */
+             /* If the USE in INSN is only defined by loading from constant
+                pool, it must have identical value.  */
              if (REG_P (XEXP (elt, 0))
+                 && !single_ref_from_constant_pool (XEXP (elt, 0))
                  && GET_CODE (PATTERN (i3)) == PARALLEL)
                {
                  rtx i3pat = PATTERN (i3);


* [Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
  2021-04-06 10:38 [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3 core13 at gmx dot net
                   ` (6 preceding siblings ...)
  2021-04-07 10:27 ` crazylht at gmail dot com
@ 2021-04-07 16:32 ` segher at gcc dot gnu.org
  2021-04-08  9:48 ` crazylht at gmail dot com
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2021-04-07 16:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #8 from Segher Boessenkool <segher at gcc dot gnu.org> ---
That patch is no good.  The combination is not allowed because it is not
known what the "use"s are *for*.  Checking if something is from the constant
pools is not enough at all.


* [Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
  2021-04-06 10:38 [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3 core13 at gmx dot net
                   ` (7 preceding siblings ...)
  2021-04-07 16:32 ` segher at gcc dot gnu.org
@ 2021-04-08  9:48 ` crazylht at gmail dot com
  2021-04-08 22:46 ` segher at gcc dot gnu.org
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: crazylht at gmail dot com @ 2021-04-08  9:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Segher Boessenkool from comment #8)
> That patch is no good.  The combination is not allowed because it is not
> known what the "use"s are *for*.  Checking if something is from the constant
> pools is not enough at all.

At -O1 the USE in INSN is ---use [`*.LC0']---, a reference to the constant
pool.  We also don't know what that use is for, so why can it be combined?


* [Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
  2021-04-06 10:38 [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3 core13 at gmx dot net
                   ` (8 preceding siblings ...)
  2021-04-08  9:48 ` crazylht at gmail dot com
@ 2021-04-08 22:46 ` segher at gcc dot gnu.org
  2021-07-25  1:30 ` pinskia at gcc dot gnu.org
  2023-12-24 23:16 ` [Bug middle-end/99930] " pinskia at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2021-04-08 22:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #10 from Segher Boessenkool <segher at gcc dot gnu.org> ---
That is a USE of a constant, which is a no-op always.  Here we have a USE
of a register, which is not.  We actually have *two* uses of pseudos, and
combine cannot know what that means for the target (all PARALLELs are split
up in combine).


* [Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
  2021-04-06 10:38 [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3 core13 at gmx dot net
                   ` (9 preceding siblings ...)
  2021-04-08 22:46 ` segher at gcc dot gnu.org
@ 2021-07-25  1:30 ` pinskia at gcc dot gnu.org
  2023-12-24 23:16 ` [Bug middle-end/99930] " pinskia at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-25  1:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2021-04-06 00:00:00         |2021-7-24
           Severity|normal                      |enhancement


* [Bug middle-end/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
  2021-04-06 10:38 [Bug target/99930] New: Failure to optimize floating point -abs(x) in nontrivial code at -O2/3 core13 at gmx dot net
                   ` (10 preceding siblings ...)
  2021-07-25  1:30 ` pinskia at gcc dot gnu.org
@ 2023-12-24 23:16 ` pinskia at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-12-24 23:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |middle-end

--- Comment #11 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is worse in GCC 13 because

`  _5 = .COPYSIGN (_4, -1.0e+0);`

is not expanded as `_4 | 0x80000000` ...
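
A quick way to see that copysign with a negative second operand only
needs the sign bit turned on, assuming IEEE-754 binary32 floats.  The
helper names float_bits, copysign_sets_sign_bit and
check_copysign_samples are made up for this sketch:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: view a float's bits as a 32-bit integer.
std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// True when copysign(x, -1.0f) equals x with only the sign bit forced on.
bool copysign_sets_sign_bit(float x) {
    return float_bits(std::copysign(x, -1.0f))
           == (float_bits(x) | 0x80000000u);
}

// Runs the comparison over a few ordinary values and both zeros.
void check_copysign_samples() {
    const float samples[] = {0.0f, -0.0f, 2.0f, -2.0f, 1.0e20f};
    for (float x : samples)
        assert(copysign_sets_sign_bit(x));
}
```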

