public inbox for gcc-bugs@sourceware.org
* [Bug target/97887] New: Failure to optimize neg plus div to avoid using x87 floating point stack
@ 2020-11-18  9:35 gabravier at gmail dot com
  2020-11-18 10:33 ` [Bug target/97887] [10/11 Regression] " rguenth at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: gabravier at gmail dot com @ 2020-11-18  9:35 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97887

            Bug ID: 97887
           Summary: Failure to optimize neg plus div to avoid using x87
                    floating point stack
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

float f(float a)
{
    return -a / a;
}

On x86 -O3, LLVM outputs this:

.LCPI0_0:
  .long 0x80000000 # float -0
  .long 0x80000000 # float -0
  .long 0x80000000 # float -0
  .long 0x80000000 # float -0
f(float):
  movaps xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
  xorps xmm1, xmm0
  divss xmm1, xmm0
  movaps xmm0, xmm1
  ret

GCC outputs this:

f(float):
  movss DWORD PTR [rsp-4], xmm0
  fld DWORD PTR [rsp-4]
  movaps xmm1, xmm0
  fchs
  fstp DWORD PTR [rsp-4]
  movss xmm0, DWORD PTR [rsp-4]
  divss xmm0, xmm1
  ret

I'm *pretty sure* that loading the value into the x87 stack (especially mixed
with SSE instructions) is much slower than using SSE instructions for this.
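
For reference, the LLVM sequence negates the value by XORing it with a sign-bit mask (0x80000000) and then divides. A minimal C sketch of that idea follows, assuming float is a 32-bit IEEE 754 value; the helper names neg_via_xor and f_xor are made up for illustration and appear in neither compiler:

```c
#include <stdint.h>
#include <string.h>

/* Negate a float by toggling its sign bit, the scalar analogue of the
   xorps against the 0x80000000 mask in the LLVM output.
   Assumption: float is a 32-bit IEEE 754 binary32 value. */
static float neg_via_xor(float a)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);   /* copy the bits out without aliasing problems */
    bits ^= UINT32_C(0x80000000);     /* flip only the sign bit */
    memcpy(&a, &bits, sizeof a);
    return a;
}

/* The function from the report, with the negation spelled out as a sign flip. */
static float f_xor(float a)
{
    return neg_via_xor(a) / a;
}
```

No x87 register is needed for either step, which is why the fld/fchs/fstp round trip through memory in the GCC output looks avoidable.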

* [Bug target/97887] [10/11 Regression] Failure to optimize neg plus div to avoid using x87 floating point stack
From: rguenth at gcc dot gnu.org @ 2020-11-18 10:33 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97887

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org
             Target|x86_64 i?86                 |x86_64-*-* i?86-*-*
           Keywords|                            |needs-bisection, ra
   Target Milestone|---                         |10.3
           Priority|P3                          |P2
            Summary|Failure to optimize neg     |[10/11 Regression] Failure
                   |plus div to avoid using x87 |to optimize neg plus div to
                   |floating point stack        |avoid using x87 floating
                   |                            |point stack
      Known to fail|                            |10.2.0, 11.0
      Known to work|                            |7.5.0, 9.3.1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Whoo:

********** Local #1: **********

           Spilling non-eliminable hard regs: 7
New elimination table:
Can eliminate 16 to 7 (offset=8, prev_offset=0)
Can eliminate 16 to 6 (offset=8, prev_offset=0)
Can eliminate 19 to 7 (offset=0, prev_offset=0)
Can eliminate 19 to 6 (offset=0, prev_offset=0)
            0 Non-prefered reload: reject+=600
            0 Non input pseudo reload: reject++
            1 Matching alt: reject+=2
            1 Non-prefered reload: reject+=600
          alt=0,overall=1227,losers=4,rld_nregs=2
            Staticly defined alt reject+=600
            0 Non-prefered reload: reject+=600
            0 Non input pseudo reload: reject++
            1 Matching alt: reject+=2
            1 Non-prefered reload: reject+=600
            alt=1,overall=1815,losers=2 -- refuse
         Choosing alt 0 in insn 7:  (0) =f  (1) 0 {*negsf2_i387_1}
      Creating newreg=89 from oldreg=86, assigning class FLOAT_REGS to r89
    7: {r89:SF=-r89:SF;clobber flags:CC;}
      REG_UNUSED flags:CC
    Inserting insn reload before:
   18: r89:SF=r88:SF
    Inserting insn reload after:
   19: r86:SF=r89:SF

WTF.

* [Bug target/97887] [10/11 Regression] Failure to optimize neg plus div to avoid using x87 floating point stack
From: rguenth at gcc dot gnu.org @ 2020-11-18 10:33 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97887

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-11-18
     Ever confirmed|0                           |1

* [Bug target/97887] [10/11 Regression] Failure to optimize neg plus div to avoid using x87 floating point stack
From: rguenth at gcc dot gnu.org @ 2020-11-18 10:42 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97887

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
combine first makes recog pick negsf2_i387_1:

Trying 6 -> 7:
    6: r87:V4SF=[`*.LC0']
      REG_EQUAL const_vector
    7: {r86:SF=-r84:SF;use r87:V4SF;clobber flags:CC;}
      REG_DEAD r87:V4SF
      REG_UNUSED flags:CC
Successfully matched this instruction:
(parallel [
        (set (reg:SF 86) 
            (neg:SF (reg/v:SF 84 [ a ])))
        (use (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128]))
        (clobber (reg:CC 17 flags))
    ])
allowing combination of insns 6 and 7
original costs 8 + 4 = 12
replacement cost 4

...

(insn 7 6 8 2 (parallel [
            (set (reg:SF 86)
                (neg:SF (reg:SF 88))) 
            (clobber (reg:CC 17 flags))
        ]) "t.i":3:12 595 {*negsf2_i387_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 8 7 13 2 (set (reg:SF 85)
        (div:SF (reg:SF 86)
            (reg:SF 88))) "t.i":3:15 965 {*fop_sf_1}
     (expr_list:REG_DEAD (reg:SF 88)
        (expr_list:REG_DEAD (reg:SF 86)
            (nil))))

Of course, we already have fop_sf_1 for the division before combine, but
that's probably just a misnamed pattern.

* [Bug target/97887] [10/11 Regression] Failure to optimize neg plus div to avoid using x87 floating point stack
From: ubizjak at gmail dot com @ 2020-11-18 12:11 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97887

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #2)
> combine first makes recog pick negsf2_i387_1:

This should have the following insn constraint:

  "TARGET_80387 && !(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"

to hide it from combine in cases where relevant SSE mode is available.
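
To make the effect of that condition concrete, here is a small C model of the boolean logic. It is only a sketch: the function name and its three parameters are hypothetical stand-ins for the GCC macros TARGET_80387, SSE_FLOAT_MODE_P (<MODE>mode) and TARGET_SSE_MATH, not real GCC interfaces.

```c
#include <stdbool.h>

/* Hypothetical model of the suggested insn condition:
     TARGET_80387 && !(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
   Each parameter stands in for the GCC macro of the similar name. */
static bool i387_absneg_enabled(bool target_80387,
                                bool sse_float_mode,
                                bool target_sse_math)
{
    /* The x87 pattern is usable only when x87 exists and the mode is
       not one that SSE math is already responsible for. */
    return target_80387 && !(sse_float_mode && target_sse_math);
}
```

Under this model the x87 pattern is hidden for a float when SSE math is in use (all three inputs true), but stays available for modes that SSE cannot handle, or when SSE math is not selected.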

* [Bug target/97887] [10/11 Regression] Failure to optimize neg plus div to avoid using x87 floating point stack
From: rguenth at gcc dot gnu.org @ 2020-11-18 13:54 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97887

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Richard Biener from comment #2)
> > combine first makes recog pick negsf2_i387_1:
> 
> This should have the following insn constraint:
> 
>   "TARGET_80387 && !(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
> 
> to hide it from combine in cases where relevant SSE mode is available.

Hmm, it is

;; Changing of sign for FP values is doable using integer unit too.
(define_insn "*<code><mode>2_i387_1"
  [(set (match_operand:X87MODEF 0 "register_operand" "=f,!r")
        (absneg:X87MODEF
          (match_operand:X87MODEF 1 "register_operand" "0,0")))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_80387"
  "#")

that is not guarded in this way?

* [Bug target/97887] [10/11 Regression] Failure to optimize neg plus div to avoid using x87 floating point stack
From: ubizjak at gmail dot com @ 2020-11-18 13:56 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97887

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
> > This should have the following insn constraint:
> > 
> >   "TARGET_80387 && !(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
> > 
> > to hide it from combine in cases where relevant SSE mode is available.
> 
> Hmm, it is
> 
> ;; Changing of sign for FP values is doable using integer unit too.
> (define_insn "*<code><mode>2_i387_1"
>   [(set (match_operand:X87MODEF 0 "register_operand" "=f,!r")
>         (absneg:X87MODEF
>           (match_operand:X87MODEF 1 "register_operand" "0,0")))
>    (clobber (reg:CC FLAGS_REG))]
>   "TARGET_80387"
>   "#")
> 
> that is not guarded in this way?

Yes, this is the one.

* [Bug target/97887] [10/11 Regression] Failure to optimize neg plus div to avoid using x87 floating point stack
From: rguenth at gcc dot gnu.org @ 2020-11-18 13:58 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97887

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So likely caused by g:f359611b363490b48a7ce0fd021f7e47d8816eb0

* [Bug target/97887] [10/11 Regression] Failure to optimize neg plus div to avoid using x87 floating point stack
From: ubizjak at gmail dot com @ 2020-11-18 15:09 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97887

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |ubizjak at gmail dot com

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
I'll fix this.

* [Bug target/97887] [10/11 Regression] Failure to optimize neg plus div to avoid using x87 floating point stack
From: rguenth at gcc dot gnu.org @ 2020-11-19  9:00 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97887

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
           Keywords|needs-bisection, ra         |

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed by

commit 50134189a434e638861f8bf27d5caab9622811c8
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Thu Nov 19 09:23:46 2020 +0100

    i386: Disable *<absneg:code><mode>2_i387_1 for TARGET_SSE_MATH modes

    This pattern interferes with *<absneg:code><mode>2_1 when TARGET_SSE_MATH
    modes are active. Combine pass is able to remove (use) RTXes and transforms
    *<absneg:code><mode>2_1 to *<absneg:code><mode>2_i387_1 where SSE
    alternatives are not available.

    2020-11-19  Uroš Bizjak  <ubizjak@gmail.com>

    gcc/
            * config/i386/i386.md (*<absneg:code><mode>2_i387_1):
            Disable for TARGET_SSE_MATH modes.

    gcc/testsuite/
            * gcc.target/i386/pr97887.c: New test.
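
The commit message names the new test but does not show it. As a rough idea of what a regression test for this bug could look like, here is a hypothetical sketch in the usual DejaGnu style; the option string and the scanned mnemonic are assumptions, and the committed pr97887.c may well differ:

```c
/* Hypothetical sketch only; not the contents of the committed test. */
/* { dg-do compile } */
/* { dg-options "-O2 -msse2 -mfpmath=sse" } */

/* Test case from the original report. */
float f (float a)
{
  return -a / a;
}

/* With SSE math the negation should not go through the x87 stack,
   so no fchs should appear in the generated assembly. */
/* { dg-final { scan-assembler-not "fchs" } } */
```

The dg- comments are ignored by a plain compiler run, so the file still builds as ordinary C outside the testsuite.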
