public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/94837] New: Failure to optimize out spurious movbe into bswap
@ 2020-04-29  1:11 gabravier at gmail dot com
From: gabravier at gmail dot com @ 2020-04-29  1:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94837

            Bug ID: 94837
           Summary: Failure to optimize out spurious movbe into bswap
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>
float swapFloat(float x)
{
    union
    {
        float f;
        uint32_t u32;
    } swapper;

    swapper.f = x;
    swapper.u32 = __builtin_bswap32(swapper.u32);
    return swapper.f;
}

For this function, on x86-64 with `-O3 -mmovbe`, LLVM outputs this:

swapFloat(float): # @swapFloat(float)
  movd eax, xmm0
  bswap eax
  movd xmm0, eax
  ret

GCC instead outputs this:

swapFloat(float):
  movd DWORD PTR [rsp-4], xmm0
  movbe eax, DWORD PTR [rsp-4]
  movd xmm0, eax
  ret

It seems highly likely to me that spilling the value to the stack and reloading
it with `movbe` is much slower than a direct register-to-register `bswap`.
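For experimenting with this report, the same byte swap can also be written with `memcpy` type punning instead of a union. The sketch below is a hypothetical variant: `float_bits`, `float_from_bits` and `swap_float_memcpy` are illustrative names, not from the report, and it assumes `float` is 32 bits wide. Whether GCC emits `bswap` or the `movbe` spill for this form is not claimed here and should be checked against the compiler in use.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret a float's bits as uint32_t without a union. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a uint32_t bit pattern as a float. */
static float float_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical memcpy-based equivalent of swapFloat: reverse the byte
   order of the float's 32-bit representation. */
static float swap_float_memcpy(float x)
{
    return float_from_bits(__builtin_bswap32(float_bits(x)));
}
```

Unlike the union form, `memcpy` punning is also well defined in C++.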

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug rtl-optimization/94837] Failure to optimize out spurious movbe into bswap
From: pinskia at gcc dot gnu.org @ 2020-04-29  1:48 UTC (permalink / raw)
  To: gcc-bugs

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is on purpose.

Use -mtune=intel to get the result you want.

See PR 54593 for the reason why.

*** This bug has been marked as a duplicate of bug 54593 ***

* [Bug rtl-optimization/94837] Failure to optimize out spurious movbe into bswap
From: gabravier at gmail dot com @ 2020-04-29  8:26 UTC (permalink / raw)
  To: gcc-bugs

--- Comment #2 from Gabriel Ravier <gabravier at gmail dot com> ---
This is what I get with `-O3 -mmovbe -mtune=intel`:

swapFloat(float):
  movd DWORD PTR [rsp-4], xmm0
  movbe eax, DWORD PTR [rsp-4]
  movd xmm0, eax
  ret

This seems erroneous, since the output is unchanged from the original report.

* [Bug rtl-optimization/94837] Failure to optimize out spurious movbe into bswap
From: gabravier at gmail dot com @ 2020-04-29  8:46 UTC (permalink / raw)
  To: gcc-bugs

--- Comment #3 from Gabriel Ravier <gabravier at gmail dot com> ---
Also, I've tested the code from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593 and the optimization in
question is no longer enabled with `-mtune=generic`, only with specific
architectures like `-mtune=k8`.

* [Bug rtl-optimization/94837] Failure to optimize out spurious movbe into bswap
From: ubizjak at gmail dot com @ 2020-04-29  9:49 UTC (permalink / raw)
  To: gcc-bugs

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|missed-optimization         |ra
                 CC|                            |vmakarov at gcc dot gnu.org
   Last reconfirmed|                            |2020-04-29
         Resolution|DUPLICATE                   |---
             Status|RESOLVED                    |NEW
     Ever confirmed|0                           |1

--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
Looks like an RA (register allocation) problem, or possibly a tuning one.

We enter reload (-O2 -mmovbe -mtune=intel) with:

(insn 14 4 2 2 (set (reg:SF 87)
        (reg:SF 20 xmm0 [ x ])) "pr94837.c":2:1 112 {*movsf_internal}
     (expr_list:REG_DEAD (reg:SF 20 xmm0 [ x ])
        (nil)))
(insn 7 6 11 2 (set (subreg:SI (reg:SF 84 [ <retval> ]) 0)
        (bswap:SI (subreg:SI (reg:SF 87) 0))) "pr94837.c":11:19 869 {*bswapsi2_movbe}
     (expr_list:REG_DEAD (reg:SF 87)
        (nil)))
(insn 11 7 12 2 (set (reg/i:SF 20 xmm0)
        (reg:SF 84 [ <retval> ])) "pr94837.c":12:1 112 {*movsf_internal}
     (expr_list:REG_DEAD (reg:SF 84 [ <retval> ])
        (nil)))

and this sequence gets reloaded to:

(insn 17 6 7 2 (set (mem/c:SI (plus:DI (reg/f:DI 7 sp)
                (const_int -4 [0xfffffffffffffffc])) [1 %sfp+-4 S4 A32])
        (reg:SI 20 xmm0 [87])) "pr94837.c":11:19 67 {*movsi_internal}
     (nil))
(insn 7 17 16 2 (set (reg:SI 0 ax [88])
        (bswap:SI (mem/c:SI (plus:DI (reg/f:DI 7 sp)
                    (const_int -4 [0xfffffffffffffffc])) [1 %sfp+-4 S4 A32]))) "pr94837.c":11:19 869 {*bswapsi2_movbe}
     (nil))
(insn 16 7 12 2 (set (reg:SI 20 xmm0 [orig:84 <retval> ] [84])
        (reg:SI 0 ax [88])) "pr94837.c":11:19 67 {*movsi_internal}
     (nil))

One would expect the register allocator to choose alternative 0 (the
register form, `bswap`) from:

(define_insn "*bswap<mode>2_movbe"
  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,r,m")
        (bswap:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,m,r")))]
  "TARGET_MOVBE
   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
  "@
    bswap\t%0
    movbe{<imodesuffix>}\t{%1, %0|%0, %1}
    movbe{<imodesuffix>}\t{%1, %0|%0, %1}"

but for some reason this is not the case.
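If the register form is needed before the allocator behaviour changes, one possible workaround (a hypothetical sketch, not something proposed in this thread) is GNU inline asm. In the sketch below, `bswap32_in_reg` and `swap_float_asm` are illustrative names. The `"+r"` constraint keeps the value in a general register and ties input to output, which is roughly what alternative 0 (`=r` with a matching `0` input) expresses. Note that inline asm is opaque to the optimizer, so it also blocks constant folding of the swap.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte-swap a 32-bit value in a general register.
   On x86 the "+r" constraint mirrors alternative 0 of *bswap<mode>2_movbe
   (register operand, input tied to output). */
static inline uint32_t bswap32_in_reg(uint32_t u)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("bswap %0" : "+r"(u));
#else
    /* Portable shift/mask fallback for other targets. */
    u = (u >> 24) | ((u >> 8) & 0x0000FF00u)
        | ((u << 8) & 0x00FF0000u) | (u << 24);
#endif
    return u;
}

/* Hypothetical swapFloat variant built on the helper above. */
static float swap_float_asm(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);   /* float bits -> integer */
    u = bswap32_in_reg(u);      /* reverse byte order in a register */
    memcpy(&x, &u, sizeof x);   /* integer bits -> float */
    return x;
}
```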

* [Bug rtl-optimization/94837] Failure to optimize out spurious movbe into bswap
From: ubizjak at gmail dot com @ 2020-04-29  9:54 UTC (permalink / raw)
  To: gcc-bugs

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
This is probably some secondary effect of subregs on register allocation:
changing "float" to "int" in the original testcase gets us the expected
alternative and optimal code using BSWAP.
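For comparison, this is the testcase with "float" replaced by "int", as described above. The name `swapInt` is illustrative; according to this comment GCC then selects the register alternative and emits BSWAP, but the exact output may vary with GCC version and options. Because no subreg of an SFmode pseudo is involved, this is the natural contrast case for the RA behaviour discussed here.

```c
#include <stdint.h>

/* Hypothetical int version of swapFloat: same union pun and builtin,
   but both union members are integers. */
int swapInt(int x)
{
    union
    {
        int f;
        uint32_t u32;
    } swapper;

    swapper.f = x;
    swapper.u32 = __builtin_bswap32(swapper.u32);
    return swapper.f;
}
```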
