public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86)
@ 2020-11-17 10:05 gabravier at gmail dot com
  2020-11-17 15:30 ` [Bug target/97873] Failure to optimize abs optimally (at least one completely " rguenth at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: gabravier at gmail dot com @ 2020-11-17 10:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97873

            Bug ID: 97873
           Summary: Failure to optimize abs optimally (at least one
                    useless instruction on x86)
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

int abs(int x)
{
        return (x < 0) ? -x : x;
}

For x86 -O3, LLVM outputs this:

abs:
  mov eax, edi
  neg eax
  cmovl eax, edi
  ret

GCC outputs this:

abs:
  mov eax, edi
  neg eax
  cmp eax, edi
  cmovl eax, edi
  ret

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/97873] Failure to optimize abs optimally (at least one completely useless instruction on x86)
  2020-11-17 10:05 [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86) gabravier at gmail dot com
@ 2020-11-17 15:30 ` rguenth at gcc dot gnu.org
  2020-11-17 15:32 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-11-17 15:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97873

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Target|                            |x86_64-*-* i?86-*-*
             Status|UNCONFIRMED                 |NEW
          Component|tree-optimization           |target
   Last reconfirmed|                            |2020-11-17

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
We're expanding from

  _2 = ABS_EXPR <x_1(D)>;
  return _2;

that's nothing to improve on the GIMPLE level.  x86 has an abs expander
that might or might not trigger (it doesn't look like for generic).
We expand as

insn 6 3 7 2 (parallel [
            (set (reg:SI 85)
                (neg:SI (reg/v:SI 83 [ x ])))
            (clobber (reg:CC 17 flags))
        ]) "t.i":3:29 -1
     (nil))
(insn 7 6 8 2 (parallel [
            (set (reg:SI 84)
                (smax:SI (reg/v:SI 83 [ x ])
                    (reg:SI 85)))
            (clobber (reg:CC 17 flags))
        ]) "t.i":3:29 -1
     (nil))
(insn 8 7 12 2 (set (reg:SI 82 [ <retval> ])
        (reg:SI 84)) "t.i":3:29 -1
     (nil))
(insn 12 8 13 2 (set (reg/i:SI 0 ax)
        (reg:SI 82 [ <retval> ])) "t.i":4:1 -1
     (nil))

where the smax:SI is late splitted IIRC.  That's probably not the optimal
expansion choice.

I remember a duplicate bug.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/97873] Failure to optimize abs optimally (at least one completely useless instruction on x86)
  2020-11-17 10:05 [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86) gabravier at gmail dot com
  2020-11-17 15:30 ` [Bug target/97873] Failure to optimize abs optimally (at least one completely " rguenth at gcc dot gnu.org
@ 2020-11-17 15:32 ` rguenth at gcc dot gnu.org
  2020-11-18  5:16 ` crazylht at gmail dot com
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-11-17 15:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97873

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
gcc-7 goes the x86 abs expander way:

abs:
.LFB0:
        .cfi_startproc
        movl    %edi, %edx
        movl    %edi, %eax
        sarl    $31, %edx
        xorl    %edx, %eax
        subl    %edx, %eax
        ret

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/97873] Failure to optimize abs optimally (at least one completely useless instruction on x86)
  2020-11-17 10:05 [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86) gabravier at gmail dot com
  2020-11-17 15:30 ` [Bug target/97873] Failure to optimize abs optimally (at least one completely " rguenth at gcc dot gnu.org
  2020-11-17 15:32 ` rguenth at gcc dot gnu.org
@ 2020-11-18  5:16 ` crazylht at gmail dot com
  2020-11-18 13:45 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: crazylht at gmail dot com @ 2020-11-18  5:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97873

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
> 
> where the smax:SI is late splitted IIRC.  That's probably not the optimal
> expansion choice.
> 
> I remember a duplicate bug.

PR92651?

we introduce

(define_expand "abs<mode>2"
  [(set (match_operand:SWI48x 0 "register_operand")
    (abs:SWI48x
      (match_operand:SWI48x 1 "register_operand")))]
  "TARGET_EXPAND_ABS"

Compile with -O2 -march=corei7 got

abs(int):
        movl    %edi, %eax
        cltd
        xorl    %edx, %eax
        subl    %edx, %eax
        ret

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/97873] Failure to optimize abs optimally (at least one completely useless instruction on x86)
  2020-11-17 10:05 [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86) gabravier at gmail dot com
                   ` (2 preceding siblings ...)
  2020-11-18  5:16 ` crazylht at gmail dot com
@ 2020-11-18 13:45 ` jakub at gcc dot gnu.org
  2020-11-18 18:23 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-11-18 13:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97873

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So then either we should expand the SWI48x mode abs for !TARGET_EXPAND_ABS into
a pre-reload define_insn_and_split with abs that we'd split almost like smax,
except for using the result of neg for the condition codes (and we'd need to
see how it plays with STV), or add a define_insn_and_split that would match
what the combiner is trying:
        (set (reg:SI 84)
            (smax:SI (neg:SI (reg/v:SI 83 [ x ]))
                (reg/v:SI 83 [ x ])))
        (clobber (reg:CC 17 flags))
and again split that cmov consumer of neg as flag setter (and again see what
STV does with that).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/97873] Failure to optimize abs optimally (at least one completely useless instruction on x86)
  2020-11-17 10:05 [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86) gabravier at gmail dot com
                   ` (3 preceding siblings ...)
  2020-11-18 13:45 ` jakub at gcc dot gnu.org
@ 2020-11-18 18:23 ` ubizjak at gmail dot com
  2020-11-18 18:54 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2020-11-18 18:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97873

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 49588
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49588&action=edit
Proposed patch

Attached patch introduces relevant peephole2 pattern (and fixes some other
issues).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/97873] Failure to optimize abs optimally (at least one completely useless instruction on x86)
  2020-11-17 10:05 [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86) gabravier at gmail dot com
                   ` (4 preceding siblings ...)
  2020-11-18 18:23 ` ubizjak at gmail dot com
@ 2020-11-18 18:54 ` ubizjak at gmail dot com
  2020-11-18 19:38 ` ubizjak at gmail dot com
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2020-11-18 18:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97873

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
The attached patch generates:

        movl    %edi, %eax
        negl    %eax
        cmovs   %edi, %eax
        ret

The patch changes CC mode of NEG instruction to CCGOCmode, which is the same
mode as the mode of SUB instruction. IOW, sign bit becomes usable.

We have r1 = neg (r0). The NEG insn sets sign flag (SF) based on the *result*,
so when the SF is set, we are sure that r1 holds the negative and consequently
r0 holds the positive value. Now, when SF is set, CMOVE should select r0 (the
positive mirror image), otherwise it should select r1. This is just how CMOVS
works.

The patch also changes the mode iterator of <maxmin:code><mode> patterns to
SWI48 instead of SWI248. The purpose of maxmin expander is to prepare max/min
RTX for STV to eventually convert them to SSE MAX/MIN instructions, in order to
*avoid* CMOV insns with general registers. HImode will not be converted, so it
can be expanded by middle-end to a generic shift/xor/sub sequence as well.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/97873] Failure to optimize abs optimally (at least one completely useless instruction on x86)
  2020-11-17 10:05 [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86) gabravier at gmail dot com
                   ` (5 preceding siblings ...)
  2020-11-18 18:54 ` ubizjak at gmail dot com
@ 2020-11-18 19:38 ` ubizjak at gmail dot com
  2020-11-19 16:56 ` ubizjak at gmail dot com
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2020-11-18 19:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97873

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #4)
> So then either we should expand the SWI48x mode abs for !TARGET_EXPAND_ABS
> into
> a pre-reload define_insn_and_split with abs that we'd split almost like
> smax, except for using the result of neg for the condition codes (and we'd
> need to see how it plays with STV), or add a define_insn_and_split that
> would match what the combiner is trying:
>         (set (reg:SI 84)
>             (smax:SI (neg:SI (reg/v:SI 83 [ x ]))
>                 (reg/v:SI 83 [ x ])))
>         (clobber (reg:CC 17 flags))
> and again split that cmov consumer of neg as flag setter (and again see what
> STV does with that).

This should be done after combine pass, so STV has a chance to convert min/max
to SSE instructions.  Effectively that means that only a peephole2 is feasible.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/97873] Failure to optimize abs optimally (at least one completely useless instruction on x86)
  2020-11-17 10:05 [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86) gabravier at gmail dot com
                   ` (6 preceding siblings ...)
  2020-11-18 19:38 ` ubizjak at gmail dot com
@ 2020-11-19 16:56 ` ubizjak at gmail dot com
  2020-11-20 10:26 ` ubizjak at gmail dot com
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2020-11-19 16:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97873

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49588|0                           |1
        is obsolete|                            |

--- Comment #8 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 49598
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49598&action=edit
Proposed v2 patch

This patch implements different approach, where we expand ABS to an ABS RTX. 
The patch introduces STV handling of ABS insn and splits to optimal sequence if
STV isn't profitable. The patch also fixes various issues with existing minmax,
neg and abs patterns.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/97873] Failure to optimize abs optimally (at least one completely useless instruction on x86)
  2020-11-17 10:05 [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86) gabravier at gmail dot com
                   ` (7 preceding siblings ...)
  2020-11-19 16:56 ` ubizjak at gmail dot com
@ 2020-11-20 10:26 ` ubizjak at gmail dot com
  2020-11-20 10:26 ` ubizjak at gmail dot com
  2020-11-20 14:22 ` ubizjak at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2020-11-20 10:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97873

--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
Fixed by:

commit fdace7584056de2f63bde2e3087f26beb6b0f97d
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Fri Nov 20 10:26:34 2020 +0100

    i386: Optimize abs expansion [PR97873]

    The patch introduces absM named pattern to generate optimal insn sequence
    for CMOVE_TARGET targets.  Currently, the expansion goes through neg+max
    optabs, and the following code is generated:

            movl    %edi, %eax
            negl    %eax
            cmpl    %edi, %eax
            cmovl   %edi, %eax

    This sequence is unoptimal in two ways.  a) The compare instruction is
    not needed, since NEG insn sets the sign flag based on the result.
    The CMOV can use sign flag to select between negated and original value:

            movl    %edi, %eax
            negl    %eax
            cmovs   %edi, %eax

    b) On some targets, CMOV is undesirable due to its performance issues.
    In addition to TARGET_EXPAND_ABS bypass, the patch introduces STV
    conversion of abs RTX to use PABS SSE insn:

            vmovd   %edi, %xmm0
            vpabsd  %xmm0, %xmm0
            vmovd   %xmm0, %eax

    The patch changes compare mode of NEG instruction to CCGOCmode,
    which is the same mode as the mode of SUB instruction. IOW, sign bit
    becomes usable.

    Also, the mode iterator of <maxmin:code><mode>3 pattern is changed
    to SWI48x instead of SWI248. The purpose of maxmin expander is to
    prepare max/min RTX for STV to eventually convert them to SSE PMAX/PMIN
    instructions, in order to *avoid* CMOV insns with general registers.

    2020-11-20  Uroš Bizjak  <ubizjak@gmail.com>

    gcc/
            PR target/97873
            * config/i386/i386.md (*neg<mode>2_2): Rename from
            "*neg<mode>2_cmpz".  Use CCGOCmode instead of CCZmode.
            (*negsi2_zext): Rename from *negsi2_cmpz_zext.
            Use CCGOCmode instead of CCZmode.
            (*neg<mode>_ccc_1): New insn pattern.
            (*neg<dwi>2_doubleword): Use *neg<mode>_ccc_1.

            (abs<mode>2): Add FLAGS_REG clobber.
            Use TARGET_CMOVE insn predicate.
            (*abs<mode>2_1): New insn_and_split pattern.
            (*absdi2_doubleword): Ditto.

            (<maxmin:code><mode>3): Use SWI48x mode iterator.
            (*<maxmin:code><mode>3): Use SWI48 mode iterator.

            * config/i386/i386-features.c
            (general_scalar_chain::compute_convert_gain): Handle ABS code.
            (general_scalar_chain::convert_insn): Ditto.
            (general_scalar_to_vector_candidate_p): Ditto.

    gcc/testsuite/
            PR target/97873
            * gcc.target/i386/pr97873.c: New test.
            * gcc.target/i386/pr97873-1.c: New test.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/97873] Failure to optimize abs optimally (at least one completely useless instruction on x86)
  2020-11-17 10:05 [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86) gabravier at gmail dot com
                   ` (8 preceding siblings ...)
  2020-11-20 10:26 ` ubizjak at gmail dot com
@ 2020-11-20 10:26 ` ubizjak at gmail dot com
  2020-11-20 14:22 ` ubizjak at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2020-11-20 10:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97873

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0
           Assignee|unassigned at gcc dot gnu.org      |ubizjak at gmail dot com
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
Fixed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/97873] Failure to optimize abs optimally (at least one completely useless instruction on x86)
  2020-11-17 10:05 [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86) gabravier at gmail dot com
                   ` (9 preceding siblings ...)
  2020-11-20 10:26 ` ubizjak at gmail dot com
@ 2020-11-20 14:22 ` ubizjak at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2020-11-20 14:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97873

--- Comment #11 from Uroš Bizjak <ubizjak at gmail dot com> ---
For the record, the removal of compare triggers:

- for linux x86_64 defconfig: 93 times
- for x86_64 GCC bootstrap: 360 times

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-11-20 14:22 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-17 10:05 [Bug tree-optimization/97873] New: Failure to optimize abs optimally (at least one useless instruction on x86) gabravier at gmail dot com
2020-11-17 15:30 ` [Bug target/97873] Failure to optimize abs optimally (at least one completely " rguenth at gcc dot gnu.org
2020-11-17 15:32 ` rguenth at gcc dot gnu.org
2020-11-18  5:16 ` crazylht at gmail dot com
2020-11-18 13:45 ` jakub at gcc dot gnu.org
2020-11-18 18:23 ` ubizjak at gmail dot com
2020-11-18 18:54 ` ubizjak at gmail dot com
2020-11-18 19:38 ` ubizjak at gmail dot com
2020-11-19 16:56 ` ubizjak at gmail dot com
2020-11-20 10:26 ` ubizjak at gmail dot com
2020-11-20 10:26 ` ubizjak at gmail dot com
2020-11-20 14:22 ` ubizjak at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).