public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/98713] New: Failure to generate branch version of abs if user requested it
@ 2021-01-17 20:08 david.bolvansky at gmail dot com
  2021-01-18  8:44 ` [Bug middle-end/98713] " marxin at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: david.bolvansky at gmail dot com @ 2021-01-17 20:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

            Bug ID: 98713
           Summary: Failure to generate branch version of abs if user
                    requested it
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

int branch_abs(int v) {
    return __builtin_expect(v > 0, 1) ? v : -v;
}

GCC -O2 now:

branch_abs:
  mov eax, edi
  neg eax
  cmovs eax, edi
  ret


Expected:
branch_abs:
  mov eax, edi
  test edi, edi
  js .LBB0_1
  ret
.LBB0_1:
  neg eax
  ret


Same for min/max.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/98713] Failure to generate branch version of abs if user requested it
  2021-01-17 20:08 [Bug c/98713] New: Failure to generate branch version of abs if user requested it david.bolvansky at gmail dot com
@ 2021-01-18  8:44 ` marxin at gcc dot gnu.org
  2021-01-18  9:19 ` roger at nextmovesoftware dot com
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-18  8:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-01-18
                 CC|                            |marxin at gcc dot gnu.org,
                   |                            |sayle at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
I think it's fixed since r11-2588-gc072fd236dc08f99.

@Roger, Uros: Can you please verify that?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/98713] Failure to generate branch version of abs if user requested it
  2021-01-17 20:08 [Bug c/98713] New: Failure to generate branch version of abs if user requested it david.bolvansky at gmail dot com
  2021-01-18  8:44 ` [Bug middle-end/98713] " marxin at gcc dot gnu.org
@ 2021-01-18  9:19 ` roger at nextmovesoftware dot com
  2021-01-18  9:35 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: roger at nextmovesoftware dot com @ 2021-01-18  9:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #2 from Roger Sayle <roger at nextmovesoftware dot com> ---
My first impression is that this isn't a bug, it's a feature.  In an optimizing
compiler, the user specifies the computation to be performed and the compiler
selects the implementation.  Hence "x+0" isn't a user request to perform an
addition.

Perhaps David could provide more information on why a branch implementation is
required/preferred (for example on which target)?  On generic x86_64, I believe
the code currently generated is both smaller and faster.  Assuming "neg eax"
takes about the same time as "test edi,edi", and that "cmovs" takes about the
same time as (either branch) of "js".

As a workaround a branch version can be implemented in inline assembly using
__asm, but I'm still hazy as to why this would be desirable.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/98713] Failure to generate branch version of abs if user requested it
  2021-01-17 20:08 [Bug c/98713] New: Failure to generate branch version of abs if user requested it david.bolvansky at gmail dot com
  2021-01-18  8:44 ` [Bug middle-end/98713] " marxin at gcc dot gnu.org
  2021-01-18  9:19 ` roger at nextmovesoftware dot com
@ 2021-01-18  9:35 ` rguenth at gcc dot gnu.org
  2021-01-18  9:45 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-18  9:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
A well-predicted branch will be faster than the cmov because of the shorter
data dependence path.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/98713] Failure to generate branch version of abs if user requested it
  2021-01-17 20:08 [Bug c/98713] New: Failure to generate branch version of abs if user requested it david.bolvansky at gmail dot com
                   ` (2 preceding siblings ...)
  2021-01-18  9:35 ` rguenth at gcc dot gnu.org
@ 2021-01-18  9:45 ` ubizjak at gmail dot com
  2021-01-18 10:31 ` david.bolvansky at gmail dot com
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2021-01-18  9:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
Please see PR 56309 (and PR 85559 meta bug).

Quote from Honza:

The decision on whether to use cmov or jmp was always tricky on x86
architectures. Cmov increase dependency chains, register pressure (both values
needs to be loaded in) and has long opcode. So jump sequence, if well
predicted, flows better through the out-of-order core. If badly predicted it
is, of course, a disaster. I think more modern CPUs solved the problems with
long latency of cmov, but the dependency chains are still there.

We don't know how to drive the decision, as this is a deep architectural issue.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/98713] Failure to generate branch version of abs if user requested it
  2021-01-17 20:08 [Bug c/98713] New: Failure to generate branch version of abs if user requested it david.bolvansky at gmail dot com
                   ` (3 preceding siblings ...)
  2021-01-18  9:45 ` ubizjak at gmail dot com
@ 2021-01-18 10:31 ` david.bolvansky at gmail dot com
  2021-01-18 10:34 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: david.bolvansky at gmail dot com @ 2021-01-18 10:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #5 from Dávid Bolvanský <david.bolvansky at gmail dot com> ---
User knows the data better, so he/she may prefer abs with branch.

Also PGO may say that branch for abs is better based on profile data.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/98713] Failure to generate branch version of abs if user requested it
  2021-01-17 20:08 [Bug c/98713] New: Failure to generate branch version of abs if user requested it david.bolvansky at gmail dot com
                   ` (4 preceding siblings ...)
  2021-01-18 10:31 ` david.bolvansky at gmail dot com
@ 2021-01-18 10:34 ` jakub at gcc dot gnu.org
  2021-01-18 11:46 ` roger at nextmovesoftware dot com
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-18 10:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Guess we should punt on the various phiopt detections if the branch
probabilities are substantially different.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/98713] Failure to generate branch version of abs if user requested it
  2021-01-17 20:08 [Bug c/98713] New: Failure to generate branch version of abs if user requested it david.bolvansky at gmail dot com
                   ` (5 preceding siblings ...)
  2021-01-18 10:34 ` jakub at gcc dot gnu.org
@ 2021-01-18 11:46 ` roger at nextmovesoftware dot com
  2021-01-18 14:03 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: roger at nextmovesoftware dot com @ 2021-01-18 11:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #7 from Roger Sayle <roger at nextmovesoftware dot com> ---
I agree in the general case, a conditional jump (that depends only on the
condition flags) potentially has a shorter dependence chain than a cmov (which
depends on the condition flags and two registers).  But in this case, the
condition codes can't be determined any faster than the register operands.

I believe the branch prediction argument is a red herring.  Yes, with clever
hardware and luck, the CPU can predict which instruction will be executed next
after a conditional jump, but it can always know which instruction is executed
next after a cmov.  A cmov (and multiple cmovs) can be scheduled and executed
out-of-order without speculation.  Hence branch prediction is only a factor
when dependency chain lengths are an issue/unequal (or the cmov is slower than
a correctly predicted branch).

An understanding the data distribution is also irrelevant if the best/fastest
(correctly predicted) branch implementation is no better/faster than the cmov.

But I can also imagine microarchitectures where predicted conditional jumps are
free (requiring zero cycles) and where the condition code "test" is eliminated
having been set/forwarded from an earier instruction, in which case a
zero-latency abs is about as good as you can get.  Are we assuming a target
with "predicted_branch_cost < conditional_move_cost"? I wouldn't be surprised
if GCC internally assumes these are both always COSTS_N_INSNS(1).

If conditional_move_cost <= predicted_branch_cost <= mispredicted_branch_cost
then the cmov should always preferred (independent of branch probabilities or
__builtin_expect hints).  If predicted_branch_cost <= mispredicted_branch_cost
<= conditional_move_cost, the branch should always be preferred, and the cmov
shouldn't be part of the ISA.  The interesting domain of trade-offs is
where/when predicted_branch_cost < conditional_move_cost <=
mispredicted_branch_cost (which I'm not yet convinced is the case here).

Do we have any numbers that show the branch is better (for this case) on real
hardware, than can't be explained by other factors?  For example, on ABS where
the inputs are always positive.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/98713] Failure to generate branch version of abs if user requested it
  2021-01-17 20:08 [Bug c/98713] New: Failure to generate branch version of abs if user requested it david.bolvansky at gmail dot com
                   ` (6 preceding siblings ...)
  2021-01-18 11:46 ` roger at nextmovesoftware dot com
@ 2021-01-18 14:03 ` jakub at gcc dot gnu.org
  2021-09-26  9:41 ` pinskia at gcc dot gnu.org
  2021-09-26  9:42 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-18 14:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This is specific to x86, where if the inputs are inpredictable and results
aren't consumed too early that the cmov latency kills performance cmov
sometimes improves performance a lot, on the other side, if the inputs are
predictable, branches are often much faster than cmov.
I'm not aware of other architectures where the conditional moves are such a
mixed bag, e.g. on arm/aarch64 I think using cmov is generally always better.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/98713] Failure to generate branch version of abs if user requested it
  2021-01-17 20:08 [Bug c/98713] New: Failure to generate branch version of abs if user requested it david.bolvansky at gmail dot com
                   ` (7 preceding siblings ...)
  2021-01-18 14:03 ` jakub at gcc dot gnu.org
@ 2021-09-26  9:41 ` pinskia at gcc dot gnu.org
  2021-09-26  9:42 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-26  9:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64
           Keywords|                            |missed-optimization

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is basically a bug in PHI-OPT which assumes ABS_EXPR will always generate
better code than the conditional case.

Hmm, Why is x86_64 using a cmov here for ABS_EXPR instead of:
Shift
xor
sub
?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug middle-end/98713] Failure to generate branch version of abs if user requested it
  2021-01-17 20:08 [Bug c/98713] New: Failure to generate branch version of abs if user requested it david.bolvansky at gmail dot com
                   ` (8 preceding siblings ...)
  2021-09-26  9:41 ` pinskia at gcc dot gnu.org
@ 2021-09-26  9:42 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-26  9:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #1)
> I think it's fixed since r11-2588-gc072fd236dc08f99.

Oh this changed from the shift/xor/sub to using cmov ...

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-09-26  9:42 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-17 20:08 [Bug c/98713] New: Failure to generate branch version of abs if user requested it david.bolvansky at gmail dot com
2021-01-18  8:44 ` [Bug middle-end/98713] " marxin at gcc dot gnu.org
2021-01-18  9:19 ` roger at nextmovesoftware dot com
2021-01-18  9:35 ` rguenth at gcc dot gnu.org
2021-01-18  9:45 ` ubizjak at gmail dot com
2021-01-18 10:31 ` david.bolvansky at gmail dot com
2021-01-18 10:34 ` jakub at gcc dot gnu.org
2021-01-18 11:46 ` roger at nextmovesoftware dot com
2021-01-18 14:03 ` jakub at gcc dot gnu.org
2021-09-26  9:41 ` pinskia at gcc dot gnu.org
2021-09-26  9:42 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).