public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/112657] New: missed optimization: cmove not used with multiple returns
@ 2023-11-21 21:42 goon.pri.low at gmail dot com
  2023-11-21 21:53 ` [Bug target/112657] " pinskia at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: goon.pri.low at gmail dot com @ 2023-11-21 21:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

            Bug ID: 112657
           Summary: missed optimization: cmove not used with multiple
                    returns
           Product: gcc
           Version: 13.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: goon.pri.low at gmail dot com
  Target Milestone: ---

This function

int unopt(int c) {
    if (c == 14)
        return -9;
    else
        return c;
}

unopt:
        mov     eax, edi
        cmp     edi, 14
        je      .L4
        ret
.L4:
        mov     eax, -9
        ret

Should probably be optimized to:

int opt(int c) {
    if (c == 14)
        c = -9;

    return c;
}

opt:
        cmp     edi, 14
        mov     eax, -9
        cmovne  eax, edi
        ret

This seems to only really happen when negative numbers are used,

int positive(int c) {
    if (c == 14)
        return 9;
    else
        return c;
}

positive:
        mov     eax, edi
        cmp     edi, 14
        mov     edx, 9
        cmove   eax, edx
        ret

Though use of positive values still isn't completely optimized (possibly same
as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97968)

Also it seems like if the order is reversed:

int reverse(int c) {
    if (c != 14)
        c = 9120;

    return c;
}

reverse:
        cmp     edi, 14
        mov     edx, 14
        mov     eax, 9120
        cmove   eax, edx
        ret

We could use %edi in the cmove and eliminate the 2nd instruction.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/112657] missed optimization: cmove not used with multiple returns
  2023-11-21 21:42 [Bug tree-optimization/112657] New: missed optimization: cmove not used with multiple returns goon.pri.low at gmail dot com
@ 2023-11-21 21:53 ` pinskia at gcc dot gnu.org
  2023-11-21 21:53 ` pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-21 21:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
It is converted to a csel on aarch64 so it is either ifcvt.cc rejecting the
issue or a tuning issue with the x86_64 backend.

x86_64:
```
IF-THEN-JOIN block found, pass 1, test 2, then 3, join 4


========== no more changes

1 possible IF blocks searched.
0 IF blocks converted.
0 true changes made.
```
vs aarch64:
```
IF-THEN-JOIN block found, pass 1, test 2, then 3, join 4
scanning new insn with uid = 30.
scanning new insn with uid = 31.
scanning new insn with uid = 32.
if-conversion succeeded through noce_try_cmove
...
```

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/112657] missed optimization: cmove not used with multiple returns
  2023-11-21 21:42 [Bug tree-optimization/112657] New: missed optimization: cmove not used with multiple returns goon.pri.low at gmail dot com
  2023-11-21 21:53 ` [Bug target/112657] " pinskia at gcc dot gnu.org
@ 2023-11-21 21:53 ` pinskia at gcc dot gnu.org
  2023-11-21 21:59 ` [Bug rtl-optimization/112657] [13/14 Regression] " pinskia at gcc dot gnu.org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-21 21:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-11-21

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
  2023-11-21 21:42 [Bug tree-optimization/112657] New: missed optimization: cmove not used with multiple returns goon.pri.low at gmail dot com
  2023-11-21 21:53 ` [Bug target/112657] " pinskia at gcc dot gnu.org
  2023-11-21 21:53 ` pinskia at gcc dot gnu.org
@ 2023-11-21 21:59 ` pinskia at gcc dot gnu.org
  2023-11-22  9:16 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-21 21:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |rtl-optimization
      Known to fail|                            |13.1.0
      Known to work|                            |12.3.0
            Summary|missed optimization: cmove  |[13/14 Regression] missed
                   |not used with multiple      |optimization: cmove not
                   |returns                     |used with multiple returns
   Target Milestone|---                         |13.3

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The difference between GCC 12 and GCC 13 is:

GCC 13:
```
IF-THEN-JOIN block found, pass 1, test 2, then 3, join 4
```
GCC 12 and before:
```
IF-THEN-ELSE-JOIN block found, pass 1, test 2, then 3, else 4, join 5
```

Someone will have to debug ifcvt.cc to see why it fails on x86_64 but works on
aarch64.  Note there are some new changes to ifcvt.cc in review which might
improve this, though I am not sure.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
  2023-11-21 21:42 [Bug tree-optimization/112657] New: missed optimization: cmove not used with multiple returns goon.pri.low at gmail dot com
                   ` (2 preceding siblings ...)
  2023-11-21 21:59 ` [Bug rtl-optimization/112657] [13/14 Regression] " pinskia at gcc dot gnu.org
@ 2023-11-22  9:16 ` ubizjak at gmail dot com
  2023-11-22  9:28 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2023-11-22  9:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Andrew Pinski from comment #2)

> Someone will have to debug ifcvt.cc to see why it fails on x86_64 but works
> on aarch64.  Note there are some new changes to ifcvt.cc in review which
> might improve this, though I am not sure.

x86_64 targetm.noce_conversion_profitable_p returns false for:

(insn 20 0 19 (set (reg:SI 101)
        (const_int -9 [0xfffffffffffffff7])) 85 {*movsi_internal}
     (nil))

(insn 19 20 21 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/v:SI 99 [ c ])
            (const_int 14 [0xe]))) 11 {*cmpsi_1}
     (nil))

(insn 21 19 0 (set (reg/v:SI 99 [ c ])
        (if_then_else:SI (ne (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg/v:SI 99 [ c ])
            (reg:SI 101))) 1438 {*movsicc_noc}
     (nil))

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
  2023-11-21 21:42 [Bug tree-optimization/112657] New: missed optimization: cmove not used with multiple returns goon.pri.low at gmail dot com
                   ` (3 preceding siblings ...)
  2023-11-22  9:16 ` ubizjak at gmail dot com
@ 2023-11-22  9:28 ` ubizjak at gmail dot com
  2023-11-22  9:38 ` ubizjak at gmail dot com
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2023-11-22  9:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Andrew Pinski from comment #2)
> 
> > Someone will have to debug ifcvt.cc to see why it fails on x86_64 but works
> > on aarch64.  Note there are some new changes to ifcvt.cc in review which
> > might improve this, though I am not sure.
> 
> x86_64 targetm.noce_conversion_profitable_p returns false for:

Actually, the cost function goes to default_noce_conversion_profitable_p,
where:

(gdb) p cost
$1 = 16
(gdb) p if_info->original_cost 
$2 = 8
(gdb) p if_info->max_seq_cost 
$3 = 0

For some reason, max_seq_cost remains zero, while on aarch64:

(gdb) p cost
$2 = 12
(gdb) p if_info->original_cost
$3 = 8
(gdb) p if_info->max_seq_cost
$4 = 12

So, x86_64 returns false from the default cost function:

  /* When compiling for size, we can make a reasonably accurately guess
     at the size growth.  When compiling for speed, use the maximum.  */
  return speed_p && cost <= if_info->max_seq_cost;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
  2023-11-21 21:42 [Bug tree-optimization/112657] New: missed optimization: cmove not used with multiple returns goon.pri.low at gmail dot com
                   ` (4 preceding siblings ...)
  2023-11-22  9:28 ` ubizjak at gmail dot com
@ 2023-11-22  9:38 ` ubizjak at gmail dot com
  2023-11-22  9:46 ` ubizjak at gmail dot com
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2023-11-22  9:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
Digging a bit further:

if_info.max_seq_cost is calculated via targetm.max_noce_ifcvt_seq_cost, where
without params set we return:

  return BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (2);

with:

#define BRANCH_COST(speed_p, predictable_p) \
  (!(speed_p) ? 2 : (predictable_p) ? 0 : ix86_branch_cost)

So, the conversion is clearly not desirable for well predicted jumps.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
  2023-11-21 21:42 [Bug tree-optimization/112657] New: missed optimization: cmove not used with multiple returns goon.pri.low at gmail dot com
                   ` (5 preceding siblings ...)
  2023-11-22  9:38 ` ubizjak at gmail dot com
@ 2023-11-22  9:46 ` ubizjak at gmail dot com
  2023-11-22 10:48 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2023-11-22  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
This is by design, CMOV should not be used instead of well predicted jumps.

FYI, CMOV is quite problematic on x86, there are several PRs where conversion
to CMOV resulted in 2x slower execution. Please see e.g.:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309#c26

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
  2023-11-21 21:42 [Bug tree-optimization/112657] New: missed optimization: cmove not used with multiple returns goon.pri.low at gmail dot com
                   ` (6 preceding siblings ...)
  2023-11-22  9:46 ` ubizjak at gmail dot com
@ 2023-11-22 10:48 ` rguenth at gcc dot gnu.org
  2023-11-22 13:35 ` hubicka at gcc dot gnu.org
  2024-03-07 21:01 ` law at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-22 10:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think a return of a negative value is predicted to be cold (aka "error"):

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  if (c == 14)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]
;;    succ:       3
;;                4

;;   basic block 3, loop depth 0
;;    pred:       2
  D.2771 = -9;
  // predicted unlikely by early return (on trees) predictor.
  goto <bb 5>; [INV]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
  2023-11-21 21:42 [Bug tree-optimization/112657] New: missed optimization: cmove not used with multiple returns goon.pri.low at gmail dot com
                   ` (7 preceding siblings ...)
  2023-11-22 10:48 ` rguenth at gcc dot gnu.org
@ 2023-11-22 13:35 ` hubicka at gcc dot gnu.org
  2024-03-07 21:01 ` law at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-11-22 13:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The negative return value branch predictor is set to have 98% hitrate (measured
on SPEC2k17 some time ago).  There is --param predictable-branch-outcome that
is also set to 2% so indeed we consider the branch as well predictable by this
heuristics.

Reducing --param should make cmov to happen.

With profile_probability data type we could try something smarter on guessing
if given branch is predictable (such as ignoring guessed values and let
predictor to optionally mark branches as (un)predictable). But it is not quite
clear to me what desired behavior would be...

Guessing predictability of data branches is generally quite hard problem.
Predictablity of loop branches is easier, but we hardly apply BRANCH_COST on
branch closing loop since those are not if-conversion candidates.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
  2023-11-21 21:42 [Bug tree-optimization/112657] New: missed optimization: cmove not used with multiple returns goon.pri.low at gmail dot com
                   ` (8 preceding siblings ...)
  2023-11-22 13:35 ` hubicka at gcc dot gnu.org
@ 2024-03-07 21:01 ` law at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: law at gcc dot gnu.org @ 2024-03-07 21:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at gcc dot gnu.org

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-03-07 21:01 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-21 21:42 [Bug tree-optimization/112657] New: missed optimization: cmove not used with multiple returns goon.pri.low at gmail dot com
2023-11-21 21:53 ` [Bug target/112657] " pinskia at gcc dot gnu.org
2023-11-21 21:53 ` pinskia at gcc dot gnu.org
2023-11-21 21:59 ` [Bug rtl-optimization/112657] [13/14 Regression] " pinskia at gcc dot gnu.org
2023-11-22  9:16 ` ubizjak at gmail dot com
2023-11-22  9:28 ` ubizjak at gmail dot com
2023-11-22  9:38 ` ubizjak at gmail dot com
2023-11-22  9:46 ` ubizjak at gmail dot com
2023-11-22 10:48 ` rguenth at gcc dot gnu.org
2023-11-22 13:35 ` hubicka at gcc dot gnu.org
2024-03-07 21:01 ` law at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).