public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/114589] New: missed optimization: losing bool range information
@ 2024-04-04 17:28 barry.revzin at gmail dot com
  2024-04-04 17:47 ` [Bug tree-optimization/114589] " pinskia at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: barry.revzin at gmail dot com @ 2024-04-04 17:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114589

            Bug ID: 114589
           Summary: missed optimization: losing bool range information
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: barry.revzin at gmail dot com
  Target Milestone: ---

Consider the following example:

template <class T>
struct simple_optional {
    bool has_val;
    T val;

    auto begin() const -> T const* { return &val; }
    #ifdef SIMPLE
    auto end() const -> T const* { return &val + (has_val ? 1 : 0); }
    #else
    auto end() const -> T const* { return has_val ? &val + 1 : &val; }
    #endif
};

void f(int);

void call_f(simple_optional<int> const& o) {   
    for (int i : o) {
        f(i);
    }
}

This function should call f at most one time. With the SIMPLE implementation
that adds (has_val ? 1 : 0), or simply has_val, or static_cast<int>(has_val),
or any version thereof - gcc trunk -O3 still emits a loop:

call_f(simple_optional<int> const&):
        push    rbp
        push    rbx
        lea     rbx, [rdi+4]
        sub     rsp, 8
        movzx   edx, BYTE PTR [rdi]
        lea     rbp, [rbx+rdx*4]
        test    dl, dl
        je      .L1
.L3:
        mov     edi, DWORD PTR [rbx]
        add     rbx, 4
        call    f(int)
        cmp     rbp, rbx
        jne     .L3
.L1:
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret

With the other approach, where end() explicitly returns either &val + 1 or
&val, gcc does not emit a loop:

call_f(simple_optional<int> const&):
        cmp     BYTE PTR [rdi], 0
        jne     .L4
        ret
.L4:
        mov     edi, DWORD PTR [rdi+4]
        jmp     f(int)

On compiler explorer: https://godbolt.org/z/jaMxqsj5q

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/114589] missed optimization: losing bool range information
  2024-04-04 17:28 [Bug c++/114589] New: missed optimization: losing bool range information barry.revzin at gmail dot com
@ 2024-04-04 17:47 ` pinskia at gcc dot gnu.org
  2024-04-04 17:58 ` pinskia at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-04 17:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114589

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
          Component|middle-end                  |tree-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-04-04
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.

Why didn't sink1 push _10, _13, _12, and _11 past the conditional here ...
If it did that I think it might have optimized correctly.

  <bb 2> [local count: 118111600]:
  _11 = &o_3(D)->val;
  _5 = o_3(D)->has_val;
  _12 = (sizetype) _5;
  _13 = _12 << 2;
  _10 = _11 + _13;
  if (_5 != 0)
    goto <bb 5>; [89.00%]
  else
    goto <bb 9>; [11.00%]

  <bb 5> [local count: 105119324]:

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/114589] missed optimization: losing bool range information
  2024-04-04 17:28 [Bug c++/114589] New: missed optimization: losing bool range information barry.revzin at gmail dot com
  2024-04-04 17:47 ` [Bug tree-optimization/114589] " pinskia at gcc dot gnu.org
@ 2024-04-04 17:58 ` pinskia at gcc dot gnu.org
  2024-04-05  7:52 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-04 17:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114589

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> Confirmed.
> 
> Why didn't sink1 push _10, _13, _12, and _11 past the conditional here ...
> If it did that I think it might have optimized correctly.

Because then _12 would be 1, _13 would be 4 and _10 would be come `_11 + 4` and
the loop induction variable would be going from _11 to `_11 + 4` then:
```
  # __for_begin_15 = PHI <__for_begin_8(6), _11(5)>
  # .MEM_16 = PHI <.MEM_7(6), .MEM_4(D)(5)>
  # VUSE <.MEM_16>
  i_6 = *__for_begin_15;
  # .MEM_7 = VDEF <.MEM_16>
  # USE = nonlocal escaped 
  # CLB = nonlocal escaped 
  _Z1fiD.2782 (i_6);
  # PT = nonlocal 
  __for_begin_8 = __for_begin_15 + 4;
  if (__for_begin_8 != _10)
    goto <bb 6>; [89.00%]
  else
    goto <bb 10>; [11.00%]
```

And we could figure out this loop is only gone through once only.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/114589] missed optimization: losing bool range information
  2024-04-04 17:28 [Bug c++/114589] New: missed optimization: losing bool range information barry.revzin at gmail dot com
  2024-04-04 17:47 ` [Bug tree-optimization/114589] " pinskia at gcc dot gnu.org
  2024-04-04 17:58 ` pinskia at gcc dot gnu.org
@ 2024-04-05  7:52 ` rguenth at gcc dot gnu.org
  2024-04-05  9:39 ` [Bug tree-optimization/114589] missed optimization: losing bool range information, missed sinking rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-04-05  7:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114589

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is the "old" issue with the bogus

  /* If BEST_BB is at the same nesting level, then require it to have
     significantly lower execution frequency to avoid gratuitous movement.  */
  if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
      /* If result of comparsion is unknown, prefer EARLY_BB.
         Thus use !(...>=..) rather than (...<...)  */
      && !(best_bb->count * 100 >= early_bb->count * threshold))
    return best_bb;

check.  We have a BB2 count of 118111600 and BB5 we'd sink to has 105119324
which is higher than 75% of BB2.

This was supposed to prevent sinking across if (x) unlikely (); blocks
but it has odd side-effects like this one.

IIRC there's duplicates for this.

I'll note that we probably want to avoid sinking to a post-dominator
unless it sinks out of a loop (that's cross-BB scheduling).  But we no
longer compute post-dominators.  Without re-introducing those and
limiting sinking this way removing the odd code will cause regressions.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/114589] missed optimization: losing bool range information, missed sinking
  2024-04-04 17:28 [Bug c++/114589] New: missed optimization: losing bool range information barry.revzin at gmail dot com
                   ` (2 preceding siblings ...)
  2024-04-05  7:52 ` rguenth at gcc dot gnu.org
@ 2024-04-05  9:39 ` rguenth at gcc dot gnu.org
  2024-05-15  9:37 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-04-05  9:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114589

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
            Summary|missed optimization: losing |missed optimization: losing
                   |bool range information      |bool range information,
                   |                            |missed sinking
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Let me work on this in stage1.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/114589] missed optimization: losing bool range information, missed sinking
  2024-04-04 17:28 [Bug c++/114589] New: missed optimization: losing bool range information barry.revzin at gmail dot com
                   ` (3 preceding siblings ...)
  2024-04-05  9:39 ` [Bug tree-optimization/114589] missed optimization: losing bool range information, missed sinking rguenth at gcc dot gnu.org
@ 2024-05-15  9:37 ` rguenth at gcc dot gnu.org
  2024-05-15 11:32 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-05-15  9:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114589

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |85316

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
So when doing the sinking it's too late since there's no VRP between early
sinking and late unrolling and also threadfull doesn't figure out

  <bb 3> [local count: 105119324]:
  _11 = &o_3(D)->val;
  _12 = 1;
  _13 = 4;
  _10 = _11 + 4;
  ivtmp.11_14 = (unsigned long) _11;

  <bb 4> [local count: 955630224]:
  # ivtmp.11_17 = PHI <ivtmp.11_2(6), ivtmp.11_14(3)>
  _18 = (void *) ivtmp.11_17;
  i_6 = MEM[(const int *)_18];
  f (i_6);
  ivtmp.11_2 = ivtmp.11_17 + 4;
  __for_begin_19 = (const int *) ivtmp.11_2;
  if (__for_begin_19 != _10)
    goto <bb 6>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 6> [local count: 850510900]:
  goto <bb 4>; [100.00%]

but IIRC it doesn't consider the loop entry "fallthru" as "threading" source.

Nevertheless I have a fix for the sinking issue.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85316
[Bug 85316] [meta-bug] VRP range propagation missed cases

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/114589] missed optimization: losing bool range information, missed sinking
  2024-04-04 17:28 [Bug c++/114589] New: missed optimization: losing bool range information barry.revzin at gmail dot com
                   ` (4 preceding siblings ...)
  2024-05-15  9:37 ` rguenth at gcc dot gnu.org
@ 2024-05-15 11:32 ` rguenth at gcc dot gnu.org
  2024-05-15 16:13 ` cvs-commit at gcc dot gnu.org
  2024-05-15 16:14 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-05-15 11:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114589

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So actually it also needs -fno-ivopts since otherwise VRP is confused.  With
-fno-ivopts it's late DOM that figures out the loop doesn't iterate.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/114589] missed optimization: losing bool range information, missed sinking
  2024-04-04 17:28 [Bug c++/114589] New: missed optimization: losing bool range information barry.revzin at gmail dot com
                   ` (5 preceding siblings ...)
  2024-05-15 11:32 ` rguenth at gcc dot gnu.org
@ 2024-05-15 16:13 ` cvs-commit at gcc dot gnu.org
  2024-05-15 16:14 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-15 16:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114589

--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:99b1daae18c095d6c94d32efb77442838e11cbfb

commit r15-518-g99b1daae18c095d6c94d32efb77442838e11cbfb
Author: Richard Biener <rguenther@suse.de>
Date:   Fri May 3 14:04:41 2024 +0200

    tree-optimization/114589 - remove profile based sink heuristics

    The following removes the profile based heuristic limiting sinking
    and instead uses post-dominators to avoid sinking to places that
    are executed under the same conditions as the earlier location which
    the profile based heuristic should have guaranteed as well.

    To avoid regressing this moves the empty-latch check to cover all
    sink cases.

    It also stream-lines the resulting select_best_block a bit but avoids
    adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
    starts execute failing with this on x86_64 with -m32 because the
    (float)i * 9.9999...e-7 compute is sunk across a STOP causing it
    to be no longer spilled and thus the compare failing due to excess
    precision.  The patch adds -ffloat-store to avoid this, following
    other similar testcases.

    This change fixes the testcase in the PR only when using -fno-ivopts
    as otherwise VRP is confused.

            PR tree-optimization/114589
            * tree-ssa-sink.cc (select_best_block): Remove profile-based
            heuristics.  Instead reject sink locations that sink
            to post-dominators.  Move empty latch check here from
            statement_sink_location.  Also consider early_bb for the
            loop depth check.
            (statement_sink_location): Remove superfluous check.  Remove
            empty latch check.
            (pass_sink_code::execute): Compute/release post-dominators.

            * gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
            excess precision when not spilling.
            * g++.dg/tree-ssa/pr114589.C: New testcase.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/114589] missed optimization: losing bool range information, missed sinking
  2024-04-04 17:28 [Bug c++/114589] New: missed optimization: losing bool range information barry.revzin at gmail dot com
                   ` (6 preceding siblings ...)
  2024-05-15 16:13 ` cvs-commit at gcc dot gnu.org
@ 2024-05-15 16:14 ` rguenth at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-05-15 16:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114589

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|rguenth at gcc dot gnu.org         |unassigned at gcc dot gnu.org
             Status|ASSIGNED                    |NEW

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
The missed sinking is now fixed for GCC 15, VRP is still confused by what
IVOPTs does so without -fno-ivopts the loop remains.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-05-15 16:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-04 17:28 [Bug c++/114589] New: missed optimization: losing bool range information barry.revzin at gmail dot com
2024-04-04 17:47 ` [Bug tree-optimization/114589] " pinskia at gcc dot gnu.org
2024-04-04 17:58 ` pinskia at gcc dot gnu.org
2024-04-05  7:52 ` rguenth at gcc dot gnu.org
2024-04-05  9:39 ` [Bug tree-optimization/114589] missed optimization: losing bool range information, missed sinking rguenth at gcc dot gnu.org
2024-05-15  9:37 ` rguenth at gcc dot gnu.org
2024-05-15 11:32 ` rguenth at gcc dot gnu.org
2024-05-15 16:13 ` cvs-commit at gcc dot gnu.org
2024-05-15 16:14 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).