public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/104515] New: trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse.
@ 2022-02-12 20:16 gcc at rabensky dot com
  2022-02-14  9:32 ` [Bug tree-optimization/104515] [11/12 Regression] " rguenth at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: gcc at rabensky dot com @ 2022-02-12 20:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104515

            Bug ID: 104515
           Summary: trivially-destructible destructors interfere with loop
                    optimization - maybe related to lifetime-dse.
           Product: gcc
           Version: og11 (devel/omp/gcc-11)
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gcc at rabensky dot com
  Target Milestone: ---

This issue started in GCC-9.1, but a change in GCC-11 made it worse.

It didn't exist in GCC-7.1-GCC-8.5

Short description:
-----------------

When we have a loop that can be optimized out, calling the destructor for a
trivially-destructible type will prevent the optimization starting from GCC-9.1

These are loops that correctly optimized out in GCC-7.1 to GCC-8.5

This bug doesn't happen if we set -fno-lifetime-dse

Interestingly enough - a non-trivially-destructible destructor doesn't
necessarily prevent the optimization.

How this became worse in GCC-11:
-------------------------------

In GCC-11 this also applies to calling the destructor of basic types (int, long
etc.)

So loops that optimized in GCC-7.1 to GCC-10.3 no longer optimize.

Short reproducing example:
-------------------------

NOTE: No `include`s are needed

```
using T = int;
struct Vec {
  T* end;
};
void pop_back_many(Vec& v, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    --v.end;
    v.end->~T();
  }
}
```
compiled with `-O3 -Wall`

In GCC-7 to GCC-10, `pop_back_many` optimizes out the loop (becomes
`v.end-=n`).
In GCC-11, the loop remains.

See https://godbolt.org/z/vTexxhxP9

NOTE that adding `-fno-lifetime-dse` will re-enable the loop optimization.

Why this matters
----------------
This prevents optimization of a loop over `std::vector<int>::pop_back()`, which
is a very common usecase!

Loops that optimize out in GCC-7.1 to GCC-10.3 will suddenly not optimize in
GCC-11.1/2, making existing code run MUCH slower! (O(n) instead of O(1))

NOTE: std::vector<int>::resize is a lot slower than loop over pop_back. A loop
over pop_back is currently the most efficient way to do pop_back_many!

More complete reproducing example:
---------------------------------

- We can replace the type `T` with a class that is trivially destructible.
**In that case, the problem exists in previous versions of GCC as well**

- We can replace the type `T` with a class that had user-supplied destructor.
**In that case, the loop correctly optimizes out if possible**

Actual examples:
https://godbolt.org/z/7WqTPq3cE

compiled with `-O3 -Wall`
```
template <typename T>
struct Vec {
  T* end;
};

template <typename T>
void pop_back_many(Vec<T>& v, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    --v.end;
    v.end->~T();
  }
}

struct TrivialDestruct {
    ~TrivialDestruct()=default;
};

struct NoopDestruct {
    ~NoopDestruct(){}
};

unsigned count=0;
struct CountDestruct {
    ~CountDestruct(){++count;}
};

// Here loop optimization fails in GCC-11.1-11.2
// But succeeds in GCC 7.1-10.3
//
// NOTE that adding -fno-lifetime-dse re-enabled the optimization
template void pop_back_many(Vec<int>&, unsigned);
// Here loop optimization fails in GCC-9.1-11.2
// But succeeds in GCC 7.1-8.5
//
// NOTE that adding -fno-lifetime-dse re-enabled the optimization
template void pop_back_many(Vec<TrivialDestruct>&, unsigned);
// Here loop optimization succeeds in all versions
//
// NOTE that it's surprising that a no-op destructor can be optimized
// but a trivial destructor can't
template void pop_back_many(Vec<NoopDestruct>&, unsigned);
// Here loop optimization succeeds in all version
//
// NOTE that it's surprising that a destructor with an action
// can be optimized, but a trivial destructor can't
template void pop_back_many(Vec<CountDestruct>&, unsigned);
```

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/104515] [11/12 Regression] trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse.
  2022-02-12 20:16 [Bug rtl-optimization/104515] New: trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse gcc at rabensky dot com
@ 2022-02-14  9:32 ` rguenth at gcc dot gnu.org
  2022-02-15 20:28 ` gcc at rabensky dot com
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-02-14  9:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104515

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-02-14
     Ever confirmed|0                           |1
           Priority|P3                          |P2
          Component|rtl-optimization            |tree-optimization
            Summary|trivially-destructible      |[11/12 Regression]
                   |destructors interfere with  |trivially-destructible
                   |loop optimization - maybe   |destructors interfere with
                   |related to lifetime-dse.    |loop optimization - maybe
                   |                            |related to lifetime-dse.
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED
   Target Milestone|---                         |11.3

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  The issue is that store motion of v_7(D)->end cannot be performed
on

  <bb 3> [local count: 955630225]:
  # i_13 = PHI <i_10(6), 0(5)>
  _1 = v_7(D)->end;
  _2 = _1 + 18446744073709551612;
  v_7(D)->end = _2;
  MEM[(T *)_1 + -4B] ={v} {CLOBBER};   // <- inserted by -flifetime-dse
  i_10 = i_13 + 1;
  if (n_6(D) > i_10)
    goto <bb 6>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 6> [local count: 850510901]:
  goto <bb 3>; [100.00%]

With GCC 10 we didn't have this particular CLOBBER and at least in GCC 10 the
first testcase was properly optimized.

Note there's a dependence on the value of v_7(D)->end for the clobber so
the clobber itself cannot be moved out of the loop.  So the only option
here would be to drop it if it enables store-motion.

We remove all *ssaname_N ={v} {CLOBBER} stmts during the fold-all-builtins
pass but that's very late, in particular _after_ the last CDDCE.  I suspect
that instead of

      NEXT_PASS (pass_dse);
      NEXT_PASS (pass_cd_dce, true /* update_address_taken_p */);
      /* After late CD DCE we rewrite no longer addressed locals into SSA
         form if possible.  */
      NEXT_PASS (pass_forwprop);
      NEXT_PASS (pass_phiopt, false /* early_p */);
      NEXT_PASS (pass_fold_builtins);

we might consider doing FAB (or the CLOBBER removal part) after the last
DSE (or simply direct that last DSE pass do that, with some pass specific
flag).  Note that's still too late since we need store-motion here.

Doing the *ssaname_N ={v} {CLOBBER} clobber removal after the _first_
DSE after IPA inlining would be another option.  Likewise of course
removing it as part of SM as said above.

I can see how easy it is to do from store-motion.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/104515] [11/12 Regression] trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse.
  2022-02-12 20:16 [Bug rtl-optimization/104515] New: trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse gcc at rabensky dot com
  2022-02-14  9:32 ` [Bug tree-optimization/104515] [11/12 Regression] " rguenth at gcc dot gnu.org
@ 2022-02-15 20:28 ` gcc at rabensky dot com
  2022-02-15 21:02 ` redi at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: gcc at rabensky dot com @ 2022-02-15 20:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104515

--- Comment #2 from GBE <gcc at rabensky dot com> ---
I don't know if it helps - but I used git-bisect to find the original commit
that added this issue: cdc184174ce

https://github.com/gcc-mirror/gcc/commit/cdc184174ce

It indeed has to do with trivial destructors and clobber and lifetime-dse, so
that confirms it.

It's a fairly small commit - but it's beyond my knowledge of the codebase to
understand it.

I hope it's helpful!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/104515] [11/12 Regression] trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse.
  2022-02-12 20:16 [Bug rtl-optimization/104515] New: trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse gcc at rabensky dot com
  2022-02-14  9:32 ` [Bug tree-optimization/104515] [11/12 Regression] " rguenth at gcc dot gnu.org
  2022-02-15 20:28 ` gcc at rabensky dot com
@ 2022-02-15 21:02 ` redi at gcc dot gnu.org
  2022-02-23 18:22 ` gcc at rabensky dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: redi at gcc dot gnu.org @ 2022-02-15 21:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104515

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
aka r9-84-gcdc184174ce56d

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/104515] [11/12 Regression] trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse.
  2022-02-12 20:16 [Bug rtl-optimization/104515] New: trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse gcc at rabensky dot com
                   ` (2 preceding siblings ...)
  2022-02-15 21:02 ` redi at gcc dot gnu.org
@ 2022-02-23 18:22 ` gcc at rabensky dot com
  2022-04-21  7:51 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: gcc at rabensky dot com @ 2022-02-23 18:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104515

--- Comment #4 from GBE <gcc at rabensky dot com> ---
The commit that make this issue affect "basic types" as well:

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=e443d821386

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/104515] [11/12 Regression] trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse.
  2022-02-12 20:16 [Bug rtl-optimization/104515] New: trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse gcc at rabensky dot com
                   ` (3 preceding siblings ...)
  2022-02-23 18:22 ` gcc at rabensky dot com
@ 2022-04-21  7:51 ` rguenth at gcc dot gnu.org
  2023-04-12 14:05 ` [Bug tree-optimization/104515] [11/12/13 " rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-21  7:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104515

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.3                        |11.4

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 11.3 is being released, retargeting bugs to GCC 11.4.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/104515] [11/12/13 Regression] trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse.
  2022-02-12 20:16 [Bug rtl-optimization/104515] New: trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse gcc at rabensky dot com
                   ` (4 preceding siblings ...)
  2022-04-21  7:51 ` rguenth at gcc dot gnu.org
@ 2023-04-12 14:05 ` rguenth at gcc dot gnu.org
  2023-04-17 13:02 ` rguenth at gcc dot gnu.org
  2023-05-29 10:06 ` [Bug tree-optimization/104515] [11/12/13/14 " jakub at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-04-12 14:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104515

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pshevchuk at pshevchuk dot com

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
*** Bug 109486 has been marked as a duplicate of this bug. ***

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/104515] [11/12/13 Regression] trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse.
  2022-02-12 20:16 [Bug rtl-optimization/104515] New: trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse gcc at rabensky dot com
                   ` (5 preceding siblings ...)
  2023-04-12 14:05 ` [Bug tree-optimization/104515] [11/12/13 " rguenth at gcc dot gnu.org
@ 2023-04-17 13:02 ` rguenth at gcc dot gnu.org
  2023-05-29 10:06 ` [Bug tree-optimization/104515] [11/12/13/14 " jakub at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-04-17 13:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104515

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Confirmed.  The issue is that store motion of v_7(D)->end cannot be
> performed on
> 
>   <bb 3> [local count: 955630225]:
>   # i_13 = PHI <i_10(6), 0(5)>
>   _1 = v_7(D)->end;
>   _2 = _1 + 18446744073709551612;
>   v_7(D)->end = _2;
>   MEM[(T *)_1 + -4B] ={v} {CLOBBER};   // <- inserted by -flifetime-dse
>   i_10 = i_13 + 1;
>   if (n_6(D) > i_10)
>     goto <bb 6>; [89.00%]
>   else
>     goto <bb 4>; [11.00%]
> 
>   <bb 6> [local count: 850510901]:
>   goto <bb 3>; [100.00%]

It's possible to perform store-motion of the above as the following,
preserving the CLOBBER in the loop but necessarily re-issueing the
last iteration CLOBBER on the exit edge.  I _think_ that's fine to do.
The oacc-kernels pipeline does LIM quite early before IPA.  For the
later LIM passes we might as well go with eliminating the non-decl
CLOBBERs earlier - but the issue also exists for decl CLOBBERs of course.

Note duplicating "lifetime" CLOBBERs might prove bad (we only have EOL
clobbers right now).

  <bb 3> [local count: 955630225]:
  # i_13 = PHI <i_10(6), 0(5)>
  # v__end_lsm.3_4 = PHI <v__end_lsm.3_12(6), v__end_lsm.3_3(5)>
  _1 = v__end_lsm.3_4;
  _2 = _1 + 18446744073709551612;
  v__end_lsm.3_12 = _2;
  MEM[(int *)_1 + -4B] ={v} {CLOBBER};
  i_10 = i_13 + 1;
  if (n_6(D) > i_10)
    goto <bb 6>; [89.00%]
  else
    goto <bb 7>; [11.00%]

  <bb 6> [local count: 850510901]:
  goto <bb 3>; [100.00%]

  <bb 7> [local count: 105119324]:
  # _17 = PHI <_1(3)>
  # v__end_lsm.3_19 = PHI <v__end_lsm.3_12(3)>
  v_7(D)->end = v__end_lsm.3_19;
  MEM[(int *)_17 + -4B] ={v} {CLOBBER};

  <bb 4> [local count: 118111600]:
  return;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/104515] [11/12/13/14 Regression] trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse.
  2022-02-12 20:16 [Bug rtl-optimization/104515] New: trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse gcc at rabensky dot com
                   ` (6 preceding siblings ...)
  2023-04-17 13:02 ` rguenth at gcc dot gnu.org
@ 2023-05-29 10:06 ` jakub at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-05-29 10:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104515

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.4                        |11.5

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 11.4 is being released, retargeting bugs to GCC 11.5.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-05-29 10:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-12 20:16 [Bug rtl-optimization/104515] New: trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse gcc at rabensky dot com
2022-02-14  9:32 ` [Bug tree-optimization/104515] [11/12 Regression] " rguenth at gcc dot gnu.org
2022-02-15 20:28 ` gcc at rabensky dot com
2022-02-15 21:02 ` redi at gcc dot gnu.org
2022-02-23 18:22 ` gcc at rabensky dot com
2022-04-21  7:51 ` rguenth at gcc dot gnu.org
2023-04-12 14:05 ` [Bug tree-optimization/104515] [11/12/13 " rguenth at gcc dot gnu.org
2023-04-17 13:02 ` rguenth at gcc dot gnu.org
2023-05-29 10:06 ` [Bug tree-optimization/104515] [11/12/13/14 " jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).