public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2
@ 2023-05-23 14:00 arthur.j.odwyer at gmail dot com
  2023-05-23 16:09 ` [Bug tree-optimization/109945] " rguenth at gcc dot gnu.org
                   ` (32 more replies)
  0 siblings, 33 replies; 34+ messages in thread
From: arthur.j.odwyer at gmail dot com @ 2023-05-23 14:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

            Bug ID: 109945
           Summary: Escape analysis hates copy elision: different result
                    with -O1 vs -O2
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: arthur.j.odwyer at gmail dot com
  Target Milestone: ---

Background:
https://quuxplusone.github.io/blog/2021/03/07/copy-elision-borks-escape-analysis/
The background paradox here is, "When class Widget is subject to copy elision,
then any unseen function can return a prvalue Widget whose address has already
escaped!" Aaron Puchert and I were discussing this, with examples. (He thinks
the resolution of the paradox is "you *must* treat a lot more things as
escaped"; I think an acceptable resolution would be "you may treat copy-elision
itself as a magic that invalidates pointers and references even though the
object is still in the same place.")
But then I came up with an example that didn't rely on copy elision at all. We
both agree this code is perfectly well-defined — yet GCC miscompiles it at -O1!

// https://godbolt.org/z/bTnv68nhG
struct Widget {
    Widget();
    int i = 1;
    int a[4];
};
Widget *global = nullptr;
Widget::Widget() { global = this; }
Widget make() { return Widget(); }
void g() { global->i = 42; }
int main() {
  Widget w = make();
  int i = w.i;
  g();
  return (i == w.i);
    // Does this need to be reloaded and
    // compared? or is it obviously true?  
}

gcc -O0 and gcc -O2 both correctly return 0 from main.
gcc -O1 wrongly returns 1 from main.

*At least* since C++17, I think the -O1 result is flat-out wrong codegen. We
have `global == &w`, and so the call to `g()` can definitely modify `w.i`.

(Clang always treats Widgets' addresses as escaped, no matter what Widget looks
like. MSVC's escape analysis is more complicated and I have not yet been able
to trick it into wrong codegen.)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
@ 2023-05-23 16:09 ` rguenth at gcc dot gnu.org
  2023-05-23 16:11 ` rguenth at gcc dot gnu.org
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-23 16:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c++                         |tree-optimization
   Last reconfirmed|                            |2023-05-23
                 CC|                            |jakub at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
            Version|unknown                     |14.0

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
ESCAPED, points-to vars: { }

intD.9 mainD.2827 ()
{
  intD.9 w$iD.2845;
  intD.9 iD.2831;
  struct WidgetD.2773 wD.2829;
  intD.9 _1; 
  boolD.2740 _2;
  intD.9 _7; 

  <bb 2> :
  # USE = nonlocal  
  # CLB = nonlocal escaped
  wD.2829 = makeD.2813 (); [return slot optimization]
  w$i_9 = wD.2829.iD.2778;
  i_5 = w$i_9;
  # USE = nonlocal escaped
  # CLB = nonlocal escaped
  gD.2825 ();
  _1 = w$i_9;
  _2 = _1 == i_5;
  _7 = (intD.9) _2;
  wD.2829 ={v} {CLOBBER(eol)};
  return _7;

}

so indeed GCC points-to doesn't consider

  wD.2829 = makeD.2813 (); [return slot optimization]

as escape point for the return.  We have

  /* And if we applied NRV the address of the return slot escapes as well.  */
  if (gimple_call_return_slot_opt_p (stmt)
      && gimple_call_lhs (stmt) != NULL_TREE
      && TREE_ADDRESSABLE (TREE_TYPE (gimple_call_lhs (stmt))))
    {

but in this case the type is not TREE_ADDRESSABLE.  I'm not sure why this
check is here.  Removing it fixes the issue.  A !is_gimple_reg (lhs)
check may be a substitute in case we can get stale return-slot-opt flags?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
  2023-05-23 16:09 ` [Bug tree-optimization/109945] " rguenth at gcc dot gnu.org
@ 2023-05-23 16:11 ` rguenth at gcc dot gnu.org
  2023-05-23 16:19 ` pinskia at gcc dot gnu.org
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-23 16:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The fix for PR40389, r0-94078-g4d61856d0a221c, changed this to look at type
addressability (from LHS addressability).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
  2023-05-23 16:09 ` [Bug tree-optimization/109945] " rguenth at gcc dot gnu.org
  2023-05-23 16:11 ` rguenth at gcc dot gnu.org
@ 2023-05-23 16:19 ` pinskia at gcc dot gnu.org
  2023-05-23 16:48 ` rguenth at gcc dot gnu.org
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-23 16:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note to make it fail at -O2 and above is simple just add [[gnu::noipa]] to make
definition like this:
[[gnu::noipa]]
Widget make() { return Widget(); }

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (2 preceding siblings ...)
  2023-05-23 16:19 ` pinskia at gcc dot gnu.org
@ 2023-05-23 16:48 ` rguenth at gcc dot gnu.org
  2023-05-23 16:53 ` rguenth at gcc dot gnu.org
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-23 16:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu.org
      Known to fail|4.1.2, 4.5.3, 4.7.4, 4.9.0  |

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
The testcase is very related to the old PR and shows the issue manifests itself
also for !TYPE_NEEDS_CONSTRUCTING types.

I wonder whether we need a CTOR at all here to show the issue.

I also wonder how we can query whether the type has a CTOR (as opposed to
"needs" one).  The function type is

 <function_type 0x7ffff6ff3bd0
    type <record_type 0x7ffff6fdef18 Widget sizes-gimplified needs-constructing
cxx-odr-p type_1 type_5 type_6 BLK
        size <integer_cst 0x7ffff6ff6048 constant 160>
        unit-size <integer_cst 0x7ffff6ff6060 constant 20>
        align:32 warn_if_not_align:0 symtab:0 alias-set 2 canonical-type
0x7ffff6fdef18
        fields <function_decl 0x7ffff6ffd000 __dt  type <method_type
0x7ffff6ff3f18>
            public abstract external autoinline decl_3 QI t.ii:1:8 align:16
warn_if_not_align:0 context <record_type 0x7ffff6fdef18 Widget>
            full-name "Widget::~Widget() noexcept (<uninstantiated>)"
            not-really-extern chain <function_decl 0x7ffff6ffd200 __dt_base >>
context <translation_unit_decl 0x7ffff6e3f168 t.ii>
        full-name "struct Widget"
        needs-constructor X() X(constX&) this=(X&) n_parents=0 use_template=0
interface-unknown
        pointer_to_this <pointer_type 0x7ffff6ff30a8> reference_to_this
<reference_type 0x7ffff6ff3930> chain <type_decl 0x7ffff6e64c78 Widget>>

so the type has TYPE_NEEDS_CONSTRUCTING set but it isn't TREE_ADDRESSABLE.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (3 preceding siblings ...)
  2023-05-23 16:48 ` rguenth at gcc dot gnu.org
@ 2023-05-23 16:53 ` rguenth at gcc dot gnu.org
  2023-05-23 16:53 ` pinskia at gcc dot gnu.org
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-23 16:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
It seems a copy CTOR or DTOR is required to make it TREE_ADDRESSABLE.  So
adding either

  ~Widget();

or

  Widget(const Widget&);

to the Widget class declaration fixes the testcase.  Are we using the wrong
check or is escaping 'this' for these kind of classes invoking undefined
behavior?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (4 preceding siblings ...)
  2023-05-23 16:53 ` rguenth at gcc dot gnu.org
@ 2023-05-23 16:53 ` pinskia at gcc dot gnu.org
  2023-05-23 17:46 ` arthur.j.odwyer at gmail dot com
                   ` (26 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-23 16:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note here is a testcase which fails even with C++98 (basically changing the
default assignment in the class to the ctor):
```
struct Widget {
    Widget();
    int i;
    int a[4];
};
Widget *global = 0;
Widget::Widget() : i(1) { global = this; }
Widget make() __attribute__((noinline));
Widget make() { return Widget(); }
void g() { global->i = 42; }
int main() {
  Widget w = make();
  int i = w.i;
  g();
  if (i == w.i) __builtin_abort();
    // Does this need to be reloaded and
    // compared? or is it obviously true? 
  return 0; 
}
```

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (5 preceding siblings ...)
  2023-05-23 16:53 ` pinskia at gcc dot gnu.org
@ 2023-05-23 17:46 ` arthur.j.odwyer at gmail dot com
  2023-05-23 17:49 ` pinskia at gcc dot gnu.org
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: arthur.j.odwyer at gmail dot com @ 2023-05-23 17:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #7 from Arthur O'Dwyer <arthur.j.odwyer at gmail dot com> ---
Richard Biener wrote:
> Are we using the wrong check or is escaping 'this'
> for these kind of classes invoking undefined behavior?

Wow, this got a lot of traffic quickly!  Sounds like you (Richard, Andrew) are
on top of the issue here, where a constructor is involved; but once you start
talking about heuristics and checks, I think it would be productive for you two
to make sure you're both on the same page with *this* example:

// https://godbolt.org/z/Ea43Y65z4
struct Widget {
    int i = 1;
    int a[4];
};
Widget *global = nullptr;
Widget make2() { Widget w; global = &w; return w; }
void g() { global->i = 42; }
int main() {
  Widget w = make2();
  int i = w.i;
  g();
  return (i == w.i);
    // Does this need to be reloaded and
    // compared? or is it obviously true?  
}

In this case, Widget has no constructors, and I think it would be perfectly
defensible for the compiler to say that "global = &w" is not a valid way to
escape the address of that prvalue result (even though copy elision will make
the w in make() and the w in main() the same object). But do both of *you* have
the same answer to that paradox? If you don't, then you might also not agree
about what the appropriate heuristic should be in the original case and might
end up talking past each other.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (6 preceding siblings ...)
  2023-05-23 17:46 ` arthur.j.odwyer at gmail dot com
@ 2023-05-23 17:49 ` pinskia at gcc dot gnu.org
  2023-05-23 17:52 ` pinskia at gcc dot gnu.org
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-23 17:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Arthur O'Dwyer from comment #7)
> // https://godbolt.org/z/Ea43Y65z4
> struct Widget {
>     int i = 1;
...
> In this case, Widget has no constructors,

No, it has a constructor because of the NSDMI . NSDMI causes a non-trival
constexpr constructor to be created.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (7 preceding siblings ...)
  2023-05-23 17:49 ` pinskia at gcc dot gnu.org
@ 2023-05-23 17:52 ` pinskia at gcc dot gnu.org
  2023-05-23 17:53 ` pinskia at gcc dot gnu.org
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-23 17:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #8)
> (In reply to Arthur O'Dwyer from comment #7)
> > // https://godbolt.org/z/Ea43Y65z4
> > struct Widget {
> >     int i = 1;
> ...
> > In this case, Widget has no constructors,
> 
> No, it has a constructor because of the NSDMI . NSDMI causes a non-trival
> constexpr constructor to be created.

Now this version does not:
```
struct Widget {
    int i;
    int a[4];
};
Widget *global = 0;
Widget make2() { Widget w; w.i = 1; global = &w; return w; }
void g() { global->i = 42; }
int main() {
  Widget w = make2();
  int i = w.i;
  g();
  return (i == w.i);
    // Does this need to be reloaded and
    // compared? or is it obviously true?  
}
```

But does w go out of the scope at the end of make2? Similar question to make in
the original testcase, does the temp go out of scope?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (8 preceding siblings ...)
  2023-05-23 17:52 ` pinskia at gcc dot gnu.org
@ 2023-05-23 17:53 ` pinskia at gcc dot gnu.org
  2023-05-23 19:38 ` arthur.j.odwyer at gmail dot com
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-23 17:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #9)
> (In reply to Andrew Pinski from comment #8)
> > (In reply to Arthur O'Dwyer from comment #7)
> But does w go out of the scope at the end of make2? Similar question to make
> in the original testcase, does the temp go out of scope?

For both of these testcases GCC currently warns even:
<source>: In function 'Widget make2()':
<source>:6:35: warning: storing the address of local variable '<return value>
w' in 'global' [-Wdangling-pointer=]
    6 | Widget make2() { Widget w; global = &w; return w; }
      |                            ~~~~~~~^~~~
<source>:6:1: note: '<return value> w' declared here
    6 | Widget make2() { Widget w; global = &w; return w; }
      | ^~~~~~
<source>:5:9: note: 'global' declared here
    5 | Widget *global = nullptr;
      |         ^~~~~~

So Maybe this is undefined code after all.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (9 preceding siblings ...)
  2023-05-23 17:53 ` pinskia at gcc dot gnu.org
@ 2023-05-23 19:38 ` arthur.j.odwyer at gmail dot com
  2023-05-24  6:48 ` rguenth at gcc dot gnu.org
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: arthur.j.odwyer at gmail dot com @ 2023-05-23 19:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #11 from Arthur O'Dwyer <arthur.j.odwyer at gmail dot com> ---
(In reply to Andrew Pinski from comment #8)
> (In reply to Arthur O'Dwyer from comment #7)
> > // https://godbolt.org/z/Ea43Y65z4
> > struct Widget {
> >     int i = 1;
> ...
> > In this case, Widget has no constructors,
> 
> No, it has a constructor because of the NSDMI. NSDMI causes a non-trivial
> constexpr constructor to be created.

Fair. I meant Widget has no _program-defined_ constructors (such as would have
unknown effects and might invisibly-to-the-compiler escape a copy of `&w`). I
might still be using the wrong term. But you found a better example without
that wrinkle, anyway. :)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (10 preceding siblings ...)
  2023-05-23 19:38 ` arthur.j.odwyer at gmail dot com
@ 2023-05-24  6:48 ` rguenth at gcc dot gnu.org
  2023-05-24  7:59 ` rguenth at gcc dot gnu.org
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-24  6:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ibuclaw at gcc dot gnu.org

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #9)
> (In reply to Andrew Pinski from comment #8)
> > (In reply to Arthur O'Dwyer from comment #7)
> > > // https://godbolt.org/z/Ea43Y65z4
> > > struct Widget {
> > >     int i = 1;
> > ...
> > > In this case, Widget has no constructors,
> > 
> > No, it has a constructor because of the NSDMI . NSDMI causes a non-trival
> > constexpr constructor to be created.
> 
> Now this version does not:
> ```
> struct Widget {
>     int i;
>     int a[4];
> };
> Widget *global = 0;
> Widget make2() { Widget w; w.i = 1; global = &w; return w; }
> void g() { global->i = 42; }
> int main() {
>   Widget w = make2();
>   int i = w.i;
>   g();
>   return (i == w.i);
>     // Does this need to be reloaded and
>     // compared? or is it obviously true?  
> }
> ```
> 
> But does w go out of the scope at the end of make2? Similar question to make
> in the original testcase, does the temp go out of scope?

In this example we don't have TYPE_NEEDS_CONSTRUCTING set, in the NSDMI case
it's still set.

I'm happy to do points-to analysis adjustments but C++ folks have to answer
whether using return-slot-opt is valid here and what constraints apply.
From the middle-end description

/* In a CALL_EXPR, means that it's safe to use the target of the call
   expansion as the return slot for a call that returns in memory.  */
#define CALL_EXPR_RETURN_SLOT_OPT(NODE) \
  (CALL_EXPR_CHECK (NODE)->base.private_flag)

I'd read that - unless we have another bit unknown to me explicitely indicating
this - the return destination escapes.

Besides the C++ frontend the D frontend uses return-slot-opt as well.  I'll
also note that the gimplifier in gimplify_modify_expr_rhs can make
calls CALL_EXPR_RETURN_SLOT_OPT when they were not before and the only
common condition is the CALL_EXPR is aggregate_value_p (*from_p, *from_p)

For the fun of it I'm testing

diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 56021c59cb9..1e7f0383371 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -4366,7 +4366,7 @@ handle_rhs_call (gcall *stmt, vec<ce_s> *results,
   /* And if we applied NRV the address of the return slot escapes as well.  */
   if (gimple_call_return_slot_opt_p (stmt)
       && gimple_call_lhs (stmt) != NULL_TREE
-      && TREE_ADDRESSABLE (TREE_TYPE (gimple_call_lhs (stmt))))
+      && aggregate_value_p (gimple_call_lhs (stmt), gimple_call_fntype
(stmt)))
     {
       int flags = gimple_call_retslot_flags (stmt);
       const int relevant_flags = EAF_NO_DIRECT_ESCAPE

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (11 preceding siblings ...)
  2023-05-24  6:48 ` rguenth at gcc dot gnu.org
@ 2023-05-24  7:59 ` rguenth at gcc dot gnu.org
  2023-05-24  8:16 ` jakub at gcc dot gnu.org
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-24  7:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #12)
> For the fun of it I'm testing
> 
> diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
> index 56021c59cb9..1e7f0383371 100644
> --- a/gcc/tree-ssa-structalias.cc
> +++ b/gcc/tree-ssa-structalias.cc
> @@ -4366,7 +4366,7 @@ handle_rhs_call (gcall *stmt, vec<ce_s> *results,
>    /* And if we applied NRV the address of the return slot escapes as well. 
> */
>    if (gimple_call_return_slot_opt_p (stmt)
>        && gimple_call_lhs (stmt) != NULL_TREE
> -      && TREE_ADDRESSABLE (TREE_TYPE (gimple_call_lhs (stmt))))
> +      && aggregate_value_p (gimple_call_lhs (stmt), gimple_call_fntype
> (stmt)))
>      {
>        int flags = gimple_call_retslot_flags (stmt);
>        const int relevant_flags = EAF_NO_DIRECT_ESCAPE

FAIL: g++.dg/opt/pr91838.C  -std=c++14  scan-assembler
pxor\\\\s+%xmm0,\\\\s+%xmm0
FAIL: gcc.dg/tree-ssa/tailcall-7.c scan-tree-dump-times tailc "Found tail call"
5

with the former for -m64 and the latter for -m32 only seems to be the
only fallout here.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (12 preceding siblings ...)
  2023-05-24  7:59 ` rguenth at gcc dot gnu.org
@ 2023-05-24  8:16 ` jakub at gcc dot gnu.org
  2023-05-24  8:42 ` rguenther at suse dot de
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-05-24  8:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #13)
> with the former for -m64 and the latter for -m32 only seems to be the
> only fallout here.

It will penalize C and other languages without mandatory NRV in the FEs,
without that I think the address can't escape (taking address then would either
prevent tree-nrv.cc or
even if not, would be still considered taking address of a local variable).
Perhaps we could remember in some FUNCTION_DECL bit whether mandatory NRV was
done and
least for the cases where we know the callee, we know it hasn't done NRV and
!TREE_ADDRESSABLE (TREE_TYPE (gimple_call_lhs (stmt)))) we could avoid this.
But perhaps it is an overkill.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (13 preceding siblings ...)
  2023-05-24  8:16 ` jakub at gcc dot gnu.org
@ 2023-05-24  8:42 ` rguenther at suse dot de
  2023-05-24  8:51 ` jakub at gcc dot gnu.org
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: rguenther at suse dot de @ 2023-05-24  8:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 24 May 2023, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945
> 
> --- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #13)
> > with the former for -m64 and the latter for -m32 only seems to be the
> > only fallout here.
> 
> It will penalize C and other languages without mandatory NRV in the FEs,
> without that I think the address can't escape (taking address then would either
> prevent tree-nrv.cc or
> even if not, would be still considered taking address of a local variable).
> Perhaps we could remember in some FUNCTION_DECL bit whether mandatory NRV was
> done and
> least for the cases where we know the callee, we know it hasn't done NRV and
> !TREE_ADDRESSABLE (TREE_TYPE (gimple_call_lhs (stmt)))) we could avoid this.
> But perhaps it is an overkill.

Well, but then the gimplifier doesn't look at the functions implementation
but decides based on the call alone.  It does have

              else if (TREE_CODE (*to_p) != SSA_NAME
                      && (!is_gimple_variable (*to_p)
                          || needs_to_live_in_memory (*to_p)))
                /* Don't use the original target if it's already 
addressable;
                   if its address escapes, and the called function uses 
the
                   NRV optimization, a conforming program could see *to_p
                   change before the called function returns; see 
c++/19317.
                   When optimizing, the return_slot pass marks more 
functions
                   as safe after we have escape info.  */
                use_target = false;

but as we've seen TREE_ADDRESSABLE is not consistently set on the LHS
even when it's eventually passed by reference to the call
(aka aggregate_value_p is true).  It also seems that the gimplifier
will apply RSO when the call is in a INIT_EXPR.

Note the C frontend shows the same non-escaping when massaging the
testcase to

typedef struct {
    int i;
    int a[4];
} Widget;
Widget *global;
Widget make2() { Widget w; w.i = 1; global = &w; return w; }
void g() { global->i = 42; }
int main() {
  Widget w = make2();
  int i = w.i;
  g();
  return (i == w.i);
    // Does this need to be reloaded and
    // compared? or is it obviously true?  
}

then we get

int main ()
{
  int w$i;
  int i;
  struct Widget w;
  int _1;
  _Bool _2;
  int _7;

  <bb 2> :
  w = make2 (); [return slot optimization]
  w$i_9 = w.i;
  i_5 = w$i_9;
  g ();
  _1 = w$i_9;
  _2 = _1 == i_5;
  _7 = (int) _2;
  w ={v} {CLOBBER(eol)};
  return _7;
}

and w not escaped (but it doesn't seem to miscompile then).  Of course
the testcase relies on RSO to be valid in the first place, C doesn't
make any guarantees here and I'm unsure whether C++ guarantees
that for any of the testcases.  As soon as there's a copy involved
the testcases invoke undefined behavior.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (14 preceding siblings ...)
  2023-05-24  8:42 ` rguenther at suse dot de
@ 2023-05-24  8:51 ` jakub at gcc dot gnu.org
  2023-05-24  8:55 ` rguenther at suse dot de
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-05-24  8:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think the C version is UB, it escapes address of a local variable.  The C++
case is different when the language mandates NVR, in that case one can rely on
it.
TREE_ADDRESSABLE on a type is I think set consistently on a type, it means the
type has non-trivial copy constructor or destructor, so can't be copied
implicitly.
Widget can be copied implicitly in C++, but with mandatory NRV user can rely on
the return value still being in scope when global is dereferenced.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (15 preceding siblings ...)
  2023-05-24  8:51 ` jakub at gcc dot gnu.org
@ 2023-05-24  8:55 ` rguenther at suse dot de
  2023-05-24  9:00 ` jakub at gcc dot gnu.org
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: rguenther at suse dot de @ 2023-05-24  8:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #17 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 24 May 2023, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945
> 
> --- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> I think the C version is UB, it escapes address of a local variable.  The C++
> case is different when the language mandates NVR, in that case one can rely on
> it.
> TREE_ADDRESSABLE on a type is I think set consistently on a type, it means the
> type has non-trivial copy constructor or destructor, so can't be copied
> implicitly.
> Widget can be copied implicitly in C++, but with mandatory NRV user can rely on
> the return value still being in scope when global is dereferenced.

OK, so how can we know whether the frontend has mandatory NRV for a
specific call then?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (16 preceding siblings ...)
  2023-05-24  8:55 ` rguenther at suse dot de
@ 2023-05-24  9:00 ` jakub at gcc dot gnu.org
  2023-05-24 10:12 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-05-24  9:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
We'd need the FE to note it somewhere (of course, if it is indirect call or the
call doesn't bind to the definition we'd need to assume it might be with
mandatory NRV).
I think in the C++ FE it is finalized in finalize_nrv, but which one is
mandatory and which one is just that the FE does it because it is allowed to,
no idea.
Plus, not really sure if it would be valid to rely on it the way the testcase
does if it is not mandatory.
Jason?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (17 preceding siblings ...)
  2023-05-24  9:00 ` jakub at gcc dot gnu.org
@ 2023-05-24 10:12 ` rguenth at gcc dot gnu.org
  2023-07-01 23:37 ` pinskia at gcc dot gnu.org
                   ` (13 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-24 10:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #13)
> (In reply to Richard Biener from comment #12)
> > For the fun of it I'm testing
> > 
> > diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
> > index 56021c59cb9..1e7f0383371 100644
> > --- a/gcc/tree-ssa-structalias.cc
> > +++ b/gcc/tree-ssa-structalias.cc
> > @@ -4366,7 +4366,7 @@ handle_rhs_call (gcall *stmt, vec<ce_s> *results,
> >    /* And if we applied NRV the address of the return slot escapes as well. 
> > */
> >    if (gimple_call_return_slot_opt_p (stmt)
> >        && gimple_call_lhs (stmt) != NULL_TREE
> > -      && TREE_ADDRESSABLE (TREE_TYPE (gimple_call_lhs (stmt))))
> > +      && aggregate_value_p (gimple_call_lhs (stmt), gimple_call_fntype
> > (stmt)))
> >      {
> >        int flags = gimple_call_retslot_flags (stmt);
> >        const int relevant_flags = EAF_NO_DIRECT_ESCAPE
> 
> FAIL: g++.dg/opt/pr91838.C  -std=c++14  scan-assembler
> pxor\\\\s+%xmm0,\\\\s+%xmm0

The above is present even without the patch so it's only the tailcall
testcase that's fallout.

> FAIL: gcc.dg/tree-ssa/tailcall-7.c scan-tree-dump-times tailc "Found tail
> call" 5
> 
> with the former for -m64 and the latter for -m32 only seems to be the
> only fallout here.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (18 preceding siblings ...)
  2023-05-24 10:12 ` rguenth at gcc dot gnu.org
@ 2023-07-01 23:37 ` pinskia at gcc dot gnu.org
  2023-07-10  8:52 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-01 23:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #20 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Isn't this just a dup of bug 84414 ?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (19 preceding siblings ...)
  2023-07-01 23:37 ` pinskia at gcc dot gnu.org
@ 2023-07-10  8:52 ` rguenth at gcc dot gnu.org
  2024-02-20  9:58 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-10  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|rguenth at gcc dot gnu.org         |unassigned at gcc dot gnu.org
                 CC|                            |rguenth at gcc dot gnu.org
             Status|ASSIGNED                    |NEW

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
Yes, probably a duplicate.

Jason, it would be nice to have a handle to fix this old issue.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (20 preceding siblings ...)
  2023-07-10  8:52 ` rguenth at gcc dot gnu.org
@ 2024-02-20  9:58 ` rguenth at gcc dot gnu.org
  2024-02-20 10:45 ` jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-20  9:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #22 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ping.  Jakub/Jason - we now have three open bugs because of this.  The patch
in comment#12 works but Jakub doesn't like it.

Instead as far as I understand we need a flag to tell us whether there's
guaranteed copy elision (in which case the programmer can expect using
the return slot address will "work", maybe be even well-defined).
As can be seen with the testcases TREE_ADDRESSABLE on the type does not
cover all cases where that's appearantly so.  Maybe those types simply
need to have TREE_ADDRESSABLE set as otherwise RTL expansion or gimplification
might introduce an additional copy, breaking copy elision?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (21 preceding siblings ...)
  2024-02-20  9:58 ` rguenth at gcc dot gnu.org
@ 2024-02-20 10:45 ` jakub at gcc dot gnu.org
  2024-02-20 11:53 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-20 10:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #23 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Changing what types have TREE_ADDRESSABLE flag on it would cause significant
ABI changes, that is not what we can do.
Widget above does not have trivial default constructor, but has trivial copy
constructor and trivial destructor, it certainly can be copied.
Consider adjusted testcase:
struct Widget {
    Widget();
    long i = 1;
    long j = 2;
};
Widget *global = nullptr;
Widget::Widget() { global = this; }
[[gnu::noipa]] Widget make() { return Widget(); }
void g() { global->i = 42; }
int main() {
  Widget w = make();
  int i = w.i;
  g();
  return (i == w.i);
    // Does this need to be reloaded and
    // compared? or is it obviously true?  
}
On x86_64, this Widget is returned in registers, so the assumptions the
testcase has
look wrong to me, dereferencing global when the return went out of scope looks
UB to me.
On i686, this Widget is returned in memory.  Is the testcase valid there or UB
as well?

Jason, thoughts on this?

If the above testcase is for some reason valid conditionally (on some targets,
not on others, depending on calling conventions), I think we could check
TREE_ADDRESSABLE on the called function RESULT_DECL if it has some extra flag
set by the FE that there was guaranteed copy ellision (and assume worst case if
we don't see the called function (we then don't know if it has been compiled by
C or C++ etc.) or it has noipa attribute).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (22 preceding siblings ...)
  2024-02-20 10:45 ` jakub at gcc dot gnu.org
@ 2024-02-20 11:53 ` jakub at gcc dot gnu.org
  2024-02-20 12:07 ` redi at gcc dot gnu.org
                   ` (8 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-20 11:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #24 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
On the #c23 testcase, actually even on i686-linux where it is passed in memory
we optimize main at -O2 -m32 -fno-strict-aliasing -fno-tree-pta (i.e. when the
code mentioned in #c12 is never called) to return 1; - it is SRA which
optimizes that because w isn't TREE_ADDRESSABLE on the caller side.  So we'd
also need to treat such return values as TREE_ADDRESSABLE.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (23 preceding siblings ...)
  2024-02-20 11:53 ` jakub at gcc dot gnu.org
@ 2024-02-20 12:07 ` redi at gcc dot gnu.org
  2024-02-20 12:15 ` redi at gcc dot gnu.org
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: redi at gcc dot gnu.org @ 2024-02-20 12:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #25 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #9)
> struct Widget {
>     int i;
>     int a[4];
> };
> Widget *global = 0;
> Widget make2() { Widget w; w.i = 1; global = &w; return w; }
> void g() { global->i = 42; }
> int main() {
>   Widget w = make2();
>   int i = w.i;
>   g();
>   return (i == w.i);
>     // Does this need to be reloaded and
>     // compared? or is it obviously true?  
> }
> ```
> 
> But does w go out of the scope at the end of make2?

Yes. This example has undefined behaviour in all versions of C++.

w is a local variable, its lifetime is the function body of make2, and global
becomes an invalid pointer after make2 returns, so dereferencing global in g()
is UB.

> Similar question to make
> in the original testcase, does the temp go out of scope?

Before C++17 yes, it was undefined for exactly the same reasons.

That changed in C++17 and now a temporary is not "materialized" until as late
as possible. In many cases there would be no temporary, the prvalue returned by
make() would initialize w in main without any intermediate temporary.

However, [class.temporary] p3 explicitly allows the compiler to create a
temporary object when returning a class with a trivial copy constructor and
trivial (or deleted) destructor. This is permitted precisely to allow what the
x86_64 ABI requires: the return value is passed in registers. Completely
eliding the creation and copy of a temporary would have required an ABI break.

So the example in comment 0 is also an incorrect program, even in C++17. It's
unspecified whether the prvalue created in make() is only materialized when
constructing w (so there is never any temporary Widget) or whether the prvalue
is materialized to a temporary which then gets copied to w using the trivial
copy constructor.

Since it's unspecified, the program cannot rely on any particular behaviour.
The 'global' pointer might point to w, or it might point to a temporary which
has gone out of scope, and in the latter case, dereferencing it in g() is UB.

So inconsistent behaviour with different optimization settings and/or noipa
attributes seems fine. Either global == &w or the program has UB.

I think this is INVALID.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (24 preceding siblings ...)
  2024-02-20 12:07 ` redi at gcc dot gnu.org
@ 2024-02-20 12:15 ` redi at gcc dot gnu.org
  2024-02-20 12:33 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: redi at gcc dot gnu.org @ 2024-02-20 12:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #26 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #25)
> However, [class.temporary] p3 explicitly allows the compiler to create a
> temporary object when returning a class with a trivial copy constructor and
> trivial (or deleted) destructor. This is permitted precisely to allow what
> the x86_64 ABI requires: the return value is passed in registers. Completely
> eliding the creation and copy of a temporary would have required an ABI
> break.

N.B. this permission to create an intermediate temporary only depends on the
class type and whether it has a trivial copy constructor etc. *not* on the
actual platform ABI and whether it _actually_ gets passed in registers or not.

So GCC is always allowed to make a temporary copy even on i686 where the object
is really passed in memory via invisible reference, because the class has a
trivial copy ctor.

The fact that it doesn't actually make that copy on i686 doesn't make the
program valid on i686 IMHO. Conceptually, there's a temporary which is copied
then goes out of scope, and the 'global' pointer becomes invalid and cannot be
dereferenced. The fact that the actual copy isn't present any more in the
codegen (and the pointer happens to point to w itself which now exists at the
same address as the temporary previously existed) doesn't make the program
valid.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (25 preceding siblings ...)
  2024-02-20 12:15 ` redi at gcc dot gnu.org
@ 2024-02-20 12:33 ` rguenth at gcc dot gnu.org
  2024-02-20 12:41 ` redi at gcc dot gnu.org
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-20 12:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #27 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #25)
>
> Since it's unspecified, the program cannot rely on any particular behaviour.
> The 'global' pointer might point to w, or it might point to a temporary
> which has gone out of scope, and in the latter case, dereferencing it in g()
> is UB.
> 
> So inconsistent behaviour with different optimization settings and/or noipa
> attributes seems fine. Either global == &w or the program has UB.

Hmm, so GCC thinks that global != &w does always hold and would optimize
an explicit compare that way.  Whether or not it actually creates a
temporary.

Is that invalid optimization?  That is, do we need to consider that
global _might_ point to w?  We currently think it can never point to w.

> I think this is INVALID.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (26 preceding siblings ...)
  2024-02-20 12:33 ` rguenth at gcc dot gnu.org
@ 2024-02-20 12:41 ` redi at gcc dot gnu.org
  2024-02-20 12:48 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: redi at gcc dot gnu.org @ 2024-02-20 12:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #28 from Jonathan Wakely <redi at gcc dot gnu.org> ---
What I tried to say in comment 26 is that we're allowed to always make the
extra copy, which means that global is an invalid pointer, so the program has
UB.

I think the only flaw in that reasoning is what Richard Smith said in PR 84414
comment 2:

"In fact, I think the *only* problem here is that the above rule does not allow
the caller and the callee to observe the objects having the same address."

I would argue that it doesn't say they *can't* have the same address (even if
logically it seems obvious that if we copy a temporary to w then at some point
there must have been two distinct objects with two distinct addresses).

But since one of the pointers is an invalid pointer, you can't do anything with
its value anyway, including comparing it to &w. We can pretend that `global`
initially has the address of the temporary, but after that temporary goes out
of scope and `global` becomes an invalid pointer it bit-flips to some other
address, which happens to be the same bit-pattern as &w. But you can't observe
that in a correct program.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (27 preceding siblings ...)
  2024-02-20 12:41 ` redi at gcc dot gnu.org
@ 2024-02-20 12:48 ` jakub at gcc dot gnu.org
  2024-02-20 15:47 ` arthur.j.odwyer at gmail dot com
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-20 12:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #29 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #28)
> What I tried to say in comment 26 is that we're allowed to always make the
> extra copy, which means that global is an invalid pointer, so the program
> has UB.
> 
> I think the only flaw in that reasoning is what Richard Smith said in PR
> 84414 comment 2:
> 
> "In fact, I think the *only* problem here is that the above rule does not
> allow the caller and the callee to observe the objects having the same
> address."
> 
> I would argue that it doesn't say they *can't* have the same address (even
> if logically it seems obvious that if we copy a temporary to w then at some
> point there must have been two distinct objects with two distinct addresses).

For the passing in registers case, the usual case is that there isn't just a
single copy but several, one to the registers, another to other registers or
some memory.
And I don't see why it couldn't yield the same final memory as the source
memory.
Usually it will not, in the caller you ask for the stack slot already before
the call, but in case the compiler would ask for it only after the call (in
alloca-ish way), it could happen to have the same address.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (28 preceding siblings ...)
  2024-02-20 12:48 ` jakub at gcc dot gnu.org
@ 2024-02-20 15:47 ` arthur.j.odwyer at gmail dot com
  2024-02-20 15:52 ` arthur.j.odwyer at gmail dot com
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: arthur.j.odwyer at gmail dot com @ 2024-02-20 15:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #30 from Arthur O'Dwyer <arthur.j.odwyer at gmail dot com> ---
I think I understand jwakely's argument at this point, and it's consistent and
teachable. https://eel.is/c++draft/class.temporary#3.sentence-1 says:

> When an object of class type X is passed to or returned from a function, if X has at least one eligible copy or move constructor, each such constructor is trivial, and the destructor of X is either trivial or deleted, implementations are permitted to create a temporary object to hold the function parameter or result object.

It says "implementations are permitted"; this means that an implementation
(GCC) can create such a temporary *whenever* permitted, and then re-optimize
the codegen under the As-If Rule to eliminate the trivial move. That still ends
the lifetime of the original object. `global` points to that original object,
which is out-of-lifetime, so dereferencing `global` is UB. (Yes, even if its
value happens to _compare_ equal to some other pointer that's still
in-lifetime: this is "pointer provenance" as I see and understand it.)

So, in order for me to call this `Widget` example a "bug," I'd have to find a
`Widget` that doesn't meet [class.temporary]/3's conditions. That is, it would
have to be a type that does NOT have eligible copy or move constructors; or has
at least one NON-trivial eligible copy or move constructor; or has a
NON-deleted NON-trivial destructor. Then the implementation (GCC) would NOT be
permitted to create a temporary and the argument wouldn't hold anymore.

And indeed, I cannot find such a `Widget`! Every example I've tried so far
matches jwakely's explanation. For example: https://godbolt.org/z/rT6Mv537e

struct X {
    X();
    X(X&, int=0);
    X(const X&) = default;
    int i = 1;
    int a[4];
};

This `X` has a non-trivial eligible copy constructor, so it doesn't meet
[class.temporary]/3's conditions, and GCC treats it the same at -O1 and -O2.
If you delete the characters `=0`, then that constructor is no longer a copy
constructor, so `X` DOES meet [class.temporary]/3 and GCC treats it differently
at -O1 and -O2 (which is fine because now the program has UB, as jwakely says).
GCC is not misbehaving in this example.

Here's another example (still matching jwakely's argument perfectly: GCC is not
misbehaving) -- this one exploits [special]/6 to play with whether the
non-trivial copy ctor is "eligible" -- https://godbolt.org/z/PWT85n5xb

I'm satisfied with the explanation -- and GCC seems to be implementing it
correctly AFAICT -- so I think it would be reasonable to close this bug as
INVALID. On the other hand, if anyone wants to argue that the current behavior
is technically correct but super confusing to working programmers, I wouldn't
argue against them, either. ;)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (29 preceding siblings ...)
  2024-02-20 15:47 ` arthur.j.odwyer at gmail dot com
@ 2024-02-20 15:52 ` arthur.j.odwyer at gmail dot com
  2024-02-20 16:49 ` redi at gcc dot gnu.org
  2024-03-01 17:20 ` redi at gcc dot gnu.org
  32 siblings, 0 replies; 34+ messages in thread
From: arthur.j.odwyer at gmail dot com @ 2024-02-20 15:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #31 from Arthur O'Dwyer <arthur.j.odwyer at gmail dot com> ---
Oops, I guess my reading did disagree with jwakely's in one small point:
jwakely writes--
> But since one of the pointers is an invalid pointer,
> you can't do anything with its value anyway, including
> comparing it to &w.

In my interpretation, it was fine to compare `global == &w`; but the boolean
value of that comparison would be unspecified (because `*global` was
out-of-lifetime), and even if it *did* happen to come out to `true`, it would
still be UB to execute `global->i = 42` (because `*global` would still be a
different object from `w`, and `*global` is out-of-lifetime). I think this
makes no difference to the ultimate conclusion, i.e. something here ends up
being UB either way.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (30 preceding siblings ...)
  2024-02-20 15:52 ` arthur.j.odwyer at gmail dot com
@ 2024-02-20 16:49 ` redi at gcc dot gnu.org
  2024-03-01 17:20 ` redi at gcc dot gnu.org
  32 siblings, 0 replies; 34+ messages in thread
From: redi at gcc dot gnu.org @ 2024-02-20 16:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #32 from Jonathan Wakely <redi at gcc dot gnu.org> ---
[basic.stc.general] says global == &w is implementation-defined if global is an
invalid pointer value, not just unspecified. GCC could define it to be
unspecified, or UB, and even say "it's UB just in this one case where the
address was a temporary created under the permission granted by
[class.temporary] p3" (although I don't think we do define it that way, we just
compare the bits).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2
  2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
                   ` (31 preceding siblings ...)
  2024-02-20 16:49 ` redi at gcc dot gnu.org
@ 2024-03-01 17:20 ` redi at gcc dot gnu.org
  32 siblings, 0 replies; 34+ messages in thread
From: redi at gcc dot gnu.org @ 2024-03-01 17:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #33 from Jonathan Wakely <redi at gcc dot gnu.org> ---
This is now https://cplusplus.github.io/CWG/issues/2868.html

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2024-03-01 17:21 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-23 14:00 [Bug c++/109945] New: Escape analysis hates copy elision: different result with -O1 vs -O2 arthur.j.odwyer at gmail dot com
2023-05-23 16:09 ` [Bug tree-optimization/109945] " rguenth at gcc dot gnu.org
2023-05-23 16:11 ` rguenth at gcc dot gnu.org
2023-05-23 16:19 ` pinskia at gcc dot gnu.org
2023-05-23 16:48 ` rguenth at gcc dot gnu.org
2023-05-23 16:53 ` rguenth at gcc dot gnu.org
2023-05-23 16:53 ` pinskia at gcc dot gnu.org
2023-05-23 17:46 ` arthur.j.odwyer at gmail dot com
2023-05-23 17:49 ` pinskia at gcc dot gnu.org
2023-05-23 17:52 ` pinskia at gcc dot gnu.org
2023-05-23 17:53 ` pinskia at gcc dot gnu.org
2023-05-23 19:38 ` arthur.j.odwyer at gmail dot com
2023-05-24  6:48 ` rguenth at gcc dot gnu.org
2023-05-24  7:59 ` rguenth at gcc dot gnu.org
2023-05-24  8:16 ` jakub at gcc dot gnu.org
2023-05-24  8:42 ` rguenther at suse dot de
2023-05-24  8:51 ` jakub at gcc dot gnu.org
2023-05-24  8:55 ` rguenther at suse dot de
2023-05-24  9:00 ` jakub at gcc dot gnu.org
2023-05-24 10:12 ` rguenth at gcc dot gnu.org
2023-07-01 23:37 ` pinskia at gcc dot gnu.org
2023-07-10  8:52 ` rguenth at gcc dot gnu.org
2024-02-20  9:58 ` rguenth at gcc dot gnu.org
2024-02-20 10:45 ` jakub at gcc dot gnu.org
2024-02-20 11:53 ` jakub at gcc dot gnu.org
2024-02-20 12:07 ` redi at gcc dot gnu.org
2024-02-20 12:15 ` redi at gcc dot gnu.org
2024-02-20 12:33 ` rguenth at gcc dot gnu.org
2024-02-20 12:41 ` redi at gcc dot gnu.org
2024-02-20 12:48 ` jakub at gcc dot gnu.org
2024-02-20 15:47 ` arthur.j.odwyer at gmail dot com
2024-02-20 15:52 ` arthur.j.odwyer at gmail dot com
2024-02-20 16:49 ` redi at gcc dot gnu.org
2024-03-01 17:20 ` redi at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).