From: "avi at scylladb dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/105373] New: miscompile involving lambda coroutines and an object bitwise copied instead of via the copy constructor
Date: Mon, 25 Apr 2022 09:31:13 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105373

            Bug ID: 105373
           Summary: miscompile involving lambda coroutines and an object
                    bitwise copied instead of via the copy constructor
           Product: gcc
           Version: 11.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: avi at scylladb dot com
  Target Milestone: ---

This is a bug in a complex piece of code, so I'll need guidance on what further information to provide (e.g. intermediate code dumps).
It reproduces with various levels of debug information, and the code works with clang.

At the heart there is a shared pointer, which I've enhanced so that all pointers that point to the same object keep track of each other (the pointee also points back to one of the pointers). The pointee keeps an incrementing generation count. In the end I have two pointer objects that are bitwise equal, even though they should have different generation counts and different doubly-linked-list pointers. This proves that a bitwise copy happened. It doesn't prove a miscompile, since my code could have decided to perform a bitwise copy, but the fact that it works in clang indicates it's a gcc bug. The bug happens with asan too, and with different sizeof(the smart pointer), so it's not some stray write.

This is the snippet that causes the trouble:

0         tlogger.info("before updating cache {}", fmt::ptr(old3.get()));
1         co_await with_scheduling_group(_config.memtable_to_cache_scheduling_group, [this, old4 = old3, &newtabs] () -> future<> {
2             tlogger.info("updating cache {}", fmt::ptr(old4.get()));
3             return update_cache(old4, newtabs);
4         });

old3 and old4 are both copies of the same smart pointer (there are also old and old1 and old2 elsewhere, but they are correct). In the call to update_cache(), we attempt to make a copy of old4, and an internal check finds the linked list is corrupted.
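For readers unfamiliar with this kind of instrumentation, the idea of the tracked pointer and its internal check can be sketched roughly as follows. This is only an illustration under assumptions, not the seastar code: the names pointee, handle_state, tracked_ptr, links_consistent and bitwise_copy_is_detected are all hypothetical, the pointee is owned by the caller, and the per-handle state is kept in a plain struct so that a bitwise copy can be simulated with memcpy without undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical pointee: holds the reference count and hands out generations.
struct pointee {
    std::uint64_t refcnt = 0;
    std::uint64_t next_generation = 0;
};

// Plain, trivially copyable record of what each handle stores.
struct handle_state {
    pointee* ptr;
    handle_state* next;
    handle_state* prev;
    std::uint64_t generation;
};

// A handle is healthy only if both neighbours point back at this exact address.
inline bool links_consistent(const handle_state& s) noexcept {
    return s.next->prev == &s && s.prev->next == &s;
}

// Hypothetical tracked pointer: all handles to one pointee form a circular
// doubly-linked list, and each handle gets its own generation number.
class tracked_ptr {
    // mutable: copying from a const handle still has to splice into its list.
    mutable handle_state _s;
public:
    explicit tracked_ptr(pointee* p) noexcept
        : _s{p, nullptr, nullptr, p->next_generation++} {
        _s.next = _s.prev = &_s;   // a list of one
        ++p->refcnt;
    }
    tracked_ptr(const tracked_ptr& o) noexcept
        : _s{o._s.ptr, o._s.next, &o._s, o._s.ptr->next_generation++} {
        _s.next->prev = &_s;       // insert the new handle right after o
        _s.prev->next = &_s;
        ++_s.ptr->refcnt;
    }
    tracked_ptr& operator=(const tracked_ptr&) = delete;
    ~tracked_ptr() {
        _s.prev->next = _s.next;   // unlink from the list
        _s.next->prev = _s.prev;
        --_s.ptr->refcnt;
    }
    const handle_state& state() const noexcept { return _s; }
};

// Returns true when a proper copy passes the check and a byte-for-byte copy
// of the same handle fails it.
bool bitwise_copy_is_detected() {
    pointee obj;
    tracked_ptr a(&obj);
    tracked_ptr b(a);  // proper copy: fresh generation, relinked list
    bool proper_ok = links_consistent(a.state()) && links_consistent(b.state())
                  && a.state().generation != b.state().generation;

    handle_state raw;  // simulated bitwise copy of b's state
    std::memcpy(&raw, &b.state(), sizeof raw);
    bool same_bytes = std::memcmp(&raw, &b.state(), sizeof raw) == 0;

    // The neighbours still point at b, not at raw, so the check fails for raw.
    return proper_ok && same_bytes && !links_consistent(raw);
}
```

In this sketch a copy made through the copy constructor always differs from its source in generation and list links, so two handles with identical bytes at different addresses can only come from a copy that bypassed the constructor.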
Inspecting old4 in the debugger (from the printout in line 0) and the source of the copy in line 3 shows they are the same, but have different addresses:

0  INFO  2022-04-25 12:04:14,722 [shard 0] table - before updating cache 0x600000666c40
1  copying @0x60000520bef8 <- @0x6000052108f0 0x600000666c40 refcnt 6 gen 43
2  INFO  2022-04-25 12:04:14,722 [shard 0] table - updating cache 0x600000666c40
3  (gdb) p this
   $1 = (const seastar::lw_shared_ptr * const) 0x60000520bed0
4  (gdb) p *this
5  $2 = {_ptr = 0x600000666c40, _next = 0x6000052108f0, _prev = 0x60000597ae48, _generation = 43}
6  (gdb) x/4gx 0x60000520bef8
7  0x60000520bef8: 0x0000600000666c40 0x00006000052108f0
8  0x60000520bf08: 0x000060000597ae48 0x000000000000002b
9  (gdb) x/4gx this
10 0x60000520bed0: 0x0000600000666c40 0x00006000052108f0
11 0x60000520bee0: 0x000060000597ae48 0x000000000000002b

In line 3 I dump old4 (from the printout in line 1, where old3 is copied into old4). But "this" doesn't point to old4; it points to a bitwise copy of old4, as shown in lines 7-8 and 10-11.

Note that both old4 and the bad copy "this" are members of the coroutine frame.

I realize this isn't enough to analyze the situation; I'm happy to provide more information if you direct me how.