From: "carlos at redhat dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/26104] New forked process __reclaim_stacks endless loop
Date: Thu, 11 Jun 2020 17:57:46 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=26104

Carlos O'Donell changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-06-11
           See Also|                            |https://sourceware.org/bugzilla/show_bug.cgi?id=17326
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Carlos O'Donell ---
I believe this is a duplicate of bug 17326.

What version of glibc are you using?

It doesn't seem like you've shown proof of an endless loop.
Your example seems to show that the list simply terminates in unmapped memory,
which would cause a crash. The next address, e.g. 0x11940177fe9c0, looks
corrupted.

Do you see an endless loop or a crash?

In order to end up in an endless loop you have to have a non-head entry in the
cache that points to reused memory that happens to contain values that create
an endless loop in __reclaim_stacks().

In order to accomplish this the forked process must observe:

* An incomplete transition of the list entry, with the next pointer pointing to
  a mapping that is being used for other purposes and contains data that causes
  the circular list.

* A thread in the parent must have unlinked the entry via free_stacks, must
  have unmapped the memory, must have remapped something else in that VMA which
  contains data that causes the circular list.

I'm not sure how that could happen. If you can write out your analysis that
would help.

Process A
- Thread B
- calls pthread_join
- Thread A exits and is joined
- __free_tcb()
- __deallocate_stack()
- queue_stack()
- stack cache is full
- free_stacks (walk list backwards via ->prev)
- stack_list_del
- in_flight_stack = elem
- atomic write barrier
- list_del(elem)

 52 /* Remove element from list.  */
 53 static inline void
 54 list_del (list_t *elem)
 55 {
 56   elem->next->prev = elem->prev;
 57   elem->prev->next = elem->next;
 58 }

Process B
- Forked.
- COW copies page containing stack_cache head, observing none of the updates
  from lines 56 and 57 because they have not been flushed yet.
- COW copies page that contains the reused memory that used to be a struct
  pthread at the head's next element.
- __reclaim_stacks()
- Loops endlessly clearing the same memory.

This looks like a real bug, but the probability of this seems low. It is still
a real problem that should be solved.

I think the list manipulation needs to be capable of being asynchronously
interrupted by the fork and that needs to be taken into consideration.
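
To make the scenario above concrete, here is a minimal single-process sketch.
It is not glibc code and does not actually fork: it only models what a child
might see in its copy of the list. The list_t layout follows the list_del
snippet quoted above; list_init, list_add, build, bounded_walk and the demo_*
functions are hypothetical helpers written for this illustration. The sketch
suggests that a list_del interrupted between lines 56 and 57 still walks
forward cleanly, while an entry whose memory has been reused and happens to
point back at itself never leads back to the head.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the list_t used by list_del above.  */
typedef struct list_head
{
  struct list_head *next;
  struct list_head *prev;
} list_t;

/* Hypothetical helper: make HEAD an empty circular list.  */
void
list_init (list_t *head)
{
  head->next = head;
  head->prev = head;
}

/* Hypothetical helper: insert NEWP right after HEAD.  */
void
list_add (list_t *newp, list_t *head)
{
  newp->next = head->next;
  newp->prev = head;
  head->next->prev = newp;
  head->next = newp;
}

/* Walk forward from HEAD.  Return the number of entries visited, or -1
   if the walk has not come back to HEAD after LIMIT entries.  */
long
bounded_walk (const list_t *head, long limit)
{
  assert (limit > 0);
  long n = 0;
  for (const list_t *p = head->next; p != head; p = p->next)
    if (++n > limit)
      return -1;
  return n;
}

/* Build head -> a -> b -> head.  */
void
build (list_t *head, list_t *a, list_t *b)
{
  list_init (head);
  list_add (b, head);
  list_add (a, head);
}

/* Untouched list: the walk terminates after two entries.  */
long
demo_healthy (void)
{
  list_t head, a, b;
  build (&head, &a, &b);
  return bounded_walk (&head, 1000);
}

/* Only line 56 of list_del has taken effect for entry A.  The forward
   links are unchanged, so the walk still terminates.  */
long
demo_half_deleted (void)
{
  list_t head, a, b;
  build (&head, &a, &b);
  a.next->prev = a.prev;
  return bounded_walk (&head, 1000);
}

/* The head still points at A, but A's memory now holds unrelated data
   that happens to look like a pointer to itself (an assumption made
   for this demonstration).  The walk never reaches the head.  */
long
demo_reused_entry (void)
{
  list_t head, a, b;
  build (&head, &a, &b);
  a.next = &a;
  a.prev = &a;
  return bounded_walk (&head, 1000);
}
```

The bound in bounded_walk exists only so the demonstration terminates; it is
not proposed as a fix for __reclaim_stacks().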

In bug 17326 we make the locking more complex, and I think that's a bad idea.
This needs to work in a lock-free manner.

-- 
You are receiving this mail because:
You are on the CC list for the bug.