From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <sourceware-bugzilla@sourceware.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 44F9B3938C2C; Thu, 11 Jun 2020 17:57:46 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 44F9B3938C2C
From: "carlos at redhat dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/26104] New forked process __reclaim_stacks endless loop
Date: Thu, 11 Jun 2020 17:57:46 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: carlos at redhat dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc everconfirmed cf_reconfirmed_on see_also
 bug_status
Message-ID: <bug-26104-131-lX3KOwBX58@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-26104-131@http.sourceware.org/bugzilla/>
References: <bug-26104-131@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: glibc-bugs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Glibc-bugs mailing list <glibc-bugs.sourceware.org>
List-Unsubscribe: <http://sourceware.org/mailman/options/glibc-bugs>,
 <mailto:glibc-bugs-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/glibc-bugs/>
List-Help: <mailto:glibc-bugs-request@sourceware.org?subject=help>
List-Subscribe: <http://sourceware.org/mailman/listinfo/glibc-bugs>,
 <mailto:glibc-bugs-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Thu, 11 Jun 2020 17:57:46 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=26104

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-06-11
           See Also|                            |https://sourceware.org/bugz
                   |                            |illa/show_bug.cgi?id=17326
             Status|UNCONFIRMED                 |NEW
--- Comment #1 from Carlos O'Donell <carlos at redhat dot com> ---
I believe this is a duplicate of bug 17326.

What version of glibc are you using?

It doesn't seem like you've shown proof of an endless loop. Your example seems
to show that the list simply terminates in unmapped memory, which would cause a
crash. The next address, e.g. 0x11940177fe9c0, looks corrupted. Do you see an
endless loop or a crash?

In order to end up in an endless loop you have to have a non-head entry in the
cache that points to reused memory that happens to contain values that create
an endless loop in __reclaim_stacks().

In order to accomplish this the forked process must observe:

* An incomplete transition of the list entry, with the next pointer pointing to
a mapping that is being used for other purposes and contains data that causes
the circular list.

* A thread in the parent must have unlinked the entry via free_stacks, must
have unmapped the memory, and must have remapped something else in that VMA
which contains data that causes the circular list.

I'm not sure how that could happen. If you can write out your analysis that
would help.

Process A
- Thread B
  - calls pthread_join
- Thread A exits and is joined
  - __free_tcb()
    - __deallocate_stack()
      - queue_stack()
        - stack cache is full
        - free_stacks (walk list backwards via ->prev)
          - stack_list_del
          - in_flight_stack = elem
          - atomic write barrier
          - list_del(elem)

 52 /* Remove element from list.  */
 53 static inline void
 54 list_del (list_t *elem)
 55 {
 56   elem->next->prev = elem->prev;
 57   elem->prev->next = elem->next;
 58 }
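
To make the window between lines 56 and 57 concrete, here is a minimal sketch.
It assumes a simplified list_t with only next/prev, and every helper in it
(list_init, list_add_tail, count_forward, count_backward, half_delete_counts)
is hypothetical, written for illustration rather than taken from glibc. It
performs only the first store of list_del and then counts the entries in both
directions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for glibc's list_t (assumption: only next/prev).  */
typedef struct list_head
{
  struct list_head *next;
  struct list_head *prev;
} list_t;

/* Make HEAD an empty circular list.  */
static void
list_init (list_t *head)
{
  head->next = head;
  head->prev = head;
}

/* Hypothetical helper: link NEWP in just before HEAD (at the tail).  */
static void
list_add_tail (list_t *newp, list_t *head)
{
  newp->prev = head->prev;
  newp->next = head;
  head->prev->next = newp;
  head->prev = newp;
}

/* Count entries via ->next; return -1 after more than LIMIT steps.  */
static int
count_forward (const list_t *head, int limit)
{
  int n = 0;
  assert (limit > 0);
  for (const list_t *p = head->next; p != head; p = p->next)
    if (++n > limit)
      return -1;
  return n;
}

/* Count entries via ->prev; return -1 after more than LIMIT steps.  */
static int
count_backward (const list_t *head, int limit)
{
  int n = 0;
  assert (limit > 0);
  for (const list_t *p = head->prev; p != head; p = p->prev)
    if (++n > limit)
      return -1;
  return n;
}

/* Build head <-> a <-> b <-> c, run only line 56 of list_del on b,
   and report what each traversal direction sees.  */
void
half_delete_counts (int *forward, int *backward)
{
  list_t head, a, b, c;
  list_init (&head);
  list_add_tail (&a, &head);
  list_add_tail (&b, &head);
  list_add_tail (&c, &head);

  b.next->prev = b.prev;	/* Line 56 only; line 57 has not run.  */

  *forward = count_forward (&head, 16);
  *backward = count_backward (&head, 16);
}
```

In this model a forward walk still visits three entries while a backward walk
visits two. A fork that snapshots memory at this point inherits that
inconsistency.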

Process B
- Forked.
- COW copies page containing stack_cache head, observing none of the updates
from lines 56 and 57 because they have not been flushed yet.
- COW copies page that contains the reused memory that used to be a struct
pthread at the head's next element.
- __reclaim_stacks()
  - Loops endlessly clearing the same memory.
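
The endless loop in the child can be modeled with a walk that stops only when
it gets back to the list head. This is a minimal sketch, assuming a simplified
list_t; bounded_walk, stale_walk_result and sane_walk_result are hypothetical
names. The step limit exists only so the demonstration terminates, and this
sketch does not claim that __reclaim_stacks() has any such limit.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for glibc's list_t (assumption: only next/prev).  */
typedef struct list_head
{
  struct list_head *next;
  struct list_head *prev;
} list_t;

/* Walk via ->next until we are back at HEAD.  Return the number of
   entries seen, or -1 if more than LIMIT steps were needed.  */
int
bounded_walk (const list_t *head, int limit)
{
  int n = 0;
  assert (limit > 0);
  for (const list_t *p = head->next; p != head; p = p->next)
    if (++n > limit)
      return -1;
  return n;
}

/* Model of the bad case: an entry whose next pointer leads into reused
   memory, and that memory happens to form a cycle that never reaches
   the head again.  */
int
stale_walk_result (void)
{
  list_t head, stale, reused;
  head.next = &stale;
  head.prev = &stale;
  stale.prev = &head;
  stale.next = &reused;		/* Stale pointer into reused memory.  */
  reused.next = &reused;	/* Reused contents point back at themselves.  */
  reused.prev = &reused;
  return bounded_walk (&head, 1000);
}

/* Model of the good case: a well-formed ring with a single entry.  */
int
sane_walk_result (void)
{
  list_t head, a;
  head.next = &a;
  head.prev = &a;
  a.next = &head;
  a.prev = &head;
  return bounded_walk (&head, 1000);
}
```

Without the limit, the walk in stale_walk_result would never terminate, because
no next pointer in the cycle ever equals the head.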

This looks like a real bug, but the probability of this seems low. It is still
a real problem that should be solved.

I think the list manipulation needs to be capable of being asynchronously
interrupted by the fork and that needs to be taken into consideration.
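
One way to picture an interruptible deletion is a record-then-replay scheme:
the deleter publishes the element it is working on before touching the list,
and the child replays the deletion if that marker is still set. The sketch
below is a simplified, single-threaded model under that assumption. The names
in_flight, repair_after_fork and interrupted_delete_is_repaired are
hypothetical, this is not glibc's implementation, and it does not cover the
case where the element's memory has already been unmapped and reused.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for glibc's list_t (assumption: only next/prev).  */
typedef struct list_head
{
  struct list_head *next;
  struct list_head *prev;
} list_t;

/* Hypothetical marker: element whose deletion may be half done.  */
static list_t *in_flight;

/* Hypothetical helper: link NEWP in just before HEAD (at the tail).  */
static void
list_add_tail (list_t *newp, list_t *head)
{
  newp->prev = head->prev;
  newp->next = head;
  head->prev->next = newp;
  head->prev = newp;
}

/* Count entries in one direction; return -1 after more than LIMIT steps.  */
static int
count (const list_t *head, int forward, int limit)
{
  int n = 0;
  assert (limit > 0);
  for (const list_t *p = forward ? head->next : head->prev; p != head;
       p = forward ? p->next : p->prev)
    if (++n > limit)
      return -1;
  return n;
}

/* Replay a possibly interrupted deletion.  The two stores only read
   elem->next and elem->prev, which the deletion itself never changes,
   so running them again after zero, one, or both stores is harmless.  */
static void
repair_after_fork (void)
{
  list_t *elem = in_flight;
  if (elem != NULL)
    {
      elem->next->prev = elem->prev;
      elem->prev->next = elem->next;
      in_flight = NULL;
    }
}

/* Interrupt the deletion of b after STORES_DONE (0, 1 or 2) stores,
   repair, and return 1 if both directions then see exactly a and c.  */
int
interrupted_delete_is_repaired (int stores_done)
{
  list_t head, a, b, c;
  head.next = &head;
  head.prev = &head;
  list_add_tail (&a, &head);
  list_add_tail (&b, &head);
  list_add_tail (&c, &head);

  in_flight = &b;	/* Published first; real code needs a barrier here.  */
  if (stores_done >= 1)
    b.next->prev = b.prev;
  if (stores_done >= 2)
    b.prev->next = b.next;

  /* The "fork" happens here; the child repairs its copy.  */
  repair_after_fork ();
  return count (&head, 1, 16) == 2 && count (&head, 0, 16) == 2;
}
```

The replay is lock-free in the sense that the child takes no lock to repair its
copy. It relies on the marker being visible before the first store to the
list.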

In bug 17326 we make the locking more complex, and I think that's a bad idea.
This needs to work in a lock-free manner.
