From mboxrd@z Thu Jan 1 00:00:00 1970
From: "wuxu.wu at huawei dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/26104] New forked process __reclaim_stacks endless loop
Date: Fri, 12 Jun 2020 02:38:24 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=26104

--- Comment #3 from buque ---
Hi, your analysis is exactly what I think.

On one of our devices running glibc 2.17 (CentOS 7.5), process B got stuck in
an endless loop in lines 2-6 below, with one CPU core at 100%.

At first it looks like it should have crashed: the bad address
0x11940177fe9c0 is really 0x7fd9177fe9c0, and line 5 damaged this pointer
later (self->pid = 72000). It is surprising that it does not crash; I guess
the remaining nodes form a ring that never leads back to &stack_cache, so
list_for_each never terminates.

As you say, the problem is very hard to reproduce. glibc 2.17 has been in use
for several years and this happened only once; I will try to reproduce it
with a white-box test in the next few days.

I think it is hard to fix this bug in a lock-free manner: we cannot stop the
concurrent reads and writes, so intermediate states stay visible. Maybe you
have a better way.

(gdb) p 0x11940
$1 = 72000

1    /* Reset the PIDs in any cached stacks.  */
2    list_for_each (runp, &stack_cache)
3      {
4        struct pthread *curp = list_entry (runp, struct pthread, list);
5        curp->pid = self->pid;
6      }
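To make the arithmetic concrete, here is a small standalone program (my own
sketch, not glibc code; it only recombines the values from the gdb session
below, and the aliasing of &curp->pid with &stack_cache.next + 4 is an
assumption). On little-endian x86_64, a 4-byte store of the pid that lands on
the upper half of the 8-byte stack_cache.next turns a sane 0x00007fd9177fe9c0
into exactly the bad pointer 0x11940177fe9c0, which is why I believe the
store in line 5 is what damaged it:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint64_t next_before = 0x00007fd9177fe9c0ULL; /* plausible old stack_cache.next */
  uint32_t pid         = 72000;                 /* 0x11940, pid of the forked child */

  /* curp->pid is a 4-byte field; if &curp->pid happens to alias
     (char *) &stack_cache.next + 4, the store overwrites bytes 4..7,
     i.e. the upper 32 bits of the pointer on little-endian x86_64.  */
  uint64_t next_after = (next_before & 0xffffffffULL) | ((uint64_t) pid << 32);

  printf ("next before: 0x%016" PRIx64 "\n", next_before);
  printf ("next after : 0x%016" PRIx64 "\n", next_after);
  /* Prints 0x00011940177fe9c0 - the same bad value gdb shows for
     stack_cache.next ($4, $12, $1, $2 below).  */
  return 0;
}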
Detaching from program: /usr/bin/sysmonitor, process 72000
[root@cn-north-4b-CloudDataCompassSurfer-010077236019 ~]#

(gdb) info r
rax            0x7fd9167fc9c0   140570362104256
rbx            0x7fd93f30f010   140571044802576
rcx            0x7fd9177fe9c0   140570378889664
rdx            0x11940          72000
rsi            0x7fd915ffb9c0   140570353711552
rdi            0x7fd93eeef5c0   140571040478656
rbp            0x7fd93f30f020   0x7fd93f30f020
rsp            0x7fd9357f9608   0x7fd9357f9608
r8             0x7fd93f30f010   140571044802576
r9             0x159a           5530
r10            0x7fd93eb23700   140571036497664
r11            0x7fd9357fa700   140570882189056
r12            0x0              0
r13            0x0              0
r14            0x7fd93f749000   140571049234432
r15            0x7fd9357f99e0   140570882185696
rip            0x7fd93f0ff3aa   0x7fd93f0ff3aa <__reclaim_stacks+538>
eflags         0x287            [ CF PF SF IF ]
cs             0x33             51
ss             0x2b             43
ds             0x0              0
es             0x0              0
fs             0x0              0
gs             0x0              0
(gdb) n
900           curp->pid = self->pid;
(gdb) p curp
$3 = (struct pthread *) 0x7fd916ffd700
(gdb) p stack_cache
$4 = {next = 0x11940177fe9c0, prev = 0x7fd9177fe9c0}
(gdb) p stack_cache.prev
$5 = (struct list_head *) 0x7fd9177fe9c0
(gdb) p stack_cache.prev->prev
$6 = (struct list_head *) 0x7fd93f30f020
(gdb) p curp
$7 = (struct pthread *) 0x7fd916ffd700
(gdb) i r
rax            0x7fd916ffd9c0   140570370496960
rbx            0x7fd93f30f010   140571044802576
rcx            0x7fd9177fe9c0   140570378889664
rdx            0x11940          72000
rsi            0x7fd915ffb9c0   140570353711552
rdi            0x7fd93eeef5c0   140571040478656
rbp            0x7fd93f30f020   0x7fd93f30f020
rsp            0x7fd9357f9608   0x7fd9357f9608
r8             0x7fd93f30f010   140571044802576
r9             0x159a           5530
r10            0x7fd93eb23700   140571036497664
r11            0x7fd9357fa700   140570882189056
r12            0x0              0
r13            0x0              0
r14            0x7fd93f749000   140571049234432
r15            0x7fd9357f99e0   140570882185696
rip            0x7fd93f0ff3a0   0x7fd93f0ff3a0 <__reclaim_stacks+528>
eflags         0x287            [ CF PF SF IF ]
cs             0x33             51
ss             0x2b             43
ds             0x0              0
es             0x0              0
fs             0x0              0
gs             0x0              0
(gdb) n
897       list_for_each (runp, &stack_cache)
(gdb) n
900           curp->pid = self->pid;
(gdb) p curp
$8 = (struct pthread *) 0x7fd90f7fe700
(gdb) n
897       list_for_each (runp, &stack_cache)
(gdb) n
900           curp->pid = self->pid;
(gdb) p curp
$9 = (struct pthread *) 0x7fd917fff700
(gdb) n
897       list_for_each (runp, &stack_cache)
(gdb) n
900           curp->pid = self->pid;
(gdb) p curp
$10 = (struct pthread *) 0x7fd934ff9700
(gdb) n
897       list_for_each (runp, &stack_cache)
(gdb)
900           curp->pid = self->pid;
(gdb) p curp
$11 = (struct pthread *) 0x7fd9357fa700
(gdb) p stack_cache
$12 = {next = 0x11940177fe9c0, prev = 0x7fd9177fe9c0}
(gdb) q
A debugging session is active.

        Inferior 1 [process 72000] will be detached.

(gdb) p stack_cache
$1 = {next = 0x11940177fe9c0, prev = 0x7fd9177fe9c0}
(gdb) p stack_cache.next
$2 = (struct list_head *) 0x11940177fe9c0
(gdb) p stack_cache.next->next
Cannot access memory at address 0x11940177fe9c0    // 0x11940177fe9c0 (0x7fd9177fe9c0)
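For the white-box test I mentioned, I am thinking of something along these
lines (just a sketch, not yet verified to trigger the bug; all names and
parameters are my own): detached threads return immediately and free their
own stacks, so the stack cache lists are modified concurrently with fork() in
the main thread, and with enough iterations the child may inherit the list in
an intermediate state and spin forever in __reclaim_stacks.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void *
worker (void *arg)
{
  /* A detached thread deallocates its own stack on exit, so it touches
     the stack cache lists concurrently with the fork below.  */
  return NULL;
}

int
main (void)
{
  pthread_attr_t attr;
  pthread_attr_init (&attr);
  pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);

  for (unsigned long iter = 0; ; iter++)
    {
      pthread_t t;
      if (pthread_create (&t, &attr, worker, NULL) != 0)
        continue;                    /* EAGAIN under load; just retry.  */

      pid_t child = fork ();
      if (child == 0)
        _exit (0);                   /* Child got past __reclaim_stacks.  */
      if (child < 0)
        abort ();

      /* If the child looped forever in __reclaim_stacks it never exits;
         a real test harness should put a timeout around this wait.  */
      if (waitpid (child, NULL, 0) != child)
        abort ();

      if (iter % 100000 == 0)
        fprintf (stderr, "iteration %lu\n", iter);
    }
}

Compile with -pthread and let it run; on an affected glibc I would expect the
parent to eventually hang in waitpid while the child burns one core.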