From mboxrd@z Thu Jan 1 00:00:00 1970
From: "wuxu.wu at huawei dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug nptl/26104] New forked process __reclaim_stacks endless loop
Date: Fri, 12 Jun 2020 02:38:24 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=26104

--- Comment #3 from buque ---
Hi, your analysis is exactly what I think.

On one of our devices running glibc 2.17 (CentOS 7.5), process B got stuck in
an endless loop in lines 2-6 below, with one CPU core at 100%.

At first it looks like it should have crashed: the bad address
0x11940177fe9c0 is really 0x7fd9177fe9c0, and line 5 damaged this pointer
later (self->pid = 72000). It is surprising that it does not crash; I guess
the remaining nodes form a ring that never leads back to &stack_cache, so
list_for_each never terminates.

As you say, the problem is very hard to reproduce. glibc 2.17 has been in use
for several years and this happened only once; I will try to reproduce it
with a white-box test in the next few days.

I think it is hard to fix this bug in a lock-free manner: we cannot stop the
concurrent reads and writes, so intermediate states stay visible. Maybe you
have a better way.

(gdb) p 0x11940
$1 = 72000

1    /* Reset the PIDs in any cached stacks.  */
2    list_for_each (runp, &stack_cache)
3      {
4        struct pthread *curp = list_entry (runp, struct pthread, list);
5        curp->pid = self->pid;
6      }
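To make the arithmetic concrete, here is a small standalone program (my own
sketch, not glibc code; it only recombines the values from the gdb session
below, and the aliasing of &curp->pid with &stack_cache.next + 4 is an
assumption). On little-endian x86_64, a 4-byte store of the pid that lands on
the upper half of the 8-byte stack_cache.next turns a sane 0x00007fd9177fe9c0
into exactly the bad pointer 0x11940177fe9c0, which is why I believe the
store in line 5 is what damaged it:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint64_t next_before = 0x00007fd9177fe9c0ULL; /* plausible old stack_cache.next */
  uint32_t pid         = 72000;                 /* 0x11940, pid of the forked child */

  /* curp->pid is a 4-byte field; if &curp->pid happens to alias
     (char *) &stack_cache.next + 4, the store overwrites bytes 4..7,
     i.e. the upper 32 bits of the pointer on little-endian x86_64.  */
  uint64_t next_after = (next_before & 0xffffffffULL) | ((uint64_t) pid << 32);

  printf ("next before: 0x%016" PRIx64 "\n", next_before);
  printf ("next after : 0x%016" PRIx64 "\n", next_after);
  /* Prints 0x00011940177fe9c0 - the same bad value gdb shows for
     stack_cache.next ($4, $12, $1, $2 below).  */
  return 0;
}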
Detaching from program: /usr/bin/sysmonitor, process 72000
[root@cn-north-4b-CloudDataCompassSurfer-010077236019 ~]#

(gdb) info r
rax            0x7fd9167fc9c0   140570362104256
rbx            0x7fd93f30f010   140571044802576
rcx            0x7fd9177fe9c0   140570378889664
rdx            0x11940          72000
rsi            0x7fd915ffb9c0   140570353711552
rdi            0x7fd93eeef5c0   140571040478656
rbp            0x7fd93f30f020   0x7fd93f30f020
rsp            0x7fd9357f9608   0x7fd9357f9608
r8             0x7fd93f30f010   140571044802576
r9             0x159a           5530
r10            0x7fd93eb23700   140571036497664
r11            0x7fd9357fa700   140570882189056
r12            0x0              0
r13            0x0              0
r14            0x7fd93f749000   140571049234432
r15            0x7fd9357f99e0   140570882185696
rip            0x7fd93f0ff3aa   0x7fd93f0ff3aa <__reclaim_stacks+538>
eflags         0x287            [ CF PF SF IF ]
cs             0x33             51
ss             0x2b             43
ds             0x0              0
es             0x0              0
fs             0x0              0
gs             0x0              0
(gdb) n
900           curp->pid = self->pid;
(gdb) p curp
$3 = (struct pthread *) 0x7fd916ffd700
(gdb) p stack_cache
$4 = {next = 0x11940177fe9c0, prev = 0x7fd9177fe9c0}
(gdb) p stack_cache.prev
$5 = (struct list_head *) 0x7fd9177fe9c0
(gdb) p stack_cache.prev->prev
$6 = (struct list_head *) 0x7fd93f30f020
(gdb) p curp
$7 = (struct pthread *) 0x7fd916ffd700
(gdb) i r
rax            0x7fd916ffd9c0   140570370496960
rbx            0x7fd93f30f010   140571044802576
rcx            0x7fd9177fe9c0   140570378889664
rdx            0x11940          72000
rsi            0x7fd915ffb9c0   140570353711552
rdi            0x7fd93eeef5c0   140571040478656
rbp            0x7fd93f30f020   0x7fd93f30f020
rsp            0x7fd9357f9608   0x7fd9357f9608
r8             0x7fd93f30f010   140571044802576
r9             0x159a           5530
r10            0x7fd93eb23700   140571036497664
r11            0x7fd9357fa700   140570882189056
r12            0x0              0
r13            0x0              0
r14            0x7fd93f749000   140571049234432
r15            0x7fd9357f99e0   140570882185696
rip            0x7fd93f0ff3a0   0x7fd93f0ff3a0 <__reclaim_stacks+528>
eflags         0x287            [ CF PF SF IF ]
cs             0x33             51
ss             0x2b             43
ds             0x0              0
es             0x0              0
fs             0x0              0
gs             0x0              0
(gdb) n
897       list_for_each (runp, &stack_cache)
(gdb) n
900           curp->pid = self->pid;
(gdb) p curp
$8 = (struct pthread *) 0x7fd90f7fe700
(gdb) n
897       list_for_each (runp, &stack_cache)
(gdb) n
900           curp->pid = self->pid;
(gdb) p curp
$9 = (struct pthread *) 0x7fd917fff700
(gdb) n
897       list_for_each (runp, &stack_cache)
(gdb) n
900           curp->pid = self->pid;
(gdb) p curp
$10 = (struct pthread *) 0x7fd934ff9700
(gdb) n
897       list_for_each (runp, &stack_cache)
(gdb)
900           curp->pid = self->pid;
(gdb) p curp
$11 = (struct pthread *) 0x7fd9357fa700
(gdb) p stack_cache
$12 = {next = 0x11940177fe9c0, prev = 0x7fd9177fe9c0}
(gdb) q
A debugging session is active.

        Inferior 1 [process 72000] will be detached.

(gdb) p stack_cache
$1 = {next = 0x11940177fe9c0, prev = 0x7fd9177fe9c0}
(gdb) p stack_cache.next
$2 = (struct list_head *) 0x11940177fe9c0
(gdb) p stack_cache.next->next
Cannot access memory at address 0x11940177fe9c0    // 0x11940177fe9c0 (0x7fd9177fe9c0)
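For the white-box test I mentioned, I am thinking of something along these
lines (just a sketch, not yet verified to trigger the bug; all names and
parameters are my own): detached threads return immediately and free their
own stacks, so the stack cache lists are modified concurrently with fork() in
the main thread, and with enough iterations the child may inherit the list in
an intermediate state and spin forever in __reclaim_stacks.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void *
worker (void *arg)
{
  /* A detached thread deallocates its own stack on exit, so it touches
     the stack cache lists concurrently with the fork below.  */
  return NULL;
}

int
main (void)
{
  pthread_attr_t attr;
  pthread_attr_init (&attr);
  pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);

  for (unsigned long iter = 0; ; iter++)
    {
      pthread_t t;
      if (pthread_create (&t, &attr, worker, NULL) != 0)
        continue;                    /* EAGAIN under load; just retry.  */

      pid_t child = fork ();
      if (child == 0)
        _exit (0);                   /* Child got past __reclaim_stacks.  */
      if (child < 0)
        abort ();

      /* If the child looped forever in __reclaim_stacks it never exits;
         a real test harness should put a timeout around this wait.  */
      if (waitpid (child, NULL, 0) != child)
        abort ();

      if (iter % 100000 == 0)
        fprintf (stderr, "iteration %lu\n", iter);
    }
}

Compile with -pthread and let it run; on an affected glibc I would expect the
parent to eventually hang in waitpid while the child burns one core.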