From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 39497 invoked by alias); 22 Sep 2016 14:26:27 -0000
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: systemtap-owner@sourceware.org
Received: (qmail 33927 invoked by uid 48); 22 Sep 2016 14:26:13 -0000
From: "dsmith at redhat dot com"
To: systemtap@sourceware.org
Subject: [Bug testsuite/20600] parallel testsuite hang in [nd_]syscall.exp
Date: Thu, 22 Sep 2016 14:26:00 -0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: systemtap
X-Bugzilla-Component: testsuite
X-Bugzilla-Version: unspecified
X-Bugzilla-Severity: normal
X-Bugzilla-Who: dsmith at redhat dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: systemtap at sourceware dot org
X-Bugzilla-Target-Milestone: ---
Content-Type: text/plain; charset="UTF-8"
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2016-q3/txt/msg00304.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=20600

--- Comment #4 from David Smith ---
Here's an update. I believe I've tracked this down to _stp_init_time(). I've
added lots of 'might_sleep()' calls to that function in my local copy of
systemtap, and I got the following:

====
Sep 21 22:32:40 ibm-p8-01-lp7.lab.eng.rdu.redhat.com kernel: BUG: sleeping function called from invalid context at /usr/local/share/systemtap/runtime/time.c:323
Sep 21 22:32:40 ibm-p8-01-lp7.lab.eng.rdu.redhat.com kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 5960, name: stapio
Sep 21 22:32:40 ibm-p8-01-lp7.lab.eng.rdu.redhat.com kernel: INFO: lockdep is turned off.
====

Here's that section of runtime/time.c (with line numbers):

====
310         might_sleep();
311         stp_time = _stp_alloc_percpu(sizeof(stp_time_t));
312         if (unlikely(stp_time == 0))
313                 return -1;
314
315         might_sleep();
316 #ifdef STAPCONF_ONEACHCPU_RETRY
317         ret = on_each_cpu(__stp_init_time, NULL, 0, 1);
318 #else
319         ret = on_each_cpu(__stp_init_time, NULL, 1);
320 #endif
321
322 #ifdef STAPCONF_ADD_TIMER_ON
323         might_sleep();
324         for_each_online_cpu(cpu) {
325                 stp_time_t *time = per_cpu_ptr(stp_time, cpu);
326                 add_timer_on(&time->timer, cpu);
327         }
328 #endif
====

I believe this means that something in the following line is causing us to
become atomic:

        ret = on_each_cpu(__stp_init_time, NULL, 1);

At line 315, might_sleep() didn't complain, but at line 323 we're suddenly
atomic.

On RHEL7, on_each_cpu() looks like the following:

====
static inline int on_each_cpu(smp_call_func_t func, void *info, int wait)
{
        unsigned long flags;
        local_irq_save(flags);
        func(info);
        local_irq_restore(flags);
        return 0;
}
====

That matches up to the kernel message above, since irqs aren't disabled. So,
my guess is that something in __stp_init_time() is causing us to become atomic.
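
To make the reasoning above concrete, here is a small user-space model of it.
It is only a sketch and not kernel or systemtap code: every model_* name, both
callbacks, and the single counter standing in for the kernel's preempt count
are assumptions made for illustration. It does not identify what
__stp_init_time() actually does; it only shows that a callback which leaves
the count raised would produce the same pattern, with the check at line 315
staying quiet and the check at line 323 firing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* User-space model only. A single counter stands in for the kernel's
 * preempt count; a non-zero value is treated as "atomic". */
int preempt_count;
int sleep_warnings;   /* how many times the modeled might_sleep() fired */

void model_preempt_disable(void) { preempt_count++; }

void model_preempt_enable(void)
{
    assert(preempt_count > 0);   /* catch an underflow in the model itself */
    preempt_count--;
}

/* Modeled might_sleep(): complain if we are atomic at this point. */
void model_might_sleep(int line)
{
    if (preempt_count != 0) {
        sleep_warnings++;
        printf("BUG: sleeping function called from invalid context "
               "at line %d (in_atomic(): 1)\n", line);
    }
}

/* Hypothetical callback that raises the count and never drops it. */
void leaky_callback(void *info)
{
    (void)info;
    model_preempt_disable();
}

/* Hypothetical callback that restores the count before returning. */
void balanced_callback(void *info)
{
    (void)info;
    model_preempt_disable();
    model_preempt_enable();
}

/* Modeled on_each_cpu(): just calls func locally. The irq save/restore
 * in the quoted version is left out because it does not touch the count. */
int model_on_each_cpu(void (*func)(void *), void *info)
{
    func(info);
    return 0;
}

/* Mirrors the shape of the quoted section: check, run callback, check.
 * Returns the number of warnings the two checks produced. */
int model_init_time(void (*callback)(void *))
{
    preempt_count = 0;
    sleep_warnings = 0;

    model_might_sleep(315);
    model_on_each_cpu(callback, NULL);
    model_might_sleep(323);

    return sleep_warnings;
}
```

Calling model_init_time(balanced_callback) produces no warning, while
model_init_time(leaky_callback) produces exactly one, at the second check.
That is the same shape as the log above, which is why the callback is the
first place to look.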