From: Andrew Burgess
To: Rainer Orth, Andrew Burgess
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH] Fix expected received signal message in testsuite
References: <20190913221823.GV6076@embecosm.com>
Date: Thu, 24 Nov 2022 16:01:15 +0000
Message-ID: <87sfi82vg4.fsf@redhat.com>

Rainer Orth writes:

> Hi Andrew,
>
>> * Rainer Orth [2019-09-05 14:04:06 +0200]:
>>
>>> Quite a number of tests FAIL on Solaris due to a mismatch between
>>> expected and received messages: the testsuite expects something like
>>>
>>>   Program received signal SIGABRT, Aborted.
>>>
>>> while on Solaris it gets
>>>
>>>   Thread 2 received signal SIGABRT, Aborted.
>>>
>>> For a simple testcase, info threads shows
>>>
>>>   (gdb) info threads
>>>     Id   Target Id         Frame
>>>     1    LWP 1             main () at /vol/src/gnu/gdb/doc/bugs/ua.c:5
>>>   * 2    Thread 1 (LWP 1)  main () at /vol/src/gnu/gdb/doc/bugs/ua.c:5
>>>
>>> I suspect this is due to support for the old pre-Solaris 9 MxN thread
>>> model where user level threads were mapped to a different set of lwps.
>>>
>>> For the moment, I'm dealing with this by allowing both forms of the
>>> message in the testsuite.  The patch is almost completely mechanical,
>>> with the exception of gdb.base/sigbpt.exp where the introduction of a
>>> new group in the RE required adjustments in the $expect_out indices.
>>
>> I'm a little nervous about just allowing either "Thread" or "Program"
>> for all tests for all targets.  Maybe others will disagree and think
>> I'm worrying about nothing, but I wonder if we could be more
>> conservative by adding a support function into lib/gdb.exp that takes
>> the name of a signal and returns the string we expect from GDB, which
>> we can then change based on Solaris/non-Solaris.
>>
>> I haven't looked through the patch in enough detail to know if there's
>> any reason why this wouldn't work, so please push back if you think
>> the idea is unworkable.
>
> sorry for letting the ball drop on this one.  Only recently did I
> stumble across it again when looking into a related issue and now I
> finally understand why Solaris is different here.
>
> [Thread starting at https://sourceware.org/ml/gdb-patches/2019-09/msg00050.html]
>
> * Consider the following testcase:
>
>   $ cat selfkill.c
>   #include
>   #include
>   #include
>   #include
>
>   void *
>   selfkill (void *arg)
>   {
>     kill (getpid (), SIGINT);
>     return NULL;
>   }
>
>   int
>   main (void)
>   {
>   #ifdef _REENTRANT
>     pthread_t tid;
>     pthread_create (&tid, NULL, selfkill, NULL);
>     pthread_join (tid, NULL);
>   #else
>     selfkill (NULL);
>   #endif
>     return 0;
>   }
>
> * Now compile on Solaris 9, both without and with -pthread:
>
>   $ gcc -o selfkill selfkill.c
>   $ gcc -pthread -o selfkill-mt selfkill.c
>
> * Run the identical binaries and versions of gdb (7.11 here) on both
>   Solaris 9 and Solaris 10:
>
>   $ gdb -q --batch -ex run selfkill{,-mt}
>
> ** Solaris 9, selfkill:
>
>   Program received signal SIGINT, Interrupt.
>   0xb5d54186 in _libc_kill () from /usr/lib/libc.so.1
>
> ** Solaris 9, selfkill-mt:
>
>   [Thread debugging using libthread_db enabled]
>   [New Thread 1 (LWP 1)]
>   [New LWP 2 ]
>   [New Thread 2 (LWP 2)]
>
>   Thread 2 received signal SIGINT, Interrupt.
>   [Switching to Thread 1 (LWP 1)]
>   0xb5c9fad5 in _lwp_wait () from /usr/lib/libc.so.1
>
> ** Solaris 10, selfkill:
>
>   [Thread debugging using libthread_db enabled]
>   [New Thread 1 (LWP 1)]
>
>   Thread 2 received signal SIGINT, Interrupt.
>   [Switching to Thread 1 (LWP 1)]
>   0xfef0c165 in kill () from /lib/libc.so.1
>
> ** Solaris 10, selfkill-mt:
>
>   [Thread debugging using libthread_db enabled]
>   [New Thread 1 (LWP 1)]
>   [New LWP 2 ]
>   [New Thread 2 (LWP 2)]
>
>   Thread 2 received signal SIGINT, Interrupt.
>   [Switching to Thread 1 (LWP 1)]
>   0xfeedca05 in __lwp_wait () from /lib/libc.so.1
>
> ** Trying the same on Linux/x86_64, one sees the same behaviour as on
>    Solaris 9: non-threaded and threaded programs behave differently.
>
> * As you can see, on Solaris 10 even the not explicitly threaded version
>   of the test is shown as threaded, explaining the difference in the
>   "... received signal" messages.
>
>   This is a consequence of the Thread Model Unification Project in
>   Solaris 10, which removed the difference between non-threaded and
>   threaded processes.  This has nothing to do with the removal of the
>   pre-Solaris 9 MxN multilevel thread model as I'd originally
>   suspected.

I tried to take a look at this.  The only Solaris machines I have
access to are Sparc, not x86-64, but they should hopefully show much
the same behaviour.

I did (eventually) manage to build GDB on one of these machines.  I'm
not sure whether I built it wrong or whether the Sparc/Solaris support
is just bad, but GDB was crashing all over the place with assertion
failures.  Still, with some persistence I could see the behaviour you
observe.

Now, I've not done any Solaris work in more than 10 years, so I don't
claim to be any kind of expert, but I wonder if the fix you're
proposing here isn't simply hiding a GDB bug.

I wrote a simple test program that starts 3 worker threads and then
blocks.  Here's the 'info threads' output for GNU/Linux:

  (gdb) info threads
    Id   Target Id                                   Frame
  * 1    Thread 0x7ffff7da3740 (LWP 2243115) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
    2    Thread 0x7ffff7da2700 (LWP 2243118) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
    3    Thread 0x7ffff75a1700 (LWP 2243119) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
    4    Thread 0x7ffff6da0700 (LWP 2243120) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6

What you'd expect.
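
The test program itself is not reproduced in this mail.  As a rough
idea of its shape, a minimal sketch might look like the following.
The worker name thread_worker matches what dbx reports further down;
NUM_WORKERS, start_workers and the one-second sleep interval are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

#define NUM_WORKERS 3   /* assumed: three workers plus the main thread */

/* Worker body: sleep in a loop so the thread stays alive and visible
   to the debugger.  */
void *
thread_worker (void *arg)
{
  struct timespec ts = { 1, 0 };   /* assumed interval: one second */

  (void) arg;
  for (;;)
    nanosleep (&ts, NULL);
  return NULL;                     /* not reached */
}

/* Start COUNT workers, storing their ids in TIDS.  Returns the number
   of threads that were actually created.  */
int
start_workers (pthread_t *tids, int count)
{
  int started = 0;

  assert (tids != NULL);
  for (int i = 0; i < count; i++)
    if (pthread_create (&tids[i], NULL, thread_worker, NULL) == 0)
      started++;
  return started;
}
```

A main function that calls start_workers (tids, NUM_WORKERS) and then
blocks, for example in pthread_join on the first worker, gives the
debugger four threads to list.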
Now here's the same on Solaris:

  (gdb) info threads
    Id   Target Id         Frame
  * 1    LWP 1             0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    2    LWP 4             0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    3    LWP 3             0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    4    LWP 2             0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    5    Thread 1 (LWP 1)  0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    6    Thread 2 (LWP 2)  0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    7    Thread 3 (LWP 3)  0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    8    Thread 4 (LWP 4)  0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1

This is in line with what you describe but, as I think we can all
agree, it seems a little odd: are there really 8 thread-like things
running as part of this process?  The output of `ps -aL` would suggest
not:

  $ ps -aL
    PID   LWP TTY     LTIME CMD
   3855     1 pts/6    0:00 thr.x
   3855     2 pts/6    0:00 thr.x
   3855     3 pts/6    0:00 thr.x
   3855     4 pts/6    0:00 thr.x
   4132     1 pts/8    0:00 ps

And when I run the same test application under the dbx debugger, I
see this:

  (dbx) threads
  *>    t@1  a  l@1  ?()              signal SIGINT in ___nanosleep()
        t@2  a  l@2  thread_worker()  running in ___nanosleep()
        t@3  a  l@3  thread_worker()  running in ___nanosleep()
        t@4  a  l@4  thread_worker()  running in ___nanosleep()

So here the process is represented as just 4 thread-like things.

So why does GDB think there are 8, while every tool that ships with
Solaris seems to think there are 4?

My guess is that it has something to do with the thread lookup code in
sol-thread.c, and/or the operation of libthread_db.

So, when I run your original selfkill test application and use GDB to
break on GDB's add_thread_with_info function (the thing that is
responsible for printing the "New Thread ..." message), here's what I
see:

  (gdb) bt
  #0  add_thread_with_info (targ=targ@entry=0x940678 , ptid=..., priv=priv@entry=0x0)
      at ../../src/gdb/thread.c:290
  #1  0x0053b61c in add_thread (targ=0x940678 , ptid=...)
      at ../../src/gdb/thread.c:305
  #2  0x004ab5f4 in sol_thread_target::wait (this=, ptid=..., ourstatus=0xffbff620, options=...)
      at ../../src/gdb/sol-thread.c:459
  #3  0x0053019c in target_wait (ptid=..., status=status@entry=0xffbff620, options=...)
      at ../../src/gdb/target.c:2598
  #4  0x00395478 in do_target_wait_1 (inf=inf@entry=0x969288, ptid=..., status=status@entry=0xffbff620, options=)
      at ../../src/gdb/infrun.c:3763
  #5  0x003a7e8c in ::operator() (inf=0x969288, __closure=)
      at ../../src/gdb/infrun.c:3822
  #6  do_target_wait (options=..., ecs=0xffbff600) at ../../src/gdb/infrun.c:3841
  #7  fetch_inferior_event () at ../../src/gdb/infrun.c:4201
  #8  0x001b0bd8 in check_async_event_handlers () at ../../src/gdb/async-event.c:337
  #9  0x006c4e3c in gdb_do_one_event (mstimeout=mstimeout@entry=-1)
      at ../../src/gdbsupport/event-loop.cc:221
  #10 0x003d7ea0 in start_event_loop () at ../../src/gdb/main.c:411
  #11 captured_command_loop () at ../../src/gdb/main.c:471
  #12 0x003d9fa8 in captured_main (data=0xffbff84c) at ../../src/gdb/main.c:1330
  #13 gdb_main (args=args@entry=0xffbff84c) at ../../src/gdb/main.c:1345
  #14 0x006f7c5c in main (argc=4, argv=0xffbff8bc) at ../../src/gdb/gdb.c:32
  (gdb) frame 2
  #2  0x004ab5f4 in sol_thread_target::wait (this=, ptid=..., ourstatus=0xffbff620, options=...)
      at ../../src/gdb/sol-thread.c:459
  459               add_thread (proc_target, rtnval);
  (gdb) p rtnval
  $1 = {m_pid = 7218, m_lwp = 0, m_tid = 1}
  (gdb) p current_inferior_.m_obj->thread_list.m_front.ptid
  $2 = {m_pid = 7218, m_lwp = 1, m_tid = 0}
  (gdb)

What this is telling us is that when GDB stopped after the ::wait
call, the ptid_t it got back was '{m_pid = 7218, m_lwp = 0, m_tid = 1}',
whereas the original thread that GDB found when starting the
application was '{m_pid = 7218, m_lwp = 1, m_tid = 0}'.  This
difference is what causes GDB to add the new thread.

My guess is that this m_lwp/m_tid difference is a bug somewhere in the
stack, and that really we should be seeing the same ptid_t here.
If we did, then GDB would not add the new thread, and the test messages
would not change.

What are your thoughts on this analysis?

Thanks,
Andrew
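
To make the ptid_t mismatch concrete, here is a toy model in C.  It is
not GDB's code: struct ptid, thread_known and note_stop are
hypothetical names, and the only assumption carried over is that two
ptids count as the same thread when pid, lwp and tid all match.  Under
that assumption, a stop reported as {7218, 0, 1} is not found in a list
that holds {7218, 1, 0}, so a second entry is added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a (pid, lwp, tid) thread identifier.  */
struct ptid
{
  int pid;
  long lwp;
  long tid;
};

/* Two ptids name the same thread only if all three fields match.  */
bool
ptid_equal (struct ptid a, struct ptid b)
{
  return a.pid == b.pid && a.lwp == b.lwp && a.tid == b.tid;
}

#define MAX_THREADS 16

/* Toy thread list, standing in for the debugger's per-process list.  */
struct thread_list
{
  struct ptid threads[MAX_THREADS];
  size_t count;
};

/* Return true if PTID is already present in LIST.  */
bool
thread_known (const struct thread_list *list, struct ptid ptid)
{
  for (size_t i = 0; i < list->count; i++)
    if (ptid_equal (list->threads[i], ptid))
      return true;
  return false;
}

/* Record a stop reported for PTID.  Returns true if PTID was unknown
   and a new entry was added, i.e. the "New Thread" case.  */
bool
note_stop (struct thread_list *list, struct ptid ptid)
{
  if (thread_known (list, ptid))
    return false;
  assert (list->count < MAX_THREADS);
  list->threads[list->count++] = ptid;
  return true;
}
```

Feeding this model the two values printed above, first {7218, 1, 0} and
then {7218, 0, 1}, leaves two entries in the list, whereas reporting
the same triple twice leaves one.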