From: Andrew Burgess
To: Rainer Orth, Andrew Burgess
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH] Fix expected received signal message in testsuite
In-Reply-To: <87sfi82vg4.fsf@redhat.com>
References: <20190913221823.GV6076@embecosm.com> <87sfi82vg4.fsf@redhat.com>
Date: Thu, 24 Nov 2022 17:06:30 +0000
Message-ID: <87o7sw2sfd.fsf@redhat.com>

Andrew Burgess writes:

> Rainer Orth writes:
>
>> Hi Andrew,
>>
>>> * Rainer Orth [2019-09-05 14:04:06 +0200]:
>>>
>>>> Quite a number of tests FAIL on Solaris due to a mismatch between
>>>> expected and received messages: the testsuite expects something like
>>>>
>>>>   Program received signal SIGABRT, Aborted.
>>>>
>>>> while on Solaris it gets
>>>>
>>>>   Thread 2 received signal SIGABRT, Aborted.
>>>>
>>>> For a simple testcase, info threads shows
>>>>
>>>>   (gdb) info threads
>>>>     Id   Target Id         Frame
>>>>     1    LWP 1             main () at /vol/src/gnu/gdb/doc/bugs/ua.c:5
>>>>   * 2    Thread 1 (LWP 1)  main () at /vol/src/gnu/gdb/doc/bugs/ua.c:5
>>>>
>>>> I suspect this is due to support for the old pre-Solaris 9 MxN thread
>>>> model where user level threads were mapped to a different set of lwps.
>>>>
>>>> For the moment, I'm dealing with this by allowing both forms of the
>>>> message in the testsuite. The patch is almost completely mechanical,
>>>> with the exception of gdb.base/sigbpt.exp where the introduction of a
>>>> new group in the RE required adjustments in the $expect_out indices.
>>>
>>> I'm a little nervous about just allowing either "Thread" or "Program"
>>> for all tests for all targets. Maybe others will disagree and think
>>> I'm worrying about nothing, but I wonder if we could be more
>>> conservative by adding a support function into lib/gdb.exp that takes
>>> the name of a signal and returns the string we expect from GDB, which
>>> we can then change based on Solaris/non-Solaris.
>>>
>>> I haven't looked through the patch in enough detail to know if there's
>>> any reason why this wouldn't work, so please push back if you think
>>> the idea is unworkable.
>>
>> sorry for letting the ball drop on this one. Only recently did I
>> stumble across it again when looking into a related issue and now I
>> finally understand why Solaris is different here.
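
To make the two options quoted above concrete, here is a rough sketch.
It is written in C++ purely for illustration; a real helper would live
in lib/gdb.exp and be written in Tcl. The names expected_signal_re,
either_form_re and line_matches, and the is_solaris flag, are all
hypothetical and do not exist in the testsuite. The second function
shows why wrapping the alternation in a group shifts every later
capture index by one, which is the kind of adjustment described for
gdb.base/sigbpt.exp.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: build the pattern we expect GDB to print when the
// inferior receives SIGNAL_NAME.  IS_SOLARIS is an assumed flag standing
// in for whatever target check the testsuite would really use.
std::string
expected_signal_re (const std::string &signal_name, bool is_solaris)
{
  assert (!signal_name.empty ());
  const std::string who = is_solaris ? "Thread [0-9]+" : "Program";
  return who + " received signal " + signal_name + ", .*";
}

// The permissive alternative: accept either form on every target.  The
// group around the alternation becomes capture group 1, so the signal
// name is now found in group 2 rather than group 1.
std::string
either_form_re (const std::string &signal_name)
{
  assert (!signal_name.empty ());
  return "(Program|Thread [0-9]+) received signal (" + signal_name + "), .*";
}

// Return true if the whole of LINE matches PATTERN.
bool
line_matches (const std::string &line, const std::string &pattern)
{
  return std::regex_match (line, std::regex (pattern));
}
```

With the stricter helper, a "Thread 2 received signal ..." line is only
accepted when is_solaris is true, so an unexpected change of wording on
other targets would still show up as a FAIL.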
>>
>> [Thread starting at https://sourceware.org/ml/gdb-patches/2019-09/msg00050.html]
>>
>> * Consider the following testcase:
>>
>>   $ cat selfkill.c
>>   #include
>>   #include
>>   #include
>>   #include
>>
>>   void *
>>   selfkill (void *arg)
>>   {
>>     kill (getpid (), SIGINT);
>>     return NULL;
>>   }
>>
>>   int
>>   main (void)
>>   {
>>   #ifdef _REENTRANT
>>     pthread_t tid;
>>     pthread_create (&tid, NULL, selfkill, NULL);
>>     pthread_join (tid, NULL);
>>   #else
>>     selfkill (NULL);
>>   #endif
>>     return 0;
>>   }
>>
>> * Now compile on Solaris 9, both without and with -pthread:
>>
>>   $ gcc -o selfkill selfkill.c
>>   $ gcc -pthread -o selfkill-mt selfkill.c
>>
>> * Run the identical binaries and versions of gdb (7.11 here) on both
>>   Solaris 9 and Solaris 10:
>>
>>   $ gdb -q --batch -ex run selfkill{,-mt}
>>
>> ** Solaris 9, selfkill:
>>
>>   Program received signal SIGINT, Interrupt.
>>   0xb5d54186 in _libc_kill () from /usr/lib/libc.so.1
>>
>> ** Solaris 9, selfkill-mt:
>>
>>   [Thread debugging using libthread_db enabled]
>>   [New Thread 1 (LWP 1)]
>>   [New LWP 2 ]
>>   [New Thread 2 (LWP 2)]
>>
>>   Thread 2 received signal SIGINT, Interrupt.
>>   [Switching to Thread 1 (LWP 1)]
>>   0xb5c9fad5 in _lwp_wait () from /usr/lib/libc.so.1
>>
>> ** Solaris 10, selfkill:
>>
>>   [Thread debugging using libthread_db enabled]
>>   [New Thread 1 (LWP 1)]
>>
>>   Thread 2 received signal SIGINT, Interrupt.
>>   [Switching to Thread 1 (LWP 1)]
>>   0xfef0c165 in kill () from /lib/libc.so.1
>>
>> ** Solaris 10, selfkill-mt:
>>
>>   [Thread debugging using libthread_db enabled]
>>   [New Thread 1 (LWP 1)]
>>   [New LWP 2 ]
>>   [New Thread 2 (LWP 2)]
>>
>>   Thread 2 received signal SIGINT, Interrupt.
>>   [Switching to Thread 1 (LWP 1)]
>>   0xfeedca05 in __lwp_wait () from /lib/libc.so.1
>>
>> ** Trying the same on Linux/x86_64, one sees the same behaviour as on
>>    Solaris 9: non-threaded and threaded programs behave differently.
>>
>> * As you can see, on Solaris 10 even the not explicitly threaded version
>>   of the test is shown as threaded, explaining the difference in the
>>   "... received signal" messages.
>>
>>   This is a consequence of the Thread Model Unification Project in
>>   Solaris 10, which removed the difference between non-threaded and
>>   threaded processes. This has nothing to do with the removal of the
>>   pre-Solaris 9 MxN multilevel thread model as I'd originally
>>   suspected.
>
> I tried to take a look at this a little. The only Solaris machines I
> have access to run on Sparc, not x86-64, but hopefully they should still
> have much the same behaviour.
>
> I did (eventually) manage to build GDB on one of these machines, but
> whether I built it wrong or the Sparc/Solaris support is just bad, GDB
> was crashing all over the place with assertion failures.
>
> Still, with some persistence I could see the behaviour you observe.
>
> Now, I've not done any Solaris work in more than 10 years, so I don't
> claim to be any kind of expert, but I wonder if the fix you're proposing
> here isn't simply hiding a GDB bug.
>
> I wrote a simple test program that starts 3 worker threads and then
> blocks. Here's the 'info threads' output for GNU/Linux:
>
>   (gdb) info threads
>     Id   Target Id                                   Frame
>   * 1    Thread 0x7ffff7da3740 (LWP 2243115) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
>     2    Thread 0x7ffff7da2700 (LWP 2243118) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
>     3    Thread 0x7ffff75a1700 (LWP 2243119) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
>     4    Thread 0x7ffff6da0700 (LWP 2243120) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
>
> What you'd expect.
> Now here's the same on Solaris:
>
>   (gdb) info threads
>     Id   Target Id         Frame
>   * 1    LWP 1             0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
>     2    LWP 4             0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
>     3    LWP 3             0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
>     4    LWP 2             0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
>     5    Thread 1 (LWP 1)  0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
>     6    Thread 2 (LWP 2)  0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
>     7    Thread 3 (LWP 3)  0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
>     8    Thread 4 (LWP 4)  0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
>
> This is in line with what you describe, but, I think we can all agree,
> this seems a little odd; are there really 8 thread-like things running
> as part of this process? The output of `ps -aL` would suggest not:
>
>   $ ps -aL
>     PID   LWP TTY     LTIME CMD
>    3855     1 pts/6    0:00 thr.x
>    3855     2 pts/6    0:00 thr.x
>    3855     3 pts/6    0:00 thr.x
>    3855     4 pts/6    0:00 thr.x
>    4132     1 pts/8    0:00 ps
>
> And also, when I run the same test application using the dbx debugger, I
> see this:
>
>   (dbx) threads
>   *>    t@1  a  l@1  ?()              signal SIGINT in ___nanosleep()
>         t@2  a  l@2  thread_worker()  running in ___nanosleep()
>         t@3  a  l@3  thread_worker()  running in ___nanosleep()
>         t@4  a  l@4  thread_worker()  running in ___nanosleep()
>
> So here, the process is represented as just 4 thread-like things.
>
> So, why does GDB think there are 8, while every tool that ships with
> Solaris seems to think there are 4? My guess is that it has something
> to do with the thread lookup code in sol-thread.c, and/or the operation
> of libthread-db.
>
> So, when I run your original selfkill test application, and use GDB to
> break on GDB's add_thread_with_info function (the thing that is
> responsible for printing the "New Thread ..."
> message), here's what I see:
>
>   (gdb) bt
>   #0  add_thread_with_info (targ=targ@entry=0x940678 , ptid=..., priv=priv@entry=0x0) at ../../src/gdb/thread.c:290
>   #1  0x0053b61c in add_thread (targ=0x940678 , ptid=...) at ../../src/gdb/thread.c:305
>   #2  0x004ab5f4 in sol_thread_target::wait (this=, ptid=..., ourstatus=0xffbff620, options=...) at ../../src/gdb/sol-thread.c:459
>   #3  0x0053019c in target_wait (ptid=..., status=status@entry=0xffbff620, options=...) at ../../src/gdb/target.c:2598
>   #4  0x00395478 in do_target_wait_1 (inf=inf@entry=0x969288, ptid=..., status=status@entry=0xffbff620, options=) at ../../src/gdb/infrun.c:3763
>   #5  0x003a7e8c in ::operator() (inf=0x969288, __closure=) at ../../src/gdb/infrun.c:3822
>   #6  do_target_wait (options=..., ecs=0xffbff600) at ../../src/gdb/infrun.c:3841
>   #7  fetch_inferior_event () at ../../src/gdb/infrun.c:4201
>   #8  0x001b0bd8 in check_async_event_handlers () at ../../src/gdb/async-event.c:337
>   #9  0x006c4e3c in gdb_do_one_event (mstimeout=mstimeout@entry=-1) at ../../src/gdbsupport/event-loop.cc:221
>   #10 0x003d7ea0 in start_event_loop () at ../../src/gdb/main.c:411
>   #11 captured_command_loop () at ../../src/gdb/main.c:471
>   #12 0x003d9fa8 in captured_main (data=0xffbff84c) at ../../src/gdb/main.c:1330
>   #13 gdb_main (args=args@entry=0xffbff84c) at ../../src/gdb/main.c:1345
>   #14 0x006f7c5c in main (argc=4, argv=0xffbff8bc) at ../../src/gdb/gdb.c:32
>   (gdb) frame 2
>   #2  0x004ab5f4 in sol_thread_target::wait (this=, ptid=..., ourstatus=0xffbff620, options=...)
>       at ../../src/gdb/sol-thread.c:459
>   459         add_thread (proc_target, rtnval);
>   (gdb) p rtnval
>   $1 = {m_pid = 7218, m_lwp = 0, m_tid = 1}
>   (gdb) p current_inferior_.m_obj->thread_list.m_front.ptid
>   $2 = {m_pid = 7218, m_lwp = 1, m_tid = 0}
>   (gdb)
>
> What this is telling us is that, when GDB stopped after the ::wait
> call, the ptid_t it got back was '{m_pid = 7218, m_lwp = 0, m_tid = 1}',
> whereas the original thread that GDB found when starting the
> application was '{m_pid = 7218, m_lwp = 1, m_tid = 0}'.
>
> This difference is what causes GDB to add the new thread.
>
> My guess is that this m_lwp/m_tid difference is a bug somewhere in the
> stack, and that really, we should be seeing the same ptid_t here. If we
> did, then GDB would not add the new thread, and the test messages would
> not change.

So, to clarify this a little, the discrepancy seems to arise from
lwp_to_thread, which is where we query libthread-db. Before this point,
in sol_thread_target::wait, we call:

  ptid_t rtnval = beneath ()->wait (ptid, ourstatus, options);

This returns the (maybe?) expected ptid_t {m_pid = 7218, m_lwp = 1,
m_tid = 0}. Then, when we call lwp_to_thread, we get back the
alternative ptid_t where the tid field is set, but the lwp field is not.

I don't know if this indicates a bug in libthread-db, or a bug in GDB.

Thanks,
Andrew
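
For illustration only, the mismatch described above can be modelled
with a small standalone sketch. This is not GDB code: simple_ptid is a
simplified stand-in for ptid_t with just the three fields printed in
the session, and thread_known and add_if_unknown are hypothetical
helpers. It assumes the thread list is searched by exact ptid
equality, which is what the observed behaviour suggests.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-in for GDB's ptid_t; NOT the real class.  Only the
// three fields shown in the debugging session are modelled.
struct simple_ptid
{
  int pid;
  long lwp;
  long tid;

  bool operator== (const simple_ptid &other) const
  {
    return pid == other.pid && lwp == other.lwp && tid == other.tid;
  }
};

// Hypothetical lookup: return true if PTID is already in THREADS.
bool
thread_known (const std::vector<simple_ptid> &threads,
              const simple_ptid &ptid)
{
  return std::find (threads.begin (), threads.end (), ptid)
         != threads.end ();
}

// Mimic the "unknown ptid means new thread" step: append PTID if it is
// not already in THREADS, and report whether a thread was added.
bool
add_if_unknown (std::vector<simple_ptid> &threads, const simple_ptid &ptid)
{
  assert (ptid.pid > 0);
  if (thread_known (threads, ptid))
    return false;
  threads.push_back (ptid);
  return true;
}
```

Starting from a list holding {7218, 1, 0}, calling add_if_unknown with
{7218, 0, 1} adds a second entry even though both values describe the
same LWP, whereas passing {7218, 1, 0} again adds nothing. That is the
sense in which a consistent ptid_t would avoid the extra "New Thread"
and the changed "received signal" message.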