From: Andrew Burgess <aburgess@redhat.com>
To: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>,
Andrew Burgess <andrew.burgess@embecosm.com>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH] Fix expected received signal message in testsuite
Date: Thu, 24 Nov 2022 17:06:30 +0000 [thread overview]
Message-ID: <87o7sw2sfd.fsf@redhat.com> (raw)
In-Reply-To: <87sfi82vg4.fsf@redhat.com>
Andrew Burgess <aburgess@redhat.com> writes:
> Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:
>
>> Hi Andrew,
>>
>>> * Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> [2019-09-05 14:04:06 +0200]:
>>>
>>>> Quite a number of tests FAIL on Solaris due to a mismatch between
>>>> expected and received messages: the testsuite expects something like
>>>>
>>>> Program received signal SIGABRT, Aborted.
>>>>
>>>> while on Solaris it gets
>>>>
>>>> Thread 2 received signal SIGABRT, Aborted.
>>>>
>>>> For a simple testcase, info threads shows
>>>>
>>>> (gdb) info threads
>>>> Id Target Id Frame
>>>> 1 LWP 1 main () at /vol/src/gnu/gdb/doc/bugs/ua.c:5
>>>> * 2 Thread 1 (LWP 1) main () at /vol/src/gnu/gdb/doc/bugs/ua.c:5
>>>>
>>>> I suspect this is due to support for the old pre-Solaris 9 MxN thread
>>>> model where user level threads were mapped to a different set of lwps.
>>>>
>>>> For the moment, I'm dealing with this by allowing both forms of the
>>>> message in the testsuite. The patch is almost completely mechanical,
>>>> with the exception of gdb.base/sigbpt.exp where the introduction of a
>>>> new group in the RE required adjustments in the $expect_out indices.
>>>
>>> I'm a little nervous about just allowing either "Thread" or "Program"
>>> for all tests for all targets. Maybe others will disagree and think
>>> I'm worrying about nothing, but I wonder if we could be more
>>> conservative by adding a support function into lib/gdb.exp that takes
>>> the name of a signal and returns the string we expect from GDB, which
>>> we can then change based on Solaris/non-Solaris.
>>>
>>> I haven't looked through the patch in enough detail to know if there's
>>> any reason why this wouldn't work, so please push back if you think
>>> the idea is unworkable.
>>
>> sorry for letting the ball drop on this one. Only recently did I
>> stumble across it again when looking into a related issue and now I
>> finally understand why Solaris is different here.
>>
>> [Thread starting at https://sourceware.org/ml/gdb-patches/2019-09/msg00050.html]
>>
>> * Consider the following testcase:
>>
>> $ cat selfkill.c
>> #include <sys/types.h>
>> #include <signal.h>
>> #include <unistd.h>
>> #include <pthread.h>
>>
>> void *
>> selfkill (void *arg)
>> {
>> kill (getpid (), SIGINT);
>> return NULL;
>> }
>>
>> int
>> main (void)
>> {
>> #ifdef _REENTRANT
>> pthread_t tid;
>> pthread_create (&tid, NULL, selfkill, NULL);
>> pthread_join (tid, NULL);
>> #else
>> selfkill (NULL);
>> #endif
>> return 0;
>> }
>>
>> * Now compile on Solaris 9, both without and with -pthread:
>>
>> $ gcc -o selfkill selfkill.c
>> $ gcc -pthread -o selfkill-mt selfkill.c
>>
>> * Run the identical binaries and versions of gdb (7.11 here) on both
>> Solaris 9 and Solaris 10:
>>
>> $ gdb -q --batch -ex run selfkill{,-mt}
>>
>> ** Solaris 9, selfkill:
>>
>> Program received signal SIGINT, Interrupt.
>> 0xb5d54186 in _libc_kill () from /usr/lib/libc.so.1
>>
>> ** Solaris 9, selfkill-mt:
>>
>> [Thread debugging using libthread_db enabled]
>> [New Thread 1 (LWP 1)]
>> [New LWP 2 ]
>> [New Thread 2 (LWP 2)]
>>
>> Thread 2 received signal SIGINT, Interrupt.
>> [Switching to Thread 1 (LWP 1)]
>> 0xb5c9fad5 in _lwp_wait () from /usr/lib/libc.so.1
>>
>> ** Solaris 10, selfkill:
>>
>> [Thread debugging using libthread_db enabled]
>> [New Thread 1 (LWP 1)]
>>
>> Thread 2 received signal SIGINT, Interrupt.
>> [Switching to Thread 1 (LWP 1)]
>> 0xfef0c165 in kill () from /lib/libc.so.1
>>
>> ** Solaris 10, selfkill-mt:
>>
>> [Thread debugging using libthread_db enabled]
>> [New Thread 1 (LWP 1)]
>> [New LWP 2 ]
>> [New Thread 2 (LWP 2)]
>>
>> Thread 2 received signal SIGINT, Interrupt.
>> [Switching to Thread 1 (LWP 1)]
>> 0xfeedca05 in __lwp_wait () from /lib/libc.so.1
>>
>> ** Trying the same on Linux/x86_64, one sees the same behaviour as on
>> Solaris 9: non-threaded and threaded programs behave differently.
>>
>> * As you can see, on Solaris 10 even the not explicitly threaded version
>> of the test is shown as threaded, explaining the difference in the
>> "... received signal" messages.
>>
>> This is a consequence of the Thread Model Unification Project in
>> Solaris 10, which removed the difference between non-threaded and
>> threaded processes. This has nothing to do with the removal of the
>> pre-Solaris 9 MxN multilevel thread model as I'd originally
>> suspected.
>
> I tried to take a look at this a little. The only Solaris machines I
> have access to run on Sparc, not x86-64, but hopefully should still have
> much the same behaviour.
>
> I did manage to (eventually) build GDB on one of these machines, but,
> I'm not sure if I built it wrong, or if the Sparc/Solaris support is
> just bad, but GDB was crashing all over the place with assertion
> failures.
>
> Still, with some persistence I could see the behaviour you observe.
>
> Now, I've not done any Solaris work in >10years, so I don't claim to be
> any kind of expert, but I wonder if the fix you're proposing here isn't
> simply hiding a GDB bug.
>
> I wrote a simple test program that starts 3 worker threads and then
> blocks. Here's the 'info threads' output for GNU/Linux:
>
> (gdb) info threads
> Id Target Id Frame
> * 1 Thread 0x7ffff7da3740 (LWP 2243115) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
> 2 Thread 0x7ffff7da2700 (LWP 2243118) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
> 3 Thread 0x7ffff75a1700 (LWP 2243119) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
> 4 Thread 0x7ffff6da0700 (LWP 2243120) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
>
> What you'd expect. Now here's the same on Solaris:
>
> (gdb) info threads
> Id Target Id Frame
> * 1 LWP 1 0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
> 2 LWP 4 0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
> 3 LWP 3 0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
> 4 LWP 2 0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
> 5 Thread 1 (LWP 1) 0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
> 6 Thread 2 (LWP 2) 0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
> 7 Thread 3 (LWP 3) 0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
> 8 Thread 4 (LWP 4) 0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
>
> This is inline with what you describe, but, I think we can all agree,
> this seems a little odd; are there really 8 thread like things running
> as part of this process? The output of `ps -aL` would suggest not:
>
> $ ps -aL
> PID LWP TTY LTIME CMD
> 3855 1 pts/6 0:00 thr.x
> 3855 2 pts/6 0:00 thr.x
> 3855 3 pts/6 0:00 thr.x
> 3855 4 pts/6 0:00 thr.x
> 4132 1 pts/8 0:00 ps
>
> And also, when I run the same test application using the dbx debugger, I
> see this:
>
> (dbx) threads
> *> t@1 a l@1 ?() signal SIGINT in ___nanosleep()
> t@2 a l@2 thread_worker() running in ___nanosleep()
> t@3 a l@3 thread_worker() running in ___nanosleep()
> t@4 a l@4 thread_worker() running in ___nanosleep()
>
> So here, the process is represented as just 4 thread like things.
>
> So, why does GDB think there are 8, while every tools that ships with
> Solaris seems to think there are 4? My guess, is that is has something
> to do with the thread lookup code in sol-thread.c, and/or the operation
> of libthread-db.
>
> So, what I run your original selfkill test application, and use GDB to
> break on GDB's add_thread_with_info function (the thing that is
> responsible for printing the "New Thread ..." message), here's what I
> see:
>
> (gdb) bt
> #0 add_thread_with_info (targ=targ@entry=0x940678 <the_procfs_target>, ptid=..., priv=priv@entry=0x0) at ../../src/gdb/thread.c:290
> #1 0x0053b61c in add_thread (targ=0x940678 <the_procfs_target>, ptid=...) at ../../src/gdb/thread.c:305
> #2 0x004ab5f4 in sol_thread_target::wait (this=<optimized out>, ptid=..., ourstatus=0xffbff620, options=...) at ../../src/gdb/sol-thread.c:459
> #3 0x0053019c in target_wait (ptid=..., status=status@entry=0xffbff620, options=...) at ../../src/gdb/target.c:2598
> #4 0x00395478 in do_target_wait_1 (inf=inf@entry=0x969288, ptid=..., status=status@entry=0xffbff620, options=<error reading variable: Cannot access memory at address 0x0>) at ../../src/gdb/infrun.c:3763
> #5 0x003a7e8c in <lambda(inferior*)>::operator() (inf=0x969288, __closure=<synthetic pointer>) at ../../src/gdb/infrun.c:3822
> #6 do_target_wait (options=..., ecs=0xffbff600) at ../../src/gdb/infrun.c:3841
> #7 fetch_inferior_event () at ../../src/gdb/infrun.c:4201
> #8 0x001b0bd8 in check_async_event_handlers () at ../../src/gdb/async-event.c:337
> #9 0x006c4e3c in gdb_do_one_event (mstimeout=mstimeout@entry=-1) at ../../src/gdbsupport/event-loop.cc:221
> #10 0x003d7ea0 in start_event_loop () at ../../src/gdb/main.c:411
> #11 captured_command_loop () at ../../src/gdb/main.c:471
> #12 0x003d9fa8 in captured_main (data=0xffbff84c) at ../../src/gdb/main.c:1330
> #13 gdb_main (args=args@entry=0xffbff84c) at ../../src/gdb/main.c:1345
> #14 0x006f7c5c in main (argc=4, argv=0xffbff8bc) at ../../src/gdb/gdb.c:32
> (gdb) frame 2
> #2 0x004ab5f4 in sol_thread_target::wait (this=<optimized out>, ptid=..., ourstatus=0xffbff620, options=...) at ../../src/gdb/sol-thread.c:459
> 459 add_thread (proc_target, rtnval);
> (gdb) p rtnval
> $1 = {m_pid = 7218, m_lwp = 0, m_tid = 1}
> (gdb) p current_inferior_.m_obj->thread_list.m_front.ptid
> $2 = {m_pid = 7218, m_lwp = 1, m_tid = 0}
> (gdb)
>
> What this is telling us, is that, when GDB stopped after the ::wait
> call, the ptid_t it got back was '{m_pid = 7218, m_lwp = 0, m_tid = 1}',
> however, the original thread that GDB found when starting the
> application was '{m_pid = 7218, m_lwp = 1, m_tid = 0}'.
>
> This difference is what causes GDB to add the new thread.
>
> My guess is that this m_lwp/m_tid difference is a bug somewhere in the
> stack, and that really, we should be seeing the same ptid_t here. If we
> did, then GDB would not add the new thread, and the test messages would
> not change.
So, to clarify this a little, the discrepancy seems to arise from
lwp_to_thread, this is where we query libthread-db.
Before this point, in sol_thread_target::wait, we call:
ptid_t rtnval = beneath ()->wait (ptid, ourstatus, options);
this returns us the (maybe?) expected ptid_t {m_pid = 7218, m_lwp = 1,
m_tid = 0}, then when we call lwp_to_thread, we get back the alternative
ptid_t where the tid field is set, but the lwp field is not.
I don't know if this indicates a bug in libthread-db, or a bug in GDB.
Thanks,
Andrew
next prev parent reply other threads:[~2022-11-24 17:06 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-05 12:04 Rainer Orth
2019-09-13 12:47 ` Rainer Orth
2019-09-13 22:18 ` Andrew Burgess
2022-11-22 9:18 ` Rainer Orth
2022-11-24 16:01 ` Andrew Burgess
2022-11-24 17:06 ` Andrew Burgess [this message]
2022-11-25 10:04 ` Rainer Orth
2022-11-25 9:58 ` Rainer Orth
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87o7sw2sfd.fsf@redhat.com \
--to=aburgess@redhat.com \
--cc=andrew.burgess@embecosm.com \
--cc=gdb-patches@sourceware.org \
--cc=ro@CeBiTec.Uni-Bielefeld.DE \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).