From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=AZ7P=3Y=redhat.com=aburgess@sourceware.org>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by sourceware.org (Postfix) with ESMTPS id 3996F3830B1D
	for <gdb-patches@sourceware.org>; Thu, 24 Nov 2022 16:01:22 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3996F3830B1D
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1669305681;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=9Z3JqHzSSwDVRfobB8euwqeoXEOfDsmBMvV6/ZZ6Bvc=;
	b=G97NJUWCByNf0QNEcX1c+Hbr/l9YE4EsH+eFqBcDflNQijCQKKNZIgLoWcLBlsXdRXv0o+
	TAVJ4TW+VE/7lJvRToPbkvAnXjfy1uuoxXesimHixgW96eurfpyxnyVpAs3xdy5qvW1ADY
	0NRVi3ky/Nb58rE94kKT/BprwR8lMJA=
Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com
 [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id
 us-mta-149-Iuqm5m34NqCl9fDk80bjEQ-1; Thu, 24 Nov 2022 11:01:20 -0500
X-MC-Unique: Iuqm5m34NqCl9fDk80bjEQ-1
Received: by mail-wm1-f71.google.com with SMTP id v188-20020a1cacc5000000b003cf76c4ae66so2795227wme.7
        for <gdb-patches@sourceware.org>; Thu, 24 Nov 2022 08:01:20 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=mime-version:message-id:date:references:in-reply-to:subject:cc:to
         :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=9Z3JqHzSSwDVRfobB8euwqeoXEOfDsmBMvV6/ZZ6Bvc=;
        b=ESh/6RYcQGegqJaN5ZSS5Yy0uGhLM7JH1msasIRlgWcv3h6lL0ze9C/k/5AnuxZjun
         Wa0kf9CcicFht/s2Tzjb37uz/LcLdUdmgD1jLot2V5Qi1irGYag4poV+XipJXn2weT81
         3+/sgi9wX//Wm2t0EOyxn0A1LKM3sBHEV3pJ14rHcXXU/jGkmGJGQdPLchar3tphLNW9
         10FE56xdJJcK4l/Hfg2jJIUEfK/2ExizoTIOk8e+qzUnm/JILcJ4hhhiT5a7pOGmmMC2
         HjUE2R0Alzp0qVc+5jwUlNto4rzFEUJRlzzcnZPmzc2PUBLctWGVLQRJh0a2KiRV8g9d
         +LRg==
X-Gm-Message-State: ANoB5pkB0EVtL5h5oNCTZuEbC962CptZmkaFkrAmd8dq46j6Dhw/sHi/
	vKkxSlZNLZh9CSxn6erzKmV2Itvd3g7hLSX7L4y7RirX1B/ffm8o6bvfbgdVRoCvw7Uem3Y1pKr
	M0+G3Ebv6+Jg68rYjKR1BnA==
X-Received: by 2002:a5d:4526:0:b0:236:8425:7571 with SMTP id j6-20020a5d4526000000b0023684257571mr12698007wra.244.1669305678306;
        Thu, 24 Nov 2022 08:01:18 -0800 (PST)
X-Google-Smtp-Source: AA0mqf4CP19/8cCw2jvlPK8AWiKIn62BkbI7ftOqrRLQCbelWv1IB/VAYQwp2qPV9OedqY6+2k24AA==
X-Received: by 2002:a5d:4526:0:b0:236:8425:7571 with SMTP id j6-20020a5d4526000000b0023684257571mr12697966wra.244.1669305677802;
        Thu, 24 Nov 2022 08:01:17 -0800 (PST)
Received: from localhost ([31.111.84.238])
        by smtp.gmail.com with ESMTPSA id a11-20020adfed0b000000b002365730eae8sm1751422wro.55.2022.11.24.08.01.16
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 24 Nov 2022 08:01:17 -0800 (PST)
From: Andrew Burgess <aburgess@redhat.com>
To: Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>, Andrew Burgess
 <andrew.burgess@embecosm.com>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH] Fix expected received signal message in testsuite
In-Reply-To: <yddmt8jqrdl.fsf@CeBiTec.Uni-Bielefeld.DE>
References: <yddlfv3eyq1.fsf@CeBiTec.Uni-Bielefeld.DE>
 <20190913221823.GV6076@embecosm.com>
 <yddmt8jqrdl.fsf@CeBiTec.Uni-Bielefeld.DE>
Date: Thu, 24 Nov 2022 16:01:15 +0000
Message-ID: <87sfi82vg4.fsf@redhat.com>
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain
X-Spam-Status: No, score=-5.7 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gdb-patches.sourceware.org>

Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:

> Hi Andrew,
>
>> * Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> [2019-09-05 14:04:06 +0200]:
>>
>>> Quite a number of tests FAIL on Solaris due to a mismatch between
>>> expected and received messages: the testsuite expects something like
>>> 
>>> 	Program received signal SIGABRT, Aborted.
>>> 
>>> while on Solaris it gets
>>> 
>>> 	Thread 2 received signal SIGABRT, Aborted.
>>> 
>>> For a simple testcase, info threads shows
>>> 
>>> (gdb) info threads 
>>>   Id   Target Id         Frame 
>>>   1    LWP    1          main () at /vol/src/gnu/gdb/doc/bugs/ua.c:5
>>> * 2    Thread 1 (LWP 1)  main () at /vol/src/gnu/gdb/doc/bugs/ua.c:5
>>> 
>>> I suspect this is due to support for the old pre-Solaris 9 MxN thread
>>> model where user level threads were mapped to a different set of lwps.
>>> 
>>> For the moment, I'm dealing with this by allowing both forms of the
>>> message in the testsuite.  The patch is almost completely mechanical,
>>> with the exception of gdb.base/sigbpt.exp where the introduction of a
>>> new group in the RE required adjustments in the $expect_out indices.
>>
>> I'm a little nervous about just allowing either "Thread" or "Program"
>> for all tests for all targets.  Maybe others will disagree and think
>> I'm worrying about nothing, but I wonder if we could be more
>> conservative by adding a support function into lib/gdb.exp that takes
>> the name of a signal and returns the string we expect from GDB, which
>> we can then change based on Solaris/non-Solaris.
>>
>> I haven't looked through the patch in enough detail to know if there's
>> any reason why this wouldn't work, so please push back if you think
>> the idea is unworkable.
>
> sorry for letting the ball drop on this one.  Only recently did I
> stumble across it again when looking into a related issue and now I
> finally understand why Solaris is different here.
>
> [Thread starting at https://sourceware.org/ml/gdb-patches/2019-09/msg00050.html]
>
> * Consider the following testcase:
>
> $ cat selfkill.c 
> #include <sys/types.h>
> #include <signal.h>
> #include <unistd.h>
> #include <pthread.h>
>
> void *
> selfkill (void *arg)
> {
>   kill (getpid (), SIGINT);
>   return NULL;
> }
>
> int
> main (void)
> {
> #ifdef _REENTRANT
>   pthread_t tid;
>   pthread_create (&tid, NULL, selfkill, NULL);
>   pthread_join (tid, NULL);
> #else
>   selfkill (NULL);
> #endif
>   return 0;
> }
>
> * Now compile on Solaris 9, both without and with -pthread:
>
> $ gcc -o selfkill selfkill.c
> $ gcc -pthread -o selfkill-mt selfkill.c
>
> * Run the identical binaries and versions of gdb (7.11 here) on both
>   Solaris 9 and Solaris 10:
>
> $ gdb -q --batch -ex run selfkill{,-mt}
>
> ** Solaris 9, selfkill:
>
> Program received signal SIGINT, Interrupt.
> 0xb5d54186 in _libc_kill () from /usr/lib/libc.so.1
>
> ** Solaris 9, selfkill-mt:
>
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP    2        ]
> [New Thread 2 (LWP 2)]
>
> Thread 2 received signal SIGINT, Interrupt.
> [Switching to Thread 1 (LWP 1)]
> 0xb5c9fad5 in _lwp_wait () from /usr/lib/libc.so.1
>
> ** Solaris 10, selfkill:
>
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
>
> Thread 2 received signal SIGINT, Interrupt.
> [Switching to Thread 1 (LWP 1)]
> 0xfef0c165 in kill () from /lib/libc.so.1
>
> ** Solaris 10, selfkill-mt:
>
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP    2        ]
> [New Thread 2 (LWP 2)]
>
> Thread 2 received signal SIGINT, Interrupt.
> [Switching to Thread 1 (LWP 1)]
> 0xfeedca05 in __lwp_wait () from /lib/libc.so.1
>
> ** Trying the same on Linux/x86_64, one sees the same behaviour as on
>    Solaris 9: non-threaded and threaded programs behave differently.
>
> * As you can see, on Solaris 10 even the not explicitly threaded version
>   of the test is shown as threaded, explaining the difference in the
>   "... received signal" messages.
>
>   This is a consequence of the Thread Model Unification Project in
>   Solaris 10, which removed the difference between non-threaded and
>   threaded processes.  This has nothing to do with the removal of the
>   pre-Solaris 9 MxN multilevel thread model as I'd originally
>   suspected.

I tried to take a look at this a little.  The only Solaris machines I
have access to run on Sparc, not x86-64, but hopefully should still have
much the same behaviour.

I did (eventually) manage to build GDB on one of these machines.  I'm
not sure if I built it wrong, or if the Sparc/Solaris support is just
bad, but GDB was crashing all over the place with assertion failures.

Still, with some persistence I could see the behaviour you observe.

Now, I've not done any Solaris work in >10 years, so I don't claim to be
any kind of expert, but I wonder if the fix you're proposing here isn't
simply hiding a GDB bug.

I wrote a simple test program that starts 3 worker threads and then
blocks.  Here's the 'info threads' output for GNU/Linux:

  (gdb) info threads 
    Id   Target Id                                   Frame 
  * 1    Thread 0x7ffff7da3740 (LWP 2243115) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
    2    Thread 0x7ffff7da2700 (LWP 2243118) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
    3    Thread 0x7ffff75a1700 (LWP 2243119) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6
    4    Thread 0x7ffff6da0700 (LWP 2243120) "thr.x" 0x00007ffff7e74215 in nanosleep () from /lib64/libc.so.6

What you'd expect.  Now here's the same on Solaris:

  (gdb) info threads
    Id   Target Id         Frame 
  * 1    LWP    1          0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    2    LWP    4          0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    3    LWP    3          0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    4    LWP    2          0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    5    Thread 1 (LWP 1)  0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    6    Thread 2 (LWP 2)  0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    7    Thread 3 (LWP 3)  0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1
    8    Thread 4 (LWP 4)  0xfee4ddd4 in ___nanosleep () from /lib/libc.so.1

This is in line with what you describe, but, I think we can all agree,
this seems a little odd; are there really 8 thread-like things running
as part of this process?  The output of `ps -aL` would suggest not:

  $ ps -aL
     PID   LWP TTY        LTIME CMD
    3855     1 pts/6       0:00 thr.x
    3855     2 pts/6       0:00 thr.x
    3855     3 pts/6       0:00 thr.x
    3855     4 pts/6       0:00 thr.x
    4132     1 pts/8       0:00 ps

And also, when I run the same test application using the dbx debugger, I
see this:

  (dbx) threads
  *>    t@1  a  l@1   ?()   signal SIGINT in  ___nanosleep() 
        t@2  a  l@2   thread_worker()   running          in  ___nanosleep() 
        t@3  a  l@3   thread_worker()   running          in  ___nanosleep() 
        t@4  a  l@4   thread_worker()   running          in  ___nanosleep() 

So here, the process is represented as just 4 thread-like things.

So, why does GDB think there are 8, while every tool that ships with
Solaris seems to think there are 4?  My guess is that it has something
to do with the thread lookup code in sol-thread.c, and/or the operation
of libthread_db.

So, when I run your original selfkill test application, and use GDB to
break on GDB's add_thread_with_info function (the thing that is
responsible for printing the "New Thread ..." message), here's what I
see:

  (gdb) bt
  #0  add_thread_with_info (targ=targ@entry=0x940678 <the_procfs_target>, ptid=..., priv=priv@entry=0x0) at ../../src/gdb/thread.c:290
  #1  0x0053b61c in add_thread (targ=0x940678 <the_procfs_target>, ptid=...) at ../../src/gdb/thread.c:305
  #2  0x004ab5f4 in sol_thread_target::wait (this=<optimized out>, ptid=..., ourstatus=0xffbff620, options=...) at ../../src/gdb/sol-thread.c:459
  #3  0x0053019c in target_wait (ptid=..., status=status@entry=0xffbff620, options=...) at ../../src/gdb/target.c:2598
  #4  0x00395478 in do_target_wait_1 (inf=inf@entry=0x969288, ptid=..., status=status@entry=0xffbff620, options=<error reading variable: Cannot access memory at address 0x0>) at ../../src/gdb/infrun.c:3763
  #5  0x003a7e8c in <lambda(inferior*)>::operator() (inf=0x969288, __closure=<synthetic pointer>) at ../../src/gdb/infrun.c:3822
  #6  do_target_wait (options=..., ecs=0xffbff600) at ../../src/gdb/infrun.c:3841
  #7  fetch_inferior_event () at ../../src/gdb/infrun.c:4201
  #8  0x001b0bd8 in check_async_event_handlers () at ../../src/gdb/async-event.c:337
  #9  0x006c4e3c in gdb_do_one_event (mstimeout=mstimeout@entry=-1) at ../../src/gdbsupport/event-loop.cc:221
  #10 0x003d7ea0 in start_event_loop () at ../../src/gdb/main.c:411
  #11 captured_command_loop () at ../../src/gdb/main.c:471
  #12 0x003d9fa8 in captured_main (data=0xffbff84c) at ../../src/gdb/main.c:1330
  #13 gdb_main (args=args@entry=0xffbff84c) at ../../src/gdb/main.c:1345
  #14 0x006f7c5c in main (argc=4, argv=0xffbff8bc) at ../../src/gdb/gdb.c:32
  (gdb) frame 2
  #2  0x004ab5f4 in sol_thread_target::wait (this=<optimized out>, ptid=..., ourstatus=0xffbff620, options=...) at ../../src/gdb/sol-thread.c:459
  459                   add_thread (proc_target, rtnval);
  (gdb) p rtnval
  $1 = {m_pid = 7218, m_lwp = 0, m_tid = 1}
  (gdb) p current_inferior_.m_obj->thread_list.m_front.ptid 
  $2 = {m_pid = 7218, m_lwp = 1, m_tid = 0}
  (gdb) 

What this is telling us is that when GDB stopped after the ::wait
call, the ptid_t it got back was '{m_pid = 7218, m_lwp = 0, m_tid = 1}',
whereas the original thread that GDB found when starting the
application was '{m_pid = 7218, m_lwp = 1, m_tid = 0}'.

This difference is what causes GDB to add the new thread.

My guess is that this m_lwp/m_tid difference is a bug somewhere in the
stack, and that really, we should be seeing the same ptid_t here.  If we
did, then GDB would not add the new thread, and the test messages would
not change.
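
To make the point concrete, here's a minimal sketch in plain C.  This is
not GDB's code: 'struct ptid', 'ptid_equal' and 'thread_known' are
simplified stand-ins invented for this mail, and only the three field
names and the values used in the note below come from the session above.
It shows that two ids with the same pid never match if the lwp and tid
fields are swapped around, so a lookup in the thread list fails:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for GDB's ptid_t (an assumption for illustration):
   the same three fields that 'p rtnval' printed above.  */
struct ptid
{
  int pid;
  long lwp;
  unsigned long tid;
};

/* Two ids are equal only if all three fields match.  */
bool
ptid_equal (struct ptid a, struct ptid b)
{
  return a.pid == b.pid && a.lwp == b.lwp && a.tid == b.tid;
}

/* Hypothetical helper: return true if PTID is one of the COUNT entries
   in LIST, i.e. the thread is already known.  */
bool
thread_known (const struct ptid *list, int count, struct ptid ptid)
{
  assert (count >= 0);

  for (int i = 0; i < count; i++)
    if (ptid_equal (list[i], ptid))
      return true;

  return false;
}
```

With a list holding only {7218, 1, 0}, thread_known returns false for
{7218, 0, 1}.  That corresponds to the case in the backtrace above
where sol_thread_target::wait goes on to call add_thread.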

What are your thoughts on this analysis?

Thanks,
Andrew