public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Aditya Kamath1 <Aditya.Kamath1@ibm.com>
To: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>,
	"simark@simark.ca" <simark@simark.ca>,
	"gdb-patches@sourceware.org" <gdb-patches@sourceware.org>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch
Date: Thu, 8 Dec 2022 10:28:31 +0000	[thread overview]
Message-ID: <CH2PR15MB35447EE88C64F7F36A0C61F5D61D9@CH2PR15MB3544.namprd15.prod.outlook.com> (raw)
In-Reply-To: <8302c3570292b864ab21176e58bdee546f6e4544.camel@de.ibm.com>


[-- Attachment #1.1: Type: text/plain, Size: 28014 bytes --]

Hi Ulrich and community,

Please find the new patch [See:- 0001-Fix-multi-thread-bug-in-AIX.patch ]

I have moved get_signaled_thread () function which tells us which is the kernel thread that caused an event due to which the debugger had to wait to rs6000-aix-nat.c.

So, we figure out which kernel thread might have caused an event in the rs6000-aix-nat.c code itself. If let us say the main thread is pthreaded or a new user thread is being created or user thread is deleted, then I have taken care of what to return in sync_threadlists () code itself.

>>So, if you observe output 3 or 4, the program first multi threads,
>>I mean thread events are handled first and then the threads fork.
>>So, when this happens, I cannot return ptid_t (parent_pid). If I do
>>so, the GDB core will treat it as a new process and add it in my
>>threadlist as say process 100 despite existence of 'thread 1'
>>representing the same. So, I need to correctly send which thread
>>did the fork () event or which thread of the process is the one who
>>gave birth to a new inferior process [say 2 or 3 in output 3 below],
>>I mean which thread caused the mult process event when the process
>>is mutli threaded. This has to handled here as control from target.c
>>comes directly to rs6000-aix-nat::wait and not through
>>aix-thread.c::wait since fork () is a process event..

>So this last bit seems to be the problem.  Could you elaborate on
>what the exact call stack is?  I thought once the thread layer is
>initialized, calls to ::wait should always go through it ...

Kindly see the backtrace sections

  *   BT:- Thread_wait [which is on a thread event like new thread born or main process is pthreaded],
  *   BT:- Post thread wait in rs6000-aix-nat::wait  [which is the beneath ()->wait () in aix_thread_target::wait],
  *   BT:- If direct rs6000-aix-nat::wait [ where in output 3 and 4 {below in this email} you can see it will directly come to rs6000-aix-nat.c if the main process after having threads forks or uses a fork () call ] pasted below in this email.

>So the way this works e.g. on Linux is that the process layer handles
>both processes and the *kernel* aspect of threads, while the thread
>layer handles the *user-space* (libc/libpthread) aspect of threads.

>In terms of the GDB ptid_t, this means that both the "pid" and "lwp"
>field are "owned" by the process layer (which would be rs6000-aix-nat.c
>in your case), while only the "tid" field is owned by the thread
>layer (which would be aix-thread.c).

>Linux does that because it allows correctly debugging programs that
>only use the kernel threading capabilities without using libpthread,
>e.g. by directly calling the "clone" system call and not "pthread_create".
>Such threads won't be in the thread list managed by the user space
>library, but are still handled by the process layer in GDB, tracked
>as lwp without associated tid.

>Not sure if something like that is even possible in AIX.  If it does
>make sense to handle things similarly in AIX (one other reason would
>be ptrace commands that require LWPs, e.g. like the VSX register
>access you had in another thread), some code would indeed need
>to move, e.g. everything related to accessing *kernel* threads
>(fetch_regs_kernel_thread etc.), while code that accesses *user*
>threads via the libpthread accessors (fetch_regs_user_thread etc.)
>would still remain in aix-thread.c.

With this patch I have moved all my lwp checks in rs6000-aix-nat.c file and user thread things in aix-thread.c.. Yes, this will help us in the vector patch.

>>While debugged in depth last two days I realised our pid_to_str
>>is needed in rs6000-aix-nat.c as control comes here in search of it.
>>If it doesn't GDB treats all threads as process.

>This is again very suspicious.  We obviously already have
>threads, so the thread layer should be initialized.  This
>means that any "pid_to_str" call should go through the
>*thread* layer (implementation in aix-thread.c).  If that
>doesn't happen, we should understand why.  (This may be the
>same problem that causes "wait" to be called from the
>wrong layer, as seen above.)

Kindly check the backtrace section [ BT:- pid_to_str stack ] below this email. So, what is happening is a thread event will come through threads and a process even will come through process layer. For example, while I press an interrupt key [Ctrl+c] in a multi process scenario, for the GDB core knowing which process is needed. By looking at the stack, it is built assuming the target will figure out the kernel thread that eventually caused this event in the process layer.

Secondly kindly look at aix-thread.c:pid_to_str. We have a beneath()->pid_to_str () there in case the process is not threaded. So, we need one in the rs6000-aix-nat.c.

aix_thread_target::pid_to_str (ptid_t ptid)

{

  if (!PD_TID (ptid))

    return beneath ()->pid_to_str (ptid);


  return string_printf (_("Thread %s"), pulongest (ptid.tid ()));

}

>>I have added an assertion here just
>>to be sure. I get what you are thinking.
>Having an assertion is of course good, but it isn't obvious to
>me that this never can be hit.

So, while I ran a few unit tests I did not find any case where we might end swapping the pid. So, I added the same so that if anyone hits this in the future, we are aware and can change accordingly.

>The point is if GDB stops because the target received a signal, it
>should automatically switch to the particular thread where the signal
>was in fact received.  I don't think this will actually happen in all
>cases with the current code.

>Shouldn't you instead check for *any* signal in get_signaled_thread?

Yes, kindly check the get_signaled_thread_rs6000 ().

----------------------------------------------------

These are the changes I made thinking about how we can handle that get_signaled thread in one place. I have also attached the outputs and programs below.

Also, now we pass ptid in some functions instead of pid in aix-thread.c.

Kindly let me know what you think.

Have a nice day ahead.

Thanks and regards,
Aditya.



---------------------------------------------------------------------------------------------

BT:- Thread_wait


Thread 1 hit Breakpoint 1, aix_thread_target::wait (this=0x11001f758 <_aixthread.rw_+24>, ptid=...,

    status=0xffffffffffff360, options=...) at aix-thread.c:1051

1051        pid_to_prc (&ptid);

(gdb) bt

#0  aix_thread_target::wait (this=0x11001f758 <_aixthread.rw_+24>, ptid=..., status=0xffffffffffff360,

    options=...) at aix-thread.c:1051

#1  0x0000000100340778 in target_wait (ptid=..., status=0xffffffffffff360, options=...) at target.c:2598

#2  0x000000010037f158 in do_target_wait_1 (inf=0x1101713f0, ptid=..., status=0xffffffffffff360,

    options=...) at infrun.c:3763

#3  0x000000010037f41c in <lambda(inferior*)>::operator()(inferior *) const (

    __closure=0xffffffffffff130, inf=0x1101713f0) at infrun.c:3822

#4  0x000000010037f85c in do_target_wait (ecs=0xffffffffffff338, options=...) at infrun.c:3841

#5  0x0000000100380cc8 in fetch_inferior_event () at infrun.c:4201

#6  0x0000000100a1e354 in inferior_event_handler (event_type=INF_REG_EVENT) at inf-loop.c:41

#7  0x0000000100392700 in infrun_async_inferior_event_handler (data=0x0) at infrun.c:9555

#8  0x0000000100677d88 in check_async_event_handlers () at async-event.c:337

#9  0x000000010067439c in gdb_do_one_event (mstimeout=-1) at event-loop.cc:221

#10 0x0000000100001dd0 in start_event_loop () at main.c:411

#11 0x0000000100001fd8 in captured_command_loop () at main.c:471

#12 0x0000000100004150 in captured_main (data=0xffffffffffff9f0) at main.c:1330

#13 0x0000000100004224 in gdb_main (args=0xffffffffffff9f0) at main.c:1345

#14 0x0000000100000aa0 in main (argc=2, argv=0xffffffffffffa90) at gdb.c:32

------------------------------------------------------------

BT:- Post thread wait in rs6000-aix-nat::wait


(gdb) c

Continuing.


Thread 1 hit Breakpoint 2, rs6000_nat_target::wait (this=0x1100a2e10 <_rs6000aixnat.rw_>, ptid=...,

    ourstatus=0xffffffffffff360, options=...) at rs6000-aix-nat.c:695

695           set_sigint_trap ();

(gdb) bt

#0  rs6000_nat_target::wait (this=0x1100a2e10 <_rs6000aixnat.rw_>, ptid=...,

    ourstatus=0xffffffffffff360, options=...) at rs6000-aix-nat.c:695

#1  0x0000000100599d68 in aix_thread_target::wait (this=0x11001f758 <_aixthread.rw_+24>, ptid=...,

    status=0xffffffffffff360, options=...) at aix-thread.c:1053

#2  0x0000000100340778 in target_wait (ptid=..., status=0xffffffffffff360, options=...) at target.c:2598

#3  0x000000010037f158 in do_target_wait_1 (inf=0x1101713f0, ptid=..., status=0xffffffffffff360,

    options=...) at infrun.c:3763

#4  0x000000010037f41c in <lambda(inferior*)>::operator()(inferior *) const (

    __closure=0xffffffffffff130, inf=0x1101713f0) at infrun.c:3822

#5  0x000000010037f85c in do_target_wait (ecs=0xffffffffffff338, options=...) at infrun.c:3841

#6  0x0000000100380cc8 in fetch_inferior_event () at infrun.c:4201

#7  0x0000000100a1e354 in inferior_event_handler (event_type=INF_REG_EVENT) at inf-loop.c:41

#8  0x0000000100392700 in infrun_async_inferior_event_handler (data=0x0) at infrun.c:9555

#9  0x0000000100677d88 in check_async_event_handlers () at async-event.c:337

#10 0x000000010067439c in gdb_do_one_event (mstimeout=-1) at event-loop.cc:221

#11 0x0000000100001dd0 in start_event_loop () at main.c:411

#12 0x0000000100001fd8 in captured_command_loop () at main.c:471

#13 0x0000000100004150 in captured_main (data=0xffffffffffff9f0) at main.c:1330

#14 0x0000000100004224 in gdb_main (args=0xffffffffffff9f0) at main.c:1345

#15 0x0000000100000aa0 in main (argc=2, argv=0xffffffffffffa90) at gdb.c:32

-----------------------------------------------------------------------------------------------------
BT:- If direct rs6000-aix-nat::wait


Thread 1 hit Breakpoint 2, rs6000_nat_target::wait (this=0x1100a2e10 <_rs6000aixnat.rw_>, ptid=...,

    ourstatus=0xffffffffffff360, options=...) at rs6000-aix-nat.c:695

695           set_sigint_trap ();

(gdb) bt

#0  rs6000_nat_target::wait (this=0x1100a2e10 <_rs6000aixnat.rw_>, ptid=...,

    ourstatus=0xffffffffffff360, options=...) at rs6000-aix-nat.c:695

#1  0x0000000100340778 in target_wait (ptid=..., status=0xffffffffffff360, options=...) at target.c:2598

#2  0x000000010037f158 in do_target_wait_1 (inf=0x1105f4430, ptid=..., status=0xffffffffffff360,

    options=...) at infrun.c:3763

#3  0x000000010037f41c in <lambda(inferior*)>::operator()(inferior *) const (

    __closure=0xffffffffffff130, inf=0x1105f4430) at infrun.c:3822

#4  0x000000010037f85c in do_target_wait (ecs=0xffffffffffff338, options=...) at infrun.c:3841

#5  0x0000000100380cc8 in fetch_inferior_event () at infrun.c:4201

#6  0x0000000100a1e354 in inferior_event_handler (event_type=INF_REG_EVENT) at inf-loop.c:41

#7  0x0000000100392700 in infrun_async_inferior_event_handler (data=0x0) at infrun.c:9555

#8  0x0000000100677d88 in check_async_event_handlers () at async-event.c:337

#9  0x000000010067439c in gdb_do_one_event (mstimeout=-1) at event-loop.cc:221

#10 0x0000000100001dd0 in start_event_loop () at main.c:411

#11 0x0000000100001fd8 in captured_command_loop () at main.c:471

#12 0x0000000100004150 in captured_main (data=0xffffffffffff9f0) at main.c:1330

#13 0x0000000100004224 in gdb_main (args=0xffffffffffff9f0) at main.c:1345

#14 0x0000000100000aa0 in main (argc=2, argv=0xffffffffffffa90) at gdb.c:32

---------------------------------------------------------

BT:- pid_to_str stack


(gdb) bt

#0  rs6000_nat_target::pid_to_str[abi:cxx11](ptid_t) (this=0x1100a2e10 <_rs6000aixnat.rw_>, ptid=...)

    at rs6000-aix-nat.c:674

#1  0x00000001003409ec in target_pid_to_str[abi:cxx11](ptid_t) (ptid=...) at target.c:2623

#2  0x000000010038fc08 in normal_stop () at infrun.c:8697

#3  0x0000000100380ff4 in fetch_inferior_event () at infrun.c:4266

#4  0x0000000100a1e354 in inferior_event_handler (event_type=INF_REG_EVENT) at inf-loop.c:41

#5  0x0000000100392700 in infrun_async_inferior_event_handler (data=0x0) at infrun.c:9555

#6  0x0000000100677d88 in check_async_event_handlers () at async-event.c:337

#7  0x000000010067439c in gdb_do_one_event (mstimeout=-1) at event-loop.cc:221

#8  0x0000000100001dd0 in start_event_loop () at main.c:411

#9  0x0000000100001fd8 in captured_command_loop () at main.c:471

#10 0x0000000100004150 in captured_main (data=0xffffffffffff9f0) at main.c:1330

#11 0x0000000100004224 in gdb_main (args=0xffffffffffff9f0) at main.c:1345

#12 0x0000000100000aa0 in main (argc=2, argv=0xffffffffffffa90) at gdb.c:32


---------------------------------------------------------------------


Program1:- [Credits gdb.threads/continuous-pending.c]


#include <stdio.h>

#include <unistd.h>

#include <stdlib.h>

#include <pthread.h>

#include <assert.h>


pthread_barrier_t barrier;


#define NUM_THREADS 3


void *

thread_function (void *arg)

{

  /* This ensures that the breakpoint is only hit after both threads

     are created, so the test can always switch to the non-event

     thread when the breakpoint triggers.  */

  pthread_barrier_wait (&barrier);


  while (1); /* break here */

}


int

main (void)

{

  int i;


  alarm (300);


  pthread_barrier_init (&barrier, NULL, NUM_THREADS);


  for (i = 0; i < NUM_THREADS; i++)

    {

      pthread_t thread;

      int res;


      res = pthread_create (&thread, NULL,

                            thread_function, NULL);

      assert (res == 0);

    }


  while (1)

    sleep (1);


  return 0;

}



----------------------------------------------------------------
Output1:- Single process


Reading symbols from /home/aditya/gdb_tests/continue-pending-status...

(gdb) r

Starting program: /home/aditya/gdb_tests/continue-pending-status

^C[New Thread 258]

[New Thread 515]

[New Thread 772]


Thread 3 received signal SIGINT, Interrupt.

[Switching to Thread 515]

thread_function (arg=0x0)

    at /home/aditya/gdb_tests/continue-pending-status.c:36

36        while (1); /* break here */

(gdb) info threads

  Id   Target Id                          Frame

  1    Thread 1 (tid 24838585, running)   warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)


  2    Thread 258 (tid 23134635, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)


0x0)

    at /home/aditya/gdb_tests/continue-pending-status.c:36

* 3    Thread 515 (tid 30146867, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)


0x0)

    at /home/aditya/gdb_tests/continue-pending-status.c:36

  4    Thread 772 (tid 27853165, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)


0x0)

    at /home/aditya/gdb_tests/continue-pending-status.c:36

---------------------------------------------------------------------------------

Program 2:- Multi process Code


#include <stdio.h>

#include <unistd.h>

#include <stdlib.h>

#include <pthread.h>

#include <assert.h>


pthread_barrier_t barrier;


#define NUM_THREADS 2


void *

thread_function (void *arg)

{

  /* This ensures that the breakpoint is only hit after both threads

     are created, so the test can always switch to the non-event

     thread when the breakpoint triggers.  */


  pthread_barrier_wait (&barrier);

  pid_t child;


  child = fork ();

  if (child > 0)

    printf ("I am parent \n");

  else{

    printf (" Iam child \n");

    child = fork ();

    if (child > 0)

      printf ("From child I became a parent \n");

    else

      printf ("I am grandchild \n");

  }

  while (1); /* break here */

}


int

main (void)

{

  int i;


  alarm (300);


  pthread_barrier_init (&barrier, NULL, NUM_THREADS);


  for (i = 0; i < NUM_THREADS; i++)

    {

      pthread_t thread;

      int res;


      res = pthread_create (&thread, NULL,

                            thread_function, NULL);

      assert (res == 0);

    }


  while (1)

  {

    sleep (15);

    break;

  }


  return 0;

}


-------------------------------------------------------------------------

Output 2:- With detach-on-fork on


Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork...

(gdb) r

Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork

[New Thread 258]

[New Thread 515]

[Detaching after fork from child process 8323572]

 Iam child

I am grandchild

From child I became a parent

I am parent

[Detaching after fork from child process 11665884]

 Iam child

I am grandchild

From child I became a parent

I am parent

^C

Thread 2 received signal SIGINT, Interrupt.

[Switching to Thread 258]

thread_function (arg=0x0)

    at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32

32        while (1); /* break here */

(gdb) info threads

  Id   Target Id                          Frame

  1    Thread 1 (tid 27263269, running)   warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)


* 2    Thread 258 (tid 28705075, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)


0x0)

    at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32

  3    Thread 515 (tid 27853169, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)


0x0)

    at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32


-------------------------------------------------------------------------

Output 3:- With detach-on-fork off


Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork...

(gdb) set detach-on-fork off

(gdb) r

Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork

[New Thread 258]

[New Thread 515]

[New inferior 2 (Process 15466928)]

[New inferior 3 (Process 13894048)]

I am parent

I am parent

^C

Thread 1.1 received signal SIGINT, Interrupt.

[Switching to Thread 1]

0xd0595fb0 in _p_nsleep () from /usr/lib/libpthread.a(shr_xpg5.o)

(gdb) info threads

  Id   Target Id         Frame

* 1.1  Thread 1          0xd0595fb0 in _p_nsleep ()

   from /usr/lib/libpthread.a(shr_xpg5.o)

  1.2  Thread 258        0xd0595fb0 in _p_nsleep ()

   from /usr/lib/libpthread.a(shr_xpg5.o)

  1.3  Thread 515        0xd0595fb0 in _p_nsleep ()

   from /usr/lib/libpthread.a(shr_xpg5.o)

  2.1  Process 15466928  0xd0594fc8 in ?? ()

  3.1  Process 13894048  0xd0594fc8 in ?? ()

--------------------------------------------------

Output 4:- detach fork off and following child


Reading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork...

(gdb) set detach-on-fork off

(gdb) set follow-fork-mode child

(gdb) r

Starting program: /home/aditya/gdb_tests/ultimate-multi-thread-fork

[New Thread 258]

[New Thread 515]

[Attaching after Thread 515 fork to child Process 13894050]

[New inferior 2 (Process 13894050)]

 Iam child

[Attaching after Process 13894050 fork to child Process 11010474]

[New inferior 3 (Process 11010474)]

I am grandchild

^CReading symbols from /home/aditya/gdb_tests/ultimate-multi-thread-fork...


Thread 3.1 received signal SIGINT, Interrupt.

[Switching to Process 11010474]

thread_function (arg=0x0)

    at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32

32        while (1); /* break here */

(gdb) info threads

  Id   Target Id         Frame

  1.1  Thread 1          0xd0594fc8 in _sigsetmask ()

   from /usr/lib/libpthread.a(shr_xpg5.o)

  1.2  Thread 258        0xd0594fc8 in _sigsetmask ()

   from /usr/lib/libpthread.a(shr_xpg5.o)

  1.3  Thread 515        0xd0594fc8 in _sigsetmask ()

   from /usr/lib/libpthread.a(shr_xpg5.o)

  2.1  Process 13894050  0xd0594fc8 in ?? ()

* 3.1  Process 11010474  thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)


0x0)

    at /home/aditya/gdb_tests/ultimate-multi-thread-fork.c:32




________________________________
From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Sent: 06 December 2022 00:03
To: simark@simark.ca <simark@simark.ca>; Aditya Kamath1 <Aditya.Kamath1@ibm.com>; gdb-patches@sourceware.org <gdb-patches@sourceware.org>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch

Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>>I'm not sure why it is necessary to handle this in the process layer
>>(rs6000-aix-nat.c) instead of the thread layer (aix-thread.c).
>>What specifically breaks if you do not have these rs6000-aix-nat.c
>>changes?
>
>So, if you observe output 3 or 4, the program first multi threads,
>I mean thread events are handled first and then the threads fork.
>So, when this happens, I cannot return ptid_t (parent_pid). If I do
>so, the GDB core will treat it as a new process and add it in my
>threadlist as say process 100 despite existence of 'thread 1'
>representing the same. So, I need to correctly send which thread
>did the fork () event or which thread of the process is the one who
>gave birth to a new inferior process [say 2 or 3 in output 3 below],
>I mean which thread caused the mult process event when the process
>is mutli threaded. This has to handled here as control from target.c
>comes directly to rs6000-aix-nat::wait and not through
>aix-thread.c::wait since fork () is a process event..

So this last bit seems to be the problem.  Could you elaborate on
what the exact call stack is?  I thought once the thread layer is
initialized, calls to ::wait should always go through it ...

>>If you *do* need to handle LWPs (kernel thread IDs) in the process
>>layer (this can be a reasonable choice, and it done by several other
>>native targets), then it should be *consistent*, and *all* LWP handling
>>should be in the process layer. In particular, under no circumstances
>>does it make sense to duplicate the "find current/signalled thread"
>>code in *both* the process any thread layers.
>
>This not straightforward to do. The reason being say our application is pthreaded
>We need our sync_threadlists() code to detect multiple threads and sync..
>We cannot handle this in rs6000-aix-nat.c with the current design of the code..
>Let's say child process is multi-threaded things can get complex..
>It will require us to move that whole GDB list and Pthread list sync code to
>rs6000-aix-nat.c code. The essence or most selling product or the USP
>[Unique Selling Proposition] of aix-thread.c code will be lost.

So the way this works e.g. on Linux is that the process layer handles
both processes and the *kernel* aspect of threads, while the thread
layer handles the *user-space* (libc/libpthread) aspect of threads.

In terms of the GDB ptid_t, this means that both the "pid" and "lwp"
field are "owned" by the process layer (which would be rs6000-aix-nat.c
in your case), while only the "tid" field is owned by the thread
layer (which would be aix-thread.c).

Linux does that because it allows correctly debugging programs that
only use the kernel threading capabilities without using libpthread,
e.g. by directly calling the "clone" system call and not "pthread_create".
Such threads won't be in the thread list managed by the user space
library, but are still handled by the process layer in GDB, tracked
as lwp without associated tid.

Not sure if something like that is even possible in AIX.  If it does
make sense to handle things similarly in AIX (one other reason would
be ptrace commands that require LWPs, e.g. like the VSX register
access you had in another thread), some code would indeed need
to move, e.g. everything related to accessing *kernel* threads
(fetch_regs_kernel_thread etc.), while code that accesses *user*
threads via the libpthread accessors (fetch_regs_user_thread etc.)
would still remain in aix-thread.c.


>>>[Switching to process 16777620]
>
>>This outputs inferior_ptid ...
>
>Yes, you were right
>
>>>* 1.1  process 16777620  0xd0595fb0 in _p_nsleep ()
>>>   from /usr/lib/libpthread.a(shr_xpg5.o)
>>>  1.2  process 16777620  0xd0595fb0 in _p_nsleep ()
>>>   from /usr/lib/libpthread.a(shr_xpg5.o)
>>>  1.3  process 16777620  0xd0595fb0 in _p_nsleep ()
>>>   from /usr/lib/libpthread.a(shr_xpg5.o)
>>>  2.1  process 8323570   0xd0594fc8 in ?? ()
>>>  3.1  process 17957172  0xd0594fc8 in ?? ()
>
>>... and this outputs the ptid values for those threads.
>
>>If it says "process ...", then those ptid values have not
>>properly been switched over to the (pid, lwp, tid) format.

>While debugged in depth last two days I realised our pid_to_str
>is needed in rs6000-aix-nat.c as control comes here in search of it.
>If it doesn't GDB treats all threads as process.

This is again very suspicious.  We obviously already have
threads, so the thread layer should be initialized.  This
means that any "pid_to_str" call should go through the
*thread* layer (implementation in aix-thread.c).  If that
doesn't happen, we should understand why.  (This may be the
same problem that causes "wait" to be called from the
wrong layer, as seen above.)


>>You should verify that the sync_threadlists code handles
>>all multi-process cases correctly.  I haven't looked at
>>this in detail, but are you sure that here:
>
>>>@@ -841,8 +829,22 @@ sync_threadlists (int pid)
> >>            }
> >>          else if (cmp_result > 0)
> >>            {
>>>-             delete_thread (gbuf[gi]);
>
>
>>you never accidentally switch the *pid* part (if "gptid"
>>belows to a different pid than "pptid")?
>
>So, this is not the reason. I have added an assertion here just
>to be sure. I get what you are thinking.

Having an assertion is of course good, but it isn't obvious to
me that this never can be hit.


>>Hmm.  So when "wait" returns, it needs to determine which thread
>>triggered the event that caused ptrace to stop.  On Linux, "wait"
>>will actually return the LWP of that thread, so it can be directly
>>used.  It seems on AIX, "wait" only returns a PID, and you do not
>>immediately know which thread caused the event?
>
>>In that case, I can see why you'd have to consider SIGINT as well
>>as SIGTRAP. However, it seems to me that even those two are not the
>>*only* cases that can cause "wait" to return - doesn't *any* signal
>>(potentially) trigger a ptrace intercept (causing wait to return)?
>
>>But that's probably a more general problem, and wouldn't occur in
>>this simple test case.
>
>Exactly. So I tried debugging few examples causing a few other signals
>as mentioned in this document [https://www.ibm.com/docs/en/sdk-java-technology/8?topic=reference-signal-handling].
>In AIX we have most of them mentioned in the link. It does not block
>us from doing things or crashes incase of a segment fault signal
>[from our debugger code]. Abort also works fine. Let me know what you think.

The point is if GDB stops because the target received a signal, it
should automatically switch to the particular thread where the signal
was in fact received.  I don't think this will actually happen in all
cases with the current code.

Shouldn't you instead check for *any* signal in get_signaled_thread?


Bye,
Ulrich


[-- Attachment #2: 0001-Fix-multi-thread-bug-in-AIX.patch --]
[-- Type: application/octet-stream, Size: 12381 bytes --]

From bba91f717b7779f39d71282835cceaaeda7ef588 Mon Sep 17 00:00:00 2001
From: Aditya Vidyadhar Kamath <Aditya.Kamath1@ibm.com>
Date: Thu, 8 Dec 2022 01:03:58 -0600
Subject: [PATCH] Fix multi thread bug in AIX

---
 gdb/aix-thread.c     | 149 ++++++++++++++++---------------------------
 gdb/rs6000-aix-nat.c |  66 ++++++++++++++++++-
 2 files changed, 118 insertions(+), 97 deletions(-)

diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index e556c153576..6e6f8619b64 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -508,14 +508,13 @@ pdc_read_data (pthdb_user_t user_current_pid, void *buf,
   /* This is needed to eliminate the dependency of current thread
      which is null so that thread reads the correct target memory.  */
   {
-    scoped_restore_current_thread restore_current_thread;
+    scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);
     /* Before the first inferior is added, we pass inferior_ptid.pid ()
        from pd_enable () which is 0.  There is no need to switch threads
        during first initialisation.  In the rest of the callbacks the
        current thread needs to be correct.  */
     if (user_current_pid != 0)
-      switch_to_thread (current_inferior ()->process_target (),
-			ptid_t (user_current_pid));
+      inferior_ptid = ptid_t (user_current_pid);
     status = target_read_memory (addr, (gdb_byte *) buf, len);
   }
   ret = status == 0 ? PDC_SUCCESS : PDC_FAILURE;
@@ -639,36 +638,24 @@ pcmp (const void *p1v, const void *p2v)
   return p1->pthid < p2->pthid ? -1 : p1->pthid > p2->pthid;
 }
 
-/* iterate_over_threads() callback for counting GDB threads.
+/* iterate_over_threads() callback for counting GDB threads.  */
 
-   Do not count the main thread (whose tid is zero).  This matches
-   the list of threads provided by the pthreaddebug library, which
-   does not include that main thread either, and thus allows us
-   to compare the two lists.  */
 
 static int
 giter_count (struct thread_info *thread, void *countp)
 {
-  if (PD_TID (thread->ptid))
-    (*(int *) countp)++;
+  (*(int *) countp)++;
   return 0;
 }
 
-/* iterate_over_threads() callback for accumulating GDB thread pids.
-
-   Do not include the main thread (whose tid is zero).  This matches
-   the list of threads provided by the pthreaddebug library, which
-   does not include that main thread either, and thus allows us
-   to compare the two lists.  */
+/* iterate_over_threads() callback for accumulating GDB thread pids.  */
 
 static int
 giter_accum (struct thread_info *thread, void *bufp)
 {
-  if (PD_TID (thread->ptid))
-    {
-      **(struct thread_info ***) bufp = thread;
-      (*(struct thread_info ***) bufp)++;
-    }
+  **(struct thread_info ***) bufp = thread;
+  (*(struct thread_info ***) bufp)++;
+    
   return 0;
 }
 
@@ -703,30 +690,6 @@ gcmp (const void *t1v, const void *t2v)
   return ptid_cmp (t1->ptid, t2->ptid);
 }
 
-/* Search through the list of all kernel threads for the thread
-   that has stopped on a SIGTRAP signal, and return its TID.
-   Return 0 if none found.  */
-
-static pthdb_tid_t
-get_signaled_thread (int pid)
-{
-  struct thrdsinfo64 thrinf;
-  tid_t ktid = 0;
-
-  while (1)
-    {
-      if (getthrds (pid, &thrinf,
-		    sizeof (thrinf), &ktid, 1) != 1)
-	break;
-
-      if (thrinf.ti_cursig == SIGTRAP)
-	return thrinf.ti_tid;
-    }
-
-  /* Didn't find any thread stopped on a SIGTRAP signal.  */
-  return 0;
-}
-
 /* Synchronize GDB's thread list with libpthdebug's.
 
    There are some benefits of doing this every time the inferior stops:
@@ -740,8 +703,8 @@ get_signaled_thread (int pid)
      - simplifies the demands placed on libpthdebug, which seems to
        have difficulty with certain call patterns */
 
-static void
-sync_threadlists (int pid)
+static ptid_t 
+sync_threadlists (ptid_t ptid)
 {
   int cmd, status;
   int pcount, psize, pi, gcount, gi;
@@ -750,6 +713,15 @@ sync_threadlists (int pid)
   pthdb_pthread_t pdtid;
   pthread_t pthid;
   pthdb_tid_t tid;
+  process_stratum_target *proc_target
+	= current_inferior ()->process_target ();
+  thread_info *tp;
+  pid_t pid = ptid.pid ();
+
+  /* This ptid should hold the ptid of a new thread 
+     or return the incoming ptid incase of delete thread.  */
+
+  ptid_t post_sync_ptid = ptid;
 
   /* Accumulate an array of libpthdebug threads sorted by pthread id.  */
 
@@ -810,12 +782,10 @@ sync_threadlists (int pid)
 	  priv->pdtid = pbuf[pi].pdtid;
 	  priv->tid = pbuf[pi].tid;
 
-	  process_stratum_target *proc_target
-	    = current_inferior ()->process_target ();
 	  thread = add_thread_with_info (proc_target,
-					 ptid_t (pid, 0, pbuf[pi].pthid),
+					 ptid_t (pid, pbuf[pi].tid, pbuf[pi].pthid),
 					 priv);
-
+	  post_sync_ptid = thread->ptid;
 	  pi++;
 	}
       else
@@ -823,7 +793,7 @@ sync_threadlists (int pid)
 	  ptid_t pptid, gptid;
 	  int cmp_result;
 
-	  pptid = ptid_t (pid, 0, pbuf[pi].pthid);
+	  pptid = ptid_t (pid, pbuf[pi].tid, pbuf[pi].pthid);
 	  gptid = gbuf[gi]->ptid;
 	  pdtid = pbuf[pi].pdtid;
 	  tid = pbuf[pi].tid;
@@ -841,15 +811,31 @@ sync_threadlists (int pid)
 	    }
 	  else if (cmp_result > 0)
 	    {
-	      delete_thread (gbuf[gi]);
-	      gi++;
+	      if (gptid.is_pid ())
+		{
+		  gdb_assert (gptid.pid () == pptid.pid ());
+		  thread_change_ptid (proc_target, gptid, pptid);
+		  aix_thread_info *priv = new aix_thread_info;
+		  priv->pdtid = pbuf[pi].pdtid;
+		  priv->tid = pbuf[pi].tid;
+		  tp = find_thread_ptid (proc_target, pptid);
+		  tp->priv.reset (priv);
+		  pi++;
+		  gi++;
+		  post_sync_ptid = pptid;
+		}
+	      else
+		{
+		  delete_thread (gbuf[gi]);
+		  gi++;
+		}
 	    }
 	  else
 	    {
 	      process_stratum_target *proc_target
 		= current_inferior ()->process_target ();
 	      thread = add_thread (proc_target, pptid);
-
+	      post_sync_ptid = pptid;	
 	      aix_thread_info *priv = new aix_thread_info;
 	      thread->priv.reset (priv);
 	      priv->pdtid = pdtid;
@@ -861,18 +847,7 @@ sync_threadlists (int pid)
 
   xfree (pbuf);
   xfree (gbuf);
-}
-
-/* Iterate_over_threads() callback for locating a thread, using
-   the TID of its associated kernel thread.  */
-
-static int
-iter_tid (struct thread_info *thread, void *tidp)
-{
-  const pthdb_tid_t tid = *(pthdb_tid_t *)tidp;
-  aix_thread_info *priv = get_aix_thread_info (thread);
-
-  return priv->tid == tid;
+  return post_sync_ptid;
 }
 
 /* Synchronize libpthdebug's state with the inferior and with GDB,
@@ -881,33 +856,23 @@ iter_tid (struct thread_info *thread, void *tidp)
    return a pid-only ptid with PID.  */
 
 static ptid_t
-pd_update (int pid)
+pd_update (ptid_t ptid)
 {
   int status;
-  ptid_t ptid;
   pthdb_tid_t tid;
   struct thread_info *thread = NULL;
+  ptid_t post_sync_ptid;
 
   if (!pd_active)
-    return ptid_t (pid);
+    return ptid;
 
   status = pthdb_session_update (pd_session);
   if (status != PTHDB_SUCCESS)
-    return ptid_t (pid);
+    return ptid;
 
-  sync_threadlists (pid);
+  post_sync_ptid = sync_threadlists (ptid);
 
-  /* Define "current thread" as one that just received a trap signal.  */
-
-  tid = get_signaled_thread (pid);
-  if (tid != 0)
-    thread = iterate_over_threads (iter_tid, &tid);
-  if (!thread)
-    ptid = ptid_t (pid);
-  else
-    ptid = thread->ptid;
-
-  return ptid;
+  return post_sync_ptid;
 }
 
 /* Try to start debugging threads in the current process.
@@ -915,19 +880,19 @@ pd_update (int pid)
    for that thread.  Otherwise, return a ptid-only ptid using PID.  */
 
 static ptid_t
-pd_activate (int pid)
+pd_activate (ptid_t ptid)
 {
   int status;
 		
-  status = pthdb_session_init (pid, arch64 ? PEM_64BIT : PEM_32BIT,
+  status = pthdb_session_init (ptid.pid (), arch64 ? PEM_64BIT : PEM_32BIT,
 			       PTHDB_FLAG_REGS, &pd_callbacks, 
 			       &pd_session);
   if (status != PTHDB_SUCCESS)
     {
-      return ptid_t (pid);
+      return ptid;
     }
   pd_active = 1;
-  return pd_update (pid);
+  return pd_update (ptid);
 }
 
 /* Undo the effects of pd_activate().  */
@@ -983,7 +948,7 @@ pd_enable (void)
   /* If we're debugging a core file or an attached inferior, the
      pthread library may already have been initialized, so try to
      activate thread debugging.  */
-  pd_activate (inferior_ptid.pid ());
+  pd_activate (inferior_ptid);
 }
 
 /* Undo the effects of pd_enable().  */
@@ -1091,10 +1056,6 @@ aix_thread_target::wait (ptid_t ptid, struct target_waitstatus *status,
   if (ptid.pid () == -1)
     return ptid_t (-1);
 
-  /* The target beneath does not deal with threads, so it should only return
-     pid-only ptids.  */
-  gdb_assert (ptid.is_pid ());
-
   /* Check whether libpthdebug might be ready to be initialized.  */
   if (!pd_active && status->kind () == TARGET_WAITKIND_STOPPED
       && status->sig () == GDB_SIGNAL_TRAP)
@@ -1106,10 +1067,10 @@ aix_thread_target::wait (ptid_t ptid, struct target_waitstatus *status,
 
       if (regcache_read_pc (regcache)
 	  - gdbarch_decr_pc_after_break (gdbarch) == pd_brk_addr)
-	return pd_activate (ptid.pid ());
+	return pd_activate (ptid);
     }
 
-  return pd_update (ptid.pid ());
+  return pd_update (ptid);
 }
 
 /* Record that the 64-bit general-purpose registers contain VALS.  */
diff --git a/gdb/rs6000-aix-nat.c b/gdb/rs6000-aix-nat.c
index 2ac1f6e70b6..fe7ca4f3f2a 100644
--- a/gdb/rs6000-aix-nat.c
+++ b/gdb/rs6000-aix-nat.c
@@ -99,6 +99,8 @@ class rs6000_nat_target final : public inf_ptrace_target
      support.  */
   void follow_fork (inferior *, ptid_t, target_waitkind, bool, bool) override;
 
+  std::string pid_to_str (ptid_t) override;
+
 protected:
 
   void post_startup_inferior (ptid_t ptid) override;
@@ -619,6 +621,64 @@ rs6000_nat_target::xfer_partial (enum target_object object,
     }
 }
 
+/* Search through the list of all kernel threads for the thread
+   that has stopped on a SIGTRAP or SIGINT signal, and return
+   its TID.  Return 0 if none found.  */
+
+static tid_t
+get_signaled_thread_rs6000 (int pid)
+{
+  struct thrdsinfo64 thrinf;
+  tid_t ktid = 0;
+
+  while (1)
+    {
+      if (getthrds (pid, &thrinf,
+                    sizeof (thrinf), &ktid, 1) != 1)
+        break;
+
+      if (thrinf.ti_cursig != 0)
+        return thrinf.ti_tid;
+    }
+
+  /* Didn't find any thread stopped on a SIGTRAP signal.  */
+  return 0;
+}
+
+/* If my process is pthreaded I need to return that ptid else ptid_t
+   (pid).  */
+
+static ptid_t
+find_the_return_ptid (pid_t pid)
+{
+  ptid_t ptid = ptid_t (pid);
+  process_stratum_target *proc_target
+        = current_inferior ()->process_target ();
+  inferior *inf = find_inferior_pid (proc_target, pid);
+  thread_info *tp = find_thread_ptid (inf, ptid_t (pid));
+  if (tp == nullptr)
+    for (thread_info *tp1 : inf->threads ())
+       if (tp1->ptid.lwp () == get_signaled_thread_rs6000 (pid))
+         return tp1->ptid;
+  return ptid;
+}
+
+/* Returning "thread" or "process" info as control comes here 
+   during a process switch in multi process debugging.  This 
+   is needed for "info threads" command as a process can be
+   threaded or non threaded in multi process case.  */
+
+std::string
+rs6000_nat_target::pid_to_str (ptid_t ptid)
+{
+  if (ptid.tid () != 0)
+    return string_printf (_("Thread %s"), pulongest (ptid.tid ()));
+
+  else
+    return string_printf (_("Process %s"), pulongest (ptid.pid ()));
+}
+
+
 /* Wait for the child specified by PTID to do something.  Return the
    process ID of the child, or MINUS_ONE_PTID in case of error; store
    the status in *OURSTATUS.  */
@@ -672,7 +732,7 @@ rs6000_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
 	      if (parent_pid > 0)
 		{
 		  ourstatus->set_forked (ptid_t (pid));
-		  return ptid_t (parent_pid);
+		  return find_the_return_ptid (parent_pid);
 		}
 	      aix_remember_child (pid);
 	    }
@@ -687,7 +747,7 @@ rs6000_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
 	      if (child_pid > 0)
 		{
 		  ourstatus->set_forked (ptid_t (child_pid));
-		  return ptid_t (pid);
+		  return find_the_return_ptid (pid);
 		}
 	      aix_remember_parent (pid);
 	    }
@@ -712,7 +772,7 @@ rs6000_nat_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
   else
     *ourstatus = host_status_to_waitstatus (status);
 
-  return ptid_t (pid);
+  return find_the_return_ptid (pid);
 }
 \f
 
-- 
2.31.1


  reply	other threads:[~2022-12-08 10:28 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-25  6:47 Aditya Kamath1
2022-10-28  9:49 ` Ulrich Weigand
2022-11-08 12:00   ` Aditya Kamath1
2022-11-08 12:17     ` Ulrich Weigand
2022-11-13 18:15       ` Aditya Kamath1
2022-11-15 18:16         ` Ulrich Weigand
2022-11-21  8:27           ` Aditya Kamath1
2022-11-23 14:15             ` Ulrich Weigand
2022-11-23 16:03               ` Aditya Kamath1
2022-11-23 17:09                 ` Ulrich Weigand
2022-11-23 18:45                   ` Aditya Kamath1
2022-11-29  8:18                     ` Aditya Kamath1
2022-11-30 14:57                       ` Ulrich Weigand
2022-12-02  7:50                         ` Aditya Kamath1
2022-12-05 18:33                           ` Ulrich Weigand
2022-12-08 10:28                             ` Aditya Kamath1 [this message]
2022-12-08 10:46                               ` Aditya Kamath1
2022-12-08 16:29                               ` Ulrich Weigand
2022-12-15 12:58                                 ` Aditya Kamath1
2022-12-15 15:53                                   ` Ulrich Weigand
2022-12-19  6:30                                     ` Aditya Kamath1
2022-12-22 12:50                                       ` Ulrich Weigand
2022-12-26 13:18                                         ` Aditya Kamath1
2023-01-09 14:04                                           ` Ulrich Weigand
2023-01-10 12:23                                             ` Aditya Kamath1
2023-01-11 13:31                                               ` Ulrich Weigand
2023-01-13 14:06                                                 ` Aditya Kamath1
2023-01-20 14:44                                                   ` Ulrich Weigand
2023-01-27 14:40                                                     ` Aditya Kamath1
2023-01-30 19:54                                                       ` Tom Tromey
2023-02-02  6:24                                                       ` Aditya Kamath1
2023-02-02  6:35                                                         ` Aditya Kamath1
2023-02-02 17:43                                                           ` Ulrich Weigand
2023-02-03 11:10                                                             ` Aditya Kamath1
2023-02-06 19:07                                                               ` Ulrich Weigand
2023-02-07 11:57                                                                 ` Aditya Kamath1
2023-02-08 18:44                                                                   ` Ulrich Weigand
2023-02-10 16:33                                                                     ` Aditya Kamath1
2023-02-10 16:46                                                                       ` Aditya Kamath1
2023-02-13 19:01                                                                       ` Ulrich Weigand
2023-02-14 14:13                                                                         ` Aditya Kamath1
2023-02-16 19:46                                                                           ` Ulrich Weigand
2023-02-17 11:26                                                                             ` Aditya Kamath1
2023-02-17 12:04                                                                               ` Ulrich Weigand
2023-02-17 13:22                                                                                 ` Aditya Kamath1
2023-02-17 14:18                                                                                   ` Ulrich Weigand
2023-02-17 15:15                                                                                     ` Aditya Kamath1
2023-02-17 19:14                                                                                       ` Ulrich Weigand
2022-11-08 12:00 Aditya Kamath1

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CH2PR15MB35447EE88C64F7F36A0C61F5D61D9@CH2PR15MB3544.namprd15.prod.outlook.com \
    --to=aditya.kamath1@ibm.com \
    --cc=Ulrich.Weigand@de.ibm.com \
    --cc=gdb-patches@sourceware.org \
    --cc=sangamesh.swamy@in.ibm.com \
    --cc=simark@simark.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).