Hi Ulrich,

I'm confused why this is the correct place.  From what I can see,
in this scenario, we have:

- libpthdebug reports some threads using a thread ID, i.e. pbuf has
   ptid_t (pid, 0, pthid1)
    ..
   ptid_t (pid, 0, pthidN)
with pcount >= 1.

- GDB only has one single thread in unthreaded mode, i.e. gbuf has
   ptid_t (pid, 0, 0)
with gcount == 1.

So when running the loop, during the first iteration, we should compare
   ptid_cmp (ptid_t (pid, 0, pthid1), ptid_t (pid, 0, 0))
which should be > 0 since pthid1 > 0.  Right?

This means we'll get into the branch that just does:
              delete_thread (gbuf[gi]);
thereby deleting the original thread.  Does this not happen for you?
What is going on instead?

[ Note that this is a simplified case with only a single process; in the
multi-process scenario, this may be more complex.



>What is going on instead?
The one I highlighted in bold does not happen. In our first iteration pcount is 1 but gcount is 0. That is why when gi =0 == gcount becomes true and control enters into that block instead of going into the else block..

>[ Note that this is a simplified case with only a single process; in the
multi-process scenario, this may be more complex.

This I agree. I just checked now..

Hmm. So, something is going wrong here..

gcount = 0;

  iterate_over_threads (giter_count, &gcount);

  g = gbuf = XNEWVEC (struct thread_info *, gcount);

  iterate_over_threads (giter_accum, &g);

  qsort (gbuf, gcount, sizeof *gbuf, gcmp);

Let's me check these lines. Hope I answered why I was doing it..

The rest of the changes you mentioned I will do it..
________________________________
From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Sent: 23 November 2022 19:45
To: simark@simark.ca <simark@simark.ca>; Aditya Kamath1 <Aditya.Kamath1@ibm.com>; gdb-patches@sourceware.org <gdb-patches@sourceware.org>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] 0001-Fix-multi-thread-debug-bug-in-AIX.patch

Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>@@ -514,8 +514,16 @@ pdc_read_data (pthdb_user_t user_current_pid, void *buf,
>        during first initialisation.  In the rest of the callbacks the
>        current thread needs to be correct.  */
>     if (user_current_pid != 0)
>-      switch_to_thread (current_inferior ()->process_target (),
>-                      ptid_t (user_current_pid));
>+    {
>+      inferior *inf = find_inferior_ptid (current_inferior ()-> process_target (),
>+                                          ptid_t (user_current_pid));
This would be simpler using find_inferior_pid.
>+        for (thread_info *tp: inf->threads ())
>+        if (tp != NULL)
This would be simpler using any_thread_of_inferior.
>+          {
>+            switch_to_thread (tp);
>+            break;
>+          }
>+    }
>     status = target_read_memory (addr, (gdb_byte *) buf, len);

However, switching to just any random thread of the process seems odd.

Looking at sol-thread.c, they don't switch to a thread at all
in the equivalent place, but rather do this:

  scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);

  if (inferior_ptid.tid_p () || !target_thread_alive (inferior_ptid))
    {
      /* It's either a thread or an LWP that isn't alive.  Any live
         LWP will do so use the first available.

         NOTE: We don't need to call switch_to_thread; we're just
         reading memory.  */
      inferior_ptid = procfs_first_available ();
    }

Since your xfer_partial routine only ever looks at the PID
component of the ptid, I'm wondering if we couldn't similarly
just switch inferior_ptid, without actually switching theads.
Something along the lines of

  scoped_restore save_inferior_ptid = make_scoped_restore (&inferior_ptid);
  if (user_current_pid != 0)
    inferior_ptid = ptid_t (user_current_pid);

Does this work for you?

>-      if (thrinf.ti_cursig == SIGTRAP)
>+      /* In a multi threaded application user can interrupt the main
>+       thread as well. This function should return the tid in this
>+         case apart from threads that can trap or be interrupted.  */
Whitespace problem.
>+
>+      if (thrinf.ti_cursig == SIGTRAP || thrinf.ti_cursig == SIGINT)
>        return thrinf.ti_tid;

This seems an unrelated change?  If this is actually necessary,
then all the comments (e.g. at the top of this function, or at
the call site) likewise need to be updated - they only refer to
trap signals currently.

>          process_stratum_target *proc_target
>            = current_inferior ()->process_target ();
>-        thread = add_thread_with_info (proc_target,
>-                                       ptid_t (pid, 0, pbuf[pi].pthid),
>-                                       priv);
>+
>+        thread_info *tp = find_thread_ptid (proc_target, ptid_t (pid));
>+
>+        /* If the pthread library is used then we replace the main
>+           with the thread having the main thread ID and process ID.
>+           Otherwise this is a new thread and we need to add it.  */
>+        if (tp != NULL && tp->priv == NULL)
>+          {
>+          thread_change_ptid (proc_target, tp->ptid,
>+                              ptid_t (pid, 0, pbuf[pi].pthid));
>+          tp->priv.reset (priv);
>+          }
>+        else
>+          thread = add_thread_with_info (proc_target,
>+                                         ptid_t (pid, 0, pbuf[pi].pthid),
>+                                         priv);

I'm confused why this is the correct place.  From what I can see,
in this scenario, we have:

- libpthdebug reports some threads using a thread ID, i.e. pbuf has
   ptid_t (pid, 0, pthid1)
    ..
   ptid_t (pid, 0, pthidN)
with pcount >= 1.

- GDB only has one single thread in unthreaded mode, i.e. gbuf has
   ptid_t (pid, 0, 0)
with gcount == 1.

So when running the loop, during the first iteration, we should compare
   ptid_cmp (ptid_t (pid, 0, pthid1), ptid_t (pid, 0, 0))
which should be > 0 since pthid1 > 0.  Right?

This means we'll get into the branch that just does:
              delete_thread (gbuf[gi]);
thereby deleting the original thread.  Does this not happen for you?
What is going on instead?

[ Note that this is a simplified case with only a single process; in the
multi-process scenario, this may be more complex. ]


Bye,
Ulrich