public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Aditya Kamath1 <Aditya.Kamath1@ibm.com>
To: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>,
	"gdb-patches@sourceware.org" <gdb-patches@sourceware.org>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork
Date: Wed, 22 Nov 2023 14:14:30 +0000	[thread overview]
Message-ID: <CH2PR15MB35443BE24D90A9242DC7B8E0D6BAA@CH2PR15MB3544.namprd15.prod.outlook.com> (raw)
In-Reply-To: <CH2PR15MB35443D51B7A0C6B3823B832CD6BAA@CH2PR15MB3544.namprd15.prod.outlook.com>


[-- Attachment #1.1: Type: text/plain, Size: 12971 bytes --]

Respected Ulrich and community members,

>>Yes. So we have a little more to understand here. So once pd_enable () was called and then pd_activate () >>was called it looks like either the libraries were not fully loaded though the pthdb_session_pthreaded was >>successful or the create thread event breakpoint was not successful. So the code execution did not reach >>pd_activate (). Therefore our child process was still ptid_t (pid, 0 ,0). When I used the pd_activate () inside >?>>the update_threadlist () it succeeded to initialise a session later on when it was called and then we were >>able to set pd_active and get to sync_threadlists (). Please see the outputs below for Program 1. We are >>awesome here.

I can confirm that this condition
/* When attaching / handling fork child, don't try activating
     thread debugging until we know about all shared libraries.  */
  if (inf->in_initial_library_scan)
    return;

is what is the reason we fail to reach pd_activate ().. Sorry for not being clear in my previous email..

From: Aditya Kamath1 <Aditya.Kamath1@ibm.com>
Date: Wednesday, 22 November 2023 at 7:28 PM
To: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>, gdb-patches@sourceware.org <gdb-patches@sourceware.org>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork
Respected Ulrich and community members,

Hi,

Please find attached the patch. {See: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch}.

You are genius. Thank you for your inputs. I have a lot to learn from you.

>I see one possible issue with the current patch - there is one
>other existing caller to pd_activate in the ::wait routine:

 >     if (regcache_read_pc (regcache)
 >         - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr)
 >       return pd_activate (ptid.pid ());

>This is the one place other than update_thread_list where you do
>in fact need to call pd_update.  However, now that you've changed
>pd_activate to no longer call pd_update, in the case where the
>code above executes "return pd_activate", that call is now missing.

>I think a straightforward fix might be to simply remove the "return"
>and just fall through into the pd_update after pd_activate returned.

This is done.


>Also, it looks like your're only updating the *current* inferior
>in update_thread_list.  Other targets update the thread lists of
>*all* inferiors.  E.g. have a look at the linux-thread-db.c
>version of update_thread_list.

Yeah, so from here on we are doing this. Please see in the patch.

>Having said that we are missing some corner cases that I want to solve
>in a separate patch or the same depending on whatever you suggest.

Do the above change fix those corner cases?

Yes. So we have a little more to understand here. So once pd_enable () was called and then pd_activate () was called it looks like either the libraries were not fully loaded though the pthdb_session_pthreaded was successful or the create thread event breakpoint was not successful. So the code execution did not reach pd_activate (). Therefore our child process was still ptid_t (pid, 0 ,0). When I used the pd_activate () inside the update_threadlist () it succeeded to initialise a session later on when it was called and then we were able to set pd_active and get to sync_threadlists (). Please see the outputs below for Program 1. We are awesome here.

Kindly let me know if this patch is good. Kindly push if there are no changes needed.

Let me also tell you why there are 2 failures out 32 as of now in foll-fork-print-inferiors.exp test case in which we are not able to see the message if child exits when we follow child.

The code is like this.
#include <stdlib.h>
#include <unistd.h>

int
main (int argc, char *argv[])
{
  pid_t child;

  child = fork ();
  switch (child)
    {
      case -1:
        abort ();
      case 0:
      default:
        break;
    }

  return 0;
}

So when pd_enable () is called and we succeed in pthdb_session_pthreaded (), the libraries are still not loaded or breakpoint is not se and we are unable to go to pd_activate() leaving our pd_active to be 0. This means pd_update () cannot go to sync_threadlists (). That is why the child is still ptid_t (pid, 0, 0), and by the time we could do something we receive a HUP signal and the code exits. That is the issue. Please see the output below. We see this only when we follow parent. When we follow parent all is well irrespective of detach-on-fork.

Reading symbols from //gdb_tests/fork-print-inferior-events...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/fork-print-inferior-events
[Attaching after process 7340304 fork to child process 24117514]
[New inferior 2 (process 24117514)]
[Detaching after fork from parent process 7340304]
[Inferior 1 (process 7340304) detached]

Thread 2.1 received signal SIGHUP, Hangup.
[Switching to process 24117514]
0xd028b3f0 in fork () from /usr/lib/libc.a(_shr.o)
(gdb) info threads
  Id   Target Id         Frame
* 2.1  process 24117514  0xd028b3f0 in fork () from /usr/lib/libc.a(_shr.o)
(gdb) q

Have a nice day ahead.

Thanks and regards,
Aditya.



Program 1:-
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>
#include <assert.h>

pthread_barrier_t barrier;

#define NUM_THREADS 2

void *
thread_function (void *arg)
{
  /* This ensures that the breakpoint is only hit after both threads
     are created, so the test can always switch to the non-event
     thread when the breakpoint triggers.  */
  pthread_barrier_wait (&barrier);

  pid_t p;
    p = fork();
    if(p<0)
    {
      perror("fork fail");
      exit(1);
    }
    // child process because return value zero
    else if ( p == 0)
        printf("Hello from Child!\n");
    // parent process because return value non-zero.
    else
        printf("Hello from Parent!\n");
  while (1); /* break here */
}

int
main (void)
{
  int i;

  alarm (300);

  pthread_barrier_init (&barrier, NULL, NUM_THREADS);

  for (i = 0; i < NUM_THREADS; i++)
    {
      pthread_t thread;
      int res;

      res = pthread_create (&thread, NULL,
                            thread_function, NULL);
      assert (res == 0);
    }

  while (1)
    sleep (1);

  return 0;
}
Follow-fork-mode = parent, detach-on-fork = on
Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Detaching after fork from child process 11338010]
Hello from Child!
Hello from Parent!
[Detaching after fork from child process 16515516]
Hello from Child!
Hello from Parent!

Thread 2 received signal SIGINT, Interrupt.
[Switching to Thread 258]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb) info threads
  Id   Target Id                          Frame
  1    Thread 1 (tid 8853293, running)    warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
* 2    Thread 258 (tid 10098629, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
0x0) at //gdb_tests/multi-thread-fork.c:50
  3    Thread 515 (tid 9443313, running)  thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at //gdb_tests/multi-thread-fork.c:50
(gdb)

Follow-fork-mode = child, detach-on-fork = on

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 16515518]
[New inferior 2 (process 16515518)]
[Detaching after fork from parent process 7340290]
[Inferior 1 (process 7340290) detached]
Hello from Parent!
Hello from Child!
Hello from Parent!
Hello from Child!

Thread 2.1 received signal SIGINT, Interrupt.
[Switching to Thread 515]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb) info threads
  Id   Target Id                          Frame
* 2.1  Thread 515 (tid 76879955, running) thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at //gdb_tests/multi-thread-fork.c:50
(gdb)
Follow-fork-mode = parent, detach-on-fork = off

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[New inferior 2 (process 24117510)]
Hello from Parent!
[New inferior 3 (process 11338014)]
Hello from Parent!

Thread 1.1 received signal SIGINT, Interrupt.
0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb) info threads
  Id   Target Id                          Frame
* 1.1  Thread 1 (tid 7870387, running)    0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
  1.2  Thread 258 (tid 10098633, running) thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
  1.3  Thread 515 (tid 9443317, running)  thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at //gdb_tests/multi-thread-fork.c:50
  2.1  Thread 515 (tid 8132439, running)  0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
  3.1  Thread 258 (tid 77533225, running) 0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb)


Follow-fork-mode = child, detach-on-fork = off

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 11338016]
[New inferior 2 (process 11338016)]
Hello from Child!

Thread 2.1 received signal SIGINT, Interrupt.
[Switching to Thread 515]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb) info threads
  Id   Target Id                          Frame
  1.1  Thread 1 (tid 113118883, running)  warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
  1.2  Thread 258 (tid 9115441, sleeping) warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
  1.3  Thread 515 (tid 7870389, running)  0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
* 2.1  Thread 515 (tid 9443319, running)  thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
0x0) at //gdb_tests/multi-thread-fork.c:50
(gdb)



From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Date: Wednesday, 22 November 2023 at 5:00 PM
To: gdb-patches@sourceware.org <gdb-patches@sourceware.org>, Aditya Kamath1 <Aditya.Kamath1@ibm.com>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork
Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>So this was the issue. Thank you for suggesting this. I appreciate your knowledge
>to tell us this. Please see Program 1 and the GDB outputs pasted below this email
>for set detach-on-fork on/off and set follow-fork-mode=parent/child. This works
>in this test case and it looks like many other test cases as well are fixed from
>the GDB.base and GDB.threads test suite numbers mentioned below in a section.

Excellent, that's good to hear!

>Kindly let me know if this is okay.

I see one possible issue with the current patch - there is one
other existing caller to pd_activate in the ::wait routine:

      if (regcache_read_pc (regcache)
          - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr)
        return pd_activate (ptid.pid ());
    }

  return pd_update (ptid.pid ());

This is the one place other than update_thread_list where you do
in fact need to call pd_update.  However, now that you've changed
pd_activate to no longer call pd_update, in the case where the
code above executes "return pd_activate", that call is now missing.

I think a straightforward fix might be to simply remove the "return"
and just fall through into the pd_update after pd_activate returned.

(Note that the return value of pd_activate should then no longer
be used anywhere and can be removed.)


Also, it looks like your're only updating the *current* inferior
in update_thread_list.  Other targets update the thread lists of
*all* inferiors.  E.g. have a look at the linux-thread-db.c
version of update_thread_list.


>Having said that we are missing some corner cases that I want to solve
>in a separate patch or the same depending on whatever you suggest.

Do the above change fix those corner cases?

Bye,
Ulrich

[-- Attachment #2: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch --]
[-- Type: application/octet-stream, Size: 2734 bytes --]

From e87e4085af2f7d4a6f899e8ce05baae6a718395f Mon Sep 17 00:00:00 2001
From: Aditya Vidyadhar Kamath <Aditya.Kamath1@ibm.com>
Date: Wed, 22 Nov 2023 07:33:51 -0600
Subject: [PATCH] Fix AIX thr!= NULL assertion failure during fork.

In AIX, while we followed the child process and detach on fork was on we hit thr!= NULL assertion failure.

The reason for the same was GDB core trying to switch to a child thread with tid not set that does not
exist, since child's ptid was changed to ptid_t (pid, 0, tid) in sync_threadlists() as it was threaded.

The way this happened was when a new child process is born, its object file will be loaded, calling the new_objfile ()
in aix-thread.c file from clone_program_space, which is
called from within follow_fork_inferior. Therefore it end ups syncing threadlists via pd_update ().

This patch is a fix for the same where pd_update () is called in the wait () or in update_thread_list() hook only.
---
 gdb/aix-thread.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index 945b7f6c697..b777db5e06c 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -136,6 +136,8 @@ class aix_thread_target final : public target_ops
   const char *extra_thread_info (struct thread_info *) override;
 
   ptid_t get_ada_task_ptid (long lwp, ULONGEST thread) override;
+
+  void update_thread_list () override;
 };
 
 static aix_thread_target aix_thread_ops;
@@ -1015,7 +1017,7 @@ pd_update (pid_t pid)
    If successful and there exists and we can find an event thread, return a ptid
    for that thread.  Otherwise, return a ptid-only ptid using PID.  */
 
-static ptid_t
+static void 
 pd_activate (pid_t pid)
 {
   int status;
@@ -1030,9 +1032,29 @@ pd_activate (pid_t pid)
       return ptid_t (pid);
     }
   data->pd_active = 1;
-  return pd_update (pid);
+  return;
+}
+
+/* AIX implementation of update_thread_list.  */
+
+void
+aix_thread_target::update_thread_list ()
+{
+  struct thread_db_info *info;
+  struct aix_thread_variables *data;
+  for (inferior *inf : all_inferiors ())
+	{
+	  if (inf->pid == 0)
+		continue;
+
+	  data = get_aix_thread_variables_data (inf);
+
+	  pd_activate (inf->pid);
+	  pd_update (inf->pid);
+	}
 }
 
+
 /* An object file has just been loaded.  Check whether the current
    application is pthreaded, and if so, prepare for thread debugging.  */
 
@@ -1216,7 +1238,7 @@ aix_thread_target::wait (ptid_t ptid, struct target_waitstatus *status,
 
       if (regcache_read_pc (regcache)
 	  - gdbarch_decr_pc_after_break (gdbarch) == data->pd_brk_addr)
-	return pd_activate (ptid.pid ());
+		pd_activate (ptid.pid ());
     }
 
   return pd_update (ptid.pid ());
-- 
2.41.0


  reply	other threads:[~2023-11-22 14:14 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-20  7:25 Aditya Kamath1
2023-11-20 11:27 ` Ulrich Weigand
2023-11-21  7:30   ` Aditya Kamath1
2023-11-21 12:17     ` Ulrich Weigand
2023-11-22 10:48       ` Aditya Kamath1
2023-11-22 11:30         ` Ulrich Weigand
2023-11-22 13:58           ` Aditya Kamath1
2023-11-22 14:14             ` Aditya Kamath1 [this message]
2023-11-22 15:33               ` Ulrich Weigand
2023-11-22 16:22                 ` Aditya Kamath1
2023-11-22 18:30                   ` Ulrich Weigand
2023-11-23  6:06                     ` Aditya Kamath1

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CH2PR15MB35443BE24D90A9242DC7B8E0D6BAA@CH2PR15MB3544.namprd15.prod.outlook.com \
    --to=aditya.kamath1@ibm.com \
    --cc=Ulrich.Weigand@de.ibm.com \
    --cc=gdb-patches@sourceware.org \
    --cc=sangamesh.swamy@in.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).