public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Aditya Kamath1 <Aditya.Kamath1@ibm.com>
To: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>,
	"gdb-patches@sourceware.org" <gdb-patches@sourceware.org>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork
Date: Wed, 22 Nov 2023 10:48:40 +0000	[thread overview]
Message-ID: <CH2PR15MB35444EC543B7C8830A4CB0B1D6BAA@CH2PR15MB3544.namprd15.prod.outlook.com> (raw)
In-Reply-To: <211a99648e5b2733d155a1b6e431cea61b006582.camel@de.ibm.com>


[-- Attachment #1.1: Type: text/plain, Size: 12181 bytes --]

Respected Ulrich and community members,

Thank you for your reply.

>Ah, I see. new_objfile is called from clone_program_space, which is
>called from within follow_fork_inferior.  If this changes the child's
>main thread ptid under the covers, then any future use of child_ptid
>in follow_fork_inferior / follow_fork will be incorrect.

>Instead, there is a new ::update_thread_list target hook that
>common code calls at appropriate times (when it expects ptids
>to change).  I think you need to look into removing the call to
>sync_threadlists from new_objfile and instead implement the
>update_thread_list hook and perform that call there.

>I believe you still need to perform the pd_enable / pd_activate
>tasks during new_objfile, but no longer automatically call
>pd_update.  The latter should be moved to update_thread_list.

Please find attached the patch. {See: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch}.

So this was the issue. Thank you for suggesting this. I appreciate your knowledge to tell us this. Please see Program 1 and the GDB outputs pasted below this email for set detach-on-fork on/off and set follow-fork-mode=parent/child. This works in this test case and it looks like many other test cases as well are fixed from the GDB.base and GDB.threads test suite numbers mentioned below in a section.

Kindly let me know if this is okay.

Having said that we are missing some corner cases that I want to solve in a separate patch or the same depending on whatever you suggest. I get why we are moving pd_update () to update_thread_list () as it will take care of syncing thread lists whenever required.

Let us take one classic case of fork-print-inferior-event test case.
The code looks like
#include <stdlib.h>
#include <unistd.h>

int
main (int argc, char *argv[])
{
  pid_t child;

  child = fork ();
  switch (child)
    {
      case -1:
        abort ();
      case 0:
      default:
        break;
    }

  return 0;
}

Here if we observe the difference, Linux clearly shows inferior2 the child exited.




In Linux we get :-

gdb) set follow-fork-mode child
(gdb) r
Starting program: /home/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Attaching after Thread 0x7ffff7d85740 (LWP 57) fork to child process 58]
[New inferior 2 (process 58)]
[Detaching after fork from parent process 57]
[Inferior 1 (process 57) detached]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Inferior 2 (process 58) exited normally]

}

In AIX we get:-

Reading symbols from //gdb_tests/fork-print-inferior-events...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/fork-print-inferior-events
[Attaching after process 22151436 fork to child process 23527898]
[New inferior 2 (process 23527898)]
[Detaching after fork from parent process 22151436]
[Inferior 1 (process 22151436) detached]

Thread 2.1 received signal SIGHUP, Hangup.
[Switching to process 23527898]
0xd028b3f0 in fork () from /usr/lib/libc.a(_shr.o)
(gdb)

-------------------------------------------------------------------------------------------------------------------------------------

But in AIX we are not able to catch this event currently. That is why this condition was added.
+  if (inferior_ptid.pid ())

If we do not add this condition, then get_thread_data_helper_for_ptid () will crash as pid is 0 with a pid!= NULL assertion since child exits.

Also, if we check the output of Program 1 below and the case where we follow-parent and detach-on-fork is off, the child processes are shown as process and not threads. But in Linux it is shown as threads when we use info threads command.

Only these two corner cases we have problems in AIX.

Kindly let me know where I went wrong in catching the exit event and the new child being converted to thread ptid event so that we can solve this for AIX and GDB. Somewhere my debug analysis went wrong while solving this issue. It would be of helpful to have better user experience though with the current patch , debugging works.

Have a nice day ahead.

Thanks and regards,
Aditya.

The test case numbers :-

before this patch GDB.base
=== gdb Summary ===
# of expected passes        28377
# of unexpected failures    2163
# of expected failures      15
# of known failures     10
# of unresolved testcases   154
# of untested testcases     79
# of unsupported tests      117
# of paths in test names    5
# of duplicate test names   4

After this patch GDB.base
=== gdb Summary ===
# of expected passes            28588
# of unexpected failures        2191
# of unexpected successes       1
# of expected failures          16
# of known failures             9
# of unresolved testcases       138
# of untested testcases         79
# of unsupported tests          116
# of paths in test names        5
# of duplicate test names       2

Before this patch GDB.threads
# of expected passes        1790
# of unexpected failures    610
# of expected failures      4
# of known failures     21
# of unresolved testcases   10
# of untested testcases     9
# of unsupported tests      82
# of paths in test names    3

After this patch GDB.thread
=== gdb Summary ===
# of expected passes            1871
# of unexpected failures        630
# of expected failures          4
# of known failures             21
# of unresolved testcases       2
# of untested testcases         9
# of unsupported tests          81
# of paths in test names        3

-------------------------------------------------------------------------------------------------------------------------

Program 1:-
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>
#include <assert.h>

pthread_barrier_t barrier;

#define NUM_THREADS 2

void *
thread_function (void *arg)
{
  /* This ensures that the breakpoint is only hit after both threads
     are created, so the test can always switch to the non-event
     thread when the breakpoint triggers.  */
  pthread_barrier_wait (&barrier);

  pid_t p;
    p = fork();
    if(p<0)
    {
      perror("fork fail");
      exit(1);
    }
    // child process because return value zero
    else if ( p == 0)
        printf("Hello from Child!\n");
    // parent process because return value non-zero.
    else
        printf("Hello from Parent!\n");
  while (1); /* break here */
}

int
main (void)
{
  int i;

  alarm (300);

  pthread_barrier_init (&barrier, NULL, NUM_THREADS);

  for (i = 0; i < NUM_THREADS; i++)
    {
      pthread_t thread;
      int res;

      res = pthread_create (&thread, NULL,
                            thread_function, NULL);
      assert (res == 0);
    }

  while (1)
    sleep (1);

  return 0;
}


Follow-fork = parent detach-on-fork = on

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Detaching after fork from child process 15663474]
Hello from Child!
[Detaching after fork from child process 15532296]
Hello from Child!
Hello from Parent!
Hello from Parent!

Thread 1 received signal SIGINT, Interrupt.
0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb)

Follow-fork = child detach-on-fork = parent

eading symbols from //gdb_tests/multi-thread-fork...
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 15532298]
[New inferior 2 (process 15532298)]
[Detaching after fork from parent process 22151424]
[Inferior 1 (process 22151424) detached]
Hello from Parent!
Hello from Parent!
Hello from Child!
Hello from Child!

Thread 2.1 received signal SIGINT, Interrupt.
[Switching to Thread 515]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb)

Follow-fork = parent detach-on-fork = parent

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set detach-on-fork off
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[New inferior 2 (process 23527894)]
[New inferior 3 (process 15663478)]
Hello from Parent!
Hello from Parent!

Thread 1.1 received signal SIGINT, Interrupt.
0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb) info threads
  Id   Target Id                          Frame
* 1.1  Thread 1 (tid 30744339, running)   0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
  1.2  Thread 258 (tid 39264247, running) thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
  1.3  Thread 515 (tid 8984493, running)  thread_function (arg=warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)

0x0) at //gdb_tests/multi-thread-fork.c:50
  2.1  process 23527894                   0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
  3.1  process 15663478                   0xd0610df0 in _sigsetmask () from /usr/lib/libpthread.a(_shr_xpg5.o)
(gdb)

Follow-fork = child detach-on-fork = parent

Reading symbols from //gdb_tests/multi-thread-fork...
(gdb) set follow-fork-mode child
(gdb) set detach-on-fork off
(gdb) r
Starting program: /gdb_tests/multi-thread-fork
[New Thread 258]
[New Thread 515]
[Attaching after Thread 515 fork to child process 15663480]
[New inferior 2 (process 15663480)]
Hello from Child!

Thread 2.1 received signal SIGINT, Interrupt.
[Switching to process 15663480]
thread_function (arg=0x0) at //gdb_tests/multi-thread-fork.c:50
50        while (1); /* break here */
(gdb)

From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Date: Tuesday, 21 November 2023 at 5:47 PM
To: gdb-patches@sourceware.org <gdb-patches@sourceware.org>, Aditya Kamath1 <Aditya.Kamath1@ibm.com>
Cc: Sangamesh Mallayya <sangamesh.swamy@in.ibm.com>
Subject: Re: [PATCH] Fix AIX thread NULL assertion failure during fork
Aditya Kamath1 <Aditya.Kamath1@ibm.com> wrote:

>>So how is it possible that in between wait () setting the child_ptid and infrun.c
>>using it to switch to the child, the child is becoming multi-threaded?  Where is
>>the sync_threadlists () call that makes this happen?
>
>>I think we should understand better how this could have happened.
>
>I’m sorry I missed an information to tell you. So the parent process is loaded it is
>multi-threaded, child is loaded and through wait we have informed that fork () event
>has happened and given the GDB core its required information.
>
>This child now will have its object file which will be loaded soon. So new_objfile ()
>is called which will inturn call pd_enable () and this function will call pd_activate ()
>then pd_update (), then sync_threadlists ().

Ah, I see. new_objfile is called from clone_program_space, which is
called from within follow_fork_inferior.  If this changes the child's
main thread ptid under the covers, then any future use of child_ptid
in follow_fork_inferior / follow_fork will be incorrect.

Now, the question is whether new_objfile should actually do that.
I believe that in current GDB, the answer is actually no.  While
in the past, platform-specific thread targets were indeed supposed
to do that, it seems this has been changed a while ago.  Today,
none of the other thread targets do that, as far as I can see.

Instead, there is a new ::update_thread_list target hook that
common code calls at appropriate times (when it expects ptids
to change).  I think you need to look into removing the call to
sync_threadlists from new_objfile and instead implement the
update_thread_list hook and perform that call there.

I believe you still need to perform the pd_enable / pd_activate
tasks during new_objfile, but no longer automatically call
pd_update.  The latter should be moved to update_thread_list.


Bye,
Ulrich


[-- Attachment #2: 0001-Fix-AIX-thr-NULL-assertion-failure-during-fork.patch --]
[-- Type: application/octet-stream, Size: 2099 bytes --]

From 03a6b2336bfd721fa217b089be51f0701c7334ad Mon Sep 17 00:00:00 2001
From: Aditya Vidyadhar Kamath <Aditya.Kamath1@ibm.com>
Date: Wed, 22 Nov 2023 04:03:17 -0600
Subject: [PATCH] Fix AIX thr!= NULL assertion failure during fork.

In AIX, while we followed the child process and detach on fork was on we hit thr!= NULL assertion failure.

The reason for the same was GDB core trying to switch to a child thread with tid not set that does not
exist, since child's ptid was changed to ptid_t (pid, 0, tid) in sync_threadlists() as it was threaded.

The way this happened was when a new child process is born, its object file will be loaded, calling the new_objfile ()
in aix-thread.c file from clone_program_space, which is
called from within follow_fork_inferior. Therefore it end ups syncing threadlists via pd_update ().

This patch is a fix for the same where pd_update () is called in the wait () or in update_thread_list() hook only.
---
 gdb/aix-thread.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gdb/aix-thread.c b/gdb/aix-thread.c
index 945b7f6c697..717d0f2129c 100644
--- a/gdb/aix-thread.c
+++ b/gdb/aix-thread.c
@@ -136,6 +136,8 @@ class aix_thread_target final : public target_ops
   const char *extra_thread_info (struct thread_info *) override;
 
   ptid_t get_ada_task_ptid (long lwp, ULONGEST thread) override;
+
+  void update_thread_list () override;
 };
 
 static aix_thread_target aix_thread_ops;
@@ -1011,6 +1013,13 @@ pd_update (pid_t pid)
   return ptid;
 }
 
+void
+aix_thread_target::update_thread_list ()
+{
+  if (inferior_ptid.pid ())
+    pd_update (inferior_ptid.pid ());
+}
+
 /* Try to start debugging threads in the current process.
    If successful and there exists and we can find an event thread, return a ptid
    for that thread.  Otherwise, return a ptid-only ptid using PID.  */
@@ -1030,7 +1039,7 @@ pd_activate (pid_t pid)
       return ptid_t (pid);
     }
   data->pd_active = 1;
-  return pd_update (pid);
+  return ptid_t (pid);
 }
 
 /* An object file has just been loaded.  Check whether the current
-- 
2.41.0


  reply	other threads:[~2023-11-22 10:48 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-20  7:25 Aditya Kamath1
2023-11-20 11:27 ` Ulrich Weigand
2023-11-21  7:30   ` Aditya Kamath1
2023-11-21 12:17     ` Ulrich Weigand
2023-11-22 10:48       ` Aditya Kamath1 [this message]
2023-11-22 11:30         ` Ulrich Weigand
2023-11-22 13:58           ` Aditya Kamath1
2023-11-22 14:14             ` Aditya Kamath1
2023-11-22 15:33               ` Ulrich Weigand
2023-11-22 16:22                 ` Aditya Kamath1
2023-11-22 18:30                   ` Ulrich Weigand
2023-11-23  6:06                     ` Aditya Kamath1

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CH2PR15MB35444EC543B7C8830A4CB0B1D6BAA@CH2PR15MB3544.namprd15.prod.outlook.com \
    --to=aditya.kamath1@ibm.com \
    --cc=Ulrich.Weigand@de.ibm.com \
    --cc=gdb-patches@sourceware.org \
    --cc=sangamesh.swamy@in.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).