Hi Zhiyong,

I looked at the backtrace that you provided and see that maybe_hw_step()
is being called from linux_process_target::resume_stopped_resumed_lwps,
which is the one location where I wasn't able to convince myself that
the assert should hold.
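
To make that concern concrete, here is a toy model of the precondition.
It is only a sketch and is not gdbserver code: toy_thread and the toy_*
functions are hypothetical names, and the hardware single-step check is
hard-wired to false to stand in for a target such as 32-bit ARM.

```cpp
#include <cassert>

// Toy stand-in for gdbserver's thread_info; the field is hypothetical.
struct toy_thread
{
  bool has_single_step_breakpoints = false;
};

// Models a target with no hardware single-step (assumption for this sketch).
static bool
toy_supports_hardware_single_step ()
{
  return false;
}

// The condition the assert relies on: either the hardware can step, or a
// software single-step breakpoint has already been inserted.
static bool
toy_precondition_holds (const toy_thread &thread)
{
  return (toy_supports_hardware_single_step ()
          || thread.has_single_step_breakpoints);
}

// Same shape as maybe_hw_step: returns true for a hardware step, and
// otherwise asserts that a software single-step breakpoint is in place.
static bool
toy_maybe_hw_step (const toy_thread &thread)
{
  if (toy_supports_hardware_single_step ())
    return true;
  assert (thread.has_single_step_breakpoints);
  return false;
}
```

A caller that sets has_single_step_breakpoints before calling
toy_maybe_hw_step is safe.  A caller that resumes a thread without doing
so would trip the assert, which is what I suspect, but have not
confirmed, can happen via resume_stopped_resumed_lwps.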

I was running your test case executable (osm) as an unprivileged user,
so neither the syslog calls nor the sudo were working.  (Sudo could
perhaps work, but it wanted to prompt for a password and stdin and
stdout were closed.)  I've since modified it so that sudo isn't used
and I'm using 'fprintf(stderr, ...)' instead of syslog - which is how
I discovered that sudo wasn't working.  I've tried next'ing quite a
lot, but so far I haven't reproduced the bug.  (Hopefully, the sudo
isn't required to reproduce the problem.)
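
For reference, the syslog replacement is roughly of the following shape.
This is a minimal sketch rather than the exact code in my modified
lupdated.c; format_log_line and log_stderr are hypothetical names, and
the "<priority>" prefix is just an illustrative choice.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Build one log line, "<priority> message\n", from a printf-style format.
// Messages longer than the buffer are truncated.
static std::string
format_log_line (int priority, const char *fmt, ...)
{
  assert (fmt != nullptr);
  char buf[512] = "";
  va_list ap;
  va_start (ap, fmt);
  std::vsnprintf (buf, sizeof buf, fmt, ap);
  va_end (ap);
  return "<" + std::to_string (priority) + "> " + buf + "\n";
}

// Write an already formatted line to stderr instead of the system log.
static void
log_stderr (const std::string &line)
{
  std::fputs (line.c_str (), stderr);
}
```

A call such as log_stderr (format_log_line (6, "wrote %d bytes", 42))
then shows up in the terminal running osm, with no syslog daemon or
root privileges involved.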

If you manage to reproduce the bug on a Raspberry Pi 4 (and tell me
how to do it), that'd be great!

Here is what I'm doing, using three separate terminals, in an attempt
to reproduce the bug:

1) Run osm in terminal 1.  (I didn't want to mess with systemd.)  Once
I start running it, I see a bunch of messages from the dd command.

2) In terminal 2, I run:

   /path/to/gdbserver --debug --debug-format=all --remote-debug --event-loop-debug --once --attach :1234 $(pgrep osm)

3) In terminal 3, I run:

   /path/to/gdb osm -x ./gdbx2

(I've changed the target remote command in gdbx2 to refer to localhost.)

I'm also attaching my hacked lupdated.c.  If you see anything wrong
with what I'm trying, please let me know.

Kevin

On Mon, 24 Jul 2023 13:36:24 +0000
"Yan, Zhiyong" <Zhiyong.Yan@windriver.com> wrote:

> Hi Kevin,
>     The call stack for the assert is attached.
>     Please see the attached gdbx2, which adds more 'n' commands.  On the ARM platform, repeatedly executing the 'n' command with this test case can trigger the assertion failure.
> 
>     I didn't finish setting up the test environment on the Raspberry Pi 4 today.  I previously reproduced this issue on a Xilinx ARM platform.
> 
> Best Regards.
> Zhiyong
> 
> -----Original Message-----
> From: Kevin Buettner <kevinb@redhat.com> 
> Sent: Saturday, July 22, 2023 4:50 AM
> To: Yan, Zhiyong <Zhiyong.Yan@windriver.com>
> Cc: gdb-patches@sourceware.org; luis.machado@arm.com; tom@tromey.com
> Subject: Re: [PATCH] gdbserver: Install single-step breakpoint for a pending thread whose last_resume_kind is resume_step
> 
> Hi Zhiyong,
> 
> I set up a Raspberry Pi running a recent 32-bit Raspberry Pi OS so that I could test your patch.  I was able to build and run your test case, but I could not reproduce the bug on the Pi.
> 
> I tested gdb.threads/*.exp using --target_board=native-gdbserver both with and without your patch.  Some of these tests are racy, but my conclusion from just looking at the PASSes and FAILs (after many test runs) is that there are no regressions.
> 
> But then I remembered to enable core dumps on the Pi and after running gdb.threads/pending-fork-event-detach/pending-fork-event-detach-main-vfork by itself, I saw that it left a core file...
> 
> $ make check RUNTESTFLAGS="--target_board=native-gdbserver" TESTS=gdb.threads/pending-fork-event-detach.exp
> ...
>                 === gdb Summary ===
> 
> # of unexpected core files      1
> # of expected passes            240
> 
> The core file was from the running test case, not gdbserver, nor gdb.
> 
> Looking at the core file in GDB shows...
> 
> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
> #0  0x00010624 in break_here () at /mesquite2/sourceware-git/rpi-gdbserver/bld/../../worktree-gdbserver/gdb/testsuite/gdb.threads/pending-fork-event-detach.c:29
> 29        x++;
> [Current thread is 1 (Thread 0xf7e10440 (LWP 4835))]
> (gdb) x/i $pc
> => 0x10624 <break_here+12>:     udf     #16  
> (gdb) x/x $pc
> 0x10624 <break_here+12>:        0xe7f001f0
> 
> ...and in gdbserver/linux-aarch32-low.cc:
> 
> #define arm_eabi_breakpoint 0xe7f001f0UL
> 
> I think what's happened here is that the breakpoint added by your patch is left in place when GDB detaches the test case.  When it starts running again, it hits the software single step breakpoint and, since it's no longer under GDB control, it dies with a SIGTRAP.
> 
> This core file is not created when I run the test using a gdbserver without your patch.
> 
> I'm suspicious of the assert in linux_process_target::maybe_hw_step.
> Currently, it looks like this:
> 
> bool
> linux_process_target::maybe_hw_step (thread_info *thread)
> {
>   if (supports_hardware_single_step ())
>     return true;
>   else
>     {
>       /* GDBserver must insert single-step breakpoint for software
>          single step.  */
>       gdb_assert (has_single_step_breakpoints (thread));
>       return false;
>     }
> }
> 
> But, when Yao Qi introduced it back in June, 2016, it looked like
> this:
> 
> static int
> maybe_hw_step (struct thread_info *thread)
> {
>   if (can_hardware_single_step ())
>     return 1;
>   else
>     {
>       struct process_info *proc = get_thread_process (thread);
> 
>       /* GDBserver must insert reinsert breakpoint for software
>          single step.  */
>       gdb_assert (has_reinsert_breakpoints (proc));
>       return 0;
>     }
> }
> 
> So, back in 2016, when it was introduced, it was clear that the assert was referring to breakpoints which needed to be reinserted.  Now, that's not at all obvious.
> 
> Also, back in 2016, maybe_hw_step() was only called from two locations; in each case it was in a block in which the condition lwp->bp_reinsert != 0 was true.  But now there are two other calls; in one case, the software single step breakpoints have just been inserted, so that should be okay, but for the other case, in linux_process_target::resume_stopped_resumed_lwps, I'm less certain.
> 
> In any case, could you comment out (or delete) the assert in a version of the source without your patch and let me know what happens?
> 
> Also, if possible, I'd like to see a backtrace from where the assert occurs so that I can see which call to maybe_hw_step is responsible for triggering the failing assert.
> 
> Kevin
>