public inbox for gdb-prs@sourceware.org
* [Bug gdb/28942] New: Problem with breakpoint condition calling a function in multi-threaded program
@ 2022-03-03 19:40 simon.marchi at polymtl dot ca
  2022-03-04 11:15 ` [Bug gdb/28942] " aburgess at redhat dot com
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: simon.marchi at polymtl dot ca @ 2022-03-03 19:40 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28942

            Bug ID: 28942
           Summary: Problem with breakpoint condition calling a function
                    in multi-threaded program
           Product: gdb
           Version: HEAD
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: gdb
          Assignee: unassigned at sourceware dot org
          Reporter: simon.marchi at polymtl dot ca
  Target Milestone: ---

This program:

---8<---
#include <pthread.h>
#include <unistd.h>

static void
function_that_segfaults (void)
{
  int *p = 0;
  *p = 1;
}

static void
break_here (void)
{}

static void *
thread_func (void *p)
{
  for (;;)
    sleep (1);
  return NULL;
}

static void *
thread_func2 (void *p)
{
  sleep (1);
  break_here ();
  return NULL;
}

int
main (void)
{
  pthread_t threads[10];
  pthread_create (&threads[0], NULL, thread_func, NULL);
  pthread_create (&threads[1], NULL, thread_func, NULL);
  pthread_create (&threads[2], NULL, thread_func, NULL);
  pthread_create (&threads[3], NULL, thread_func, NULL);
  pthread_create (&threads[5], NULL, thread_func, NULL);
  pthread_create (&threads[6], NULL, thread_func, NULL);
  pthread_create (&threads[4], NULL, thread_func2, NULL);
  sleep (60);
  return function_that_segfaults != 0;
}

--->8---


$ gcc test.c  -g3 -O0 -pthread
$ ./gdb -q -nx --data-directory=data-directory a.out -ex "b break_here if function_that_segfaults()"
Reading symbols from a.out...
Breakpoint 1 at 0x11ae: file test.c, line 13.
(gdb) r
Starting program: /home/smarchi/build/binutils-gdb/gdb/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7d99700 (LWP 3567019)]
[New Thread 0x7ffff7598700 (LWP 3567020)]
[New Thread 0x7ffff6d97700 (LWP 3567021)]
[New Thread 0x7ffff6596700 (LWP 3567022)]
[New Thread 0x7ffff5d95700 (LWP 3567023)]
[New Thread 0x7ffff5594700 (LWP 3567024)]
[New Thread 0x7ffff4d93700 (LWP 3567025)]
Error in testing breakpoint condition:
Couldn't get registers: No such process.
An error occurred while in a function called from GDB.
Evaluation of the expression containing the function
(function_that_segfaults) will be abandoned.
When the function is done executing, GDB will silently stop.
Selected thread is running.
(gdb) 

The "Couldn't get registers: No such process." error is very strange.  We
expect GDB to say that the thread received a signal (SIGSEGV) while running
the hand-called function.

And then if you continue with:

(gdb) kill
Kill the program being debugged? (y or n) y
[Inferior 1 (process 3567034) killed]
(gdb) r
Starting program: /home/smarchi/build/binutils-gdb/gdb/a.out
/home/smarchi/src/binutils-gdb/gdb/target.c:2607: internal-error: target_wait: Assertion `!proc_target->commit_resumed_state' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

Looking at the proceed call made for the inferior call (with a second GDB
debugging this one):

(top-gdb) bt
#0  proceed (addr=0x555555555189, siggnal=GDB_SIGNAL_0) at
/home/smarchi/src/binutils-gdb/gdb/infrun.c:3046
#1  0x0000558e5d95a128 in run_inferior_call
(sm=std::unique_ptr<call_thread_fsm> = {...}, call_thread=0x61700009e680,
real_pc=0x555555555189) at /home/smarchi/src/binutils-gdb/gdb/infcall.c:610
#2  0x0000558e5d95ff6e in call_function_by_hand_dummy (function=0x611000489d00,
default_return_type=0x0, args=..., dummy_dtor=0x0, dummy_dtor_data=0x0) at
/home/smarchi/src/binutils-gdb/gdb/infcall.c:1279
#3  0x0000558e5d95b4be in call_function_by_hand (function=0x611000489d00,
default_return_type=0x0, args=...) at
/home/smarchi/src/binutils-gdb/gdb/infcall.c:741
#4  0x0000558e5d609a2e in evaluate_subexp_do_call (exp=0x6030001579f0,
noside=EVAL_NORMAL, callee=0x611000489d00, argvec=..., function_name=0x0,
default_return_type=0x0) at /home/smarchi/src/binutils-gdb/gdb/eval.c:674
#5  0x0000558e5d60a7c5 in expr::operation::evaluate_funcall
(this=0x603000157ab0, expect_type=0x0, exp=0x6030001579f0, noside=EVAL_NORMAL,
function_name=0x0, args=std::__debug::vector of length 0, capacity 0) at
/home/smarchi/src/binutils-gdb/gdb/eval.c:702
#6  0x0000558e5c4090aa in expr::operation::evaluate_funcall
(this=0x603000157ab0, expect_type=0x0, exp=0x6030001579f0, noside=EVAL_NORMAL,
args=std::__debug::vector of length 0, capacity 0) at
/home/smarchi/src/binutils-gdb/gdb/expression.h:136
#7  0x0000558e5d60ad63 in expr::var_value_operation::evaluate_funcall
(this=0x603000157ab0, expect_type=0x0, exp=0x6030001579f0, noside=EVAL_NORMAL,
args=std::__debug::vector of length 0, capacity 0) at
/home/smarchi/src/binutils-gdb/gdb/eval.c:714
#8  0x0000558e5cb8d2be in expr::funcall_operation::evaluate
(this=0x607000083f80, expect_type=0x0, exp=0x6030001579f0, noside=EVAL_NORMAL)
at /home/smarchi/src/binutils-gdb/gdb/expop.h:2178
#9  0x0000558e5d604e00 in expression::evaluate (this=0x6030001579f0,
expect_type=0x0, noside=EVAL_NORMAL) at
/home/smarchi/src/binutils-gdb/gdb/eval.c:101
#10 0x0000558e5d604f71 in evaluate_expression (exp=0x6030001579f0,
expect_type=0x0) at /home/smarchi/src/binutils-gdb/gdb/eval.c:115
#11 0x0000558e5c8c99b9 in breakpoint_cond_eval (exp=0x6030001579f0) at
/home/smarchi/src/binutils-gdb/gdb/breakpoint.c:4739
#12 0x0000558e5c8d1f11 in bpstat_check_breakpoint_conditions
(bs=0x6060001b29c0, thread=0x61700009e680) at
/home/smarchi/src/binutils-gdb/gdb/breakpoint.c:5303
#13 0x0000558e5c8d4b45 in bpstat_stop_status (aspace=0x603000045a00,
bp_addr=0x5555555551ae, thread=0x61700009e680, ws=...,
stop_chain=0x6060001b29c0) at
/home/smarchi/src/binutils-gdb/gdb/breakpoint.c:5475
#14 0x0000558e5da1f939 in handle_signal_stop (ecs=0x7fff97a4bd50) at
/home/smarchi/src/binutils-gdb/gdb/infrun.c:6200
#15 0x0000558e5da19441 in handle_inferior_event (ecs=0x7fff97a4bd50) at
/home/smarchi/src/binutils-gdb/gdb/infrun.c:5690
#16 0x0000558e5da05206 in fetch_inferior_event () at
/home/smarchi/src/binutils-gdb/gdb/infrun.c:4091
#17 0x0000558e5d94fad4 in inferior_event_handler (event_type=INF_REG_EVENT) at
/home/smarchi/src/binutils-gdb/gdb/inf-loop.c:41
#18 0x0000558e5dc29bdd in handle_target_event (error=0, client_data=0x0) at
/home/smarchi/src/binutils-gdb/gdb/linux-nat.c:4096
#19 0x0000558e5f4e4dd1 in handle_file_event (file_ptr=0x607000016050,
ready_mask=1) at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:574
#20 0x0000558e5f4e562c in gdb_wait_for_event (block=0) at
/home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:700
#21 0x0000558e5f4e343c in gdb_do_one_event () at
/home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:212
#22 0x0000558e5dd29d99 in start_event_loop () at
/home/smarchi/src/binutils-gdb/gdb/main.c:421
#23 0x0000558e5dd2a1df in captured_command_loop () at
/home/smarchi/src/binutils-gdb/gdb/main.c:481
#24 0x0000558e5dd2fad9 in captured_main (data=0x7fff97a4c200) at
/home/smarchi/src/binutils-gdb/gdb/main.c:1348
#25 0x0000558e5dd2fbc2 in gdb_main (args=0x7fff97a4c200) at
/home/smarchi/src/binutils-gdb/gdb/main.c:1363
#26 0x0000558e5c3e1ddd in main (argc=7, argv=0x7fff97a4c378) at
/home/smarchi/src/binutils-gdb/gdb/gdb.c:32


We find that GDB tries to resume threads other than the event thread (the one
for which we evaluate the breakpoint condition), because it thinks they are not
resumed.  This is probably because the linux-nat target added them in the
non-resumed state, and they stayed that way.
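
To make the suspected failure mode concrete, here is a minimal model in C++.
`model_thread` and `resume_all_believed_stopped` are hypothetical names, not
GDB code.  A thread that was added in the non-resumed state, but is actually
running, receives a second, invalid resume request.  On Linux, ptrace requests
that need a stopped tracee fail with ESRCH ("No such process") when the tracee
is running, which would be consistent with the error above; that connection is
an assumption, not something verified here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified per-thread state (not GDB's thread_info).
struct model_thread
{
  int lwp;
  bool resumed;   // What GDB believes about the thread.
  bool running;   // What is actually true in the inferior.
};

// Resume every thread GDB believes is stopped, as the all-stop infcall
// path would.  Returns the LWPs whose resume request is invalid because
// the thread was in fact already running (a stale "resumed" flag).
std::vector<int>
resume_all_believed_stopped (std::vector<model_thread> &threads)
{
  std::vector<int> invalid;
  for (model_thread &t : threads)
    {
      if (t.resumed)
        continue;                    // GDB knows it is running; skip it.
      if (t.running)
        invalid.push_back (t.lwp);   // Never stopped, yet resumed again.
      t.resumed = true;
      t.running = true;
    }
  return invalid;
}
```

With three threads where only LWP 101 has the stale flag, the function
reports exactly that LWP.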

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug gdb/28942] Problem with breakpoint condition calling a function in multi-threaded program
  2022-03-03 19:40 [Bug gdb/28942] New: Problem with breakpoint condition calling a function in multi-threaded program simon.marchi at polymtl dot ca
@ 2022-03-04 11:15 ` aburgess at redhat dot com
  2022-03-04 14:01 ` aburgess at redhat dot com
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: aburgess at redhat dot com @ 2022-03-04 11:15 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28942

Andrew Burgess <aburgess at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
                 CC|                            |aburgess at redhat dot com
   Last reconfirmed|                            |2022-03-04
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Burgess <aburgess at redhat dot com> ---
Wow, it's a small world.  I literally just started looking at this same issue
this week.

The whole "thread not marked resumed" issue is fixed by this excellent patch:

  https://sourceware.org/pipermail/gdb-patches/2022-January/185109.html

Which you know, as you already posted a link to this bug in that thread.

However, there are so many other problems related to this issue.

The first thing I noticed is that run_inferior_call calls clear_proceed_status,
which in all-stop mode calls clear_proceed_status_thread for each thread.

Once the above patch is merged I plan to add an assert to
clear_proceed_status_thread that the thread we are clearing is not resumed and
not executing.

Currently the not-executing assert will fail, but (due to the above patch being
missing) the not-resumed assert will only fail sometimes.

If we ignore the clear_proceed_status issue, then with the above patch the
resumed flag will be correct, and GDB will not try to start the already resumed
threads as part of the inferior call.

However, after the call, as we're in all-stop mode, GDB will stop all threads.

Then, if the breakpoint condition doesn't segfault but instead just returns
false, GDB will resume the single thread that stopped for the breakpoint,
leaving all the other threads stopped.

I'm currently working on the idea that, when we evaluate the breakpoint
condition, we temporarily place GDB into non-stop mode.  This would mean that
when we evaluate the b/p condition we only restart the one thread, and
afterwards we only expect that one thread to stop.  I need to do lots more
testing yet, though; maybe this is a really bad idea.

The only other option I can think of is to somehow have the infcall code figure
out that we are in all-stop mode, but some threads are already running.  Then,
after making the inferior call, we only stop the set of threads that we started.
However, this has a massive problem: how do we handle new threads?
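
A rough sketch of this second option, assuming threads are identified by plain
integers (`infcall_resume_set` is a hypothetical name, not GDB code): the
infcall records which threads it resumed, and afterwards stops only those.  It
shows where the open question bites: a thread created during the call was never
recorded, so nothing stops it, and whether that is acceptable is exactly what
is unresolved here.

```cpp
#include <cassert>
#include <set>

// Hypothetical bookkeeping for "only stop the threads we started".
struct infcall_resume_set
{
  std::set<int> started;   // Threads resumed on behalf of this infcall.

  void note_resumed (int tid) { started.insert (tid); }

  // After the call completes, should thread TID be stopped again?
  // Threads that were already running, and threads created during the
  // call, are not in the set and so are left running.
  bool should_stop (int tid) const { return started.count (tid) != 0; }
};
```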

I'll clean up my current patch and post it to this bug later today in case
anyone wants to try it.  I'll also add your crashing function test to my
working branch to make sure that is handled too.

* [Bug gdb/28942] Problem with breakpoint condition calling a function in multi-threaded program
  2022-03-03 19:40 [Bug gdb/28942] New: Problem with breakpoint condition calling a function in multi-threaded program simon.marchi at polymtl dot ca
  2022-03-04 11:15 ` [Bug gdb/28942] " aburgess at redhat dot com
@ 2022-03-04 14:01 ` aburgess at redhat dot com
  2022-03-04 14:44 ` simark at simark dot ca
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: aburgess at redhat dot com @ 2022-03-04 14:01 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28942

--- Comment #2 from Andrew Burgess <aburgess at redhat dot com> ---
Created attachment 14005
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14005&action=edit
A WIP patch

Here's the patch I'm currently working on.  This should apply to current master
and resolves the issue in this bug, as well as the original issue I was working
on.  I've run the complete testsuite on GNU/Linux x86-64 with no regressions.

I still need to do lots more testing, especially around things like handling
targets that don't support non-stop mode, and what happens if some other thread
stops while we are evaluating the breakpoint condition.

But any initial thoughts are welcome.

* [Bug gdb/28942] Problem with breakpoint condition calling a function in multi-threaded program
  2022-03-03 19:40 [Bug gdb/28942] New: Problem with breakpoint condition calling a function in multi-threaded program simon.marchi at polymtl dot ca
  2022-03-04 11:15 ` [Bug gdb/28942] " aburgess at redhat dot com
  2022-03-04 14:01 ` aburgess at redhat dot com
@ 2022-03-04 14:44 ` simark at simark dot ca
  2022-03-07  7:34 ` tankut.baris.aktemur at intel dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: simark at simark dot ca @ 2022-03-04 14:44 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28942

Simon Marchi <simark at simark dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |simark at simark dot ca

--- Comment #3 from Simon Marchi <simark at simark dot ca> ---
(In reply to Andrew Burgess from comment #1)
> Wow, it's a small world.  I literally just started looking at this same
> issue this week.
> 
> The whole thread not marked resumed issue is fixed by this excellent patch:
> 
>   https://sourceware.org/pipermail/gdb-patches/2022-January/185109.html
> 
> Which you know as you already posted a link to this bug to that thread.
> 
> However, there are so many other problems related to this issue.
> 
> The first thing I noticed is that run_inferior_call calls
> clear_proceed_status, which in all-stop mode calls
> clear_proceed_status_thread for each thread.
> 
> Once the above patch is merged I plan to add an assert to
> clear_proceed_status_thread that the thread we are clearing is not resumed
> and not executing.
> 
> Currently the not-executing assert will fail, but (due to the above patch
> being missing) the not-resumed assert will only fail sometimes.
> 
> If we ignore the clear_proceed_status issue, then with the above patch the
> resumed flag will be correct, and GDB will not try to start the already
> resumed threads as part of the inferior call.
> 
> However, after the call, as we're in all-stop mode, GDB will stop all
> threads.
> 
> However, if the breakpoint condition doesn't segfault, but instead just
> returns false, then GDB will resume the single thread that stopped for the
> breakpoint - leaving all the other threads stopped.

Yeah, the fact that the breakpoint condition function caused a segfault is just
another difficulty on top.  You can ignore that part.

> I'm currently working on the idea that when we evaluate the breakpoint
> condition we temporarily place GDB into non-stop mode, this would mean that,
> when we evaluate the b/p condition we only restart the one thread, and
> afterwards, we only expect the one thread to stop, but I need to do lots
> more testing yet - maybe this is a really bad idea.
> 
> The only other option I can think of is to somehow have the infcall code
> figure out that we are in all-stop mode, but some threads are already
> running.  Then, after making the inferior call we only stop the set of
> threads that we started.  However, this has a massive problem; how to handle
> new threads?

When thinking about this, my intuition was more like the latter.

In all-stop over a non-stop target:

1. A thread hits a breakpoint, only that thread is stopped while we process the
breakpoint hit
2. When doing the infcall in the breakpoint condition, only that thread is
resumed (the other threads already are)
3. When the infcall is done, only that thread is stopped
4a. If the condition is true, then GDB stops all threads
4b. If the condition is false, that thread is resumed

In all-stop over an all-stop target:

1. A thread hits a breakpoint, all threads are stopped while we process the
breakpoint hit
2. When doing the infcall in the breakpoint condition, all threads are resumed
(is this what would happen if the user were to do a manual infcall?)
3. When the infcall is done, all threads are stopped
4a. If the condition is true, all threads remain stopped
4b. If the condition is false, all threads are resumed

In non-stop over a non-stop target, it looks like
"all-stop-on-top-of-non-stop", except that not all threads are stopped in step
4a.

I didn't really think through what would happen to new threads; I suppose they
would just keep running.
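
The three scenarios above can be encoded as a small table-driven sketch.  This
is only one reading of the listed steps, with hypothetical names (`plan_for`,
`cond_plan`, `scope`); it is not GDB code, and, like the text, it leaves new
threads out.

```cpp
#include <cassert>

// Hypothetical encoding of the proposal above; not GDB code.
enum class gdb_mode { all_stop, non_stop };
enum class target_mode { all_stop, non_stop };

// Which threads a step affects.
enum class scope { event_thread, all_threads };

struct cond_plan
{
  scope stop_on_hit;       // 1.  Stopped while processing the hit.
  scope resume_for_call;   // 2.  Resumed for the infcall.
  scope stop_after_call;   // 3.  Stopped when the infcall is done.
  scope stop_if_true;      // 4a. Stopped if the condition is true.
  scope resume_if_false;   // 4b. Resumed if the condition is false.
};

cond_plan
plan_for (gdb_mode gdb, target_mode target)
{
  // Non-stop mode needs a non-stop target; no other combination is
  // described above.
  assert (!(gdb == gdb_mode::non_stop && target == target_mode::all_stop));

  if (target == target_mode::all_stop)
    // Everything moves together; in 4a the threads simply stay stopped.
    return {scope::all_threads, scope::all_threads, scope::all_threads,
            scope::all_threads, scope::all_threads};

  // Over a non-stop target only the event thread is touched until 4a.
  cond_plan plan {scope::event_thread, scope::event_thread,
                  scope::event_thread, scope::all_threads,
                  scope::event_thread};
  if (gdb == gdb_mode::non_stop)
    plan.stop_if_true = scope::event_thread;   // Others keep running.
  return plan;
}
```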

> 
> I'll clean up my current patch and post it to this bug later today in case
> anyone wants to try it.  I'll also add your crashing function test to my
> working branch to make sure that is handled too.

Thanks, that's some really quick customer service.

* [Bug gdb/28942] Problem with breakpoint condition calling a function in multi-threaded program
  2022-03-03 19:40 [Bug gdb/28942] New: Problem with breakpoint condition calling a function in multi-threaded program simon.marchi at polymtl dot ca
                   ` (2 preceding siblings ...)
  2022-03-04 14:44 ` simark at simark dot ca
@ 2022-03-07  7:34 ` tankut.baris.aktemur at intel dot com
  2022-10-21 17:57 ` tromey at sourceware dot org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: tankut.baris.aktemur at intel dot com @ 2022-03-07  7:34 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28942

Baris Aktemur <tankut.baris.aktemur at intel dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tankut.baris.aktemur@intel.
                   |                            |com

--- Comment #4 from Baris Aktemur <tankut.baris.aktemur at intel dot com> ---
A highly-related patch series was this:

  https://sourceware.org/pipermail/gdb-patches/2021-March/176654.html

Perhaps there are a few useful things that still apply to the current master.

> In all-stop over an all-stop target:
>
> 1. A thread hits a breakpoint, all threads are stopped while we process
> the breakpoint hit
> 2. When doing the infcall in the breakpoint condition, all threads are
> resumed (is this what would happen if the user were to do a manual infcall?)

I think GDB should act like the "scheduler-locking on" mode in this case,
because if another thread has a pending event, the condition evaluation
could be dismissed.  This is what distinguishes an infcall in condition
evaluation from a manual infcall.  The series linked above introduced an
`in_cond_eval` flag to make this distinction.

* [Bug gdb/28942] Problem with breakpoint condition calling a function in multi-threaded program
  2022-03-03 19:40 [Bug gdb/28942] New: Problem with breakpoint condition calling a function in multi-threaded program simon.marchi at polymtl dot ca
                   ` (3 preceding siblings ...)
  2022-03-07  7:34 ` tankut.baris.aktemur at intel dot com
@ 2022-10-21 17:57 ` tromey at sourceware dot org
  2022-10-21 17:57 ` tromey at sourceware dot org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: tromey at sourceware dot org @ 2022-10-21 17:57 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28942

Tom Tromey <tromey at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tromey at sourceware dot org

--- Comment #5 from Tom Tromey <tromey at sourceware dot org> ---
https://sourceware.org/pipermail/gdb-patches/2022-October/192926.html

* [Bug gdb/28942] Problem with breakpoint condition calling a function in multi-threaded program
  2022-03-03 19:40 [Bug gdb/28942] New: Problem with breakpoint condition calling a function in multi-threaded program simon.marchi at polymtl dot ca
                   ` (4 preceding siblings ...)
  2022-10-21 17:57 ` tromey at sourceware dot org
@ 2022-10-21 17:57 ` tromey at sourceware dot org
  2022-10-21 17:58 ` tromey at sourceware dot org
  2024-03-25 17:40 ` cvs-commit at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: tromey at sourceware dot org @ 2022-10-21 17:57 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28942

Tom Tromey <tromey at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mingwei.zhang at intel dot com

--- Comment #6 from Tom Tromey <tromey at sourceware dot org> ---
*** Bug 23191 has been marked as a duplicate of this bug. ***

* [Bug gdb/28942] Problem with breakpoint condition calling a function in multi-threaded program
  2022-03-03 19:40 [Bug gdb/28942] New: Problem with breakpoint condition calling a function in multi-threaded program simon.marchi at polymtl dot ca
                   ` (5 preceding siblings ...)
  2022-10-21 17:57 ` tromey at sourceware dot org
@ 2022-10-21 17:58 ` tromey at sourceware dot org
  2024-03-25 17:40 ` cvs-commit at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: tromey at sourceware dot org @ 2022-10-21 17:58 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28942

Tom Tromey <tromey at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ppluzhnikov at google dot com

--- Comment #7 from Tom Tromey <tromey at sourceware dot org> ---
*** Bug 28911 has been marked as a duplicate of this bug. ***

* [Bug gdb/28942] Problem with breakpoint condition calling a function in multi-threaded program
  2022-03-03 19:40 [Bug gdb/28942] New: Problem with breakpoint condition calling a function in multi-threaded program simon.marchi at polymtl dot ca
                   ` (6 preceding siblings ...)
  2022-10-21 17:58 ` tromey at sourceware dot org
@ 2024-03-25 17:40 ` cvs-commit at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-03-25 17:40 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=28942

--- Comment #8 from Sourceware Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Andrew Burgess <aburgess@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=3df7843699ff3610f89ac880685396b531d8ec1b

commit 3df7843699ff3610f89ac880685396b531d8ec1b
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Fri Oct 9 13:27:13 2020 +0200

    gdb: fix b/p conditions with infcalls in multi-threaded inferiors

    This commit fixes bug PR 28942, that is, creating a conditional
    breakpoint in a multi-threaded inferior, where the breakpoint
    condition includes an inferior function call.

    Currently, when a user tries to create such a breakpoint, then GDB
    will fail with:

      (gdb) break infcall-from-bp-cond-single.c:61 if (return_true ())
      Breakpoint 2 at 0x4011fa: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-single.c, line 61.
      (gdb) continue
      Continuing.
      [New Thread 0x7ffff7c5d700 (LWP 2460150)]
      [New Thread 0x7ffff745c700 (LWP 2460151)]
      [New Thread 0x7ffff6c5b700 (LWP 2460152)]
      [New Thread 0x7ffff645a700 (LWP 2460153)]
      [New Thread 0x7ffff5c59700 (LWP 2460154)]
      Error in testing breakpoint condition:
      Couldn't get registers: No such process.
      An error occurred while in a function called from GDB.
      Evaluation of the expression containing the function
      (return_true) will be abandoned.
      When the function is done executing, GDB will silently stop.
      Selected thread is running.
      (gdb)

    Or, in some cases, like this:

      (gdb) break infcall-from-bp-cond-simple.c:56 if (is_matching_tid (arg, 1))
      Breakpoint 2 at 0x401194: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-simple.c, line 56.
      (gdb) continue
      Continuing.
      [New Thread 0x7ffff7c5d700 (LWP 2461106)]
      [New Thread 0x7ffff745c700 (LWP 2461107)]
      ../../src.release/gdb/nat/x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.
      A problem internal to GDB has been detected,
      further debugging may prove unreliable.

    The precise error depends on the exact thread state, so there are race
    conditions depending on which threads have fully started and which
    have not.  But the underlying problem is always the same: when GDB
    tries to execute the inferior function call from within the breakpoint
    condition, GDB will incorrectly try to resume threads that are
    already running, because it doesn't realise that some threads might
    already be running.

    The solution proposed in this patch requires an additional member
    variable thread_info::in_cond_eval.  This flag is set to true (in
    breakpoint.c) when GDB is evaluating a breakpoint condition.

    In user_visible_resume_ptid (infrun.c), when the in_cond_eval flag is
    true, then GDB will only try to resume the current thread, that is,
    the thread for which the breakpoint condition is being evaluated.
    This solves the problem of GDB trying to resume threads that are
    already running.
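
    A simplified model of this idea, using hypothetical stand-in types
    (`model_ptid`, `model_thread`, `resume_ptid_for` and `scoped_cond_eval`
    are not GDB code).  Only the `in_cond_eval` name comes from this commit;
    the real user_visible_resume_ptid handles more cases (for example
    non-stop mode and `set schedule-multiple`), which are left out here.  The
    RAII guard is one way to make sure the flag is cleared even if evaluating
    the condition throws.

```cpp
#include <cassert>

// Hypothetical stand-ins for GDB's ptid_t and thread_info.
struct model_ptid
{
  int pid;
  long lwp;   // 0 means "every thread of the process".

  bool operator== (const model_ptid &other) const
  { return pid == other.pid && lwp == other.lwp; }
};

struct model_thread
{
  model_ptid ptid;
  bool in_cond_eval = false;   // Mirrors thread_info::in_cond_eval.
};

// Loosely modelled on user_visible_resume_ptid in all-stop mode: resume
// the whole process, unless the user asked for scheduler locking or
// CURRENT is evaluating a breakpoint condition, in which case resume
// CURRENT alone.
model_ptid
resume_ptid_for (const model_thread &current, bool scheduler_locking)
{
  if (scheduler_locking || current.in_cond_eval)
    return current.ptid;
  return model_ptid {current.ptid.pid, 0};
}

// Sets in_cond_eval for the lifetime of the guard, so the flag is
// cleared on every exit path, including exceptions.
class scoped_cond_eval
{
public:
  explicit scoped_cond_eval (model_thread &thread) : m_thread (thread)
  { m_thread.in_cond_eval = true; }

  ~scoped_cond_eval ()
  { m_thread.in_cond_eval = false; }

  scoped_cond_eval (const scoped_cond_eval &) = delete;
  scoped_cond_eval &operator= (const scoped_cond_eval &) = delete;

private:
  model_thread &m_thread;
};
```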

    The next problem is that inferior function calls are assumed to be
    synchronous, that is, GDB doesn't expect to start an inferior function
    call in thread #1, then receive a stop from thread #2 for some other,
    unrelated reason.  To prevent GDB responding to an event from another
    thread, we update fetch_inferior_event and do_target_wait in infrun.c,
    so that, when an inferior function call (on behalf of a breakpoint
    condition) is in progress, we only wait for events from the current
    thread (the one evaluating the condition).
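
    The filtering can be sketched with a simple queue, assuming events are
    already queued on the GDB side (`model_event` and `wait_for` are
    hypothetical names, not GDB code).  Waiting on one thread takes only that
    thread's event; an event from another thread stays queued rather than
    being dropped.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Hypothetical event already queued on the GDB side (not GDB code).
struct model_event
{
  int tid;      // Thread that reported the event.
  int signal;   // Stand-in for the stop reason.
};

// Return the first queued event from FILTER_TID (-1 means any thread) and
// remove it from the queue.  Events from other threads stay queued, so
// they are reported later instead of being lost.
std::optional<model_event>
wait_for (std::deque<model_event> &queue, int filter_tid)
{
  for (auto it = queue.begin (); it != queue.end (); ++it)
    if (filter_tid == -1 || it->tid == filter_tid)
      {
        model_event ev = *it;
        queue.erase (it);
        return ev;
      }
  return std::nullopt;   // A real target would block here instead.
}
```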

    In do_target_wait I had to change the inferior_matches lambda
    function, which is used to select which inferior to wait on.
    Previously the logic was this:

       auto inferior_matches = [&wait_ptid] (inferior *inf)
         {
           return (inf->process_target () != nullptr
                   && ptid_t (inf->pid).matches (wait_ptid));
         };

    This compares the pid of the inferior against the complete ptid we
    want to wait on.  Before this commit wait_ptid was only ever
    minus_one_ptid (which is special, and means any process), and so every
    inferior would match.

    After this commit, though, wait_ptid might represent a specific thread
    in a specific inferior.  If we compare the pid of the inferior to a
    specific ptid then these will not match.  The fix is to compare
    against the pid extracted from the wait_ptid, not against the complete
    wait_ptid itself.
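
    A small model of the old and new checks, assuming the matching rules of
    GDB's ptid_t::matches are as follows: a minus-one filter matches anything,
    a pid-only filter matches every thread of that process, and otherwise only
    an exact match counts.  `model_ptid` and the two `*_inferior_matches`
    helpers are hypothetical stand-ins for the lambda above.

```cpp
#include <cassert>

// Hypothetical stand-in for GDB's ptid_t, reduced to a pid and an lwp.
struct model_ptid
{
  int pid;    // -1 together with lwp == 0 means "any process".
  long lwp;   // 0 means "every thread of the process".

  bool operator== (const model_ptid &other) const
  { return pid == other.pid && lwp == other.lwp; }

  bool is_minus_one () const { return pid == -1 && lwp == 0; }
  bool is_pid () const { return pid > 0 && lwp == 0; }

  // Assumed to follow ptid_t::matches (see the lead-in).
  bool matches (const model_ptid &filter) const
  {
    return (filter.is_minus_one ()
            || (filter.is_pid () && pid == filter.pid)
            || *this == filter);
  }
};

// Old check: the inferior's pid-only ptid against the whole wait_ptid.
bool
old_inferior_matches (int inf_pid, const model_ptid &wait_ptid)
{
  return model_ptid {inf_pid, 0}.matches (wait_ptid);
}

// New check: compare against the pid extracted from wait_ptid.
bool
new_inferior_matches (int inf_pid, const model_ptid &wait_ptid)
{
  return model_ptid {inf_pid, 0}.matches (model_ptid {wait_ptid.pid, 0});
}
```

    With a thread-specific wait_ptid of {42, 43}, the old check rejects
    inferior 42, while the new one accepts it and still rejects inferior 7.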

    In fetch_inferior_event, after receiving the event, we only want to
    stop all the other threads, and call inferior_event_handler with
    INF_EXEC_COMPLETE, if we are not evaluating a conditional breakpoint.
    If we are, then all the other threads should be left doing whatever
    they were before.  The inferior_event_handler call will be performed
    once the breakpoint condition has finished being evaluated, and GDB
    decides to stop or not.
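
    A sketch of that decision, with a hypothetical `finish_event` helper and a
    plain boolean in place of GDB's real state: while a condition is being
    evaluated, only the event thread is marked stopped and nothing is reported
    yet; otherwise all-stop behaviour applies.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-thread state (not GDB code).
struct model_thread
{
  int tid;
  bool running;
};

// Called once an event for EVENT_TID has been received.  Returns true if
// the stop should be reported to the user now (the INF_EXEC_COMPLETE
// step); returns false to defer that to the condition-evaluation code.
bool
finish_event (std::vector<model_thread> &threads, int event_tid,
              bool in_cond_eval)
{
  for (model_thread &t : threads)
    if (t.tid == event_tid)
      t.running = false;   // The event thread has stopped.

  if (in_cond_eval)
    return false;          // Leave every other thread as it was.

  for (model_thread &t : threads)
    t.running = false;     // All-stop: stop all the other threads too.
  return true;
}
```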

    The final problem that needs solving relates to GDB's commit-resume
    mechanism, which allows GDB to collect resume requests into a single
    packet in order to reduce traffic to a remote target.

    The problem is that the commit-resume mechanism will not send any
    resume requests for an inferior if there are already events pending on
    the GDB side.

    Imagine an inferior with two threads.  Both threads hit a breakpoint,
    maybe the same conditional breakpoint.  At this point there are two
    pending events, one for each thread.

    GDB selects one of the events, spots that it is for a conditional
    breakpoint, and evaluates the condition.

    The condition includes an inferior function call, so GDB sets up for
    the call and resumes the one thread; the resume request is added to
    the commit-resume queue.

    When the commit-resume queue is committed, GDB sees that there is a
    pending event from another thread, and so doesn't send any resume
    requests to the actual target; GDB is assuming that when we wait we
    will select the event from the other thread.

    However, as this is an inferior function call for a condition
    evaluation, we will not select the event from the other thread, we
    only care about events from the thread that is evaluating the
    condition - and the resume for this thread was never sent to the
    target.

    And so, GDB hangs, waiting for an event from a thread that was never
    fully resumed.

    To fix this issue I have added the concept of "forcing" the
    commit-resume queue.  When enabling commit resume, if the force flag
    is true, then any resumes will be committed to the target, even if
    there are other threads with pending events.
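
    The hang and the fix can be modelled with a toy queue (`model_target` and
    its members are hypothetical, not GDB's commit-resume API).  Without
    `force`, a pending event on thread 2 suppresses the resume of thread 1;
    with `force`, the queued resume reaches the target.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model of a commit-resume queue (not GDB's API).
struct model_target
{
  std::vector<int> queued;        // Resume requests not yet sent.
  std::set<int> pending_events;   // Threads with an event on GDB's side.
  std::vector<int> sent;          // Resume requests that reached the target.

  void resume (int tid) { queued.push_back (tid); }

  // Flush the queue.  Unless FORCE is true, nothing is sent while any
  // event is pending, on the assumption that the next wait will simply
  // return that event.
  void commit_resumed (bool force)
  {
    if (!force && !pending_events.empty ())
      return;
    sent.insert (sent.end (), queued.begin (), queued.end ());
    queued.clear ();
  }
};
```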

    A note on authorship: this patch was based on some work done by
    Natalia Saiapova and Tankut Baris Aktemur from Intel[1].  I have made
    some changes to their work in this version.

    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28942

    [1] https://sourceware.org/pipermail/gdb-patches/2020-October/172454.html

    Co-authored-by: Natalia Saiapova <natalia.saiapova@intel.com>
    Co-authored-by: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
    Reviewed-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
    Tested-By: Luis Machado <luis.machado@arm.com>
    Tested-By: Keith Seitz <keiths@redhat.com>

end of thread

Thread overview: 9+ messages
2022-03-03 19:40 [Bug gdb/28942] New: Problem with breakpoint condition calling a function in multi-threaded program simon.marchi at polymtl dot ca
2022-03-04 11:15 ` [Bug gdb/28942] " aburgess at redhat dot com
2022-03-04 14:01 ` aburgess at redhat dot com
2022-03-04 14:44 ` simark at simark dot ca
2022-03-07  7:34 ` tankut.baris.aktemur at intel dot com
2022-10-21 17:57 ` tromey at sourceware dot org
2022-10-21 17:57 ` tromey at sourceware dot org
2022-10-21 17:58 ` tromey at sourceware dot org
2024-03-25 17:40 ` cvs-commit at gcc dot gnu.org
