public inbox for gdb-patches@sourceware.org
* Re:Re:GDB often is blocked at async_file_flush
@ 2021-07-05 12:25 周春明(日月)
From: 周春明(日月) @ 2021-07-05 12:25 UTC
  To: 周春明(日月),
	Simon Marchi, Gdb-patches, gdb-patches

Hi Simon,
I did more experiments today. I can basically confirm that GDB gets stuck in that loop because linux_nat_event_pipe[1] is closed while a SIGCHLD arrives.
The two events are asynchronous, so the issue occurs randomly.
But I still don't know exactly why linux_nat_event_pipe[1] is closed unexpectedly. Sometimes SIGCHLD even arrives at the same time as line 52 below, target_async (0), is executing:

36| void
37| inferior_event_handler (enum inferior_event_type event_type)
38| {
39|   switch (event_type)
40|     {
41|     case INF_REG_EVENT:
42|       fetch_inferior_event ();
43|       break;
44|
45|     case INF_EXEC_COMPLETE:
46|       if (!non_stop)
47|         {
48|           /* Unregister the inferior from the event loop.  This is done
49|              so that when the inferior is not running we don't get
50|              distracted by spurious inferior output.  */
51|           if (target_has_execution && target_can_async_p ())
52+>            target_async (0);
53|         }

-David
------------------------------------------------------------------
From: Simon Marchi <simon.marchi@polymtl.ca>
Sent: Monday, July 5, 2021 08:53
To: 周春明(日月) <riyue.zcm@alibaba-inc.com>; Gdb-patches <gdb-patches-bounces+riyue.zcm=alibaba-inc.com@sourceware.org>; gdb-patches <gdb-patches@sourceware.org>
Subject: Re: Re:GDB often is blocked at async_file_flush

On 2021-07-04 8:13 p.m., 周春明(日月) wrote:
> Hi Simon,
> Thanks for the reply.
> And yes, gdb is stuck in this loop:
>   do
>     {
>       ret = read (linux_nat_event_pipe[0], &buf, 1);
>     }
>   while (ret >= 0 || (ret == -1 && errno == EINTR));
> 
> The return value of read is always 0 when the hang happens. Debugging further in the kernel's pipe_read, I see that this happens when pipe->writers is NULL.
> Because the issue is random, I compared against a normal run: there, pipe->writers is not NULL and pipe->wait_writers is NULL, so pipe_read returns -EAGAIN and the loop above exits normally.
> So do you know when pipe->writers would be NULL? When the sub-process is suspended?

Hmm, does that mean that the writer end of the pipe would be closed, but
not the read end?  I don't see how that can happen, as they are both
closed as a pair in linux_async_pipe, when enable is 0.

I tried the following test program, and indeed read returns 0:

    #include <unistd.h>
    #include <stdio.h>
    #include <fcntl.h>

    int main ()
    {
      int fds[2];
      pipe(fds);
      fcntl(fds[0], F_SETFL, O_NONBLOCK);
      fcntl(fds[1], F_SETFL, O_NONBLOCK);
      close(fds[1]);

      char c;
      int ret = read (fds[0], &c, 1);
      if (ret < 0)
        perror("read");

      printf("ret = %d\n", ret);
    }

When you have that infinite loop, what is the value of the two elements
of linux_nat_event_pipe?
[David] I tried this. When the infinite loop happens, the two elements of linux_nat_event_pipe are pipe[1]: 12 and pipe[0]: 11. Do you know of any case other than closing pipe[1] that makes a read on pipe[0] return 0?
If you could share a reproducer for how to get to this state, it would
be useful.
[David] The project is a custom one for our ASIC, which isn't public yet. I also tried to narrow it down to a special case that reproduces with stock GDB, but failed.

-David 

Simon



* Re: Re:Re:GDB often is blocked at async_file_flush
From: Pedro Alves @ 2021-07-05 12:38 UTC
  To: 周春明(日月),
	Simon Marchi, Gdb-patches, gdb-patches

On 2021-07-05 1:25 p.m., 周春明(日月) via Gdb-patches wrote:
> Hi Simon,
> I did more experiments today. I can basically confirm that GDB gets stuck in that loop because linux_nat_event_pipe[1] is closed while a SIGCHLD arrives.
> The two events are asynchronous, so the issue occurs randomly.
> But I still don't know exactly why linux_nat_event_pipe[1] is closed unexpectedly. Sometimes SIGCHLD even arrives at the same time as line 52 below, target_async (0), is executing:
> 
> 36| void
> 37| inferior_event_handler (enum inferior_event_type event_type)
> 38| {
> 39|   switch (event_type)
> 40|     {
> 41|     case INF_REG_EVENT:
> 42|       fetch_inferior_event ();
> 43|       break;
> 44|
> 45|     case INF_EXEC_COMPLETE:
> 46|       if (!non_stop)
> 47|         {
> 48|           /* Unregister the inferior from the event loop.  This is done
> 49|              so that when the inferior is not running we don't get
> 50|              distracted by spurious inferior output.  */
> 51|           if (target_has_execution && target_can_async_p ())
> 52+>            target_async (0);
> 53|         }

Note that linux_async_pipe closes the pipe with SIGCHLD blocked.  See
block_child_signals call.

I think you mentioned you're targeting some private architecture?
Could there be some bug at the libc or kernel level with blocking signals?
This is a glibc-based system?

Or maybe there's some stray code elsewhere in your GDB that closes
the wrong file descriptor?

I would maybe debug gdb and put a conditional breakpoint at "close" (conditional
on the file descriptor number), trying to catch where the pipe is closed.

* Re: Re:Re:GDB often is blocked at async_file_flush
From: 周春明(日月) @ 2021-07-05 13:11 UTC
  To: Pedro Alves, Simon Marchi, Gdb-patches, gdb-patches







------------------------------------------------------------------
From: Pedro Alves <pedro@palves.net>
Sent: Monday, July 5, 2021 20:38
To: 周春明(日月) <riyue.zcm@alibaba-inc.com>; Simon Marchi <simon.marchi@polymtl.ca>; Gdb-patches <gdb-patches-bounces+riyue.zcm=alibaba-inc.com@sourceware.org>; gdb-patches <gdb-patches@sourceware.org>
Subject: Re: Re:Re:GDB often is blocked at async_file_flush

On 2021-07-05 1:25 p.m., 周春明(日月) via Gdb-patches wrote:
> Hi Simon,
> I did more experiments today. I can basically confirm that GDB gets stuck in that loop because linux_nat_event_pipe[1] is closed while a SIGCHLD arrives.
> The two events are asynchronous, so the issue occurs randomly.
> But I still don't know exactly why linux_nat_event_pipe[1] is closed unexpectedly. Sometimes SIGCHLD even arrives at the same time as line 52 below, target_async (0), is executing:
> 
> 36| void
> 37| inferior_event_handler (enum inferior_event_type event_type)
> 38| {
> 39|   switch (event_type)
> 40|     {
> 41|     case INF_REG_EVENT:
> 42|       fetch_inferior_event ();
> 43|       break;
> 44|
> 45|     case INF_EXEC_COMPLETE:
> 46|       if (!non_stop)
> 47|         {
> 48|           /* Unregister the inferior from the event loop.  This is done
> 49|              so that when the inferior is not running we don't get
> 50|              distracted by spurious inferior output.  */
> 51|           if (target_has_execution && target_can_async_p ())
> 52+>            target_async (0);
> 53|         }

Note that linux_async_pipe closes the pipe with SIGCHLD blocked.  See
block_child_signals call.

[David] I'm not sure about that:
1. SIGCHLD arrives and we run into the handler below:

static void
sigchld_handler (int signo)
{
  int old_errno = errno;

  if (debug_linux_nat)
    gdb_stdlog->write_async_safe ("sigchld\n", sizeof ("sigchld\n") - 1);

  if (signo == SIGCHLD
      && linux_nat_event_pipe[0] != -1)
    async_file_mark (); /* Let the event loop know that there are
                           events to handle.  */

  errno = old_errno;
} 


When execution reaches async_file_mark, linux_nat_event_pipe[1] has suddenly been closed, and I'm not sure where that happens.


I think you mentioned you're targeting some private architecture?
Could there be some bug at the libc or kernel level with blocking signals?
This is a glibc-based system?


[David] ldd (Ubuntu GLIBC 2.23-0ubuntu11.3) 2.23


Or maybe there's some stray code elsewhere in your GDB that closes
the wrong file descriptor?

I would maybe debug gdb and put a conditional breakpoint at "close" (conditional
on the file descriptor number), trying to catch where the pipe is closed.

[David] Could you detail the conditional breakpoint? I don't know which variable should be used for close. "b close if xxx==12"?



* Re: Re: Re:Re:GDB often is blocked at async_file_flush
From: Pedro Alves @ 2021-07-05 13:48 UTC
  To: 周春明(日月),
	Simon Marchi, Gdb-patches, gdb-patches

On 2021-07-05 2:11 p.m., 周春明(日月) wrote:
> 
> 
> I would maybe debug gdb and put a conditional breakpoint at "close" (conditional
> on the file descriptor number), trying to catch where the pipe is closed.
> 
> [David] Could you detail the conditional breakpoint? I don't know which variable should be used for close. "b close if xxx==12"?

The fd argument:

       int close(int fd);

b close if fd==12

* Re: Re: Re:Re:GDB often is blocked at async_file_flush
From: 周春明(日月) @ 2021-07-05 14:06 UTC
  To: Pedro Alves, Simon Marchi, Gdb-patches, gdb-patches

Hi Pedro and Simon,

Do you know how glibc handles ESRCH (no such process)? The error is returned from ioctl. Will glibc close the process automatically when it receives this error, and then send SIGCHLD to the parent GDB process?
I found that pipe[1] gets closed automatically every time my ioctl returns ESRCH. Does this guess make sense? And how can I verify it?
Thanks!

-David
