public inbox for gdb-prs@sourceware.org
From: "thiago.bauermann at linaro dot org" <sourceware-bugzilla@sourceware.org>
To: gdb-prs@sourceware.org
Subject: [Bug testsuite/31312] attach-many-short-lived-threads gives inconsistent results
Date: Tue, 19 Mar 2024 19:10:16 +0000
Message-ID: <bug-31312-4717-FDwHluC5iL@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-31312-4717@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=31312

--- Comment #23 from Thiago Jung Bauermann <thiago.bauermann at linaro dot org> ---
(In reply to Thiago Jung Bauermann from comment #19)
> 1. The issue I mentioned in comment #17 (which I have since confirmed is
>    what is going on), where the linux_proc_attach_tgid_threads () never
>    ends when there are zombie threads present in the inferior. Since
>    attach-many-short-lived-threads.c constantly creates and finishes
>    joinable threads, the chance of having zombie threads is high.
> 
>    From looking at the gdb.log files Carl provided, I believe he is
>    seeing the same problem.
> 
>    The solution is to make GDB remember when it has already visited the
>    /proc directory of a given LWP, and skip it in the following iterations.
>    I implemented the attached patch to do that, and now I don't observe GDB
>    hanging anymore in the aarch64-linux server in which I used to easily
>    reproduce this problem. If Carl could test it on POWER10, it would be
>    helpful. I'll clean up the code and post it on the mailing list.

From looking at the Power 10 gdb.log files attached to this bugzilla, and
also at Carl's results with my proposed fix, I believe this bugzilla is
specifically about the issue described above.
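
To illustrate the idea of remembering visited LWPs, here is a minimal
sketch. It is not the attached patch and not GDB code: scan_tasks and its
two callbacks are hypothetical names. list_tasks stands in for reading
/proc/PID/task, and attach is assumed to report "new thread" for any LWP it
is handed, zombie or not. That assumption is what would keep a scan without
the visited set from ever finishing.

```cpp
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical sketch, not GDB code.  LIST_TASKS stands in for reading
// /proc/PID/task; ATTACH returns true when it considers the LWP new.
// Returns the number of passes made over the task list.
int
scan_tasks (const std::function<std::vector<long> ()> &list_tasks,
            const std::function<bool (long)> &attach)
{
  std::unordered_set<long> visited;
  int passes = 0;

  while (true)
    {
      bool new_threads = false;
      passes++;

      for (long lwp : list_tasks ())
        {
          // Skip LWPs already seen in an earlier pass, for example
          // zombie threads that stay listed until they are joined.
          if (!visited.insert (lwp).second)
            continue;

          if (attach (lwp))
            new_threads = true;
        }

      // A pass that found nothing new ends the scan.
      if (!new_threads)
        break;
    }

  return passes;
}
```

With a task list that never changes, the first pass visits every LWP once
and the second pass skips them all, so the scan ends even if one of the
LWPs is a zombie that can never be attached to.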

> 2. Behaviour 2 which I described in comment #12. I'll repeat it here for

Sorry, I referenced the wrong comment. It's actually comment #16.

>    completeness:
> 
>    (gdb) attach 2039552
>    Attaching to process 2039552
>    Cannot attach to lwp 2689792: Operation not permitted (1), process
>    2689792 is already traced by process 2039527
> 
>    PID 2039552 is the testcase inferior, and 2039527 is GDB. GDB didn't
>    report any success in attaching to the process.
> 
>    This is very rarely observed on my test system. I saw it only 3 times in
>    thousands of testcase runs. I wasn't able to investigate it yet.
> 
>    I'll open a separate bugzilla about this.

I didn't find any existing bugzilla about this problem, so I opened
bug #31512 about it, and pasted there Tom Tromey's suggestion from
comment #18 about modifying the testcase to generate an strace log file
(thanks for the suggestion).

> 3. This one isn't a bug, but an issue that arises from the way
>    attach-many-short-lived-threads.c behaves: since it's constantly
>    creating new threads it's impossible for GDB to know when it has
>    attached to all of them so that it can finish the loop in
>    linux_proc_attach_tgid_threads (). Because of this, even with the fix
>    for issue #1 applied, the testcase fails once in a while — I left the
>    test running in a loop overnight and it failed after about 2500
>    iterations.

There is already bug #26286 about this issue, so I updated it with the
results reported here, and my understanding of the problem.

>    The only way I can see to improve GDB's behaviour is to increase the
>    number of iterations of the loop that checks for new threads. I suspect
>    that the ability of the inferior to create new threads is proportional
>    to the number of CPUs present in the system (my test machine has 160
>    cores), so I will propose a patch that makes the number of iterations
>    proportional to the number of CPUs.

As I mentioned in bug #26286, I've changed my mind about making the number
of iterations proportional to the number of CPUs, because on the machines I
have at hand, the one where it takes longest to reproduce the problem has
the most CPUs (160, vs 8 CPUs on the other machines). I'm not sure how to
move forward on this.
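
For reference, the kind of loop under discussion can be sketched as
follows. This is an illustration under assumptions, not the code in
linux_proc_attach_tgid_threads (): scan_until_quiet, scan_once and
quiet_passes are hypothetical names, and the stop condition (a fixed number
of consecutive passes that find nothing new) is only one possible choice.

```cpp
#include <functional>

// Hypothetical sketch, not GDB code.  SCAN_ONCE makes one pass over the
// task list and returns true if it attached to at least one new thread.
// The scan stops after QUIET_PASSES consecutive passes that find nothing
// new.  Returns the total number of passes made.
int
scan_until_quiet (const std::function<bool ()> &scan_once, int quiet_passes)
{
  int total = 0;
  int quiet = 0;

  while (quiet < quiet_passes)
    {
      total++;
      if (scan_once ())
        quiet = 0;  // New thread seen: start counting again.
      else
        quiet++;
    }

  return total;
}
```

A larger quiet_passes gives the scan more chances to notice a thread
created between passes, at the cost of a slower attach. It cannot close the
window entirely, since the inferior may create a thread right after the
last pass.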

(In reply to Carl E Love from comment #22)
> Thiago:
> 
> Yes, the log files where the failures "the program is no longer running"
> occur have the line:
> 
> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
> The program no longer exists.
> 
> So yes, that does match issue 3, comment #19.

Nice, thank you for confirming.

> Fixing the detach issue would go a long way to making the test a lot more
> reliable.

Just a minor correction, to avoid confusion: this GDB hang happens at
attach time and is not related to any previous detach command.

> The SIGTRAP issue happens about 0.5% of the time.

Yes, it's also not very common on my machines. Somewhat surprisingly, my
experience is that it's easier to reproduce on x86_64-linux than on
aarch64-linux.

> I haven't seen issue 2 yet, at least not that I can tell.  But based on
> what you said it is really unlikely to hit.

Yes, it's very uncommon. Though I did hit it or something like it on an
x86_64-linux machine just now (reported on bug #31512).
