public inbox for gdb-patches@sourceware.org
* [PATCH 0/3] Improve vmcore loading
@ 2023-06-05  9:11 Andrew Burgess
  2023-06-05  9:11 ` [PATCH 1/3] gdb: split inferior and thread setup when opening a core file Andrew Burgess
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Andrew Burgess @ 2023-06-05  9:11 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

This series started as a proposal to upstream a test from the Fedora
GDB tree (patch #2).  However, as I looked into the test a little
more, I realised that there was scope for further improvements to
GDB (patch #3).  Patch #1 is a small cleanup/refactor.

This series is all about loading vmcore files -- that is, core files
generated by the Linux kernel.  See the commit messages on patches #2
and #3 for more details.

Thanks,
Andrew

---

Andrew Burgess (3):
  gdb: split inferior and thread setup when opening a core file
  gdb/testsuite: add test for core file with a 0 pid
  gdb: handle core files with .reg/0 section names

 gdb/corelow.c                                 | 212 +++++++++++++++---
 gdb/testsuite/gdb.arch/core-file-pid0.exp     |  73 ++++++
 .../gdb.arch/core-file-pid0.x86-64.core.bz2   | Bin 0 -> 750 bytes
 3 files changed, 252 insertions(+), 33 deletions(-)
 create mode 100644 gdb/testsuite/gdb.arch/core-file-pid0.exp
 create mode 100644 gdb/testsuite/gdb.arch/core-file-pid0.x86-64.core.bz2


base-commit: e9683acf5e51c2bac8aa68d30d9ac3683dddcc7d
-- 
2.25.4



* [PATCH 1/3] gdb: split inferior and thread setup when opening a core file
  2023-06-05  9:11 [PATCH 0/3] Improve vmcore loading Andrew Burgess
@ 2023-06-05  9:11 ` Andrew Burgess
  2023-06-10  0:04   ` Kevin Buettner
  2023-06-05  9:11 ` [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid Andrew Burgess
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Andrew Burgess @ 2023-06-05  9:11 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

I noticed that in corelow.c, when a core file is opened, both the
thread and inferior setup are done in add_to_thread_list.  In this
patch I propose hoisting the inferior setup out of add_to_thread_list
into core_target_open.

The only thing about this change that gave me cause for concern is
that in add_to_thread_list, we only set up the inferior after finding
the first section with a name like ".reg/NN".  If we find no such
section then the inferior will never be set up.

Is this important?

Well, I don't think so.  Back in core_target_open, if there is no
current thread (which there will not be if no ".reg/NN" section was
found), then we look for a thread in the current inferior.  If there
are no threads (which there will not be if no ".reg/NN" is found),
then we once again set up the current inferior.

What I think this means is that, in all cases, the current inferior
will end up being set up.  By moving the inferior setup code earlier
in core_target_open and making it unconditional, we can remove the
later code that sets up the inferior, since we now know this will
always have been done.

There should be no user visible changes after this commit.
---
 gdb/corelow.c | 62 ++++++++++++++++++++++++---------------------------
 1 file changed, 29 insertions(+), 33 deletions(-)

diff --git a/gdb/corelow.c b/gdb/corelow.c
index db489b4280e..7312d40374f 100644
--- a/gdb/corelow.c
+++ b/gdb/corelow.c
@@ -351,40 +351,24 @@ core_target::close ()
 /* Look for sections whose names start with `.reg/' so that we can
    extract the list of threads in a core file.  */
 
+/* If ASECT is a section whose name begins with '.reg/' then extract the
+   lwpid after the '/' and create a new thread in INF.
+
+   If REG_SECT is not nullptr, and both ASECT and REG_SECT point at the
+   same position in the parent bfd object then switch to the newly created
+   thread, otherwise, the selected thread is left unchanged.  */
+
 static void
-add_to_thread_list (asection *asect, asection *reg_sect)
+add_to_thread_list (asection *asect, asection *reg_sect, inferior *inf)
 {
-  int core_tid;
-  int pid, lwpid;
-  bool fake_pid_p = false;
-  struct inferior *inf;
-
   if (!startswith (bfd_section_name (asect), ".reg/"))
     return;
 
-  core_tid = atoi (bfd_section_name (asect) + 5);
-
-  pid = bfd_core_file_pid (core_bfd);
-  if (pid == 0)
-    {
-      fake_pid_p = true;
-      pid = CORELOW_PID;
-    }
-
-  lwpid = core_tid;
-
-  inf = current_inferior ();
-  if (inf->pid == 0)
-    {
-      inferior_appeared (inf, pid);
-      inf->fake_pid_p = fake_pid_p;
-    }
-
-  ptid_t ptid (pid, lwpid);
-
+  int lwpid = atoi (bfd_section_name (asect) + 5);
+  ptid_t ptid (inf->pid, lwpid);
   thread_info *thr = add_thread (inf->process_target (), ptid);
 
-/* Warning, Will Robinson, looking at BFD private data! */
+  /* Warning, Will Robinson, looking at BFD private data! */
 
   if (reg_sect != NULL
       && asect->filepos == reg_sect->filepos)	/* Did we find .reg?  */
@@ -541,12 +525,27 @@ core_target_open (const char *arg, int from_tty)
      previous session, and the frame cache being stale.  */
   registers_changed ();
 
+  /* Find (or fake) the pid for the process in this core file, and
+     initialise the current inferior with that pid.  */
+  bool fake_pid_p = false;
+  int pid = bfd_core_file_pid (core_bfd);
+  if (pid == 0)
+    {
+      fake_pid_p = true;
+      pid = CORELOW_PID;
+    }
+
+  inferior *inf = current_inferior ();
+  gdb_assert (inf->pid == 0);
+  inferior_appeared (inf, pid);
+  inf->fake_pid_p = fake_pid_p;
+
   /* Build up thread list from BFD sections, and possibly set the
      current thread to the .reg/NN section matching the .reg
      section.  */
   asection *reg_sect = bfd_get_section_by_name (core_bfd, ".reg");
   for (asection *sect : gdb_bfd_sections (core_bfd))
-    add_to_thread_list (sect, reg_sect);
+    add_to_thread_list (sect, reg_sect, inf);
 
   if (inferior_ptid == null_ptid)
     {
@@ -556,13 +555,10 @@ core_target_open (const char *arg, int from_tty)
 	 which was the "main" thread.  The latter case shouldn't
 	 usually happen, but we're dealing with input here, which can
 	 always be broken in different ways.  */
-      thread_info *thread = first_thread_of_inferior (current_inferior ());
+      thread_info *thread = first_thread_of_inferior (inf);
 
       if (thread == NULL)
-	{
-	  inferior_appeared (current_inferior (), CORELOW_PID);
-	  thread = add_thread_silent (target, ptid_t (CORELOW_PID));
-	}
+	thread = add_thread_silent (target, ptid_t (CORELOW_PID));
 
       switch_to_thread (thread);
     }
-- 
2.25.4



* [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid
  2023-06-05  9:11 [PATCH 0/3] Improve vmcore loading Andrew Burgess
  2023-06-05  9:11 ` [PATCH 1/3] gdb: split inferior and thread setup when opening a core file Andrew Burgess
@ 2023-06-05  9:11 ` Andrew Burgess
  2023-06-10  0:16   ` Kevin Buettner
                     ` (2 more replies)
  2023-06-05  9:11 ` [PATCH 3/3] gdb: handle core files with .reg/0 section names Andrew Burgess
  2023-07-03 17:03 ` [PATCH 0/3] Improve vmcore loading Andrew Burgess
  3 siblings, 3 replies; 11+ messages in thread
From: Andrew Burgess @ 2023-06-05  9:11 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

This patch contains a test for this commit:

  commit c820c52a914cc9d7c63cb41ad396f4ddffff2196
  Date:   Fri Aug 6 19:45:58 2010 +0000

              * thread.c (add_thread_silent): Use null_ptid instead of
              minus_one_ptid while getting rid of stale inferior_ptid.

This is another test that has been carried in the Fedora GDB tree for
some time, and I thought that it would be worth merging to master.  I
don't believe there is any test like this currently in the testsuite.

The original issue was reported in this thread:

  https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/

The problem was that when GDB was used to open a vmcore (core file)
image generated by the Linux kernel, GDB would (sometimes) crash with
an assertion failure:

  thread.c:884: internal-error: switch_to_thread: Assertion `inf != NULL' failed.

To understand what's going on we need some background; a vmcore file
represents each processor core in the same way that a standard
application core file represents threads.  Thus, we might say, a
vmcore file represents cores as threads.

When writing a vmcore file, the kernel will store the pid of the
process currently running on that core as the thread's lwpid.

However, if a core is idle, with no process currently running on it,
then the lwpid for that thread is stored as 0 in the vmcore file.  If
multiple cores are idle then multiple threads will have a lwpid of 0.

Back in 2010, the original reporter of the issue tried to change the
kernel's behaviour in this thread:

  https://lkml.org/lkml/2010/8/3/75

This change was rejected by the kernel team; the current
behaviour (lwpid of 0) was considered correct.  I've checked the
source of a recent kernel.  The code mentioned in the lkml.org posting
has moved; it's now in the function crash_save_cpu in the file
kernel/kexec_core.c.  The general behaviour is unchanged: an idle
core will have an lwpid of 0, so I think GDB still needs to be able to
handle this case.

When GDB loads a vmcore file (which is handled just like any other
core file) the sections are processed in core_target_open to generate
the threads for the core file.  The processing is done by calling
add_to_thread_list, a function which looks for sections named .reg/NN
where NN is the lwpid of the thread.  GDB then builds a ptid_t for the
new thread and calls add_thread.
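
As a rough illustration of the section name handling described above,
here is a standalone sketch.  The helper name
lwpid_from_reg_section_name and the -1 return value are assumptions
made for this example; GDB itself uses startswith and atoi directly in
add_to_thread_list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not part of GDB: return the lwpid encoded in a
// section name of the form ".reg/NN", or -1 when NAME is not of that
// form.
int
lwpid_from_reg_section_name (const char *name)
{
  static const char prefix[] = ".reg/";
  const std::size_t prefix_len = sizeof (prefix) - 1;

  assert (name != nullptr);
  if (std::strncmp (name, prefix, prefix_len) != 0)
    return -1;

  // As with the atoi call in corelow.c, text that is not a number
  // yields 0.
  return std::atoi (name + prefix_len);
}
```

With this sketch ".reg/0" yields 0, which is exactly the value that
causes trouble in the rest of this description, while ".reg2/0" is not
treated as a thread section at all.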

Remember, in our case the lwpid is 0.  Now for the first thread this
is fine, if a little weird, 0 isn't usually a valid lwpid, but that's
OK, GDB creates a thread with lwpid of 0 and carries on.

When we find the next thread (core) with lwpid of 0, we attempt to
create another thread with an lwpid of 0.  This of course clashes with
the previously created thread, they have the same ptid_t, so GDB tries
to delete the first thread.
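
The clash can be modelled with a small standalone sketch.  This is a
toy model under the assumption that a thread is identified purely by
its (pid, lwpid) pair; the function name count_distinct_threads is made
up for the example and nothing here is GDB's real thread list code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Toy model, not GDB code: a thread is identified only by its
// (pid, lwpid) pair, so two threads with the same pair cannot both
// be present in the list.
std::size_t
count_distinct_threads (int pid, const std::vector<int> &lwpids)
{
  std::set<std::pair<int, int> > threads;
  for (std::size_t i = 0; i < lwpids.size (); ++i)
    threads.insert (std::make_pair (pid, lwpids[i]));

  // Duplicated identifiers can only ever reduce the thread count.
  assert (threads.size () <= lwpids.size ());
  return threads.size ();
}
```

Two idle cores, both recorded with an lwpid of 0, collapse to a single
entry in this model, which matches the single thread GDB ends up with.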

And it was within this thread delete code that we triggered a bug
which would then cause GDB to assert.  When deleting, we tried to
switch to a thread with minus_one_ptid.  This resulted in a call to
find_inferior_pid (passing in minus_one_ptid's pid, which is -1); the
find_inferior_pid call failed and returned NULL, which then triggered
an assert in switch_to_thread.

The actual details of why the assert triggered are really not
important.  What's important (I think) is that a vmcore file might
have this interesting lwpid of 0 characteristic, which isn't something
we see in "normal" application core files, and it is this that I think
we should be testing.

Now, you might be thinking: isn't deleting the first thread the wrong
thing to do?  If the vmcore file has two threads that represent two
cores, and both have an lwpid of 0 (indicating both cores are idle),
then surely GDB should still represent this as two threads?  You're
not wrong.  This was mentioned by Pedro in the original GDB mailing
list thread here:

  https://inbox.sourceware.org/gdb-patches/201008061057.03037.pedro@codesourcery.com/

This is indeed a problem, and this problem is still present in GDB
today.  I plan to try and address this in a later commit, however,
this first commit is about getting a test in place to confirm that GDB
at a minimum doesn't crash when loading such a vmcore file.

And so, finally, what's in this commit?

This commit contains a new test.  The test doesn't actually contain a
vmcore file.  Instead I've created a standard application core file
that contains two threads, and then manually edited the core file to
set the lwpid of each thread to 0.

To further reduce the size of the core file (as it will be stored in
git), I've zeroed all of the LOAD-able segments in the core file.
This test really doesn't care about that part of the core file; we
only really care about loading the registers, which is enough to
confirm that GDB doesn't crash.

Obviously, as the core file is pre-generated, this test is
architecture specific.  There are already a few tests in gdb.arch/
that include pre-generated core files.  Just as those existing tests
do, I've compressed the core file with bzip2, which reduces it to just
750 bytes.  I have structured the test so that if/when this patch is
merged I can add some additional core files for other architectures,
however, these are not included in this commit.

The test simply expands the core file, and then loads it into GDB.
One interesting thing to note is that GDB reports the core file
loading like this:

  (gdb) core-file ./gdb/testsuite/outputs/gdb.arch/core-file-pid0/core-file-pid0.x86-64.core
  [New process 1]
  [New process 1]
  Failed to read a valid object file image from memory.
  Core was generated by `./segv-mt'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  The current thread has terminated
  (gdb)

There are two interesting things here.  First, the repeated "New
process 1" message.  This is caused because linux_core_pid_to_str
reports anything with an lwpid of 0 as a process, rather than an LWP.
Second, the "The current thread has terminated" message.  This is
because the first thread in the core file is the current thread, but
when GDB loads the second thread (which also has lwpid 0) this causes
the first thread to be deleted; as a result GDB thinks that the
current (first) thread has terminated.
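
The process-versus-LWP naming can be sketched as follows.  This is a
simplified model written for this description, not the body of
linux_core_pid_to_str; the function name describe_core_thread and its
signature are assumptions.

```cpp
#include <cassert>
#include <string>

// Simplified model of the behaviour described above, not the real
// linux_core_pid_to_str: a non-zero lwpid is shown as an LWP, while
// an lwpid of 0 falls back to describing the whole process.
std::string
describe_core_thread (int pid, long lwpid)
{
  // In this model the pid is always non-zero, as corelow.c replaces
  // a pid of 0 with a fake one before any threads are created.
  assert (pid != 0);

  if (lwpid != 0)
    return "LWP " + std::to_string (lwpid);
  return "process " + std::to_string (pid);
}
```

Under this model a thread with pid 1 and lwpid 0 is described as
"process 1", which is the text seen twice in the session above.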

As I said previously, both of these problems are a result of the lwpid
0 aliasing, which is not being fixed in this commit -- this commit is
just confirming that GDB doesn't crash when loading this core file.
---
 gdb/testsuite/gdb.arch/core-file-pid0.exp     |  63 ++++++++++++++++++
 .../gdb.arch/core-file-pid0.x86-64.core.bz2   | Bin 0 -> 750 bytes
 2 files changed, 63 insertions(+)
 create mode 100644 gdb/testsuite/gdb.arch/core-file-pid0.exp
 create mode 100644 gdb/testsuite/gdb.arch/core-file-pid0.x86-64.core.bz2

diff --git a/gdb/testsuite/gdb.arch/core-file-pid0.exp b/gdb/testsuite/gdb.arch/core-file-pid0.exp
new file mode 100644
index 00000000000..b960dfe095b
--- /dev/null
+++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
@@ -0,0 +1,63 @@
+# This testcase is part of GDB, the GNU debugger.
+#
+# Copyright 2023 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+
+# Some kernel core files have PID 0 (for the idle task), check that
+# GDB can handle such a core file.
+
+standard_testfile
+
+# Set CF_NAME, the name of the compressed core file within the source
+# tree, and CF_SIZE, the size (in bytes) of the uncompressed core
+# file.
+if {[istarget "x86_64-*-linux*"]} {
+    set cf_name ${testfile}.x86-64.core.bz2
+    set cf_size 8757248
+} else {
+    unsupported "no pre-generated core file for this target"
+}
+
+# Decompress the core file.
+set corebz2file ${srcdir}/${subdir}/${cf_name}
+set corefile [decompress_bz2 $corebz2file]
+if { $corefile eq "" } {
+    untested "failed to bunzip2 the core file"
+    return -1
+}
+
+# Check the size of the decompressed core file.  Just for sanity.
+file stat ${corefile} corestat
+if { $corestat(size) != ${cf_size} } {
+    untested "uncompressed core file is the wrong size"
+    return -1
+}
+
+# Copy over the corefile if we are remote testing.
+set corefile [gdb_remote_download host $corefile]
+
+clean_restart
+
+# Load the core file.  At one point GDB would assert, complaining that
+# the inferior was nullptr.  For now we see a message about the
+# current thread having terminated, this is because GDB gets confused
+# and incorrectly deletes what should be the current thread.
+gdb_test "core-file ${corefile}" \
+    [multi_line \
+	 "Core was generated by \[^\r\n\]+\\." \
+	 "Program terminated with signal (?:11|SIGSEGV), Segmentation fault\\." \
+	 "The current thread has terminated"] \
+    "check core file termination reason"
diff --git a/gdb/testsuite/gdb.arch/core-file-pid0.x86-64.core.bz2 b/gdb/testsuite/gdb.arch/core-file-pid0.x86-64.core.bz2
new file mode 100644
index 0000000000000000000000000000000000000000..081a35250f1fbb9743aecc70723abc90ad2704e8
GIT binary patch
literal 750
zcmV<K0ulW}T4*^jL0KkKS=gyx?Ew#FfB*mg`1I*p$U}Rr+C%1oU4hy`OaKNj#IVo+
z1U5!VNl(B6v_J@?B@ju7Q_1B=sro3|H9bu|L()AofM|MyAR0WOpn8o1Pf?&AX+V)A
z3TZ!6)YC?$ntFg}8fl=@)B`{nGynmi>HzgLXaH~k4FC-QXaE4whyWS@0000Q0B`^e
z01W_W007a502%-Q000^RIm5u!H4!vOi4u;~XXvv+_(4wTM27K9uvA#A$z8Re3k?Vx
zg=D9N<JN>R?O2NX^{dvY9IV+DHS4>5kX7fxS%zD4RsCzh73;S*BC&GECiK?zCXv=Y
z65o1Zd|ax6qtvPl(c_9KX5k85w|e}6l%`SES|O^nN<jri%s3*DRy|MfGVsPsWH1B(
zkOPuQFDpPQ6A;q|h9C^<#7K$C^|I+HTB9IhY(hW@08}tVBSPDZLKK*gTVrAy2GbZK
zVUW{c+i4h$Bp?KUF~g$(fG;!x;xQ!&h1Ob#1Z1?55D5TU=5uS4F3-*}vnUZUsA9n;
zAkW);9BX<W{Gcf`;UTn=#(M#9c45p8ofEMjrHN6K7R%6rcRNqdUZ-MiEedv*7;`qn
z_%Jq-&2_Ut^}toC5Qt-7)j-C#Zu{%xQ9TI6o+z~#X%QG_&9Z)!YnH<brn+QjJ0yF8
zz<{fd2=)r*P%f}w4*d$ovZsi}CXi%YU4O+bi8Z3F1Dw^e_|l&DF1G0B^Kr0F=1d?&
z>NxR=)iL;!(m48t3>)nWv>L}odw_(LwYmfFnX^tXRS%I%0_gs2X0CMw)LAuUYge!*
z*ho4d^b_t7!mr3Ox;W?}{mO+P*A-{+6zTs%^$`P2v5rXCxq<qEZyFlSwrFD9mhBBS
zLbTCha|<kE)-W>ZV4i%UXhQ~m4Hr~y%;=R2;=vk&8NAlooiprS>1{0SArS-sL;&Pz
gX|25!i~Ev+81)1P_+g=)e((HU$rRy2Lt>?Ww9hY4asU7T

literal 0
HcmV?d00001

-- 
2.25.4



* [PATCH 3/3] gdb: handle core files with .reg/0 section names
  2023-06-05  9:11 [PATCH 0/3] Improve vmcore loading Andrew Burgess
  2023-06-05  9:11 ` [PATCH 1/3] gdb: split inferior and thread setup when opening a core file Andrew Burgess
  2023-06-05  9:11 ` [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid Andrew Burgess
@ 2023-06-05  9:11 ` Andrew Burgess
  2023-06-10  0:36   ` Kevin Buettner
  2023-07-03 17:03 ` [PATCH 0/3] Improve vmcore loading Andrew Burgess
  3 siblings, 1 reply; 11+ messages in thread
From: Andrew Burgess @ 2023-06-05  9:11 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

The previous commit added the test gdb.arch/core-file-pid0.exp which
tests GDB's ability to load a core file containing threads with an
lwpid of 0, which is something GDB can encounter when loading a
vmcore file -- a core file generated by the Linux kernel.  The threads
with an lwpid of 0 represent idle cores.

While the previous commit added the test, which confirms GDB doesn't
crash when confronted with such a core file, there are still some
problems with GDB's handling of these core files.  These problems all
originate from the fact that the core file (once opened by bfd)
contains multiple sections called .reg/0.  These sections all
represent different threads (cpu cores in the original vmcore dump),
but GDB gets confused and thinks that these .reg/0 sections are all
referencing the same thread.

Here is a GDB session on an x86-64 machine which loads the core file
from the gdb.arch/core-file-pid0.exp test.  This core file contains
two threads, both of which have an lwpid of 0:

  $ ./gdb/gdb --data-directory ./gdb/data-directory/ -q
  (gdb) core-file /tmp/x86_64-pid0-core.core
  [New process 1]
  [New process 1]
  Failed to read a valid object file image from memory.
  Core was generated by `./segv-mt'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  The current thread has terminated
  (gdb) info threads
    Id   Target Id         Frame
    2    process 1         0x00000000004017c2 in ?? ()

  The current thread <Thread ID 1> has terminated.  See `help thread'.
  (gdb) maintenance info sections
  Core file: `/tmp/x86_64-pid0-core.core', file type elf64-x86-64.
   [0]      0x00000000->0x000012d4 at 0x00000318: note0 READONLY HAS_CONTENTS
   [1]      0x00000000->0x000000d8 at 0x0000039c: .reg/0 HAS_CONTENTS
   [2]      0x00000000->0x000000d8 at 0x0000039c: .reg HAS_CONTENTS
   [3]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo/0 HAS_CONTENTS
   [4]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo HAS_CONTENTS
   [5]      0x00000000->0x00000140 at 0x000005c0: .auxv HAS_CONTENTS
   [6]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file/0 HAS_CONTENTS
   [7]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file HAS_CONTENTS
   [8]      0x00000000->0x00000200 at 0x000007cc: .reg2/0 HAS_CONTENTS
   [9]      0x00000000->0x00000200 at 0x000007cc: .reg2 HAS_CONTENTS
   [10]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate/0 HAS_CONTENTS
   [11]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate HAS_CONTENTS
   [12]     0x00000000->0x000000d8 at 0x00000ea4: .reg/0 HAS_CONTENTS
   [13]     0x00000000->0x00000200 at 0x00000f98: .reg2/0 HAS_CONTENTS
   [14]     0x00000000->0x00000440 at 0x000011ac: .reg-xstate/0 HAS_CONTENTS
   [15]     0x00400000->0x00401000 at 0x00002000: load1 ALLOC LOAD READONLY HAS_CONTENTS
   [16]     0x00401000->0x004b9000 at 0x00003000: load2 ALLOC READONLY CODE
   [17]     0x004b9000->0x004e5000 at 0x00003000: load3 ALLOC READONLY
   [18]     0x004e6000->0x004ec000 at 0x00003000: load4 ALLOC LOAD HAS_CONTENTS
   [19]     0x004ec000->0x004f2000 at 0x00009000: load5 ALLOC LOAD HAS_CONTENTS
   [20]     0x012a8000->0x012cb000 at 0x0000f000: load6 ALLOC LOAD HAS_CONTENTS
   [21]     0x7fda77736000->0x7fda77737000 at 0x00032000: load7 ALLOC READONLY
   [22]     0x7fda77737000->0x7fda77f37000 at 0x00032000: load8 ALLOC LOAD HAS_CONTENTS
   [23]     0x7ffd55f65000->0x7ffd55f86000 at 0x00832000: load9 ALLOC LOAD HAS_CONTENTS
   [24]     0x7ffd55fc3000->0x7ffd55fc7000 at 0x00853000: load10 ALLOC LOAD READONLY HAS_CONTENTS
   [25]     0x7ffd55fc7000->0x7ffd55fc9000 at 0x00857000: load11 ALLOC LOAD READONLY CODE HAS_CONTENTS
   [26]     0xffffffffff600000->0xffffffffff601000 at 0x00859000: load12 ALLOC LOAD READONLY CODE HAS_CONTENTS
  (gdb)

Notice when the core file is first loaded we see two lines like:

  [New process 1]

And GDB reports:

  The current thread has terminated

Which isn't what we'd expect from a core file -- the core file should
only contain threads that are live at the point of the crash, one of
which should be the current thread.  The above message is reported
because GDB has deleted what we think is the current thread!

And in the 'info threads' output we are only seeing a single thread.
Again, this is because GDB has deleted one of the threads.

Finally, the 'maintenance info sections' output shows the cause of all
our problems, two sections named .reg/0.  When GDB sees the first of
these it creates a new thread.  But, when we see the second .reg/0 GDB
tries to create another new thread, but this thread has the same
ptid_t as the first thread, so GDB deletes the first thread and
creates the second thread in its place.

Because both these threads are created with an lwpid of 0, GDB reports
these as 'New process NN' rather than 'New LWP NN', which is what we
would normally expect.

The previous commit includes a little more of the history of GDB
support in this area, but these problems were discussed on the mailing
list a while ago in this thread:

  https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/

In this commit I propose a solution to these problems.

What I propose is that GDB should spot when we have .reg/0 sections
and, when these are found, should rename these sections using some
unique non-zero lwpid.

Note in the above output we also have sections like .reg2/0 and
.reg-xstate/0.  These are additional register sets, and this commit
also renumbers these sections in line with their .reg section.
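
The renaming of a single section name can be sketched like this.  The
helper rename_zero_suffix is hypothetical and works on std::string
only; the real change has to allocate the new name with bfd_alloc and
install it with bfd_rename_section.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of GDB: if NAME ends in "/0" return a
// copy with the 0 replaced by NEW_LWPID, otherwise return NAME
// unchanged.
std::string
rename_zero_suffix (const std::string &name, int new_lwpid)
{
  // A replacement of 0 would defeat the purpose of the rename.
  assert (new_lwpid != 0);

  const std::string suffix = "/0";
  if (name.size () < suffix.size ()
      || name.compare (name.size () - suffix.size (),
		       suffix.size (), suffix) != 0)
    return name;

  // Drop the trailing '0' but keep the '/'.
  return name.substr (0, name.size () - 1) + std::to_string (new_lwpid);
}
```

Note that a name such as .reg/10 does not end in "/0" and so is left
alone by this sketch.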

The user is warned that some section renumbering has been performed.

GDB takes care to ensure that the new numbers assigned are unique and
don't clash with any of the pids that might already be in use.
Remember, in a real vmcore file, 0 is used to indicate an idle core;
non-idle cores will have the pid of whichever process was running on
that core, so we don't want GDB to assign an lwpid that clashes with
an actual pid that is in use in the core file.
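
The numbering scheme can be sketched in isolation as below.  This is a
simplified standalone version that works on a plain list of lwpids;
the function name assign_replacement_lwpids is made up for the example,
and the real code in this patch tracks bfd sections alongside each
lwpid.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Sketch only: replace every 0 in LWPIDS with the smallest positive
// number that is neither an original lwpid nor an earlier
// replacement.
std::vector<int>
assign_replacement_lwpids (std::vector<int> lwpids)
{
  // Every lwpid that appears in the core file, including 0.
  const std::set<int> in_use (lwpids.begin (), lwpids.end ());

  int candidate = 1;
  for (std::size_t i = 0; i < lwpids.size (); ++i)
    {
      if (lwpids[i] != 0)
	continue;

      // Skip over numbers already used by non-idle cores.
      while (in_use.count (candidate) != 0)
	++candidate;

      lwpids[i] = candidate++;
      assert (lwpids[i] != 0);
    }
  return lwpids;
}
```

For example, an input of 0, 1, 0, 3 becomes 2, 1, 4, 3: the existing
lwpids 1 and 3 are kept, and the two idle entries get the first unused
numbers.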

After this commit here's the updated GDB session output:

  $ ./gdb/gdb --data-directory ./gdb/data-directory/ -q
  (gdb) core-file /tmp/x86_64-pid0-core.core
  warning: found threads with pid 0, assigned replacement Target Ids: LWP 1, LWP 2
  [New LWP 1]
  [New LWP 2]
  Failed to read a valid object file image from memory.
  Core was generated by `./segv-mt'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x00000000004017c2 in ?? ()
  [Current thread is 1 (LWP 1)]
  (gdb) info threads
    Id   Target Id         Frame
  * 1    LWP 1             0x00000000004017c2 in ?? ()
    2    LWP 2             0x000000000040dda5 in ?? ()
  (gdb) maintenance info sections
  Core file: `/tmp/x86_64-pid0-core.core', file type elf64-x86-64.
   [0]      0x00000000->0x000012d4 at 0x00000318: note0 READONLY HAS_CONTENTS
   [1]      0x00000000->0x000000d8 at 0x0000039c: .reg/1 HAS_CONTENTS
   [2]      0x00000000->0x000000d8 at 0x0000039c: .reg HAS_CONTENTS
   [3]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo/1 HAS_CONTENTS
   [4]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo HAS_CONTENTS
   [5]      0x00000000->0x00000140 at 0x000005c0: .auxv HAS_CONTENTS
   [6]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file/1 HAS_CONTENTS
   [7]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file HAS_CONTENTS
   [8]      0x00000000->0x00000200 at 0x000007cc: .reg2/1 HAS_CONTENTS
   [9]      0x00000000->0x00000200 at 0x000007cc: .reg2 HAS_CONTENTS
   [10]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate/1 HAS_CONTENTS
   [11]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate HAS_CONTENTS
   [12]     0x00000000->0x000000d8 at 0x00000ea4: .reg/2 HAS_CONTENTS
   [13]     0x00000000->0x00000200 at 0x00000f98: .reg2/2 HAS_CONTENTS
   [14]     0x00000000->0x00000440 at 0x000011ac: .reg-xstate/2 HAS_CONTENTS
   [15]     0x00400000->0x00401000 at 0x00002000: load1 ALLOC LOAD READONLY HAS_CONTENTS
   [16]     0x00401000->0x004b9000 at 0x00003000: load2 ALLOC READONLY CODE
   [17]     0x004b9000->0x004e5000 at 0x00003000: load3 ALLOC READONLY
   [18]     0x004e6000->0x004ec000 at 0x00003000: load4 ALLOC LOAD HAS_CONTENTS
   [19]     0x004ec000->0x004f2000 at 0x00009000: load5 ALLOC LOAD HAS_CONTENTS
   [20]     0x012a8000->0x012cb000 at 0x0000f000: load6 ALLOC LOAD HAS_CONTENTS
   [21]     0x7fda77736000->0x7fda77737000 at 0x00032000: load7 ALLOC READONLY
   [22]     0x7fda77737000->0x7fda77f37000 at 0x00032000: load8 ALLOC LOAD HAS_CONTENTS
   [23]     0x7ffd55f65000->0x7ffd55f86000 at 0x00832000: load9 ALLOC LOAD HAS_CONTENTS
   [24]     0x7ffd55fc3000->0x7ffd55fc7000 at 0x00853000: load10 ALLOC LOAD READONLY HAS_CONTENTS
   [25]     0x7ffd55fc7000->0x7ffd55fc9000 at 0x00857000: load11 ALLOC LOAD READONLY CODE HAS_CONTENTS
   [26]     0xffffffffff600000->0xffffffffff601000 at 0x00859000: load12 ALLOC LOAD READONLY CODE HAS_CONTENTS
  (gdb)

Notice the new warning which is issued when the core file is being
loaded.  The threads are announced as '[New LWP NN]', and we see two
threads in the 'info threads' output.  The 'maintenance info sections'
output shows the result of the section renaming.

The gdb.arch/core-file-pid0.exp test has been updated to check for the
improved GDB output.
---
 gdb/corelow.c                             | 150 ++++++++++++++++++++++
 gdb/testsuite/gdb.arch/core-file-pid0.exp |  12 +-
 2 files changed, 161 insertions(+), 1 deletion(-)

diff --git a/gdb/corelow.c b/gdb/corelow.c
index 7312d40374f..321b2454b5f 100644
--- a/gdb/corelow.c
+++ b/gdb/corelow.c
@@ -405,6 +405,153 @@ core_file_command (const char *filename, int from_tty)
     core_target_open (filename, from_tty);
 }
 
+/* A vmcore file is a core file created by the Linux kernel at the point of
+   a crash.  Each thread in the core file represents a real CPU core, and
+   the lwpid for each thread is the pid of the process that was running on
+   that core at the moment of the crash.
+
+   However, not every CPU core will have been running a process, some cores
+   will be idle.  For these idle cores the kernel writes an lwpid of 0.  And
+   of course, multiple cores might be idle, so there could be multiple
+   threads with an lwpid of 0.
+
+   The problem is GDB doesn't really like threads with an lwpid of 0; GDB
+   presents such a thread as a process rather than a thread.  And GDB
+   certainly doesn't like multiple threads having the same lwpid, each time
+   a new thread is seen with the same lwpid the earlier thread (with the
+   same lwpid) will be deleted.
+
+   This function addresses both of these problems by assigning a fake lwpid
+   to any thread with an lwpid of 0.
+
+   GDB finds the lwpid information by looking at the bfd section names
+   which include the lwpid, e.g. .reg/NN where NN is the lwpid.  This
+   function looks through all the section names looking for sections named
+   .reg/NN.  If any sections are found where NN == 0, then we assign a new
+   unique value of NN.  Then, in a second pass, any sections ending /0 are
+   assigned their new number.
+
+   Remember, a core file may contain multiple register sections for
+   different register sets, but the sets are always grouped by thread, so
+   we can figure out which registers should be assigned the same new
+   lwpid.  For example, consider a core file containing:
+
+     .reg/0, .reg2/0, .reg/0, .reg2/0
+
+   This represents two threads, each thread contains a .reg and .reg2
+   register set.  The .reg represents the start of each thread.  After
+   renaming the sections will now look like this:
+
+     .reg/1, .reg2/1, .reg/2, .reg2/2
+
+   After calling this function the rest of the core file handling code can
+   treat this core file just like any other core file.  */
+
+static void
+rename_vmcore_idle_reg_sections (bfd *abfd, inferior *inf)
+{
+  /* Map from the bfd section to its lwpid (the /NN number).  */
+  std::vector<std::pair<asection *, int>> sections_and_lwpids;
+
+  /* The set of all /NN numbers found.  Needed so we can easily find unused
+     numbers in the case that we need to rename some sections.  */
+  std::unordered_set<int> all_lwpids;
+
+  /* A count of how many sections called .reg/0 we have found.  */
+  unsigned zero_lwpid_count = 0;
+
+  /* Look for all the .reg sections.  Record the section object and the
+     lwpid which is extracted from the section name.  Spot if any have an
+     lwpid of zero.  */
+  for (asection *sect : gdb_bfd_sections (core_bfd))
+    {
+      if (startswith (bfd_section_name (sect), ".reg/"))
+	{
+	  int lwpid = atoi (bfd_section_name (sect) + 5);
+	  sections_and_lwpids.emplace_back (sect, lwpid);
+	  all_lwpids.insert (lwpid);
+	  if (lwpid == 0)
+	    zero_lwpid_count++;
+	}
+    }
+
+  /* If every ".reg/NN" section has a non-zero lwpid then we don't need to
+     do any renaming.  */
+  if (zero_lwpid_count == 0)
+    return;
+
+  /* Assign a new number to any .reg sections with an lwpid of 0.  */
+  int new_lwpid = 1;
+  for (auto &sect_and_lwpid : sections_and_lwpids)
+    if (sect_and_lwpid.second == 0)
+      {
+	while (all_lwpids.find (new_lwpid) != all_lwpids.end ())
+	  new_lwpid++;
+	sect_and_lwpid.second = new_lwpid;
+	new_lwpid++;
+      }
+
+  /* Now update the names of any sections with an lwpid of 0.  This is
+     more than just the .reg sections we originally found.  */
+  std::string replacement_lwpid_str;
+  auto iter = sections_and_lwpids.begin ();
+  int replacement_lwpid = 0;
+  for (asection *sect : gdb_bfd_sections (core_bfd))
+    {
+      if (iter != sections_and_lwpids.end () && sect == iter->first)
+	{
+	  gdb_assert (startswith (bfd_section_name (sect), ".reg/"));
+
+	  int lwpid = atoi (bfd_section_name (sect) + 5);
+	  if (lwpid == iter->second)
+	    {
+	      /* This section was not given a new number.  */
+	      gdb_assert (lwpid != 0);
+	      replacement_lwpid = 0;
+	    }
+	  else
+	    {
+	      replacement_lwpid = iter->second;
+	      ptid_t ptid (inf->pid, replacement_lwpid);
+	      if (!replacement_lwpid_str.empty ())
+		replacement_lwpid_str += ", ";
+	      replacement_lwpid_str += target_pid_to_str (ptid);
+	    }
+
+	  iter++;
+	}
+
+      if (replacement_lwpid != 0)
+	{
+	  const char *name = bfd_section_name (sect);
+	  size_t len = strlen (name);
+
+	  if (strncmp (name + len - 2, "/0", 2) == 0)
+	    {
+	      /* This section needs a new name.  */
+	      std::string name_str
+		= string_printf ("%.*s/%d",
+				 static_cast<int> (len - 2),
+				 name, replacement_lwpid);
+	      char *name_buf
+		= static_cast<char *> (bfd_alloc (abfd, name_str.size () + 1));
+	      if (name_buf == nullptr)
+		error (_("failed to allocate space for section name '%s'"),
+		       name_str.c_str ());
+	      memcpy (name_buf, name_str.c_str (), name_str.size () + 1);
+	      bfd_rename_section (sect, name_buf);
+	    }
+	}
+    }
+
+  if (zero_lwpid_count == 1)
+    warning (_("found thread with pid 0, assigned replacement Target Id: %s"),
+	     replacement_lwpid_str.c_str ());
+  else
+    warning (_("found threads with pid 0, assigned replacement Target Ids: %s"),
+	     replacement_lwpid_str.c_str ());
+}
+
 /* Locate (and load) an executable file (and symbols) given the core file
    BFD ABFD.  */
 
@@ -540,6 +687,9 @@ core_target_open (const char *arg, int from_tty)
   inferior_appeared (inf, pid);
   inf->fake_pid_p = fake_pid_p;
 
+  /* Rename any .reg/0 sections, giving them each a fake lwpid.  */
+  rename_vmcore_idle_reg_sections (core_bfd, inf);
+
   /* Build up thread list from BFD sections, and possibly set the
      current thread to the .reg/NN section matching the .reg
      section.  */
diff --git a/gdb/testsuite/gdb.arch/core-file-pid0.exp b/gdb/testsuite/gdb.arch/core-file-pid0.exp
index b960dfe095b..6e91111b44b 100644
--- a/gdb/testsuite/gdb.arch/core-file-pid0.exp
+++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
@@ -57,7 +57,17 @@ clean_restart
 # and incorrectly deletes what should be the current thread.
 gdb_test "core-file ${corefile}" \
     [multi_line \
+	 "warning: found threads with pid 0, assigned replacement Target Ids: LWP 1, LWP 2" \
+	 ".*" \
 	 "Core was generated by \[^\r\n\]+\\." \
 	 "Program terminated with signal (?:11|SIGSEGV), Segmentation fault\\." \
-	 "The current thread has terminated"] \
+	 "#0\\s+$hex in \[^\r\n\]+" \
+	 "\\\[Current thread is 1 \\(LWP 1\\)\\\]"] \
     "check core file termination reason"
+
+# And check GDB has found both threads.
+gdb_test "info threads" \
+    [multi_line \
+	 "\\* 1\\s+LWP 1\\s+$hex in \[^\r\n\]+" \
+	 "  2\\s+LWP 2\\s+$hex in \[^\r\n\]+"] \
+    "check both threads are visible"
-- 
2.25.4
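
The renaming pass in rename_vmcore_idle_reg_sections can be illustrated away from BFD.  What follows is only a minimal sketch, assuming the section names are available as plain strings in file order; the helper names (starts_with, ends_with_slash_zero, rename_zero_lwpid_sections) are hypothetical and are not part of GDB or BFD.

```cpp
#include <cstdlib>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helpers, not part of GDB: they work on plain section-name
// strings instead of BFD sections.
static bool
starts_with (const std::string &s, const std::string &prefix)
{
  return s.compare (0, prefix.size (), prefix) == 0;
}

static bool
ends_with_slash_zero (const std::string &s)
{
  return s.size () >= 2 && s.compare (s.size () - 2, 2, "/0") == 0;
}

std::vector<std::string>
rename_zero_lwpid_sections (std::vector<std::string> names)
{
  // Pass 1: collect every lwpid already used by a ".reg/NN" section.
  std::unordered_set<int> used;
  for (const std::string &n : names)
    if (starts_with (n, ".reg/"))
      used.insert (std::atoi (n.c_str () + 5));

  // Pass 2: each ".reg/0" starts a new thread and is given the next
  // unused lwpid; the "/0" sections that follow it inherit that number.
  int next = 1;
  int current = 0;	// 0 means the current thread needs no renaming.
  for (std::string &n : names)
    {
      if (starts_with (n, ".reg/"))
	{
	  if (std::atoi (n.c_str () + 5) == 0)
	    {
	      while (used.count (next) != 0)
		next++;
	      current = next++;
	    }
	  else
	    current = 0;
	}

      if (current != 0 && ends_with_slash_zero (n))
	n = n.substr (0, n.size () - 2) + "/" + std::to_string (current);
    }
  return names;
}
```

Like the patch, the sketch relies on the register sets being grouped by thread, with .reg first in each group.  Passing it the names .reg/0, .reg2/0, .reg/0, .reg2/0 should give .reg/1, .reg2/1, .reg/2, .reg2/2, matching the example in the function comment.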


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] gdb: split inferior and thread setup when opening a core file
  2023-06-05  9:11 ` [PATCH 1/3] gdb: split inferior and thread setup when opening a core file Andrew Burgess
@ 2023-06-10  0:04   ` Kevin Buettner
  0 siblings, 0 replies; 11+ messages in thread
From: Kevin Buettner @ 2023-06-10  0:04 UTC (permalink / raw)
  To: Andrew Burgess via Gdb-patches; +Cc: Andrew Burgess

On Mon,  5 Jun 2023 10:11:07 +0100
Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> wrote:

> I noticed that in corelow.c, when a core file is opened, both the
> thread and inferior setup is done in add_to_thread_list.  In this
> patch I propose hoisting the inferior setup out of add_to_thread_list
> into core_target_open.
> 
> The only thing about this change that gave me cause for concern is
> that in add_to_thread_list, we only setup the inferior after finding
> the first section with a name like ".reg/NN".  If we find no such
> section then the inferior will never be setup.
> 
> Is this important?
> 
> Well, I don't think so.  Back in core_target_open, if there is no
> current thread (which there will not be if no ".reg/NN" section was
> found), then we look for a thread in the current inferior.  If there
> are no threads (which there will not be if no ".reg/NN" is found),
> then we once again setup the current inferior.
> 
> What I think this means, is that, in all cases, the current inferior
> will end up being setup.  By moving the inferior setup code earlier in
> core_target_open and making it non-conditional, we can remove the
> later code that sets up the inferior, we now know this will always
> have been done.
> 
> There should be no user visible changes after this commit.
> ---
>  gdb/corelow.c | 62 ++++++++++++++++++++++++---------------------------
>  1 file changed, 29 insertions(+), 33 deletions(-)

LGTM.

Reviewed-by: Kevin Buettner <kevinb@redhat.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid
  2023-06-05  9:11 ` [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid Andrew Burgess
@ 2023-06-10  0:16   ` Kevin Buettner
  2023-07-06 14:49   ` Pedro Alves
  2023-07-10 11:10   ` Andrew Burgess
  2 siblings, 0 replies; 11+ messages in thread
From: Kevin Buettner @ 2023-06-10  0:16 UTC (permalink / raw)
  To: Andrew Burgess via Gdb-patches; +Cc: Andrew Burgess

On Mon,  5 Jun 2023 10:11:08 +0100
Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> wrote:

> This patch contains a test for this commit:
> 
>   commit c820c52a914cc9d7c63cb41ad396f4ddffff2196
>   Date:   Fri Aug 6 19:45:58 2010 +0000
> 
>               * thread.c (add_thread_silent): Use null_ptid instead of
>               minus_one_ptid while getting rid of stale inferior_ptid.
> 
> This is another test that has been carried in the Fedora GDB tree for
> some time, and I thought that it would be worth merging to master.  I
> don't believe there is any test like this currently in the testsuite.
> 
> The original issue was reported in this thread:
> 
>   https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/
> 
> The problem was that when GDB was used to open a vmcore (core file)
> image generated by the Linux kernel GDB would (sometimes) crash with
> an assertion failure:
> 
>   thread.c:884: internal-error: switch_to_thread: Assertion `inf != NULL' failed.
> 
> To understand what's going on we need some background; a vmcore file
> represents each processor core in the same way that a standard
> application core file represents threads.  Thus, we might say, a
> vmcore file represents cores as threads.
> 
> When writing a vmcore file, the kernel will store the pid of the
> process currently running on that core as the thread's lwpid.
> 
> However, if a core is idle, with no process currently running on it,
> then the lwpid for that thread is stored as 0 in the vmcore file.  If
> multiple cores are idle then multiple threads will have a lwpid of 0.
> 
> Back in 2010, the original issue reporter tried to change the kernel's
> behaviour in this thread:
> 
>   https://lkml.org/lkml/2010/8/3/75
> 
> This change was rejected by the kernel team, the current
> behaviour (lwpid of 0) was considered correct.  I've checked the
> source of a recent kernel.  The code mentioned in the lkml.org posting
> has moved, it's now in the function crash_save_cpu in the file
> kernel/kexec_core.c, but the general behaviour is unchanged, an idle
> core will have an lwpid of 0, so I think GDB still needs to be able to
> handle this case.
> 
> When GDB loads a vmcore file (which is handled just like any other
> core file) the sections are processed in core_open to generate the
> threads for the core file.  The processing is done by calling
> add_to_thread_list, a function which looks for sections named .reg/NN
> where NN is the lwpid of the thread, GDB then builds a ptid_t for the
> new thread and calls add_thread.
> 
> Remember, in our case the lwpid is 0.  Now for the first thread this
> is fine, if a little weird, 0 isn't usually a valid lwpid, but that's
> OK, GDB creates a thread with lwpid of 0 and carries on.
> 
> When we find the next thread (core) with lwpid of 0, we attempt to
> create another thread with an lwpid of 0.  This of course clashes with
> the previously created thread, they have the same ptid_t, so GDB tries
> to delete the first thread.
> 
> And it was within this thread delete code that we triggered a bug
> which would then cause GDB to assert -- when deleting we tried to
> switch to a thread with minus_one_ptid, this resulted in a call to
> find_inferior_pid (passing in minus_one_ptid's pid, which is -1), the
> find_inferior_pid call fails and returns NULL, which then triggered an
> assert in switch_to_thread.
> 
> The actual details of why the assert triggered are really not
> important.  What's important (I think) is that a vmcore file might
> have this interesting lwpid of 0 characteristic, which isn't something
> we see in "normal" application core files, and it is this that I think
> we should be testing.
> 
> Now, you might be thinking: isn't deleting the first thread the wrong
> thing to do?  If the vmcore file has two threads that represent two
> cores, and both have an lwpid of 0 (indicating both cores are idle),
> then surely GDB should still represent this as two threads?  You're
> not wrong.  This was mentioned by Pedro in the original GDB mailing
> list thread here:
> 
>   https://inbox.sourceware.org/gdb-patches/201008061057.03037.pedro@codesourcery.com/
> 
> This is indeed a problem, and this problem is still present in GDB
> today.  I plan to try and address this in a later commit, however,
> this first commit is about getting a test in place to confirm that GDB
> at a minimum doesn't crash when loading such a vmcore file.
> 
> And so, finally, what's in this commit?
> 
> This commit contains a new test.  The test doesn't actually contain a
> vmcore file.  Instead I've created a standard application core file
> that contains two threads, and then manually edited the core file to
> set the lwpid of each thread to 0.
> 
> To further reduce the size of the core file (as it will be stored in
> git), I've zeroed all of the LOAD-able segments in the core file.
> This test really doesn't care about that part of the core file, we
> only really care about loading the registers, this is enough to
> confirm that GDB doesn't crash.
> 
> Obviously as the core file is pre-generated, this test is architecture
> specific.  There are already a few tests in gdb.arch/ that include
> pre-generated core files.  Just as those existing tests do, I've
> compressed the core file with bzip2, which reduces it to just 750
> bytes.  I have structured the test so that if/when this patch is
> merged I can add some additional core files for other architectures,
> however, these are not included in this commit.
> 
> The test simply expands the core file, and then loads it into GDB.
> One interesting thing to note is that GDB reports the core file
> loading like this:
> 
>   (gdb) core-file ./gdb/testsuite/outputs/gdb.arch/core-file-pid0/core-file-pid0.x86-64.core
>   [New process 1]
>   [New process 1]
>   Failed to read a valid object file image from memory.
>   Core was generated by `./segv-mt'.
>   Program terminated with signal SIGSEGV, Segmentation fault.
>   The current thread has terminated
>   (gdb)
> 
> There are two interesting things here: first, the repeated "New process
> 1" message.  This is caused because linux_core_pid_to_str reports
> anything with an lwpid of 0 as a process, rather than an LWP.  And
> second, the "The current thread has terminated" message.  This is
> because the first thread in the core file is the current thread, but
> when GDB loads the second thread (which also has lwpid 0) this causes
> the first thread to be deleted, as a result GDB thinks that the
> current (first) thread has terminated.
> 
> As I said previously, both of these problems are a result of the lwpid
> 0 aliasing, which is not being fixed in this commit -- this commit is
> just confirming that GDB doesn't crash when loading this core file.
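
The lwpid 0 aliasing described above can be shown with a toy model.  This is only a sketch under stated assumptions: toy_ptid and toy_thread_table are hypothetical stand-ins, not GDB's ptid_t or thread list, and all they model is that a second thread with an already-used ptid displaces the first.

```cpp
#include <map>
#include <string>
#include <utility>

// Toy stand-in for GDB's ptid_t: just a (pid, lwpid) pair.
using toy_ptid = std::pair<int, long>;

struct toy_thread_table
{
  // Maps each ptid to a label for the section that created the thread.
  std::map<toy_ptid, std::string> threads;

  // How many existing threads were thrown away because of a clash.
  int deleted = 0;

  // Add a thread; if one with the same ptid already exists it is treated
  // as stale and deleted first, so the new thread takes its place.
  void add_thread (toy_ptid ptid, const std::string &origin)
  {
    auto it = threads.find (ptid);
    if (it != threads.end ())
      {
	threads.erase (it);
	deleted++;
      }
    threads.emplace (ptid, origin);
  }
};
```

Adding two threads with ptid (1, 0) leaves a single entry in the table, which is the "one thread missing" symptom; giving each a distinct lwpid keeps both.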

Great explanation! :)

Approved-by: Kevin Buettner <kevinb@redhat.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] gdb: handle core files with .reg/0 section names
  2023-06-05  9:11 ` [PATCH 3/3] gdb: handle core files with .reg/0 section names Andrew Burgess
@ 2023-06-10  0:36   ` Kevin Buettner
  0 siblings, 0 replies; 11+ messages in thread
From: Kevin Buettner @ 2023-06-10  0:36 UTC (permalink / raw)
  To: Andrew Burgess via Gdb-patches; +Cc: Andrew Burgess

On Mon,  5 Jun 2023 10:11:09 +0100
Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> wrote:

> The previous commit added the test gdb.arch/core-file-pid0.exp which
> tests GDB's ability to load a core file containing threads with an
> lwpid of 0, which is something GDB can encounter when loading a
> vmcore file -- a core file generated by the Linux kernel.  The threads
> with an lwpid of 0 represent idle cores.
> 
> While the previous commit added the test, which confirms GDB doesn't
> crash when confronted with such a core file, there are still some
> problems with GDB's handling of these core files.  These problems all
> originate from the fact that the core file (once opened by bfd)
> contains multiple sections called .reg/0, these sections all
> represent different threads (cpu cores in the original vmcore dump),
> but GDB gets confused and thinks these .reg/0 sections are all
> referencing the same thread.
> 
> Here is a GDB session on an x86-64 machine which loads the core file
> from the gdb.arch/core-file-pid0.exp test; this core file contains two
> threads, both of which have a pid of 0:
> 
>   $ ./gdb/gdb --data-directory ./gdb/data-directory/ -q
>   (gdb) core-file /tmp/x86_64-pid0-core.core
>   [New process 1]
>   [New process 1]
>   Failed to read a valid object file image from memory.
>   Core was generated by `./segv-mt'.
>   Program terminated with signal SIGSEGV, Segmentation fault.
>   The current thread has terminated
>   (gdb) info threads
>     Id   Target Id         Frame
>     2    process 1         0x00000000004017c2 in ?? ()
> 
>   The current thread <Thread ID 1> has terminated.  See `help thread'.
>   (gdb) maintenance info sections
>   Core file: `/tmp/x86_64-pid0-core.core', file type elf64-x86-64.
>    [0]      0x00000000->0x000012d4 at 0x00000318: note0 READONLY HAS_CONTENTS
>    [1]      0x00000000->0x000000d8 at 0x0000039c: .reg/0 HAS_CONTENTS
>    [2]      0x00000000->0x000000d8 at 0x0000039c: .reg HAS_CONTENTS
>    [3]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo/0 HAS_CONTENTS
>    [4]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo HAS_CONTENTS
>    [5]      0x00000000->0x00000140 at 0x000005c0: .auxv HAS_CONTENTS
>    [6]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file/0 HAS_CONTENTS
>    [7]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file HAS_CONTENTS
>    [8]      0x00000000->0x00000200 at 0x000007cc: .reg2/0 HAS_CONTENTS
>    [9]      0x00000000->0x00000200 at 0x000007cc: .reg2 HAS_CONTENTS
>    [10]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate/0 HAS_CONTENTS
>    [11]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate HAS_CONTENTS
>    [12]     0x00000000->0x000000d8 at 0x00000ea4: .reg/0 HAS_CONTENTS
>    [13]     0x00000000->0x00000200 at 0x00000f98: .reg2/0 HAS_CONTENTS
>    [14]     0x00000000->0x00000440 at 0x000011ac: .reg-xstate/0 HAS_CONTENTS
>    [15]     0x00400000->0x00401000 at 0x00002000: load1 ALLOC LOAD READONLY HAS_CONTENTS
>    [16]     0x00401000->0x004b9000 at 0x00003000: load2 ALLOC READONLY CODE
>    [17]     0x004b9000->0x004e5000 at 0x00003000: load3 ALLOC READONLY
>    [18]     0x004e6000->0x004ec000 at 0x00003000: load4 ALLOC LOAD HAS_CONTENTS
>    [19]     0x004ec000->0x004f2000 at 0x00009000: load5 ALLOC LOAD HAS_CONTENTS
>    [20]     0x012a8000->0x012cb000 at 0x0000f000: load6 ALLOC LOAD HAS_CONTENTS
>    [21]     0x7fda77736000->0x7fda77737000 at 0x00032000: load7 ALLOC READONLY
>    [22]     0x7fda77737000->0x7fda77f37000 at 0x00032000: load8 ALLOC LOAD HAS_CONTENTS
>    [23]     0x7ffd55f65000->0x7ffd55f86000 at 0x00832000: load9 ALLOC LOAD HAS_CONTENTS
>    [24]     0x7ffd55fc3000->0x7ffd55fc7000 at 0x00853000: load10 ALLOC LOAD READONLY HAS_CONTENTS
>    [25]     0x7ffd55fc7000->0x7ffd55fc9000 at 0x00857000: load11 ALLOC LOAD READONLY CODE HAS_CONTENTS
>    [26]     0xffffffffff600000->0xffffffffff601000 at 0x00859000: load12 ALLOC LOAD READONLY CODE HAS_CONTENTS
>   (gdb)
> 
> Notice when the core file is first loaded we see two lines like:
> 
>   [New process 1]
> 
> And GDB reports:
> 
>   The current thread has terminated
> 
> Which isn't what we'd expect from a core file -- the core file should
> only contain threads that are live at the point of the crash, one of
> which should be the current thread.  The above message is reported
> because GDB has deleted what we think is the current thread!
> 
> And in the 'info threads' output we are only seeing a single thread,
> again, this is because GDB has deleted one of the threads.
> 
> Finally, the 'maintenance info sections' output shows the cause of all
> our problems, two sections named .reg/0.  When GDB sees the first of
> these it creates a new thread.  But, when we see the second .reg/0 GDB
> tries to create another new thread, but this thread has the same
> ptid_t as the first thread, so GDB deletes the first thread and
> creates the second thread in its place.
> 
> Because both these threads are created with an lwpid of 0 GDB reports
> these as 'New process NN' rather than 'New LWP NN' which is what we
> would normally expect.
> 
> The previous commit includes a little more of the history of GDB
> support in this area, but these problems were discussed on the mailing
> list a while ago in this thread:
> 
>   https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/
> 
> In this commit I propose a solution to these problems.
> 
> What I propose is that GDB should spot when we have .reg/0 sections
> and, when these are found, should rename these sections using some
> unique non-zero lwpid.
> 
> Note in the above output we also have sections like .reg2/0 and
> .reg-xstate/0, these are additional register sets, this commit also
> renumbers these sections in line with their .reg section.
> 
> The user is warned that some section renumbering has been performed.
> 
> GDB takes care to ensure that the new numbers assigned are unique and
> don't clash with any of the pid's that might already be in use --
> remember, in a real vmcore file, 0 is used to indicate an idle core,
> non-idle cores will have the pid of whichever process was running on
> that core, so we don't want GDB to assign an lwpid that clashes with
> an actual pid that is in use in the core file.
> 
> After this commit here's the updated GDB session output:
> 
>   $ ./gdb/gdb --data-directory ./gdb/data-directory/ -q
>   (gdb) core-file /tmp/x86_64-pid0-core.core
>   warning: found threads with pid 0, assigned replacement Target Ids: LWP 1, LWP 2
>   [New LWP 1]
>   [New LWP 2]
>   Failed to read a valid object file image from memory.
>   Core was generated by `./segv-mt'.
>   Program terminated with signal SIGSEGV, Segmentation fault.
>   #0  0x00000000004017c2 in ?? ()
>   [Current thread is 1 (LWP 1)]
>   (gdb) info threads
>     Id   Target Id         Frame
>   * 1    LWP 1             0x00000000004017c2 in ?? ()
>     2    LWP 2             0x000000000040dda5 in ?? ()
>   (gdb) maintenance info sections
>   Core file: `/tmp/x86_64-pid0-core.core', file type elf64-x86-64.
>    [0]      0x00000000->0x000012d4 at 0x00000318: note0 READONLY HAS_CONTENTS
>    [1]      0x00000000->0x000000d8 at 0x0000039c: .reg/1 HAS_CONTENTS
>    [2]      0x00000000->0x000000d8 at 0x0000039c: .reg HAS_CONTENTS
>    [3]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo/1 HAS_CONTENTS
>    [4]      0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo HAS_CONTENTS
>    [5]      0x00000000->0x00000140 at 0x000005c0: .auxv HAS_CONTENTS
>    [6]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file/1 HAS_CONTENTS
>    [7]      0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file HAS_CONTENTS
>    [8]      0x00000000->0x00000200 at 0x000007cc: .reg2/1 HAS_CONTENTS
>    [9]      0x00000000->0x00000200 at 0x000007cc: .reg2 HAS_CONTENTS
>    [10]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate/1 HAS_CONTENTS
>    [11]     0x00000000->0x00000440 at 0x000009e0: .reg-xstate HAS_CONTENTS
>    [12]     0x00000000->0x000000d8 at 0x00000ea4: .reg/2 HAS_CONTENTS
>    [13]     0x00000000->0x00000200 at 0x00000f98: .reg2/2 HAS_CONTENTS
>    [14]     0x00000000->0x00000440 at 0x000011ac: .reg-xstate/2 HAS_CONTENTS
>    [15]     0x00400000->0x00401000 at 0x00002000: load1 ALLOC LOAD READONLY HAS_CONTENTS
>    [16]     0x00401000->0x004b9000 at 0x00003000: load2 ALLOC READONLY CODE
>    [17]     0x004b9000->0x004e5000 at 0x00003000: load3 ALLOC READONLY
>    [18]     0x004e6000->0x004ec000 at 0x00003000: load4 ALLOC LOAD HAS_CONTENTS
>    [19]     0x004ec000->0x004f2000 at 0x00009000: load5 ALLOC LOAD HAS_CONTENTS
>    [20]     0x012a8000->0x012cb000 at 0x0000f000: load6 ALLOC LOAD HAS_CONTENTS
>    [21]     0x7fda77736000->0x7fda77737000 at 0x00032000: load7 ALLOC READONLY
>    [22]     0x7fda77737000->0x7fda77f37000 at 0x00032000: load8 ALLOC LOAD HAS_CONTENTS
>    [23]     0x7ffd55f65000->0x7ffd55f86000 at 0x00832000: load9 ALLOC LOAD HAS_CONTENTS
>    [24]     0x7ffd55fc3000->0x7ffd55fc7000 at 0x00853000: load10 ALLOC LOAD READONLY HAS_CONTENTS
>    [25]     0x7ffd55fc7000->0x7ffd55fc9000 at 0x00857000: load11 ALLOC LOAD READONLY CODE HAS_CONTENTS
>    [26]     0xffffffffff600000->0xffffffffff601000 at 0x00859000: load12 ALLOC LOAD READONLY CODE HAS_CONTENTS
>   (gdb)
> 
> Notice the new warning which is issued when the core file is being
> loaded.  The threads are announced as '[New LWP NN]', and we see two
> threads in the 'info threads' output.  The 'maintenance info sections'
> output shows the result of the section renaming.
> 
> The gdb.arch/core-file-pid0.exp test has been updated to check for the
> improved GDB output.
> ---
>  gdb/corelow.c                             | 150 ++++++++++++++++++++++
>  gdb/testsuite/gdb.arch/core-file-pid0.exp |  12 +-
>  2 files changed, 161 insertions(+), 1 deletion(-)
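
The 'New process NN' versus 'New LWP NN' difference in the two sessions above comes down to whether the lwpid is zero.  A minimal sketch of that decision, assuming a simplified formatter; toy_core_pid_to_str is a hypothetical name, and GDB's actual linux_core_pid_to_str may differ in detail:

```cpp
#include <string>

// Hypothetical simplification of how a core-file thread id is rendered:
// a non-zero lwpid is shown as an LWP, anything else as a process.
std::string
toy_core_pid_to_str (int pid, long lwpid)
{
  if (lwpid != 0)
    return "LWP " + std::to_string (lwpid);
  return "process " + std::to_string (pid);
}
```

With the fake pid 1 seen in the sessions above, an lwpid of 0 is rendered as "process 1", while the renumbered threads are rendered as "LWP 1" and "LWP 2".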

Another great explanation!

LGTM.

Reviewed-by: Kevin Buettner <kevinb@redhat.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] Improve vmcore loading
  2023-06-05  9:11 [PATCH 0/3] Improve vmcore loading Andrew Burgess
                   ` (2 preceding siblings ...)
  2023-06-05  9:11 ` [PATCH 3/3] gdb: handle core files with .reg/0 section names Andrew Burgess
@ 2023-07-03 17:03 ` Andrew Burgess
  3 siblings, 0 replies; 11+ messages in thread
From: Andrew Burgess @ 2023-07-03 17:03 UTC (permalink / raw)
  To: gdb-patches

Andrew Burgess <aburgess@redhat.com> writes:

> This patch started as a proposal to upstream a test from the Fedora
> GDB tree (patch #2), however, as I started looking into the test a
> little more I realised that there was scope for further improvements
> to GDB (patch #3).  Patch #1 is a small cleanup/refactor.
>
> This patch is all about loading vmcore files -- that is core files
> generated by the Linux kernel.  See the commit messages on patches #2
> and #3 for more details.

I've gone ahead and pushed this series.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid
  2023-06-05  9:11 ` [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid Andrew Burgess
  2023-06-10  0:16   ` Kevin Buettner
@ 2023-07-06 14:49   ` Pedro Alves
  2023-07-07  9:56     ` Andrew Burgess
  2023-07-10 11:10   ` Andrew Burgess
  2 siblings, 1 reply; 11+ messages in thread
From: Pedro Alves @ 2023-07-06 14:49 UTC (permalink / raw)
  To: gdb-patches

Hi Andrew,

I'm skimming the thread to catch up, and noticed this:

On 2023-06-05 10:11, Andrew Burgess via Gdb-patches wrote:
> +++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
> @@ -0,0 +1,63 @@
> +# This testcase is part of GDB, the GNU debugger.
> +#
> +# Copyright 2023 Free Software Foundation, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> +

Wrong license (GPLv2), and wrong header -- we haven't been using the snail
mail FSF header in years.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid
  2023-07-06 14:49   ` Pedro Alves
@ 2023-07-07  9:56     ` Andrew Burgess
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Burgess @ 2023-07-07  9:56 UTC (permalink / raw)
  To: Pedro Alves, gdb-patches

Pedro Alves <pedro@palves.net> writes:

> Hi Andrew,
>
> I'm skimming the thread to catch up, and noticed this:
>
> On 2023-06-05 10:11, Andrew Burgess via Gdb-patches wrote:
>> +++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
>> @@ -0,0 +1,63 @@
>> +# This testcase is part of GDB, the GNU debugger.
>> +#
>> +# Copyright 2023 Free Software Foundation, Inc.
>> +#
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 2 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write to the Free Software
>> +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
>> +
>
> Wrong license (GPLv2), and wrong header -- we haven't been using the snail
> mail FSF header in years.

Ooops.  Thanks for spotting this.

I pushed the patch below to correct this mistake.

Thanks,
Andrew

---

commit 7c632c2a696fb68e5575db1e2c934788a831e578
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Fri Jul 7 10:51:53 2023 +0100

    gdb/testsuite: fix license on recently added file
    
    The license header on a file I recently contributed was incorrect.
    The file was added in commit:
    
      commit 087969169836f802a09b1cd0502d2f22d7a8f7dc
      Date:   Tue May 23 11:25:21 2023 +0100
    
          gdb: handle core files with .reg/0 section names
    
    The problems were:
    
      - GPLv2 instead of GPLv3,
      - Use the FSF postal address rather than their URL.
    
    Nobody else has touched the file since I merged it, so I don't believe
    there are any problems with me changing the license, this commit does
    just that.

diff --git a/gdb/testsuite/gdb.arch/core-file-pid0.exp b/gdb/testsuite/gdb.arch/core-file-pid0.exp
index 6e91111b44b..56746cca567 100644
--- a/gdb/testsuite/gdb.arch/core-file-pid0.exp
+++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
@@ -4,7 +4,7 @@
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
+# the Free Software Foundation; either version 3 of the License, or
 # (at your option) any later version.
 #
 # This program is distributed in the hope that it will be useful,
@@ -13,8 +13,7 @@
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
-# along with this program; if not, write to the Free Software
-# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
 
 # Some kernel core files have PID 0 (for the idle task), check that
 # GDB can handle such a core file.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid
  2023-06-05  9:11 ` [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid Andrew Burgess
  2023-06-10  0:16   ` Kevin Buettner
  2023-07-06 14:49   ` Pedro Alves
@ 2023-07-10 11:10   ` Andrew Burgess
  2 siblings, 0 replies; 11+ messages in thread
From: Andrew Burgess @ 2023-07-10 11:10 UTC (permalink / raw)
  To: gdb-patches

Andrew Burgess <aburgess@redhat.com> writes:

> This patch contains a test for this commit:
>
>   commit c820c52a914cc9d7c63cb41ad396f4ddffff2196
>   Date:   Fri Aug 6 19:45:58 2010 +0000
>
>               * thread.c (add_thread_silent): Use null_ptid instead of
>               minus_one_ptid while getting rid of stale inferior_ptid.
>
> This is another test that has been carried in the Fedora GDB tree for
> some time, and I thought that it would be worth merging to master.  I
> don't believe there is any test like this currently in the testsuite.
>
> The original issue was reported in this thread:
>
>   https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/
>
> The problem was that when GDB was used to open a vmcore (core file)
> image generated by the Linux kernel GDB would (sometimes) crash with
> an assertion failure:
>
>   thread.c:884: internal-error: switch_to_thread: Assertion `inf != NULL' failed.
>
> To understand what's going on we need some background; a vmcore file
> represents each processor core in the same way that a standard
> application core file represents threads.  Thus, we might say, a
> vmcore file represents cores as threads.
>
> When writing a vmcore file, the kernel will store the pid of the
> process currently running on that core as the thread's lwpid.
>
> However, if a core is idle, with no process currently running on it,
> then the lwpid for that thread is stored as 0 in the vmcore file.  If
> multiple cores are idle then multiple threads will have a lwpid of 0.
>
> Back in 2010, the original issue reporter tried to change the kernel's
> behaviour in this thread:
>
>   https://lkml.org/lkml/2010/8/3/75
>
> This change was rejected by the kernel team, the current
> behaviour (lwpid of 0) was considered correct.  I've checked the
> source of a recent kernel.  The code mentioned in the lkml.org posting
> has moved, it's now in the function crash_save_cpu in the file
> kernel/kexec_core.c, but the general behaviour is unchanged, an idle
> core will have an lwpid of 0, so I think GDB still needs to be able to
> handle this case.
>
> When GDB loads a vmcore file (which is handled just like any other
> core file) the sections are processed in core_open to generate the
> threads for the core file.  The processing is done by calling
> add_to_thread_list, a function which looks for sections named .reg/NN
> where NN is the lwpid of the thread, GDB then builds a ptid_t for the
> new thread and calls add_thread.
>
> Remember, in our case the lwpid is 0.  Now for the first thread this
> is fine, if a little weird, 0 isn't usually a valid lwpid, but that's
> OK, GDB creates a thread with lwpid of 0 and carries on.
>
> When we find the next thread (core) with lwpid of 0, we attempt to
> create another thread with an lwpid of 0.  This of course clashes with
> the previously created thread, they have the same ptid_t, so GDB tries
> to delete the first thread.
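
The aliasing described above can be modelled with a toy thread table.  This is only an illustrative sketch in Python, not GDB's code: ThreadTable, add_thread, and the (pid, lwp) key are hypothetical stand-ins for GDB's thread list and ptid_t, and the pid value of 1 is an arbitrary assumption.

```python
class ThreadTable:
    """Toy model of a thread list keyed by (pid, lwp); not GDB's real data structure."""

    def __init__(self):
        self.threads = {}   # (pid, lwp) -> label of the core the thread represents
        self.deleted = []   # labels of threads dropped as "stale"

    def add_thread(self, pid, lwp, core):
        key = (pid, lwp)
        # A new thread whose key matches an existing one is treated as
        # replacing it, so the existing entry is deleted first.
        if key in self.threads:
            self.deleted.append(self.threads.pop(key))
        self.threads[key] = core


table = ThreadTable()
# Two idle cores: the kernel records an lwpid of 0 for both.
table.add_thread(pid=1, lwp=0, core="cpu0")
table.add_thread(pid=1, lwp=0, core="cpu1")

print(len(table.threads))  # only one thread survives
print(table.deleted)       # the first core's thread was dropped
```

With distinct keys per core (for example a synthetic per-core lwp) both entries would survive; that is one possible direction, stated here as an assumption rather than as what GDB does.
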
>
> And it was within this thread delete code that we triggered a bug
> which would then cause GDB to assert -- when deleting we tried to
> switch to a thread with minus_one_ptid, this resulted in a call to
> find_inferior_pid (passing in minus_one_ptid's pid, which is -1), the
> find_inferior_pid call fails and returns NULL, which then triggered an
> assert in switch_to_thread.
>
> The actual details of why the assert triggered are really not
> important.  What's important (I think) is that a vmcore file might
> have this interesting lwpid of 0 characteristic, which isn't something
> we see in "normal" application core files, and it is this that I think
> we should be testing.
>
> Now, you might be thinking: isn't deleting the first thread the wrong
> thing to do?  If the vmcore file has two threads that represent two
> cores, and both have an lwpid of 0 (indicating both cores are idle),
> then surely GDB should still represent this as two threads?  You're
> not wrong.  This was mentioned by Pedro in the original GDB mailing
> list thread here:
>
>   https://inbox.sourceware.org/gdb-patches/201008061057.03037.pedro@codesourcery.com/
>
> This is indeed a problem, and this problem is still present in GDB
> today.  I plan to try and address this in a later commit, however,
> this first commit is about getting a test in place to confirm that GDB
> at a minimum doesn't crash when loading such a vmcore file.
>
> And so, finally, what's in this commit?
>
> This commit contains a new test.  The test doesn't actually contain a
> vmcore file.  Instead I've created a standard application core file
> that contains two threads, and then manually edited the core file to
> set the lwpid of each thread to 0.
>
> To further reduce the size of the core file (as it will be stored in
> git), I've zeroed all of the LOAD-able segments in the core file.
> This test really doesn't care about that part of the core file, we
> only really care about loading the registers; this is enough to
> confirm that GDB doesn't crash.
>
> Obviously as the core file is pre-generated, this test is architecture
> specific.  There are already a few tests in gdb.arch/ that include
> pre-generated core files.  Just as those existing tests do, I've
> compressed the core file with bzip2, which reduces it to just 750
> bytes.  I have structured the test so that if/when this patch is
> merged I can add some additional core files for other architectures,
> however, these are not included in this commit.
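
As a rough illustration of why zeroing the LOAD-able segments helps, the sketch below compresses a buffer of zero bytes the same size as the uncompressed core file (8757248 bytes, the cf_size value in the test script).  It uses Python's standard bz2 module and is only an approximation: the real core file still carries non-zero ELF headers and notes, so its compressed size will differ from what this prints.

```python
import bz2

CF_SIZE = 8757248  # uncompressed core file size, taken from the test script

# Stand-in for a core file whose contents are entirely zero.
zeroed = bytes(CF_SIZE)
compressed = bz2.compress(zeroed)

print(len(compressed))  # tiny compared with CF_SIZE

# Round-trip check: decompressing restores the original buffer.
assert bz2.decompress(compressed) == zeroed
```
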
>
> The test simply expands the core file, and then loads it into GDB.
> One interesting thing to note is that GDB reports the core file
> loading like this:
>
>   (gdb) core-file ./gdb/testsuite/outputs/gdb.arch/core-file-pid0/core-file-pid0.x86-64.core
>   [New process 1]
>   [New process 1]
>   Failed to read a valid object file image from memory.
>   Core was generated by `./segv-mt'.
>   Program terminated with signal SIGSEGV, Segmentation fault.
>   The current thread has terminated
>   (gdb)
>
> There are two interesting things here: first, the repeated "New process
> 1" message.  This happens because linux_core_pid_to_str reports
> anything with an lwpid of 0 as a process, rather than an LWP.  And
> second, the "The current thread has terminated" message.  This is
> because the first thread in the core file is the current thread, but
> when GDB loads the second thread (which also has lwpid 0) this causes
> the first thread to be deleted, as a result GDB thinks that the
> current (first) thread has terminated.
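
The naming rule described above can be sketched as follows.  This is a toy Python model, not GDB's implementation: core_pid_to_str is a hypothetical stand-in for linux_core_pid_to_str, and the pid of 1 simply mirrors the log shown earlier.

```python
def core_pid_to_str(pid, lwp):
    """Toy version of the naming rule described above; not GDB's implementation."""
    if lwp != 0:
        return f"LWP {lwp}"
    # An lwpid of 0 falls back to describing the whole process.
    return f"process {pid}"


print(core_pid_to_str(1, 0))     # process 1
print(core_pid_to_str(1, 4242))  # LWP 4242
```

Since both idle cores carry an lwpid of 0, both take the fallback branch, which matches the repeated "New process 1" lines.
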
>
> As I said previously, both of these problems are a result of the lwpid
> 0 aliasing, which is not being fixed in this commit -- this commit is
> just confirming that GDB doesn't crash when loading this core file.
> ---
>  gdb/testsuite/gdb.arch/core-file-pid0.exp     |  63 ++++++++++++++++++
>  .../gdb.arch/core-file-pid0.x86-64.core.bz2   | Bin 0 -> 750 bytes
>  2 files changed, 63 insertions(+)
>  create mode 100644 gdb/testsuite/gdb.arch/core-file-pid0.exp
>  create mode 100644 gdb/testsuite/gdb.arch/core-file-pid0.x86-64.core.bz2
>
> diff --git a/gdb/testsuite/gdb.arch/core-file-pid0.exp b/gdb/testsuite/gdb.arch/core-file-pid0.exp
> new file mode 100644
> index 00000000000..b960dfe095b
> --- /dev/null
> +++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
> @@ -0,0 +1,63 @@
> +# This testcase is part of GDB, the GNU debugger.
> +#
> +# Copyright 2023 Free Software Foundation, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> +
> +# Some kernel core files have PID 0 (for the idle task), check that
> +# GDB can handle such a core file.
> +
> +standard_testfile
> +
> +# Set CF_NAME, the name of the compressed core file within the source
> +# tree, and CF_SIZE, the size (in bytes) of the uncompressed core
> +# file.
> +if {[istarget "x86_64-*-linux*"]} {
> +    set cf_name ${testfile}.x86-64.core.bz2
> +    set cf_size 8757248
> +} else {
> +    unsupported "no pre-generated core file for this target"
> +}

It was pointed out to me that after reporting 'unsupported', there
should be a return.  Without the return we end up seeing TCL errors
because the cf_name variable is not defined.

Fixed with the patch below, which I have gone ahead and pushed.

Thanks,
Andrew

---

commit 44c8334f4af5b9895d196077f23e20e15eff4c03
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Mon Jul 10 12:05:21 2023 +0100

    gdb/testsuite: return after reporting a test unsupported
    
    In this commit:
    
      commit 8bcead69665af3a9f9867cd34c3a1daf22120027
      Date:   Tue May 23 11:25:01 2023 +0100
    
          gdb/testsuite: add test for core file with a 0 pid
    
    a new test gdb.arch/core-file-pid0.exp was added.  This test includes
    a pre-generated core file for x86-64 and for other architectures the
    test reports 'unsupported'.
    
    However, after reporting 'unsupported' the test failed to perform an
    early return, so the test would then carry on and try to actually
    perform the test, which resulted in some TCL errors.
    
    Fix this by returning after reporting the test unsupported.

diff --git a/gdb/testsuite/gdb.arch/core-file-pid0.exp b/gdb/testsuite/gdb.arch/core-file-pid0.exp
index 56746cca567..46b8c6db5ed 100644
--- a/gdb/testsuite/gdb.arch/core-file-pid0.exp
+++ b/gdb/testsuite/gdb.arch/core-file-pid0.exp
@@ -28,6 +28,7 @@ if {[istarget "x86_64-*-linux*"]} {
     set cf_size 8757248
 } else {
     unsupported "no pre-generated core file for this target"
+    return -1
 }
 
 # Decompress the core file.



end of thread, other threads:[~2023-07-10 11:10 UTC | newest]

Thread overview: 11+ messages
2023-06-05  9:11 [PATCH 0/3] Improve vmcore loading Andrew Burgess
2023-06-05  9:11 ` [PATCH 1/3] gdb: split inferior and thread setup when opening a core file Andrew Burgess
2023-06-10  0:04   ` Kevin Buettner
2023-06-05  9:11 ` [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid Andrew Burgess
2023-06-10  0:16   ` Kevin Buettner
2023-07-06 14:49   ` Pedro Alves
2023-07-07  9:56     ` Andrew Burgess
2023-07-10 11:10   ` Andrew Burgess
2023-06-05  9:11 ` [PATCH 3/3] gdb: handle core files with .reg/0 section names Andrew Burgess
2023-06-10  0:36   ` Kevin Buettner
2023-07-03 17:03 ` [PATCH 0/3] Improve vmcore loading Andrew Burgess
