From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 6C4CB3858426 for ; Fri, 6 Oct 2023 16:43:44 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6C4CB3858426 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1696610624; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=bfkeH5apYUYkK9KeU0/v0ficrBnG3YhCaNDOi4wMrAk=; b=bMsEAVlyiVGEfcew6EXRSa0FxWo1a1Pq8gYDuNHzkGqlR1meMSqu97gfaD7KJaU95ZhwYe D3sPrg3EeW6xOMDbiO3pdCycIDRSUpYDJ72BXmUK8v8VqdwjCqQLKvHnWVubOW/umb/jTb OJn6BlfPtv89aD7oFMyZAJoGurSyVGQ= Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-85-DsH_D4s6NluWTskupDViUw-1; Fri, 06 Oct 2023 12:43:41 -0400 X-MC-Unique: DsH_D4s6NluWTskupDViUw-1 Received: by mail-qk1-f199.google.com with SMTP id af79cd13be357-77574c5f713so227965985a.0 for ; Fri, 06 Oct 2023 09:43:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696610620; x=1697215420; h=mime-version:message-id:date:references:in-reply-to:subject:cc:to :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=bfkeH5apYUYkK9KeU0/v0ficrBnG3YhCaNDOi4wMrAk=; b=jexcwQglYuK4QLY+xkGc1N3WwNpVE81EGIzPLvdWEPhIJl+89cNd7XVB91VSeUjY0S 8nMnylav1ivXdqhNoC7KxzgcepfamHa2ih790kYhslEoka6u0tQDRTVRnixLUH4s6NUE ozqj2tp2brtwJ0nopttXRFnrzqQ82x84WLpvzk0Csx+NRXDoa9MNEWfp0ylusfWqivNG H9BCLkvqdNqrp7AM0BCb/X1GQH05BYwFArDHidDz43mifOnlGsKArD6tftCuHR5aeqlW /5HQ4xHgjsKVgUMDNNDTrjxBJOC4yJe96ZSuY/JQiw35Q6bGOW98GR/Laq+UuFUQP0NH vAYQ== X-Gm-Message-State: AOJu0Yx+ZUbXKI4W160XaO+OCK1a5pr7Yu0yQThjEXnVSNohNqtk1cmD hVn0e2txWBapD6xrYeehTZvI9yut+HE500F+irHh2t3LyL5CHE1DswPw0PywgW/ap5oNTCEhZRC fp6ox6u7H2OipftAC9aYHY3UrzxXdqDS+o+0UUe9hv+KoyVejg4XDarRATlwBW3MJqS3r0n9mmt 7IWG/dKQ== X-Received: by 2002:a0c:a891:0:b0:658:1f8b:740e with SMTP id x17-20020a0ca891000000b006581f8b740emr7504443qva.31.1696610599620; Fri, 06 Oct 2023 09:43:19 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGOcLMBy1K48pj0LjGWd7QJ5dmUv6sVXqeK0yhrDqFiFKqS3SP8bu0ab3oxSvPshkeImVbOQQ== X-Received: by 2002:a0c:a891:0:b0:658:1f8b:740e with SMTP id x17-20020a0ca891000000b006581f8b740emr7504424qva.31.1696610599126; Fri, 06 Oct 2023 09:43:19 -0700 (PDT) Received: from localhost ([31.111.84.209]) by smtp.gmail.com with ESMTPSA id d4-20020a0cdb04000000b0065b260eafd9sm1514723qvk.87.2023.10.06.09.43.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Oct 2023 09:43:18 -0700 (PDT) From: Andrew Burgess To: gdb-patches@sourceware.org Cc: Simon Marchi Subject: Re: [PATCH] gdb: fix auxv cache clearing from new_objfile observer In-Reply-To: References: Date: Fri, 06 Oct 2023 17:43:17 +0100 Message-ID: <87y1gfsnve.fsf@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,RCVD_IN_SORBS_WEB,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Andrew Burgess writes: > It was pointed out on the mailing list that a recently added > test (gdb.python/py-progspace-events.exp) was failing when run with > the native-extended-gdbserver board. This test was added with this > commit: > > commit 59912fb2d22f8a4bb0862487f12a5cc65b6a013f > Date: Tue Sep 19 11:45:36 2023 +0100 > > gdb: add Python events for program space addition and removal > > It turns out though that the test is failing due to a existing bug > in GDB, the new test just exposes the problem. Additionally, the > failure really doesn't even rely on the new functionality added in the > above commit. I reduced the test to a simple set of steps that > reproduced the failure and tested against GDB 13, and the test passes; > so the bug was introduced since then. In fact, the bug was introduced > with this commit: > > commit a2827364e2bf19910fa5a54364a594a5ba3033b8 > Date: Fri Sep 8 15:48:16 2023 +0100 > > gdb: remove final user of the executable_changed observer > > This commit changed how the per-inferior auxv data cache is managed, > specifically, when the cache is cleared, and it is this that leads to > the failure. > > This bug is interesting because it exposes a number of issues with > GDB, I'll explain all of the problems I see, though ultimately, I only > propose fixing one problem in this commit, which is enough to resolve > the crash we are currently seeing. > > The crash that we are seeing manifests like this: > > ... > [Inferior 2 (process 3970384) exited normally] > +inferior 1 > [Switching to inferior 1 [process 3970383] (/tmp/build/gdb/testsuite/outputs/gdb.python/py-progspace-events/py-progspace-events)] > [Switching to thread 1.1 (Thread 3970383.3970383)] > #0 breakpt () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.python/py-progspace-events.c:28 > 28 { /* Nothing. */ } > (gdb) step > +step > terminate called after throwing an instance of 'gdb_exception_error' > > Fatal signal: Aborted > ... etc ... > > What's happening is that GDB attempts to refile the auxv cache as a > result of the gdbarch_has_shared_address_space call in > program_space::~program_space, the backtrace looks like this: > > #0 0x00007fb4f419a9a5 in raise () from /lib64/libpthread.so.0 > #1 0x00000000008b635d in handle_fatal_signal (sig=6) at ../../src/gdb/event-top.c:912 > #2 > #3 0x00007fb4f38e3625 in raise () from /lib64/libc.so.6 > #4 0x00007fb4f38cc8d9 in abort () from /lib64/libc.so.6 > #5 0x00007fb4f3c70756 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6 > #6 0x00007fb4f3c7c6dc in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6 > #7 0x00007fb4f3c7b6e9 in __cxa_call_terminate () from /lib64/libstdc++.so.6 > #8 0x00007fb4f3c7c094 in __gxx_personality_v0 () from /lib64/libstdc++.so.6 > #9 0x00007fb4f3a80c63 in _Unwind_RaiseException_Phase2 () from /lib64/libgcc_s.so.1 > #10 0x00007fb4f3a8154e in _Unwind_Resume () from /lib64/libgcc_s.so.1 > #11 0x0000000000e8832d in target_read_alloc_1 (ops=0x408a3a0, object=TARGET_OBJECT_AUXV, annex=0x0) at ../../src/gdb/target.c:2266 > #12 0x0000000000e73dea in target_read_alloc (ops=0x408a3a0, object=TARGET_OBJECT_AUXV, annex=0x0) at ../../src/gdb/target.c:2315 > #13 0x000000000058248c in target_read_auxv_raw (ops=0x408a3a0) at ../../src/gdb/auxv.c:379 > #14 0x000000000058243d in target_read_auxv () at ../../src/gdb/auxv.c:368 > #15 0x000000000058255c in target_auxv_search (match=0x0, valp=0x7ffdee17c598) at ../../src/gdb/auxv.c:415 > #16 0x0000000000a464bb in linux_is_uclinux () at ../../src/gdb/linux-tdep.c:433 > #17 0x0000000000a464f6 in linux_has_shared_address_space (gdbarch=0x409a2d0) at ../../src/gdb/linux-tdep.c:440 > #18 0x0000000000510eae in gdbarch_has_shared_address_space (gdbarch=0x409a2d0) at ../../src/gdb/gdbarch.c:4889 > #19 0x0000000000bc7558 in program_space::~program_space (this=0x4544aa0, __in_chrg=) at ../../src/gdb/progspace.c:124 > #20 0x00000000009b245d in delete_inferior (inf=0x47b3de0) at ../../src/gdb/inferior.c:290 > #21 0x00000000009b2c10 in prune_inferiors () at ../../src/gdb/inferior.c:480 > #22 0x00000000009c5e3e in fetch_inferior_event () at ../../src/gdb/infrun.c:4558 > #23 0x000000000099b4dc in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:42 > #24 0x0000000000cbc64f in remote_async_serial_handler (scb=0x4090a30, context=0x408a6b0) at ../../src/gdb/remote.c:14859 > #25 0x0000000000d83d3a in run_async_handler_and_reschedule (scb=0x4090a30) at ../../src/gdb/ser-base.c:138 > #26 0x0000000000d83e1f in fd_event (error=0, context=0x4090a30) at ../../src/gdb/ser-base.c:189 > > So this is problem #1, if we throw an exception while deleting a > program_space then this is not caught, and is going to crash GDB. > > Problem #2 becomes evident when we ask why GDB is throwing an error in > this case; the error is thrown because the remote target, operating in > non-async mode, can't read the auxv data while an inferior is running > and GDB is waiting for a stop reply. The problem here then, is why > does GDB get into a position where it tries to interact with the > remote target in this way, at this time? The problem is caused by the > prune_inferiors call which can be seen in the above backtrace. > > In prune_inferiors we check if the inferior is deletable, and if it > is, we delete it. The problem is, I think, we should also check if > the target is currently in a state that would allow us to delete the > inferior. We don't currently have such a check available, we'd need > to add one, but for the remote target, this would return false if the > remote is in async mode and the remote is currently waiting for a stop > reply. With this change in place GDB would defer deleting the > inferior until the remote target has stopped, at which point GDB would > be able to refill the auxv cache successfully. > > And then, problem #3 becomes evident when we ask why GDB is needing to > refill the auxv cache now when it didn't need to for GDB 13. This is > where the second commit mentioned above (a2827364e2bf) comes in. > Prior to this commit, the auxv cache was cleared by the > executable_changed observer, while after that commit the auxv cache > was cleared by the new_objfile observer -- but only when the > new_objfile observer is used in the special mode that actually means > that all objfiles have been unloaded (I know, the overloading of the > new_objfile observer is horrible, and unnecessary, but it's not really > important for this bug). > > The difference arises because the new_objfile observer is triggered > from clear_symtab_users, which in turn is called from > program_space::~program_space. The new_objfile observer for auxv does > this: > > static void > auxv_new_objfile_observer (struct objfile *objfile) > { > if (objfile == nullptr) > invalidate_auxv_cache_inf (current_inferior ()); > } > > That is, when all the objfiles are unloaded, we clear the auxv cache > for the current inferior. > > The problem is, then when we look at the prune_inferiors -> > delete_inferior -> ~program_space path, we see that the current > inferior is not going to be an inferior that exists within the > program_space being deleted; delete_inferior removes the deleted > inferior from the global inferior list, and then only deletes the > program_space if program_space::empty() returns true, which is only > the case if the current inferior isn't within the program_space to > delete, and no other inferior exists within that program_space > either. > > What this means is that when the new_objfile observer is called we > can't rely on the current inferior having any relationship with the > program space in which the objfiles were removed. This was an error > in the commit a2827364e2bf, the only thing we can rely on is the > current program space. As a result of this mistake, after commit > a2827364e2bf, GDB was sometimes clearing the auxv cache for a random > inferior. In the native target case this was harmless as we can > always refill the cache when needed, but in the remote target case, if > we need to refill the cache when the remote target is executing, then > we get the crash we observed. > > And additionally, if we think about this a little more, we see that > commit a2827364e2bf made another mistake. When all the objfiles are > removed, they are removed from a program_space, a program_space might > contain multiple inferiors, so surely, we should clear the auxv cache > for all of the matching inferiors? > > Given these two insights, that the current_inferior is not relevant, > only the current_program_space, and that we should be clearing the > cache for all inferiors in the current_program_space, we can update > auxv_new_objfile_observer to: > > if (objfile == nullptr) > { > for (inferior *inf : all_inferiors ()) > { > if (inf->pspace == current_program_space) > invalidate_auxv_cache_inf (inf); > } > } > > With this change we now correctly clear the auxv cache for the correct > inferiors, and GDB no longer needs to refill the cache at an > inconvenient time, this avoids the crash we were seeing. > > And finally, we reach problem #4. Inspired by the observation that > using the current_inferior from within the ~program_space function was > not correct, I added some debug to see if current_inferior() was > called anywhere else (below ~program_space), and the answer is yes, > it's called a often. Mostly the culprit is GDB doing: > > current_inferior ()->top_target ()-> .... > > But I think all of these calls are most likely doing the wrong thing, > and only work because the top target in all these cases is shared > between all inferiors, e.g. it's the native target, or the remote > target for all inferiors. But if we had a truly multi-connection > setup, then we might start to see odd behaviour. > > Problem #1 I'm just ignoring for now, I guess at some point we might > run into this again, and then we'd need to solve this. But in this > case I wasn't sure what a "good" solution would look like. We need > the auxv data in order to implement the linux_is_uclinux() function. > If we can't get the auxv data then what should we do, assume yes, or > assume no? The right answer would probably be to propagate the error > back up the stack, but then we reach ~program_space, and throwing > exceptions from a destructor is problematic, so we'd need to catch and > deal at this point. The linux_is_uclinux() call is made from within > gdbarch_has_shared_address_space(), which is used like: > > if (!gdbarch_has_shared_address_space (target_gdbarch ())) > delete this->aspace; > > So, we would have to choose; delete the address space or not. If we > delete it on error, then we might delete an address space that is > shared within another program space. If we don't delete the address > space, then we might leak it. Neither choice is great. > > A better solution might be to have the address spaces be reference > counted, then we could remove the gdbarch_has_shared_address_space > call completely, and just rely on the reference count to auto-delete > the address space when appropriate. > > The solution for problem #2 I already hinted at above, we should have > a new target_can_delete_inferiors() call, which should be called from > prune_inferiors, this would prevent GDB from trying to delete > inferiors when a (remote) target is in a state where we know it can't > delete the inferior. Deleting an inferior often (always?) requires > sending packets to the remote, and if the remote is waiting for a stop > reply then this will never work, so the pruning should be deferred in > this case. > > The solution for problem #3 is included in this commit. > > And, for problem #4, I'm not sure what the right solution is. Maybe > delete_inferior should ensure the inferior to be deleted is in place > when ~program_space is called? But that seems a little weird, as the > current inferior would, in theory, still be using the current > program_space... > > Anyway, after this commit, the gdb.python/py-progspace-events.exp test > now passes when run with the native-extended-remote board. > > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30935 This commit was merged (1b28c0f488bb). Thanks, Andrew > --- > gdb/auxv.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/gdb/auxv.c b/gdb/auxv.c > index 9f599b04a4f..cfb44ffa0bb 100644 > --- a/gdb/auxv.c > +++ b/gdb/auxv.c > @@ -351,7 +351,16 @@ static void > auxv_new_objfile_observer (struct objfile *objfile) > { > if (objfile == nullptr) > - invalidate_auxv_cache_inf (current_inferior ()); > + { > + /* When OBJFILE is nullptr, this indicates that all symbol files have > + been unloaded from the current program space. Discard cached auxv > + information from any inferior within the effected program space. */ > + for (inferior *inf : all_inferiors ()) > + { > + if (inf->pspace == current_program_space) > + invalidate_auxv_cache_inf (inf); > + } > + } > } > > /* See auxv.h. */ > > base-commit: e030ce2c79feee6a35665a6580a23dcf937ea46f > -- > 2.25.4