From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 62CA5386CE69 for ; Thu, 30 Jun 2022 09:33:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 62CA5386CE69 Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-125-1i7EBOZ0MauXhTSXE9AdtA-1; Thu, 30 Jun 2022 05:33:12 -0400 X-MC-Unique: 1i7EBOZ0MauXhTSXE9AdtA-1 Received: by mail-wm1-f70.google.com with SMTP id r132-20020a1c448a000000b003a02a3f0beeso1151652wma.3 for ; Thu, 30 Jun 2022 02:33:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:in-reply-to:references:date :message-id:mime-version; bh=PjI6UmTwXXPDB7wnHvhLQjTbcQTJ4WBB3600zNnxt/E=; b=noBZxrcKXW5Xy0Qxrxv5daAKkqs/YDiNRx9vwAjLj6EfZ/4T2d3mEqMvbWHnhYPW0I mn6ZkdAGVYGk1MINp4YX4mDIOeQ8rI2tp2GKNDEg/hlrN07JZLcssGI8owK/Nv1jfE9l 6+/StkSt7PrimPk9b7x55aKVx0QJ8CtA+V4Jy379PL1CxanrYVLj5xI0+ppRN11oKWIK Cb4WZB9F97e2KzI9bxixgVVypP23RZZKJVpSkACzmuTk6Gg/A8xxtG029kUWVmnxSGwB B+2php7jyWi3yjd4Ggmj0hznKA2TV89q2Ss2PLSNQGMSY1lM6EaEwEISfY+oSsv+zpi+ P7sw== X-Gm-Message-State: AJIora9nQ47nGkxXpNi7DmqLz0azkeNi5AfJjVh/tWoyHlmnETovH2+N ntU1lYllFJNofrDN3Q4E+iGJ5BzNzXytFDLfFNuJtNrnLAuR5yg0ZoUots7Mz1IUcHWu/ewtfVm gabuGqQs4RE+zKEOLTXj3gA== X-Received: by 2002:adf:fdca:0:b0:21b:8e8e:686e with SMTP id i10-20020adffdca000000b0021b8e8e686emr7074135wrs.368.1656581590844; Thu, 30 Jun 2022 02:33:10 -0700 (PDT) X-Google-Smtp-Source: AGRyM1syH502HiMoAJTWDduyG/g+2qasU2Ke9J6BgawUlCgUbvWkNPk21yc2njpFxTzu6/j1UMshUA== X-Received: by 2002:adf:fdca:0:b0:21b:8e8e:686e with SMTP id i10-20020adffdca000000b0021b8e8e686emr7074097wrs.368.1656581590306; Thu, 30 Jun 2022 02:33:10 -0700 (PDT) Received: from localhost (15.72.115.87.dyn.plus.net. [87.115.72.15]) by smtp.gmail.com with ESMTPSA id x11-20020adff0cb000000b0021b92171d28sm23382992wro.54.2022.06.30.02.33.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Jun 2022 02:33:09 -0700 (PDT) From: Andrew Burgess To: Pedro Alves , gdb-patches@sourceware.org Subject: Re: [PATCHv3 6/6] gdb: native target invalid architecture detection In-Reply-To: <6db0b4b8-25fb-8f4c-382e-d45f6edcff17@palves.net> References: <83ceec06f74f4ce6a2dc8d794031fb233567ba6d.1655136816.git.aburgess@redhat.com> <48d211cc-96f3-8c84-5aec-7023054d45ac@palves.net> <87fsjqdqco.fsf@redhat.com> <6db0b4b8-25fb-8f4c-382e-d45f6edcff17@palves.net> Date: Thu, 30 Jun 2022 10:33:07 +0100 Message-ID: <87letecx7w.fsf@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jun 2022 09:33:17 -0000 Pedro Alves writes: > On 2022-06-27 17:27, Andrew Burgess wrote: >> Pedro Alves writes: >>> On 2022-06-13 17:15, Andrew Burgess via Gdb-patches wrote: >>> I have another question. I'm confused on how if e.g., on x86-64, you >>> load a RISC-V binary into GDB, and the try to run it, we don't end up >>> with architecture of the target description instead. >> >> The problem is that we end up trying to read the registers before we >> read the target description. Which I suspect you will say sounds wrong, >> and I agree. >> >> I removed the error() calls from this patch, and reran GDB, here's the >> stack at the point of failure: >> > > ... > >> #11 0x0000000000bf43c4 in regcache_read_pc (regcache=0x30f5f10) at ../../src/gdb/regcache.c:1325 >> #12 0x00000000009c2e19 in save_stop_reason (lp=0x3a8ff80) at ../../src/gdb/linux-nat.c:2545 >> #13 0x00000000009c40f0 in linux_nat_filter_event (lwpid=3990223, status=1407) at ../../src/gdb/linux-nat.c:3003 >> #14 0x00000000009c471a in linux_nat_wait_1 (ptid=..., ourstatus=0x7fffffffa4f0, target_options=...) >> at ../../src/gdb/linux-nat.c:3178 >> #15 0x00000000009c535a in linux_nat_target::wait (this=0x2d27140 , ptid=..., >> ourstatus=0x7fffffffa4f0, target_options=...) at ../../src/gdb/linux-nat.c:3416 >> #16 0x0000000000dc5f3b in target_wait (ptid=..., status=0x7fffffffa4f0, options=...) >> at ../../src/gdb/target.c:2601 >> #17 0x0000000000abd3c4 in startup_inferior (proc_target=0x2d27140 , pid=3990223, >> ntraps=1, last_waitstatus=0x0, last_ptid=0x0) at ../../src/gdb/nat/fork-inferior.c:482 > > Alright, we're inside startup_inferior. > >> >> The initial architecture is riscv:rv32, and it is this architecture that >> is used for handling the initial stop. >> >> So my first thought was, pretty much as you suggest, that we should >> fetch the target description earlier, then we'll not run into this >> problem at all. >> > > I don't think we should fetch the tdesc earlier, I think that would be incorrect. Instead, > we should avoid reading registers while going through the shell. The shell's registers have > nothing to do with the final program's registers. The shell may even be of a different > architecture from the final inferior's architecture. Like, if your shell is a 64-bit > bash, and then the inferior you're going to debug, which the shell execs, is a 32-bit program. > >> My problem was figuring out the right place to add such a call. >> >> We know it has to be before we enter regcache_read_pc, as at this point >> the regcache is already created, and has the wrong architecture. >> >> And it can't really be earlier than linux_nat_filter_event I think, as >> before this point we don't know that the target has stopped. We can't >> probe the target for features (I would assume) if the target is not >> currently stopped. >> >> This seems to leave either save_stop_reason, or in the early stages of >> get_thread_regcache maybe. >> >> The problem with save_stop_reason is that it's Linux specific, so any >> non-Linux targets will need to add their own code to catch this >> situation. > > I think we should change save_stop_reason. But, change it to avoid reading > the PC instead. > > I don't think that the fact that other targets need to handle this too > is an issue. > >>> It kind of sounds like if GDB detected the incompatibility when we first fetch the >>> target description, similarly to how we detect it in the cases above, and GDB picked >>> the target arch or errored out after detecting the incompatibility, then we wouldn't >>> need a patch like yours? >> >> I agree. One other aspect is, do we have any native targets that don't >> support reading a target description? Hopefully not. > > If we do, we have an action plan -- add such support. I think a tdesc can be just an > element, for instance. > >> >>> How come the mismatch isn't detected then (when we first >>> retrieve the target description) today? >> >> Hopefully the above makes that clear, but the TLDR answer is: we attempt >> to read the registers using the current architecture before we fetch the >> target description. >> >> I suspected that, in part, this patch would be a bit of a straw-man, I'd >> appreciate any insight you have for how we might reorder things so that >> the target description can be read before we try to read the registers. >> > > So I gave the patch below a try, and it seems to works well -- readcache_read_pc > is no longer called before target_find_description, and I saw no testsuite > regressions (this is without your series, just pristine master). > > Now, this patch as is makes sense for linux-nat.c, but then I went to do the > same thing for gdbserver, which has an equivalent save_stop_reason in linux-low.cc, > and there this approach won't work as is for the simple fact that lwp->stop_pc > is used in a lot more places. > > So to avoid gdb vs gdbserver divergence, it may be better to not take this patch > as is, but instead tweak save_stop_reason to avoid reading the stop pc while > the inferior is going through the shell, based on some per-inferior flag or some > such. Maybe we can repurpose progspace->executing_startup for this? That > flag used to be used by Cell (which is gone), but seems unused nowadays. There's > also other similar flags in inferior itself, like needs_setup, and > in_initial_library_scan. So we'd add a flag like one of those, set it while the > inferior is going through the startup_inferior dance, and make save_stop_reason avoid > reading the PC while that flag is set. > > Note the patch below applies on top of: > > [PATCH] gdb+gdbserver/Linux: Remove USE_SIGTRAP_SIGINFO fallback > https://sourceware.org/pipermail/gdb-patches/2022-June/190372.html > > Attaching will need some other fix, because in that case, the procedure is that the core > requests an attach, and then the target reports an initial stop, and that stop > makes infrun read the thread's PC, before setup_inferior is called. It may be > we need to call setup_inferior earlier. Or maybe we can just add a target_find_description > call in linux_nat_target::attach, like remote.c does: > > void > extended_remote_target::attach (const char *args, int from_tty) > { > ... > /* Next, if the target can specify a description, read it. We do > this before anything involving memory or registers. */ > target_find_description (); > > I haven't looked at that in much detail, and I have to leave for tonight. How about the patch below to solve the attach case? Thanks, Andrew --- commit 73f25fb4e74b528c18fa40e2477ca6f11667ed47 Author: Andrew Burgess Date: Wed Jun 29 10:06:38 2022 +0100 gdb: fetch target description during attach on linux This commit aims to address the problem of attaching to a native Linux process when the current BFD within GDB (as set by the 'file' command) is for the wrong architecture. For example, running on an x86-64 GNU/Linux target, attaching to an x86-64 process, but GDB is given a riscv-v binary. Currently, in such a situation I see this assertion: (gdb) file /tmp/riscv.exe (gdb) attach 3276515 Attaching to program: /tmp/riscv.exe, process 3276515 ../../src/gdb/i387-tdep.c:596: internal-error: i387_supply_fxsave: Assertion `tdep->st0_regnum >= I386_ST0_REGNUM' failed. The assertion is cause because, after attaching, we try to handle the first stop event. To do this we try to read registers from the target, which causes us to call target_ops::fetch_registers, which for x86-64 leads to amd64_linux_nat_target::fetch_registers, and for my machine, from there I end up (eventually) in i387_supply_fxsave. This would all be fine, except, as the current file is a riscv-v executable, the gdbarch for the current inferior is set to be a riscv-v gdbarch, and the gdbarch_tdep is a riscv_gdbarch_tdep. So, in i387_supply_fxsave we do this: i386_gdbarch_tdep *tdep = (i386_gdbarch_tdep *) gdbarch_tdep (arch); We incorrectly cast the tdep object, and now have undefined behaviour. The solution to this problem is to ensure that GDB fetches the target description early enough that we can change the architecture of the current inferior to be x86-64 before we handle the first stop event. Currently the target description is fetched in post_create_inferior, but this is really too late, by this point the first stop even has been handled, and if the current inferior's architecture was wrong, then we will have already experienced undefined behaviour by this point. And so, in this commit, I propose calling target_find_description from linux_nat_target::attach. By this point the target has been attached too, and is stopped, but GDB has not yet processed the stop, i.e. tried to read the $pc register from the target. This is almost enough to resolve the issue, but there is one additional problem. I did consider adding the target_find_description call to the attach_command function, in theory, this would have solved the "needs to get the target description earlier" problem for all targets, however, I didn't do this. The comments in attach_command concerned me a little, my reading of them was that not every target is guaranteed to be stopped by the time we return to attach_command, and I'm pretty sure that calling target_find_description for a target that is not stopped is a bad idea (as it could require probing for target features, which might not be allowed on a running target). And so, I figure, this might be something that needs handling on a target by target basis. Having arranged to fetch the target description earlier, there is just one problem left to solve. As part of calling target_find_description we call gdbarch_update_p, which calls gdbarch_find_by_info, which calls gdbarch_info_fill and then looks up a suitable gdbarch. In gdbarch_info_fill, if we have both a target description and a BFD supplied by the 'file' command, then we will call choose_architecture_for_target to select which of these architectures we should use. The goal of choose_architecture_for_target is to handle the case where one of the architectures (target description or BFD architecture) is more specific, we then want to use the more specific architecture. However, in my case we end up trying to choose between the target description's x86-64 architecture, and the BFD's riscv-v architecture, these architectures are not at all compatible, and so, currently, choose_architecture_for_target returns the risc-v architecture. This means we continue to try and access the target using the risc-v gdbarch and riscv_gdbarch_tdep structures, which leads to the same assertion. My second proposal in this patch is that, when we are selecting between two completely incompatible architectures in the function choose_architecture_for_target, we should default to using the architecture of the target description. This means we will then only try to access the target using a gdbarch that is compatible with the advertised target description. With these two fixes in place, now, when I attach to an x86-64 process but use a riscv-v binary with GDB, I see this: (gdb) file /tmp/riscv.exe (gdb) attach 3276515 Attaching to program: /tmp/riscv.exe, process 3276515 warning: Selected architecture riscv:rv32 is not compatible with reported target architecture i386:x86-64 Reading symbols from /lib64/libc.so.6... (No debugging symbols found in /lib64/libc.so.6) Reading symbols from /lib64/ld-linux-x86-64.so.2... (No debugging symbols found in /lib64/ld-linux-x86-64.so.2) Failed to read a valid object file image from memory. 0x00007f316710c1e7 in nanosleep () from /lib64/libc.so.6 (gdb) show architecture The target architecture is set to "auto" (currently "i386:x86-64"). The 'Failed to read a valid object file image from memory.' messages comes from symbol_file_add_from_memory, which is called from add_vsyscall_page. I think the problem here is that when we try to create a BFD to represent the vsysall page we base this BFD off of GDB's current BFD, which is a risc-v binary in my case. However, before that error, we do get the warning about the two architectures being incompatible, and, as you can see GDB has selected the architecture of the target description. The debug session is pretty useless at this point as GDB is using the symbols from one binary while trying to debug a completely different process, but I think this is fine. GDB didn't crash, and gave a hint that there was a problem with the architectures, hopefully, from this the user can figure out their mistake. No test as I don't know how I can reliably compile a binary for a different architecture as part of the testsuite. diff --git a/gdb/arch-utils.c b/gdb/arch-utils.c index ff946ee3767..c7994d208db 100644 --- a/gdb/arch-utils.c +++ b/gdb/arch-utils.c @@ -480,10 +480,17 @@ choose_architecture_for_target (const struct target_desc *target_desc, if (tdesc_compatible_p (target_desc, selected)) return from_target; + /* The architecture of the BFD and the target are not compatible. + Warn the user, and then use the architecture of the target. If we + select the architecture of the BFD, and are running on a native + target, then we can end up trying to fetch registers using a + gdbarch of the wrong architecture, e.g. in the case where GDB + attaches to a running process of one architecture, while the + symbol file loaded into GDB is for a different architecture. */ warning (_("Selected architecture %s is not compatible " "with reported target architecture %s"), selected->printable_name, from_target->printable_name); - return selected; + return from_target; } if (compat1 == NULL) @@ -503,11 +510,14 @@ choose_architecture_for_target (const struct target_desc *target_desc, return compat1; /* We have no idea which one is better. This is a bug, but not - a critical problem; warn the user. */ + a critical problem; warn the user, then use the architecture of the + target. If we instead chose the architecture of the BFD, and we are + using a native target, then GDB can end up trying to fetch registers + from the target using an invalid gdbarch. */ warning (_("Selected architecture %s is ambiguous with " "reported target architecture %s"), selected->printable_name, from_target->printable_name); - return selected; + return from_target; } /* Functions to manipulate the architecture of the target. */ diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c index aa6e3c4b57d..2a2d508668d 100644 --- a/gdb/linux-nat.c +++ b/gdb/linux-nat.c @@ -1141,6 +1141,9 @@ linux_nat_target::attach (const char *args, int from_tty) threads and associate pthread info with each LWP. */ linux_proc_attach_tgid_threads (lp->ptid.pid (), attach_proc_task_lwp_callback); + + /* Now the process is stopped we can fetch the target description. */ + target_find_description (); } /* Ptrace-detach the thread with pid PID. */