From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: Possible regression on gdb.multi/multi-arch-exec.exp
To: Sergio Durigan Junior
Cc: gdb-patches@sourceware.org
From: Pedro Alves
Date: Thu, 28 Jun 2018 12:09:00 -0000
Message-ID: <91c04ab2-ccbe-37ce-4a63-3350442dd406@redhat.com>
In-Reply-To: <87in649jtd.fsf@redhat.com>
References: <20180607180704.3991-1-palves@redhat.com> <87in649jtd.fsf@redhat.com>

On 06/27/2018 07:16 PM, Sergio Durigan Junior wrote:
> On Thursday, June 07 2018, Pedro Alves wrote:
>
>> This is more preparation bits for
>> multi-target support.
>
> Hi Pedro,
>
> While preparing a new Fedora GDB rawhide release, I noticed a regression
> related to this commit. The curious thing is that I am only able to
> reproduce the regression on a Fedora Rawhide system; it doesn't happen
> on my Fedora 27 machine (initially I thought it might be related to GCC,
> but testing against GCC HEAD on my Fedora 27 machine also did not
> trigger the regression).
>
> The test failing is gdb.multi/multi-arch-exec.exp, and here's what I'm seeing:
>
> (gdb) break all_started
> Breakpoint 1 at 0x400848: file /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c, line 42.
> (gdb) run
> Starting program: /home/sergio/build/gdb/testsuite/outputs/gdb.multi/multi-arch-exec/1-multi-arch-exec
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7ffff7476700 (LWP 1354)]
>
> Thread 1 "1-multi-arch-ex" hit Breakpoint 1, all_started () at /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c:42
> 42 }
> (gdb) delete breakpoints
> Delete all breakpoints? (y or n) y
> (gdb) info breakpoints
> No breakpoints or watchpoints.
> (gdb) break main
> Breakpoint 2 at 0x400862: file /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c, line 51.
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x7ffff7fdf740 (LWP 1350))]
> #0 all_started () at /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c:42
> 42 }
> (gdb) PASS: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: thread 1
> set follow-exec-mode new
> (gdb) PASS: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: set follow-exec-mode new
> continue
> Continuing.
> [Thread 0x7ffff7476700 (LWP 1354) exited]
> process 1350 is executing new program: /home/sergio/build/gdb/testsuite/outputs/gdb.multi/multi-arch-exec/1-multi-arch-exec-hello
> [New inferior 2 (process 0)]
> [New process 1350]
> ../../binutils-gdb/gdb/target.c:3200: internal-error: gdbarch* default_thread_architecture(target_ops*, ptid_t): Assertion `inf != NULL' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
> Quit this debugging session? (y or n) FAIL: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: continue across exec that changes architecture (GDB internal error)
>
>
> I spent some time investigating this, and here's what I've learned so
> far:
>
> 1) When infrun.c:handle_inferior_event_1 is called and deals with
> TARGET_WAITKIND_EXECD (around line 5275), it does:
>
> ...
>     case TARGET_WAITKIND_EXECD:
>       if (debug_infrun)
>         fprintf_unfiltered (gdb_stdlog, "infrun: TARGET_WAITKIND_EXECD\n");
>
>       /* Note we can't read registers yet (the stop_pc), because we
>          don't yet know the inferior's post-exec architecture.
>          'stop_pc' is explicitly read below instead. */
>       switch_to_thread_no_regs (ecs->event_thread);
>
>       /* Do whatever is necessary to the parent branch of the vfork. */
>       handle_vfork_child_exec_or_exit (1);
>
>       /* This causes the eventpoints and symbol table to be reset.
>          Must do this now, before trying to determine whether to
>          stop. */
>       follow_exec (inferior_ptid, ecs->ws.value.execd_pathname); // <---- #1
>
>       stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread)); // <---- #2
> ...
>
> 2) When follow_exec is called (#1 above), it does:
>
> ...
>   /* The target reports the exec event to the main thread, even if
>      some other thread does the exec, and even if the main thread was
>      stopped or already gone. We may still have non-leader threads of
>      the process on our list.
>      E.g., on targets that don't have thread
>      exit events (like remote); or on native Linux in non-stop mode if
>      there were only two threads in the inferior and the non-leader
>      one is the one that execs (and nothing forces an update of the
>      thread list up to here). When debugging remotely, it's best to
>      avoid extra traffic, when possible, so avoid syncing the thread
>      list with the target, and instead go ahead and delete all threads
>      of the process but one that reported the event. Note this must
>      be done before calling update_breakpoints_after_exec, as
>      otherwise clearing the threads' resources would reference stale
>      thread breakpoints -- it may have been one of these threads that
>      stepped across the exec. We could just clear their stepping
>      states, but as long as we're iterating, might as well delete
>      them. Deleting them now rather than at the next user-visible
>      stop provides a nicer sequence of events for user and MI
>      notifications. */
>   ALL_THREADS_SAFE (th, tmp)
>     if (ptid_get_pid (th->ptid) == pid && !ptid_equal (th->ptid, ptid))
>       delete_thread (th);
> ...
>
> On my Fedora Rawhide box, delete_thread is being called to delete the
> same thread as ecs->event_thread. On my Fedora 27 machine, it deletes a
> different thread.
>
> 3) Back to handle_inferior_event_1, when #2 is called, ecs->event_thread
> points to an invalid object, which triggers the assertion.
>
>
> I haven't progressed much further (other things to wrap up), but I
> decided to get the ball rolling already. If you need access to a Fedora
> Rawhide VM, please let me know and I can provide this to you.

I think the "gdb: Eliminate the 'stop_pc' global" patch () will fix this,
because it moves the stop_pc assignment to after the point where
ecs->event_thread is refreshed:

> @@ -5289,16 +5294,18 @@ Cannot fill $_exitsignal with the correct signal number.\n"));
>           stop. */
>        follow_exec (inferior_ptid, ecs->ws.value.execd_pathname);
>
> -      stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread));
> -
>        /* In follow_exec we may have deleted the original thread and
>           created a new one. Make sure that the event thread is the
>           execd thread for that case (this is a nop otherwise). */
>        ecs->event_thread = inferior_thread ();
>
> +      ecs->event_thread->suspend.stop_pc
> +        = regcache_read_pc (get_thread_regcache (ecs->event_thread));
> +

Thanks,
Pedro Alves
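
For illustration only, the ordering issue discussed above can be modelled outside GDB with a small standalone sketch. This is a toy model, not GDB code: `Thread`, `Session`, `EventState` and `handle_exec` are hypothetical names, and 0x400500 in the usage note is a made-up post-exec PC. A `std::weak_ptr` stands in for the raw `ecs->event_thread` pointer so that staleness can be observed safely rather than by dereferencing a dangling pointer.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for a thread object; not GDB's thread_info.
struct Thread {
  int lwp;
  long pc;
};

// Toy session: owns the threads and tracks the selected one.
struct Session {
  std::vector<std::shared_ptr<Thread>> threads;
  std::shared_ptr<Thread> selected;  // plays the role of inferior_thread ()

  std::shared_ptr<Thread> add_thread(int lwp, long pc) {
    threads.push_back(std::make_shared<Thread>(Thread{lwp, pc}));
    return threads.back();
  }

  // Models the case named in the patch comment: the original thread
  // is deleted and a new one is created and selected.
  void follow_exec(long entry_pc) {
    int lwp = selected->lwp;
    threads.clear();
    selected.reset();  // last owner gone: the old thread is destroyed
    selected = add_thread(lwp, entry_pc);
  }
};

// Toy counterpart of the per-event state ("ecs" in the thread above).
struct EventState {
  std::weak_ptr<Thread> event_thread;  // non-owning, like a raw pointer
  long stop_pc = 0;
};

// Fixed ordering: refresh the event thread first, then read the PC.
// Returns whether the old event thread had gone stale, i.e. whether
// reading the PC before the refresh would have used a deleted object.
bool handle_exec(Session &s, EventState &ecs, long entry_pc) {
  s.follow_exec(entry_pc);
  bool was_stale = ecs.event_thread.expired();
  ecs.event_thread = s.selected;              // the refresh
  ecs.stop_pc = ecs.event_thread.lock()->pc;  // now safe to read
  return was_stale;
}
```

Calling `handle_exec` on a session whose selected thread is also the event thread reports that the old event thread went stale across the exec, and `stop_pc` ends up holding the new thread's PC because the read happens after the refresh.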