From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: Possible regression on gdb.multi/multi-arch-exec.exp
To: Sergio Durigan Junior <sergiodj@redhat.com>
References: <20180607180704.3991-1-palves@redhat.com> <87in649jtd.fsf@redhat.com>
Cc: gdb-patches@sourceware.org
From: Pedro Alves <palves@redhat.com>
Message-ID: <91c04ab2-ccbe-37ce-4a63-3350442dd406@redhat.com>
Date: Thu, 28 Jun 2018 12:09:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0
MIME-Version: 1.0
In-Reply-To: <87in649jtd.fsf@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-SW-Source: 2018-06/txt/msg00678.txt.bz2

On 06/27/2018 07:16 PM, Sergio Durigan Junior wrote:
> On Thursday, June 07 2018, Pedro Alves wrote:
> 
>> This is more preparation bits for multi-target support.
> 
> Hi Pedro,
> 
> While preparing a new Fedora GDB rawhide release, I noticed a regression
> related to this commit.  The curious thing is that I am only able to
> reproduce the regression on a Fedora Rawhide system; it doesn't happen
> on my Fedora 27 machine (initially I thought it might be related to GCC,
> but testing against GCC HEAD on my Fedora 27 machine also did not
> trigger the regression).
> 
> The test failing is gdb.multi/multi-arch-exec.exp, and here's what I'm seeing:
> 
>   (gdb) break all_started
>   Breakpoint 1 at 0x400848: file /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c, line 42.
>   (gdb) run 
>   Starting program: /home/sergio/build/gdb/testsuite/outputs/gdb.multi/multi-arch-exec/1-multi-arch-exec 
>   [Thread debugging using libthread_db enabled]
>   Using host libthread_db library "/lib64/libthread_db.so.1".
>   [New Thread 0x7ffff7476700 (LWP 1354)]
> 
>   Thread 1 "1-multi-arch-ex" hit Breakpoint 1, all_started () at /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c:42
>   42      }
>   (gdb) delete breakpoints
>   Delete all breakpoints? (y or n) y
>   (gdb) info breakpoints
>   No breakpoints or watchpoints.
>   (gdb) break main
>   Breakpoint 2 at 0x400862: file /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c, line 51.
>   (gdb) thread 1
>   [Switching to thread 1 (Thread 0x7ffff7fdf740 (LWP 1350))]
>   #0  all_started () at /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c:42
>   42      }
>   (gdb) PASS: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: thread 1
>   set follow-exec-mode new
>   (gdb) PASS: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: set follow-exec-mode new
>   continue
>   Continuing.
>   [Thread 0x7ffff7476700 (LWP 1354) exited]
>   process 1350 is executing new program: /home/sergio/build/gdb/testsuite/outputs/gdb.multi/multi-arch-exec/1-multi-arch-exec-hello
>   [New inferior 2 (process 0)]
>   [New process 1350]
>   ../../binutils-gdb/gdb/target.c:3200: internal-error: gdbarch* default_thread_architecture(target_ops*, ptid_t): Assertion `inf != NULL' failed.
>   A problem internal to GDB has been detected,
>   further debugging may prove unreliable.
>   Quit this debugging session? (y or n) FAIL: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: continue across exec that changes architecture (GDB internal error)
> 
> 
> I spent some time investigating this, and here's what I've learned so
> far:
> 
> 1) When infrun.c:handle_inferior_event_1 is called and deals with
> TARGET_WAITKIND_EXECD (around line 5275), it does:
> 
>     ...
>     case TARGET_WAITKIND_EXECD:
>       if (debug_infrun)
>         fprintf_unfiltered (gdb_stdlog, "infrun: TARGET_WAITKIND_EXECD\n");
> 
>       /* Note we can't read registers yet (the stop_pc), because we
> 	 don't yet know the inferior's post-exec architecture.
> 	 'stop_pc' is explicitly read below instead.  */
>       switch_to_thread_no_regs (ecs->event_thread);
> 
>       /* Do whatever is necessary to the parent branch of the vfork.  */
>       handle_vfork_child_exec_or_exit (1);
> 
>       /* This causes the eventpoints and symbol table to be reset.
>          Must do this now, before trying to determine whether to
>          stop.  */
>       follow_exec (inferior_ptid, ecs->ws.value.execd_pathname);   // <---- #1
> 
>       stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread)); // <---- #2
>       ...
> 
> 2) When follow_exec is called (#1 above), it does:
> 
>   ...
>   /* The target reports the exec event to the main thread, even if
>      some other thread does the exec, and even if the main thread was
>      stopped or already gone.  We may still have non-leader threads of
>      the process on our list.  E.g., on targets that don't have thread
>      exit events (like remote); or on native Linux in non-stop mode if
>      there were only two threads in the inferior and the non-leader
>      one is the one that execs (and nothing forces an update of the
>      thread list up to here).  When debugging remotely, it's best to
>      avoid extra traffic, when possible, so avoid syncing the thread
>      list with the target, and instead go ahead and delete all threads
>      of the process but one that reported the event.  Note this must
>      be done before calling update_breakpoints_after_exec, as
>      otherwise clearing the threads' resources would reference stale
>      thread breakpoints -- it may have been one of these threads that
>      stepped across the exec.  We could just clear their stepping
>      states, but as long as we're iterating, might as well delete
>      them.  Deleting them now rather than at the next user-visible
>      stop provides a nicer sequence of events for user and MI
>      notifications.  */
>   ALL_THREADS_SAFE (th, tmp)
>     if (ptid_get_pid (th->ptid) == pid && !ptid_equal (th->ptid, ptid))
>       delete_thread (th);
>   ...
> 
> On my Fedora Rawhide box, delete_thread is being called to delete the
> same thread as ecs->event_thread.  On my Fedora 27 machine, it deletes a
> different thread.
> 
> 3) Back to handle_inferior_event_1, when #2 is called, ecs->event_thread
> points to an invalid object, which triggers the assertion.
> 
> 
> I haven't progressed much further (other things to wrap up), but I
> decided to get the ball rolling already.  If you need access to a Fedora
> Rawhide VM, please let me know and I can provide this to you.

I think the "gdb: Eliminate the 'stop_pc' global" patch
(<https://sourceware.org/ml/gdb-patches/2018-06/msg00524.html>)
will fix this, because it moves the stop_pc read to after
ecs->event_thread is refreshed, so the regcache lookup no longer
goes through a thread that follow_exec may have deleted:

> @@ -5289,16 +5294,18 @@ Cannot fill $_exitsignal with the correct signal number.\n"));
>           stop.  */
>        follow_exec (inferior_ptid, ecs->ws.value.execd_pathname);
>  
> -      stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread));
> -
>        /* In follow_exec we may have deleted the original thread and
>  	 created a new one.  Make sure that the event thread is the
>  	 execd thread for that case (this is a nop otherwise).  */
>        ecs->event_thread = inferior_thread ();
>  
> +      ecs->event_thread->suspend.stop_pc
> +	= regcache_read_pc (get_thread_regcache (ecs->event_thread));
> +
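
To show the ordering issue in isolation, here is a toy sketch.  It is
not GDB code: toy_thread, toy_inferior, toy_follow_exec and
toy_handle_exec are made-up names, and the model assumes the exec
always replaces the event thread, which is the case that trips the
assertion.  The only point is that the cached pointer must be
refreshed before anything is read through it:

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <memory>

// Toy stand-ins for illustration only; these are not GDB's real types.
struct toy_thread
{
  int lwp;
  std::uint64_t pc;
};

struct toy_inferior
{
  std::list<std::unique_ptr<toy_thread>> threads;
  toy_thread *current = nullptr;   // plays the role of inferior_thread ()
};

// Models the case where following the exec deletes the original
// thread object and creates a new one for the same LWP.
void
toy_follow_exec (toy_inferior &inf, std::uint64_t entry_pc)
{
  assert (inf.current != nullptr);
  int lwp = inf.current->lwp;

  // Every toy_thread pointer obtained before this line now dangles.
  inf.threads.clear ();

  inf.threads.push_back
    (std::make_unique<toy_thread> (toy_thread {lwp, entry_pc}));
  inf.current = inf.threads.back ().get ();
}

// Models the fixed ordering: refresh the event thread first, then
// read the PC through it.
std::uint64_t
toy_handle_exec (toy_inferior &inf, std::uint64_t entry_pc)
{
  toy_thread *event_thread = inf.current;

  toy_follow_exec (inf, entry_pc);

  // Reading event_thread->pc here would be a use-after-free.
  // Refresh the pointer before dereferencing it.
  event_thread = inf.current;

  return event_thread->pc;
}
```

Swapping the last two statements of toy_handle_exec reproduces the
shape of the old code, where the PC was read through the stale
pointer before it was refreshed.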

Thanks,
Pedro Alves