From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-41521-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 3185 invoked by alias); 14 Nov 2012 16:15:07 -0000
Received: (qmail 3120 invoked by uid 22791); 14 Nov 2012 16:15:03 -0000
X-SWARE-Spam-Status: No, hits=-7.7 required=5.0	tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,KHOP_THREADED,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,SPF_HELO_PASS
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 14 Nov 2012 16:14:55 +0000
Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id qAEGEmmS010321	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);	Wed, 14 Nov 2012 11:14:48 -0500
Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11])	by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id qAEGEkBq014819;	Wed, 14 Nov 2012 11:14:46 -0500
Message-ID: <50A3C376.9080602@redhat.com>
Date: Wed, 14 Nov 2012 16:15:00 -0000
From: Pedro Alves <palves@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121029 Thunderbird/16.0.2
MIME-Version: 1.0
To: Andreas Arnez <arnez@linux.vnet.ibm.com>
CC: jan.kratochvil@redhat.com, gdb@sourceware.org
Subject: Re: Strange behavior of sigstep-threads.exp?
References: <878vacnlem.fsf@linux.vnet.ibm.com> <50A125FD.8090504@redhat.com> <87vcd9le9r.fsf@linux.vnet.ibm.com>
In-Reply-To: <87vcd9le9r.fsf@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb.sourceware.org>
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2012-11/txt/msg00026.txt.bz2

On 11/13/2012 03:21 PM, Andreas Arnez wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
>> > Whenever each of those single-steps is done, the other thread (let's
>> > call it thread #2) is allowed to run free (the "set scheduler-locking
>> > off" setting).
> Got it.  Seems I misunderstood the "step" behavior in this case.  Thanks
> for the explanation.
> 
>>> >> On s390x the test case actually fails sometimes.  In those cases,
>>> >> when stepping from step-1 to step-2, a ton of SIGUSR1 are indicated,
>>> >> and then the inferior seems to stop at the closing brace of the
>>> >> handler() function instead of the tgkill().
>> >
>> > That does sound like something's wrong.  Hacking the test to force
>> > "set debug infrun 1" and "set debug lin-lwp 1" would be my first move.
> As soon as "lin-lwp" debugging is turned on the test always seems to
> succeed.  But with "debug infrun" alone the failure still occurs, and I
> observe the following:
> 
> 1. The stepped thread reaches the last instruction inside the stepping
> range.
> 
> 2. After resuming the stepped thread again, it traps at getpid@plt.
> Which is curious, because getpid() shouldn't be called until the
> instruction _after_ the stepping range.  It seems like the trap for that
> instruction was missed somehow.  (In the good case the thread always
> traps at the subroutine call, before having carried out the call.)

Indeed, that's odd.  Is the event reported to GDB just _before_ this
trap at getpid@plt a SIGUSR1 for the other thread?
linux-nat.c always gives preference to a SIGTRAP over other
signals, so it's unexpected that a trap could be lost.  Maybe while GDB goes
about stopping all threads with SIGSTOP (in effect, only the single-stepped
thread), the single-step has actually completed, but the kernel manages
to report the SIGSTOP first, for some bizarre reason?  IOW, the kernel loses
the trap.  You could try adding some prints of the current PC of all lwps
at each event, to see whether the stepped lwp makes the expected progress but
we miss the trap, or whether the lwp jumps from the last instruction
in the step range to getpid@plt without the kernel ever reporting any
kind of stop at the branch.  Just adding a few prints instead of full
lin-lwp debug might be unintrusive enough not to hide the problem.
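
To make the "preference to a SIGTRAP" point concrete, here is a rough
standalone model of the selection rule I have in mind.  This is not the
actual linux-nat.c code; struct pending_event and pick_event are made-up
names for illustration only, and the real logic is more involved.  It
just shows why a step completion that the kernel reports as a SIGSTOP
instead of a SIGTRAP leaves nothing for the preference rule to act on.

```c
#include <stddef.h>
#include <signal.h>

/* Hypothetical, simplified stand-in for one LWP's pending stop event.
   Not a GDB data structure.  */
struct pending_event
{
  int lwp;       /* LWP id.  */
  int sig;       /* Stop signal: SIGTRAP, SIGUSR1, SIGSTOP, ...  */
  int stepping;  /* Nonzero if this LWP was being single-stepped.  */
};

/* Return the index of the event to report first, or -1 if there are
   none.  Assumed preference order: a SIGTRAP from the single-stepped
   LWP, then any other SIGTRAP, then whatever event comes first.  */
int
pick_event (const struct pending_event *ev, size_t n)
{
  int any_trap = -1;
  size_t i;

  for (i = 0; i < n; i++)
    if (ev[i].sig == SIGTRAP)
      {
	if (ev[i].stepping)
	  return (int) i;
	if (any_trap < 0)
	  any_trap = (int) i;
      }

  if (any_trap >= 0)
    return any_trap;
  return n > 0 ? 0 : -1;
}
```

With a pending SIGUSR1 on one lwp and a SIGTRAP on the stepped lwp, the
SIGTRAP wins.  If the stepped lwp only ever shows a SIGSTOP, there is no
trap to prefer, and the finished step goes unnoticed.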

> 
> 3. The thread is single-stepped until the jump to getpid().  The
> getpid() invocation itself is skipped with a step-resume breakpoint on
> the instruction after the original subroutine call.
> 
> 4. The step-resume breakpoint is reached.  Despite now being well
> outside the original stepping range, the thread is resumed.  Upon the
> next trap, an updated stepping range is shown, adjusted to fit the line
> of the tgkill().  Then stepping continues until the next line, which is
> the closing brace.
> 
>> > I wonder if this makes a difference?
>> >
>> >  gdb/infrun.c |    6 ++++--
>> >  1 file changed, 4 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/gdb/infrun.c b/gdb/infrun.c
>> > index de2cf19..9621b84 100644
>> > --- a/gdb/infrun.c
>> > +++ b/gdb/infrun.c
>> > @@ -4698,8 +4698,10 @@ process_event_stop_test:
>> >  	  ecs->event_thread = tp;
>> >  	  ecs->ptid = tp->ptid;
>> >  	  context_switch (ecs->ptid);
>> > -	  keep_going (ecs);
>> > -	  return;
>> > +
>> > +	  /* Keep checking.  The stepped thread might have already
>> > +	     reached its destination, but not have reported it yet.
>> > +	     If we just kept going, we could end up overstepping.  */
>> >  	}
>> >      }
> Yes, it does make a difference.  The test case still fails at a similar
> rate as before, but this time after "continue", because the inferior
> reaches "assert (0)".  Again, I can not reproduce this failure with "set
> debug infrun 1" and "set debug lin-lwp 1".

Hmm.  This confirms there's something unexpected going on with single-step traps, as
we discussed above.  The patch shouldn't really make a difference for hardware single-step
Linux targets, as the backend always gives preference to single-step traps.  The
patch was originally meant to fix something for software single-step targets only.
If the assert is reached, a SIGUSR1 was mistakenly swallowed, and catching that is the
whole point of the test.

-- 
Pedro Alves