From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-41503-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 24373 invoked by alias); 12 Nov 2012 16:38:50 -0000
Received: (qmail 24207 invoked by uid 22791); 12 Nov 2012 16:38:47 -0000
X-SWARE-Spam-Status: No, hits=-7.6 required=5.0	tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,KHOP_THREADED,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,SPF_HELO_PASS,TW_TG
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 12 Nov 2012 16:38:38 +0000
Received: from int-mx01.intmail.prod.int.phx2.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id qACGcPDZ012910	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);	Mon, 12 Nov 2012 11:38:25 -0500
Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11])	by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id qACGcL9S026235;	Mon, 12 Nov 2012 11:38:23 -0500
Message-ID: <50A125FD.8090504@redhat.com>
Date: Mon, 12 Nov 2012 16:38:00 -0000
From: Pedro Alves <palves@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121029 Thunderbird/16.0.2
MIME-Version: 1.0
To: Andreas Arnez <arnez@linux.vnet.ibm.com>
CC: jan.kratochvil@redhat.com, gdb@sourceware.org
Subject: Re: Strange behavior of sigstep-threads.exp?
References: <878vacnlem.fsf@linux.vnet.ibm.com>
In-Reply-To: <878vacnlem.fsf@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb.sourceware.org>
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2012-11/txt/msg00008.txt.bz2

On 11/08/2012 03:50 PM, Andreas Arnez wrote:
> On x86-64 with upstream GDB, I'm observing what I consider strange
> behavior of the sigstep-threads.exp test case.
> 
> Here's an excerpt from the log file:
> 
>   32        var++;                /* step-1 */
>   (gdb) disable $step1
>   (gdb) step
> 
>   Program received signal SIGUSR1, User defined signal 1.
> 
>   Program received signal SIGUSR1, User defined signal 1.
> 
>   Program received signal SIGUSR1, User defined signal 1.
> 
>   Program received signal SIGUSR1, User defined signal 1.
>   33        tgkill (getpid (), gettid (), SIGUSR1);       /* step-2 */
> 
> It seems that running from step-1 to step-2 produced four SIGUSR1.  And
> in total ca. 3 times as many signals are encountered than we pass
> tgkill()'s:
> 
>   $ grep tgkill gdb.log | wc
>       100    1000    5900
>   $ grep 'Program received signal SIGUSR1' gdb.log | wc
>       310    2480   17670
> 
> From the code I wouldn't expect that.  Is this a known bug or expected
> behavior?  Note that the test case doesn't fail, since it doesn't care
> about the number of incoming signals.

static void
handler (int signo)	/* step-0 */
{			/* step-0 */
  var++;		/* step-1 */
  tgkill (getpid (), gettid (), SIGUSR1);	/* step-2 */
}

static void *
start (void *arg)
{
  tgkill (getpid (), gettid (), SIGUSR1);
  assert (0);
  return NULL;
}

int
main (void)
{
  pthread_t thread;

  signal (SIGUSR1, handler);

  pthread_create (&thread, NULL, start, NULL);
  start (NULL);	/* main-start */
  return 0;
}

There are two threads in the test.  The main thread, and another
spawned by pthread_create.  They both start at "start", and
send SIGUSR1 to themselves.  The SIGUSR1 handler sends a
SIGUSR1 to the current thread, so each thread is constantly re-entering
"handler" ad infinitum.

The test does "handle SIGUSR1 nostop print pass", which means that
control is never passed to the user whenever the program gets a
SIGUSR1, but GDB still prints the signal (the "Program received ..." lines).

The test then steps _one_ of the threads (the one that happens to reach
the "step-1" line first), let's call it thread #1.  This "step" involves
multiple single-steps behind the scenes.  Whenever each of those single-steps
is done, the other thread (let's call it thread #2) is allowed to run free (the
"set scheduler-locking off" setting).  Occasionally, that other thread
will run enough to send itself another SIGUSR1.  And then, occasionally
GDB will see that SIGUSR1 in thread #2, print "Program received", and
go back to single-stepping thread #1, until it reaches a different line.

So "grep tgkill gdb.log" will only count the "step"s that have passed
through the "tgkill (getpid (), ..." line, while "grep 'Program received
signal SIGUSR1'" counts both threads sending themselves SIGUSR1.

Moreover, since thread #2 is "continued", while thread #1 is
single-stepped, it is expected that thread #2 makes progress faster, hence
the 3 to 1 ratio you see.  I see 4 to 1 on my machine even.

> On s390x the test case actually fails sometimes.  In those cases, when
> stepping from step-1 to step-2, a ton of SIGUSR1 are indicated, and then
> the inferior seems to stop at the closing brace of the handler()
> function instead of the tgkill().

That does sound like something's wrong.  Hacking the the test to force
"set debug infrun 1" and "set debug lin-lwp 1" would be my first move.

I wonder if this makes a difference?

 gdb/infrun.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gdb/infrun.c b/gdb/infrun.c
index de2cf19..9621b84 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -4698,8 +4698,10 @@ process_event_stop_test:
 	  ecs->event_thread = tp;
 	  ecs->ptid = tp->ptid;
 	  context_switch (ecs->ptid);
-	  keep_going (ecs);
-	  return;
+
+	  /* Keep checking.  The stepped thread might have already
+	     reached its destination, but not have reported it yet.
+	     If we just kept going, we could end up overstepping.  */
 	}
     }