From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 13 Aug 2018 13:01:00 -0000
From: Andrew Burgess <andrew.burgess@embecosm.com>
To: Pedro Alves <palves@redhat.com>
Cc: Simon Marchi <simon.marchi@polymtl.ca>, gdb-patches@sourceware.org
Subject: Re: [PATCH] gdb: Fix instability in thread groups test
Message-ID: <20180813130125.GY3155@embecosm.com>
References: <20180810095750.13017-1-andrew.burgess@embecosm.com> <d3447a86dc47513f44c82b799330fee6@polymtl.ca> <7da382e5-bd5e-25c2-b3f8-f38e692f35a1@redhat.com> <20180813114137.GX3155@embecosm.com> <2e47657d-b81b-497d-58bf-0463980dec24@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <2e47657d-b81b-497d-58bf-0463980dec24@redhat.com>

* Pedro Alves <palves@redhat.com> [2018-08-13 13:03:47 +0100]:

> On 08/13/2018 12:41 PM, Andrew Burgess wrote:
> > * Pedro Alves <palves@redhat.com> [2018-08-13 10:51:44 +0100]:
> > 
> >> But shouldn't we make GDB handle this better?  Make the output
> >> more "atomic" in the sense that we either show a valid complete
> >> entry, or no entry?  There's an inherent race
> >> here, since we use multiple /proc accesses to fill up a process
> >> entry.  If we start fetching process info for a process, and the process
> >> disappears midway, I'd think it better to discard that process's entry,
> >> as-if we had not even seen it, i.e., as if we had listed the set of
> >> processes a tiny moment later.
> > 
> > I agree.
> > 
> > We also need to think about process reuse.  So with multiple accesses
> > to /proc we might start with one process, and end up with a completely
> > new process.
> > 
> > I might be overthinking it, but my first guess at a reliable strategy
> > would be:
> > 
> >   1. Find each /proc/PID directory.
> >   2. Read /proc/PID/stat and extract the start time.  Failure to read
> >      this causes the process to be abandoned.
> >   3. Read all of the other /proc/PID/XXX files as needed.  Any failure
> >      results in the process being abandoned.
> >   4. Reread /proc/PID/stat and confirm the start time hasn't changed,
> >      this would indicate a new process having slipped in.
> > 
> 
> My initial quick thought was just to drop the process entry if
> it turns out we end up with an empty core set.  
> 
> I wonder whether we can prevent PID reuse by keeping a descriptor
> for /proc/PID/ open while we open the other files.  Probably not.

That was my first thought too.  I tried:

  - chdir /proc/PID
  - opendir for /proc/PID

  - Kill process PID

  - Read from the opendir handle, find nothing there.

Which didn't really surprise me, but was worth a try...
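
For anyone who wants to repeat the experiment, here is a rough Python
sketch of it.  This is not what I actually ran; the function name
probe_proc_dir_after_kill is made up for illustration, and it assumes a
Linux /proc.  It holds a directory descriptor on /proc/PID, kills and
reaps the process, then tries to list the directory through the
descriptor that is still open.

```python
import os
import subprocess
import sys


def probe_proc_dir_after_kill():
    """Open /proc/PID, kill PID, then list the still-open directory.

    Returns (entries_before, entries_after).  Hypothetical helper, for
    illustration only; assumes a Linux-style /proc.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        fd = os.open("/proc/%d" % child.pid, os.O_RDONLY | os.O_DIRECTORY)
        try:
            before = os.listdir(fd)
            child.kill()
            child.wait()  # reap it; a zombie would still have a /proc entry
            try:
                after = os.listdir(fd)
            except OSError:
                # The read may fail outright rather than return an
                # empty listing; treat both as "nothing there".
                after = []
            return before, after
        finally:
            os.close(fd)
    finally:
        # Make sure the child never outlives the experiment.
        if child.poll() is None:
            child.kill()
            child.wait()
```

If the open descriptor pinned the PID we would expect the second listing
to match the first; what I saw corresponds to it coming back empty.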

> Otherwise, your scheme sounds like the next best.
> 
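
To make the scheme concrete, here is a minimal sketch of the four steps
in Python.  GDB would of course do this in C++, and every name here
(read_start_time, snapshot_process, list_processes, the read_details
callback, the proc_root parameter) is made up for illustration, not
taken from the GDB sources.  The only hard fact it relies on is that
the start time is field 22 of /proc/PID/stat.

```python
import os


def read_start_time(proc_root, pid):
    """Return field 22 (starttime) of PROC_ROOT/PID/stat, or None."""
    try:
        path = os.path.join(proc_root, str(pid), "stat")
        with open(path, errors="replace") as f:
            data = f.read()
        # Field 2 (comm) may contain spaces or ')', so split only
        # after the last ')'.  rest[0] is then field 3, which makes
        # field 22 rest[19].
        rest = data[data.rindex(")") + 1:].split()
        return int(rest[19])
    except (OSError, ValueError, IndexError):
        return None


def read_cmdline(proc_root, pid):
    """Stand-in for step 3; a real caller would read more files."""
    with open(os.path.join(proc_root, str(pid), "cmdline"), "rb") as f:
        return f.read()


def snapshot_process(proc_root, pid, read_details=read_cmdline):
    """Steps 2-4.  Return the details, or None to abandon the entry."""
    start = read_start_time(proc_root, pid)        # step 2
    if start is None:
        return None
    try:
        details = read_details(proc_root, pid)     # step 3
    except OSError:
        return None
    if read_start_time(proc_root, pid) != start:   # step 4
        return None  # process vanished, or the PID was reused
    return details


def list_processes(proc_root="/proc", read_details=read_cmdline):
    """Step 1: walk the PID directories, keeping only clean entries."""
    result = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        details = snapshot_process(proc_root, int(name), read_details)
        if details is not None:
            result[int(name)] = details
    return result
```

Calling list_processes() with no arguments walks the real /proc; the
proc_root parameter only exists so the logic can be exercised against a
fake directory tree.
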
> > Given the system is still running, we can never be sure that we have
> > "all" processes, so throwing out anything that looks wrong seems like
> > the right strategy.
> > 
> > Also in step #4 we know we've just missed a process - something new
> > has started, but we ignore it.  I think this is fine though given the
> > racy nature of this sort of thing...
> > 
> > The only question is, could these thoughts be dropped into a bug
> > report, 
> 
> 
> Sure.
> 
> 
> > and the original patch to remove the unstable result applied?
> > Or maybe the test updated to either PASS or KFAIL?
> 
> I'd prefer the KFAIL option.  At the very least, a comment in
> the .exp file.

I'll put something together...

Thanks,
Andrew