From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-170103-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 29209 invoked by alias); 7 Sep 2011 19:57:20 -0000
Received: (qmail 29201 invoked by uid 22791); 7 Sep 2011 19:57:18 -0000
X-SWARE-Spam-Status: No, hits=-2.6 required=5.0	tests=BAYES_00,RCVD_IN_DNSWL_LOW
X-Spam-Check-By: sourceware.org
Received: from outbound-queue-1.mail.thdo.gradwell.net (HELO outbound-queue-1.mail.thdo.gradwell.net) (212.11.70.34)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 07 Sep 2011 19:57:04 +0000
Received: from outbound-edge-1.mail.thdo.gradwell.net (bonnie.gradwell.net [212.11.70.2])	by outbound-queue-1.mail.thdo.gradwell.net (Postfix) with ESMTP id 581132221D;	Wed,  7 Sep 2011 20:57:03 +0100 (BST)
Received: from digraph.polyomino.org.uk (HELO digraph.polyomino.org.uk) (81.187.227.50)  (smtp-auth username postmaster%pop3.polyomino.org.uk, mechanism cram-md5)  by outbound-edge-1.mail.thdo.gradwell.net (qpsmtpd/0.83) with (AES256-SHA encrypted) ESMTPSA; Wed, 07 Sep 2011 20:57:03 +0100
Received: from jsm28 (helo=localhost)	by digraph.polyomino.org.uk with local-esmtp (Exim 4.74)	(envelope-from <joseph@codesourcery.com>)	id 1R1O6c-0005NU-4u; Wed, 07 Sep 2011 19:48:18 +0000
Date: Wed, 07 Sep 2011 19:57:00 -0000
From: "Joseph S. Myers" <joseph@codesourcery.com>
To: Diego Novillo <dnovillo@google.com>
cc: gcc@gcc.gnu.org
Subject: Re: RFC: Improving support for known testsuite failures
In-Reply-To: <20110907152813.GA28540@google.com>
Message-ID: <Pine.LNX.4.64.1109071930130.20321@digraph.polyomino.org.uk>
References: <20110907152813.GA28540@google.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Gradwell-MongoId: 4e67cc8f.739b-2c45-1
X-Gradwell-Auth-Method: mailbox
X-Gradwell-Auth-Credentials: postmaster@pop3.polyomino.org.uk
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2011-09/txt/msg00051.txt.bz2

On Wed, 7 Sep 2011, Diego Novillo wrote:

> One of the most vexing aspects of GCC development is dealing with
> failures in the various testsuites.  In general, we are unable to
> keep failures down to zero.  We tolerate some failures and tell
> people to "compare your build against a clean build".
> 
> This forces developers to either double their testing time by
> building the compiler twice or search in gcc-testresults and hope
> to find a relatively similar build to compare against.

I don't think you can sensibly avoid needing to build the compiler twice.  
Even if the expected state was no failures yesterday, during development 
Stage 1 it's quite likely that a combination of patches committed since 
then has changed the expected state.  Though regression testers such as HJ's 
certainly help in identifying such new failures promptly and we could 
certainly use more such testers on more targets (but they do need a person 
monitoring them and filing PRs).
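
As a rough illustration of what comparing a patched build against a clean 
build amounts to, here is a minimal Python sketch that diffs the result 
lines of two DejaGnu .sum files.  The function names and the sample test 
names are made up for the example; this is not an existing GCC script.

```python
import re

# Result lines in a DejaGnu .sum file look like
# "FAIL: gcc.dg/foo.c (test for excess errors)".
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED): (.+)$"
)


def parse_sum(text):
    """Map each test name to its status (a repeated name keeps the last status)."""
    results = {}
    for line in text.splitlines():
        m = RESULT_RE.match(line.strip())
        if m:
            results[m.group(2)] = m.group(1)
    return results


def new_failures(baseline, patched):
    """Tests that FAIL in the patched run but did not FAIL in the baseline.

    A failing test that is absent from the baseline also counts as new.
    """
    return sorted(
        name
        for name, status in patched.items()
        if status == "FAIL" and baseline.get(name) != "FAIL"
    )


# Hypothetical sample data standing in for two real .sum files.
baseline = parse_sum(
    "PASS: gcc.dg/a.c execution test\nFAIL: gcc.dg/b.c execution test\n"
)
patched = parse_sum(
    "FAIL: gcc.dg/a.c execution test\nFAIL: gcc.dg/b.c execution test\n"
)
print(new_failures(baseline, patched))
```

The comparison is only as good as the baseline: if the baseline comes from 
a different revision, failures introduced by other people's patches show up 
as "new" too, which is the reason for building twice from the same tree.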

> Additionally, the marking mechanisms in DejaGNU are generally
> cumbersome and hard to add.  Even worse, depending on the
> controlling script, there may not be an XFAIL marker at all.

Actually, I think they work well in GCC, given the work Janis did some 
years ago to allow precise specification of the conditions of XFAILing, 
effective-target names, etc. - especially when you are doing non-multilib 
testing (for multilib testing, core DejaGNU can get in the way because 
the multilib options come *after* those in dg-options on the command 
line, which complicates XFAILing).

The most obvious oddity is that gcc.c-torture/execute uses separate .x 
files instead of the dg- harness (see PR 20567).

To my mind, the point of an on-the-side mechanism for identifying known 
failures, separate from the in-test XFAILs, is for failures that depend on 
some machine-specific aspect of the test environment (e.g. the amount of 
memory on the target, or the amount of stack space on the host) - that is, 
for information it would not be appropriate to check in.  If the 
conditions of the failure are well-enough characterised to check in 
something saying when the failure is known, then that something can be 
represented as an XFAIL rather than having two different ways to represent 
it.
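
To make the distinction concrete, here is a minimal Python sketch of such 
an on-the-side list: a local file, never checked in, naming failures that 
are known on this particular machine, used to filter the FAILs from a run.  
The file format (one test name per line, '#' comments) and all test names 
are assumptions for illustration only.

```python
def load_known_failures(text):
    """Parse a local list: one test name per line; blanks and '#' comments ignored."""
    known = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            known.add(line)
    return known


def unexpected_failures(failed_tests, known):
    """Keep only the failures that the local list does not already explain."""
    return [name for name in failed_tests if name not in known]


# Hypothetical local list for one test machine.
local_list = """
# Fails here only: this board has too little memory.
gcc.dg/example-large-array.c execution test
"""

# Hypothetical FAIL lines taken from a test run on that machine.
failed = [
    "gcc.dg/example-large-array.c execution test",
    "gcc.dg/example-other.c execution test",
]
print(unexpected_failures(failed, load_known_failures(local_list)))
```

Anything that could be written down as a condition on the target or its 
options belongs in the test as an XFAIL instead of in a list like this.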

> - Supports flaky tests.

Flaky tests are a problem (including for regression testers identifying 
regressions and filing PRs); I'm inclined to think that if a test is flaky 
for non-machine-specific reasons, it should be fixed or promptly disabled 
by default (with a PR filed about the flakiness), rather than being left 
active in a flaky state.  There could be a GCC_TEST_RUN_FLAKY environment 
variable to enable running such tests to see if they have stopped being 
flaky.
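
A minimal Python sketch of that gating, assuming the harness keeps a set 
of tests marked flaky.  GCC_TEST_RUN_FLAKY is only the name suggested 
above, not an existing variable, and the real harness would do this in 
the DejaGnu scripts, not in Python.

```python
import os


def should_run(test_name, flaky_tests, environ=os.environ):
    """Decide whether to run a test.

    Tests not marked flaky always run.  Flaky tests run only when the
    proposed GCC_TEST_RUN_FLAKY variable is set to a non-empty value.
    """
    if test_name not in flaky_tests:
        return True
    return bool(environ.get("GCC_TEST_RUN_FLAKY"))


# Hypothetical set of tests that have been disabled as flaky.
flaky = {"gcc.dg/example-timing.c"}

print(should_run("gcc.dg/example-stable.c", flaky, environ={}))
print(should_run("gcc.dg/example-timing.c", flaky, environ={}))
print(should_run("gcc.dg/example-timing.c", flaky,
                 environ={"GCC_TEST_RUN_FLAKY": "1"}))
```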

-- 
Joseph S. Myers
joseph@codesourcery.com