Date: Wed, 07 Sep 2011 19:57:00 -0000
From: "Joseph S. Myers"
To: Diego Novillo
cc: gcc@gcc.gnu.org
Subject: Re: RFC: Improving support for known testsuite failures
In-Reply-To: <20110907152813.GA28540@google.com>
References: <20110907152813.GA28540@google.com>

On Wed, 7 Sep 2011, Diego Novillo wrote:

> One of the most vexing aspects of GCC development is dealing with
> failures in the various testsuites.
> In general, we are unable to
> keep failures down to zero.  We tolerate some failures and tell
> people to "compare your build against a clean build".
>
> This forces developers to either double their testing time by
> building the compiler twice or search in gcc-testresults and hope
> to find a relatively similar build to compare against.

I don't think you can sensibly avoid needing to build the compiler twice.
Even if the expected state was no failures yesterday, during development
Stage 1 it's quite likely that a combination of patches committed since
then has changed the expected state.  Regression testers such as HJ's
certainly help in identifying such new failures promptly, though, and we
could certainly use more such testers on more targets (but they do need a
person monitoring them and filing PRs).

> Additionally, the marking mechanisms in DejaGNU are generally
> cumbersome and hard to add.  Even worse, depending on the
> controlling script, there may not be an XFAIL marker at all.

Actually, I think they work well in GCC, given the work Janis did some
years ago to allow precise specification of the conditions of XFAILing,
effective-target names, etc. - especially when you are doing non-multilib
testing (for multilib testing, core DejaGNU can get in the way because the
multilib options come *after* those in dg-options on the command line,
which complicates XFAILing).  The most obvious oddity is that
gcc.c-torture/execute uses separate .x files instead of the dg- harness
(see PR 20567).

To my mind, the point of an on-the-side mechanism for identifying known
failures, separate from the in-test XFAILs, is for failures that depend on
some machine-specific aspect of the test environment (e.g. the amount of
memory on the target, or the amount of stack space on the host) - that is,
for information it would not be appropriate to check in.
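
As a rough illustration of such an on-the-side mechanism, here is a minimal
sketch that filters the failing results in a DejaGNU-style .sum listing
against a local, not-checked-in list of failures known on one machine.  The
function names, the manifest format, and the choice of which statuses count
as failures are all assumptions made for illustration; nothing here is an
existing GCC or DejaGNU interface.

```python
# Hypothetical sketch: a local, not-checked-in list of known failures.
# Assumption: these statuses are the ones treated as failures.
FAILING = ("FAIL", "XPASS", "UNRESOLVED")


def parse_failures(lines):
    """Collect (status, test name) pairs for failing results.

    Lines look like "FAIL: gcc.dg/foo.c execution test"; anything that
    does not start with a failing status followed by ": " is ignored,
    so comments in the local manifest are skipped.
    """
    results = set()
    for line in lines:
        status, sep, name = line.strip().partition(": ")
        if sep and status in FAILING:
            results.add((status, name))
    return results


def unexpected_failures(sum_lines, local_known_lines):
    """Return failures in the .sum lines that the local manifest does not list."""
    known = parse_failures(local_known_lines)
    return sorted(parse_failures(sum_lines) - known)
```

The local manifest uses the same "STATUS: name" lines as the .sum file, so
an entry can be copied straight from a test run on the affected machine.
Failures whose conditions can be characterised well enough to check in
would stay out of such a file.
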
If the conditions of the failure are well-enough characterised to check in
something saying when the failure is known, then that something can be
represented as an XFAIL rather than having two different ways to represent
it.

> - Supports flaky tests.

Flaky tests are a problem (including for regression testers identifying
regressions and filing PRs); I'm inclined to think that if a test is flaky
for non-machine-specific reasons, it should be fixed or promptly disabled
by default (with a PR filed about the flakiness), rather than being left
active in a flaky state.  There could be a GCC_TEST_RUN_FLAKY environment
variable to enable running such tests to see if they have stopped being
flaky.

-- 
Joseph S. Myers
joseph@codesourcery.com
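
The opt-in gate suggested for flaky tests could behave roughly as in the
sketch below.  GCC_TEST_RUN_FLAKY is only the variable name proposed in this
message; the helper names and the rule that an unset, empty, or "0" value
leaves flaky tests disabled are assumptions, and a real implementation
would live in the DejaGNU (Tcl) harness rather than in Python.

```python
import os


def flaky_tests_enabled(environ=None):
    """Return True when the opt-in variable asks for flaky tests to run."""
    env = os.environ if environ is None else environ
    # Assumption: unset, empty, or "0" all mean "keep flaky tests disabled".
    return env.get("GCC_TEST_RUN_FLAKY", "") not in ("", "0")


def should_run(is_flaky, environ=None):
    """Non-flaky tests always run; flaky ones run only when opted in."""
    return (not is_flaky) or flaky_tests_enabled(environ)
```

With this shape, the default run stays free of flaky results, and setting
the variable re-enables those tests to check whether they have stopped
being flaky.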