RFC: Improving support for known testsuite failures

public inbox for gcc@gcc.gnu.org

* RFC: Improving support for known testsuite failures
@ 2011-09-07 15:28 Diego Novillo
  2011-09-07 18:43 ` Andreas Jaeger
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Diego Novillo @ 2011-09-07 15:28 UTC (permalink / raw)
  To: gcc

One of the most vexing aspects of GCC development is dealing with
failures in the various testsuites.  In general, we are unable to
keep failures down to zero.  We tolerate some failures and tell
people to "compare your build against a clean build".

This forces developers to either double their testing time by
building the compiler twice or search in gcc-testresults and hope
to find a relatively similar build to compare against.

Additionally, the marking mechanisms in DejaGNU are generally
cumbersome and hard to add.  Even worse, depending on the
controlling script, there may not be an XFAIL marker at all.

So, while we would ideally keep NO failures in the testsuite, the
reality is that we are content with having KNOWN failures.  For a
given set of failures out of 'make check', I would like to have a
simple filtering mechanism that prunes the known failures out.

Desired features:

- List of known failures lives in SVN.
- Each target can have its own list.
- Supports ignoring FAIL, UNRESOLVED and XPASS results.
- Supports pattern matching to glob sets of failures.
- Co-exists with the existing XFAIL support in DejaGNU.
- Supports flaky tests.
- Supports timestamps to avoid having tests in a known-to-fail
  state forever.
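
A minimal sketch, in Python, of what such a list and its matching could look like. The manifest syntax (the `flaky` and `expire=YYYYMMDD` attributes, the ` | ` separator), the `g++.dg/example` path, and the names `parse_manifest` and `is_known` are all assumptions made for illustration; nothing in this proposal fixes a format.

```python
import fnmatch
from collections import namedtuple
from datetime import date

Entry = namedtuple("Entry", "pattern flaky expire")

# Hypothetical manifest: "[attr,attr] | RESULT: test name or glob".
MANIFEST = """\
# Blank lines and lines starting with '#' are ignored.
FAIL: gcc.dg/torture/stackalign/setjmp-3.c  -Os  execution test
flaky | FAIL: gcc.dg/guality/*
expire=20111001 | XPASS: g++.dg/example/known-*.C *
"""


def parse_manifest(text):
    """Turn manifest text into a list of Entry tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        attrs, sep, pattern = line.partition(" | ")
        if not sep:  # no attribute prefix on this line
            attrs, pattern = "", line
        flaky, expire = False, None
        for attr in attrs.split(","):
            attr = attr.strip()
            if attr == "flaky":
                flaky = True
            elif attr.startswith("expire="):
                stamp = attr[len("expire="):]
                expire = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
        entries.append(Entry(pattern, flaky, expire))
    return entries


def is_known(result_line, entries, today):
    """True if a FAIL/UNRESOLVED/XPASS line matches a non-expired entry."""
    for entry in entries:
        if entry.expire is not None and today > entry.expire:
            continue  # expired entries no longer hide the failure
        # Note: fnmatch also treats '[' and '?' as wildcards.
        if fnmatch.fnmatchcase(result_line, entry.pattern):
            return True
    return False
```

An expired entry simply stops matching, so the failure shows up again and someone has to either fix the test or renew the entry. The `flaky` attribute is only recorded here; a fuller tool would use it to avoid complaining when a flaky test happens to pass.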

In terms of implementation, this filter could be part of 'make
check'.  We'd pipe make check's output to it and it would decide
whether to emit FAIL/UNRESOLVED/XPASS lines based on the black
list.

I could also make this a post-check filter that runs on all the
generated <tool>.sum files.  The filter could live in
<src>/contrib and be used on demand.

I am not thrilled about the prospect of implementing this in
DejaGNU directly.

Thoughts?

Thanks.  Diego.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: RFC: Improving support for known testsuite failures
  2011-09-07 15:28 RFC: Improving support for known testsuite failures Diego Novillo
@ 2011-09-07 18:43 ` Andreas Jaeger
  2011-09-07 19:57 ` Joseph S. Myers
  2011-09-08  8:31 ` Richard Guenther
  2 siblings, 0 replies; 21+ messages in thread
From: Andreas Jaeger @ 2011-09-07 18:43 UTC (permalink / raw)
  To: gcc

On Wednesday, September 07, 2011 05:28:15 PM Diego Novillo wrote:
> One of the most vexing aspects of GCC development is dealing with
> failures in the various testsuites.  In general, we are unable to
> keep failures down to zero.  We tolerate some failures and tell
> people to "compare your build against a clean build".
> 
> This forces developers to either double their testing time by
> building the compiler twice or search in gcc-testresults and hope
> to find a relatively similar build to compare against.
> 
> Additionally, the marking mechanisms in DejaGNU are generally
> cumbersome and hard to add.  Even worse, depending on the
> controlling script, there may not be an XFAIL marker at all.
> 
> So, while we would ideally keep NO failures in the testsuite, the
> reality is that we are content with having KNOWN failures.  For a
> given set of failures out of 'make check', I would like to have a
> simple filtering mechanism that prunes the known failures out.
> 
> Desired features:
> 
> - List of known failures lives in SVN.
> - Each target can have its own list.
> - Supports ignoring FAIL, UNRESOLVED and XPASS results.
> - Supports pattern matching to glob sets of failures.
> - Co-exists with the existing XFAIL support in DejaGNU.
> - Supports flaky tests.
> - Supports timestamps to avoid having tests in a known-to-fail
>   state forever.
> 
> In terms of implementation, this filter could be part of 'make
> check'.  We'd pipe make check's output to it and it would decide
> whether to emit FAIL/UNRESOLVED/XPASS lines based on the black
> list.
> 
> I could also make this a post-check filter that runs on all the
> generated <tool>.sum files.  The filter could live in
> <src>/contrib and be used on demand.
> 
> I am not thrilled about the prospect of implementing this in
> DejaGNU directly.
> 
> Thoughts?

good idea.

I have my own homegrown script that builds gcc and outputs something like:

trunk: successfull (New passes: 32, new fails: 0, time: 1:55 h)
4.6: successfull (New passes: 28, new fails: 12, time: 1:43 h)

I'll send it to you offline,

Andreas
-- 
 Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
  SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
   GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)
    GPG fingerprint = 93A3 365E CE47 B889 DF7F  FED1 389A 563C C272 A126

* Re: RFC: Improving support for known testsuite failures
  2011-09-07 15:28 RFC: Improving support for known testsuite failures Diego Novillo
  2011-09-07 18:43 ` Andreas Jaeger
@ 2011-09-07 19:57 ` Joseph S. Myers
  2011-09-08  8:31 ` Richard Guenther
  2 siblings, 0 replies; 21+ messages in thread
From: Joseph S. Myers @ 2011-09-07 19:57 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc

On Wed, 7 Sep 2011, Diego Novillo wrote:

> One of the most vexing aspects of GCC development is dealing with
> failures in the various testsuites.  In general, we are unable to
> keep failures down to zero.  We tolerate some failures and tell
> people to "compare your build against a clean build".
> 
> This forces developers to either double their testing time by
> building the compiler twice or search in gcc-testresults and hope
> to find a relatively similar build to compare against.

I don't think you can sensibly avoid needing to build the compiler twice.  
Even if the expected state was no failures yesterday, during development 
Stage 1 it's quite likely a combination of patches committed then has
changed the expected state.  Though regression testers such as HJ's
certainly help in identifying such new failures promptly and we could 
certainly use more such testers on more targets (but they do need a person 
monitoring them and filing PRs).

> Additionally, the marking mechanisms in DejaGNU are generally
> cumbersome and hard to add.  Even worse, depending on the
> controlling script, there may not be an XFAIL marker at all.

Actually, I think they work well in GCC, given the work Janis did some 
years ago to allow precise specification of the conditions of XFAILing, 
effective-target names, etc. - especially when you are doing non-multilib 
testing (for multilib testing, core DejaGNU can get in the way because 
the multilib options come *after* those in dg-options on the command 
line, so complicating XFAILing).

The most obvious oddity is that gcc.c-torture/execute uses separate .x 
files instead of the dg- harness (see PR 20567).

To my mind, the point of an on-the-side mechanism for identifying known 
failures, separate from the in-test XFAILs, is for failures that depend on 
some machine-specific aspect of the test environment (e.g. the amount of 
memory on the target, or the amount of stack space on the host) - that is, 
for information it would not be appropriate to check in.  If the 
conditions of the failure are well-enough characterised to check in 
something saying when the failure is known, then that something can be 
represented as an XFAIL rather than having two different ways to represent 
it.

> - Supports flaky tests.

Flaky tests are a problem (including for regression testers identifying 
regressions and filing PRs); I'm inclined to think that if a test is flaky 
for non-machine-specific reasons, it should be fixed or promptly disabled 
by default (with a PR filed about the flakiness), rather than being left 
active in a flaky state.  There could be a GCC_TEST_RUN_FLAKY environment 
variable to enable running such tests to see if they have stopped being 
flaky.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: RFC: Improving support for known testsuite failures
  2011-09-07 15:28 RFC: Improving support for known testsuite failures Diego Novillo
  2011-09-07 18:43 ` Andreas Jaeger
  2011-09-07 19:57 ` Joseph S. Myers
@ 2011-09-08  8:31 ` Richard Guenther
  2011-09-08 11:05   ` Diego Novillo
                     ` (2 more replies)
  2 siblings, 3 replies; 21+ messages in thread
From: Richard Guenther @ 2011-09-08  8:31 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc

On Wed, Sep 7, 2011 at 5:28 PM, Diego Novillo <dnovillo@google.com> wrote:
> One of the most vexing aspects of GCC development is dealing with
> failures in the various testsuites.  In general, we are unable to
> keep failures down to zero.  We tolerate some failures and tell
> people to "compare your build against a clean build".
>
> This forces developers to either double their testing time by
> building the compiler twice or search in gcc-testresults and hope
> to find a relatively similar build to compare against.
>
> Additionally, the marking mechanisms in DejaGNU are generally
> cumbersome and hard to add.  Even worse, depending on the
> controlling script, there may not be an XFAIL marker at all.
>
> So, while we would ideally keep NO failures in the testsuite, the
> reality is that we are content with having KNOWN failures.  For a
> given set of failures out of 'make check', I would like to have a
> simple filtering mechanism that prunes the known failures out.
>
> Desired features:
>
> - List of known failures lives in SVN.
> - Each target can have its own list.
> - Supports ignoring FAIL, UNRESOLVED and XPASS results.
> - Supports pattern matching to glob sets of failures.
> - Co-exists with the existing XFAIL support in DejaGNU.
> - Supports flaky tests.
> - Supports timestamps to avoid having tests in a known-to-fail
>  state forever.
>
> In terms of implementation, this filter could be part of 'make
> check'.  We'd pipe make check's output to it and it would decide
> whether to emit FAIL/UNRESOLVED/XPASS lines based on the black
> list.
>
> I could also make this a post-check filter that runs on all the
> generated <tool>.sum files.  The filter could live in
> <src>/contrib and be used on demand.
>
> I am not thrilled about the prospect of implementing this in
> DejaGNU directly.
>
> Thoughts?

I think it would be more useful to have a script parse gcc-testresults@
postings from the various autotesters and produce a nice webpage
with revisions and known FAIL/XPASSes for the target triplets that
are tested.

That's been a long time on my TODO list, but my web/script FU is
weak enough that I've been pushing that back.

Maybe you have some web-stuff-capable folks at Google even? ;)

Richard.

>
> Thanks.  Diego.
>

* Re: RFC: Improving support for known testsuite failures
  2011-09-08  8:31 ` Richard Guenther
@ 2011-09-08 11:05   ` Diego Novillo
  2011-09-08 11:16     ` Richard Guenther
  2011-09-23  0:07     ` Hans-Peter Nilsson
  2011-09-08 16:39   ` Joseph S. Myers
  2011-09-08 22:27   ` Michael Hope
  2 siblings, 2 replies; 21+ messages in thread
From: Diego Novillo @ 2011-09-08 11:05 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

On Thu, Sep 8, 2011 at 04:31, Richard Guenther
<richard.guenther@gmail.com> wrote:

> I think it would be more useful to have a script parse gcc-testresults@
> postings from the various autotesters and produce a nice webpage
> with revisions and known FAIL/XPASSes for the target triplets that
> are tested.

Sure, though that describes a different tool.  I'm after a tool that
will 'exit 0' if the testsuite finished with nominal results.
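
A rough sketch of that behaviour, in Python: scan a .sum file for FAIL, UNRESOLVED and XPASS lines and return a zero status only when every one of them is already in the known list. The function names and the two-argument command line are assumptions for illustration, not an existing script.

```python
import sys

# Result prefixes the filter cares about.  "XFAIL: " does not start with
# "FAIL: ", so expected failures are left alone.
INTERESTING = ("FAIL: ", "UNRESOLVED: ", "XPASS: ")


def failures_in(sum_text):
    """Return the set of FAIL/UNRESOLVED/XPASS lines found in a .sum file."""
    return {
        line.rstrip()
        for line in sum_text.splitlines()
        if line.startswith(INTERESTING)
    }


def check(sum_text, known):
    """Return 0 if every failure is in `known`, 1 otherwise."""
    unexpected = sorted(failures_in(sum_text) - known)
    for line in unexpected:
        print("unexpected:", line)
    return 1 if unexpected else 0


# Hypothetical usage: python check_sum.py gcc.sum known-failures.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        sum_text = f.read()
    with open(sys.argv[2]) as f:
        known = {line.rstrip() for line in f if line.strip()}
    sys.exit(check(sum_text, known))
```

This version matches lines exactly; glob patterns, per-target lists and expiry dates would be layered on top.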

> Maybe you have some web-stuff-capable folks at Google even? ;)

http://code.google.com/appengine/ ? ;)


Diego.

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 11:05   ` Diego Novillo
@ 2011-09-08 11:16     ` Richard Guenther
  2011-09-08 11:34       ` Diego Novillo
  2011-09-23  0:07     ` Hans-Peter Nilsson
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Guenther @ 2011-09-08 11:16 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc

On Thu, Sep 8, 2011 at 1:04 PM, Diego Novillo <dnovillo@google.com> wrote:
> On Thu, Sep 8, 2011 at 04:31, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>
>> I think it would be more useful to have a script parse gcc-testresults@
>> postings from the various autotesters and produce a nice webpage
>> with revisions and known FAIL/XPASSes for the target triplets that
>> are tested.
>
> Sure, though that describes a different tool.  I'm after a tool that
> will 'exit 0' if the testsuite finished with nominal results.

Well, you'd need to maintain a list of known XPASS/FAILs anyway.
You can as well do it in the testcases themselves (add XFAILs, remove
XPASSes and open bugreports to not forget about this).  Adding
a separate filter or whatever just looks completely wrong to me.

>> Maybe you have some web-stuff-capable folks at Google even? ;)
>
> http://code.google.com/appengine/ ? ;)

Can't find the script that parses gcc-testresults there ;)

>
> Diego.
>

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 11:16     ` Richard Guenther
@ 2011-09-08 11:34       ` Diego Novillo
  2011-09-08 11:49         ` Richard Guenther
  2011-09-08 13:24         ` Richard Earnshaw
  0 siblings, 2 replies; 21+ messages in thread
From: Diego Novillo @ 2011-09-08 11:34 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

On Thu, Sep 8, 2011 at 07:16, Richard Guenther
<richard.guenther@gmail.com> wrote:

> Well, you'd need to maintain a list of known XPASS/FAILs anyway.

Yes, of course.  That's the manifest of things you expect to be broken.

> You can as well do it in the testcases themselves (add XFAILs, remove
> XPASSes and open bugreports to not forget about this).  Adding
> a separate filter or whatever just looks completely wrong to me.

The main motivation is precisely to not have to deal with dejagnu's xfail
mechanisms.  They are too cumbersome (details upthread).

Perhaps if there was a global marker that one could add, that would
solve my problem too.  I think I'll start with a post-check filter in
contrib/

Diego.

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 11:34       ` Diego Novillo
@ 2011-09-08 11:49         ` Richard Guenther
  2011-09-08 12:14           ` Diego Novillo
  2011-09-08 13:24         ` Richard Earnshaw
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Guenther @ 2011-09-08 11:49 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc

On Thu, Sep 8, 2011 at 1:33 PM, Diego Novillo <dnovillo@google.com> wrote:
> On Thu, Sep 8, 2011 at 07:16, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>
>> Well, you'd need to maintain a list of known XPASS/FAILs anyway.
>
> Yes, of course.  That's the manifest of things you expect to be broken.
>
>> You can as well do it in the testcases themselves (add XFAILs, remove
>> XPASSes and open bugreports to not forget about this).  Adding
>> a separate filter or whatever just looks completely wrong to me.
>
> The main motivation is precisely to not have to deal with dejagnu's xfail
> mechanisms.  They are too cumbersome (details upthread).

Well, I'd rather _fix_ dejagnu then.  Any specific example you can't
eventually xfail by dg-skipping the testcase?

> Perhaps if there was a global marker that one could add, that would
> solve my problem too.  I think I'll start with a post-check filter in
> contrib/

You seem to have a very specific problem ;)   I suppose some
patch autotester?  Our patch autotester simply bootstraps twice
and compares the result.
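
For illustration, the comparison step of such a twice-built setup might look roughly like the Python below. This is a sketch under assumptions, not the autotester's code: the names `parse` and `compare` are made up, a "new fail" is taken to mean a test that is now FAIL/UNRESOLVED/XPASS but was not before, and a "new pass" one that left such a state.

```python
# DejaGnu result keywords that can start a line in a .sum file.
STATUSES = {
    "PASS", "FAIL", "XPASS", "XFAIL", "KPASS", "KFAIL",
    "UNRESOLVED", "UNSUPPORTED", "UNTESTED",
}
BAD = {"FAIL", "UNRESOLVED", "XPASS"}


def parse(sum_text):
    """Map test name -> status.  Duplicate names keep the last status seen."""
    results = {}
    for line in sum_text.splitlines():
        status, sep, name = line.partition(": ")
        if sep and status in STATUSES:
            results[name.rstrip()] = status
    return results


def compare(before_text, after_text):
    """Return (new_passes, new_fails) between a baseline and a patched run."""
    old, new = parse(before_text), parse(after_text)
    new_fails = sorted(
        name for name, status in new.items()
        if status in BAD and old.get(name) not in BAD
    )
    new_passes = sorted(
        name for name, status in new.items()
        if status == "PASS" and old.get(name) in BAD
    )
    return new_passes, new_fails
```

Because results are keyed by test name, two tests that share a name collapse into one entry, so the outcome is only as reliable as the names are unique.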

Richard.

>
>
> Diego.
>

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 11:49         ` Richard Guenther
@ 2011-09-08 12:14           ` Diego Novillo
  2011-09-08 12:20             ` Richard Guenther
  0 siblings, 1 reply; 21+ messages in thread
From: Diego Novillo @ 2011-09-08 12:14 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc, janisjo

On Thu, Sep 8, 2011 at 07:49, Richard Guenther
<richard.guenther@gmail.com> wrote:

> Well, I'd rather _fix_ dejagnu then.  Any specific example you can't
> eventually xfail by dg-skipping the testcase?

Several I mentioned upthread:
- Some .exp files do not support xfail markers.
- Different directories will have their own syntactic sugar (though
most are using dg-xfail now).
- When using dg-xfail-if, you also need to xfail other dg- markers to
avoid UNRESOLVED.
- Similarly, a dg-xfail-if will cause UNRESOLVED if the test was dg-do
run or dg-do link.
- If you are forced to use dg-skip-if, you will miss state changes in the test.
- Some tests are simply flaky (e.g. the ones that depend on
environment like those using gdb).

I am purposely avoiding tangling with DejaGNU, though if I could find
a way of adding a global known-to-fail marker, that would solve my
problem too.  Janis, do you think that's feasible?

>> Perhaps if there was a global marker that one could add, that would
>> solve my problem too.  I think I'll start with a post-check filter in
>> contrib/
>
> You seem to have a very specific problem ;)

Apparently :)  I am thinking about release and devel branches, in
particular.  That's why I think I'll just add a script in contrib/

> I suppose some
> patch autotester?  Our patch autotester simply bootstraps twice
> and compares the result.

We have those, but doubling the amount of work done by the testers is
not acceptable.  What we need is something that will quickly decide
whether the testsuite run is OK or not.


Diego.

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 12:14           ` Diego Novillo
@ 2011-09-08 12:20             ` Richard Guenther
  2011-09-08 12:26               ` Diego Novillo
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Guenther @ 2011-09-08 12:20 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc, janisjo

On Thu, Sep 8, 2011 at 2:14 PM, Diego Novillo <dnovillo@google.com> wrote:
> On Thu, Sep 8, 2011 at 07:49, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>
>> Well, I'd rather _fix_ dejagnu then.  Any specific example you can't
>> eventually xfail by dg-skipping the testcase?
>
> Several I mentioned upthread:
> - Some .exp files do not support xfail markers.
> - Different directories will have their own syntactic sugar (though
> most are using dg-xfail now).
> - When using dg-xfail-if, you also need to xfail other dg- markers to
> avoid UNRESOLVED.
> - Similarly, a dg-xfail-if will cause UNRESOLVED if the test was dg-do
> run or dg-do link.
> - If you are forced to use dg-skip-if, you will miss state changes in the test.
> - Some tests are simply flaky (e.g. the ones that depend on
> environment like those using gdb).
>
> I am purposely avoiding tangling with DejaGNU, though if I could find
> a way of adding a global known-to-fail marker, that would solve my
> problem too.  Janis, do you think that's feasible?
>
>>> Perhaps if there was a global marker that one could add, that would
>>> solve my problem too.  I think I'll start with a post-check filter in
>>> contrib/
>>
>> You seem to have a very specific problem ;)
>
> Apparently :)  I am thinking about release and devel branches, in
> particular.  That's why I think I'll just add a script in contrib/
>
>> I suppose some
>> patch autotester?  Our patch autotester simply bootstraps twice
>> and compares the result.
>
> We have those, but doubling the amount of work done by the testers is
> not acceptable.  What we need is something that will quickly decide
> whether the testsuite run is OK or not.

Cache the comparison result?  If you specify a (minimum) revision
required for testing just test against a cached revision that fulfils
the requirement.  Something I never implemented for ours.

Richard.

>
> Diego.
>

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 12:20             ` Richard Guenther
@ 2011-09-08 12:26               ` Diego Novillo
  2011-09-08 12:30                 ` Richard Guenther
  0 siblings, 1 reply; 21+ messages in thread
From: Diego Novillo @ 2011-09-08 12:26 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc, janisjo

On Thu, Sep 8, 2011 at 08:20, Richard Guenther
<richard.guenther@gmail.com> wrote:

> Cache the comparison result?  If you specify a (minimum) revision
> required for testing just test against a cached revision that fulfils
> the requirement.  Something I never implemented for ours.

Nope.  Build must be functionally independent from other state.  If
the manifest of known failures lives together with the source code,
that's acceptable.  Depending on previous builds is not.  This also
helps individual developers doing builds and packagers doing spins off
of the main source branches.

Diego.

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 12:26               ` Diego Novillo
@ 2011-09-08 12:30                 ` Richard Guenther
  2011-09-08 12:33                   ` Diego Novillo
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Guenther @ 2011-09-08 12:30 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc, janisjo

On Thu, Sep 8, 2011 at 2:26 PM, Diego Novillo <dnovillo@google.com> wrote:
> On Thu, Sep 8, 2011 at 08:20, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>
>> Cache the comparison result?  If you specify a (minimum) revision
>> required for testing just test against a cached revision that fulfils
>> the requirement.  Something I never implemented for ours.
>
> Nope.  Build must be functionally independent from other state.  If
> the manifest of known failures lives together with the source code,
> that's acceptable.  Depending on previous builds is not.

It _does_ live with the source code.  Think of implicitly "checking in" the
build result with the tested revision.  That's not different from your idea
of checking in some sort of whitelist of fails.

>  This also
> helps individual developers doing builds and packagers doing spins off
> of the main source branches.

A svn revision is unique.  Or do you mean in other _repositories_?
Then the repository:revision combination is unique.  You don't
test whole source tarballs, do you?

Richard.
>
> Diego.
>

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 12:30                 ` Richard Guenther
@ 2011-09-08 12:33                   ` Diego Novillo
  0 siblings, 0 replies; 21+ messages in thread
From: Diego Novillo @ 2011-09-08 12:33 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc, janisjo

On Thu, Sep 8, 2011 at 08:29, Richard Guenther
<richard.guenther@gmail.com> wrote:

> It _does_ live with the source code.  Think of implicitly "checking in" the
> build result with the tested revision.  That's not different from your idea
> of checking in some sort of whitelist of fails.

Ah, I see what you mean.  Yes, that's along the lines of what I'm
thinking.  The black list is populated out of a clean run and the
post-check filter does the comparison.
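
Sketched in Python, and assuming the list is nothing more than the FAIL/UNRESOLVED/XPASS lines of a clean run (the helper names are hypothetical), the two halves could be:

```python
INTERESTING = ("FAIL: ", "UNRESOLVED: ", "XPASS: ")


def interesting(sum_text):
    """Collect the FAIL/UNRESOLVED/XPASS lines of a .sum file as a set."""
    return {
        line.rstrip()
        for line in sum_text.splitlines()
        if line.startswith(INTERESTING)
    }


def build_manifest(clean_sum_text):
    """Populate the list from a clean run: one sorted result line per entry."""
    return "\n".join(sorted(interesting(clean_sum_text))) + "\n"


def compare(manifest_text, sum_text):
    """Return (unexpected, no_longer_failing) for a later run."""
    expected = set(manifest_text.splitlines()) - {""}
    actual = interesting(sum_text)
    return sorted(actual - expected), sorted(expected - actual)
```

The second list matters as much as the first: an entry that no longer fails is a state change too, and the manifest checked in with the branch should be updated to drop it.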

> A svn revision is unique.  Or do you mean in other _repositories_?
> Then the repository:revision combination is unique.  You don't
> test whole source tarballs, do you?

I mean svn branches.


Diego.

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 11:34       ` Diego Novillo
  2011-09-08 11:49         ` Richard Guenther
@ 2011-09-08 13:24         ` Richard Earnshaw
  2011-09-08 13:55           ` Diego Novillo
  2011-09-08 16:41           ` Joseph S. Myers
  1 sibling, 2 replies; 21+ messages in thread
From: Richard Earnshaw @ 2011-09-08 13:24 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Richard Guenther, gcc

On 08/09/11 12:33, Diego Novillo wrote:
> On Thu, Sep 8, 2011 at 07:16, Richard Guenther
> <richard.guenther@gmail.com> wrote:
> 
>> Well, you'd need to maintain a list of known XPASS/FAILs anyway.
> 
> Yes, of course.  That's the manifest of things you expect to be broken.
> 

And that's only going to work if all the test names are unique.  I
currently see quite a few tests that appear in my log as both PASS and
FAIL in a single run.  For example:

FAIL: gcc.dg/torture/stackalign/non-local-goto-2.c  -Os  execution test
PASS: gcc.dg/torture/stackalign/non-local-goto-2.c  -Os  execution test

FAIL: gcc.dg/torture/stackalign/setjmp-3.c  -Os  execution test
PASS: gcc.dg/torture/stackalign/setjmp-3.c  -Os  execution test

R.

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 13:24         ` Richard Earnshaw
@ 2011-09-08 13:55           ` Diego Novillo
  2011-09-08 14:03             ` Richard Earnshaw
  2011-09-08 16:41           ` Joseph S. Myers
  1 sibling, 1 reply; 21+ messages in thread
From: Diego Novillo @ 2011-09-08 13:55 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: Richard Guenther, gcc

On Thu, Sep 8, 2011 at 09:23, Richard Earnshaw <rearnsha@arm.com> wrote:

> And that's only going to work if all the test names are unique.  I
> currently see quite a few tests that appear in my log as both PASS and
> FAIL in a single run.  For example:

That's fine.  What we are looking for is to capture the state of
FAIL/UNRESOLVED/XPASS.  Any change of state in those is an interesting
signal to investigate.


> FAIL: gcc.dg/torture/stackalign/non-local-goto-2.c  -Os  execution test
> PASS: gcc.dg/torture/stackalign/non-local-goto-2.c  -Os  execution test
>
> FAIL: gcc.dg/torture/stackalign/setjmp-3.c  -Os  execution test
> PASS: gcc.dg/torture/stackalign/setjmp-3.c  -Os  execution test

Sigh.  We should really try to replace dejagnu.  So much work, though.


Diego.

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 13:55           ` Diego Novillo
@ 2011-09-08 14:03             ` Richard Earnshaw
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Earnshaw @ 2011-09-08 14:03 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Richard Guenther, gcc

On 08/09/11 14:54, Diego Novillo wrote:
> On Thu, Sep 8, 2011 at 09:23, Richard Earnshaw <rearnsha@arm.com> wrote:
> 
>> And that's only going to work if all the test names are unique.  I
>> currently see quite a few tests that appear in my log as both PASS and
>> FAIL in a single run.  For example:
> 
> That's fine.  What we are looking for is to capture the state of
> FAIL/UNRESOLVED/XPASS.  Any change of state in those is an interesting
> signal to investigate.
> 
> 
>> FAIL: gcc.dg/torture/stackalign/non-local-goto-2.c  -Os  execution test
>> PASS: gcc.dg/torture/stackalign/non-local-goto-2.c  -Os  execution test
>>
>> FAIL: gcc.dg/torture/stackalign/setjmp-3.c  -Os  execution test
>> PASS: gcc.dg/torture/stackalign/setjmp-3.c  -Os  execution test
> 
> Sigh.  We should really try to replace dejagnu.  So much work, though.
> 
> 
> Diego.
> 

It's not dejagnu specifically, it's just the names people have given to
the tests.

R.

* Re: RFC: Improving support for known testsuite failures
  2011-09-08  8:31 ` Richard Guenther
  2011-09-08 11:05   ` Diego Novillo
@ 2011-09-08 16:39   ` Joseph S. Myers
  2011-09-08 22:27   ` Michael Hope
  2 siblings, 0 replies; 21+ messages in thread
From: Joseph S. Myers @ 2011-09-08 16:39 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Diego Novillo, gcc

On Thu, 8 Sep 2011, Richard Guenther wrote:

> I think it would be more useful to have a script parse gcc-testresults@
> postings from the various autotesters and produce a nice webpage
> with revisions and known FAIL/XPASSes for the target triplets that
> are tested.

Better than parsing gcc-testresults might be a system for uploading full 
.sum files (or indeed .logs as well) to a database.  gcc-testresults is 
useful, but if a test isn't mentioned in a message you don't know if it 
passed or wasn't run at all, for example.  The database would be big by 
gcc.gnu.org standards (maybe multiple GB a day if all the gcc-testresults 
posters start uploading full .log files), but not by the standards of many 
modern web databases.  You'd want a contrib/ script for uploading files 
given metadata about the test run (some identifier for the tester, details 
of configuration and multilibs involved in the various files) that could 
be used for both build tree and installed testing.

(You might want to parse gcc-testresults *as well* for the additional logs 
found that way, but a system giving full logs and reliably identifying 
successive builds from the same tester could do more things, such as 
identifying regressions seen by any individual tester as opposed to a test 
that passes for one person on a given target and fails for another person 
on that target.)

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 13:24         ` Richard Earnshaw
  2011-09-08 13:55           ` Diego Novillo
@ 2011-09-08 16:41           ` Joseph S. Myers
  1 sibling, 0 replies; 21+ messages in thread
From: Joseph S. Myers @ 2011-09-08 16:41 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: Diego Novillo, Richard Guenther, gcc

On Thu, 8 Sep 2011, Richard Earnshaw wrote:

> And that's only going to work if all the test names are unique.  I
> currently see quite a few tests that appear in my log as both PASS and
> FAIL in a single run.  For example:

Yes, that's just a bug in the testsuite that should be fixed just like any 
other bug.  One thing a web service working with .sum files could do is 
automatically detect places with just duplicate test names (even if both 
are PASS) and report them noisily as problems - and as regressions if new 
instances appear.
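
The detection part is small enough to sketch in Python. This is only an illustration of the idea, with a made-up function name, and it says nothing about how a web service would store or report the result:

```python
import re
from collections import Counter

# Any DejaGnu per-test result line: "STATUS: test name".
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED): (.*)$"
)


def duplicate_names(sum_text):
    """Return the sorted test names that occur more than once in a .sum file."""
    counts = Counter()
    for line in sum_text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            counts[match.group(2).rstrip()] += 1
    return sorted(name for name, n in counts.items() if n > 1)
```

Run over the lines quoted earlier in this subthread, it would report both the non-local-goto-2.c and setjmp-3.c "-Os  execution test" names, regardless of whether the two occurrences agree on PASS or FAIL.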

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: RFC: Improving support for known testsuite failures
  2011-09-08  8:31 ` Richard Guenther
  2011-09-08 11:05   ` Diego Novillo
  2011-09-08 16:39   ` Joseph S. Myers
@ 2011-09-08 22:27   ` Michael Hope
  2 siblings, 0 replies; 21+ messages in thread
From: Michael Hope @ 2011-09-08 22:27 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Diego Novillo, gcc

On Thu, Sep 8, 2011 at 8:31 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Sep 7, 2011 at 5:28 PM, Diego Novillo <dnovillo@google.com> wrote:
>> One of the most vexing aspects of GCC development is dealing with
>> failures in the various testsuites.  In general, we are unable to
>> keep failures down to zero.  We tolerate some failures and tell
>> people to "compare your build against a clean build".
>>
>> This forces developers to either double their testing time by
>> building the compiler twice or search in gcc-testresults and hope
>> to find a relatively similar build to compare against.
>>
>> Additionally, the marking mechanisms in DejaGNU are generally
>> cumbersome and hard to add.  Even worse, depending on the
>> controlling script, there may not be an XFAIL marker at all.
>>
>> So, while we would ideally keep NO failures in the testsuite, the
>> reality is that we are content with having KNOWN failures.  For a
>> given set of failures out of 'make check', I would like to have a
>> simple filtering mechanism that prunes the known failures out.
>>
>> Desired features:
>>
>> - List of known failures lives in SVN.
>> - Each target can have its own list.
>> - Supports ignoring FAIL, UNRESOLVED and XPASS results.
>> - Supports pattern matching to glob sets of failures.
>> - Co-exists with the existing XFAIL support in DejaGNU.
>> - Supports flaky tests.
>> - Supports timestamps to avoid having tests in a known-to-fail
>>  state forever.
>>
>> In terms of implementation, this filter could be part of 'make
>> check'.  We'd pipe make check's output to it and it would decide
>> whether to emit FAIL/UNRESOLVED/XPASS lines based on the black
>> list.
>>
>> I could also make this a post-check filter that runs on all the
>> generated <tool>.sum files.  The filter could live in
>> <src>/contrib and be used on demand.
>>
>> I am not thrilled about the prospect of implementing this in
>> DejaGNU directly.
>>
>> Thoughts?
>
> I think it would be more useful to have a script parse gcc-testresults@
> postings from the various autotesters and produce a nice webpage
> with revisions and known FAIL/XPASSes for the target triplets that
> are tested.
>
> That's been a long time on my TODO list, but my web/script FU is
> weak enough that I've been pushing that back.

I have something along those lines for the Linaro releases:
 http://ex.seabright.co.nz/helpers/testcompare/gcc-linaro-4.6-2011.08/logs/armv7l-natty-cbuild162-ursa1-cortexa9r1/gcc-testsuite.txt?base=gcc-linaro-4.6-2011.07-0

and a lower level diff-on-sum-files for each commit:
 http://builds.linaro.org/toolchain/gcc-linaro-4.5+bzr99541~rsandifo~lp823708-4.5/logs/armv7l-natty-cbuild181-ursa4-armv5r2/testsuite-diff.txt
 http://builds.linaro.org/toolchain/gcc-linaro-4.6+bzr106801~ams-codesourcery~merge-from-fsf-20110908-4.6/logs/x86_64-natty-cbuild181-oort1-x86_64r1/testsuite-diff.txt

They're both a hack and only work against local files.  The code is
available at:
 https://launchpad.net/tcwg-web

and:
 https://launchpad.net/cbuild

They're both similar to contrib/compare_results but webified and
hooked into our auto builders.

-- Michael

* Re: RFC: Improving support for known testsuite failures
  2011-09-08 11:05   ` Diego Novillo
  2011-09-08 11:16     ` Richard Guenther
@ 2011-09-23  0:07     ` Hans-Peter Nilsson
  2011-09-23 13:11       ` Diego Novillo
  1 sibling, 1 reply; 21+ messages in thread
From: Hans-Peter Nilsson @ 2011-09-23  0:07 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc

On Thu, 8 Sep 2011, Diego Novillo wrote:

> On Thu, Sep 8, 2011 at 04:31, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>
> > I think it would be more useful to have a script parse gcc-testresults@
> > postings from the various autotesters and produce a nice webpage
> > with revisions and known FAIL/XPASSes for the target triplets that
> > are tested.
>
> Sure, though that describes a different tool.  I'm after a tool that
> will 'exit 0' if the testsuite finished with nominal results.

Not to stop you from (partly) reinventing the wheel, but that's
pretty much what contrib/regression/btest-gcc.sh already does,
though you have to feed it a baseline set of processed .sum
files which could (for a calling script or a modified
btest-gcc.sh) live in, say, contrib/target-results/<target>.
It handles "duplicate" test names by marking a test as failing if
any of its instances has failed.  Works well enough.
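
A minimal sketch of that kind of baseline comparison, including the
"fails if any duplicate failed" rule.  This is not a description of how
btest-gcc.sh is implemented; treating FAIL, XPASS and UNRESOLVED as bad
results follows the RFC, and all names and sample lines are
illustrative assumptions:

```python
import re

RESULT_RE = re.compile(r"^(PASS|FAIL|XPASS|XFAIL|UNRESOLVED): (.+)$")
BAD = {"FAIL", "XPASS", "UNRESOLVED"}  # assumption: result kinds named in the RFC


def failing_tests(sum_lines):
    """Names with at least one bad result; a duplicated name fails if any copy fails."""
    failing = set()
    for line in sum_lines:
        m = RESULT_RE.match(line.strip())
        if m and m.group(1) in BAD:
            failing.add(m.group(2))
    return failing


def regressions(baseline_lines, current_lines):
    """Tests failing now that were not failing in the baseline."""
    return sorted(failing_tests(current_lines) - failing_tests(baseline_lines))


# Hypothetical baseline and current .sum contents.
baseline = ["PASS: a.c execution test", "FAIL: b.c execution test"]
current = [
    "PASS: a.c execution test",
    "FAIL: a.c execution test",  # duplicate name, one copy fails
    "FAIL: b.c execution test",  # already failing in the baseline
]
new_failures = regressions(baseline, current)
exit_status = 1 if new_failures else 0
print(new_failures, exit_status)
```

A calling script could pass exit_status to sys.exit() so that a run
with only baseline failures exits 0.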

brgds, H-P

* Re: RFC: Improving support for known testsuite failures
  2011-09-23  0:07     ` Hans-Peter Nilsson
@ 2011-09-23 13:11       ` Diego Novillo
  0 siblings, 0 replies; 21+ messages in thread
From: Diego Novillo @ 2011-09-23 13:11 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: gcc

On Thu, Sep 22, 2011 at 20:06, Hans-Peter Nilsson <hp@bitrange.com> wrote:
> On Thu, 8 Sep 2011, Diego Novillo wrote:
>
>> On Thu, Sep 8, 2011 at 04:31, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>
>> > I think it would be more useful to have a script parse gcc-testresults@
>> > postings from the various autotesters and produce a nice webpage
>> > with revisions and known FAIL/XPASSes for the target triplets that
>> > are tested.
>>
>> Sure, though that describes a different tool.  I'm after a tool that
>> will 'exit 0' if the testsuite finished with nominal results.
>
> Not to stop you from (partly) reinventing the wheel, but that's
> pretty much what contrib/regression/btest-gcc.sh already does,
> though you have to feed it a baseline set of processed .sum
> files which could (for a calling script or a modified
> btest-gcc.sh) live in, say, contrib/target-results/<target>.
> It handles "duplicate" test names by marking a test as failing if
> any of its instances has failed.  Works well enough.

Yeah, I actually considered using it by extracting the actual .sum
file processing out of it (I was not interested in it running the
build nor the tests).

However, I also needed to add support for marking flaky tests and
putting an expiration date on failures.  Additionally, I needed
versioned failure manifests, and I could not justify storing in SVN
multiple directories with 12MB worth of .sum files in them.

The small manifest file also has the local advantage of serving as
release documentation for what we expect to fail and why.
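
A minimal sketch of how such a manifest could be checked.  The entry
syntax used here ("attrs | RESULT: name", with "flaky" and
"expire=YYYYMMDD" attributes) is an assumption made for illustration,
as are the function names and the sample data:

```python
import datetime
import re

# Assumed entry shape: optional "attr,attr |" prefix, then a .sum result line.
ENTRY_RE = re.compile(
    r"^(?:(?P<attrs>[^|]*)\|\s*)?(?P<result>(?:FAIL|XPASS|UNRESOLVED): .+)$"
)


def parse_manifest(lines, today):
    """Return (expected, flaky) sets of result lines; expired entries are dropped."""
    expected, flaky = set(), set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        m = ENTRY_RE.match(line)
        if not m:
            continue
        attrs = [a.strip() for a in (m.group("attrs") or "").split(",") if a.strip()]
        expired = False
        for attr in attrs:
            if attr.startswith("expire="):
                stamp = attr[len("expire="):]
                expiry = datetime.datetime.strptime(stamp, "%Y%m%d").date()
                expired = today > expiry
        if expired:
            continue  # an expired entry no longer excuses the failure
        (flaky if "flaky" in attrs else expected).add(m.group("result"))
    return expected, flaky


def check(manifest_lines, actual_results, today):
    """Return (unexpected failures, expected failures that did not occur)."""
    expected, flaky = parse_manifest(manifest_lines, today)
    actual = set(actual_results) - flaky  # flaky results are ignored either way
    return sorted(actual - expected), sorted(expected - actual)


manifest = [
    "# known failures for a hypothetical target",
    "FAIL: gcc.dg/a.c execution test",
    "flaky | FAIL: gcc.dg/b.c execution test",
    "expire=20110901 | FAIL: gcc.dg/c.c execution test",
]
actual = [
    "FAIL: gcc.dg/b.c execution test",
    "FAIL: gcc.dg/c.c execution test",
    "FAIL: gcc.dg/d.c execution test",
]
unexpected, fixed = check(manifest, actual, datetime.date(2011, 9, 23))
print(unexpected, fixed)
```

In this sample the expired entry and the unlisted failure are both
reported as unexpected, the flaky one is ignored, and the entry that
no longer fails is reported separately so it can be pruned from the
manifest.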


Diego.

Thread overview: 21+ messages
2011-09-07 15:28 RFC: Improving support for known testsuite failures Diego Novillo
2011-09-07 18:43 ` Andreas Jaeger
2011-09-07 19:57 ` Joseph S. Myers
2011-09-08  8:31 ` Richard Guenther
2011-09-08 11:05   ` Diego Novillo
2011-09-08 11:16     ` Richard Guenther
2011-09-08 11:34       ` Diego Novillo
2011-09-08 11:49         ` Richard Guenther
2011-09-08 12:14           ` Diego Novillo
2011-09-08 12:20             ` Richard Guenther
2011-09-08 12:26               ` Diego Novillo
2011-09-08 12:30                 ` Richard Guenther
2011-09-08 12:33                   ` Diego Novillo
2011-09-08 13:24         ` Richard Earnshaw
2011-09-08 13:55           ` Diego Novillo
2011-09-08 14:03             ` Richard Earnshaw
2011-09-08 16:41           ` Joseph S. Myers
2011-09-23  0:07     ` Hans-Peter Nilsson
2011-09-23 13:11       ` Diego Novillo
2011-09-08 16:39   ` Joseph S. Myers
2011-09-08 22:27   ` Michael Hope
