public inbox for gcc-bugs@sourceware.org
* [Bug testsuite/110316] New: [14 regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically
@ 2023-06-20  0:21 seurer at gcc dot gnu.org
  2023-06-20  6:58 ` [Bug middle-end/110316] [10/11/12/13/14 Regression] " pinskia at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: seurer at gcc dot gnu.org @ 2023-06-20  0:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110316

            Bug ID: 110316
           Summary: [14 regression] g++.dg/ext/timevar1.C and timevar2.C
                    fail erratically
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: testsuite
          Assignee: unassigned at gcc dot gnu.org
          Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

I unfortunately do not have a clear starting point for this, but recently the
g++.dg/ext/timevar1.C and timevar2.C tests began failing on some runs and
working on the next.  It is happening on one of our newer/faster machines, but
it did not use to fail there.

The last run where I did not see any failures (for this one or previously) was
47fa3cef59a031f1b0fdce309ff634fab717606d, r14-1906-g47fa3cef59a031

The first run with failures was 0f9bb3e7a4aab95fd449f60b5f891ed9a6e5f352,
r14-1910-g0f9bb3e7a4aab9

I don't see anything in that range that might cause this, though.

FAIL: g++.dg/ext/timevar1.C  -std=gnu++17 (internal compiler error: in
validate_phases, at timevar.cc:626)
FAIL: g++.dg/ext/timevar1.C  -std=gnu++17 (test for excess errors)
FAIL: g++.dg/ext/timevar2.C  -std=gnu++20 (internal compiler error: in
validate_phases, at timevar.cc:626)
FAIL: g++.dg/ext/timevar2.C  -std=gnu++20 (test for excess errors)

spawn -ignore SIGHUP
/home/gccbuild/build/nightly/build-gcc-trunk/gcc/testsuite/g++1/../../xg++
-B/home/gccbuild/build/nightly/build-gcc-trunk/gcc/testsuite/g++1/../../
/home/gccbuild/gcc_trunk_git/gcc/gcc/testsuite/g++.dg/ext/timevar2.C
-fdiagnostics-plain-output -nostdinc++
-I/home/gccbuild/build/nightly/build-gcc-trunk/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/powerpc64le-unknown-linux-gnu
-I/home/gccbuild/build/nightly/build-gcc-trunk/powerpc64le-unknown-linux-gnu/libstdc++-v3/include
-I/home/gccbuild/gcc_trunk_git/gcc/libstdc++-v3/libsupc++
-I/home/gccbuild/gcc_trunk_git/gcc/libstdc++-v3/include/backward
-I/home/gccbuild/gcc_trunk_git/gcc/libstdc++-v3/testsuite/util
-fmessage-length=0 -std=gnu++98 -ftime-report -S -o timevar2.s

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.01 (100%)  2835k ( 81%)
 phase parsing                      :   0.01 (100%)   0.00 (  0%)   0.00 (  0%)   603k ( 17%)
 |name lookup                       :   0.00 (  0%)   0.00 (  0%)   0.01 (100%)   174k (  5%)
 parser (global)                    :   0.01 (100%)   0.00 (  0%)   0.00 (  0%)   587k ( 17%)
 TOTAL                              :   0.01          0.00          0.01          3496k
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
Timing error: total of phase timers exceeds total time.
wall    1.000002666800281492e-02 > 1.000000150810810562e-02
internal compiler error: in validate_phases, at timevar.cc:626
0x10ff92bb toplev::~toplev()
        /home/gccbuild/gcc_trunk_git/gcc/gcc/toplev.cc:2155
xg++: internal compiler error: Segmentation fault signal terminated program cc1plus

Note that the two phase timings are both 0.01 and both report 100%, while the
total time is also 0.01.  Is this maybe a rounding issue?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug middle-end/110316] [10/11/12/13/14 Regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically
  2023-06-20  0:21 [Bug testsuite/110316] New: [14 regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically seurer at gcc dot gnu.org
@ 2023-06-20  6:58 ` pinskia at gcc dot gnu.org
  2023-07-07 10:45 ` [Bug middle-end/110316] [11/12/13/14 " rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-20  6:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110316

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Target Milestone|---                         |10.5
      Known to fail|                            |4.8.0
   Last reconfirmed|                            |2023-06-20
          Component|testsuite                   |middle-end
            Summary|[14 regression]             |[10/11/12/13/14 Regression]
                   |g++.dg/ext/timevar1.C and   |g++.dg/ext/timevar1.C and
                   |timevar2.C fail erratically |timevar2.C fail erratically
           Keywords|                            |ice-on-valid-code
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |4.7.0

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I would not call this a regression from GCC 13 but from much earlier,
r0-117422-ga910399dfd44.

The check is definitely wrong, since the wall difference is less than
ticks_to_msec here.

Most likely the check should be rather:
  if (phase_user > total->user + toleranceu
      || phase_sys > total->sys + tolerances
      || phase_wall > total->wall + tolerancew
      || phase_ggc_mem > total->ggc_mem * tolerance)

Where toleranceu, etc. should be defined as:
  tolerancew = total->wall * (tolerance - 1);
  toleranceu = total->user * (tolerance - 1);
  tolerances = total->sys * (tolerance - 1);
#ifdef USE_TIMES
  if (tolerancew < ticks_to_msec)
    tolerancew = ticks_to_msec;
  if (toleranceu < ticks_to_msec)
    toleranceu = ticks_to_msec;
  if (tolerances < ticks_to_msec)
    tolerances = ticks_to_msec;
#endif
#ifdef USE_GETRUSAGE
  toleranceu = total->user * (tolerance - 1);
  tolerances = total->sys * (tolerance - 1);
#endif
#ifdef USE_CLOCK
  if (toleranceu < clocks_to_msec)
    toleranceu = clocks_to_msec;
#endif

The reasoning is that the tolerance should be at minimum one tick.


* [Bug middle-end/110316] [11/12/13/14 Regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically
  2023-06-20  0:21 [Bug testsuite/110316] New: [14 regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically seurer at gcc dot gnu.org
  2023-06-20  6:58 ` [Bug middle-end/110316] [10/11/12/13/14 Regression] " pinskia at gcc dot gnu.org
@ 2023-07-07 10:45 ` rguenth at gcc dot gnu.org
  2023-07-21 17:47 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-07 10:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110316

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.5                        |11.5

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 10 branch is being closed.


* [Bug middle-end/110316] [11/12/13/14 Regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically
  2023-06-20  0:21 [Bug testsuite/110316] New: [14 regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically seurer at gcc dot gnu.org
  2023-06-20  6:58 ` [Bug middle-end/110316] [10/11/12/13/14 Regression] " pinskia at gcc dot gnu.org
  2023-07-07 10:45 ` [Bug middle-end/110316] [11/12/13/14 " rguenth at gcc dot gnu.org
@ 2023-07-21 17:47 ` pinskia at gcc dot gnu.org
  2023-08-04 10:29 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-21 17:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110316

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
See https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625180.html thread too
which is exactly about this issue.

Basically what is happening is that after inlining, a fused multiply-subtract
is now being used, and that causes issues. This is why it is not seen on x86
(without using --with-cpu=).


* [Bug middle-end/110316] [11/12/13/14 Regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically
  2023-06-20  0:21 [Bug testsuite/110316] New: [14 regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically seurer at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-07-21 17:47 ` pinskia at gcc dot gnu.org
@ 2023-08-04 10:29 ` cvs-commit at gcc dot gnu.org
  2024-02-21  7:12 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-08-04 10:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110316

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Matthew Malcomson <matmal01@gcc.gnu.org>:

https://gcc.gnu.org/g:0782b01c9ea43d43648071faa9c65a101f5068a2

commit r14-2986-g0782b01c9ea43d43648071faa9c65a101f5068a2
Author: Matthew Malcomson <matthew.malcomson@arm.com>
Date:   Fri Aug 4 11:26:47 2023 +0100

    mid-end: Use integral time intervals in timevar.cc

    On some AArch64 bootstrapped builds, we were getting a flaky test
    because the floating point operations in `get_time` were being fused
    with the floating point operations in `timevar_accumulate`.

    This meant that the rounding behaviour of our multiplication with
    `ticks_to_msec` was different when used in `timer::start` and when
    performed in `timer::stop`.  These extra inaccuracies led to the
    testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.

    ------------------------------
    Avoiding the inlining which was agreed to be undesirable.  Three
    alternative approaches:
    1) Use `-ffp-contract=on` to avoid this particular optimisation.
    2) Adjusting the code so that the "tolerance" is always of the order of
       a "tick".
    3) Recording times and elapsed differences in integral values.
       - Could be in terms of a standard measurement (e.g. nanoseconds or
         microseconds).
       - Could be in terms of whatever integral value ("ticks" /
         seconds&microseconds / "clock ticks") is returned from the syscall
         chosen at configure time.

    While `-ffp-contract=on` removes the problem that I bumped into, there
    has been a similar bug on x86 that was to do with a different floating
    point problem that also happens after `get_time` and
    `timevar_accumulate` both being inlined into the same function.  Hence
    it seems worth choosing a different approach.

    Of the two other solutions, recording measurements in integral values
    seems the most robust against slightly "off" measurements being
    presented to the user -- even though it could avoid the ICE that creates
    a flaky test.

    I considered storing time in whatever units our syscall returns and
    normalising them at the time we print out rather than normalising them
    to nanoseconds at the point we record our "current time".  The logic
    being that normalisation could have some rounding effect (e.g. if
    TICKS_PER_SECOND is 3) that would be taken into account in calculations.

    I decided against it in order to give the values recorded in
    `timevar_time_def` some interpretive value so it's easier to read the
    code.  Compared to the small rounding that would represent a tiny amount
    of time and AIUI cannot trigger the same kind of ICEs as we are
    attempting to fix, said interpretive value seems more valuable.

    Recording time in microseconds seemed reasonable since all obvious
    values for ticks and `getrusage` are at microsecond granularity or less
    precise.  That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are both
    variables given to us by the host system, I was not sure of that enough
    to make this decision.

    ------------------------------
    timer::all_zero is ignoring rows which are inconsequential to the user
    and would be printed out as all zeros.  Since upon printing rows we
    convert to the same double value and print out the same precision as
    before, we return true/false based on the same amount of time as before.

    timer::print_row casts to a floating point measurement in units of
    seconds as was printed out before.

    timer::validate_phases -- I'm printing out nanoseconds here rather than
    floating point seconds since this is an error message for when things
    have "gone wrong"; printing out the actual nanoseconds that have been
    recorded seems like the best approach.
    N.b. since we now print out nanoseconds instead of floating point value
    the padding requirements are different.  Originally we were padding to
    24 characters and printing 18 decimal places.  This looked odd with the
    now visually smaller values getting printed.  I judged 13 characters
    (corresponding to 2 hours) to be a reasonable point at which our
    alignment could start to degrade and this provides a more compact output
    for the majority of cases (checked by triggering the error case via
    GDB).

    ------------------------------
    N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC".  I believe this
    would fit in an integer on all hosts that GCC supports, but am not
    certain there are not strange integer sizes we support hence am pointing
    it out for special attention during review.

    ------------------------------
    No expected change in generated code.
    Bootstrapped and regtested on AArch64 with no regressions.

    Hope this is acceptable -- I had originally planned to use
    `-ffp-contract` as agreed until I saw mention of the old x86 bug in the
    same area which was not to do with floating point contraction of
    operations (PR 99903).

    gcc/ChangeLog:

            PR middle-end/110316
            PR middle-end/99903
            * timevar.cc (NANOSEC_PER_SEC, TICKS_TO_NANOSEC,
            CLOCKS_TO_NANOSEC, nanosec_to_floating_sec, percent_of): New.
            (TICKS_TO_MSEC, CLOCKS_TO_MSEC): Remove these macros.
            (timer::validate_phases): Use integral arithmetic to check
            validity.
            (timer::print_row, timer::print): Convert from integral
            nanoseconds to floating point seconds before printing.
            (timer::all_zero): Change limit to nanosec count instead of
            fractional count of seconds.
            (make_json_for_timevar_time_def): Convert from integral
            nanoseconds to floating point seconds before recording.
            * timevar.h (struct timevar_time_def): Update all measurements
            to use uint64_t nanoseconds rather than seconds stored in a
            double.


* [Bug middle-end/110316] [11/12/13/14 Regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically
  2023-06-20  0:21 [Bug testsuite/110316] New: [14 regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically seurer at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-08-04 10:29 ` cvs-commit at gcc dot gnu.org
@ 2024-02-21  7:12 ` pinskia at gcc dot gnu.org
  2024-03-20 19:50 ` seurer at gcc dot gnu.org
  2024-03-20 20:32 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-21  7:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110316

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.5                        |14.0
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Fixed.


* [Bug middle-end/110316] [11/12/13/14 Regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically
  2023-06-20  0:21 [Bug testsuite/110316] New: [14 regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically seurer at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-02-21  7:12 ` pinskia at gcc dot gnu.org
@ 2024-03-20 19:50 ` seurer at gcc dot gnu.org
  2024-03-20 20:32 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: seurer at gcc dot gnu.org @ 2024-03-20 19:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110316

--- Comment #6 from seurer at gcc dot gnu.org ---
Was this only fixed for trunk?  I started seeing it for gcc 13:

g:5b928badac560ad48e0e9fc480096ff396d9d9c6, r13-8468-g5b928badac560a
make  -k check-gcc RUNTESTFLAGS="dg.exp=g++.dg/ext/timevar1.C"
FAIL: g++.dg/ext/timevar1.C  -std=gnu++98 (internal compiler error: in
validate_phases, at timevar.cc:626)
FAIL: g++.dg/ext/timevar1.C  -std=gnu++98 (test for excess errors)
# of expected passes            7
# of unexpected failures        2


* [Bug middle-end/110316] [11/12/13/14 Regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically
  2023-06-20  0:21 [Bug testsuite/110316] New: [14 regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically seurer at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-03-20 19:50 ` seurer at gcc dot gnu.org
@ 2024-03-20 20:32 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-20 20:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110316

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to seurer from comment #6)
> Was this only fixed for trunk?  I started seeing it for gcc 13:

Yes, it was only fixed on trunk. The ICE only happens when -ftime-report is
used, so most folks won't run into the issue, and it only happens when a short
time is spent in many of the passes, so it won't show up when folks use
-ftime-report to see where the time is mostly spent. So backporting seems low
priority, except to get clean testsuite results.

