public inbox for gcc-bugs@sourceware.org
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug testsuite/66005] libgomp make check time is excessive
Date: Wed, 28 Jun 2023 11:39:49 +0000
Message-ID: <bug-66005-4-PPduaEQdFK@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-66005-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66005

--- Comment #24 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-13 branch has been updated by Thomas Schwinge
<tschwinge@gcc.gnu.org>:

https://gcc.gnu.org/g:3840d5ccf750b6a059258be7faa4a3fce85a6fa6

commit r13-7494-g3840d5ccf750b6a059258be7faa4a3fce85a6fa6
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Tue Apr 25 23:53:12 2023 +0200

    Support parallel testing in libgomp, part II [PR66005]

    ..., and enable it if 'flock' is available, for serializing execution testing.
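
    As a minimal illustration of the 'flock' technique (the lock file path
    here is hypothetical, not the one the testsuite actually uses), 'flock'
    runs a command under an exclusive lock on a file:

        $ flock /tmp/libgomp-test.lock ./a.out

    Concurrent invocations on the same lock file block until the current
    holder exits, so compilation can proceed in parallel across slots while
    execution of test programs is serialized.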

    Regarding the default of 19 parallel slots: this turned out to be a local
    minimum for wall time when testing on:

        $ uname -srvi
        Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64
        $ grep '^model name' < /proc/cpuinfo | uniq -c
             32 model name      : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz

    ... in two configurations: case (a), standard configuration, no offloading
    configured; case (b), offloading for GCN and nvptx configured, but no
    devices available.  For both cases, the default plus '-m32' variant.

        $ \time make check-target-libgomp RUNTESTFLAGS="--target_board=unix\{,-m32\}"

    Case (a), baseline:

        6432.23user 332.38system 47:32.28elapsed 237%CPU (0avgtext+0avgdata 505044maxresident)k
        6382.43user 319.21system 47:06.04elapsed 237%CPU (0avgtext+0avgdata 505172maxresident)k

    This is what people have been complaining about, rightly so, in
    <https://gcc.gnu.org/PR66005> "libgomp make check time is excessive" and
    elsewhere.

    Case (a), parallelized:

        -j12 GCC_TEST_PARALLEL_SLOTS=10
        3088.49user 267.74system 6:43.82elapsed 831%CPU (0avgtext+0avgdata 505188maxresident)k
        -j15 GCC_TEST_PARALLEL_SLOTS=15
        3308.08user 294.79system 5:56.04elapsed 1011%CPU (0avgtext+0avgdata 505360maxresident)k
        -j17 GCC_TEST_PARALLEL_SLOTS=17
        3539.93user 298.99system 5:27.86elapsed 1170%CPU (0avgtext+0avgdata 505112maxresident)k
        -j18 GCC_TEST_PARALLEL_SLOTS=18
        3697.50user 317.18system 5:14.63elapsed 1275%CPU (0avgtext+0avgdata 505360maxresident)k
        -j19 GCC_TEST_PARALLEL_SLOTS=19
        3765.94user 324.27system 5:13.22elapsed 1305%CPU (0avgtext+0avgdata 505128maxresident)k
        -j20 GCC_TEST_PARALLEL_SLOTS=20
        3684.66user 312.32system 5:15.26elapsed 1267%CPU (0avgtext+0avgdata 505100maxresident)k
        -j23 GCC_TEST_PARALLEL_SLOTS=23
        4040.59user 347.10system 5:29.12elapsed 1333%CPU (0avgtext+0avgdata 505200maxresident)k
        -j26 GCC_TEST_PARALLEL_SLOTS=26
        3973.24user 377.96system 5:24.70elapsed 1340%CPU (0avgtext+0avgdata 505160maxresident)k
        -j32 GCC_TEST_PARALLEL_SLOTS=32
        4004.42user 346.10system 5:16.11elapsed 1376%CPU (0avgtext+0avgdata 505160maxresident)k

    Yay!

    Case (b), baseline; 2+ h:

        7227.58user 700.54system 2:14:33elapsed 98%CPU (0avgtext+0avgdata 994264maxresident)k

    Case (b), parallelized:

        -j12 GCC_TEST_PARALLEL_SLOTS=10
        7377.46user 777.52system 16:06.63elapsed 843%CPU (0avgtext+0avgdata 994344maxresident)k
        -j15 GCC_TEST_PARALLEL_SLOTS=15
        8019.18user 721.42system 12:13.56elapsed 1191%CPU (0avgtext+0avgdata 994228maxresident)k
        -j17 GCC_TEST_PARALLEL_SLOTS=17
        8530.11user 716.95system 10:45.92elapsed 1431%CPU (0avgtext+0avgdata 994176maxresident)k
        -j18 GCC_TEST_PARALLEL_SLOTS=18
        8776.79user 645.89system 10:27.20elapsed 1502%CPU (0avgtext+0avgdata 994248maxresident)k
        -j19 GCC_TEST_PARALLEL_SLOTS=19
        9332.37user 641.76system 10:15.09elapsed 1621%CPU (0avgtext+0avgdata 994260maxresident)k
        -j20 GCC_TEST_PARALLEL_SLOTS=20
        9609.54user 789.88system 10:26.94elapsed 1658%CPU (0avgtext+0avgdata 994284maxresident)k
        -j23 GCC_TEST_PARALLEL_SLOTS=23
        10362.40user 911.14system 10:44.47elapsed 1749%CPU (0avgtext+0avgdata 994208maxresident)k
        -j26 GCC_TEST_PARALLEL_SLOTS=26
        11159.44user 850.99system 11:09.25elapsed 1794%CPU (0avgtext+0avgdata 994256maxresident)k
        -j32 GCC_TEST_PARALLEL_SLOTS=32
        11453.50user 939.52system 11:00.38elapsed 1876%CPU (0avgtext+0avgdata 994240maxresident)k

    On my Dell Precision 7530 laptop:

        $ uname -srvi
        Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
        $ grep '^model name' < /proc/cpuinfo | uniq -c
             12 model name      : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
        $ nvidia-smi -L
        GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)

    ... in two configurations: case (c), standard configuration, no offloading
    configured; case (d), offloading for nvptx configured and device available.
    For both cases, only the default variant, no '-m32'.

        $ \time make check-target-libgomp

    Case (c), baseline; roughly half of case (a) (just one variant):

        1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k
        1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k

    Case (c), parallelized:

        -j12 GCC_TEST_PARALLEL_SLOTS=2
        1143.83user 110.76system 10:20.46elapsed 202%CPU (0avgtext+0avgdata 505216maxresident)k
        -j12 GCC_TEST_PARALLEL_SLOTS=6
        1737.08user 143.94system 4:59.48elapsed 628%CPU (0avgtext+0avgdata 505200maxresident)k
        1730.31user 143.02system 4:58.75elapsed 627%CPU (0avgtext+0avgdata 505152maxresident)k
        -j12 GCC_TEST_PARALLEL_SLOTS=8
        2192.63user 169.34system 4:52.96elapsed 806%CPU (0avgtext+0avgdata 505216maxresident)k
        2219.04user 167.67system 4:53.19elapsed 814%CPU (0avgtext+0avgdata 505152maxresident)k
        -j12 GCC_TEST_PARALLEL_SLOTS=10
        2463.93user 184.98system 4:48.39elapsed 918%CPU (0avgtext+0avgdata 505200maxresident)k
        2455.62user 183.68system 4:47.40elapsed 918%CPU (0avgtext+0avgdata 505216maxresident)k
        -j12 GCC_TEST_PARALLEL_SLOTS=12
        2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k
        2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k
        -j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe]
        2613.18user 199.51system 4:44.06elapsed 990%CPU (0avgtext+0avgdata 505216maxresident)k

    Case (d), baseline (compared to case (b): only nvptx offloading
    compilation, but also nvptx offloading execution); ~1 h:

        2841.93user 653.68system 1:02:26elapsed 93%CPU (0avgtext+0avgdata 909792maxresident)k
        2842.03user 654.39system 1:02:24elapsed 93%CPU (0avgtext+0avgdata 909880maxresident)k

    Case (d), parallelized:

        -j12 GCC_TEST_PARALLEL_SLOTS=2
        2856.39user 606.87system 33:58.64elapsed 169%CPU (0avgtext+0avgdata 909948maxresident)k
        -j12 GCC_TEST_PARALLEL_SLOTS=6
        3444.90user 666.86system 18:37.57elapsed 367%CPU (0avgtext+0avgdata 909856maxresident)k
        3462.13user 667.13system 18:36.87elapsed 369%CPU (0avgtext+0avgdata 909872maxresident)k
        -j12 GCC_TEST_PARALLEL_SLOTS=8
        3929.74user 716.22system 18:02.36elapsed 429%CPU (0avgtext+0avgdata 909832maxresident)k
        -j12 GCC_TEST_PARALLEL_SLOTS=10
        4152.84user 736.16system 17:43.05elapsed 459%CPU (0avgtext+0avgdata 909872maxresident)k
        -j12 GCC_TEST_PARALLEL_SLOTS=12
        4209.60user 749.00system 17:35.20elapsed 469%CPU (0avgtext+0avgdata 909840maxresident)k
        -j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe]
        4255.54user 756.78system 17:29.06elapsed 477%CPU (0avgtext+0avgdata 909868maxresident)k

    Worth noting is that with nvptx offloading, there is one execution test
    case that times out ('libgomp.fortran/reverse-offload-5.f90').  This
    effectively stalls progress for almost 5 min: other execution test cases
    quickly queue up on the lock for all parallel slots.  That's working as
    expected; just noting this as it accordingly skews the wall time numbers.
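
    A sketch of that queuing effect (lock path and timings made up for
    illustration): each waiting slot blocks in 'flock' until the long-running
    holder releases the lock, so a single timeout can idle all other slots.

        $ flock /tmp/demo.lock sleep 300 &   # stands in for the timing-out test
        $ flock /tmp/demo.lock ./a.out       # blocks until the lock is released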

            PR testsuite/66005
            libgomp/
            * configure.ac: Look for 'flock'.
            * testsuite/Makefile.am (gcc_test_parallel_slots): Enable parallel
            testing.
            * testsuite/config/default.exp: Don't 'load_lib "standard.exp"'
            here...
            * testsuite/lib/libgomp.exp: ... but here, instead.
            (libgomp_load): Override for parallel testing.
            * testsuite/libgomp-site-extra.exp.in (FLOCK): Set.
            * configure: Regenerate.
            * Makefile.in: Regenerate.
            * testsuite/Makefile.in: Regenerate.

    (cherry picked from commit 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba)
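
    For reference, the timings above correspond to invocations of the
    following shape, with the '-jN'/'GCC_TEST_PARALLEL_SLOTS=N' pairs listed
    next to each measurement appended; without an explicit
    GCC_TEST_PARALLEL_SLOTS, the new default of 19 slots applies:

        $ make -j19 check-target-libgomp
        $ make -j12 check-target-libgomp GCC_TEST_PARALLEL_SLOTS=10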

Thread overview: 33+ messages
2015-05-04 13:38 [Bug libgomp/66005] New: " ro at gcc dot gnu.org
2022-01-21 20:42 ` [Bug libgomp/66005] " belyshev at depni dot sinp.msu.ru
2022-02-08 14:15 ` tschwinge at gcc dot gnu.org
2022-05-29  5:05 ` egallager at gcc dot gnu.org
2023-05-04 15:29 ` tschwinge at gcc dot gnu.org
2023-05-05  9:05 ` tschwinge at gcc dot gnu.org
2023-05-15 10:11 ` cvs-commit at gcc dot gnu.org
2023-05-15 10:12 ` cvs-commit at gcc dot gnu.org
2023-05-15 10:31 ` [Bug testsuite/66005] " tschwinge at gcc dot gnu.org
2023-05-15 11:15 ` ro at CeBiTec dot Uni-Bielefeld.DE
2023-05-15 11:38 ` jakub at gcc dot gnu.org
2023-05-15 14:22 ` tschwinge at gcc dot gnu.org
2023-05-15 18:35 ` tschwinge at gcc dot gnu.org
2023-05-15 20:06 ` egallager at gcc dot gnu.org
2023-05-15 20:22 ` jakub at gcc dot gnu.org
2023-05-15 20:42 ` tschwinge at gcc dot gnu.org
2023-05-16  7:46 ` ro at CeBiTec dot Uni-Bielefeld.DE
2023-05-16  7:57 ` jakub at gcc dot gnu.org
2023-06-02  7:51 ` cvs-commit at gcc dot gnu.org
2023-06-02 10:07 ` tschwinge at gcc dot gnu.org
2023-06-02 10:16 ` iains at gcc dot gnu.org
2023-06-05 14:52 ` tschwinge at gcc dot gnu.org
2023-06-23 12:51 ` tschwinge at gcc dot gnu.org
2023-06-23 13:42 ` jakub at gcc dot gnu.org
2023-06-28 11:39 ` cvs-commit at gcc dot gnu.org
2023-06-28 11:39 ` cvs-commit at gcc dot gnu.org [this message]
2023-06-28 11:39 ` cvs-commit at gcc dot gnu.org
2023-06-28 11:40 ` cvs-commit at gcc dot gnu.org
2023-06-28 11:41 ` cvs-commit at gcc dot gnu.org
2023-06-28 11:41 ` cvs-commit at gcc dot gnu.org
2023-06-28 11:42 ` cvs-commit at gcc dot gnu.org
2023-06-28 11:42 ` cvs-commit at gcc dot gnu.org
2023-06-28 11:42 ` cvs-commit at gcc dot gnu.org
