From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/64928] [8/9/10/11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
Date: Wed, 10 Mar 2021 09:47:18 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 4.9.2
X-Bugzilla-Keywords: compile-time-hog, memory-hog
X-Bugzilla-Severity: normal
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 8.5

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

--- Comment #36 from Richard Biener ---
So the issue is still the same. One thing I noticed is that store-motion also
adds a flag for each counter update to avoid introducing store-data-races.
-fallow-store-data-races mitigates that part and speeds up the compilation
quite a bit. In case there are threads involved you'd want
-fprofile-update=atomic, which then causes store-motion to give up, and the
compile-time is great overall.
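To illustrate the flag mentioned above, here is a hand-written sketch in C of what store motion with a guard flag looks like at source level. It is an assumption-laden illustration, not output from GCC: the names `counter`, `count_direct`, `count_store_motion`, `tmp` and `stored` are all hypothetical, and the real transformation happens on GIMPLE, once per profile counter.

```c
#include <stddef.h>

/* Hypothetical stand-in for a single gcov arc counter. */
static long counter;

/* Shape of the instrumented code: the counter is updated in memory
   every time the arc is taken. */
long count_direct(const int *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (v[i] > 0)
            counter++;
    return counter;
}

/* Rough source-level picture of store motion when store data races
   must not be introduced: the value is kept in a temporary inside the
   loop, and a flag remembers whether the original code would have
   stored at all. The store after the loop is done only if it did. */
long count_store_motion(const int *v, size_t n)
{
    long tmp = counter;   /* load hoisted out of the loop */
    int stored = 0;       /* one such flag per moved store */

    for (size_t i = 0; i < n; i++)
        if (v[i] > 0) {
            tmp++;
            stored = 1;
        }

    if (stored)           /* no store on paths that never stored */
        counter = tmp;
    return counter;
}
```

Both functions return the same count; the second one just shows the extra temporary and flag that each moved counter needs, which is the part -fallow-store-data-races removes.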

The original trigger of the regression is likely the marking of the profile
counters as not being aliased. We might want to introduce another flag to
tell that store-data-races for the particular decl are not a consideration
(maybe even have some user-visible attribute for this).

Otherwise re-confirmed (I stripped options down to -O -fPIC -fprofile-arcs
-ftest-coverage):

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-2.o1-fib-2.i
1.84user 0.05system 0:01.90elapsed 99%CPU (0avgtext+0avgdata 160764maxresident)k
0inputs+0outputs (0major+58129minor)pagefaults 0swaps

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-3.o1-fib-3.i
10.15user 0.17system 0:10.32elapsed 99%CPU (0avgtext+0avgdata 726688maxresident)k
0inputs+0outputs (0major+265008minor)pagefaults 0swaps

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-4.o1-fib-4.i
43.60user 1.06system 0:44.68elapsed 99%CPU (0avgtext+0avgdata 6107260maxresident)k
0inputs+0outputs (0major+1765217minor)pagefaults 0swaps

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i
gcc: fatal error: Killed signal terminated program cc1
compilation terminated.
Command exited with non-zero status 1
143.09user 3.93system 2:28.29elapsed 99%CPU (0avgtext+0avgdata 24636148maxresident)k
37504inputs+0outputs (31major+6133278minor)pagefaults 0swaps

On the last one, which runs out of memory, adding -fallow-store-data-races
gives

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fallow-store-data-races
123.06user 0.45system 2:03.59elapsed 99%CPU (0avgtext+0avgdata 1777700maxresident)k
57304inputs+0outputs (68major+535127minor)pagefaults 0swaps

and -fprofile-update=atomic

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fprofile-update=atomic
0.61user 0.02system 0:00.63elapsed 100%CPU (0avgtext+0avgdata 73236maxresident)k
72inputs+0outputs (0major+18284minor)pagefaults 0swaps

and -fno-tree-loop-im

rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O -fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fno-tree-loop-im
1.06user 0.01system 0:01.07elapsed 99%CPU (0avgtext+0avgdata 90672maxresident)k
0inputs+0outputs (0major+24331minor)pagefaults 0swaps

I still wonder if you can produce an even smaller testcase where visualizing
the CFG is possible. Unfortunately the source is mechanically generated and
following it is hard. Something like a testcase that retains the basic
structure but ends up with just a few (2, less than 10) computed gotos?
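
For readers unfamiliar with the construct, the following is a minimal sketch of a function with a handful of computed gotos, using the GNU C labels-as-values extension. It is purely illustrative: it is not derived from the reporter's fib-N.i files, it is not claimed to reproduce the slowdown, and the names `run_ops`, `dispatch` and `DISPATCH` are made up for this example.

```c
/* Tiny interpreter loop with three computed gotos (GNU C extension:
   &&label takes a label's address, goto *ptr jumps to it). */
int run_ops(const unsigned char *ops, int n)
{
    /* Hypothetical jump table: 0 = increment, 1 = decrement, 2 = halt. */
    static void *const dispatch[] = { &&op_inc, &&op_dec, &&op_halt };
    int acc = 0;
    int pc = 0;

/* Each expansion is one computed goto; every label in the table is a
   possible successor of it in the CFG. */
#define DISPATCH()                          \
    do {                                    \
        if (pc >= n)                        \
            goto op_halt;                   \
        goto *dispatch[ops[pc++] % 3];      \
    } while (0)

    DISPATCH();

op_inc:
    acc++;
    DISPATCH();

op_dec:
    acc--;
    DISPATCH();

op_halt:
    return acc;

#undef DISPATCH
}
```

With only three dispatch sites and three targets, the CFG of such a function is small enough to dump and view by hand, which is the property asked for above.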