From mboxrd@z Thu Jan  1 00:00:00 1970
From: "aldyh at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/103409] [12 Regression] 18% SPEC2017 WRF compile-time regression with -O2 -flto since r12-5228-gb7a23949b0dcc4205fcc2be6b84b91441faa384d
Date: Mon, 29 Nov 2021 14:22:42 +0000
X-Bugzilla-Keywords: compile-time-hog

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409

--- Comment #9 from Aldy Hernandez ---
There's definitely something in the threader, but I'm not sure it's the cause
of all the regression.

For the record, I've reproduced on ppc64le with a spec .cfg file having:

OPTIMIZE = -O2 -flto=100 -save-temps -ftime-report -v -fno-checking

The slow wrf_r.ltransNN.o files that dominate the compilation and are taking
more than 2-3 seconds are (42, 76, and 24).
I've distilled -ftime-report for VRP and jump threading, which usually go hand
in hand now that VRP2 runs with ranger:

dumping.42: tree VRP                 :   13.70 ( 3%)  0.08 ( 2%)    13.73 ( 3%)    45M ( 4%)
dumping.42: backwards jump threading :   26.68 ( 5%)  0.00 ( 0%)    26.72 ( 5%)  3609k ( 0%)
dumping.42: TOTAL                    :  524.00        3.31         527.30        1277M
dumping.76: tree VRP                 :   38.30 (13%)  0.03 ( 2%)    38.31 (13%)    19M ( 2%)
dumping.76: backwards jump threading :   47.38 (17%)  0.01 ( 1%)    47.37 (16%)  1671k ( 0%)
dumping.76: TOTAL                    :  286.03        1.79         287.82        1173M
dumping.24: tree VRP                 :   87.43 ( 8%)  0.07 ( 2%)    87.53 ( 8%)    58M ( 3%)
dumping.24: backwards jump threading :  129.81 (12%)  0.00 ( 0%)   129.81 (12%)  8986k ( 0%)
dumping.24: TOTAL                    : 1042.37        3.58        1045.93        2325M

Threading is usually more expensive than VRP because it tries candidates over
and over, but it's not meant to be orders of magnitude slower.

Prior to the bisected patch in r12-5228, we had:

dumping.42: tree VRP                 :   14.58 ( 3%)  0.07 ( 2%)    14.62 ( 3%)    45M ( 4%)
dumping.42: backwards jump threading :   13.88 ( 3%)  0.00 ( 0%)    13.89 ( 3%)  3609k ( 0%)
dumping.42: TOTAL                    :  484.12        3.06         487.18        1277M
dumping.76: tree VRP                 :   37.68 (13%)  0.04 ( 2%)    37.79 (13%)    19M ( 2%)
dumping.76: backwards jump threading :   45.50 (15%)  0.03 ( 2%)    45.52 (15%)  1671k ( 0%)
dumping.76: TOTAL                    :  293.74        1.81         295.55        1173M
dumping.24: tree VRP                 :   94.27 ( 9%)  0.11 ( 3%)    94.39 ( 9%)    58M ( 3%)
dumping.24: backwards jump threading :  102.63 (10%)  0.02 ( 0%)   102.67 (10%)  8986k ( 0%)
dumping.24: TOTAL                    : 1021.66        4.28        1025.92        2325M

So at least for ltrans42, there's a big slowdown with this patch.  Before,
threading was 4.80% faster than VRP, whereas now it's 94.7% slower.

I have a patch for the above slowdown, but I wouldn't characterize the above
difference as a "compile hog".
When I add up the 3 ltrans unit totals (which are basically the entire
compilation), the difference is a 3% slowdown.

If this PR is for a larger than 3-4% slowdown, I think we should look
elsewhere.  I could be wrong though ;-).
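
For anyone who wants to double-check the percentages quoted in this comment,
here is a minimal sketch of the arithmetic.  It assumes the first time column
of each -ftime-report line is the one being compared; the dictionary layout
and the names AFTER, BEFORE, threader_vs_vrp and total_slowdown are made up
for illustration and are not part of GCC or SPEC.

```python
# Times (seconds) copied from the first time column of the -ftime-report
# excerpts in this comment.  Keys are ltrans unit numbers.  The layout and
# all names here are illustrative assumptions, not any tool's format.
AFTER = {
    42: {"vrp": 13.70, "threader": 26.68, "total": 524.00},
    76: {"vrp": 38.30, "threader": 47.38, "total": 286.03},
    24: {"vrp": 87.43, "threader": 129.81, "total": 1042.37},
}
BEFORE = {
    42: {"vrp": 14.58, "threader": 13.88, "total": 484.12},
    76: {"vrp": 37.68, "threader": 45.50, "total": 293.74},
    24: {"vrp": 94.27, "threader": 102.63, "total": 1021.66},
}


def threader_vs_vrp(unit):
    """Percent by which threading time exceeds VRP time (negative = faster)."""
    return 100.0 * (unit["threader"] - unit["vrp"]) / unit["vrp"]


def total_slowdown(before, after):
    """Percent change in the summed TOTAL time across all listed units."""
    b = sum(u["total"] for u in before.values())
    a = sum(u["total"] for u in after.values())
    return 100.0 * (a - b) / b


if __name__ == "__main__":
    for n in (42, 76, 24):
        print(f"ltrans{n}: threader vs VRP "
              f"before {threader_vs_vrp(BEFORE[n]):+.2f}%, "
              f"after {threader_vs_vrp(AFTER[n]):+.2f}%")
    print(f"sum of totals: {total_slowdown(BEFORE, AFTER):+.2f}%")
```

For ltrans42 this gives about -4.8% before the patch and about +94.7% after,
and the summed totals come out just under 3% slower, which matches the figures
above.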