From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 57D713858C2D; Thu, 2 Feb 2023 10:22:58 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 57D713858C2D
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1675333378;
	bh=AohORTa8QrnvQPaaqJy0YdMPAQ4Td0edmo/ME/hQ7y8=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=lcSh6hYaeBNCMpNWMJOqgDwAl9TigKYDe78FNWP/UpOOcCHhT11/G8SDvmtgr/XA3
	 dqVPie4bwA+4V3DljsXfdP21+8Kfh6jIrJqx+YL2lj7Rba99h0yug8YBc8x8zkneuV
	 6rTyOG7gVIDuWo75ztJqK43a1qAayg5+L8jmrnkc=
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/108500] [11/12 Regression] -O -finline-small-functions results in "internal compiler error: Segmentation fault" on a very large program (700k function calls)
Date: Thu, 02 Feb 2023 10:22:55 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 12.2.0
X-Bugzilla-Keywords: compile-time-hog, ice-on-valid-code, memory-hog
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.4
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108500

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org

--- Comment #14 from Richard Biener ---
Thanks for the new testcase.
With -O0 (and a --enable-checking=release built compiler) this builds in
~11 minutes (on a Ryzen 9 7900X) with

 integrated RA                      :  38.96 (  6%)   1.94 ( 20%)  42.00 (  6%)  3392M ( 23%)
 LRA non-specific                   :  18.93 (  3%)   1.24 ( 13%)  23.78 (  4%)   450M (  3%)
 LRA virtuals elimination           :   5.67 (  1%)   0.05 (  1%)   5.75 (  1%)   457M (  3%)
 LRA reload inheritance             : 318.25 ( 49%)   0.24 (  2%) 318.51 ( 48%)     0  (  0%)
 LRA create live ranges             : 199.24 ( 31%)   0.12 (  1%) 199.38 ( 30%)   228M (  2%)
645.67user 10.29system 11:04.42elapsed 98%CPU (0avgtext+0avgdata 30577844maxresident)k
3936200inputs+1091808outputs (122053major+10664929minor)pagefaults 0swaps

so register allocation is taking all of the time.  There's maybe the
possibility to gate some of its features on the # of BBs or insns (or
whatever the actual "bad" thing is - I didn't look closer yet).  It also
seems to use 30GB of peak memory at -O0 ...

For -O the situation is "better":

 tree PTA                           : 987.21 ( 99%)   0.41 ( 12%) 987.70 ( 99%)   128  (  0%)
992.56user 3.53system 16:36.20elapsed 99%CPU (0avgtext+0avgdata 2968740maxresident)k
42576inputs+8outputs (28major+717414minor)pagefaults 0swaps

which suggests a clear workaround, -fno-tree-pta, which makes it compile
in 5s for me.

Doing -O -finline-small-functions -fno-tree-pta we get a very high
compile-time in SRA's propagate_all_subaccesses which probably sees
a very large struct copy chain

  tem1 = s2;
  s2 = tem1;
  tem2 = s2;
  s2 = tem2;
  ...

and somehow ends up quadratic (possibly switching the candidate_bitmap
to tree form at the start of propagate_all_subaccesses will help a bit).

Tree form bitmap doesn't help; I guess we end up queueing all elements in
the copy chain to the worklist and via the chains end up with an O(n^2)
working set.  The testcase can probably be shortened to get at this problem.

SRA is actually quite important here, so disabling SRA as a workaround
doesn't look to improve the situation a lot.
Still with -fno-tree-sra added we get good compile time and DCE/DSE remove
all code plus -fno-tree-pta isn't required.

Martin, can you look at the SRA issue?  Do you want me to create a separate
bugreport for this?  The IL into SRA looks like:

  s2D.2755 = {};
  s1D.2756 = {};
  _unusedD.2002766 = s1D.2756;
  sD.2002767 = s2D.2755;
  s2D.2755 = sD.2002767;
  _unusedD.2002766 ={v} {CLOBBER(eol)};
  sD.2002767 ={v} {CLOBBER(eol)};
  _unusedD.2002764 = s1D.2756;
  sD.2002765 = s2D.2755;
  s2D.2755 = sD.2002765;
  _unusedD.2002764 ={v} {CLOBBER(eol)};
  sD.2002765 ={v} {CLOBBER(eol)};
  _unusedD.2002762 = s1D.2756;
  sD.2002763 = s2D.2755;
  s2D.2755 = sD.2002763;
  _unusedD.2002762 ={v} {CLOBBER(eol)};
  sD.2002763 ={v} {CLOBBER(eol)};
  _unusedD.2002760 = s1D.2756;
  sD.2002761 = s2D.2755;
  s2D.2755 = sD.2002761;
  _unusedD.2002760 ={v} {CLOBBER(eol)};
  sD.2002761 ={v} {CLOBBER(eol)};
  ...
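
For anyone wanting to shorten the testcase, source of roughly the following
shape would plausibly give the copy chain in the IL dump once the helper is
inlined.  This is only a sketch and not the reporter's testcase: the struct
layout, the helper name pass_through, and the function copy_chain are made
up for illustration, and only four links are shown where the report has
about 700k.  Unlike the original, where DCE/DSE can remove everything, the
result is returned here so the chain stays observable.

```c
/* Hypothetical aggregate; the real struct layout is not shown in the report. */
struct S { int a; int b; };

/* Hypothetical by-value helper.  Once inlined, each call is expected to
   leave the triple "_unused = s1; s = s2; s2 = s;" followed by clobbers
   of the two parameter copies. */
static inline struct S pass_through(struct S s, struct S _unused)
{
    (void)_unused;   /* second argument is copied in but never read */
    return s;
}

/* Four links of the copy chain; the reported program has ~700k. */
struct S copy_chain(void)
{
    struct S s1 = {0};
    struct S s2 = {0};

    s2 = pass_through(s2, s1);
    s2 = pass_through(s2, s1);
    s2 = pass_through(s2, s1);
    s2 = pass_through(s2, s1);

    return s2;
}
```

Compiling a much longer variant of this with -O -finline-small-functions
-fno-tree-pta and -ftime-report should show whether SRA dominates in the
same way, though that has not been verified for this sketch.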