From: "freddie at witherden dot org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/95264] Infinite Loop When Compiling Templated C++ code at -O1 and above
Date: Fri, 22 May 2020 11:13:15 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c++
X-Bugzilla-Version: 10.1.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: WAITING
X-Bugzilla-Priority: P3

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95264

--- Comment #6 from Freddie Witherden ---
(In reply to Richard Biener from comment #3)
> So with the [[gnu::flatten]] attributes removed -O1 needs 80 seconds to
> compile and about 3GB of memory, -O2 needs around 2 minutes (same memory),
> -O3 is the same as -O2.
>
> Maybe instead of [[gnu::flatten]] you want to bump --param inline-unit-growth
> or --param large-function-growth more moderately in case you can measure an
> effect on runtime.
>
> Note multiple [[gnu::flatten]] can really exponentially grow program size
> since it is not apparent which functions might be used from another
> translation unit until you can use -fwhole-program (single CU program)
> or -flto (but there [[gnu::flatten]] is applied too early to avoid such
> growth - something we might want to fix). Placing things not used from
> outside in anonymous namespaces might help.

The [[gnu::flatten]] was added to get GCC's performance in the case of
T = double on a par with Clang's. (We don't care about performance with
T = bfloat as it is just used as a final polishing pass.) I can understand
why GCC does not want to inline it in the case of T = bfloat, which is a
complex type, but for T = double the function is basically just a sequence
of mov instructions to populate an array.

As the function is of the form

  for (int i = 0; i < N; i++)        // N = template arg
    for (int j = 0; j < p[i]; j++)   // runtime trip count
      foo(i, ...);                   // static polymorphism

with foo being a large switch-case on its first argument, the expectation
was for the compiler to inline foo, unroll the outer loop, and then prune
the dead cases such that we have something similar to

  for (int j = 0; j < p[0]; j++)
    foo(0, ...); // inline i = 0 case

  for (int j = 0; j < p[1]; j++)
    foo(1, ...); // inline i = 1 case

  // ...
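
For illustration only, the same expansion can also be requested at the
source level rather than left to the inliner. The following is a minimal
sketch assuming C++17; it is not the code from the bug report. The names
foo, inner, run_impl and run, the std::array/std::vector types and the
values written are all hypothetical stand-ins. The index i becomes a
template parameter, so "if constexpr" discards the cases that do not
apply, and a fold expression emits one inner loop per index. Whether this
matches the run-time performance obtained with [[gnu::flatten]] has not
been measured.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for foo. The case index I is a compile-time
// constant, so only the matching branch is instantiated.
template <std::size_t I, typename T>
inline void foo(int j, std::vector<T>& out)
{
    if constexpr (I == 0)
        out.push_back(static_cast<T>(j));            // "case 0"
    else if constexpr (I == 1)
        out.push_back(static_cast<T>(10 + j));       // "case 1"
    else
        out.push_back(static_cast<T>(100 * I + j));  // remaining cases
}

// One inner loop for a fixed I; the trip count p[I] is a run-time value.
template <std::size_t I, typename T, std::size_t N>
inline void inner(const std::array<int, N>& p, std::vector<T>& out)
{
    assert(p[I] >= 0);  // trip counts are assumed to be non-negative
    for (int j = 0; j < p[I]; j++)
        foo<I>(j, out);
}

// Fold over the comma operator: calls inner<0>, inner<1>, ..., inner<N-1>
// in that order, which plays the role of the unrolled outer loop.
template <typename T, std::size_t N, std::size_t... I>
void run_impl(const std::array<int, N>& p, std::vector<T>& out,
              std::index_sequence<I...>)
{
    (inner<I>(p, out), ...);
}

// Entry point: N is deduced from the size of p.
template <typename T, std::size_t N>
void run(const std::array<int, N>& p, std::vector<T>& out)
{
    run_impl<T>(p, out, std::make_index_sequence<N>{});
}
```

Calling run(p, out) with a std::array<int, 3> p and a std::vector<double>
out runs the three specialised loops back to back. The trade-off is that
the expansion is now fixed in the source: the number of instantiations
grows with N regardless of what the optimizer would have chosen.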