From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 84F85385800C; Wed, 23 Dec 2020 10:42:40 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 84F85385800C From: "mscfd at gmx dot net" To: gcc-bugs@gcc.gnu.org Subject: [Bug fortran/98426] New: find_symbol in module.c traverses O(N) part of a search tree Date: Wed, 23 Dec 2020 10:42:40 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: fortran X-Bugzilla-Version: 11.0 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: mscfd at gmx dot net X-Bugzilla-Status: UNCONFIRMED X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone attachments.created Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gcc-bugs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-bugs mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Dec 2020 10:42:40 -0000 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D98426 Bug ID: 98426 Summary: find_symbol in module.c traverses O(N) part of a search tree Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: mscfd at gmx dot net Target Milestone: --- Created attachment 49837 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=3D49837&action=3Dedit patch Compilation times of top level modules in a large project I am working on a= re=20 abysmal (up to 30 seconds for modules even with little code in it). Those modules are using other modules whose mod files are up to 500kB. As far as I can see the mod files contain all [including private] symbols from any mod= ule in their use-chain. I used -pg compiler option to compile gcc itself and ran f951 and gprof with some modules. gprof showed that about 80% or more of the compilation time is spent in routine "find_symbol" in module.c. >>From the documentation I can see that argument "gfc_symtree *st" of "find_symbol" should be a balanced binary search tree, ordered by gfc_symtree->name. It looks like the tree can have different nodes with the same name, so traversing the tree to check all nodes of the given name might require to descend both the left and right subtree. However, "find_symbol" traverses far more nodes than necessary. (Assuming t= hat all symbol names are unique, it traverses about half of the tree on average, making the lookup operation O(N) instead of O(log N).) The attached patch descends only if it can expect a match in the subtree (a= nd no match was found so far). This brings down the execution time of find_sym= bol to almost nothing, speeding up the compilation time by a factor of about 5-= 10 for my top level modules (makes a difference if a module compiles in 3 inst= ead of 30 seconds)! The patch regtests, and also compiles the large project with its big symbol trees flawlessly and much faster. Are there any testcases which check that find_symbol works as expected? If I change find_symbol to always return NULL, then make -k check-fortran gives # of expected passes 55911 # of unexpected failures 1 # of expected failures 232 # of unsupported tests 81 with just one unexpected failure I could not locate. In particular if I compile check use_only_1.f90 by hand (added by PR33541, which added find_symbol), everything seems to work without a proper return value from find_symbol.=