public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
From: "mscfd at gmx dot net" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug fortran/98426] New: find_symbol in module.c traverses O(N) part of a search tree
Date: Wed, 23 Dec 2020 10:42:40 +0000	[thread overview]
Message-ID: <bug-98426-4@http.gcc.gnu.org/bugzilla/> (raw)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98426

            Bug ID: 98426
           Summary: find_symbol in module.c traverses O(N) part of a
                    search tree
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mscfd at gmx dot net
  Target Milestone: ---

Created attachment 49837
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49837&action=edit
patch

Compilation times of top level modules in a large project I am working on are 
abysmal (up to 30 seconds for modules even with little code in it). Those
modules are using other modules whose mod files are up to 500kB. As far as I
can  see the mod files contain all [including private] symbols from any module
in their use-chain.

I used -pg compiler option to compile gcc itself and ran f951 and gprof with
some modules. gprof showed that about 80% or more of the compilation time is
spent in routine "find_symbol" in module.c.

>From the documentation I can see that argument "gfc_symtree *st" of
"find_symbol" should be a balanced binary search tree, ordered by
gfc_symtree->name. It looks like the tree can have different nodes with the
same name, so traversing the tree to check all nodes of the given name might
require to descend both the left and right subtree.

However, "find_symbol" traverses far more nodes than necessary. (Assuming that
all symbol names are unique, it traverses about half of the tree on average,
making the lookup operation O(N) instead of O(log N).)

The attached patch descends only if it can expect a match in the subtree (and
no match was found so far). This brings down the execution time of find_symbol
to almost nothing, speeding up the compilation time by a factor of about 5-10
for my top level modules (makes a difference if a module compiles in 3 instead
of 30 seconds)!


The patch regtests, and also compiles the large project with its big symbol
trees flawlessly and much faster.

Are there any testcases which check that find_symbol works as expected? If I
change find_symbol to always return NULL, then
make -k check-fortran gives
# of expected passes            55911
# of unexpected failures        1
# of expected failures          232
# of unsupported tests          81
with just one unexpected failure I could not locate.

In particular if I compile check use_only_1.f90 by hand (added by PR33541,
which added find_symbol), everything seems to work without a proper return
value from find_symbol.

             reply	other threads:[~2020-12-23 10:42 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-23 10:42 mscfd at gmx dot net [this message]
2020-12-28 10:29 ` [Bug fortran/98426] " mscfd at gmx dot net
2020-12-28 10:36 ` mscfd at gmx dot net
2020-12-29 11:15 ` mscfd at gmx dot net
2021-10-29 18:36 ` aldot at gcc dot gnu.org
2024-04-23 20:40 ` matthew.thompson at nasa dot gov
2024-04-23 23:26 ` jvdelisle at gcc dot gnu.org
2024-04-24  2:59 ` jvdelisle at gcc dot gnu.org
2024-04-24  3:05 ` jvdelisle at gcc dot gnu.org
2024-04-25 11:20 ` matthew.thompson at nasa dot gov
2024-04-25 16:38 ` jvdelisle at gcc dot gnu.org
2024-04-26  3:12 ` jvdelisle at gcc dot gnu.org
2024-04-26 12:43 ` matthew.thompson at nasa dot gov
2024-04-26 15:53 ` matthew.thompson at nasa dot gov
2024-04-26 15:54 ` matthew.thompson at nasa dot gov
2024-04-27  2:34 ` jvdelisle at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-98426-4@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).