public inbox for gcc-bugs@sourceware.org
* [Bug preprocessor/109485] New: Feature request: More efficient include path handling
@ 2023-04-12 11:35 dani.borg at outlook dot com
From: dani.borg at outlook dot com @ 2023-04-12 11:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109485

            Bug ID: 109485
           Summary: Feature request: More efficient include path handling
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: preprocessor
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dani.borg at outlook dot com
  Target Milestone: ---

When preprocessing, the method used to look up include paths is inefficient and
causes more file system calls than needed.
The current method simply tries each include path in order, for every unique
#include directive. That is roughly O(n*m) lookups for n include paths and m
unique #include directives, i.e. effectively quadratic when both grow together.
The method scales poorly as the number of include paths increases, which can
cause high system load and long build times.
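A rough cost model for the complexity claim above. The assumptions are mine, not
the reporter's: n include directories, m distinct #include names that all have
a directory component, k_j the 1-based position of the first directory that
contains header j, and d distinct directory prefixes (such as "b" in "b/a.h"),
each existing only in the directory that actually holds the header.

```latex
\begin{aligned}
C_{\text{naive}}  &= \sum_{j=1}^{m} k_j \;\le\; n\,m
  && \text{open() calls, one per directory tried} \\
C_{\text{prefix}} &\approx n\,d + m
  && \text{one stat() per (directory, prefix) pair, then one open() per header}
\end{aligned}
```

For the strace example below, n = 2, m = 3, d = 1 and every k_j = 2, which
gives 6 open() calls for the naive search versus 2 stat() plus 3 open() calls
with the prefix check, matching the two traces. Names without a directory
component gain nothing from the prefix check.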

A smarter approach, which clang appears to be using, is to keep track of which
include paths don't contain the directory part of the path seen in the include
directive. File system queries for impossible paths can then be avoided.

To give a concrete example that compares gcc and clang, the following can be
run in bash on a Linux system. The example below only shows the relevant output
from strace.

#prepare - create 2 include paths, 3 headers and a source file including the headers
mkdir -p a b/b && touch b/b/a.h b/b/b.h b/b/c.h && echo -e '#include "b/a.h"\n#include "b/b.h"\n#include "b/c.h"' > a.c

#gcc
strace -f -e open,stat gcc -Ia -Ib -nostdinc a.c -E -o/dev/null
  [pid 12] open("a/b/a.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
  [pid 12] open("b/b/a.h", O_RDONLY|O_NOCTTY) = 4
  [pid 12] open("a/b/b.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
  [pid 12] open("b/b/b.h", O_RDONLY|O_NOCTTY) = 4
  [pid 12] open("a/b/c.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
  [pid 12] open("b/b/c.h", O_RDONLY|O_NOCTTY) = 4
  [pid 12] +++ exited with 0 +++

#clang
strace -f -e open,stat clang -Ia -Ib -nostdinc a.c -E -o/dev/null
  stat("a/b", 0x7ffd8d11ac20)             = -1 ENOENT (No such file or directory)
  stat("b/b", {st_mode=S_IFDIR|0755, st_size=55, ...}) = 0
  open("b/b/a.h", O_RDONLY|O_CLOEXEC)     = 3
  open("b/b/b.h", O_RDONLY|O_CLOEXEC)     = 3
  open("b/b/c.h", O_RDONLY|O_CLOEXEC)     = 3
  +++ exited with 0 +++

Note how clang first determines whether the partial path (the directory part,
"a/b" or "b/b") exists before testing the full path. This way all the
open("a/b/...") calls are avoided.
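A minimal simulation of the two lookup strategies, replaying the three #include
lines from the example above against an in-memory stand-in for the file system
so that the calls can be counted. All names (sim_open, sim_is_dir, find_naive,
find_with_prefix_cache, prefix_cache) are hypothetical; this is a sketch of the
idea, not libcpp's or clang's actual code, and it ignores the NFS, Windows-path
and permission questions raised later in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the example's file system:
   directories a, b and b/b exist; only b/b holds the three headers. */
static const char *const existing_dirs[] = { "a", "b", "b/b" };
static const char *const existing_files[] = { "b/b/a.h", "b/b/b.h", "b/b/c.h" };

static int open_calls, stat_calls; /* simulated system-call counters */

static bool in_list(const char *path, const char *const *list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(path, list[i]) == 0)
            return true;
    return false;
}

/* Stand-in for open(): counts the call, succeeds if the file exists. */
static bool sim_open(const char *path) {
    open_calls++;
    return in_list(path, existing_files, sizeof existing_files / sizeof *existing_files);
}

/* Stand-in for stat() + S_ISDIR(): counts the call. */
static bool sim_is_dir(const char *path) {
    stat_calls++;
    return in_list(path, existing_dirs, sizeof existing_dirs / sizeof *existing_dirs);
}

/* Strategy 1: try "<dir>/<name>" in every include directory, in order. */
static int find_naive(const char *const *dirs, size_t ndirs, const char *name) {
    char path[256];
    assert(name != NULL);
    for (size_t i = 0; i < ndirs; i++) {
        snprintf(path, sizeof path, "%s/%s", dirs[i], name);
        if (sim_open(path))
            return (int)i;
    }
    return -1;
}

/* Cache of "does <dirs[dir]>/<prefix> exist?" answers (fixed size for brevity). */
#define PREFIX_CACHE_MAX 64
static struct { size_t dir; char prefix[128]; bool exists; } prefix_cache[PREFIX_CACHE_MAX];
static size_t prefix_cache_len;

static bool prefix_exists(const char *const *dirs, size_t dir, const char *prefix) {
    for (size_t i = 0; i < prefix_cache_len; i++)
        if (prefix_cache[i].dir == dir && strcmp(prefix_cache[i].prefix, prefix) == 0)
            return prefix_cache[i].exists; /* answered without a system call */
    char path[256];
    snprintf(path, sizeof path, "%s/%s", dirs[dir], prefix);
    bool exists = sim_is_dir(path);
    if (prefix_cache_len < PREFIX_CACHE_MAX) {
        prefix_cache[prefix_cache_len].dir = dir;
        snprintf(prefix_cache[prefix_cache_len].prefix, sizeof prefix_cache[0].prefix, "%s", prefix);
        prefix_cache[prefix_cache_len].exists = exists;
        prefix_cache_len++;
    }
    return exists;
}

/* Strategy 2: skip directories where the directory part of NAME is missing. */
static int find_with_prefix_cache(const char *const *dirs, size_t ndirs, const char *name) {
    char prefix[128], path[256];
    assert(name != NULL);
    const char *slash = strrchr(name, '/');
    /* Only use the check when NAME has a directory part that fits the buffer. */
    bool use_prefix = slash != NULL && (size_t)(slash - name) < sizeof prefix;
    if (use_prefix) {
        memcpy(prefix, name, (size_t)(slash - name));
        prefix[slash - name] = '\0';
    }
    for (size_t i = 0; i < ndirs; i++) {
        if (use_prefix && !prefix_exists(dirs, i, prefix))
            continue; /* the header cannot be in this directory */
        snprintf(path, sizeof path, "%s/%s", dirs[i], name);
        if (sim_open(path))
            return (int)i;
    }
    return -1;
}
```

With dirs = {"a", "b"} and the three headers from the example, find_naive makes
6 open() calls, while find_with_prefix_cache makes 2 stat() calls plus 3 open()
calls, the same counts as the gcc and clang traces.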

* [Bug preprocessor/109485] Feature request: More efficient include path handling
@ 2023-04-12 18:05 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2023-04-12 18:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109485

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Is this even valid with NFS?

* [Bug preprocessor/109485] Feature request: More efficient include path handling
@ 2023-04-12 18:09 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2023-04-12 18:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109485

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Also, does it work with Windows-style paths?
I suspect clang does not do the right thing there ...

* [Bug preprocessor/109485] improve include path handling
@ 2023-04-12 18:11 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2023-04-12 18:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109485

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Feature request: More       |improve include path
                   |efficient include path      |handling
                   |handling                    |
           Severity|normal                      |enhancement

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Also, I really doubt the improvement here would be more than 1% for the common
case, where people don't put paths in the #include line.

* [Bug preprocessor/109485] improve include path handling
@ 2023-04-12 18:15 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2023-04-12 18:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109485

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15772,
                   |                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11242

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Yes, this is a fragile area for sure ...

* [Bug preprocessor/109485] improve include path handling
@ 2023-04-12 18:52 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2023-04-12 18:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109485

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note: I think clang's "optimization" might get wrong the case where a
subdirectory is readable but not "executable" (searchable).

So this is definitely something which would need testing too.

* [Bug preprocessor/109485] improve include path handling
@ 2023-04-13  8:06 ` dani.borg at outlook dot com
From: dani.borg at outlook dot com @ 2023-04-13  8:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109485

--- Comment #6 from Dani Borg <dani.borg at outlook dot com> ---
(In reply to Andrew Pinski from comment #1)
> Is this even valid with NFS?

My knowledge of different file systems is limited, but I think checking the
presence of a directory should be as valid on NFS as on most other file systems?

* [Bug preprocessor/109485] improve include path handling
@ 2023-04-13  8:17 ` dani.borg at outlook dot com
From: dani.borg at outlook dot com @ 2023-04-13  8:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109485

--- Comment #7 from Dani Borg <dani.borg at outlook dot com> ---
(In reply to Andrew Pinski from comment #2)
> Also, does it work with Windows-style paths?
> I suspect clang does not do the right thing there ...

I don't know much about Windows path handling, so I can't say. It's possible
some adaptations are needed depending on the OS. I think the main
principles/strategy should work for any file system that has a directory
structure though.

* [Bug preprocessor/109485] improve include path handling
@ 2023-04-13  8:51 ` dani.borg at outlook dot com
From: dani.borg at outlook dot com @ 2023-04-13  8:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109485

--- Comment #8 from Dani Borg <dani.borg at outlook dot com> ---
(In reply to Andrew Pinski from comment #3)
> Also, I really doubt the improvement here would be more than 1% for the
> common case, where people don't put paths in the #include line.

Yes, it really depends on the project. It mainly becomes an issue for larger
projects that may have a lot of libraries with their own include paths. But the
file system is a big factor as well. I imagine open() calls can be quite
expensive on some file systems.

The project I have focused on had system load peaks reaching over 70% when
using gcc, while with clang the peaks reached only around 8%. This extreme
difference is what got me started looking into this to begin with.

Actually, I did an experiment where I wrapped gcc's open() calls with my own
caching (by creating a shared library and loading it with LD_PRELOAD). With
this quick and dirty hack, the system load peaks dropped to 11% and the overall
build time was reduced by 12%.
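The wrapper from that experiment isn't shown, so the following is only a guess
at its core idea: a C sketch of a negative cache in front of open() that
remembers paths which already failed with ENOENT. The names cached_open,
neg_cache and real_open_calls are hypothetical. To use it the way the comment
describes, it would have to be built into a shared library that exports its own
open() (forwarding to the real one obtained with dlsym(RTLD_NEXT, "open")) and
loaded with LD_PRELOAD; that wiring is omitted here, and depending on how gcc
was built it may call open64() or openat() rather than open().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h> /* close(), used by callers */

/* Hypothetical negative cache: paths whose open() already failed with ENOENT.
   Fixed size and a linear scan keep the sketch short; a hash table would scale better. */
#define NEG_CACHE_SIZE 256
#define NEG_PATH_MAX 512
static char neg_cache[NEG_CACHE_SIZE][NEG_PATH_MAX];
static size_t neg_count;
static unsigned long real_open_calls; /* how many times the real open() ran */

static bool neg_cached(const char *path) {
    for (size_t i = 0; i < neg_count; i++)
        if (strcmp(neg_cache[i], path) == 0)
            return true;
    return false;
}

/* Read-only open() that answers repeated misses from memory.
   Assumes no header is created while the compiler is running. */
int cached_open(const char *path, int flags) {
    assert(path != NULL);
    assert((flags & O_CREAT) == 0); /* sketch only handles plain lookups */
    bool cacheable = (flags & O_ACCMODE) == O_RDONLY;
    if (cacheable && neg_cached(path)) {
        errno = ENOENT; /* same result the real call gave last time */
        return -1;
    }
    real_open_calls++;
    int fd = open(path, flags);
    /* Remember only "does not exist"; every other error is passed through. */
    if (fd < 0 && errno == ENOENT && cacheable && neg_count < NEG_CACHE_SIZE) {
        size_t len = strlen(path);
        if (len < NEG_PATH_MAX)
            memcpy(neg_cache[neg_count++], path, len + 1);
    }
    return fd;
}
```

Caching only ENOENT keeps the sketch conservative: permission and I/O errors
still reach the caller unchanged. The trade-off is that a header created in the
middle of a compilation would be missed.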

It's quite possible this is an extreme case, but I'm sure there are other large
projects out there that would see noticeable improvements as well.

* [Bug preprocessor/109485] improve include path handling
@ 2023-04-13  9:07 ` dani.borg at outlook dot com
From: dani.borg at outlook dot com @ 2023-04-13  9:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109485

--- Comment #9 from Dani Borg <dani.borg at outlook dot com> ---
(In reply to Andrew Pinski from comment #5)
> Note: I think clang's "optimization" might get wrong the case where a
> subdirectory is readable but not "executable" (searchable).
> 
> So this is definitely something which would need testing too.

The idea is just to check for the presence of directories, not to list their
contents. I think various permissions can be handled just fine, but this is
one of probably many corner cases that need to be considered and handled
correctly.
