From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 811F63858D33; Wed, 12 Apr 2023 11:36:00 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 811F63858D33
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1681299360;
	bh=hvktxxsFTJ1KKpfnMutEQziwxyYaPdovvy1k/ZcpCsI=;
	h=From:To:Subject:Date:From;
	b=EFPY+d9EWTRRgVq9p8EToTaCqlUJXTVKIbcw0ScRCS1fV32CvoqW+k3laKwQTgXAd
	 oXAYqgN/j4Y70gjNm7pmr2+w+QfPsE0iyGjC+871Kc+kw9ifC5ZABo9KSjC4NN/OCI
	 97VqSL0pdZmYDHt8pE9JI8OivU6jooUzSlE4so/w=
From: "dani.borg at outlook dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug preprocessor/109485] New: Feature request: More efficient
 include path handling
Date: Wed, 12 Apr 2023 11:35:59 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: preprocessor
X-Bugzilla-Version: 12.1.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: dani.borg at outlook dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-109485-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109485

            Bug ID: 109485
           Summary: Feature request: More efficient include path handling
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: preprocessor
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dani.borg at outlook dot com
  Target Milestone: ---

When preprocessing, the method used to look up include paths is inefficient
and causes more file system calls than needed.
The current method simply tries each include path in order for every unique
#include directive, so the number of probes grows with the number of include
directives times the number of include paths. Basically O(n*n) in complexity.
The method scales poorly as the number of include paths increases, which can
cause high system load and long build times.
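
As a rough illustration of the cost described above, the naive strategy can be
modeled in a few lines of Python. This is only a sketch, not GCC's actual code:
the function name `naive_lookup`, the in-memory `existing` set that stands in
for the file system, and the directory layout are all assumptions made for the
example.

```python
def naive_lookup(header, include_dirs, existing):
    """Try each include dir in order; return (resolved path or None, probes)."""
    probes = 0
    for inc in include_dirs:
        candidate = f"{inc}/{header}"
        probes += 1  # models one failed or successful open() attempt
        if candidate in existing:
            return candidate, probes
    return None, probes


# Hypothetical layout: 100 include dirs, all headers live in the last one.
include_dirs = [f"inc{i}" for i in range(100)]
headers = [f"sub/h{j}.h" for j in range(50)]
existing = {f"inc99/{h}" for h in headers}

total = sum(naive_lookup(h, include_dirs, existing)[1] for h in headers)
print(total)  # 50 headers * 100 dirs = 5000 probes
```

In this worst case every header costs one probe per include path, which is the
multiplicative growth the report complains about.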

A smarter approach, which clang appears to be using, is to keep track of which
include paths don't contain the directory part of the path seen in the include
directive. Then file system queries for impossible paths can be avoided.
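
A minimal Python sketch of that idea follows. It is not clang's or GCC's
implementation; the class `IncludeResolver`, its counters, and the `demo`
helper are hypothetical names introduced for illustration. The resolver caches,
per include path, whether the directory part of the header exists, and only
probes the full path when it does. `os.path.isfile` stands in for the open()
attempt a real preprocessor would make.

```python
import os
import tempfile


class IncludeResolver:
    """Hypothetical resolver that remembers which <include dir>/<subdir> exist."""

    def __init__(self, include_dirs):
        self.include_dirs = list(include_dirs)
        self.dir_exists = {}  # cache: directory path -> bool
        self.stat_calls = 0   # directory checks that reached the file system
        self.open_calls = 0   # full-path probes

    def _has_dir(self, path):
        if path not in self.dir_exists:
            self.stat_calls += 1
            self.dir_exists[path] = os.path.isdir(path)
        return self.dir_exists[path]

    def resolve(self, header):
        subdir = os.path.dirname(header)
        for inc in self.include_dirs:
            if not self._has_dir(os.path.join(inc, subdir)):
                continue  # directory is missing, so the full path cannot exist
            candidate = os.path.join(inc, header)
            self.open_calls += 1
            if os.path.isfile(candidate):
                return candidate
        return None


def demo():
    """Recreate the a/ and b/b/ layout from the example in a temp directory."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a"))
        os.makedirs(os.path.join(root, "b", "b"))
        for name in ("a.h", "b.h", "c.h"):
            open(os.path.join(root, "b", "b", name), "w").close()
        resolver = IncludeResolver(
            [os.path.join(root, "a"), os.path.join(root, "b")]
        )
        found = [resolver.resolve(h) for h in ("b/a.h", "b/b.h", "b/c.h")]
        return found, resolver.stat_calls, resolver.open_calls


found, stats, opens = demo()
print(stats, opens)  # 2 directory checks, 3 full-path probes
```

With the cache, the missing directory under the first include path is checked
once and never probed again, regardless of how many headers share that
directory prefix.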

To give a concrete example that compares gcc and clang, the following can be
run in bash on a Linux system. The example below only shows the relevant
output from strace.

#prepare - create 2 include paths, 3 headers and a source file including the headers
mkdir -p a b/b && touch b/b/a.h b/b/b.h b/b/c.h && echo -e '#include "b/a.h"\n#include "b/b.h"\n#include "b/c.h"' > a.c

#gcc
strace -f -e open,stat gcc -Ia -Ib -nostdinc a.c -E -o/dev/null
  [pid 12] open("a/b/a.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
  [pid 12] open("b/b/a.h", O_RDONLY|O_NOCTTY) = 4
  [pid 12] open("a/b/b.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
  [pid 12] open("b/b/b.h", O_RDONLY|O_NOCTTY) = 4
  [pid 12] open("a/b/c.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
  [pid 12] open("b/b/c.h", O_RDONLY|O_NOCTTY) = 4
  [pid 12] +++ exited with 0 +++

#clang
strace -f -e open,stat clang -Ia -Ib -nostdinc a.c -E -o/dev/null
  stat("a/b", 0x7ffd8d11ac20)             = -1 ENOENT (No such file or directory)
  stat("b/b", {st_mode=S_IFDIR|0755, st_size=55, ...}) = 0
  open("b/b/a.h", O_RDONLY|O_CLOEXEC)     = 3
  open("b/b/b.h", O_RDONLY|O_CLOEXEC)     = 3
  open("b/b/c.h", O_RDONLY|O_CLOEXEC)     = 3
  +++ exited with 0 +++

Note how clang first determines whether the partial path exists before testing
the full path. This way all open("a/b/... calls can be avoided.