public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
From: "matthijs at stdin dot nl" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug preprocessor/80753] __has_include and __has_include_next taints subsequent I/O errors
Date: Wed, 23 Sep 2020 16:00:31 +0000	[thread overview]
Message-ID: <bug-80753-4-ke2XL7TPw2@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-80753-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80753

--- Comment #4 from Matthijs Kooijman <matthijs at stdin dot nl> ---
I looked a bit at the source, and it seems the the problem is (not
surprisingly) that `__has_include` causes the header filename to be put into
the cache, and an error message is only generated when the entire include path
has been processed without resolving the entry from the cache (essentially this
means that an error is only triggered when the entry is put into the cache,
*except* when it happens as part of `__has_include`).

Relevant function is _cpp_find_file:

https://github.com/gcc-mirror/gcc/blob/da13b7737662da11f8fefb28eaf4ed7c50c51767/libcpp/files.c#L506

This is called with kind = _cpp_FFK_HAS_INCLUDE for `__has_include` which
prevents an error here:

https://github.com/gcc-mirror/gcc/blob/da13b7737662da11f8fefb28eaf4ed7c50c51767/libcpp/files.c#L591-L592

And the cache entry is immediately returned on subsequent calls here:

https://github.com/gcc-mirror/gcc/blob/da13b7737662da11f8fefb28eaf4ed7c50c51767/libcpp/files.c#L523-L526

It seems there are continuous lookups in the cache and it took me a while to
realize how the cache actually works (I initially thought that maybe files that
were *not* found) were not actually put in the cache, but AFAIU the
`pfile->file_hash` cache works like this:
 - Hash slots are indexed by filename
 - Each slot contains a linked list of entries.
 - Each entry contains the directory lookups start from. This can be the
start_dir, but also the first quote_include or bracket_include dir. Iow, the
cache entry is only valid for a lookup that starts at that directory, or has
progressed to that directory, to allow a "" include to prime the cache for a <>
include as well.
 - Once a file is found, or the search path is exhausted, the result is stored
in the cache for start_dir and for "" and <> includes if the start dir for
those has been passed.

Anyway, this means that a failed lookup based from __has_include() will cause
the failed result to be put into the cache for one or more directories without
emitting an error and always be returned for subsequent includes without
emitting an error.


An obvious fix would be to simply not put the result in the cache for
_cpp_FFK_HAS_INCLUDE, but that would be a missed cache opportunity.

An alternative would be to add a boolean "error_emitted" to each _cpp_file* in
the cache (cannot add it to the cache entry itself, since the same _cpp_file*
might end up in different cache entries), that defaults to false and is set to
true by open_file_failed. When kind is not _cpp_FFK_HAS_INCLUDE, and returning
a cache entry that has "error_emitted" set to false and has an errno call
open_file_failed to emit an error. This would require that open_file_failed to
emit the right output in this case (i.e. the cached _cpp_file* should not rely
on the context in which it was generated), but AFAICS this would be the case.

I'm not quite familiar with building and patching gcc (and also really need to
get this yak hair back to my actual work), so I probably won't be providing a
patch here. Maybe my above analysis enables someone else to do so? :-)

  parent reply	other threads:[~2020-09-23 16:00 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <bug-80753-4@http.gcc.gnu.org/bugzilla/>
2020-09-23 14:24 ` matthijs at stdin dot nl
2020-09-23 16:00 ` matthijs at stdin dot nl [this message]
2023-06-12 17:15 ` pinskia at gcc dot gnu.org
2023-06-13  9:35 ` jakub at gcc dot gnu.org
2023-06-15 12:17 ` cvs-commit at gcc dot gnu.org
2023-08-01 10:43 ` jakub at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-80753-4-ke2XL7TPw2@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).