From mboxrd@z Thu Jan  1 00:00:00 1970
From: "julien.ruffin at ivu dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug pch/56549] #pragma once ineffective with BOM in include file
Date: Mon, 09 Nov 2020 17:45:17 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: pch
X-Bugzilla-Version: 4.6.3
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
List-Id: Gcc-bugs mailing list

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56549

Julien Ruffin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |julien.ruffin at ivu dot de

--- Comment #6 from Julien Ruffin ---
I have been having the same issue with GCC 9.2.0 for a while and ended up
finding the cause of this error. It can be traced back to function
_cpp_save_file_entries in gcc/libcpp/files.c.

Short explanation: the function saves the sizes and MD5 checksums of files
without any encoding conversion or BOM removal into the PCH's file list, even
though it should.

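To make the size difference concrete, here is a minimal standalone sketch. It
is not libcpp code: utf8_bom_length and size_after_conversion are hypothetical
helpers, and the sketch assumes the BOM in question is the three-byte UTF-8
one (EF BB BF). It only shows that the on-disk size of a header and the size
of its contents after BOM removal differ by exactly three bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libcpp: returns the number of leading
   bytes taken up by a UTF-8 BOM (EF BB BF), or 0 if there is none.  */
static size_t
utf8_bom_length (const unsigned char *buf, size_t len)
{
  static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

  assert (buf != NULL || len == 0);
  return (len >= 3 && memcmp (buf, bom, 3) == 0) ? 3 : 0;
}

/* Size of the contents once a leading BOM has been dropped.  This models
   the size of the in-memory buffer, as opposed to the size reported for
   the file on disk.  */
static size_t
size_after_conversion (const unsigned char *buf, size_t len)
{
  return len - utf8_bom_length (buf, len);
}
```

For a header that starts with a BOM, the two sizes never agree, so any
comparison that mixes the on-disk size with the in-memory size has to fail.
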
Long explanation: the function fills the PCH's file list, which contains,
among other things, the sizes and MD5 checksums of all files in the PCH.
Later, when using the PCH, the compiler compares the files it loads with the
files in that list. If it finds an entry with the same size and checksum as
the loaded file, the file is already in the PCH and the compiler skips
processing it: see check_file_against_entries for the implementation, also in
files.c.

The issue here is that the matching never succeeds for headers that contain a
BOM. The PCH entry is always 3 bytes longer than the file loaded by the
compiler and the checksums always differ. The following code in
_cpp_save_file_entries is why:

      if (f->buffer_valid)
        md5_buffer ((const char *)f->buffer,
                    f->st.st_size, result->entries[count].sum);
      else
        {
          FILE *ff;
          int oldfd = f->fd;

          if (!open_file (f))
            {
              open_file_failed (pfile, f, 0, 0);
              free (result);
              return false;
            }
          ff = fdopen (f->fd, "rb");
          md5_stream (ff, result->entries[count].sum);
          fclose (ff);
          f->fd = oldfd;
        }
      result->entries[count].size = f->st.st_size;

libcpp caches the contents of the files it reads into their own buffers, here
f->buffer. The read_file function implements this loading and converts the
file's encoding on the fly with _cpp_convert_input. *This conversion strips
the BOM,* so the contents of f->buffer differ from those of the file whenever
a BOM is used.

If f->buffer_valid is not true, which seems to always be the case in the code
above as far as I could test it, the function reopens the file by hand and
computes the MD5 checksum directly from it, without any conversion.
open_file() also overwrites the data size in f->st.st_size with the size of
the unconverted file. That is why the checksum and size of the unconverted
file end up in the PCH's file list.

The compiler later compares those with the files it loads through read_file.
There is never a match because the checksums and sizes differ and the compiler
thinks it has loaded a different file, so it processes the header with the BOM
a second time and the error we have been observing happens.

I have managed to solve this issue by replacing the manual loading of the
unconverted file in the else block above with a loading through read_file,
yielding the converted buffer and the correct size and, in the end, the
correct checksum.

I do not have a patch to offer yet for various reasons, but my amateurish
attempt at a fix let me build a large C++ code base successfully with
precompiled headers, so it is rather encouraging.

Somebody with more experience in the preprocessor might want to take a look at
this.
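
The idea behind the fix described above (record the size and checksum of the
converted contents instead of the raw file) can be sketched as a toy model
outside of GCC. This is not a patch and uses no libcpp code: toy_entry,
toy_checksum, toy_convert, toy_save_entry and toy_matches are hypothetical
names, FNV-1a stands in for MD5 to keep the example self-contained, and the
conversion is reduced to skipping a UTF-8 BOM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy counterpart of one entry in the PCH file list.  */
struct toy_entry
{
  size_t size;
  uint32_t sum;
};

/* FNV-1a, used here only as a stand-in for MD5.  */
static uint32_t
toy_checksum (const unsigned char *buf, size_t len)
{
  uint32_t h = 2166136261u;

  for (size_t i = 0; i < len; i++)
    {
      h ^= buf[i];
      h *= 16777619u;
    }
  return h;
}

/* Stand-in for the input conversion: skips a leading UTF-8 BOM, returns
   a pointer to the remaining contents and updates *len accordingly.  */
static const unsigned char *
toy_convert (const unsigned char *buf, size_t *len)
{
  assert (buf != NULL && len != NULL);
  if (*len >= 3 && memcmp (buf, "\xEF\xBB\xBF", 3) == 0)
    {
      *len -= 3;
      return buf + 3;
    }
  return buf;
}

/* Records an entry.  With use_converted == 0 the raw bytes are hashed,
   as in the behaviour described above; otherwise the converted contents
   are hashed, as in the attempted fix.  */
static struct toy_entry
toy_save_entry (const unsigned char *raw, size_t len, int use_converted)
{
  struct toy_entry e;

  if (use_converted)
    raw = toy_convert (raw, &len);
  e.size = len;
  e.sum = toy_checksum (raw, len);
  return e;
}

/* Models the later lookup: the loaded file always goes through the
   conversion before being compared with the recorded entry.  */
static int
toy_matches (const struct toy_entry *e, const unsigned char *raw, size_t len)
{
  const unsigned char *conv = toy_convert (raw, &len);

  return e->size == len && e->sum == toy_checksum (conv, len);
}
```

In this model, an entry saved from the raw bytes of a BOM-prefixed header is
three bytes too large and is rejected on size alone, whereas an entry saved
from the converted contents matches. Whether a real change to
_cpp_save_file_entries should reuse read_file or convert in some other way is
left open here.
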