From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x1036.google.com (mail-pj1-x1036.google.com [IPv6:2607:f8b0:4864:20::1036]) by sourceware.org (Postfix) with ESMTPS id 3858F3858D3C for ; Mon, 22 Aug 2022 17:43:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3858F3858D3C Received: by mail-pj1-x1036.google.com with SMTP id t11-20020a17090a510b00b001fac77e9d1fso12013754pjh.5 for ; Mon, 22 Aug 2022 10:43:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc; bh=6GsQoijq0sMGigT5z5DnVCetKUltl5sby4/iCsaDRE0=; b=LWjjH8M9Hyk5xENP3AJ5uba8+XydOW7hRFGfCZKP+Jy5h82nJROiJjkwCqwCQjnCyt BQY1iToYJTgh1nQbyY+K8wUbBxazkivXBsFaFjLsK+PsODoZRDHRHTk/dresYjhiGjnI +u1pkPlruEdfSD3u/0wxIs4D4WUqofgVLxDBT1SdhY9/l1BLhPatdRNwAKbDU4sPNIRh a15T9OD3f6KMYF0QdKdCGcD4j3S3gMWqNODMSFHwTGJ8q5LhoakD5UvrZzFTgL33BijE BirBfsD41Vp3+aOJxo+PWSKTFTjyAw4tN/R2ofqEcfXL7rvfmVZimk8m1ktPeXsgxBg5 cX6w== X-Gm-Message-State: ACgBeo3bA5SXKFYeYzU0/qyyOjtn+fP0P5wOfz2JSxk4uuN3YicbOr6j ewfaA4IFMYkPBOQfjOYJAyo= X-Google-Smtp-Source: AA6agR6ClHvveD9KSHp0+7T4kSLi1HUnPw9VJpG4A/VN4HdOoHBnBsM5t3ZK85YY3ZXonMiwa+sfKQ== X-Received: by 2002:a17:902:e748:b0:16f:8ae9:307f with SMTP id p8-20020a170902e74800b0016f8ae9307fmr20667615plf.135.1661190214742; Mon, 22 Aug 2022 10:43:34 -0700 (PDT) Received: from perseverance.local (76-14-114-239.rk.wavecable.com. [76.14.114.239]) by smtp.gmail.com with ESMTPSA id j6-20020a170902758600b00172a2a41064sm8593983pll.298.2022.08.22.10.43.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Aug 2022 10:43:34 -0700 (PDT) Date: Mon, 22 Aug 2022 10:43:32 -0700 From: Paul Hollinsky To: David Malcolm Cc: gcc-patches@gcc.gnu.org, Jason Merrill , Per Bothner , Nathan Sidwell , Tom Tromey Subject: Re: Ping [PATCH V2] libcpp: Optimize #pragma once with a hash table [PR58770] Message-ID: <20220822174332.v5tqtj2jik4nag4n@perseverance.local> References: <20220706230321.48041-1-paulhollinsky@gmail.com> <20220801051839.1454-1-paulhollinsky@gmail.com> <20220819202734.wvourvrje3x5vfla@perseverance.local> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Aug 2022 17:43:38 -0000 On Sun, Aug 21, 2022 at 08:40:26PM +0100, David Malcolm wrote: > On Fri, 2022-08-19 at 13:27 -0700, Paul Hollinsky wrote: > > Hi all, > > > > Would love some feedback on this patch! > > > > Thanks, > > Paul > > Hi Paul. Sorry for not getting back to you before. I'm listed as a > libcpp maintainer, but this happens to be a part of libcpp I've not > looked at (I'm mostly just familiar with location_t management). > > FWIW, the patch seems sane to me, though I have one concern: does it > work with precompiled headers? (given that PCH is implemented by > taking a snapshot of the gc heap and reconstructing it on load, it thus > tends to be an annoying source of bugs when trying the kind of cleanup > this patch is doing). Hi Dave, thanks for taking a look! I figured that, since the original implementation didn't do anything with PCH, I wouldn't have to either. But I'll dig into how PCH is implemented and run some tests to make absolutely sure. All the best, Paul > > Sorry to not look be able to look at things in more detail (I'm > travelling) > > Hope this is constructive > Dave > > > > > On Mon, Aug 01, 2022 at 05:18:40AM +0000, Paul Hollinsky wrote: > > > Rather than traversing the all_files linked list for every include, > > > this factors out the quick idempotency checks (modification time > > > and size) to be the keys in a hash table so we can find matching > > > files quickly. > > > > > > The hash table value type is a linked list, in case more than one > > > file matches the quick check. > > > > > > The table is only built if a once-only file is seen, so include > > > guard performance is not affected. > > > > > > My laptop would previously complete Ricardo's benchmark from the > > > PR in ~1.1s using #pragma once, and ~0.35s using include guards. > > > > > > After this change, both benchmarks now complete in ~0.35s. I did > > > have to randomize the modification dates on the benchmark headers > > > so the files did not all end up in the same hash table list, but > > > that would likely not come up outside of the contrived benchmark. > > > > > > I bootstrapped and ran the testsuite on x86_64 Darwin, as well as > > > ppc64le and aarch64 Linux. > > > > > > libcpp/ChangeLog: > > > > > >         PR preprocessor/58770 > > >         * internal.h: Add hash table for #pragma once > > >         * files.cc: Optimize #pragma once with the hash table > > > > > > Signed-off-by: Paul Hollinsky > > > --- > > >  libcpp/files.cc   | 116 > > > +++++++++++++++++++++++++++++++++++++++++++--- > > >  libcpp/internal.h |   3 ++ > > >  2 files changed, 112 insertions(+), 7 deletions(-) > > > > > > diff --git a/libcpp/files.cc b/libcpp/files.cc > > > index 24208f7b0f8..d4ffd77578e 100644 > > > --- a/libcpp/files.cc > > > +++ b/libcpp/files.cc > > > @@ -167,6 +167,33 @@ struct file_hash_entry_pool > > >    struct cpp_file_hash_entry pool[FILE_HASH_POOL_SIZE]; > > >  }; > > >   > > > +/* A set of attributes designed to quickly identify obviously > > > different files > > > +   in a hashtable.  Just in case there are collisions, we still > > > maintain a > > > +   list.  These sub-lists can then be checked for #pragma once > > > rather than > > > +   interating through all_files.  */ > > > +struct file_quick_idempotency_attrs > > > +{ > > > +  file_quick_idempotency_attrs(const _cpp_file *f) > > > +    : mtime(f->st.st_mtime), size(f->st.st_size) {} > > > + > > > +  time_t mtime; > > > +  off_t size; > > > + > > > +  static hashval_t hash (/* _cpp_file* */ const void *p); > > > +}; > > > + > > > +/* Sub-list of very similar files kept in a hashtable to check for > > > #pragma > > > +   once.  */ > > > +struct file_sublist > > > +{ > > > +  _cpp_file *f; > > > +  file_sublist *next; > > > + > > > +  static int eq (/* _cpp_file* */ const void *p, > > > +                /* file_sublist* */ const void *q); > > > +  static void del (/* file_sublist* */ void *p); > > > +}; > > > + > > >  static bool open_file (_cpp_file *file); > > >  static bool pch_open_file (cpp_reader *pfile, _cpp_file *file, > > >                            bool *invalid_pch); > > > @@ -849,17 +876,17 @@ has_unique_contents (cpp_reader *pfile, > > > _cpp_file *file, bool import, > > >    if (!pfile->seen_once_only) > > >      return true; > > >   > > > -  /* We may have read the file under a different name.  Look > > > -     for likely candidates and compare file contents to be sure.  > > > */ > > > -  for (_cpp_file *f = pfile->all_files; f; f = f->next_file) > > > +  /* We may have read the file under a different name.  We've kept > > > +     similar looking files in this lists under this hash table, so > > > +     check those more thoroughly.  */ > > > +  void* ent = htab_find(pfile->pragma_once_files, file); > > > +  for (file_sublist *e = static_cast (ent); e; e = > > > e->next) > > >      { > > > +      _cpp_file *f = e->f; > > >        if (f == file) > > >         continue; /* It'sa me!  */ > > >   > > > -      if ((import || f->once_only) > > > -         && f->err_no == 0 > > > -         && f->st.st_mtime == file->st.st_mtime > > > -         && f->st.st_size == file->st.st_size) > > > +      if ((import || f->once_only) && f->err_no == 0) > > >         { > > >           _cpp_file *ref_file; > > >   > > > @@ -895,6 +922,38 @@ has_unique_contents (cpp_reader *pfile, > > > _cpp_file *file, bool import, > > >    return true; > > >  } > > >   > > > +/* Add the given file to the #pragma once table so it can be > > > +   quickly identified and excluded the next time it's seen.  */ > > > +static void > > > +update_pragma_once_table (cpp_reader *pfile, _cpp_file *file) > > > +{ > > > +  void **slot = htab_find_slot (pfile->pragma_once_files, file, > > > INSERT); > > > +  if (slot) > > > +    { > > > +      if (!*slot) > > > +       *slot = xcalloc (1, sizeof (file_sublist)); > > > + > > > +      file_sublist *e = static_cast (*slot); > > > +      while (e->f) > > > +       { > > > +         if (!e->next) > > > +           { > > > +             void *new_sublist = xcalloc(1, sizeof > > > (file_sublist)); > > > +             e->next = static_cast (new_sublist); > > > +           } > > > +         e = e->next; > > > +       } > > > + > > > +      e->f = file; > > > +    } > > > +  else > > > +    { > > > +      cpp_error (pfile, CPP_DL_ERROR, > > > +                "Unable to create #pragma once table space for > > > %s", > > > +                _cpp_get_file_name (file)); > > > +    } > > > +} > > > + > > >  /* Place the file referenced by FILE into a new buffer on the > > > buffer > > >     stack if possible.  Returns true if a buffer is stacked.  Use > > > LOC > > >     for any diagnostics.  */ > > > @@ -950,6 +1009,9 @@ _cpp_stack_file (cpp_reader *pfile, _cpp_file > > > *file, include_type type, > > >        if (!has_unique_contents (pfile, file, type == IT_IMPORT, > > > loc)) > > >         return false; > > >   > > > +      if (pfile->seen_once_only && file->once_only) > > > +       update_pragma_once_table (pfile, file); > > > + > > >        if (pfile->buffer && file->dir) > > >         sysp = MAX (pfile->buffer->sysp, file->dir->sysp); > > >   > > > @@ -1434,6 +1496,41 @@ nonexistent_file_hash_eq (const void *p, > > > const void *q) > > >    return filename_cmp ((const char *) p, (const char *) q) == 0; > > >  } > > >   > > > +/* Hasher for the #pragma once hash table.  */ > > > +hashval_t > > > +file_quick_idempotency_attrs::hash (const void *p) > > > +{ > > > +  const _cpp_file *f = static_cast (p); > > > +  file_quick_idempotency_attrs kh (f); > > > +  return iterative_hash_object (kh, 0); > > > +} > > > + > > > +/* Equality checker for the #pragma once hash table.  */ > > > +int > > > +file_sublist::eq (const void *p, const void *q) > > > +{ > > > +  /* Just check if the file q would be in the list p. Every > > > +     file in the list should have these attributes the same, > > > +     so we don't need to traverse.  */ > > > +  const file_sublist *e = static_cast (p); > > > +  const _cpp_file *f = static_cast (q); > > > +  return f->st.st_mtime == e->f->st.st_mtime > > > +        && f->st.st_size == e->f->st.st_size; > > > +} > > > + > > > +/* Cleanup for a file sub-list. Does not free the _cpp_file > > > +   structures within.  */ > > > +void > > > +file_sublist::del (void *p) > > > +{ > > > +  file_sublist *e = static_cast (p); > > > +  if (e->next) > > > +    { > > > +      file_sublist::del (e->next); > > > +      free (e->next); > > > +    } > > > +} > > > + > > >  /* Initialize everything in this source file.  */ > > >  void > > >  _cpp_init_files (cpp_reader *pfile) > > > @@ -1442,6 +1539,10 @@ _cpp_init_files (cpp_reader *pfile) > > >                                         NULL, xcalloc, free); > > >    pfile->dir_hash = htab_create_alloc (127, file_hash_hash, > > > file_hash_eq, > > >                                         NULL, xcalloc, free); > > > +  pfile->pragma_once_files = htab_create_alloc (127, > > > +                                       file_quick_idempotency_attr > > > s::hash, > > > +                                       file_sublist::eq, > > > file_sublist::del, > > > +                                       xcalloc, free); > > >    allocate_file_hash_entries (pfile); > > >    pfile->nonexistent_file_hash = htab_create_alloc (127, > > > htab_hash_string, > > >                                                     > > > nonexistent_file_hash_eq, > > > @@ -1456,6 +1557,7 @@ _cpp_cleanup_files (cpp_reader *pfile) > > >  { > > >    htab_delete (pfile->file_hash); > > >    htab_delete (pfile->dir_hash); > > > +  htab_delete (pfile->pragma_once_files); > > >    htab_delete (pfile->nonexistent_file_hash); > > >    obstack_free (&pfile->nonexistent_file_ob, 0); > > >    free_file_hash_entries (pfile); > > > diff --git a/libcpp/internal.h b/libcpp/internal.h > > > index badfd1b40da..9c3c46df335 100644 > > > --- a/libcpp/internal.h > > > +++ b/libcpp/internal.h > > > @@ -485,6 +485,9 @@ struct cpp_reader > > >       been used.  */ > > >    bool seen_once_only; > > >   > > > +  /* Optimization for #pragma once.  */ > > > +  struct htab *pragma_once_files; > > > + > > >    /* Multiple include optimization.  */ > > >    const cpp_hashnode *mi_cmacro; > > >    const cpp_hashnode *mi_ind_cmacro; > > > -- > > > 2.34.1 > > > > > >