public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 0/7] lto: Incremental LTO.
@ 2023-11-17 20:16 Michal Jires
  2023-11-17 20:16 ` [PATCH 1/7] lto: Skip flag OPT_fltrans_output_list_ Michal Jires
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Michal Jires @ 2023-11-17 20:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: hubicka

Hi,
these patches implement Incremental LTO, specifically by caching results of
ltrans phase. Secondarily these patches contain changes to reduce divergence of
ltrans partitions so that they can be cached.

The aim is to reduce compile times for quick edit-compile cycles while using
LTO. Even with these minimal changes to the rest of GCC it works surprisingly
well. Currently testing by self compiling cc1, with individual commits used as
incremental changes, on average only ~1/3 of partitions need to be recompiled
with `-O2 -g0` and ~1/2 with `-O2 -g`. Which directly reduces time spent in
ltrans phase of LTO.

Unfortunately larger gains are a bit fragile. You may remember that during my
Cauldron talk I claimed reduction to ~1/6 and ~1/3 recompilations. That was
achieved with branch from March. Since then there were at least two commits
which introduced new divergence of partitions, though they seem fixable in
future.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/7] lto: Skip flag OPT_fltrans_output_list_.
  2023-11-17 20:16 [PATCH 0/7] lto: Incremental LTO Michal Jires
@ 2023-11-17 20:16 ` Michal Jires
  2023-12-29 21:16   ` Jan Hubicka
  2023-11-17 20:16 ` [PATCH 2/7] lto: Remove random_seed from section name Michal Jires
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Michal Jires @ 2023-11-17 20:16 UTC (permalink / raw)
  To: gcc-patches

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

	* lto-opts.cc (lto_write_options): Skip OPT_fltrans_output_list_.
---
 gcc/lto-opts.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/lto-opts.cc b/gcc/lto-opts.cc
index c9bee9d4197..0451e290c75 100644
--- a/gcc/lto-opts.cc
+++ b/gcc/lto-opts.cc
@@ -152,6 +152,7 @@ lto_write_options (void)
 	case OPT_fprofile_prefix_map_:
 	case OPT_fcanon_prefix_map:
 	case OPT_fwhole_program:
+	case OPT_fltrans_output_list_:
 	  continue;
 
 	default:
-- 
2.42.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/7] lto: Remove random_seed from section name.
  2023-11-17 20:16 [PATCH 0/7] lto: Incremental LTO Michal Jires
  2023-11-17 20:16 ` [PATCH 1/7] lto: Skip flag OPT_fltrans_output_list_ Michal Jires
@ 2023-11-17 20:16 ` Michal Jires
  2023-12-29 21:17   ` Jan Hubicka
  2023-11-17 20:17 ` [PATCH 3/7] Lockfile Michal Jires
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Michal Jires @ 2023-11-17 20:16 UTC (permalink / raw)
  To: gcc-patches

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

	* lto-streamer.cc (lto_get_section_name): Remove random_seed in WPA.
---
 gcc/lto-streamer.cc | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-streamer.cc b/gcc/lto-streamer.cc
index 4968fd13413..53275e32618 100644
--- a/gcc/lto-streamer.cc
+++ b/gcc/lto-streamer.cc
@@ -132,11 +132,17 @@ lto_get_section_name (int section_type, const char *name,
      doesn't confuse the reader with merged sections.
 
      For options don't add a ID, the option reader cannot deal with them
-     and merging should be ok here. */
+     and merging should be ok here.
+
+     WPA output is sent to LTRANS directly inside of lto-wrapper, so name
+     uniqueness for external tools is not needed.
+     Randomness would inhibit incremental LTO.  */
   if (section_type == LTO_section_opts)
     strcpy (post, "");
   else if (f != NULL) 
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
+  else if (flag_wpa)
+    strcpy (post, ".0");
   else
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
   char *res = concat (section_name_prefix, sep, add, post, NULL);
-- 
2.42.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3/7] Lockfile.
  2023-11-17 20:16 [PATCH 0/7] lto: Incremental LTO Michal Jires
  2023-11-17 20:16 ` [PATCH 1/7] lto: Skip flag OPT_fltrans_output_list_ Michal Jires
  2023-11-17 20:16 ` [PATCH 2/7] lto: Remove random_seed from section name Michal Jires
@ 2023-11-17 20:17 ` Michal Jires
  2023-12-29 21:23   ` Jan Hubicka
  2023-11-17 20:17 ` [PATCH 4/7] lto: Implement ltrans cache Michal Jires
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Michal Jires @ 2023-11-17 20:17 UTC (permalink / raw)
  To: gcc-patches

This patch implements lockfile used for incremental LTO.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

	* Makefile.in: Add lockfile.o.
	* lockfile.cc: New file.
	* lockfile.h: New file.
---
 gcc/Makefile.in |   5 +-
 gcc/lockfile.cc | 136 ++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/lockfile.h  |  85 ++++++++++++++++++++++++++++++
 3 files changed, 224 insertions(+), 2 deletions(-)
 create mode 100644 gcc/lockfile.cc
 create mode 100644 gcc/lockfile.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7b7a4ff789a..2c527245c81 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1831,7 +1831,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) $(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o
+  lto-wrapper.o collect-utils.o lockfile.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2359,7 +2359,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
 	@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
 	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
 	   $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBS)
diff --git a/gcc/lockfile.cc b/gcc/lockfile.cc
new file mode 100644
index 00000000000..9440e8938f3
--- /dev/null
+++ b/gcc/lockfile.cc
@@ -0,0 +1,136 @@
+/* File locking.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+
+#include "lockfile.h"
+
+
+/* Unique write lock.  No other lock can be held on this lockfile.
+   Blocking call.  */
+int
+lockfile::lock_write ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+    return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_WRLCK;
+
+  while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR)
+    continue;
+#endif
+  return 0;
+}
+
+/* Unique write lock.  No other lock can be held on this lockfile.
+   Only locks if this filelock is not locked by any other process.
+   Return whether locking was successful.  */
+int
+lockfile::try_lock_write ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+    return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_WRLCK;
+
+  if (fcntl (fd, F_SETLK, &s_flock) == -1)
+    {
+      close (fd);
+      fd = -1;
+      return 1;
+    }
+#endif
+  return 0;
+}
+
+/* Shared read lock.  Only read lock can be held concurrently.
+   If write lock is already held by this process, it will be
+   changed to read lock.
+   Blocking call.  */
+int
+lockfile::lock_read ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+    return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_RDLCK;
+
+  while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR)
+    continue;
+#endif
+  return 0;
+}
+
+/* Unlock all previously placed locks.  */
+void
+lockfile::unlock ()
+{
+  if (fd < 0)
+    {
+#if HAVE_FCNTL_H
+      struct flock s_flock;
+
+      s_flock.l_whence = SEEK_SET;
+      s_flock.l_start = 0;
+      s_flock.l_len = 0;
+      s_flock.l_pid = getpid ();
+      s_flock.l_type = F_UNLCK;
+
+      fcntl (fd, F_SETLK, &s_flock);
+#endif
+      close (fd);
+      fd = -1;
+    }
+}
+
+/* Are lockfiles supported?  */
+bool
+lockfile::lockfile_supported ()
+{
+#if HAVE_FCNTL_H
+  return true;
+#else
+  return false;
+#endif
+}
diff --git a/gcc/lockfile.h b/gcc/lockfile.h
new file mode 100644
index 00000000000..afcbaf599c1
--- /dev/null
+++ b/gcc/lockfile.h
@@ -0,0 +1,85 @@
+/* File locking.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef LOCKFILE_H
+#define LOCKFILE_H
+
+#include <string>
+
+/* Used to synchronize across multiple processes.  */
+class lockfile {
+public:
+  /* Default constructor.  */
+  lockfile (): fd (-1)
+  {}
+  /* Intended constructor for use.  Filename should not be used for anything
+     other than locking to prevent unintentional unlock.  */
+  lockfile (std::string filename): lockfile ()
+  {
+    this->filename = std::move (filename);
+  }
+  lockfile (lockfile const& o): lockfile (o.filename)
+  {}
+
+  void operator=(lockfile o)
+  {
+    unlock ();
+    this->filename = o.filename;
+    this->fd = o.fd;
+    o.fd = -1;
+  }
+
+  /* Unique write lock.  No other lock can be held on this lockfile.
+     Blocking call.  */
+  int
+  lock_write ();
+
+  /* Unique write lock.  No other lock can be held on this lockfile.
+     Only locks if this filelock is not locked by any other process.
+     Return whether locking was successful.  */
+  int
+  try_lock_write ();
+
+  /* Shared read lock.  Only read lock can be held concurrently.
+     If write lock is already held by this process, it will be
+     changed to read lock.
+     Blocking call.  */
+  int
+  lock_read ();
+
+  /* Unlock all previously placed locks.  */
+  void
+  unlock ();
+
+  /* Returns whether any lock is held.  */
+  bool
+  locked ()
+  {
+    return fd < 0;
+  }
+
+  /* Are lockfiles supported?  */
+  static bool
+  lockfile_supported ();
+private:
+  std::string filename;
+  int fd;
+};
+
+#endif
-- 
2.42.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 4/7] lto: Implement ltrans cache
  2023-11-17 20:16 [PATCH 0/7] lto: Incremental LTO Michal Jires
                   ` (2 preceding siblings ...)
  2023-11-17 20:17 ` [PATCH 3/7] Lockfile Michal Jires
@ 2023-11-17 20:17 ` Michal Jires
  2024-05-14 11:51   ` Jan Hubicka
  2023-11-17 20:17 ` [PATCH 5/7] lto: Implement cache partitioning Michal Jires
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Michal Jires @ 2023-11-17 20:17 UTC (permalink / raw)
  To: gcc-patches

This patch implements Incremental LTO as ltrans cache.

The cache is active when directory $GCC_LTRANS_CACHE is specified and exists.
Stored are pairs of ltrans input/output files and input file hash.
File locking is used to allow multiple GCC instances to use to same cache.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

	* Makefile.in: Add lto-ltrans-cache.o.
	* lto-wrapper.cc: Use ltrans cache.
	* lto-ltrans-cache.cc: New file.
	* lto-ltrans-cache.h: New file.
---
 gcc/Makefile.in         |   5 +-
 gcc/lto-ltrans-cache.cc | 407 ++++++++++++++++++++++++++++++++++++++++
 gcc/lto-ltrans-cache.h  | 164 ++++++++++++++++
 gcc/lto-wrapper.cc      | 150 +++++++++++++--
 4 files changed, 711 insertions(+), 15 deletions(-)
 create mode 100644 gcc/lto-ltrans-cache.cc
 create mode 100644 gcc/lto-ltrans-cache.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2c527245c81..495e5f3d069 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1831,7 +1831,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) $(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o lockfile.o
+  lto-wrapper.o collect-utils.o lockfile.o lto-ltrans-cache.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2359,7 +2359,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
 	@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o \
+  lto-ltrans-cache.o
 
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
 	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc
new file mode 100644
index 00000000000..0d43e548fb3
--- /dev/null
+++ b/gcc/lto-ltrans-cache.cc
@@ -0,0 +1,407 @@
+/* File caching.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "md5.h"
+#include "lto-ltrans-cache.h"
+
+#include <cstring>
+#include <algorithm>
+#include <stdio.h>
+
+const md5_checksum_t INVALID_CHECKSUM = {
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+};
+
+/* Computes checksum for given file, returns INVALID_CHECKSUM if not possible.
+ */
+static md5_checksum_t
+file_checksum (char const *filename)
+{
+  FILE *file = fopen (filename, "rb");
+
+  if (!file)
+    return INVALID_CHECKSUM;
+
+  md5_checksum_t result;
+
+  int ret = md5_stream (file, &result);
+
+  if (ret)
+    result = INVALID_CHECKSUM;
+
+  fclose (file);
+
+  return result;
+}
+
+/* Checks identity of two files byte by byte.  */
+static bool
+files_identical (char const *first_filename, char const *second_filename)
+{
+  FILE *f_first = fopen (first_filename, "rb");
+  if (!f_first)
+    return false;
+
+  FILE *f_second = fopen (second_filename, "rb");
+  if (!f_second)
+    {
+      fclose (f_first);
+      return false;
+    }
+
+  bool ret = true;
+
+  for (;;)
+    {
+      int c1, c2;
+      c1 = fgetc (f_first);
+      c2 = fgetc (f_second);
+
+      if (c1 != c2)
+	{
+	  ret = false;
+	  break;
+	}
+
+      if (c1 == EOF)
+	break;
+    }
+
+  fclose (f_first);
+  fclose (f_second);
+  return ret;
+}
+
+/* Contructor of cache item.  */
+ltrans_file_cache::item::item (std::string input, std::string output,
+  md5_checksum_t input_checksum, uint32_t last_used):
+  input (std::move (input)), output (std::move (output)),
+  input_checksum (input_checksum), last_used (last_used)
+{
+  lock = lockfile (this->input + ".lock");
+}
+/* Destructor of cache item.  */
+ltrans_file_cache::item::~item ()
+{
+  lock.unlock ();
+}
+
+/* Reads next cache item from cachedata file.
+   Adds `dir/` prefix to filenames.  */
+static ltrans_file_cache::item*
+read_cache_item (FILE* f, const char* dir)
+{
+  md5_checksum_t checksum;
+  uint32_t last_used;
+
+  if (fread (&checksum, 1, checksum.size (), f) != checksum.size ())
+    return NULL;
+  if (fread (&last_used, sizeof (last_used), 1, f) != 1)
+    return NULL;
+
+  std::vector<char> input (strlen (dir));
+  memcpy (&input[0], dir, input.size ());
+  input.push_back ('/');
+  std::vector<char> output = input; /* Copy.  */
+
+  int c;
+  while ((c = getc (f)))
+    {
+      if (c == EOF)
+	return NULL;
+      input.push_back (c);
+    }
+  input.push_back (0);
+  while ((c = getc (f)))
+    {
+      if (c == EOF)
+	return NULL;
+      output.push_back (c);
+    }
+  output.push_back (0);
+
+  return new ltrans_file_cache::item (&input[0], &output[0], checksum,
+				      last_used);
+}
+
+/* Writes cache item to cachedata file.
+   Removes `dir/` prefix from filenames.  */
+static void
+write_cache_item (FILE* f, ltrans_file_cache::item *item, const char* dir)
+{
+  fwrite (&item->input_checksum, 1, item->input_checksum.size (), f);
+  fwrite (&item->last_used, sizeof (item->last_used), 1, f);
+
+  gcc_assert (item->input.size () > strlen (dir));
+  fputs (item->input.c_str () + strlen (dir) + 1, f);
+  fputc (0, f);
+
+  gcc_assert (item->output.size () > strlen (dir));
+  fputs (item->output.c_str () + strlen (dir) + 1, f);
+  fputc (0, f);
+}
+
+/* Constructor.  Resulting cache item filenames will be
+   in format `prefix%d[.ltrans]suffix`.  */
+ltrans_file_cache::ltrans_file_cache (const char* dir, const char* prefix,
+				      const char* suffix)
+{
+  this->dir = dir;
+  if (!dir) return;
+
+  creation_lock = lockfile (std::string (dir) + "/lockfile_creation");
+  deletion_lock = lockfile (std::string (dir) + "/lockfile_deletion");
+
+  soft_cache_size = 2048;
+
+  cache_prefix = std::string (dir) + "/" + prefix;
+  cache_free_idx = 0;
+
+  this->prefix = prefix;
+  this->suffix = suffix;
+
+  str_buffer = (char *)xmalloc (cache_prefix.size ()
+				+ sizeof ("4294967295.ltrans")
+				+ strlen (suffix));
+}
+
+/* Destructor.  */
+ltrans_file_cache::~ltrans_file_cache ()
+{
+  if (!*this)
+    return;
+
+  cleanup ();
+  free (str_buffer);
+}
+
+/* Adds given cache item to all relevant datastructures.  */
+void
+ltrans_file_cache::add_item (ltrans_file_cache::item* item)
+{
+  items.push_back (item);
+  map_checksum[item->input_checksum] = item;
+  map_input[item->input] = item;
+
+  usage_counter = std::max (usage_counter, item->last_used);
+}
+
+/* Creates cachedata filename for save/load.  */
+std::string
+ltrans_file_cache::filename_cachedata ()
+{
+  return std::string (dir) + "/cachedata";
+}
+
+/* Loads data about previously cached items from cachedata file.
+   Sorts items by last_used and remaps last_used to small integers.
+
+   Must be called with creation_lock or deletion_lock held to
+   prevent data race.  */
+void
+ltrans_file_cache::load_cache ()
+{
+  cleanup ();
+
+  std::string filename = filename_cachedata ();
+  FILE *file = fopen (filename.c_str (), "rb");
+
+  if (!file)
+    return;
+
+  ltrans_file_cache::item* _item;
+  while ((_item = read_cache_item (file, dir)))
+    add_item (_item);
+
+  fclose (file);
+
+  std::sort (items.begin (), items.end (),
+	     [](item* a, item* b)
+	       {return a->last_used < b->last_used;});
+
+  for (size_t i = 0; i < items.size (); ++i)
+    items[i]->last_used = (uint32_t) i;
+  usage_counter = (uint32_t) items.size ();
+}
+
+/* Rewrites data about cache items into cachedata file.
+
+   Must be only called when creation_lock or deletion_lock was held since last
+   call to load_cache.  */
+void
+ltrans_file_cache::save_cache ()
+{
+  std::string filename = filename_cachedata ();
+  FILE *file = fopen (filename.c_str (), "wb");
+
+  if (!file)
+    return;
+
+  for (item* _item: items)
+    write_cache_item (file, _item, dir);
+
+  fclose (file);
+}
+
+/* Creates new cache item with given checksum.
+   New input/output files are chosen to not collide with other items.
+
+   Must be called with creation_lock held to prevent data race.  */
+ltrans_file_cache::item*
+ltrans_file_cache::create_item (md5_checksum_t checksum)
+{
+  size_t prefix_len = cache_prefix.size ();
+
+  strcpy (str_buffer, cache_prefix.c_str ());
+
+  for (;;)
+    {
+      sprintf (str_buffer + prefix_len, "%04u%s", cache_free_idx, suffix);
+
+      if (map_input.find (str_buffer) == map_input.end ())
+	break;
+      cache_free_idx++;
+    }
+
+  std::string input = str_buffer;
+
+  sprintf (str_buffer + prefix_len, "%04u.ltrans%s", cache_free_idx, suffix);
+
+  return new item (std::move (input), str_buffer, checksum, usage_counter++);
+}
+
+/* Adds input file into cache.  Cache item with input file identical to
+   added input file will be returned as _item.
+   If the file was already cached, `true` is returned, `false` otherwise.
+   The added input file is deleted (or moved).
+
+   Must be called with creation_lock held to prevent data race.  */
+bool
+ltrans_file_cache::add_to_cache (const char* filename, item*& _item)
+{
+  md5_checksum_t checksum = file_checksum (filename);
+
+  auto it = map_checksum.find (checksum);
+
+  if (it != map_checksum.end ()
+      && files_identical (filename, it->second->input.c_str ()))
+    {
+      unlink (filename);
+      _item = it->second;
+      _item->last_used = usage_counter++;
+      return true;
+    }
+  else
+    {
+      /* Cache miss.  Move into cache dir.  */
+      _item = create_item (checksum);
+      add_item (_item);
+
+      rename (filename, _item->input.c_str ());
+      return false;
+    }
+}
+
+/* If exists, returns cache item corresponding to cached input file.  */
+ltrans_file_cache::item*
+ltrans_file_cache::get_item (const char* input)
+{
+  auto it = map_input.find (input);
+  if (it == map_input.end ())
+    return NULL;
+  return it->second;
+}
+
+/* If no other process holds the deletion_lock, prunes oldest unused cache
+   items over limit.  */
+void
+ltrans_file_cache::try_prune ()
+{
+  if (deletion_lock.try_lock_write () == 0)
+    {
+      prune ();
+      deletion_lock.unlock ();
+    }
+}
+
+/* Returns true if file exists.  */
+static int
+file_exists (char const *name)
+{
+  return access (name, R_OK) == 0;
+}
+
+/* Prunes oldest unused cache items over limit.
+   Must be called with deletion_lock held to prevent data race.  */
+void
+ltrans_file_cache::prune ()
+{
+  load_cache ();
+  if (items.size () > soft_cache_size)
+    {
+      std::vector<item*> sorted_items = std::move (items);
+
+      cleanup ();
+
+      for (size_t i = 0; i < sorted_items.size (); ++i)
+	{
+	  ltrans_file_cache::item* item = sorted_items[i];
+	  if ((i < sorted_items.size () - soft_cache_size)
+	      || !file_exists (item->input.c_str ())
+	      || !file_exists (item->output.c_str ()))
+	    {
+	      unlink (item->input.c_str ());
+	      unlink (item->output.c_str ());
+	      delete item;
+	    }
+	  else
+	    add_item (item);
+	}
+    }
+  save_cache ();
+}
+
+/* Clears cache class, as if only constructor was called.  */
+void
+ltrans_file_cache::cleanup ()
+{
+  map_checksum.clear ();
+  map_input.clear ();
+
+  for (ltrans_file_cache::item* item: items)
+    delete item;
+  items.clear ();
+
+  usage_counter = 0;
+}
+
+
+/* Returns ltrans cache dir.
+   Returns NULL if ltrans cache is disabled.  */
+const char*
+get_ltrans_cache_dir ()
+{
+  const char *ltrans_cache_dir = getenv ("GCC_LTRANS_CACHE");
+  if (!ltrans_cache_dir || ltrans_cache_dir[0] == '\0'
+      || !file_exists (ltrans_cache_dir))
+    ltrans_cache_dir = NULL;
+  return ltrans_cache_dir;
+}
diff --git a/gcc/lto-ltrans-cache.h b/gcc/lto-ltrans-cache.h
new file mode 100644
index 00000000000..763a23635f7
--- /dev/null
+++ b/gcc/lto-ltrans-cache.h
@@ -0,0 +1,164 @@
+/* File caching.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_LTO_LTRANS_CACHE_H
+#define GCC_LTO_LTRANS_CACHE_H
+
+#include <map>
+#include <string>
+#include <array>
+#include <vector>
+#include <cstdint>
+
+#include "lockfile.h"
+
+using md5_checksum_t = std::array<uint8_t, 16>;
+
+class ltrans_file_cache
+{
+public:
+  /* Cache item representing input/output filename pair.  */
+  struct item
+  {
+    item (std::string input, std::string output,
+	  md5_checksum_t input_checksum, uint32_t last_used);
+    ~item ();
+
+    /* Full path to input filename.  */
+    const std::string input;
+    /* Full path to output filename.  */
+    const std::string output;
+    /* Checksum of input file.  */
+    const md5_checksum_t input_checksum;
+
+    /* Larger last_used corresponds to later usage.  */
+    uint32_t last_used;
+
+    /* Lockfile so that output file can be created later than input file.  */
+    lockfile lock;
+  };
+
+  /* Constructor.  Resulting cache item filenames will be
+     in format `prefix%d[.ltrans]suffix`.  */
+  ltrans_file_cache (const char* dir, const char* prefix, const char* suffix);
+  /* Destructor.  */
+  ~ltrans_file_cache ();
+
+  /* Loads data about previously cached items from cachedata file.
+
+     Must be called with creation_lock or deletion_lock held to
+     prevent data race.  */
+  void
+  load_cache ();
+
+  /* Rewrites data about cache items into cachedata file.
+
+     Must be only called when creation_lock or deletion_lock was held since last
+     call to load_cache.  */
+  void
+  save_cache ();
+
+
+  /* Adds input file into cache.  Cache item with input file identical to
+     added input file will be returned as _item.
+     If the file was already cached, `true` is returned, `false` otherwise.
+     The added input file is deleted (or moved).
+
+     Must be called with creation_lock held to prevent data race.  */
+  bool
+  add_to_cache (const char* filename, item*& _item);
+
+  /* If exists, returns cache item corresponding to cached input file.  */
+  item*
+  get_item (const char* input);
+
+  /* If no other process holds the deletion_lock, prunes oldest unused cache
+     items over limit.  */
+  void
+  try_prune ();
+
+  /* Clears cache class, as if only constructor was called.  */
+  void
+  cleanup ();
+
+  /* Cache is enabled if true.  */
+  operator bool ()
+  {
+    return dir;
+  }
+
+  /* Access to already created items can be concurrent with item creation.  */
+  lockfile creation_lock;
+  /* Access to already created items cannot be concurrent
+     with item deletion.  */
+  lockfile deletion_lock;
+
+  /* Directory of cache.  NULL if cache is disabled.  */
+  const char* dir;
+private:
+  /* Adds given cache item to all relevant datastructures.  */
+  void
+  add_item (item* item);
+
+  /* Creates new cache item with given checksum.
+     New input/output files are chosen to not collide with other items.
+
+     Must be called with creation_lock held to prevent data race.  */
+  item*
+  create_item (md5_checksum_t checksum);
+
+  /* Prunes oldest unused cache items over limit.
+     Must be called with deletion_lock held to prevent data race.  */
+  void
+  prune ();
+
+  /* Creates cachedata filename for save/load.  */
+  std::string
+  filename_cachedata ();
+
+  /* All cache items in current cache.  */
+  std::vector<item*> items;
+  std::map<md5_checksum_t, item*> map_checksum;
+  std::map<std::string, item*> map_input;
+
+  /* Cached filenames are in format "prefix%d[.ltrans]suffix".  */
+  const char* prefix;
+  const char* suffix;
+
+  /* If cache items count is larger, prune deletes old items.  */
+  size_t soft_cache_size;
+
+  /* Counter used to populate last_used of items.  */
+  uint32_t usage_counter;
+
+  /* String in format "dir/prefix".  */
+  std::string cache_prefix;
+  /* Lower indices are occupied.  */
+  uint32_t cache_free_idx;
+
+  /* Buffer for sprintf.  */
+  char* str_buffer;
+};
+
+/* Returns ltrans cache dir.
+   Returns NULL if ltrans cache is disabled.  */
+const char*
+get_ltrans_cache_dir ();
+
+#endif
diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 5186d040ce0..4b98539067e 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts-diagnostic.h"
 #include "opt-suggestions.h"
 #include "opts-jobserver.h"
+#include "lto-ltrans-cache.h"
 
 /* Environment variable, used for passing the names of offload targets from GCC
    driver to lto-wrapper.  */
@@ -79,7 +80,7 @@ static char *flto_out;
 static unsigned int nr;
 static int *ltrans_priorities;
 static char **input_names;
-static char **output_names;
+static char const**output_names;
 static char **offload_names;
 static char *offload_objects_file_name;
 static char *makefile;
@@ -1802,6 +1803,8 @@ cont1:
 	}
     }
 
+  const char *ltrans_cache_dir = get_ltrans_cache_dir ();
+
   /* If object files contain offload sections, but do not contain LTO sections,
      then there is no need to perform a link-time recompilation, i.e.
      lto-wrapper is used only for a compilation of offload images.  */
@@ -1827,7 +1830,17 @@ cont1:
       char *dumpbase = concat (dumppfx, "wpa", NULL);
       obstack_ptr_grow (&argv_obstack, dumpbase);
 
-      if (save_temps)
+      if (ltrans_cache_dir)
+	{
+	  /* Results of wpa phase must be on the same disk partition as
+	     cache.  */
+	  char* file = concat (ltrans_cache_dir, "/ccXXXXXX.ltrans.out", NULL);
+	  int fd = mkstemps (file, strlen (".ltrans.out"));
+	  gcc_assert (fd != -1 && !close (fd));
+
+	  ltrans_output_file = file;
+	}
+      else if (save_temps)
 	ltrans_output_file = concat (dumppfx, "ltrans.out", NULL);
       else
 	ltrans_output_file = make_temp_file (".ltrans.out");
@@ -1953,7 +1966,8 @@ cont:
 	  ltrans_priorities
 	     = (int *)xrealloc (ltrans_priorities, nr * sizeof (int) * 2);
 	  input_names = (char **)xrealloc (input_names, nr * sizeof (char *));
-	  output_names = (char **)xrealloc (output_names, nr * sizeof (char *));
+	  output_names = (char const**)
+	    xrealloc (output_names, nr * sizeof (char const*));
 	  ltrans_priorities[(nr-1)*2] = priority;
 	  ltrans_priorities[(nr-1)*2+1] = nr-1;
 	  input_names[nr-1] = input_name;
@@ -1985,21 +1999,79 @@ cont:
 	  qsort (ltrans_priorities, nr, sizeof (int) * 2, cmp_priority);
 	}
 
+      ltrans_file_cache ltrans_cache (ltrans_cache_dir, "ltrans", ".o");
+
+      if (ltrans_cache)
+	{
+	  if (!lockfile::lockfile_supported ())
+	    {
+	      warning (0, "using ltrans cache without file locking support,"
+		       " do not use in parallel");
+	    }
+	  ltrans_cache.deletion_lock.lock_read ();
+	  ltrans_cache.creation_lock.lock_write ();
+
+	  ltrans_cache.load_cache ();
+
+	  int recompiling = 0;
+
+	  for (i = 0; i < nr; ++i)
+	    {
+	      /* If it's a pass-through file do nothing.  */
+	      if (output_names[i])
+		continue;
+
+	      ltrans_file_cache::item* item;
+	      bool existed = ltrans_cache.add_to_cache (input_names[i], item);
+	      free (input_names[i]);
+	      input_names[i] = xstrdup (item->input.c_str ());
+
+	      if (existed)
+		{
+		  /* Fill the output_name to skip compilation.  */
+		  output_names[i] = item->output.c_str ();
+		  recompiling++;
+		}
+	      else
+		{
+		  /* Lock so no other process can access until the file is
+		     compiled.  */
+		  item->lock.lock_write ();
+		}
+	    }
+	  if (verbose)
+	    fprintf (stderr, "LTRANS: recompiling %d/%d\n", recompiling, nr);
+
+	  ltrans_cache.save_cache ();
+	  ltrans_cache.creation_lock.unlock ();
+	}
+
       /* Execute the LTRANS stage for each input file (or prepare a
 	 makefile to invoke this in parallel).  */
       for (i = 0; i < nr; ++i)
 	{
-	  char *output_name;
+	  char const* output_name;
 	  char *input_name = input_names[i];
-	  /* If it's a pass-through file do nothing.  */
+	  /* If it's a pass-through or cached file do nothing.  */
 	  if (output_names[i])
 	    continue;
 
-	  /* Replace the .o suffix with a .ltrans.o suffix and write
-	     the resulting name to the LTRANS output list.  */
-	  obstack_grow (&env_obstack, input_name, strlen (input_name) - 2);
-	  obstack_grow (&env_obstack, ".ltrans.o", sizeof (".ltrans.o"));
-	  output_name = XOBFINISH (&env_obstack, char *);
+	  if (ltrans_cache)
+	    {
+	      ltrans_file_cache::item* item;
+	      item = ltrans_cache.get_item (input_name);
+	      gcc_assert (item);
+
+	      output_name = item->output.c_str ();
+	    }
+	  else
+	    {
+	      /* Replace the .o suffix with a .ltrans.o suffix and write
+		 the resulting name to the LTRANS output list.  */
+	      obstack_grow (&env_obstack, input_name, strlen (input_name) - 2);
+	      obstack_grow (&env_obstack, ".ltrans.o", sizeof (".ltrans.o"));
+	      output_name = XOBFINISH (&env_obstack, char const*);
+	    }
 
 	  /* Adjust the dumpbase if the linker output file was seen.  */
 	  int dumpbase_len = (strlen (dumppfx)
@@ -2023,7 +2095,7 @@ cont:
 	      /* If we are not preserving the ltrans input files then
 	         truncate them as soon as we have processed it.  This
 		 reduces temporary disk-space usage.  */
-	      if (! save_temps)
+	      if (!ltrans_cache && !save_temps)
 		fprintf (mstream, "\t@-touch -r \"%s\" \"%s.tem\" > /dev/null "
 			 "2>&1 && mv \"%s.tem\" \"%s\"\n",
 			 input_name, input_name, input_name, input_name); 
@@ -2038,7 +2110,8 @@ cont:
 			  "ltrans%u.ltrans_args", i);
 	      fork_execute (new_argv[0], CONST_CAST (char **, new_argv),
 			    true, save_temps ? argsuffix : NULL);
-	      maybe_unlink (input_name);
+	      if (!ltrans_cache)
+		maybe_unlink (input_names[i]);
 	    }
 
 	  output_names[i] = output_name;
@@ -2093,15 +2166,66 @@ cont:
 	  freeargv (make_argv);
 	  maybe_unlink (makefile);
 	  makefile = NULL;
+
+	  if (!ltrans_cache)
+	    for (i = 0; i < nr; ++i)
+	      maybe_unlink (input_names[i]);
+	}
+
+      if (ltrans_cache)
+	{
 	  for (i = 0; i < nr; ++i)
-	    maybe_unlink (input_names[i]);
+	    {
+	      char *input_name = input_names[i];
+	      char const *output_name = output_names[i];
+
+	      ltrans_file_cache::item* item;
+	      item = ltrans_cache.get_item (input_name);
+
+	      if (item && !save_temps)
+		{
+		  item->lock.lock_read ();
+		  /* Ensure that cached compiled file is not deleted.
+		     Create copy.  */
+
+		  obstack_grow (&env_obstack, output_name,
+				strlen (output_name) - 2);
+		  obstack_grow (&env_obstack, ".cache_copy.XXX.o",
+				sizeof (".cache_copy.XXX.o"));
+
+		  char* output_name_link = XOBFINISH (&env_obstack, char *);
+		  char* name_idx = output_name_link + strlen (output_name_link)
+				   - strlen ("XXX.o");
+
+		  /* lto-wrapper can run in parallel and access
+		     the same partition.  */
+		  for (int j = 0; ; j++)
+		    {
+		      gcc_assert (j < 1000);
+		      sprintf (name_idx, "%03d.o", j);
+
+		      if (link (output_name, output_name_link) != EEXIST)
+			break;
+		    }
+
+		  output_names[i] = output_name_link;
+		  item->lock.unlock ();
+		}
+	    }
+
+	  ltrans_cache.deletion_lock.unlock ();
 	}
+
       for (i = 0; i < nr; ++i)
 	{
 	  fputs (output_names[i], stdout);
 	  putc ('\n', stdout);
 	  free (input_names[i]);
 	}
+
+      if (ltrans_cache && !save_temps)
+	ltrans_cache.try_prune ();
+
       if (!skip_debug)
 	{
 	  for (i = 0; i < ltoobj_argc; ++i)
-- 
2.42.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 5/7] lto: Implement cache partitioning
  2023-11-17 20:16 [PATCH 0/7] lto: Incremental LTO Michal Jires
                   ` (3 preceding siblings ...)
  2023-11-17 20:17 ` [PATCH 4/7] lto: Implement ltrans cache Michal Jires
@ 2023-11-17 20:17 ` Michal Jires
  2024-05-14 12:10   ` Jan Hubicka
  2023-11-17 20:17 ` [PATCH 6/7] lto: squash order of symbols in partitions Michal Jires
  2023-11-17 20:17 ` [PATCH 7/7] lto: partition specific lto_clone_numbers Michal Jires
  6 siblings, 1 reply; 18+ messages in thread
From: Michal Jires @ 2023-11-17 20:17 UTC (permalink / raw)
  To: gcc-patches

This patch implements new cache partitioning. It tries to keep symbols
from single source file together to minimize propagation of divergence.

It starts with symbols already grouped by source files. If reasonably
possible it only either combines several files into one final partition,
or, if a file is large, split the file into several final partitions.

Intermediate representation is partition_set which contains set of
groups of symbols (each group corresponding to original source file) and
number of final partitions this partition_set should split into.

First partition_fixed_split splits partition_set into constant number of
partition_sets with equal number of symbols groups. If for example there
are 39 source files, the resulting partition_sets will contain 10, 10,
10, and 9 source files. This splitting intentionally ignores estimated
instruction counts to minimize propagation of divergence.

Second partition_over_target_split separates too large files and splits
them into individual symbols to be combined back into several smaller
files in next step.

Third partition_binary_split splits partition_set into two halves until
it should be split into only one final partition, at which point the
remaining symbols are joined into one final partition.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

	* common.opt: Add cache partitioning.
	* flag-types.h (enum lto_partition_model): Likewise.

gcc/lto/ChangeLog:

	* lto-partition.cc (new_partition): Use new_partition_no_push.
	(new_partition_no_push): New.
	(free_ltrans_partition): New.
	(free_ltrans_partitions): Use free_ltrans_partition.
	(join_partitions): New.
	(split_partition_into_nodes): New.
	(is_partition_reorder): New.
	(class partition_set): New.
	(distribute_n_partitions): New.
	(partition_over_target_split): New.
	(partition_binary_split): New.
	(partition_fixed_split): New.
	(class partitioner_base): New.
	(class partitioner_default): New.
	(lto_cache_map): New.
	* lto-partition.h (lto_cache_map): New.
	* lto.cc (do_whole_program_analysis): Use lto_cache_map.

gcc/testsuite/ChangeLog:

	* gcc.dg/completion-2.c: Add -flto-partition=cache.
---
 gcc/common.opt                      |   3 +
 gcc/flag-types.h                    |   3 +-
 gcc/lto/lto-partition.cc            | 605 +++++++++++++++++++++++++++-
 gcc/lto/lto-partition.h             |   1 +
 gcc/lto/lto.cc                      |   2 +
 gcc/testsuite/gcc.dg/completion-2.c |   1 +
 6 files changed, 605 insertions(+), 10 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 1cf3bdd3b51..fe5cf3c0a05 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2174,6 +2174,9 @@ Enum(lto_partition_model) String(1to1) Value(LTO_PARTITION_1TO1)
 EnumValue
 Enum(lto_partition_model) String(max) Value(LTO_PARTITION_MAX)
 
+EnumValue
+Enum(lto_partition_model) String(cache) Value(LTO_PARTITION_CACHE)
+
 flto-partition=
 Common Joined RejectNegative Enum(lto_partition_model) Var(flag_lto_partition) Init(LTO_PARTITION_BALANCED)
 Specify the algorithm to partition symbols and vars at linktime.
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index c1852cd810c..59b3c23081b 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -393,7 +393,8 @@ enum lto_partition_model {
   LTO_PARTITION_ONE = 1,
   LTO_PARTITION_BALANCED = 2,
   LTO_PARTITION_1TO1 = 3,
-  LTO_PARTITION_MAX = 4
+  LTO_PARTITION_MAX = 4,
+  LTO_PARTITION_CACHE = 5
 };
 
 /* flag_lto_linker_output initialization values.  */
diff --git a/gcc/lto/lto-partition.cc b/gcc/lto/lto-partition.cc
index e4c91213f4b..eb31ecba0d3 100644
--- a/gcc/lto/lto-partition.cc
+++ b/gcc/lto/lto-partition.cc
@@ -36,6 +36,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "lto-partition.h"
 #include "sreal.h"
 
+#include <limits>
+#include <vector>
+
 vec<ltrans_partition> ltrans_partitions;
 
 static void add_symbol_to_partition (ltrans_partition part, symtab_node *node);
@@ -59,20 +62,41 @@ cmp_partitions_order (const void *a, const void *b)
   return orderb - ordera;
 }
 
-/* Create new partition with name NAME.  */
-
+/* Create new partition with name NAME.
+   Does not push into ltrans_partitions.  */
 static ltrans_partition
-new_partition (const char *name)
+new_partition_no_push (const char *name)
 {
   ltrans_partition part = XCNEW (struct ltrans_partition_def);
   part->encoder = lto_symtab_encoder_new (false);
   part->name = name;
   part->insns = 0;
   part->symbols = 0;
+  return part;
+}
+
+/* Create new partition with name NAME.  */
+
+static ltrans_partition
+new_partition (const char *name)
+{
+  ltrans_partition part = new_partition_no_push (name);
   ltrans_partitions.safe_push (part);
   return part;
 }
 
+/* Free memory used by ltrans partition.
+   Encoder can be kept to be freed after streaming.  */
+static void
+free_ltrans_partition (ltrans_partition part, bool delete_encoder)
+  {
+    if (part->initializers_visited)
+      delete part->initializers_visited;
+    if (delete_encoder)
+      lto_symtab_encoder_delete (part->encoder);
+    free (part);
+  }
+
 /* Free memory used by ltrans datastructures.  */
 
 void
@@ -81,12 +105,7 @@ free_ltrans_partitions (void)
   unsigned int idx;
   ltrans_partition part;
   for (idx = 0; ltrans_partitions.iterate (idx, &part); idx++)
-    {
-      if (part->initializers_visited)
-	delete part->initializers_visited;
-      /* Symtab encoder is freed after streaming.  */
-      free (part);
-    }
+    free_ltrans_partition (part, false);
   ltrans_partitions.release ();
 }
 
@@ -427,6 +446,574 @@ account_reference_p (symtab_node *n1, symtab_node *n2)
   return true;
 }
 
+/* Joins two partitions into one.
+   NULL partitions are equivalent to empty partition.
+   If both partition are non-null, symbols from FROM are added into INTO.  */
+static ltrans_partition
+join_partitions (ltrans_partition into, ltrans_partition from)
+{
+  if (!into)
+    return from;
+  if (!from)
+    return into;
+
+  lto_symtab_encoder_iterator lsei;
+  lto_symtab_encoder_t encoder = from->encoder;
+
+  /* If aux is non zero, it will not be added to the new partition.  Since
+     adding symbols is recursive, it is safer to reduce aux of all symbols
+     before adding any symbols to other partition.  */
+  for (lsei = lsei_start (encoder); !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+      node->aux = (void *)((size_t)node->aux - 1);
+    }
+
+  for (lsei = lsei_start (encoder); !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+
+      if (symbol_partitioned_p (node))
+	continue;
+
+      add_symbol_to_partition (into, node);
+    }
+
+  free_ltrans_partition (from, true);
+
+  return into;
+}
+
+/* Takes symbols from given partitions and splits them into N partitions where
+   each partitions contains one symbol and its requirements.  */
+static std::vector<ltrans_partition>
+split_partition_into_nodes (ltrans_partition part)
+{
+  std::vector<ltrans_partition> partitions;
+
+  lto_symtab_encoder_iterator lsei;
+  lto_symtab_encoder_t encoder = part->encoder;
+
+  for (lsei = lsei_start (encoder); !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+      node->aux = (void *)((size_t)node->aux - 1);
+    }
+
+  for (lsei = lsei_start (encoder); !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+
+      if (node->get_partitioning_class () != SYMBOL_PARTITION
+	  || symbol_partitioned_p (node))
+	continue;
+
+      ltrans_partition new_part = new_partition_no_push (part->name);
+      add_symbol_to_partition (new_part, node);
+      partitions.push_back (new_part);
+    }
+
+  return partitions;
+}
+
+/* Returns whether partition contains symbols that cannot be reordered.  */
+static bool
+is_partition_reorder (ltrans_partition part)
+{
+  lto_symtab_encoder_iterator lsei;
+  lto_symtab_encoder_t encoder = part->encoder;
+
+  for (lsei = lsei_start (encoder); !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+      if (node->no_reorder)
+	return false;
+    }
+  return true;
+}
+
+/* Represents groups of symbols, that should be partitioned into n_partitions
+   partitions.  */
+class partition_set
+{
+public:
+  /* Metadata to easily pass copy to new partition_set.  */
+  class metadata
+  {
+  public:
+    /* Partitions can be reordered.  */
+    bool reorder;
+    /* Partitions can be split into individual symbols.  */
+    bool splitable;
+
+    metadata (bool reorder, bool splitable):
+      reorder (reorder), splitable (splitable)
+    {}
+  };
+  metadata data;
+
+  /* Symbol groups.  Use push (g) to insert symbols.  */
+  std::vector<ltrans_partition> sym_groups;
+  /* Number of partitions these symbols should be partitioned into.  */
+  size_t n_partitions;
+  /* Total number of instructions of all symbols.  */
+  int64_t insns;
+
+  /* Constructor.  Symbols and n_partitions can be added later.  */
+  partition_set (metadata data, std::vector<ltrans_partition> sym_groups = {},
+		 size_t n_partitions = 0)
+    : data (data), sym_groups (std::move (sym_groups)),
+    n_partitions (n_partitions), insns (0)
+  {
+    for (ltrans_partition g: this->sym_groups)
+      insns += g->insns;
+  }
+
+  /* Adds symbol group and updates total insns.  */
+  void
+  push (ltrans_partition g)
+  {
+    sym_groups.push_back (g);
+    insns += g->insns;
+  }
+
+  /* Returns whether any symbols group is contained.  */
+  bool
+  empty ()
+  {
+    return sym_groups.empty ();
+  }
+};
+
+/* Distributes total n_partitions among partition_sets.
+   Aims to be as fair as possible.  */
+static void
+distribute_n_partitions (std::vector<partition_set>& ps, size_t n_partitions)
+{
+  gcc_assert (ps.size ());
+  gcc_assert (ps.size () <= n_partitions);
+
+  int64_t total_size = 0;
+
+  for (partition_set& p: ps)
+    {
+      total_size += p.insns;
+      p.n_partitions = 0;
+    }
+
+  if (total_size <= 0)
+    total_size = 1;
+
+  size_t n_partitions_allocated = 0;
+
+  /* First we allocate largest amount of partitions so that target_sizes are
+     larger than target size of total (insns/total_size).
+     All partition_set must have n_partitions at least one.  */
+  for (partition_set& p: ps)
+    {
+      p.n_partitions = n_partitions * p.insns / total_size;
+      if (p.n_partitions == 0 && p.sym_groups.size ())
+	p.n_partitions = 1;
+      if (!p.data.splitable)
+	p.n_partitions = std::min (p.n_partitions, p.sym_groups.size ());
+
+      n_partitions_allocated += p.n_partitions;
+    }
+
+  /* Rare special case, with a lot of initially 0 sized splits.  */
+  while (n_partitions_allocated > n_partitions)
+    {
+      size_t idx = 0;
+      int64_t min = std::numeric_limits<int64_t>::max ();
+
+      for (size_t i = 0; i < ps.size (); ++i)
+	{
+	  if (ps[i].n_partitions <= 1)
+	    continue;
+
+	  int64_t target_size = ps[i].insns / ps[i].n_partitions;
+	  if (min > target_size)
+	    {
+	      min = target_size;
+	      idx = i;
+	    }
+	}
+
+      ps[idx].n_partitions--;
+      n_partitions_allocated--;
+    }
+
+  /* Typical case where with any increase of n_partitions target size will cross
+     total target size.  We optimize for minimal:
+     (old_target_size - total_target_size)
+     - (total_target_size - new_target_size).  */
+  while (n_partitions_allocated < n_partitions)
+    {
+      size_t idx = 0;
+      int64_t max = 0;
+
+      for (size_t i = 0; i < ps.size (); ++i)
+	{
+	  if (ps[i].sym_groups.size () <= 1 && !ps[i].data.splitable)
+	    continue;
+
+	  int64_t target_size = ps[i].insns / ps[i].n_partitions;
+	  int64_t new_target_size = ps[i].insns / (ps[i].n_partitions + 1);
+
+	  int64_t positive_change = target_size + new_target_size;
+
+	  if (max < positive_change)
+	    {
+	      max = positive_change;
+	      idx = i;
+	    }
+	}
+
+      ps[idx].n_partitions++;
+      n_partitions_allocated++;
+    }
+}
+
+/* Splits off symbol groups that are larger than target size.
+   n_partitions are then distributed between individual
+   split off symbol groups, and everything else as a whole.
+
+   Split off symbol groups with n_partitions > 1, are
+   then split into individual symbols.
+
+   Order is not conserved.  This pass is ignored if reorder is not allowed.  */
+static std::vector<partition_set>
+partition_over_target_split (partition_set& p)
+{
+  gcc_assert (p.n_partitions >= 1);
+
+  std::vector<partition_set> all;
+  partition_set small (p.data);
+
+  int64_t target_size = p.insns / p.n_partitions;
+
+  for (ltrans_partition g: p.sym_groups)
+    {
+      if (g->insns > target_size
+	  && (p.data.reorder || is_partition_reorder (g)))
+	all.push_back (partition_set (p.data, {g}));
+      else
+	small.push (g);
+    }
+
+  if (all.empty ())
+    return {};
+
+  if (small.sym_groups.size ())
+    {
+      /* Handles special case where n_partitions might be smaller than
+	 all.size ().  Which can happen as result of interger division or with
+	 0 sized partition_sets.  Then also prevents too small symbol group.
+	 This should also be a special case; more common one,
+	 but with no correctness problems.  */
+      if (all.size () && (
+	     small.insns < (int64_t) p.n_partitions
+	  || small.insns < target_size * 0.6
+	  || small.insns < param_min_partition_size))
+	{
+	  size_t idx = 0;
+	  int64_t min_insns = std::numeric_limits<int64_t>::max ();
+	  for (size_t i = 0; i < all.size (); ++i)
+	    {
+	      if (all[i].insns < min_insns)
+		{
+		  min_insns = all[i].insns;
+		  idx = i;
+		}
+	    }
+
+	  gcc_assert (all[idx].sym_groups.size () == 1);
+
+	  ltrans_partition& into = all[idx].sym_groups[0];
+	  for (ltrans_partition g: small.sym_groups)
+	    into = join_partitions (into, g);
+
+	  all[idx].insns = into->insns;
+	}
+      else
+	{
+	  gcc_assert (all.size () < p.n_partitions);
+	  all.push_back (std::move (small));
+	}
+    }
+
+  distribute_n_partitions (all, p.n_partitions);
+
+  for (partition_set& p: all)
+    {
+      gcc_assert (p.sym_groups.size ());
+
+      /* Handles large symbol groups (large files) that will be
+	 further divided.  */
+      if (p.sym_groups.size () == 1 && p.n_partitions > 1)
+	{
+	  p.sym_groups = split_partition_into_nodes (p.sym_groups[0]);
+	  p.data.reorder = false;
+	  p.data.splitable = false;
+	}
+    }
+
+  return all;
+}
+
+/* Splits partition_set into two partition_sets with
+   equal or off by one n_partitions.
+   Order is conserved.  */
+static std::vector<partition_set>
+partition_binary_split (partition_set& p)
+{
+  gcc_assert (p.n_partitions > 1);
+
+  if (p.sym_groups.size () < 2)
+    return {};
+
+  int64_t target_size = p.insns / p.n_partitions;
+
+
+  std::vector<partition_set> result (2, partition_set (p.data));
+  partition_set& first  = result[0];
+  partition_set& second = result[1];
+
+  first.n_partitions = p.n_partitions/2;
+  second.n_partitions = p.n_partitions - first.n_partitions;
+
+  int64_t first_target_size = first.n_partitions * target_size;
+
+  int64_t insns = 0;
+  for (ltrans_partition g: p.sym_groups)
+    {
+      /* We want at least one symbol in first partition.  */
+      if (first.empty ())
+	first.push (g);
+      else if (insns < first_target_size)
+	{
+	  if (insns + g->insns < first_target_size)
+	    first.push (g);
+	  else
+	    {
+	      /* Target splitting point is in this symbol group.  */
+	      int64_t diff_first  = first_target_size - insns;
+	      int64_t diff_second = (insns + g->insns) - first_target_size;
+
+	      if (diff_first * second.n_partitions
+		  > diff_second * first.n_partitions)
+		first.push (g);
+	      else
+		second.push (g);
+	    }
+	}
+      else
+	second.push (g);
+
+      insns += g->insns;
+    }
+
+  return result;
+}
+
+/* Split partition_set into 'into' partition_sets with equal or off by one
+   number of symbol groups.  Sizes of symbol groups are ignored for deciding
+   where to split. n_partitions is then distributed among new partition_sets
+   based on their sizes.
+   Order in conserved.  */
+static std::vector<partition_set>
+partition_fixed_split (partition_set& p, size_t into)
+{
+  gcc_assert (into < p.n_partitions);
+
+  std::vector<partition_set> result;
+
+  for (size_t i = 0; i < into; ++i)
+    {
+      size_t begin = i * p.sym_groups.size () / into;
+      size_t end = (i + 1) * p.sym_groups.size () / into;
+
+      auto it = p.sym_groups.begin ();
+      result.push_back (partition_set (p.data, {it + begin, it + end}));
+    }
+
+  distribute_n_partitions (result, p.n_partitions);
+
+  return result;
+}
+
+
+/* Base implementation to inherit from for all Partitioners.  */
+class partitioner_base {
+public:
+  /* Partitions sym_groups into n_partitions partitions inserted into
+     ltrans_partitions.  */
+  void
+  apply (std::vector<ltrans_partition>& sym_groups, int n_partitions)
+  {
+    partition_set p (partition_set::metadata (true, true),
+		     std::move (sym_groups), n_partitions);
+    split (p, 0);
+  }
+
+protected:
+  partitioner_base (int64_t min_partition_size, int64_t max_partition_size):
+    min_partition_size (min_partition_size),
+    max_partition_size (max_partition_size)
+  {
+    gcc_assert (min_partition_size != 0);
+    gcc_assert (max_partition_size != 0);
+  }
+  virtual ~partitioner_base ()
+  {}
+
+  /* Joins all symbol groups into one finalized partition.  */
+  void
+  finalize (partition_set& p)
+  {
+    ltrans_partition joined = NULL;
+
+    for (ltrans_partition g: p.sym_groups)
+      joined = join_partitions (joined, g);
+
+    if (joined)
+      ltrans_partitions.safe_push (joined);
+  }
+
+  /* Splits all partition_sets.  */
+  void
+  split_list (std::vector<partition_set>& ps, uintptr_t state)
+  {
+    for (partition_set& p: ps)
+      split (p, state);
+  }
+
+  /* Handles common cases:
+     too large or small n_partitions, or n_partitions = 1.
+     And then calls split_state.  */
+  void
+  split (partition_set& p, uintptr_t state)
+  {
+    size_t min_partitions = p.insns / max_partition_size + 1;
+    size_t max_partitions = p.insns / min_partition_size;
+    if (!p.data.splitable)
+      max_partitions = std::min (max_partitions, p.sym_groups.size ());
+
+    p.n_partitions = std::max (p.n_partitions, min_partitions);
+    p.n_partitions = std::min (p.n_partitions, max_partitions);
+
+    if (p.n_partitions <= 1)
+      return finalize (p);
+
+    split_state (p, state);
+  }
+
+  /* State machine for specific partitioner implementation.  */
+  virtual void
+  split_state (partition_set& p, uintptr_t state) = 0;
+
+  int64_t min_partition_size, max_partition_size;
+};
+
+
+/* Partitioner combining fixed, over_target, and binary partitionings.  */
+class partitioner_default: public partitioner_base
+{
+public:
+  partitioner_default (int64_t min_partition_size, int64_t max_partition_size):
+    partitioner_base (min_partition_size, max_partition_size)
+  {}
+
+private:
+  virtual void
+  split_state (partition_set& p, uintptr_t state)
+  {
+    const uintptr_t FIXED = 0;
+    const uintptr_t OVER_TARGET = 1;
+    const uintptr_t BINARY = 2;
+
+    std::vector<partition_set> ps;
+
+    switch (state)
+      {
+      case FIXED:
+	if (p.n_partitions > 64 && p.sym_groups.size () >= 4)
+	  {
+	    ps = partition_fixed_split (p, 4);
+	    split_list (ps, OVER_TARGET);
+	    break;
+	  }
+
+	/* FALLTHROUGH */
+      case OVER_TARGET:
+	ps = partition_over_target_split (p);
+	if (!ps.empty ())
+	  {
+	    split_list (ps, BINARY);
+	    break;
+	  }
+
+	/* FALLTHROUGH */
+      case BINARY:
+	ps = partition_binary_split (p);
+	if (!ps.empty ())
+	  {
+	    split_list (ps, OVER_TARGET);
+	    break;
+	  }
+
+	/* FALLTHROUGH */
+      default:
+	finalize (p);
+      }
+  }
+};
+
+/* Group cgraph nodes into equally-sized partitions.
+   It tries to keep symbols from single source file together to minimize
+   propagation of divergence.
+
+   It starts with symbols already grouped by source files.  If reasonably
+   possible it only either combines several files into one final partition,
+   or, if a file is large, split the file into several final partitions.
+
+   Intermediate representation is partition_set which contains set of
+   groups of symbols (each group corresponding to original source file) and
+   number of final partitions this partition_set should split into.
+
+   First partition_fixed_split splits partition_set into constant number of
+   partition_sets with equal number of symbols groups.  If for example there
+   are 39 source files, the resulting partition_sets will contain 10, 10,
+   10, and 9 source files.  This splitting intentionally ignores estimated
+   instruction counts to minimize propagation of divergence.
+
+   Second partition_over_target_split separates too large files and splits
+   them into individual symbols to be combined back into several smaller
+   files in next step.
+
+   Third partition_binary_split splits partition_set into two halves until
+   it should be split into only one final partition, at which point the
+   remaining symbols are joined into one final partition.
+*/
+
+void
+lto_cache_map (int n_lto_partitions, int max_partition_size)
+{
+  lto_1_to_1_map ();
+
+  std::vector<ltrans_partition> partitions;
+  for (unsigned i = 0; i < ltrans_partitions.length (); ++i)
+    {
+      ltrans_partition part = ltrans_partitions[i];
+      partitions.push_back (part);
+    }
+  ltrans_partitions.truncate (0);
+
+  partitioner_default partitioner = partitioner_default
+    (param_min_partition_size, max_partition_size);
+
+  partitioner.apply (partitions, n_lto_partitions);
+}
 
 /* Group cgraph nodes into equally-sized partitions.
 
diff --git a/gcc/lto/lto-partition.h b/gcc/lto/lto-partition.h
index 4299033e665..853f456537e 100644
--- a/gcc/lto/lto-partition.h
+++ b/gcc/lto/lto-partition.h
@@ -35,6 +35,7 @@ extern vec<ltrans_partition> ltrans_partitions;
 
 void lto_1_to_1_map (void);
 void lto_max_map (void);
+void lto_cache_map (int, int);
 void lto_balanced_map (int, int);
 void lto_promote_cross_file_statics (void);
 void free_ltrans_partitions (void);
diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc
index 12d4c6b95fe..f77ad0ea034 100644
--- a/gcc/lto/lto.cc
+++ b/gcc/lto/lto.cc
@@ -552,6 +552,8 @@ do_whole_program_analysis (void)
   else if (flag_lto_partition == LTO_PARTITION_BALANCED)
     lto_balanced_map (param_lto_partitions,
 		      param_max_partition_size);
+  else if (flag_lto_partition == LTO_PARTITION_CACHE)
+    lto_cache_map (param_lto_partitions, param_max_partition_size);
   else
     gcc_unreachable ();
 
diff --git a/gcc/testsuite/gcc.dg/completion-2.c b/gcc/testsuite/gcc.dg/completion-2.c
index 166bfdc1424..99e65312201 100644
--- a/gcc/testsuite/gcc.dg/completion-2.c
+++ b/gcc/testsuite/gcc.dg/completion-2.c
@@ -4,6 +4,7 @@
 /* { dg-begin-multiline-output "" }
 -flto-partition=1to1
 -flto-partition=balanced
+-flto-partition=cache
 -flto-partition=max
 -flto-partition=none
 -flto-partition=one
-- 
2.42.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 6/7] lto: squash order of symbols in partitions
  2023-11-17 20:16 [PATCH 0/7] lto: Incremental LTO Michal Jires
                   ` (4 preceding siblings ...)
  2023-11-17 20:17 ` [PATCH 5/7] lto: Implement cache partitioning Michal Jires
@ 2023-11-17 20:17 ` Michal Jires
  2024-05-14 12:20   ` Jan Hubicka
  2023-11-17 20:17 ` [PATCH 7/7] lto: partition specific lto_clone_numbers Michal Jires
  6 siblings, 1 reply; 18+ messages in thread
From: Michal Jires @ 2023-11-17 20:17 UTC (permalink / raw)
  To: gcc-patches

This patch squashes order of symbols in individual partitions, so that
their relative order is conserved, but is not influenced by symbols in
other partitions.
Order of cloned symbols is set to 0. This should be fine because order
specifies order of symbols in input files, which cloned symbols are not
part of.

This is important for incremental LTO because if there is a new symbol,
it otherwise shifts order of all symbols with higher order, which would
diverge them all.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

	* lto-cgraph.cc (lto_output_node): Add and use order_remap.
	(lto_output_varpool_node): Likewise.
	(output_symtab): Likewise.
	* lto-streamer-out.cc (produce_asm): Likewise.
	(output_function): Likewise.
	(output_constructor): Likewise.
	(copy_function_or_variable): Likewise.
	(cmp_int): New.
	(lto_output): Generate order_remap.
	* lto-streamer.h (produce_asm): Add order_remap.
	(output_symtab): Likewise.
---
 gcc/lto-cgraph.cc       | 20 ++++++++----
 gcc/lto-streamer-out.cc | 71 +++++++++++++++++++++++++++++++++--------
 gcc/lto-streamer.h      |  5 +--
 3 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 32c0f5ac6db..a7530290fba 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -381,7 +381,8 @@ reachable_from_this_partition_p (struct cgraph_node *node, lto_symtab_encoder_t
 
 static void
 lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
-		 lto_symtab_encoder_t encoder)
+		 lto_symtab_encoder_t encoder,
+		 hash_map<int_hash<int, -1, -2>, int>* order_remap)
 {
   unsigned int tag;
   struct bitpack_d bp;
@@ -405,7 +406,9 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
 
   streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
 		       tag);
-  streamer_write_hwi_stream (ob->main_stream, node->order);
+
+  int order = flag_wpa ? *order_remap->get (node->order) : node->order;
+  streamer_write_hwi_stream (ob->main_stream, order);
 
   /* In WPA mode, we only output part of the call-graph.  Also, we
      fake cgraph node attributes.  There are two cases that we care.
@@ -585,7 +588,8 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
 
 static void
 lto_output_varpool_node (struct lto_simple_output_block *ob, varpool_node *node,
-			 lto_symtab_encoder_t encoder)
+			 lto_symtab_encoder_t encoder,
+			 hash_map<int_hash<int, -1, -2>, int>* order_remap)
 {
   bool boundary_p = !lto_symtab_encoder_in_partition_p (encoder, node);
   bool encode_initializer_p
@@ -602,7 +606,8 @@ lto_output_varpool_node (struct lto_simple_output_block *ob, varpool_node *node,
 
   streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
 		       LTO_symtab_variable);
-  streamer_write_hwi_stream (ob->main_stream, node->order);
+  int order = flag_wpa ? *order_remap->get (node->order) : node->order;
+  streamer_write_hwi_stream (ob->main_stream, order);
   lto_output_var_decl_ref (ob->decl_state, ob->main_stream, node->decl);
   bp = bitpack_create (ob->main_stream);
   bp_pack_value (&bp, node->externally_visible, 1);
@@ -967,7 +972,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
 /* Output the part of the symtab in SET and VSET.  */
 
 void
-output_symtab (void)
+output_symtab (hash_map<int_hash<int, -1, -2>, int>* order_remap)
 {
   struct cgraph_node *node;
   struct lto_simple_output_block *ob;
@@ -994,9 +999,10 @@ output_symtab (void)
     {
       symtab_node *node = lto_symtab_encoder_deref (encoder, i);
       if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
-        lto_output_node (ob, cnode, encoder);
+	lto_output_node (ob, cnode, encoder, order_remap);
       else
-	lto_output_varpool_node (ob, dyn_cast<varpool_node *> (node), encoder);
+	lto_output_varpool_node (ob, dyn_cast<varpool_node *> (node), encoder,
+				 order_remap);
     }
 
   /* Go over the nodes in SET again to write edges.  */
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index a1bbea8fc68..9448ab195d5 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -2212,7 +2212,8 @@ output_cfg (struct output_block *ob, struct function *fn)
    a function, set FN to the decl for that function.  */
 
 void
-produce_asm (struct output_block *ob, tree fn)
+produce_asm (struct output_block *ob, tree fn,
+	     hash_map<int_hash<int, -1, -2>, int>* order_remap)
 {
   enum lto_section_type section_type = ob->section_type;
   struct lto_function_header header;
@@ -2221,9 +2222,11 @@ produce_asm (struct output_block *ob, tree fn)
   if (section_type == LTO_section_function_body)
     {
       const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fn));
-      section_name = lto_get_section_name (section_type, name,
-					   symtab_node::get (fn)->order,
-					   NULL);
+
+      int order = symtab_node::get (fn)->order;
+      if (flag_wpa && order_remap)
+	order = *order_remap->get (order);
+      section_name = lto_get_section_name (section_type, name, order, NULL);
     }
   else
     section_name = lto_get_section_name (section_type, NULL, 0, NULL);
@@ -2405,7 +2408,8 @@ streamer_write_chain (struct output_block *ob, tree t, bool ref_p)
 /* Output the body of function NODE->DECL.  */
 
 static void
-output_function (struct cgraph_node *node)
+output_function (struct cgraph_node *node,
+		 hash_map<int_hash<int, -1, -2>, int>* order_remap)
 {
   tree function;
   struct function *fn;
@@ -2482,7 +2486,7 @@ output_function (struct cgraph_node *node)
     streamer_write_uhwi (ob, 0);
 
   /* Create a section to hold the pickled output of this function.   */
-  produce_asm (ob, function);
+  produce_asm (ob, function, order_remap);
 
   destroy_output_block (ob);
   if (streamer_dump_file)
@@ -2493,7 +2497,8 @@ output_function (struct cgraph_node *node)
 /* Output the body of function NODE->DECL.  */
 
 static void
-output_constructor (struct varpool_node *node)
+output_constructor (struct varpool_node *node,
+		    hash_map<int_hash<int, -1, -2>, int>* order_remap)
 {
   tree var = node->decl;
   struct output_block *ob;
@@ -2515,7 +2520,7 @@ output_constructor (struct varpool_node *node)
   stream_write_tree (ob, DECL_INITIAL (var), true);
 
   /* Create a section to hold the pickled output of this function.   */
-  produce_asm (ob, var);
+  produce_asm (ob, var, order_remap);
 
   destroy_output_block (ob);
   if (streamer_dump_file)
@@ -2576,15 +2581,18 @@ lto_output_toplevel_asms (void)
 /* Copy the function body or variable constructor of NODE without deserializing. */
 
 static void
-copy_function_or_variable (struct symtab_node *node)
+copy_function_or_variable (struct symtab_node *node,
+			   hash_map<int_hash<int, -1, -2>, int>* order_remap)
 {
   tree function = node->decl;
   struct lto_file_decl_data *file_data = node->lto_file_data;
   const char *data;
   size_t len;
   const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (function));
+
+  int order = flag_wpa ? *order_remap->get (node->order) : node->order;
   char *section_name =
-    lto_get_section_name (LTO_section_function_body, name, node->order, NULL);
+    lto_get_section_name (LTO_section_function_body, name, order, NULL);
   size_t i, j;
   struct lto_in_decl_state *in_state;
   struct lto_out_decl_state *out_state = lto_get_out_decl_state ();
@@ -2729,6 +2737,15 @@ cmp_symbol_files (const void *pn1, const void *pn2, void *id_map_)
   return n1->order - n2->order;
 }
 
+/* Compare ints, callback for qsort.  */
+static int
+cmp_int (const void *a, const void *b)
+{
+  int ia = *(int const*) a;
+  int ib = *(int const*) b;
+  return ia - ib;
+}
+
 /* Main entry point from the pass manager.  */
 
 void
@@ -2741,6 +2758,32 @@ lto_output (void)
   lto_symtab_encoder_t encoder = lto_get_out_decl_state ()->symtab_node_encoder;
   auto_vec<symtab_node *> symbols_to_copy;
 
+  hash_map<int_hash<int, -1, -2>, int> order_remap;
+  if (flag_wpa)
+    {
+      /* Remap order so that it does not depend on symbols outside of
+	 partition.  */
+      auto_vec<int> orders;
+
+      n_nodes = lto_symtab_encoder_size (encoder);
+      for (i = 0; i < n_nodes; i++)
+	{
+	  symtab_node *snode = lto_symtab_encoder_deref (encoder, i);
+	  if (cgraph_node *cnode = dyn_cast <cgraph_node *> (snode))
+	    {
+	      if (cnode->clone_of)
+		{
+		  order_remap.put (snode->order, 0);
+		  continue;
+		}
+	    }
+	  orders.safe_push (snode->order);
+	}
+      orders.qsort (cmp_int);
+      for (i = 0; i < orders.length (); i++)
+	order_remap.put (orders[i], i);
+    }
+
   prune_offload_funcs ();
 
   if (flag_checking)
@@ -2817,14 +2860,14 @@ lto_output (void)
 		 at WPA time.  */
 	      || DECL_ARGUMENTS (cnode->decl)
 	      || cnode->declare_variant_alt))
-	output_function (cnode);
+	output_function (cnode, &order_remap);
       else if ((vnode = dyn_cast <varpool_node *> (snode))
 	       && (DECL_INITIAL (vnode->decl) != error_mark_node
 		   || (!flag_wpa
 		       && flag_incremental_link != INCREMENTAL_LINK_LTO)))
-	output_constructor (vnode);
+	output_constructor (vnode, &order_remap);
       else
-	copy_function_or_variable (snode);
+	copy_function_or_variable (snode, &order_remap);
       gcc_assert (lto_get_out_decl_state () == decl_state);
       lto_pop_out_decl_state ();
       lto_record_function_out_decl_state (snode->decl, decl_state);
@@ -2834,7 +2877,7 @@ lto_output (void)
      be done now to make sure that all the statements in every function
      have been renumbered so that edges can be associated with call
      statements using the statement UIDs.  */
-  output_symtab ();
+  output_symtab (&order_remap);
 
   output_offload_tables ();
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 0556b34c837..3363e6f9e61 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -888,7 +888,8 @@ extern void lto_output_fn_decl_ref (struct lto_out_decl_state *,
 extern tree lto_input_var_decl_ref (lto_input_block *, lto_file_decl_data *);
 extern tree lto_input_fn_decl_ref (lto_input_block *, lto_file_decl_data *);
 extern void lto_output_toplevel_asms (void);
-extern void produce_asm (struct output_block *ob, tree fn);
+extern void produce_asm (struct output_block *ob, tree fn,
+			 hash_map<int_hash<int, -1, -2>, int>* order_remap = 0);
 extern void lto_output ();
 extern void produce_asm_for_decls ();
 void lto_output_decl_state_streams (struct output_block *,
@@ -919,7 +920,7 @@ void lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t,
 
 bool lto_symtab_encoder_encode_initializer_p (lto_symtab_encoder_t,
 					      varpool_node *);
-void output_symtab (void);
+void output_symtab (hash_map<int_hash<int, -1, -2>, int>*);
 void input_symtab (void);
 void output_offload_tables (void);
 void input_offload_tables (bool);
-- 
2.42.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 7/7] lto: partition specific lto_clone_numbers
  2023-11-17 20:16 [PATCH 0/7] lto: Incremental LTO Michal Jires
                   ` (5 preceding siblings ...)
  2023-11-17 20:17 ` [PATCH 6/7] lto: squash order of symbols in partitions Michal Jires
@ 2023-11-17 20:17 ` Michal Jires
  2024-05-14 12:11   ` Jan Hubicka
  6 siblings, 1 reply; 18+ messages in thread
From: Michal Jires @ 2023-11-17 20:17 UTC (permalink / raw)
  To: gcc-patches

Replaces "lto_priv.$clone_number" by
"lto_priv.$partition_hash.$partition_specific_clone_number".
To reduce divergence for incremental LTO.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/lto/ChangeLog:

	* lto-partition.cc (set_clone_partition_name_checksum): New.
	(CHECKSUM_STRING): New.
	(privatize_symbol_name_1): Use partition hash for lto_priv.
	(lto_promote_cross_file_statics): Use set_clone_partition_name_checksum.
	(lto_promote_statics_nonwpa): Changed clone_map type.
---
 gcc/lto/lto-partition.cc | 49 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/gcc/lto/lto-partition.cc b/gcc/lto/lto-partition.cc
index eb31ecba0d3..a2ce24eea23 100644
--- a/gcc/lto/lto-partition.cc
+++ b/gcc/lto/lto-partition.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-fnsummary.h"
 #include "lto-partition.h"
 #include "sreal.h"
+#include "md5.h"
 
 #include <limits>
 #include <vector>
@@ -1516,8 +1517,36 @@ validize_symbol_for_target (symtab_node *node)
     }
 }
 
-/* Maps symbol names to unique lto clone counters.  */
-static hash_map<const char *, unsigned> *lto_clone_numbers;
+/* Maps symbol names with partition checksum to unique lto clone counters.  */
+using clone_map = hash_map<pair_hash<nofree_string_hash,
+				     int_hash_base<uint64_t>>, unsigned>;
+static clone_map *lto_clone_numbers;
+uint64_t current_partition_checksum = 0;
+
+/* Computes a quick checksum to distinguish partitions of clone numbers.  */
+void
+set_clone_partition_name_checksum (ltrans_partition part)
+{
+#define CHECKSUM_STRING(FOO) md5_process_bytes ((FOO), strlen (FOO), &ctx)
+  struct md5_ctx ctx;
+  md5_init_ctx (&ctx);
+
+  CHECKSUM_STRING (part->name);
+
+  lto_symtab_encoder_iterator lsei;
+  lto_symtab_encoder_t encoder = part->encoder;
+
+  for (lsei = lsei_start (encoder); !lsei_end_p (lsei); lsei_next (&lsei))
+    {
+      symtab_node *node = lsei_node (lsei);
+      CHECKSUM_STRING (node->name ());
+    }
+
+  uint64_t checksum[2];
+  md5_finish_ctx (&ctx, checksum);
+  current_partition_checksum = checksum[0];
+#undef CHECKSUM_STRING
+}
 
 /* Helper for privatize_symbol_name.  Mangle NODE symbol name
    represented by DECL.  */
@@ -1531,10 +1560,16 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
     return false;
 
   const char *name = maybe_rewrite_identifier (name0);
-  unsigned &clone_number = lto_clone_numbers->get_or_insert (name);
+
+  unsigned &clone_number = lto_clone_numbers->get_or_insert (
+    std::pair<const char*, uint64_t> {name, current_partition_checksum});
+
+  char lto_priv[32];
+  sprintf (lto_priv, "lto_priv.%lu", current_partition_checksum);
+
   symtab->change_decl_assembler_name (decl,
 				      clone_function_name (
-					  name, "lto_priv", clone_number));
+					  name, lto_priv, clone_number));
   clone_number++;
 
   if (node->lto_file_data)
@@ -1735,11 +1770,13 @@ lto_promote_cross_file_statics (void)
       part->encoder = compute_ltrans_boundary (part->encoder);
     }
 
-  lto_clone_numbers = new hash_map<const char *, unsigned>;
+  lto_clone_numbers = new clone_map;
 
   /* Look at boundaries and promote symbols as needed.  */
   for (i = 0; i < n_sets; i++)
     {
+      set_clone_partition_name_checksum (ltrans_partitions[i]);
+
       lto_symtab_encoder_iterator lsei;
       lto_symtab_encoder_t encoder = ltrans_partitions[i]->encoder;
 
@@ -1778,7 +1815,7 @@ lto_promote_statics_nonwpa (void)
 {
   symtab_node *node;
 
-  lto_clone_numbers = new hash_map<const char *, unsigned>;
+  lto_clone_numbers = new clone_map;
   FOR_EACH_SYMBOL (node)
     {
       rename_statics (NULL, node);
-- 
2.42.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/7] lto: Skip flag OPT_fltrans_output_list_.
  2023-11-17 20:16 ` [PATCH 1/7] lto: Skip flag OPT_fltrans_output_list_ Michal Jires
@ 2023-12-29 21:16   ` Jan Hubicka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2023-12-29 21:16 UTC (permalink / raw)
  To: Michal Jires; +Cc: gcc-patches

Hi,
> Bootstrapped/regtested on x86_64-pc-linux-gnu
> 
> gcc/ChangeLog:
> 
> 	* lto-opts.cc (lto_write_options): Skip OPT_fltrans_output_list_.
OK,
thanks,
Honza
> ---
>  gcc/lto-opts.cc | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/lto-opts.cc b/gcc/lto-opts.cc
> index c9bee9d4197..0451e290c75 100644
> --- a/gcc/lto-opts.cc
> +++ b/gcc/lto-opts.cc
> @@ -152,6 +152,7 @@ lto_write_options (void)
>  	case OPT_fprofile_prefix_map_:
>  	case OPT_fcanon_prefix_map:
>  	case OPT_fwhole_program:
> +	case OPT_fltrans_output_list_:
>  	  continue;
>  
>  	default:
> -- 
> 2.42.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/7] lto: Remove random_seed from section name.
  2023-11-17 20:16 ` [PATCH 2/7] lto: Remove random_seed from section name Michal Jires
@ 2023-12-29 21:17   ` Jan Hubicka
  2024-01-09 16:49     ` [PATCH 2/7 v2] " Michal Jires
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2023-12-29 21:17 UTC (permalink / raw)
  To: Michal Jires; +Cc: gcc-patches

> Bootstrapped/regtested on x86_64-pc-linux-gnu
> 
> gcc/ChangeLog:
> 
> 	* lto-streamer.cc (lto_get_section_name): Remove random_seed in WPA.

This is also OK. (since it lacks explanation - the random suffixes are
added for ld -r to work.  This never happens between WPA and ltrans, so
they only consume extra space and confuse the ltrans cache).
> ---
>  gcc/lto-streamer.cc | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/lto-streamer.cc b/gcc/lto-streamer.cc
> index 4968fd13413..53275e32618 100644
> --- a/gcc/lto-streamer.cc
> +++ b/gcc/lto-streamer.cc
> @@ -132,11 +132,17 @@ lto_get_section_name (int section_type, const char *name,
>       doesn't confuse the reader with merged sections.
>  
>       For options don't add a ID, the option reader cannot deal with them
> -     and merging should be ok here. */
> +     and merging should be ok here.
> +
> +     WPA output is sent to LTRANS directly inside of lto-wrapper, so name
> +     uniqueness for external tools is not needed.
> +     Randomness would inhibit incremental LTO.  */
>    if (section_type == LTO_section_opts)
>      strcpy (post, "");
>    else if (f != NULL) 
>      sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
> +  else if (flag_wpa)
> +    strcpy (post, ".0");
Con't post be just empty string?
>    else
>      sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
>    char *res = concat (section_name_prefix, sep, add, post, NULL);
> -- 
> 2.42.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/7] Lockfile.
  2023-11-17 20:17 ` [PATCH 3/7] Lockfile Michal Jires
@ 2023-12-29 21:23   ` Jan Hubicka
  2024-01-09 17:10     ` Michal Jires
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2023-12-29 21:23 UTC (permalink / raw)
  To: Michal Jires; +Cc: gcc-patches

Hi,
> This patch implements lockfile used for incremental LTO.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu
> 
> gcc/ChangeLog:
> 
> 	* Makefile.in: Add lockfile.o.
> 	* lockfile.cc: New file.
> 	* lockfile.h: New file.

I can't approve it, but overall it looks good to me.
We also have locking in gcov-io, but it is probably not that practical
to keep these shared, since gcov-io is also built into runtime.

You do not implement GCOV_LINKED_WITH_LOCKING patch, does locking work
with mingw? Or we only build gcc with cygwin emulation layer these days?

Honza
> ---
>  gcc/Makefile.in |   5 +-
>  gcc/lockfile.cc | 136 ++++++++++++++++++++++++++++++++++++++++++++++++
>  gcc/lockfile.h  |  85 ++++++++++++++++++++++++++++++
>  3 files changed, 224 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/lockfile.cc
>  create mode 100644 gcc/lockfile.h
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 7b7a4ff789a..2c527245c81 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1831,7 +1831,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) $(OBJS-libcommon) \
>    $(OBJS-libcommon-target) main.o c-family/cppspec.o \
>    $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
>    $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
> -  lto-wrapper.o collect-utils.o
> +  lto-wrapper.o collect-utils.o lockfile.o
>  
>  # for anything that is shared use the cc1plus profile data, as that
>  # is likely the most exercised during the build
> @@ -2359,7 +2359,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
>  CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
>  	@TARGET_SYSTEM_ROOT_DEFINE@
>  
> -LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o
> +LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
> +
>  lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
>  	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
>  	   $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBS)
> diff --git a/gcc/lockfile.cc b/gcc/lockfile.cc
> new file mode 100644
> index 00000000000..9440e8938f3
> --- /dev/null
> +++ b/gcc/lockfile.cc
> @@ -0,0 +1,136 @@
> +/* File locking.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +
> +#include "lockfile.h"
> +
> +
> +/* Unique write lock.  No other lock can be held on this lockfile.
> +   Blocking call.  */
> +int
> +lockfile::lock_write ()
> +{
> +  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
> +  if (fd < 0)
> +    return -1;
> +
> +#if HAVE_FCNTL_H
> +  struct flock s_flock;
> +
> +  s_flock.l_whence = SEEK_SET;
> +  s_flock.l_start = 0;
> +  s_flock.l_len = 0;
> +  s_flock.l_pid = getpid ();
> +  s_flock.l_type = F_WRLCK;
> +
> +  while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR)
> +    continue;
> +#endif
> +  return 0;
> +}
> +
> +/* Unique write lock.  No other lock can be held on this lockfile.
> +   Only locks if this filelock is not locked by any other process.
> +   Return whether locking was successful.  */
> +int
> +lockfile::try_lock_write ()
> +{
> +  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
> +  if (fd < 0)
> +    return -1;
> +
> +#if HAVE_FCNTL_H
> +  struct flock s_flock;
> +
> +  s_flock.l_whence = SEEK_SET;
> +  s_flock.l_start = 0;
> +  s_flock.l_len = 0;
> +  s_flock.l_pid = getpid ();
> +  s_flock.l_type = F_WRLCK;
> +
> +  if (fcntl (fd, F_SETLK, &s_flock) == -1)
> +    {
> +      close (fd);
> +      fd = -1;
> +      return 1;
> +    }
> +#endif
> +  return 0;
> +}
> +
> +/* Shared read lock.  Only read lock can be held concurrently.
> +   If write lock is already held by this process, it will be
> +   changed to read lock.
> +   Blocking call.  */
> +int
> +lockfile::lock_read ()
> +{
> +  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
> +  if (fd < 0)
> +    return -1;
> +
> +#if HAVE_FCNTL_H
> +  struct flock s_flock;
> +
> +  s_flock.l_whence = SEEK_SET;
> +  s_flock.l_start = 0;
> +  s_flock.l_len = 0;
> +  s_flock.l_pid = getpid ();
> +  s_flock.l_type = F_RDLCK;
> +
> +  while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR)
> +    continue;
> +#endif
> +  return 0;
> +}
> +
> +/* Unlock all previously placed locks.  */
> +void
> +lockfile::unlock ()
> +{
> +  if (fd < 0)
> +    {
> +#if HAVE_FCNTL_H
> +      struct flock s_flock;
> +
> +      s_flock.l_whence = SEEK_SET;
> +      s_flock.l_start = 0;
> +      s_flock.l_len = 0;
> +      s_flock.l_pid = getpid ();
> +      s_flock.l_type = F_UNLCK;
> +
> +      fcntl (fd, F_SETLK, &s_flock);
> +#endif
> +      close (fd);
> +      fd = -1;
> +    }
> +}
> +
> +/* Are lockfiles supported?  */
> +bool
> +lockfile::lockfile_supported ()
> +{
> +#if HAVE_FCNTL_H
> +  return true;
> +#else
> +  return false;
> +#endif
> +}
> diff --git a/gcc/lockfile.h b/gcc/lockfile.h
> new file mode 100644
> index 00000000000..afcbaf599c1
> --- /dev/null
> +++ b/gcc/lockfile.h
> @@ -0,0 +1,85 @@
> +/* File locking.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#ifndef LOCKFILE_H
> +#define LOCKFILE_H
> +
> +#include <string>
> +
> +/* Used to synchronize across multiple processes.  */
> +class lockfile {
> +public:
> +  /* Default constructor.  */
> +  lockfile (): fd (-1)
> +  {}
> +  /* Intended constructor for use.  Filename should not be used for anything
> +     other than locking to prevent unintentional unlock.  */
> +  lockfile (std::string filename): lockfile ()
> +  {
> +    this->filename = std::move (filename);
> +  }
> +  lockfile (lockfile const& o): lockfile (o.filename)
> +  {}
> +
> +  void operator=(lockfile o)
> +  {
> +    unlock ();
> +    this->filename = o.filename;
> +    this->fd = o.fd;
> +    o.fd = -1;
> +  }
> +
> +  /* Unique write lock.  No other lock can be held on this lockfile.
> +     Blocking call.  */
> +  int
> +  lock_write ();
> +
> +  /* Unique write lock.  No other lock can be held on this lockfile.
> +     Only locks if this filelock is not locked by any other process.
> +     Return whether locking was successful.  */
> +  int
> +  try_lock_write ();
> +
> +  /* Shared read lock.  Only read lock can be held concurrently.
> +     If write lock is already held by this process, it will be
> +     changed to read lock.
> +     Blocking call.  */
> +  int
> +  lock_read ();
> +
> +  /* Unlock all previously placed locks.  */
> +  void
> +  unlock ();
> +
> +  /* Returns whether any lock is held.  */
> +  bool
> +  locked ()
> +  {
> +    return fd < 0;
> +  }
> +
> +  /* Are lockfiles supported?  */
> +  static bool
> +  lockfile_supported ();
> +private:
> +  std::string filename;
> +  int fd;
> +};
> +
> +#endif
> -- 
> 2.42.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/7 v2] lto: Remove random_seed from section name.
  2023-12-29 21:17   ` Jan Hubicka
@ 2024-01-09 16:49     ` Michal Jires
       [not found]       ` <c480760c-f167-4e60-a27e-52bebdd1351b@suse.cz>
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Jires @ 2024-01-09 16:49 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

This patch removes suffixes from section names during LTO linking.

These suffixes were originally added for ld -r to work (PR lto/44992).
They were added to all LTO object files, but are only useful before WPA.
After that they waste space, and if kept random, make LTO caching impossible.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

	* lto-streamer.cc (lto_get_section_name): Remove suffixes after WPA.

gcc/lto/ChangeLog:

	* lto-common.cc (lto_section_with_id): Dont load suffix during LTRANS.
---
 gcc/lto-streamer.cc   | 11 +++++++++--
 gcc/lto/lto-common.cc |  7 +++++++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/lto-streamer.cc b/gcc/lto-streamer.cc
index 8032bbf7108..61b5f8ed4dc 100644
--- a/gcc/lto-streamer.cc
+++ b/gcc/lto-streamer.cc
@@ -132,11 +132,18 @@ lto_get_section_name (int section_type, const char *name,
      doesn't confuse the reader with merged sections.
 
      For options don't add a ID, the option reader cannot deal with them
-     and merging should be ok here. */
-  if (section_type == LTO_section_opts)
+     and merging should be ok here.
+
+     LTRANS files (output of wpa, input and output of ltrans) are handled
+     directly inside of linker/lto-wrapper, so name uniqueness for external
+     tools is not needed.
+     Randomness would inhibit incremental LTO.  */
+  if (section_type == LTO_section_opts || flag_ltrans)
     strcpy (post, "");
   else if (f != NULL) 
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
+  else if (flag_wpa)
+    strcpy (post, "");
   else
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
   char *res = concat (section_name_prefix, sep, add, post, NULL);
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 11e7d63f1be..44aeeddf46f 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -2174,6 +2174,13 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
 
   if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 0;
+
+  if (flag_ltrans)
+    {
+      *id = 0;
+      return 1;
+    }
+
   s = strrchr (name, '.');
   if (!s)
     return 0;
-- 
2.43.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/7] Lockfile.
  2023-12-29 21:23   ` Jan Hubicka
@ 2024-01-09 17:10     ` Michal Jires
  0 siblings, 0 replies; 18+ messages in thread
From: Michal Jires @ 2024-01-09 17:10 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

Hi,
> You do not implement GCOV_LINKED_WITH_LOCKING patch, does locking work
> with mingw? Or we only build gcc with cygwin emulation layer these days?

I tried to test _locking implementation with both mingw and msys2, in both
cases fcntl was present and _locking was not. Admittedly I was unable to
finish bootstrap without errors, so I might have been doing something wrong.

So I didn't include _locking implementation, because I was unable to test it,
and I am unsure whether we even have supported host which would require it.

Michal

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Fwd: [PATCH 2/7 v2] lto: Remove random_seed from section name.
       [not found]       ` <c480760c-f167-4e60-a27e-52bebdd1351b@suse.cz>
@ 2024-05-14 11:28         ` Jan Hubicka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2024-05-14 11:28 UTC (permalink / raw)
  To: Michal Jireš, gcc-patches

> This patch removes suffixes from section names during LTO linking.
> 
> These suffixes were originally added for ld -r to work (PR lto/44992).
> They were added to all LTO object files, but are only useful before WPA.
> After that they waste space, and if kept random, make LTO caching
> impossible.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu
> 
> gcc/ChangeLog:
> 
> 	* lto-streamer.cc (lto_get_section_name): Remove suffixes after WPA.
> 
> gcc/lto/ChangeLog:
> 
> 	* lto-common.cc (lto_section_with_id): Dont load suffix during LTRANS.
OK,
thanks
Honza

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/7] lto: Implement ltrans cache
  2023-11-17 20:17 ` [PATCH 4/7] lto: Implement ltrans cache Michal Jires
@ 2024-05-14 11:51   ` Jan Hubicka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2024-05-14 11:51 UTC (permalink / raw)
  To: Michal Jires; +Cc: gcc-patches

> This patch implements Incremental LTO as ltrans cache.
> 
> The cache is active when directory $GCC_LTRANS_CACHE is specified and exists.
> Stored are pairs of ltrans input/output files and input file hash.
> File locking is used to allow multiple GCC instances to use to same cache.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu
> 
> gcc/ChangeLog:
> 
> 	* Makefile.in: Add lto-ltrans-cache.o.
> 	* lto-wrapper.cc: Use ltrans cache.
> 	* lto-ltrans-cache.cc: New file.
> 	* lto-ltrans-cache.h: New file.
> diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc
> new file mode 100644
> index 00000000000..0d43e548fb3
> --- /dev/null
> +++ b/gcc/lto-ltrans-cache.cc
> @@ -0,0 +1,407 @@
> +/* File caching.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.

Probably copyright should be 2023-2024
> +const md5_checksum_t INVALID_CHECKSUM = {
Maybe static here? Officially there should be comment before the
function.
> +  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> +};
> +
> +/* Computes checksum for given file, returns INVALID_CHECKSUM if not possible.
> + */
comment would look more regular if linebreak is made before possible :)
> +
> +/* Checks identity of two files byte by byte.  */
> +static bool
> +files_identical (char const *first_filename, char const *second_filename)
> +{
> +  FILE *f_first = fopen (first_filename, "rb");
> +  if (!f_first)
> +    return false;
> +
> +  FILE *f_second = fopen (second_filename, "rb");
> +  if (!f_second)
> +    {
> +      fclose (f_first);
> +      return false;
> +    }
> +
> +  bool ret = true;
> +
> +  for (;;)
> +    {
> +      int c1, c2;
> +      c1 = fgetc (f_first);
> +      c2 = fgetc (f_second);

I guess reading by fgetc may get quite ineffecient here.  Comparing
bigger blocks is probably going to be faster.  We could also
(incrementally) use mmap where supported.
> +
> +/* Contructor of cache item.  */
> +ltrans_file_cache::item::item (std::string input, std::string output,
> +  md5_checksum_t input_checksum, uint32_t last_used):
Here should be enough whitespace so md5_checksum appears just after ( in
line above
                                  md5_checksum_t input_checksum, uint32_t last_used):
> +  input (std::move (input)), output (std::move (output)),
> +  input_checksum (input_checksum), last_used (last_used)
> +{
> +  lock = lockfile (this->input + ".lock");
> +}
> +/* Destructor of cache item.  */
> +ltrans_file_cache::item::~item ()
> +{
> +  lock.unlock ();
> +}
> +
> +/* Reads next cache item from cachedata file.
> +   Adds `dir/` prefix to filenames.  */
> +static ltrans_file_cache::item*
> +read_cache_item (FILE* f, const char* dir)
> +{
> +  md5_checksum_t checksum;
> +  uint32_t last_used;
> +
> +  if (fread (&checksum, 1, checksum.size (), f) != checksum.size ())
> +    return NULL;
> +  if (fread (&last_used, sizeof (last_used), 1, f) != 1)
> +    return NULL;
> +
> +  std::vector<char> input (strlen (dir));
> +  memcpy (&input[0], dir, input.size ());
> +  input.push_back ('/');
Why this is not std::string?
> +  /* Loads data about previously cached items from cachedata file.
> +
> +     Must be called with creation_lock or deletion_lock held to
> +     prevent data race.  */
> +  void
> +  load_cache ();
There should be no newline between type and name.  It is there only when
defining function (so it is easy to use old-school grep to find where
function is defined.)

Looks good to me otherwise.
Honza

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/7] lto: Implement cache partitioning
  2023-11-17 20:17 ` [PATCH 5/7] lto: Implement cache partitioning Michal Jires
@ 2024-05-14 12:10   ` Jan Hubicka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2024-05-14 12:10 UTC (permalink / raw)
  To: Michal Jires; +Cc: gcc-patches

> gcc/ChangeLog:
> 
> 	* common.opt: Add cache partitioning.
> 	* flag-types.h (enum lto_partition_model): Likewise.
> 
> gcc/lto/ChangeLog:
> 
> 	* lto-partition.cc (new_partition): Use new_partition_no_push.
> 	(new_partition_no_push): New.
> 	(free_ltrans_partition): New.
> 	(free_ltrans_partitions): Use free_ltrans_partition.
> 	(join_partitions): New.
> 	(split_partition_into_nodes): New.
> 	(is_partition_reorder): New.
> 	(class partition_set): New.
> 	(distribute_n_partitions): New.
> 	(partition_over_target_split): New.
> 	(partition_binary_split): New.
> 	(partition_fixed_split): New.
> 	(class partitioner_base): New.
> 	(class partitioner_default): New.
> 	(lto_cache_map): New.
> 	* lto-partition.h (lto_cache_map): New.
> 	* lto.cc (do_whole_program_analysis): Use lto_cache_map.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/completion-2.c: Add -flto-partition=cache.
> +/* Free memory used by ltrans partition.
> +   Encoder can be kept to be freed after streaming.  */
> +static void
> +free_ltrans_partition (ltrans_partition part, bool delete_encoder)
> +  {
No two spaces here (indent everything to left by 2).
> +    if (part->initializers_visited)
> +      delete part->initializers_visited;
> +    if (delete_encoder)
> +      lto_symtab_encoder_delete (part->encoder);
> +    free (part);
It would make sense to turn this into C++ and use destructors
(incrementally).

OK,
Honza

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 7/7] lto: partition specific lto_clone_numbers
  2023-11-17 20:17 ` [PATCH 7/7] lto: partition specific lto_clone_numbers Michal Jires
@ 2024-05-14 12:11   ` Jan Hubicka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2024-05-14 12:11 UTC (permalink / raw)
  To: Michal Jires; +Cc: gcc-patches

> Replaces "lto_priv.$clone_number" by
> "lto_priv.$partition_hash.$partition_specific_clone_number".
> To reduce divergence for incremental LTO.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu
OK,
thanks!
Honza
> 
> gcc/lto/ChangeLog:
> 
> 	* lto-partition.cc (set_clone_partition_name_checksum): New.
> 	(CHECKSUM_STRING): New.
> 	(privatize_symbol_name_1): Use partition hash for lto_priv.
> 	(lto_promote_cross_file_statics): Use set_clone_partition_name_checksum.
> 	(lto_promote_statics_nonwpa): Changed clone_map type.
> ---
>  gcc/lto/lto-partition.cc | 49 +++++++++++++++++++++++++++++++++++-----
>  1 file changed, 43 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/lto/lto-partition.cc b/gcc/lto/lto-partition.cc
> index eb31ecba0d3..a2ce24eea23 100644
> --- a/gcc/lto/lto-partition.cc
> +++ b/gcc/lto/lto-partition.cc
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-fnsummary.h"
>  #include "lto-partition.h"
>  #include "sreal.h"
> +#include "md5.h"
>  
>  #include <limits>
>  #include <vector>
> @@ -1516,8 +1517,36 @@ validize_symbol_for_target (symtab_node *node)
>      }
>  }
>  
> -/* Maps symbol names to unique lto clone counters.  */
> -static hash_map<const char *, unsigned> *lto_clone_numbers;
> +/* Maps symbol names with partition checksum to unique lto clone counters.  */
> +using clone_map = hash_map<pair_hash<nofree_string_hash,
> +				     int_hash_base<uint64_t>>, unsigned>;
> +static clone_map *lto_clone_numbers;
> +uint64_t current_partition_checksum = 0;
> +
> +/* Computes a quick checksum to distinguish partitions of clone numbers.  */
> +void
> +set_clone_partition_name_checksum (ltrans_partition part)
> +{
> +#define CHECKSUM_STRING(FOO) md5_process_bytes ((FOO), strlen (FOO), &ctx)
> +  struct md5_ctx ctx;
> +  md5_init_ctx (&ctx);
> +
> +  CHECKSUM_STRING (part->name);
> +
> +  lto_symtab_encoder_iterator lsei;
> +  lto_symtab_encoder_t encoder = part->encoder;
> +
> +  for (lsei = lsei_start (encoder); !lsei_end_p (lsei); lsei_next (&lsei))
> +    {
> +      symtab_node *node = lsei_node (lsei);
> +      CHECKSUM_STRING (node->name ());
> +    }
> +
> +  uint64_t checksum[2];
> +  md5_finish_ctx (&ctx, checksum);
> +  current_partition_checksum = checksum[0];
> +#undef CHECKSUM_STRING
> +}
>  
>  /* Helper for privatize_symbol_name.  Mangle NODE symbol name
>     represented by DECL.  */
> @@ -1531,10 +1560,16 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
>      return false;
>  
>    const char *name = maybe_rewrite_identifier (name0);
> -  unsigned &clone_number = lto_clone_numbers->get_or_insert (name);
> +
> +  unsigned &clone_number = lto_clone_numbers->get_or_insert (
> +    std::pair<const char*, uint64_t> {name, current_partition_checksum});
> +
> +  char lto_priv[32];
> +  sprintf (lto_priv, "lto_priv.%lu", current_partition_checksum);
> +
>    symtab->change_decl_assembler_name (decl,
>  				      clone_function_name (
> -					  name, "lto_priv", clone_number));
> +					  name, lto_priv, clone_number));
>    clone_number++;
>  
>    if (node->lto_file_data)
> @@ -1735,11 +1770,13 @@ lto_promote_cross_file_statics (void)
>        part->encoder = compute_ltrans_boundary (part->encoder);
>      }
>  
> -  lto_clone_numbers = new hash_map<const char *, unsigned>;
> +  lto_clone_numbers = new clone_map;
>  
>    /* Look at boundaries and promote symbols as needed.  */
>    for (i = 0; i < n_sets; i++)
>      {
> +      set_clone_partition_name_checksum (ltrans_partitions[i]);
> +
>        lto_symtab_encoder_iterator lsei;
>        lto_symtab_encoder_t encoder = ltrans_partitions[i]->encoder;
>  
> @@ -1778,7 +1815,7 @@ lto_promote_statics_nonwpa (void)
>  {
>    symtab_node *node;
>  
> -  lto_clone_numbers = new hash_map<const char *, unsigned>;
> +  lto_clone_numbers = new clone_map;
>    FOR_EACH_SYMBOL (node)
>      {
>        rename_statics (NULL, node);
> -- 
> 2.42.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] lto: squash order of symbols in partitions
  2023-11-17 20:17 ` [PATCH 6/7] lto: squash order of symbols in partitions Michal Jires
@ 2024-05-14 12:20   ` Jan Hubicka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2024-05-14 12:20 UTC (permalink / raw)
  To: Michal Jires; +Cc: gcc-patches

> This patch squashes order of symbols in individual partitions, so that
> their relative order is conserved, but is not influenced by symbols in
> other partitions.
> Order of cloned symbols is set to 0. This should be fine because order
> specifies order of symbols in input files, which cloned symbols are not
> part of.

The current use of order is somewhat broken (after converting cgraph to
C++, that is a while).
The original code was setting order at the time function was finalized,
which made them to be output in same order as the bodies appear in
source code (with -fno-toplevel-reorder build at least).

With this logic the clones should have same order as originals, so they
appear next to tihem.

Later initialization of order was moved to register_symbol that
is king of wrong since frontends are allowed to produce symbols early.
So it would be nice to fix this problem and make sure that order of
clons is sane.

I guess this is bit of independent of the rest of caching, so maybe we
can first get the other patches in and then worry about order?
> 
> This is important for incremental LTO because if there is a new symbol,
> it otherwise shifts order of all symbols with higher order, which would
> diverge them all.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu
> 
> gcc/ChangeLog:
> 
> 	* lto-cgraph.cc (lto_output_node): Add and use order_remap.
> 	(lto_output_varpool_node): Likewise.
> 	(output_symtab): Likewise.
> 	* lto-streamer-out.cc (produce_asm): Likewise.
> 	(output_function): Likewise.
> 	(output_constructor): Likewise.
> 	(copy_function_or_variable): Likewise.
> 	(cmp_int): New.
> 	(lto_output): Generate order_remap.
> 	* lto-streamer.h (produce_asm): Add order_remap.
> 	(output_symtab): Likewise.
> ---
>  gcc/lto-cgraph.cc       | 20 ++++++++----
>  gcc/lto-streamer-out.cc | 71 +++++++++++++++++++++++++++++++++--------
>  gcc/lto-streamer.h      |  5 +--
>  3 files changed, 73 insertions(+), 23 deletions(-)
> 
> diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
> index 32c0f5ac6db..a7530290fba 100644
> --- a/gcc/lto-cgraph.cc
> +++ b/gcc/lto-cgraph.cc
> @@ -381,7 +381,8 @@ reachable_from_this_partition_p (struct cgraph_node *node, lto_symtab_encoder_t
>  
>  static void
>  lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
> -		 lto_symtab_encoder_t encoder)
> +		 lto_symtab_encoder_t encoder,
> +		 hash_map<int_hash<int, -1, -2>, int>* order_remap)
>  {
>    unsigned int tag;
>    struct bitpack_d bp;
> @@ -405,7 +406,9 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
>  
>    streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
>  		       tag);
> -  streamer_write_hwi_stream (ob->main_stream, node->order);
> +
> +  int order = flag_wpa ? *order_remap->get (node->order) : node->order;
> +  streamer_write_hwi_stream (ob->main_stream, order);
>  
>    /* In WPA mode, we only output part of the call-graph.  Also, we
>       fake cgraph node attributes.  There are two cases that we care.
> @@ -585,7 +588,8 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
>  
>  static void
>  lto_output_varpool_node (struct lto_simple_output_block *ob, varpool_node *node,
> -			 lto_symtab_encoder_t encoder)
> +			 lto_symtab_encoder_t encoder,
> +			 hash_map<int_hash<int, -1, -2>, int>* order_remap)
>  {
>    bool boundary_p = !lto_symtab_encoder_in_partition_p (encoder, node);
>    bool encode_initializer_p
> @@ -602,7 +606,8 @@ lto_output_varpool_node (struct lto_simple_output_block *ob, varpool_node *node,
>  
>    streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
>  		       LTO_symtab_variable);
> -  streamer_write_hwi_stream (ob->main_stream, node->order);
> +  int order = flag_wpa ? *order_remap->get (node->order) : node->order;
> +  streamer_write_hwi_stream (ob->main_stream, order);
>    lto_output_var_decl_ref (ob->decl_state, ob->main_stream, node->decl);
>    bp = bitpack_create (ob->main_stream);
>    bp_pack_value (&bp, node->externally_visible, 1);
> @@ -967,7 +972,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
>  /* Output the part of the symtab in SET and VSET.  */
>  
>  void
> -output_symtab (void)
> +output_symtab (hash_map<int_hash<int, -1, -2>, int>* order_remap)
>  {
>    struct cgraph_node *node;
>    struct lto_simple_output_block *ob;
> @@ -994,9 +999,10 @@ output_symtab (void)
>      {
>        symtab_node *node = lto_symtab_encoder_deref (encoder, i);
>        if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
> -        lto_output_node (ob, cnode, encoder);
> +	lto_output_node (ob, cnode, encoder, order_remap);
>        else
> -	lto_output_varpool_node (ob, dyn_cast<varpool_node *> (node), encoder);
> +	lto_output_varpool_node (ob, dyn_cast<varpool_node *> (node), encoder,
> +				 order_remap);
>      }
>  
>    /* Go over the nodes in SET again to write edges.  */
> diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
> index a1bbea8fc68..9448ab195d5 100644
> --- a/gcc/lto-streamer-out.cc
> +++ b/gcc/lto-streamer-out.cc
> @@ -2212,7 +2212,8 @@ output_cfg (struct output_block *ob, struct function *fn)
>     a function, set FN to the decl for that function.  */
>  
>  void
> -produce_asm (struct output_block *ob, tree fn)
> +produce_asm (struct output_block *ob, tree fn,
> +	     hash_map<int_hash<int, -1, -2>, int>* order_remap)
>  {
>    enum lto_section_type section_type = ob->section_type;
>    struct lto_function_header header;
> @@ -2221,9 +2222,11 @@ produce_asm (struct output_block *ob, tree fn)
>    if (section_type == LTO_section_function_body)
>      {
>        const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fn));
> -      section_name = lto_get_section_name (section_type, name,
> -					   symtab_node::get (fn)->order,
> -					   NULL);
> +
> +      int order = symtab_node::get (fn)->order;
> +      if (flag_wpa && order_remap)
> +	order = *order_remap->get (order);
> +      section_name = lto_get_section_name (section_type, name, order, NULL);
>      }
>    else
>      section_name = lto_get_section_name (section_type, NULL, 0, NULL);
> @@ -2405,7 +2408,8 @@ streamer_write_chain (struct output_block *ob, tree t, bool ref_p)
>  /* Output the body of function NODE->DECL.  */
>  
>  static void
> -output_function (struct cgraph_node *node)
> +output_function (struct cgraph_node *node,
> +		 hash_map<int_hash<int, -1, -2>, int>* order_remap)
>  {
>    tree function;
>    struct function *fn;
> @@ -2482,7 +2486,7 @@ output_function (struct cgraph_node *node)
>      streamer_write_uhwi (ob, 0);
>  
>    /* Create a section to hold the pickled output of this function.   */
> -  produce_asm (ob, function);
> +  produce_asm (ob, function, order_remap);
>  
>    destroy_output_block (ob);
>    if (streamer_dump_file)
> @@ -2493,7 +2497,8 @@ output_function (struct cgraph_node *node)
>  /* Output the body of function NODE->DECL.  */
>  
>  static void
> -output_constructor (struct varpool_node *node)
> +output_constructor (struct varpool_node *node,
> +		    hash_map<int_hash<int, -1, -2>, int>* order_remap)
>  {
>    tree var = node->decl;
>    struct output_block *ob;
> @@ -2515,7 +2520,7 @@ output_constructor (struct varpool_node *node)
>    stream_write_tree (ob, DECL_INITIAL (var), true);
>  
>    /* Create a section to hold the pickled output of this function.   */
> -  produce_asm (ob, var);
> +  produce_asm (ob, var, order_remap);
>  
>    destroy_output_block (ob);
>    if (streamer_dump_file)
> @@ -2576,15 +2581,18 @@ lto_output_toplevel_asms (void)
>  /* Copy the function body or variable constructor of NODE without deserializing. */
>  
>  static void
> -copy_function_or_variable (struct symtab_node *node)
> +copy_function_or_variable (struct symtab_node *node,
> +			   hash_map<int_hash<int, -1, -2>, int>* order_remap)
>  {
>    tree function = node->decl;
>    struct lto_file_decl_data *file_data = node->lto_file_data;
>    const char *data;
>    size_t len;
>    const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (function));
> +
> +  int order = flag_wpa ? *order_remap->get (node->order) : node->order;
>    char *section_name =
> -    lto_get_section_name (LTO_section_function_body, name, node->order, NULL);
> +    lto_get_section_name (LTO_section_function_body, name, order, NULL);
>    size_t i, j;
>    struct lto_in_decl_state *in_state;
>    struct lto_out_decl_state *out_state = lto_get_out_decl_state ();
> @@ -2729,6 +2737,15 @@ cmp_symbol_files (const void *pn1, const void *pn2, void *id_map_)
>    return n1->order - n2->order;
>  }
>  
> +/* Compare ints, callback for qsort.  */
> +static int
> +cmp_int (const void *a, const void *b)
> +{
> +  int ia = *(int const*) a;
> +  int ib = *(int const*) b;
> +  return ia - ib;
> +}
> +
>  /* Main entry point from the pass manager.  */
>  
>  void
> @@ -2741,6 +2758,32 @@ lto_output (void)
>    lto_symtab_encoder_t encoder = lto_get_out_decl_state ()->symtab_node_encoder;
>    auto_vec<symtab_node *> symbols_to_copy;
>  
> +  hash_map<int_hash<int, -1, -2>, int> order_remap;
> +  if (flag_wpa)
> +    {
> +      /* Remap order so that it does not depend on symbols outside of
> +	 partition.  */
> +      auto_vec<int> orders;
> +
> +      n_nodes = lto_symtab_encoder_size (encoder);
> +      for (i = 0; i < n_nodes; i++)
> +	{
> +	  symtab_node *snode = lto_symtab_encoder_deref (encoder, i);
> +	  if (cgraph_node *cnode = dyn_cast <cgraph_node *> (snode))
> +	    {
> +	      if (cnode->clone_of)
> +		{
> +		  order_remap.put (snode->order, 0);
> +		  continue;
> +		}
> +	    }
> +	  orders.safe_push (snode->order);
> +	}
> +      orders.qsort (cmp_int);
> +      for (i = 0; i < orders.length (); i++)
> +	order_remap.put (orders[i], i);
> +    }
> +
>    prune_offload_funcs ();
>  
>    if (flag_checking)
> @@ -2817,14 +2860,14 @@ lto_output (void)
>  		 at WPA time.  */
>  	      || DECL_ARGUMENTS (cnode->decl)
>  	      || cnode->declare_variant_alt))
> -	output_function (cnode);
> +	output_function (cnode, &order_remap);
>        else if ((vnode = dyn_cast <varpool_node *> (snode))
>  	       && (DECL_INITIAL (vnode->decl) != error_mark_node
>  		   || (!flag_wpa
>  		       && flag_incremental_link != INCREMENTAL_LINK_LTO)))
> -	output_constructor (vnode);
> +	output_constructor (vnode, &order_remap);
>        else
> -	copy_function_or_variable (snode);
> +	copy_function_or_variable (snode, &order_remap);
>        gcc_assert (lto_get_out_decl_state () == decl_state);
>        lto_pop_out_decl_state ();
>        lto_record_function_out_decl_state (snode->decl, decl_state);
> @@ -2834,7 +2877,7 @@ lto_output (void)
>       be done now to make sure that all the statements in every function
>       have been renumbered so that edges can be associated with call
>       statements using the statement UIDs.  */
> -  output_symtab ();
> +  output_symtab (&order_remap);
>  
>    output_offload_tables ();
>  
> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> index 0556b34c837..3363e6f9e61 100644
> --- a/gcc/lto-streamer.h
> +++ b/gcc/lto-streamer.h
> @@ -888,7 +888,8 @@ extern void lto_output_fn_decl_ref (struct lto_out_decl_state *,
>  extern tree lto_input_var_decl_ref (lto_input_block *, lto_file_decl_data *);
>  extern tree lto_input_fn_decl_ref (lto_input_block *, lto_file_decl_data *);
>  extern void lto_output_toplevel_asms (void);
> -extern void produce_asm (struct output_block *ob, tree fn);
> +extern void produce_asm (struct output_block *ob, tree fn,
> +			 hash_map<int_hash<int, -1, -2>, int>* order_remap = 0);
>  extern void lto_output ();
>  extern void produce_asm_for_decls ();
>  void lto_output_decl_state_streams (struct output_block *,
> @@ -919,7 +920,7 @@ void lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t,
>  
>  bool lto_symtab_encoder_encode_initializer_p (lto_symtab_encoder_t,
>  					      varpool_node *);
> -void output_symtab (void);
> +void output_symtab (hash_map<int_hash<int, -1, -2>, int>*);
>  void input_symtab (void);
>  void output_offload_tables (void);
>  void input_offload_tables (bool);
> -- 
> 2.42.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-05-14 12:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-17 20:16 [PATCH 0/7] lto: Incremental LTO Michal Jires
2023-11-17 20:16 ` [PATCH 1/7] lto: Skip flag OPT_fltrans_output_list_ Michal Jires
2023-12-29 21:16   ` Jan Hubicka
2023-11-17 20:16 ` [PATCH 2/7] lto: Remove random_seed from section name Michal Jires
2023-12-29 21:17   ` Jan Hubicka
2024-01-09 16:49     ` [PATCH 2/7 v2] " Michal Jires
     [not found]       ` <c480760c-f167-4e60-a27e-52bebdd1351b@suse.cz>
2024-05-14 11:28         ` Fwd: " Jan Hubicka
2023-11-17 20:17 ` [PATCH 3/7] Lockfile Michal Jires
2023-12-29 21:23   ` Jan Hubicka
2024-01-09 17:10     ` Michal Jires
2023-11-17 20:17 ` [PATCH 4/7] lto: Implement ltrans cache Michal Jires
2024-05-14 11:51   ` Jan Hubicka
2023-11-17 20:17 ` [PATCH 5/7] lto: Implement cache partitioning Michal Jires
2024-05-14 12:10   ` Jan Hubicka
2023-11-17 20:17 ` [PATCH 6/7] lto: squash order of symbols in partitions Michal Jires
2024-05-14 12:20   ` Jan Hubicka
2023-11-17 20:17 ` [PATCH 7/7] lto: partition specific lto_clone_numbers Michal Jires
2024-05-14 12:11   ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).