public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.
@ 2020-08-20 22:00 Giuliano Belinassi
From: Giuliano Belinassi @ 2020-08-20 22:00 UTC
  To: gcc-patches; +Cc: richard.guenther, hubicka

This patch series adds a new flag, "-fparallel-jobs=", which controls
whether the compiler should try to compile the current file in parallel.

Three modes are currently supported:

1. -fparallel-jobs=<N>: Try to compile the file using a maximum of N
jobs.

2. -fparallel-jobs=jobserver: Check whether a GNU Make jobserver is
running.  If so, communicate with it in order to launch jobs; if not,
warn the user, since using the jobserver requires modifications to the
project Makefile.

3. -fparallel-jobs=auto: Same as 2., but quietly fall back to a maximum
of 2 jobs if the jobserver was not found.
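The three accepted values can be pictured as a small parser. This is a minimal sketch under stated assumptions, not code from this series: `parse_parallel_jobs`, `enum parallel_mode` and the 1024-job cap are hypothetical. It also assumes a running jobserver is detected from the `--jobserver-auth=` option (or `--jobserver-fds=` in older GNU Make releases) that GNU Make places in `MAKEFLAGS`.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical resolved form of -fparallel-jobs=<arg>.  */
enum parallel_mode { PJ_FIXED, PJ_JOBSERVER, PJ_INVALID };

struct parallel_jobs
{
  enum parallel_mode mode;
  int max_jobs;              /* Job limit when mode == PJ_FIXED.  */
  bool warn_no_jobserver;    /* Mode 2 asked for a jobserver that is absent.  */
};

/* Assumption: a running GNU Make jobserver is advertised in MAKEFLAGS.  */
static bool
jobserver_advertised (const char *makeflags)
{
  return makeflags != NULL
	 && (strstr (makeflags, "--jobserver-auth=") != NULL
	     || strstr (makeflags, "--jobserver-fds=") != NULL);
}

static struct parallel_jobs
parse_parallel_jobs (const char *arg, const char *makeflags)
{
  struct parallel_jobs r = { PJ_INVALID, 0, false };

  if (strcmp (arg, "jobserver") == 0)
    {
      /* Mode 2: use the jobserver, but tell the user if it is missing.  */
      r.mode = PJ_JOBSERVER;
      r.warn_no_jobserver = !jobserver_advertised (makeflags);
    }
  else if (strcmp (arg, "auto") == 0)
    {
      /* Mode 3: like mode 2, but quietly fall back to 2 jobs.  */
      if (jobserver_advertised (makeflags))
	r.mode = PJ_JOBSERVER;
      else
	{
	  r.mode = PJ_FIXED;
	  r.max_jobs = 2;
	}
    }
  else
    {
      /* Mode 1: a positive decimal job count.  */
      char *end;
      long n = strtol (arg, &end, 10);
      if (end != arg && *end == '\0' && n > 0 && n <= 1024)
	{
	  r.mode = PJ_FIXED;
	  r.max_jobs = (int) n;
	}
    }
  return r;
}
```

With this sketch, "auto" resolves to the jobserver whenever `MAKEFLAGS` advertises one, and to a fixed limit of 2 jobs otherwise.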

The parallelization uses a modified LTO engine in which no IR is dumped
to disk, together with a new partitioner that finds symbols which must
be placed in the same partition.
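One common way to group symbols that must share a partition is a union-find (disjoint-set) structure. Each constraint of the form "A and B must be together" merges two sets, for example a non-promoted static variable referenced from two functions, and each remaining set is a candidate partition. The sketch below only illustrates that technique under this assumption. It is not the partitioner from patch 2/6, and the symbol numbering and the `must_join` helper are hypothetical. If everything collapses into a single set, there is no point at which the file can be split.

```c
#include <stdlib.h>

/* Illustrative only: a disjoint-set forest over symbols 0..n-1.  */
struct symbol_sets
{
  int *parent;
  int n;
};

static int
sets_init (struct symbol_sets *s, int n)
{
  s->parent = malloc ((size_t) n * sizeof *s->parent);
  if (!s->parent)
    return -1;
  for (int i = 0; i < n; i++)
    s->parent[i] = i;          /* Every symbol starts in its own set.  */
  s->n = n;
  return 0;
}

static int
sets_find (struct symbol_sets *s, int x)
{
  while (s->parent[x] != x)
    {
      s->parent[x] = s->parent[s->parent[x]];   /* Path halving.  */
      x = s->parent[x];
    }
  return x;
}

/* Record that symbols A and B must be emitted in the same partition.  */
static void
must_join (struct symbol_sets *s, int a, int b)
{
  int ra = sets_find (s, a), rb = sets_find (s, b);
  if (ra != rb)
    s->parent[ra] = rb;
}

/* Number of independent groups; 1 means nothing can be split off.  */
static int
count_groups (struct symbol_sets *s)
{
  int groups = 0;
  for (int i = 0; i < s->n; i++)
    if (sets_find (s, i) == i)
      groups++;
  return groups;
}

static void
sets_free (struct symbol_sets *s)
{
  free (s->parent);
  s->parent = NULL;
}
```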

The parallelism feature is implemented as follows:

1. The driver will pass a hidden -fsplit-outputs=<filename> to cc1*.

2. After IPA, cc1* searches for symbols which must be placed in the
same partition.  If the user allows GCC to automatically promote static
symbols to globals through "--param=promote-statics=1" for better
parallel compilation performance, this promotion is also done.
However, if cc1* decides that partitioning is a bad idea, it continues
with the default serial compilation and does not create <filename>.
It avoids compiling in parallel if and only if:

  * The file size is below the minimum partition size given by the LTO
  parameter --param=lto-min-partition.

  * The partitioner is unable to find any point of partitioning in the
  file.

3. cc1* forks itself, once for each partition.  Each child process
applies the partition mask generated by the partitioner and writes the
name of its new assembler file to the <filename> given by the driver.

4. The driver reads <filename>, assembles each listed file, and
partially links the resulting objects into a single .o file if -c was
requested; otherwise it links them into the final binary.  -S and -E
are unsupported for now and will probably remain so.
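Steps 3 and 4 can be illustrated with a small POSIX sketch of the fork-per-partition model. It is not the engine from patch 3/6. `compile_partition` is a hypothetical placeholder and the `partition-<i>.s` names are made up. The "<index> <file>" line format is an assumption inferred from how `sort_asm_files` in the driver parses the list. Each child opens the list file with `O_APPEND` and emits its line with a single `write`, so concurrent children append rather than overwrite one another. The parent trusts the list only after reaping every child.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical placeholder: a real child would apply its partition mask
   and compile only those symbols into ASM_NAME.  */
static int
compile_partition (int index, const char *asm_name)
{
  (void) index;
  (void) asm_name;
  return 0;
}

/* Fork one child per partition; each appends "<index> <asm file>\n"
   (an assumed format) to LIST_PATH.  Return 0 if all children succeeded.  */
static int
fork_partitions (int n_partitions, const char *list_path)
{
  pid_t *pids = malloc ((size_t) n_partitions * sizeof *pids);
  int started = 0, failed = 0;

  if (!pids)
    return -1;

  for (int i = 0; i < n_partitions; i++)
    {
      pid_t pid = fork ();
      if (pid < 0)
	{
	  failed = 1;
	  break;
	}
      if (pid == 0)
	{
	  char asm_name[64], line[96];
	  snprintf (asm_name, sizeof asm_name, "partition-%d.s", i);
	  if (compile_partition (i, asm_name) != 0)
	    _exit (1);

	  /* With O_APPEND each write () lands at the current end of file,
	     so lines from concurrent children do not overwrite each other.  */
	  int fd = open (list_path, O_WRONLY | O_APPEND | O_CREAT, 0600);
	  if (fd < 0)
	    _exit (1);
	  int len = snprintf (line, sizeof line, "%d %s\n", i, asm_name);
	  ssize_t written = write (fd, line, (size_t) len);
	  close (fd);
	  _exit (written == len ? 0 : 1);
	}
      pids[started++] = pid;
    }

  /* Reap every child that was started; only then is the list complete.  */
  for (int i = 0; i < started; i++)
    {
      int status;
      if (waitpid (pids[i], &status, 0) < 0
	  || !WIFEXITED (status) || WEXITSTATUS (status) != 0)
	failed = 1;
    }

  free (pids);
  return failed ? -1 : 0;
}
```

The lines arrive in completion order, not partition order. This is why the driver has to sort them before assembling.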


Speedups ranged from 0.95x to 1.9x on a quad-core Intel Core i7-8565U
when compiling two files from GCC, as shown in the following table.
Each timing comes from a single run preceded by one warm-up run.  The
compiled GCC had checking enabled, so a release build may be faster in
both the sequential and parallel cases, although the speedup is likely
to stay about the same.

| File           | Sequential | Without Static Promotion | With Static Promotion | Max Speedup |
|----------------|------------|--------------------------|-----------------------|-------------|
| gimple-match.c |     60s    |           63s            |          34s          |     1.7x    |
| insn-emit.c    |     37s    |           19s            |          20s          |     1.9x    |
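The speedup is the sequential time divided by the parallel time. Working it out from the timings above shows where the quoted 0.95x to 1.9x range comes from. The "Max Speedup" column appears to truncate to one decimal place.

```latex
S = \frac{T_{\text{seq}}}{T_{\text{par}}}
\qquad
\begin{array}{lll}
\text{gimple-match.c:} & \tfrac{60}{63} \approx 0.95 & \tfrac{60}{34} \approx 1.76 \\
\text{insn-emit.c:}    & \tfrac{37}{19} \approx 1.95 & \tfrac{37}{20} = 1.85
\end{array}
```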

Note that the feature causes a slowdown in some cases (gimple-match.c
without static promotion, for example), which is why it is only enabled
through a flag for now.

Bootstrapped and Regtested on Linux x86_64.

Giuliano Belinassi (6):
  Modify gcc driver for parallel compilation
  Implement a new partitioner for parallel compilation
  Implement fork-based parallelism engine
  Add `+' for Jobserver Integration
  Add invoke documentation
  New tests for parallel compilation feature

 gcc/Makefile.in                               |    6 +-
 gcc/cgraph.c                                  |   16 +
 gcc/cgraph.h                                  |   13 +
 gcc/cgraphunit.c                              |  198 ++-
 gcc/common.opt                                |    4 +
 gcc/doc/invoke.texi                           |   32 +-
 gcc/gcc.c                                     | 1219 +++++++++++++----
 gcc/ipa-fnsummary.c                           |    2 +-
 gcc/ipa-icf.c                                 |    3 +-
 gcc/ipa-visibility.c                          |    3 +-
 gcc/ipa.c                                     |    4 +-
 gcc/jobserver.cc                              |  168 +++
 gcc/jobserver.h                               |   33 +
 gcc/lto-cgraph.c                              |  172 +++
 gcc/{lto => }/lto-partition.c                 |  463 ++++++-
 gcc/{lto => }/lto-partition.h                 |    4 +-
 gcc/lto-streamer.h                            |    4 +
 gcc/lto/Make-lang.in                          |    4 +-
 gcc/lto/lto.c                                 |    2 +-
 gcc/params.opt                                |    8 +
 gcc/symtab.c                                  |   46 +-
 gcc/testsuite/driver/a.c                      |    6 +
 gcc/testsuite/driver/b.c                      |    6 +
 gcc/testsuite/driver/driver.exp               |   80 ++
 gcc/testsuite/driver/empty.c                  |    0
 gcc/testsuite/driver/foo.c                    |    7 +
 .../gcc.dg/parallel-early-constant.c          |   22 +
 gcc/testsuite/gcc.dg/parallel-static-1.c      |   21 +
 gcc/testsuite/gcc.dg/parallel-static-2.c      |   21 +
 .../gcc.dg/parallel-static-clash-1.c          |   23 +
 .../gcc.dg/parallel-static-clash-aux.c        |   14 +
 gcc/toplev.c                                  |   58 +-
 gcc/toplev.h                                  |    3 +
 gcc/tree.c                                    |   23 +-
 gcc/varasm.c                                  |   26 +-
 intl/Makefile.in                              |    2 +-
 libbacktrace/Makefile.in                      |    2 +-
 libcpp/Makefile.in                            |    2 +-
 libdecnumber/Makefile.in                      |    2 +-
 libiberty/Makefile.in                         |  212 +--
 zlib/Makefile.in                              |   64 +-
 41 files changed, 2539 insertions(+), 459 deletions(-)
 create mode 100644 gcc/jobserver.cc
 create mode 100644 gcc/jobserver.h
 rename gcc/{lto => }/lto-partition.c (78%)
 rename gcc/{lto => }/lto-partition.h (89%)
 create mode 100644 gcc/testsuite/driver/a.c
 create mode 100644 gcc/testsuite/driver/b.c
 create mode 100644 gcc/testsuite/driver/driver.exp
 create mode 100644 gcc/testsuite/driver/empty.c
 create mode 100644 gcc/testsuite/driver/foo.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-early-constant.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-static-1.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-static-2.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-1.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-aux.c

-- 
2.28.0



* [PATCH 1/6] Modify gcc driver for parallel compilation
From: Giuliano Belinassi @ 2020-08-20 22:00 UTC
  To: gcc-patches; +Cc: richard.guenther, hubicka

Update the driver for parallel compilation.  The process works as
follows:

When gcc is invoked, the driver checks whether the user provided the
"-fparallel-jobs" flag.  If so, it checks what the desired output is
and whether it can be parallelized.  There are three cases:

1. -S or -E was provided: We can't run in parallel, as the output
   cannot easily be merged into one file.

2. -c was provided: When cc1* forks into multiple processes, it
   must tell the driver where it stored its generated assembler files.
   Therefore we pass a hidden "-fsplit-outputs=filename" to the
   compiler and check whether "filename" was created.  If so, we open
   it, call the assembler for each generated asm file listed in it (the
   file must not be empty), and combine the resulting objects into a
   single .o file with a partial link.  This is done for each object
   file in the argument list.

3. -c was not provided, and the final product will be a binary: We
   proceed exactly as in 2., but skip the partial link and feed the
   generated object files directly into the final link.
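For the -c case, the driver's work after reading the list file comes down to two steps. First it puts the entries back in partition order, since children finish in arbitrary order and a fixed order keeps builds reproducible. Then it builds a relocatable `ld -r` command over the per-partition objects. The sketch below is hypothetical and self-contained. It assumes the "<index> <file>" line format and mirrors the argument layout `ld -o <output> -r <objects...>` that `append_split_outputs` constructs, but it is not the driver code itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: place each "<index> <name>" line at slot <index>
   of ORDERED (room for N entries).  Return 0 on success, -1 if a line is
   malformed or an index is out of range or duplicated.  */
static int
order_split_outputs (const char *const *lines, int n, const char **ordered)
{
  for (int i = 0; i < n; i++)
    ordered[i] = NULL;

  for (int i = 0; i < n; i++)
    {
      char *end;
      long pos = strtol (lines[i], &end, 10);
      if (end == lines[i] || *end != ' ' || pos < 0 || pos >= n
	  || ordered[pos] != NULL)
	return -1;
      ordered[pos] = end + 1;          /* The file name follows the space.  */
    }
  return 0;
}

/* Hypothetical helper: build a NULL-terminated argv for
   "LD -o OUTPUT -r OBJS...".  The caller frees the array, not the strings.  */
static const char **
build_partial_link (const char *ld, const char *output,
		    const char *const *objs, int n_objs)
{
  const char **argv = malloc ((size_t) (n_objs + 5) * sizeof *argv);
  if (!argv)
    return NULL;

  argv[0] = ld;
  argv[1] = "-o";
  argv[2] = output;
  argv[3] = "-r";                      /* Relocatable (partial) link.  */
  for (int i = 0; i < n_objs; i++)
    argv[4 + i] = objs[i];
  argv[4 + n_objs] = NULL;
  return argv;
}
```

In case 3 the second step would be skipped and the ordered objects passed straight to the final link.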

For that to work, we had to heavily modify the "execute" function,
moving code that is used in several places into separate functions and
detecting when a command is a call to a compiler or an assembler, as
can be seen in append_split_outputs.
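Detecting whether a command is a compiler or an assembler amounts to comparing the basename of the program path against a short list of names, which is what `is_compiler` and `is_assembler` do in the diff. A condensed sketch using `strrchr` follows. The `prog_is_one_of` helper is hypothetical, and the name lists are copied from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if the basename of PROG equals one of
   the N entries of NAMES.  */
static bool
prog_is_one_of (const char *prog, const char *const names[], size_t n)
{
  const char *slash = strrchr (prog, '/');
  const char *base = slash ? slash + 1 : prog;

  for (size_t i = 0; i < n; i++)
    if (strcmp (base, names[i]) == 0)
      return true;
  return false;
}

/* Name lists as used by is_compiler and is_assembler in this patch.  */
static const char *const compilers[] = { "cc1", "cc1plus", "f771" };
static const char *const assemblers[] = { "as", "gas" };
```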

Finally, we added tests that reflect all the cases found when
bootstrapping the compiler, so that further development of the driver
is faster from now on.

gcc/ChangeLog:
2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>

	* common.opt (fsplit-outputs): New flag.
	(fparallel-jobs): New flag.
	* gcc.c (extra_arg_storer): New class.
	(have_S): New variable.
	(struct command): Move from execute.
	(is_compiler): New function.
	(is_assembler): New function.
	(get_number_of_args): New function.
	(get_file_by_lines): New function.
	(identify_asm_file): New function.
	(struct infile): New attribute temp_additional_asm.
	(current_infile): New variable.
	(get_path_to_ld): New function.
	(has_hidden_E): New function.
	(sort_asm_files): New function.
	(append_split_outputs): New function.
	(print_command): New function.
	(print_commands): New function.
	(print_argbuf): New function.
	(handle_verbose): Extracted from execute.
	(append_valgrind): Same as above.
	(async_launch_commands): Same as above.
	(await_commands_to_finish): Same as above.
	(split_commands): Same as above.
	(parse_argbuf): Same as above.
	(execute): Refactor.
	(fsplit_arg): New function.
	(alloc_infile): Initialize infiles with 0.
	(process_command): Remember when -S was passed.
	(do_spec_on_infiles): Remember current infile being processed.
	(maybe_run_linker): Replace object files when -o is an executable.
	(finalize): Deinitialize temp_object_files.

gcc/testsuite/ChangeLog:
2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>

	* driver/driver.exp: New test.
	* driver/a.c: New file.
	* driver/b.c: New file.
	* driver/empty.c: New file.
	* driver/foo.c: New file.
---
 gcc/common.opt                  |    4 +
 gcc/gcc.c                       | 1219 ++++++++++++++++++++++++-------
 gcc/testsuite/driver/a.c        |    6 +
 gcc/testsuite/driver/b.c        |    6 +
 gcc/testsuite/driver/driver.exp |   80 ++
 gcc/testsuite/driver/empty.c    |    0
 gcc/testsuite/driver/foo.c      |    7 +
 7 files changed, 1049 insertions(+), 273 deletions(-)
 create mode 100644 gcc/testsuite/driver/a.c
 create mode 100644 gcc/testsuite/driver/b.c
 create mode 100644 gcc/testsuite/driver/driver.exp
 create mode 100644 gcc/testsuite/driver/empty.c
 create mode 100644 gcc/testsuite/driver/foo.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 4b08e91859f..4aa3ad8c95b 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3465,4 +3465,8 @@ fipa-ra
 Common Report Var(flag_ipa_ra) Optimization
 Use caller save register across calls if possible.
 
+fsplit-outputs=
+Common Joined Var(split_outputs)
+-fsplit-outputs=<tempfile>  File listing the outputs the current compilation unit is split into.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 10bc9881aed..c276a11ca7a 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -343,6 +343,74 @@ static struct obstack obstack;
 
 static struct obstack collect_obstack;
 
+/* This is used to store new argv arrays created dynamically to avoid memory
+   leaks.  */
+
+class extra_arg_storer
+{
+  public:
+
+    /* Initialize the vec with a default size.  */
+
+    extra_arg_storer ()
+      {
+	string_vec.create (8);
+	extra_args.create (64);
+      }
+
+    /* Create new array of strings of size N.  */
+    const char **create_new (size_t n)
+      {
+	const char **ret = XNEWVEC (const char *, n);
+	extra_args.safe_push (ret);
+	return ret;
+      }
+
+    char *create_string (size_t n)
+      {
+	char *ret = XNEWVEC (char, n);
+	string_vec.safe_push (ret);
+	return ret;
+      }
+
+    void store (char *str)
+      {
+	string_vec.safe_push (str);
+      }
+
+    ~extra_arg_storer ()
+      {
+	release_extra_args ();
+	release_string_vec ();
+      }
+
+
+  private:
+
+    /* Release all allocated argv arrays.  */
+    void release_extra_args ()
+      {
+	size_t i;
+
+	for (i = 0; i < extra_args.length (); i++)
+	  free (extra_args[i]);
+	extra_args.release ();
+      }
+
+    void release_string_vec ()
+      {
+	size_t i;
+
+	for (i = 0; i < string_vec.length (); i++)
+	  free (string_vec[i]);
+	string_vec.release ();
+      }
+
+    /* Data structure to hold all arrays.  */
+    vec<const char **> extra_args;
+    vec<char *> string_vec;
+};
+
 /* Forward declaration for prototypes.  */
 struct path_prefix;
 struct prefix_list;
@@ -1993,6 +2061,9 @@ static int have_o = 0;
 /* Was the option -E passed.  */
 static int have_E = 0;
 
+/* Was the option -S passed.  */
+static int have_S = 0;
+
 /* Pointer to output file name passed in with -o. */
 static const char *output_file = 0;
 
@@ -3056,158 +3127,522 @@ add_sysrooted_hdrs_prefix (struct path_prefix *pprefix, const char *prefix,
 	      require_machine_suffix, os_multilib);
 }
 
-\f
-/* Execute the command specified by the arguments on the current line of spec.
-   When using pipes, this includes several piped-together commands
-   with `|' between them.
+struct command
+{
+  const char *prog;		/* program name.  */
+  const char **argv;		/* vector of args.  */
+};
 
-   Return 0 if successful, -1 if failed.  */
+#define EMPTY_CMD(x) (!((x).prog))  /* Is the provided CMD empty?  */
+
+/* Check if arg is a call to a compiler.  Return false if not, true if yes.  */
+
+static bool
+is_compiler (const char *arg)
+{
+  static const char *const compilers[] = {"cc1", "cc1plus", "f771"};
+  const char* ptr = arg;
+
+  size_t i;
+
+  /* Jump to last '/' of string.  */
+  while (*arg)
+    if (*arg++ == '/')
+      ptr = arg;
+
+  /* Check that the basename is not empty.  */
+  gcc_assert (!(*ptr == '\0' ||  *ptr == '/'));
+
+  for (i = 0; i < ARRAY_SIZE (compilers); i++)
+    {
+      if (!strcmp (ptr, compilers[i]))
+	return true;
+    }
+
+  return false;
+}
+
+/* Check if arg is a call to as.  Return false if not, true if yes.  */
+
+static bool
+is_assembler (const char *arg)
+{
+  static const char *const assemblers[] = {"as", "gas"};
+  const char* ptr = arg;
+
+  size_t i;
+
+  /* Jump to last '/' of string.  */
+  while (*arg)
+    if (*arg++ == '/')
+      ptr = arg;
+
+  /* Check that the basename is not empty.  */
+  gcc_assert (!(*ptr == '\0' ||  *ptr == '/'));
+
+  for (i = 0; i < ARRAY_SIZE (assemblers); i++)
+    {
+      if (!strcmp (ptr, assemblers[i]))
+	return true;
+    }
+
+  return false;
+}
+
+/* Get argv[] array length.  */
 
 static int
-execute (void)
+get_number_of_args (const char *argv[])
+{
+  int argc;
+
+  for (argc = 0; argv[argc] != NULL; argc++)
+    ;
+
+  return argc;
+}
+
+static const char *fsplit_arg (extra_arg_storer *);
+
+/* Append each line of file NAME to LINES.  Return true if the file exists,
+   false otherwise.  */
+
+static bool
+get_file_by_lines (extra_arg_storer *storer, vec<char *> *lines, const char *name)
+{
+  int buf_size = 64, len = 0;
+  char *buf;
+
+  FILE *file = fopen (name, "r");
+
+  if (!file)
+    return false;
+
+  buf = XNEWVEC (char, buf_size);
+  while (1)
+    {
+      if (!fgets (buf + len, buf_size - len, file))
+	{
+	  free (buf); /* Release buffer we created unnecessarily.  */
+	  break;
+	}
+
+      len = strlen (buf);
+      if (buf[len - 1] == '\n') /* Check if we indeed read the entire line.  */
+	{
+	  buf[len - 1] = '\0';
+	  /* Yes.  Insert into the lines vector.  */
+	  lines->safe_push (buf);
+	  len = 0;
+
+	  /* Store the created string for future release.  */
+	  storer->store (buf);
+	  buf = XNEWVEC (char, buf_size);
+	}
+      else
+	{
+	  buf_size *= 2;  /* No.  Double the buffer and read the rest.  */
+	  buf = XRESIZEVEC (char, buf, buf_size);
+	}
+    }
+
+  if (lines->length () == 0)
+    internal_error ("empty file: %s", name);
+
+  fclose (file);
+  return true;
+}
+
+static void
+identify_asm_file (int argc, const char *argv[],
+		   int *infile_pos, int *outfile_pos)
 {
   int i;
-  int n_commands;		/* # of command.  */
-  char *string;
-  struct pex_obj *pex;
-  struct command
-  {
-    const char *prog;		/* program name.  */
-    const char **argv;		/* vector of args.  */
-  };
-  const char *arg;
 
-  struct command *commands;	/* each command buffer with above info.  */
+  static const char *asm_extension[] = {"s", "S"};
 
-  gcc_assert (!processing_spec_function);
+  bool infile_found = false;
+  bool outfile_found = false;
 
-  if (wrapper_string)
+  for (i = 0; i < argc; i++)
     {
-      string = find_a_file (&exec_prefixes,
-			    argbuf[0], X_OK, false);
-      if (string)
-	argbuf[0] = string;
-      insert_wrapper (wrapper_string);
+      const char *arg = argv[i];
+      const char *ext = argv[i];
+      unsigned j;
+
+      /* Jump to last '.' of string.  */
+      while (*arg)
+	if (*arg++ == '.')
+	  ext = arg;
+
+      if (!infile_found)
+	for (j = 0; j < ARRAY_SIZE (asm_extension); ++j)
+	    if (!strcmp (ext, asm_extension[j]))
+	      {
+		infile_found = true;
+		*infile_pos = i;
+		break;
+	      }
+
+      if (!outfile_found)
+	if (!strcmp (ext, "-o"))
+	  {
+	    outfile_found = true;
+	    *outfile_pos = i+1;
+	  }
+
+      if (infile_found && outfile_found)
+	return;
     }
 
-  /* Count # of piped commands.  */
-  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
-    if (strcmp (arg, "|") == 0)
-      n_commands++;
+  gcc_assert (infile_found && outfile_found);
 
-  /* Get storage for each command.  */
-  commands = (struct command *) alloca (n_commands * sizeof (struct command));
+}
 
-  /* Split argbuf into its separate piped processes,
-     and record info about each one.
-     Also search for the programs that are to be run.  */
+/* Language is one of three things:
 
-  argbuf.safe_push (0);
+   1) The name of a real programming language.
+   2) NULL, indicating that no one has figured out
+   what it is yet.
+   3) '*', indicating that the file should be passed
+   to the linker.  */
+struct infile
+{
+  const char *name;
+  const char *language;
+  const char *temp_additional_asm;
+  struct compiler *incompiler;
+  bool compiled;
+  bool preprocessed;
+};
 
-  commands[0].prog = argbuf[0]; /* first command.  */
-  commands[0].argv = argbuf.address ();
+/* Also a vector of input files specified.  */
 
-  if (!wrapper_string)
+static struct infile *infiles;
+static struct infile *current_infile = NULL;
+
+int n_infiles;
+
+static int n_infiles_alloc;
+
+static vec<const char *> temp_object_files;
+
+/* Get path to the configured ld.  */
+
+static const char *
+get_path_to_ld (void)
+{
+  const char *ret = find_a_file (&exec_prefixes, LINKER_NAME, X_OK, false);
+  if (!ret)
+    ret = "ld";
+
+  return ret;
+}
+
+/* Check if a hidden -E was passed as argument to something.  */
+
+static bool
+has_hidden_E (int argc, const char *argv[])
+{
+  int i;
+  for (i = 0; i < argc; ++i)
+    if (!strcmp (argv[i], "-E"))
+      return true;
+
+  return false;
+}
+
+/* Assembler files are listed in the container file as soon as they are
+   ready.  Sort them by partition number so that builds are reproducible.  */
+
+static void
+sort_asm_files (vec <char *> *_lines)
+{
+  vec <char *> &lines = *_lines;
+  int i, n = lines.length ();
+  char **temp_buf = XALLOCAVEC (char *, n);
+
+  for (i = 0; i < n; i++)
+    temp_buf[i] = lines[i];
+
+  for (i = 0; i < n; i++)
     {
-      string = find_a_file (&exec_prefixes, commands[0].prog, X_OK, false);
-      if (string)
-	commands[0].argv[0] = string;
+      char *no_str = strtok (temp_buf[i], " ");
+      char *name = strtok (NULL, "");
+
+      int pos = atoi (no_str);
+      lines[pos] = name;
     }
+}
 
-  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
-    if (arg && strcmp (arg, "|") == 0)
-      {				/* each command.  */
-#if defined (__MSDOS__) || defined (OS2) || defined (VMS)
-	fatal_error (input_location, "%<-pipe%> not supported");
-#endif
-	argbuf[i] = 0; /* Termination of command args.  */
-	commands[n_commands].prog = argbuf[i + 1];
-	commands[n_commands].argv
-	  = &(argbuf.address ())[i + 1];
-	string = find_a_file (&exec_prefixes, commands[n_commands].prog,
-			      X_OK, false);
-	if (string)
-	  commands[n_commands].argv[0] = string;
-	n_commands++;
-      }
+/* Append -fsplit-outputs=<tempfile> to all calls to compilers.  Fill in
+   ADDITIONAL_LD if an extra call to LD is needed to merge the outputs.  */
 
-  /* If -v, print what we are about to do, and maybe query.  */
+static void
+append_split_outputs (extra_arg_storer *storer,
+		      struct command *additional_ld,
+		      struct command **_commands,
+		      int *_n_commands)
+{
+  int i;
 
-  if (verbose_flag)
+  struct command *commands = *_commands;
+  int n_commands = *_n_commands;
+
+  const char **argv;
+  int argc;
+
+  if (is_compiler (commands[0].prog))
+    {
+      argc = get_number_of_args (commands[0].argv);
+      argv = storer->create_new (argc + 4);
+
+      memcpy (argv, commands[0].argv, argc * sizeof (const char *));
+
+      if (!has_hidden_E (argc, commands[0].argv))
+	{
+	  const char *extra_argument = fsplit_arg (storer);
+	  argv[argc++] = extra_argument;
+	}
+
+      if (have_c)
+	{
+	  argv[argc++] = "-fPIE";
+	  argv[argc++] = "-fPIC";
+	}
+
+      argv[argc]   = NULL;
+
+      commands[0].argv = argv;
+    }
+
+  else if (is_assembler (commands[0].prog))
     {
-      /* For help listings, put a blank line between sub-processes.  */
-      if (print_help_list)
-	fputc ('\n', stderr);
+      vec<char *> additional_asm_files;
+
+      struct command orig;
+      const char **orig_argv;
+      int orig_argc;
+      const char *orig_obj_file;
+
+      int infile_pos = -1;
+      int outfile_pos = -1;
+
+      static const char *path_to_ld = NULL;
+
+      if (!current_infile->temp_additional_asm)
+	{
+	  /* Return because we did not create an additional-asm file for this
+	     input.  */
+
+	  return;
+	}
+
+      additional_asm_files.create (2);
+
+      if (!get_file_by_lines (storer, &additional_asm_files,
+			      current_infile->temp_additional_asm))
+	{
+	  additional_asm_files.release ();
+	  return; /* File not found.  This means that cc1* decided not to
+		      parallelize.  */
+	}
+
+      sort_asm_files (&additional_asm_files);
+
+      if (n_commands != 1)
+	fatal_error (input_location,
+		     "auto parallelism is unsupported when piping commands");
+
+      if (!path_to_ld)
+	path_to_ld = get_path_to_ld ();
+
+      /* Get original command.  */
+      orig = commands[0];
+      orig_argv = commands[0].argv;
+      orig_argc = get_number_of_args (orig.argv);
+
+
+      /* Update commands array to include the extra `as' calls.  */
+      *_n_commands = additional_asm_files.length ();
+      n_commands = *_n_commands;
+
+      gcc_assert (n_commands > 0);
+
+      identify_asm_file (orig_argc, orig_argv, &infile_pos, &outfile_pos);
+
+      *_commands = XRESIZEVEC (struct command, *_commands, n_commands);
+      commands = *_commands;
 
-      /* Print each piped command as a separate line.  */
       for (i = 0; i < n_commands; i++)
 	{
-	  const char *const *j;
+	  const char **argv = storer->create_new (orig_argc + 1);
+	  const char *temp_obj = make_temp_file ("additional-obj.o");
+	  record_temp_file (temp_obj, true, true);
+	  record_temp_file (additional_asm_files[i], true, true);
+
+	  memcpy (argv, orig_argv, (orig_argc + 1) * sizeof (const char *));
+
+	  orig_obj_file = argv[outfile_pos];
+
+	  argv[infile_pos]  = additional_asm_files[i];
+	  argv[outfile_pos] = temp_obj;
+
+	  commands[i].prog = orig.prog;
+	  commands[i].argv = argv;
+
+	  temp_object_files.safe_push (temp_obj);
+	}
+
+	if (have_c)
+	  {
+	    unsigned int num_temp_objs = temp_object_files.length ();
+	    const char **argv = storer->create_new (num_temp_objs + 5);
+	    unsigned int j;
+
+	    argv[0] = path_to_ld;
+	    argv[1] = "-o";
+	    argv[2] = orig_obj_file;
+	    argv[3] = "-r";
+
+	    for (j = 0; j < num_temp_objs; j++)
+	      argv[j + 4] = temp_object_files[j];
+	    argv[j + 4] = NULL;
+
+	    additional_ld->prog = path_to_ld;
+	    additional_ld->argv = argv;
+
+	    if (!have_o)
+	      temp_object_files.truncate (0);
+	  }
+
+	additional_asm_files.release ();
+    }
+}
+
+DEBUG_FUNCTION void
+print_command (struct command *command)
+{
+  const char **argv;
+
+  for (argv = command->argv; *argv != NULL; argv++)
+    fprintf (stdout, " %s", *argv);
+  fputc ('\n', stdout);
+}
+
+DEBUG_FUNCTION void
+print_commands (int n, struct command *commands)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+    print_command (&commands[i]);
+}
+
+DEBUG_FUNCTION void
+print_argbuf ()
+{
+  int i;
+  const char *arg;
+
+  for (i = 0; argbuf.iterate (i, &arg); i++)
+    fprintf (stdout, "%s ", arg);
+  fputc ('\n', stdout);
+}
+
+
+/* Print the commands that will run.  Return 0 if they should be executed,
+   nonzero if execution should stop (e.g. -### was given).  */
 
-	  if (verbose_only_flag)
+static int
+handle_verbose (int n_commands, struct command commands[])
+{
+  int i;
+
+  /* For help listings, put a blank line between sub-processes.  */
+  if (print_help_list)
+    fputc ('\n', stderr);
+
+  /* Print each piped command as a separate line.  */
+  for (i = 0; i < n_commands; i++)
+    {
+      const char *const *j;
+
+      if (verbose_only_flag)
+	{
+	  for (j = commands[i].argv; *j; j++)
 	    {
-	      for (j = commands[i].argv; *j; j++)
+	      const char *p;
+	      for (p = *j; *p; ++p)
+		if (!ISALNUM ((unsigned char) *p)
+		    && *p != '_' && *p != '/' && *p != '-' && *p != '.')
+		  break;
+	      if (*p || !*j)
 		{
-		  const char *p;
+		  fprintf (stderr, " \"");
 		  for (p = *j; *p; ++p)
-		    if (!ISALNUM ((unsigned char) *p)
-			&& *p != '_' && *p != '/' && *p != '-' && *p != '.')
-		      break;
-		  if (*p || !*j)
 		    {
-		      fprintf (stderr, " \"");
-		      for (p = *j; *p; ++p)
-			{
-			  if (*p == '"' || *p == '\\' || *p == '$')
-			    fputc ('\\', stderr);
-			  fputc (*p, stderr);
-			}
-		      fputc ('"', stderr);
+		      if (*p == '"' || *p == '\\' || *p == '$')
+			fputc ('\\', stderr);
+		      fputc (*p, stderr);
 		    }
-		  /* If it's empty, print "".  */
-		  else if (!**j)
-		    fprintf (stderr, " \"\"");
-		  else
-		    fprintf (stderr, " %s", *j);
-		}
-	    }
-	  else
-	    for (j = commands[i].argv; *j; j++)
+		  fputc ('"', stderr);
+		}
 	      /* If it's empty, print "".  */
-	      if (!**j)
+	      else if (!**j)
 		fprintf (stderr, " \"\"");
 	      else
 		fprintf (stderr, " %s", *j);
-
-	  /* Print a pipe symbol after all but the last command.  */
-	  if (i + 1 != n_commands)
-	    fprintf (stderr, " |");
-	  fprintf (stderr, "\n");
+	    }
 	}
-      fflush (stderr);
-      if (verbose_only_flag != 0)
-        {
-	  /* verbose_only_flag should act as if the spec was
-	     executed, so increment execution_count before
-	     returning.  This prevents spurious warnings about
-	     unused linker input files, etc.  */
-	  execution_count++;
-	  return 0;
-        }
+      else
+	for (j = commands[i].argv; *j; j++)
+	  /* If it's empty, print "".  */
+	  if (!**j)
+	    fprintf (stderr, " \"\"");
+	  else
+	    fprintf (stderr, " %s", *j);
+
+      /* Print a pipe symbol after all but the last command.  */
+      if (i + 1 != n_commands)
+	fprintf (stderr, " |");
+      fprintf (stderr, "\n");
+    }
+  fflush (stderr);
+  if (verbose_only_flag != 0)
+    {
+      /* verbose_only_flag should act as if the spec was
+	 executed, so increment execution_count before
+	 returning.  This prevents spurious warnings about
+	 unused linker input files, etc.  */
+      execution_count++;
+      return 1;
+    }
 #ifdef DEBUG
-      fnotice (stderr, "\nGo ahead? (y or n) ");
-      fflush (stderr);
-      i = getchar ();
-      if (i != '\n')
-	while (getchar () != '\n')
-	  ;
-
-      if (i != 'y' && i != 'Y')
-	return 0;
+  fnotice (stderr, "\nGo ahead? (y or n) ");
+  fflush (stderr);
+  i = getchar ();
+  if (i != '\n')
+    while (getchar () != '\n')
+      ;
+
+  if (i != 'y' && i != 'Y')
+    return 1;
 #endif /* DEBUG */
-    }
+
+  return 0;
+}
 
 #ifdef ENABLE_VALGRIND_CHECKING
+
+/* Append valgrind to each program.  */
+
+static void
+append_valgrind (struct obstack *to_be_released,
+		 int n_commands, struct command commands[])
+{
+  int i;
+
   /* Run the each command through valgrind.  To simplify prepending the
      path to valgrind and the option "-q" (for quiet operation unless
      something triggers), we allocate a separate argv array.  */
@@ -3221,7 +3656,7 @@ execute (void)
       for (argc = 0; commands[i].argv[argc] != NULL; argc++)
 	;
 
-      argv = XALLOCAVEC (const char *, argc + 3);
+      argv = XOBNEWVEC (to_be_released, const char *, argc + 3);
 
       argv[0] = VALGRIND_PATH;
       argv[1] = "-q";
@@ -3232,15 +3667,16 @@ execute (void)
       commands[i].argv = argv;
       commands[i].prog = argv[0];
     }
+}
 #endif
 
-  /* Run each piped subprocess.  */
+/* Launch a list of commands asynchronously.  */
 
-  pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
-				   ? PEX_RECORD_TIMES : 0),
-		  progname, temp_filename);
-  if (pex == NULL)
-    fatal_error (input_location, "%<pex_init%> failed: %m");
+static void
+async_launch_commands (struct pex_obj *pex,
+		       int n_commands, struct command commands[])
+{
+  int i;
 
   for (i = 0; i < n_commands; i++)
     {
@@ -3267,151 +3703,341 @@ execute (void)
     }
 
   execution_count++;
+}
 
-  /* Wait for all the subprocesses to finish.  */
 
-  {
-    int *statuses;
-    struct pex_time *times = NULL;
-    int ret_code = 0;
+/* Wait for all the subprocesses to finish.  Return 0 on success, -1 on
+   failure.  */
 
-    statuses = (int *) alloca (n_commands * sizeof (int));
-    if (!pex_get_status (pex, n_commands, statuses))
-      fatal_error (input_location, "failed to get exit status: %m");
+static int
+await_commands_to_finish (struct pex_obj *pex,
+			  int n_commands, struct command commands[])
+{
 
-    if (report_times || report_times_to_file)
-      {
-	times = (struct pex_time *) alloca (n_commands * sizeof (struct pex_time));
-	if (!pex_get_times (pex, n_commands, times))
-	  fatal_error (input_location, "failed to get process times: %m");
-      }
+  int *statuses;
+  struct pex_time *times = NULL;
+  int ret_code = 0, i;
 
-    pex_free (pex);
+  statuses = (int *) alloca (n_commands * sizeof (int));
+  if (!pex_get_status (pex, n_commands, statuses))
+    fatal_error (input_location, "failed to get exit status: %m");
 
-    for (i = 0; i < n_commands; ++i)
-      {
-	int status = statuses[i];
+  if (report_times || report_times_to_file)
+    {
+      times = (struct pex_time *) alloca (n_commands * sizeof (*times));
+      if (!pex_get_times (pex, n_commands, times))
+	fatal_error (input_location, "failed to get process times: %m");
+    }
 
-	if (WIFSIGNALED (status))
-	  switch (WTERMSIG (status))
-	    {
-	    case SIGINT:
-	    case SIGTERM:
-	      /* SIGQUIT and SIGKILL are not available on MinGW.  */
+  for (i = 0; i < n_commands; ++i)
+    {
+      int status = statuses[i];
+
+      if (WIFSIGNALED (status))
+	switch (WTERMSIG (status))
+	  {
+	  case SIGINT:
+	  case SIGTERM:
+	    /* SIGQUIT and SIGKILL are not available on MinGW.  */
 #ifdef SIGQUIT
-	    case SIGQUIT:
+	  case SIGQUIT:
 #endif
 #ifdef SIGKILL
-	    case SIGKILL:
+	  case SIGKILL:
 #endif
-	      /* The user (or environment) did something to the
-		 inferior.  Making this an ICE confuses the user into
-		 thinking there's a compiler bug.  Much more likely is
-		 the user or OOM killer nuked it.  */
-	      fatal_error (input_location,
-			   "%s signal terminated program %s",
-			   strsignal (WTERMSIG (status)),
-			   commands[i].prog);
-	      break;
+	    /* The user (or environment) did something to the
+	       inferior.  Making this an ICE confuses the user into
+	       thinking there's a compiler bug.  Much more likely is
+	       the user or OOM killer nuked it.  */
+	    fatal_error (input_location,
+			 "%s signal terminated program %s",
+			 strsignal (WTERMSIG (status)),
+			 commands[i].prog);
+	    break;
 
 #ifdef SIGPIPE
-	    case SIGPIPE:
-	      /* SIGPIPE is a special case.  It happens in -pipe mode
-		 when the compiler dies before the preprocessor is
-		 done, or the assembler dies before the compiler is
-		 done.  There's generally been an error already, and
-		 this is just fallout.  So don't generate another
-		 error unless we would otherwise have succeeded.  */
-	      if (signal_count || greatest_status >= MIN_FATAL_STATUS)
-		{
-		  signal_count++;
-		  ret_code = -1;
-		  break;
-		}
+	  case SIGPIPE:
+	    /* SIGPIPE is a special case.  It happens in -pipe mode
+	       when the compiler dies before the preprocessor is
+	       done, or the assembler dies before the compiler is
+	       done.  There's generally been an error already, and
+	       this is just fallout.  So don't generate another
+	       error unless we would otherwise have succeeded.  */
+	    if (signal_count || greatest_status >= MIN_FATAL_STATUS)
+	      {
+		signal_count++;
+		ret_code = -1;
+		break;
+	      }
 #endif
-	      /* FALLTHROUGH */
+	    /* FALLTHROUGH.  */
 
-	    default:
-	      /* The inferior failed to catch the signal.  */
-	      internal_error_no_backtrace ("%s signal terminated program %s",
-					   strsignal (WTERMSIG (status)),
-					   commands[i].prog);
-	    }
-	else if (WIFEXITED (status)
-		 && WEXITSTATUS (status) >= MIN_FATAL_STATUS)
-	  {
-	    /* For ICEs in cc1, cc1obj, cc1plus see if it is
-	       reproducible or not.  */
-	    const char *p;
-	    if (flag_report_bug
-		&& WEXITSTATUS (status) == ICE_EXIT_CODE
-		&& i == 0
-		&& (p = strrchr (commands[0].argv[0], DIR_SEPARATOR))
-		&& ! strncmp (p + 1, "cc1", 3))
-	      try_generate_repro (commands[0].argv);
-	    if (WEXITSTATUS (status) > greatest_status)
-	      greatest_status = WEXITSTATUS (status);
-	    ret_code = -1;
+	  default:
+	    /* The inferior failed to catch the signal.  */
+	    internal_error_no_backtrace ("%s signal terminated program %s",
+					 strsignal (WTERMSIG (status)),
+					 commands[i].prog);
 	  }
+      else if (WIFEXITED (status)
+	       && WEXITSTATUS (status) >= MIN_FATAL_STATUS)
+	{
+	  /* For ICEs in cc1, cc1obj, cc1plus see if it is
+	     reproducible or not.  */
+	  const char *p;
+	  if (flag_report_bug
+	      && WEXITSTATUS (status) == ICE_EXIT_CODE
+	      && i == 0
+	      && (p = strrchr (commands[0].argv[0], DIR_SEPARATOR))
+	      && ! strncmp (p + 1, "cc1", 3))
+	    try_generate_repro (commands[0].argv);
+	  if (WEXITSTATUS (status) > greatest_status)
+	    greatest_status = WEXITSTATUS (status);
+	  ret_code = -1;
+	}
 
-	if (report_times || report_times_to_file)
-	  {
-	    struct pex_time *pt = &times[i];
-	    double ut, st;
+      if (report_times || report_times_to_file)
+	{
+	  struct pex_time *pt = &times[i];
+	  double ut, st;
 
-	    ut = ((double) pt->user_seconds
-		  + (double) pt->user_microseconds / 1.0e6);
-	    st = ((double) pt->system_seconds
-		  + (double) pt->system_microseconds / 1.0e6);
+	  ut = ((double) pt->user_seconds
+		+ (double) pt->user_microseconds / 1.0e6);
+	  st = ((double) pt->system_seconds
+		+ (double) pt->system_microseconds / 1.0e6);
 
-	    if (ut + st != 0)
-	      {
-		if (report_times)
-		  fnotice (stderr, "# %s %.2f %.2f\n",
-			   commands[i].prog, ut, st);
+	  if (ut + st != 0)
+	    {
+	      if (report_times)
+		fnotice (stderr, "# %s %.2f %.2f\n",
+			 commands[i].prog, ut, st);
 
-		if (report_times_to_file)
-		  {
-		    int c = 0;
-		    const char *const *j;
+	      if (report_times_to_file)
+		{
+		  int c = 0;
+		  const char *const *j;
 
-		    fprintf (report_times_to_file, "%g %g", ut, st);
+		  fprintf (report_times_to_file, "%g %g", ut, st);
 
-		    for (j = &commands[i].prog; *j; j = &commands[i].argv[++c])
-		      {
-			const char *p;
-			for (p = *j; *p; ++p)
-			  if (*p == '"' || *p == '\\' || *p == '$'
-			      || ISSPACE (*p))
-			    break;
+		  for (j = &commands[i].prog; *j; j = &commands[i].argv[++c])
+		    {
+		      const char *p;
+		      for (p = *j; *p; ++p)
+			if (*p == '"' || *p == '\\' || *p == '$'
+			    || ISSPACE (*p))
+			  break;
 
-			if (*p)
-			  {
-			    fprintf (report_times_to_file, " \"");
-			    for (p = *j; *p; ++p)
-			      {
-				if (*p == '"' || *p == '\\' || *p == '$')
-				  fputc ('\\', report_times_to_file);
-				fputc (*p, report_times_to_file);
-			      }
-			    fputc ('"', report_times_to_file);
-			  }
-			else
-			  fprintf (report_times_to_file, " %s", *j);
-		      }
+		      if (*p)
+			{
+			  fprintf (report_times_to_file, " \"");
+			  for (p = *j; *p; ++p)
+			    {
+			      if (*p == '"' || *p == '\\' || *p == '$')
+				fputc ('\\', report_times_to_file);
+			      fputc (*p, report_times_to_file);
+			    }
+			  fputc ('"', report_times_to_file);
+			}
+		      else
+			fprintf (report_times_to_file, " %s", *j);
+		    }
 
-		    fputc ('\n', report_times_to_file);
-		  }
-	      }
-	  }
+		  fputc ('\n', report_times_to_file);
+		}
+	    }
+	}
+    }
+
+  return ret_code;
+}
+
+/* Split a single command with pipes into several commands.  */
+
+static void
+split_commands (vec<const_char_p> *argbuf_p,
+		int n_commands, struct command commands[])
+{
+  int i;
+  const char *arg;
+  vec<const_char_p> &argbuf = *argbuf_p;
+
+  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
+    if (arg && strcmp (arg, "|") == 0)
+      {				/* each command.  */
+	const char *string;
+#if defined (__MSDOS__) || defined (OS2) || defined (VMS)
+	fatal_error (input_location, "%<-pipe%> not supported");
+#endif
+	argbuf[i] = 0; /* Termination of command args.  */
+	commands[n_commands].prog = argbuf[i + 1];
+	commands[n_commands].argv
+	  = &(argbuf.address ())[i + 1];
+	string = find_a_file (&exec_prefixes, commands[n_commands].prog,
+			      X_OK, false);
+	if (string)
+	  commands[n_commands].argv[0] = string;
+	n_commands++;
       }
+}
+
+struct command *
+parse_argbuf (vec <const_char_p> *argbuf_p, int *n)
+{
+  int i, n_commands;
+  vec<const_char_p> &argbuf = *argbuf_p;
+  const char *arg;
+  struct command *commands;
 
-   if (commands[0].argv[0] != commands[0].prog)
-     free (CONST_CAST (char *, commands[0].argv[0]));
+  /* Count # of piped commands.  */
+  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
+    if (strcmp (arg, "|") == 0)
+      n_commands++;
 
-    return ret_code;
-  }
+  /* Get storage for each command.  */
+  commands = XNEWVEC (struct command, n_commands);
+
+  /* Split argbuf into its separate piped processes,
+     and record info about each one.
+     Also search for the programs that are to be run.  */
+
+  argbuf.safe_push (0);
+
+  commands[0].prog = argbuf[0]; /* first command.  */
+  commands[0].argv = argbuf.address ();
+
+  split_commands (argbuf_p, n_commands, commands);
+
+  *n = n_commands;
+  return commands;
+}
+
+/* Execute the command specified by the arguments on the current line of spec.
+   When using pipes, this includes several piped-together commands
+   with `|' between them.
+
+   Return 0 if successful, -1 if failed.  */
+
+static int
+execute (void)
+{
+  struct pex_obj *pex;
+  struct command *commands;	 /* each command buffer with program to call
+				    and arguments.  */
+  int n_commands;		 /* # of commands.  */
+  int ret = 0;
+
+  struct command additional_ld = {NULL, NULL};
+  extra_arg_storer storer;
+
+  struct command *commands_batch;
+  int n;
+
+  gcc_assert (!processing_spec_function);
+
+  if (wrapper_string)
+    {
+      char *string = find_a_file (&exec_prefixes, argbuf[0], X_OK, false);
+      if (string)
+	argbuf[0] = string;
+      insert_wrapper (wrapper_string);
+    }
+
+  /* Parse the argbuf into several commands.  */
+  commands = parse_argbuf (&argbuf, &n_commands);
+
+  if (!have_S && !have_E && flag_parallel_jobs)
+    append_split_outputs (&storer, &additional_ld, &commands, &n_commands);
+
+  if (!wrapper_string)
+    {
+      char *string = find_a_file (&exec_prefixes, commands[0].prog,
+				  X_OK, false);
+      if (string)
+	commands[0].argv[0] = string;
+    }
+
+  /* If -v, print what we are about to do, and maybe query.  */
+
+  if (verbose_flag)
+    {
+      int ret_verbose = handle_verbose (n_commands, commands);
+      if (ret_verbose > 0)
+	{
+	  ret = 0;
+	  goto cleanup;
+	}
+    }
+
+#ifdef ENABLE_VALGRIND_CHECKING
+  /* Stack of strings to be released on function return.  */
+  struct obstack to_be_released;
+  obstack_init (&to_be_released);
+  append_valgrind (&to_be_released, n_commands, commands);
+#endif
+
+  /* FIXME: Interact with GNU Jobserver if necessary.  */
+
+  commands_batch = commands;
+  n = flag_parallel_jobs ? 1 : n_commands;
+
+  for (int i = 0; i < n_commands; i += n)
+    {
+      /* Run each piped subprocess.  */
+
+      pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
+				       ? PEX_RECORD_TIMES : 0),
+		      progname, temp_filename);
+      if (pex == NULL)
+	fatal_error (input_location, "%<pex_init%> failed: %m");
+
+      /* Launch the commands.  */
+      async_launch_commands (pex, n, commands_batch);
+
+      /* Wait for them to finish.  */
+      ret |= await_commands_to_finish (pex, n, commands_batch);
+
+      commands_batch = commands_batch + n;
+
+      /* Cleanup.  */
+      pex_free (pex);
+    }
+
+
+  if (ret != 0)
+    goto cleanup;
+
+  /* Run extra ld call.  */
+  if (!EMPTY_CMD (additional_ld))
+    {
+      /* If we are here, we must be sure that we had at least two object
+	 files to link.  */
+      //gcc_assert (n_commands != 1);
+
+      pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
+				       ? PEX_RECORD_TIMES : 0),
+		      progname, temp_filename);
+
+      if (verbose_flag)
+	print_command (&additional_ld);
+
+      async_launch_commands (pex, 1, &additional_ld);
+      ret = await_commands_to_finish (pex, 1, &additional_ld);
+      pex_free (pex);
+    }
+
+
+#ifdef ENABLE_VALGRIND_CHECKING
+  obstack_free (&to_be_released, NULL);
+#endif
+
+cleanup:
+  if (commands[0].argv[0] != commands[0].prog)
+    free (CONST_CAST (char *, commands[0].argv[0]));
+
+  free (commands);
+
+  return ret;
 }
+
 \f
 /* Find all the switches given to us
    and make a vector describing them.
@@ -3480,29 +4106,33 @@ static int n_switches_alloc_debug_check[2];
 
 static char *debug_check_temp_file[2];
 
-/* Language is one of three things:
-
-   1) The name of a real programming language.
-   2) NULL, indicating that no one has figured out
-   what it is yet.
-   3) '*', indicating that the file should be passed
-   to the linker.  */
-struct infile
+static const char *
+fsplit_arg (extra_arg_storer *storer)
 {
-  const char *name;
-  const char *language;
-  struct compiler *incompiler;
-  bool compiled;
-  bool preprocessed;
-};
+  const char *tempname = make_temp_file ("additional-asm");
+  const char arg[] = "-fsplit-outputs=";
+  char *final;
 
-/* Also a vector of input files specified.  */
+  size_t n = ARRAY_SIZE (arg) + strlen (tempname);
 
-static struct infile *infiles;
+  gcc_assert (current_infile);
 
-int n_infiles;
+  current_infile->temp_additional_asm = tempname;
+
+  /* Remove the file for now; we may not need it, and it is created later.  */
+  /* FIXME: This is a little hackish.  */
+  remove (tempname);
+
+  final = storer->create_string (n);
+
+  strcpy (final, arg);
+  strcat (final, tempname);
+
+  record_temp_file (tempname, true, true);
+
+  return final;
+}
 
-static int n_infiles_alloc;
 
 /* True if undefined environment variables encountered during spec processing
    are ok to ignore, typically when we're running for --help or --version.  */
@@ -3683,6 +4313,8 @@ alloc_infile (void)
     {
       n_infiles_alloc = 16;
       infiles = XNEWVEC (struct infile, n_infiles_alloc);
+      memset (infiles, 0x00, sizeof (*infiles) * n_infiles_alloc);
+
     }
   else if (n_infiles_alloc == n_infiles)
     {
@@ -4648,6 +5280,9 @@ process_command (unsigned int decoded_options_count,
       switch (decoded_options[j].opt_index)
 	{
 	case OPT_S:
+	  have_S = 1;
+	  have_c = 1;
+	  break;
 	case OPT_c:
 	case OPT_E:
 	  have_c = 1;
@@ -6155,11 +6790,14 @@ do_spec_1 (const char *spec, int inswitch, const char *soft_matched_part)
 		  open_at_file ();
 
 		for (i = 0; (int) i < n_infiles; i++)
-		  if (compile_input_file_p (&infiles[i]))
-		    {
-		      store_arg (infiles[i].name, 0, 0);
-		      infiles[i].compiled = true;
-		    }
+		  {
+		    current_infile = &infiles[i];
+		    if (compile_input_file_p (current_infile))
+		      {
+			store_arg (current_infile->name, 0, 0);
+			current_infile->compiled = true;
+		      }
+		  }
 
 		if (at_file_supplied)
 		  close_at_file ();
@@ -6515,7 +7153,7 @@ do_spec_1 (const char *spec, int inswitch, const char *soft_matched_part)
 		     "%{foo=*:bar%*}%{foo=*:one%*two}"
 
 		   matches -foo=hello then it will produce:
-		   
+
 		     barhello onehellotwo
 		*/
 		if (*p == 0 || *p == '}')
@@ -8642,6 +9280,7 @@ driver::do_spec_on_infiles () const
   for (i = 0; (int) i < n_infiles; i++)
     {
       int this_file_error = 0;
+      current_infile = &infiles[i];
 
       /* Tell do_spec what to substitute for %i.  */
 
@@ -8761,12 +9400,15 @@ driver::do_spec_on_infiles () const
       int i;
 
       for (i = 0; i < n_infiles ; i++)
-	if (infiles[i].incompiler
-	    || (infiles[i].language && infiles[i].language[0] != '*'))
-	  {
-	    set_input (infiles[i].name);
-	    break;
-	  }
+	{
+	  current_infile = &infiles[i];
+	  if (infiles[i].incompiler
+	      || (infiles[i].language && infiles[i].language[0] != '*'))
+	    {
+	      set_input (infiles[i].name);
+	      break;
+	    }
+	}
     }
 
   if (!seen_error ())
@@ -8788,11 +9430,31 @@ driver::maybe_run_linker (const char *argv0) const
   int linker_was_run = 0;
   int num_linker_inputs;
 
-  /* Determine if there are any linker input files.  */
-  num_linker_inputs = 0;
-  for (i = 0; (int) i < n_infiles; i++)
-    if (explicit_link_files[i] || outfiles[i] != NULL)
-      num_linker_inputs++;
+  /* Set outfiles to be the temporary object vector.  */
+  const char **outfiles_holder = outfiles;
+  int n_infiles_holder = n_infiles;
+  bool outfiles_switched = false;
+  if (temp_object_files.length () > 0)
+    {
+      /* Insert explicit link files into the temp object vector.  */
+      for (i = 0; (int) i < n_infiles; i++)
+	if (explicit_link_files[i] && outfiles[i] != NULL)
+	  temp_object_files.safe_push (outfiles[i]);
+
+      num_linker_inputs = n_infiles = temp_object_files.length ();
+      temp_object_files.safe_push (NULL); /* the NULL sentinel.  */
+      outfiles = temp_object_files.address ();
+      outfiles_switched = true;
+    }
+  else /* Fall back to the old method.  */
+    {
+
+      /* Determine if there are any linker input files.  */
+      num_linker_inputs = 0;
+      for (i = 0; (int) i < n_infiles; i++)
+	if (explicit_link_files[i] || outfiles[i] != NULL)
+	  num_linker_inputs++;
+    }
 
   /* Arrange for temporary file names created during linking to take
      on names related with the linker output rather than with the
@@ -8897,14 +9559,24 @@ driver::maybe_run_linker (const char *argv0) const
     }
 
   /* If options said don't run linker,
-     complain about input files to be given to the linker.  */
+     complain about input files to be given to the linker.
+     When -fsplit-outputs is in use, the linker always runs, so this
+     warning is never triggered.  */
 
-  if (! linker_was_run && !seen_error ())
+  if (!outfiles_switched && !linker_was_run && !seen_error ()
+      && temp_object_files.length () == 0)
     for (i = 0; (int) i < n_infiles; i++)
       if (explicit_link_files[i]
 	  && !(infiles[i].language && infiles[i].language[0] == '*'))
 	warning (0, "%s: linker input file unused because linking not done",
 		 outfiles[i]);
+
+  if (outfiles_switched)
+    {
+      /* Undo our changes.  */
+      outfiles = outfiles_holder;
+      n_infiles = n_infiles_holder;
+    }
 }
 
 /* The end of "main".  */
@@ -10808,6 +11480,7 @@ driver::finalize ()
   linker_options.truncate (0);
   assembler_options.truncate (0);
   preprocessor_options.truncate (0);
+  temp_object_files.truncate (0);
 
   path_prefix_reset (&exec_prefixes);
   path_prefix_reset (&startfile_prefixes);
diff --git a/gcc/testsuite/driver/a.c b/gcc/testsuite/driver/a.c
new file mode 100644
index 00000000000..c6b8c2eb61e
--- /dev/null
+++ b/gcc/testsuite/driver/a.c
@@ -0,0 +1,6 @@
+int puts (const char *);
+
+void a_func (void)
+{
+  puts ("A test");
+}
diff --git a/gcc/testsuite/driver/b.c b/gcc/testsuite/driver/b.c
new file mode 100644
index 00000000000..76a2cba0bd9
--- /dev/null
+++ b/gcc/testsuite/driver/b.c
@@ -0,0 +1,6 @@
+int puts (const char *);
+
+void a_func (void)
+{
+  puts ("Another test");
+}
diff --git a/gcc/testsuite/driver/driver.exp b/gcc/testsuite/driver/driver.exp
new file mode 100644
index 00000000000..2bbaf07778a
--- /dev/null
+++ b/gcc/testsuite/driver/driver.exp
@@ -0,0 +1,80 @@
+#   Copyright (C) 2008-2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+proc check-for-errors { test input } {
+    if { [string equal "$input" ""] } then {
+	pass "$test: std out"
+    } else {
+	fail "$test: std out\n$input"
+    }
+}
+
+if ![check_effective_target_pthread] {
+  return
+}
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+
+# Test multi-input compilation
+check-for-errors "Multi-input Compilation" \
+	[gcc_target_compile "$srcdir/$subdir/a.c $srcdir/$subdir/b.c -c" "" none ""]
+
+# Compile file and generate an assembler and object file
+check-for-errors "Object Generation" \
+	[gcc_target_compile "$srcdir/$subdir/a.c -c" "a.o" none ""]
+check-for-errors "Object Generation" \
+	[gcc_target_compile "$srcdir/$subdir/b.c -c" "b.o" none ""]
+check-for-errors "Assembler Generation" \
+	[gcc_target_compile "$srcdir/$subdir/a.c -S" "a.S" none ""]
+check-for-errors "Assembler Generation" \
+	[gcc_target_compile "$srcdir/$subdir/b.c -S" "b.S" none ""]
+
+# Empty file is a valid program
+check-for-errors "Empty Program" \
+	[gcc_target_compile "$srcdir/$subdir/empty.c -c" "empty.o" none ""]
+
+# Test object file passthrough
+check-for-errors "Object file passthrough" \
+	[gcc_target_compile "$srcdir/$subdir/foo.c a.o" "a.exe" none ""]
+
+# Test compilation when assembler is provided
+check-for-errors "Assembler with Macros" \
+	[gcc_target_compile "a.S -c" "a.o" none ""]
+
+# Clean temporary generated files.
+set temp_files {"a.o" "a.S" "b.o" "b.S" "empty.o"}
+
+foreach f $temp_files {
+	if { [file exists $f] } {
+		file delete $f
+	}
+}
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/driver/empty.c b/gcc/testsuite/driver/empty.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/gcc/testsuite/driver/foo.c b/gcc/testsuite/driver/foo.c
new file mode 100644
index 00000000000..a18fd2a3b14
--- /dev/null
+++ b/gcc/testsuite/driver/foo.c
@@ -0,0 +1,7 @@
+void a_func (void);
+
+int main()
+{
+  a_func ();
+  return 0;
+}
-- 
2.28.0



* [PATCH 2/6] Implement a new partitioner for parallel compilation
  2020-08-20 22:00 [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine Giuliano Belinassi
  2020-08-20 22:00 ` [PATCH 1/6] Modify gcc driver for parallel compilation Giuliano Belinassi
@ 2020-08-20 22:00 ` Giuliano Belinassi
  2020-08-27 15:18   ` Jan Hubicka
  2020-08-31  9:25   ` Richard Biener
  2020-08-20 22:00 ` [PATCH 3/6] Implement fork-based parallelism engine Giuliano Belinassi
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 31+ messages in thread
From: Giuliano Belinassi @ 2020-08-20 22:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, hubicka

When using the LTO infrastructure to compile files in parallel, we
can't simply use any of the existing LTO partitioners, since extra
dependency analysis is required to ensure that some nodes are
partitioned together.

Therefore, here we implement a new partitioner called
"lto_merge_comdat_map" that performs all of the required analysis.
The partitioner works as follows:

1. We create a number of disjoint sets and insert each node into its
   own set; sets may be merged together later.

2. Find COMDAT groups, and mark them to be partitioned together.

3. Find all nodes that would require a COMDAT group to be copied into
   their partition (which we call the "COMDAT frontier"), and mark
   them to be partitioned together with that group.  This avoids
   duplicating COMDAT groups and prevents crashes in the LTO
   partitioning infrastructure.

4. Check whether the user allows the partitioner to promote non-public
   functions or variables to globals, which increases the opportunity
   for parallelization at the cost of changing the output code layout.

5. Balance the generated partitions for performance, unless told not
   to.
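To make steps 1 to 3 concrete, here is a minimal C++ sketch on a toy
symbol table. The `ToySymbol` type and the `group_symbols`/`count_sets`
helpers are hypothetical and are not GCC's `symtab_node` API; the sketch
also ignores aliases, thunks and SYMBOL_DUPLICATE nodes, which the real
partitioner has to follow as well.

```cpp
#include <cassert>
#include <numeric>
#include <set>
#include <vector>

// Hypothetical toy model of a symbol table entry; not GCC's symtab_node.
struct ToySymbol {
  int comdat_group;       // -1 when the symbol is not in a COMDAT group
  std::vector<int> refs;  // indices of symbols this one calls or references
};

// Minimal disjoint-set helpers (path halving only) for this sketch.
static int find_root(std::vector<int> &parent, int x) {
  while (parent[x] != x) {
    parent[x] = parent[parent[x]];
    x = parent[x];
  }
  return x;
}

static void unite(std::vector<int> &parent, int a, int b) {
  parent[find_root(parent, a)] = find_root(parent, b);
}

// Steps 1-3: returns, for each symbol, the representative of its set.
std::vector<int> group_symbols(const std::vector<ToySymbol> &syms) {
  int n = static_cast<int>(syms.size());
  std::vector<int> parent(n);
  std::iota(parent.begin(), parent.end(), 0);  // Step 1: singleton sets.

  // Step 2: all members of one COMDAT group go into the same set.
  std::vector<int> first_in_group;  // group id -> first member seen
  for (int i = 0; i < n; ++i) {
    int g = syms[i].comdat_group;
    if (g < 0) continue;
    if (g >= static_cast<int>(first_in_group.size()))
      first_in_group.resize(g + 1, -1);
    if (first_in_group[g] < 0)
      first_in_group[g] = i;
    else
      unite(parent, i, first_in_group[g]);
  }

  // Step 3: frontier symbols join the set of the group they reference.
  for (int i = 0; i < n; ++i)
    for (int target : syms[i].refs)
      if (syms[target].comdat_group >= 0)
        unite(parent, i, target);

  std::vector<int> root(n);
  for (int i = 0; i < n; ++i) root[i] = find_root(parent, i);
  return root;
}

// Number of distinct sets, i.e. candidate partitions.
int count_sets(const std::vector<int> &root) {
  return static_cast<int>(std::set<int>(root.begin(), root.end()).size());
}
```

Each resulting set is a candidate partition: a function that calls into a
COMDAT group lands in that group's set, while unrelated functions keep
their own sets.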

The choice in 1. was made by design, so that we could use a union-find
data structure, which is known for very fast set union operations.
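A union-find (disjoint-set forest) with union by rank and path halving
gives near-constant amortized cost per `find` or `unite`. The following is
a minimal standalone sketch of that structure, not the patch's
`union_find` class; the name `DisjointSets` is hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical disjoint-set forest; each element starts in its own set.
class DisjointSets {
 public:
  explicit DisjointSets(int n) : parent_(n), rank_(n, 0) {
    std::iota(parent_.begin(), parent_.end(), 0);
  }

  // Return the representative of x's set, halving the path on the way up.
  int find(int x) {
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]];
      x = parent_[x];
    }
    return x;
  }

  // Merge the sets containing x and y.  Returns false if already merged.
  bool unite(int x, int y) {
    int rx = find(x), ry = find(y);
    if (rx == ry) return false;
    // Union by rank: hang the shallower tree below the deeper one.
    if (rank_[rx] < rank_[ry]) std::swap(rx, ry);
    parent_[ry] = rx;
    if (rank_[rx] == rank_[ry]) ++rank_[rx];
    ++successful_unions_;
    return true;
  }

  int successful_unions() const { return successful_unions_; }

  // Number of disjoint sets = elements minus successful unions.
  int num_sets() const {
    return static_cast<int>(parent_.size()) - successful_unions_;
  }

 private:
  std::vector<int> parent_;
  std::vector<int> rank_;
  int successful_unions_ = 0;
};
```

Counting successful unions, as the patch's class also does, gives the
number of remaining sets without another pass over the elements.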

For 3. to work properly, we also had to modify
lto_promote_cross_file_statics to handle this case.
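The underlying rule is that a file-local (non-public) symbol referenced
from a different partition can no longer be resolved locally, so it must
either be promoted to a global symbol or kept in the same partition as its
users. A minimal sketch of detecting such symbols, assuming a hypothetical
`PartSymbol` record rather than GCC's real data structures:

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy symbol; not GCC's symtab_node.
struct PartSymbol {
  bool is_public;         // TREE_PUBLIC-like flag
  int partition;          // partition the symbol was assigned to
  std::vector<int> refs;  // symbols this one calls or references
};

// Return the indices of non-public symbols referenced from a different
// partition; these would need promotion to global symbols (or must be
// kept in the same partition as their users).
std::vector<int> statics_needing_promotion(const std::vector<PartSymbol> &syms) {
  std::vector<bool> needed(syms.size(), false);
  for (const PartSymbol &user : syms)
    for (int target : user.refs)
      if (!syms[target].is_public && syms[target].partition != user.partition)
        needed[target] = true;

  std::vector<int> result;
  for (int i = 0; i < static_cast<int>(syms.size()); ++i)
    if (needed[i])
      result.push_back(i);
  return result;
}
```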

The parameters --param=promote-statics and --param=balance-partitions
control 4. and 5., respectively.
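Step 5 can be done greedily, as `balance_partitions` in this patch does:
compute `target = total / (jobs + 1)` and fold each non-empty set into the
set currently being filled while the combined size stays below the target.
A minimal sketch over plain size estimates; the function name `balance`
and the `min_total` cutoff (standing in for `param_min_partition_size`)
are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Greedily merge consecutive sets while the merged size stays below
// target = total / (jobs + 1).  sizes[i] is the estimated size of set i
// (0 means empty).  Returns, for each set, the index of the set it was
// merged into (itself if not merged), or an empty vector when the total
// is below min_total and balancing is not worth it.
std::vector<int> balance(std::vector<int> sizes, int jobs, int min_total) {
  int total = 0;
  for (int s : sizes) total += s;
  if (total < min_total) return {};

  int target = total / (jobs + 1);
  int n = static_cast<int>(sizes.size());
  std::vector<int> merged_into(n);
  for (int j = 0; j < n; ++j) merged_into[j] = j;

  int current = -1;  // set currently being filled, -1 if none yet
  for (int j = 0; j < n; ++j) {
    if (sizes[j] == 0) continue;
    if (current == -1) {
      current = j;
    } else if (sizes[current] + sizes[j] < target) {
      sizes[current] += sizes[j];
      sizes[j] = 0;
      merged_into[j] = current;
    } else {
      current = j;  // Target reached: start filling a new set.
    }
  }
  return merged_into;
}
```

For example, sizes {10, 10, 10, 50, 10, 10} with jobs = 2 give a target
of 33, so the first three sets are merged, the large set stays alone, and
the last two are merged.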

gcc/ChangeLog:
2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>

	* Makefile.in: Add lto-partition.o.
	* cgraph.h (struct symtab_node::aux2): New variable.
	* lto-partition.c: Move from gcc/lto/lto-partition.c
	(add_symbol_to_partition_1): Only compute insn size
	if information is available.
	(node_cmp): Same as above.
	(class union_find): New.
	(ds_print_roots): New function.
	(balance_partitions): New function.
	(build_ltrans_partitions): New function.
	(merge_comdat_nodes): New function.
	(merge_static_calls): New function.
	(merge_contained_symbols): New function.
	(lto_merge_comdat_map): New function.
	(privatize_symbol_name_1): Handle when WPA is not enabled.
	(privatize_symbol_name): Same as above.
	(lto_promote_cross_file_statics): New parameter to select when
	to promote to global.
	(lto_check_usage_from_other_partitions): New function.
	* lto-partition.h: Move from gcc/lto/lto-partition.h
	(lto_promote_cross_file_statics): Update prototype.
	(lto_check_usage_from_other_partitions): Declare.
	(lto_merge_comdat_map): Declare.

gcc/lto/ChangeLog:
2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>

	* lto-partition.c: Move to gcc/lto-partition.c.
	* lto-partition.h: Move to gcc/lto-partition.h.
	* lto.c: Update call to lto_promote_cross_file_statics.
	* Make-lang.in: Remove lto-partition.o.
---
 gcc/Makefile.in               |   1 +
 gcc/cgraph.h                  |   1 +
 gcc/{lto => }/lto-partition.c | 463 +++++++++++++++++++++++++++++++++-
 gcc/{lto => }/lto-partition.h |   4 +-
 gcc/lto/Make-lang.in          |   4 +-
 gcc/lto/lto.c                 |   2 +-
 gcc/params.opt                |   8 +
 gcc/tree.c                    |  23 +-
 8 files changed, 489 insertions(+), 17 deletions(-)
 rename gcc/{lto => }/lto-partition.c (78%)
 rename gcc/{lto => }/lto-partition.h (89%)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 79e854aa938..be42b15f4ff 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1457,6 +1457,7 @@ OBJS = \
 	lra-spills.o \
 	lto-cgraph.o \
 	lto-streamer.o \
+	lto-partition.o \
 	lto-streamer-in.o \
 	lto-streamer-out.o \
 	lto-section-in.o \
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0211f08964f..b4a7871bd3d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -615,6 +615,7 @@ public:
   struct lto_file_decl_data * lto_file_data;
 
   PTR GTY ((skip)) aux;
+  int aux2;
 
   /* Comdat group the symbol is in.  Can be private if GGC allowed that.  */
   tree x_comdat_group;
diff --git a/gcc/lto/lto-partition.c b/gcc/lto-partition.c
similarity index 78%
rename from gcc/lto/lto-partition.c
rename to gcc/lto-partition.c
index 8e0488ab13e..ca962e69b5d 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto-partition.c
@@ -170,7 +170,11 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
     {
       struct cgraph_edge *e;
       if (!node->alias && c == SYMBOL_PARTITION)
-	part->insns += ipa_size_summaries->get (cnode)->size;
+	{
+	  /* FIXME: Find out why this is being returned NULL in some cases.  */
+	  if (ipa_size_summaries->get (cnode))
+	    part->insns += ipa_size_summaries->get (cnode)->size;
+	}
 
       /* Add all inline clones and callees that are duplicated.  */
       for (e = cnode->callees; e; e = e->next_callee)
@@ -372,6 +376,402 @@ lto_max_map (void)
     new_partition ("empty");
 }
 
+/* Class implementing a union-find algorithm.  */
+
+class union_find
+{
+public:
+
+  int *parent;
+  int *rank;
+  int n;
+  int successful_unions;
+
+  union_find (int num_nodes)
+    {
+      n = num_nodes;
+      parent = XNEWVEC (int, n);
+      rank   = XNEWVEC (int, n);
+
+      for (int i = 0; i < n; ++i)
+	parent[i] = i;
+
+      memset (rank, 0, n * sizeof (*rank));
+      successful_unions = 0;
+    }
+
+  ~union_find ()
+    {
+      free (parent);
+      free (rank);
+    }
+
+  int find (int x)
+    {
+      while (parent[x] != x)
+	{
+	  parent[x] = parent[parent[x]];
+	  x = parent[x];
+	}
+      return x;
+    }
+
+  void unite (int x, int y)
+    {
+      int x_root = find (x);
+      int y_root = find (y);
+
+      if (x_root == y_root) /* If x and y are in same set.  */
+	return;
+
+      successful_unions++;
+
+      if (rank[x_root] < rank[y_root]) /* Make x_root the higher-rank root.  */
+	{
+	  x_root ^= y_root; /* Swap.  */
+	  y_root ^= x_root;
+	  x_root ^= y_root;
+	}
+
+      parent[y_root] = x_root;
+      if (rank[x_root] == rank[y_root])
+	rank[x_root]++;
+    }
+
+  void print_roots ()
+    {
+      int i;
+      for (i = 0; i < n; ++i)
+	printf ("%d, ", find (i));
+      printf ("\n");
+    }
+};
+
+static union_find *ds;
+
+DEBUG_FUNCTION void ds_print_roots (void)
+{
+  ds->print_roots ();
+}
+
+static bool
+privatize_symbol_name (symtab_node *);
+
+static void
+promote_symbol (symtab_node *);
+
+/* Quickly balance partitions, trying to reach target_size in each of
+   them.  Returns true if something was done, or false if we decided
+   that it is not worth it.  */
+
+static bool
+balance_partitions (union_find *ds, int n, int jobs)
+{
+  int *sizes, i, j;
+  int total_size = 0, max_size = -1;
+  int target_size;
+  const int eps = 0;
+
+  symtab_node *node;
+
+  sizes = (int *) alloca (n * sizeof (*sizes));
+  memset (sizes, 0, n * sizeof (*sizes));
+ 
+  /* Compute costs.  */
+  i = 0;
+  FOR_EACH_SYMBOL (node)
+    {
+      int root = ds->find (i);
+
+      if (cgraph_node *cnode = dyn_cast<cgraph_node *> (node))
+	{
+	  ipa_size_summary *summary = ipa_size_summaries->get (cnode);
+	  if (summary)
+	    sizes[root] += summary->size;
+	  else
+	    sizes[root] += 10;
+	}
+      else
+	sizes[root] += 10;
+
+
+      i++;
+    }
+
+  /* Compute the total size and maximum size.  */
+  for (i = 0; i < n; ++i)
+    {
+      total_size += sizes[i];
+      max_size    = MAX (max_size, sizes[i]);
+    }
+
+  /* Quick return if total size is small.  */
+  if (total_size < param_min_partition_size)
+    return false;
+
+  target_size = total_size / (jobs + 1);
+
+  /* Unite small partitions.  */
+  for (i = -1, j = 0; j < n; ++j)
+    {
+      if (sizes[j] == 0)
+	continue;
+
+      if (i == -1)
+	i = j;
+      else
+	{
+	  if (sizes[i] + sizes[j] < target_size + eps)
+	    {
+	      ds->unite (i, j);
+	      sizes[i] += sizes[j];
+	      sizes[j] = 0;
+	    }
+	  else
+	    i = j;
+	}
+    }
+  return true;
+}
+
+/* Build the LTRANS partitions.  Return false if at most one is needed.  */
+
+static bool
+build_ltrans_partitions (union_find *ds, int n)
+{
+  int i, n_partitions;
+  symtab_node *node;
+
+  int *compression = (int *) alloca (n * sizeof (*compression));
+  for (i = 0; i < n; ++i)
+    compression[i] = -1; /* Invalid value.  */
+
+  i = 0, n_partitions = 0;
+  FOR_EACH_SYMBOL (node)
+    {
+      int root = ds->find (i);
+      node->aux2 = root;
+      node->aux = NULL;
+
+      if (node->get_partitioning_class () == SYMBOL_PARTITION
+	  && compression[root] < 0)
+	compression[root] = n_partitions++;
+      i++;
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "n_partitions = %d\n", n_partitions);
+
+  if (n_partitions <= 1)
+    return false;
+
+  /* Create LTRANS partitions.  */
+  ltrans_partitions.create (n_partitions);
+  for (i = 0; i < n_partitions; i++)
+    new_partition ("");
+
+  FOR_EACH_SYMBOL (node)
+    {
+      if (node->get_partitioning_class () != SYMBOL_PARTITION
+	  || symbol_partitioned_p (node))
+	  continue;
+
+      int p = compression[node->aux2];
+      if (dump_file)
+	fprintf (dump_file, "p = %d\t;; %s\n", p, node->dump_name ());
+      add_symbol_to_partition (ltrans_partitions[p], node);
+    }
+
+  return true;
+}
+
+/* Partition COMDAT groups together, and also bring together the nodes
+   that require them.  Nodes outside a COMDAT group that reference
+   nodes in that group are called the COMDAT frontier.  */
+
+static bool
+merge_comdat_nodes (symtab_node *node, int set)
+{
+  enum symbol_partitioning_class c = node->get_partitioning_class ();
+  bool ret = false;
+  symtab_node *node1;
+  cgraph_edge *e;
+
+  /* If node is already analysed, quickly return.  */
+  if (node->aux)
+    return false;
+
+  /* Mark as analysed.  */
+  node->aux = (void *) 1;
+
+
+  /* Agglomerate the COMDAT group into the same partition.  */
+  if (node->same_comdat_group)
+    {
+      for (node1 = node->same_comdat_group;
+	   node1 != node; node1 = node1->same_comdat_group)
+	if (!node->alias)
+	  {
+	    ds->unite (node1->aux2, set);
+	    merge_comdat_nodes (node1, set);
+	    ret = true;
+	  }
+    }
+
+  /* Look at nodes that can reach the COMDAT group, and agglomerate them
+     into the same partition.  These nodes are the "COMDAT frontier".  The
+     idea is that every unpartitioned node that reaches a COMDAT group MUST
+     go through the COMDAT frontier before reaching it.  Therefore, only
+     nodes in the frontier are exported.  */
+  if (node->same_comdat_group || c == SYMBOL_DUPLICATE)
+    {
+      int i;
+      struct ipa_ref *ref = NULL;
+
+      if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
+	{
+	  /* Add all inline clones and callees that are duplicated.  */
+	  for (e = cnode->callers; e; e = e->next_caller)
+	    if (!e->inline_failed || c == SYMBOL_DUPLICATE)
+	      {
+		ds->unite (set, e->caller->aux2);
+		merge_comdat_nodes (e->caller, set);
+		ret = true;
+	      }
+
+	  /* Add all thunks associated with the function.  */
+	  for (e = cnode->callees; e; e = e->next_callee)
+	    if (e->caller->thunk.thunk_p && !e->caller->inlined_to)
+	      {
+		ds->unite (set, e->callee->aux2);
+		merge_comdat_nodes (e->callee, set);
+		ret = true;
+	      }
+	}
+
+      for (i = 0; node->iterate_referring (i, ref); i++)
+	{
+	  symtab_node *node1 = ref->referring;
+	  ds->unite (node1->aux2, set);
+	  ret = true;
+
+	  if (node1->get_partitioning_class () == SYMBOL_DUPLICATE)
+	    merge_comdat_nodes (node1, set);
+	}
+    }
+
+  return ret;
+}
+
+/* Bring together static nodes that are called by static functions, so
+   promotion of statics to globals is not required.  This *MIGHT* negatively
+   impact the number of partitions, and even generate very unbalanced
+   partitions that can't be fixed.  */
+
+static bool
+merge_static_calls (symtab_node *node, int set)
+{
+  bool ret = false;
+  enum symbol_partitioning_class c = node->get_partitioning_class ();
+
+  if (node->aux)
+    return false;
+
+  node->aux = (void *) 1;
+
+
+  if (!TREE_PUBLIC (node->decl) || c == SYMBOL_DUPLICATE)
+    {
+      int i;
+      struct ipa_ref *ref = NULL;
+
+      if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
+	{
+	  for (cgraph_edge *e = cnode->callers; e; e = e->next_caller)
+	    {
+	      /* FIXME: In theory, inlined functions should be a criteria to not
+		 merge partitions.  */
+	      ds->unite (node->aux2, e->caller->aux2);
+	      merge_static_calls (e->caller, set);
+	      ret = true;
+	    }
+
+	}
+
+      for (i = 0; node->iterate_referring (i, ref); ++i)
+	{
+	  symtab_node *node1 = ref->referring;
+	  ds->unite (node1->aux2, set);
+	  merge_static_calls (node1, set);
+	  ret = true;
+	}
+    }
+
+  return ret;
+}
+
+static bool
+merge_contained_symbols (symtab_node *node, int set)
+{
+  bool ret = false;
+  symtab_node *node1;
+
+  while ((node1 = contained_in_symbol (node)) != node)
+    {
+      node = node1;
+      ds->unite (node->aux2, set);
+      ret = true;
+    }
+
+  return ret;
+}
+
+/* Partition the program into several partitions with the restriction that
+   COMDATs are partitioned together with all nodes requiring them.  If
+   promote_statics is false, we also partition together static functions
+   and nodes that call each other, so non-public functions are not promoted
+   to globals.  */
+
+void
+lto_merge_comdat_map (bool balance, bool promote_statics, int jobs)
+{
+  symtab_node *node;
+  int n = 0;
+
+  /* Initialize each node into its own distinct disjoint set.  */
+  FOR_EACH_SYMBOL (node)
+    node->aux2 = n++;
+
+  union_find disjoint_sets = union_find (n);
+  ds = &disjoint_sets;
+
+  /* First look at COMDATs.  */
+  FOR_EACH_SYMBOL (node)
+    {
+      if (node->same_comdat_group)
+	merge_comdat_nodes (node, node->aux2);
+      merge_contained_symbols (node, node->aux2);
+    }
+
+  FOR_EACH_SYMBOL (node)
+    node->aux = NULL;
+
+  /* Then look at STATICs, if needed.  */
+  if (!promote_statics)
+    FOR_EACH_SYMBOL (node)
+      if (!TREE_PUBLIC (node->decl))
+	merge_static_calls (node, node->aux2);
+
+  FOR_EACH_SYMBOL (node)
+    node->aux = NULL;
+
+  if (balance && !balance_partitions (&disjoint_sets, n, jobs))
+    return;
+
+  build_ltrans_partitions (&disjoint_sets, n);
+}
+
+
 /* Helper function for qsort; sort nodes by order.  */
 static int
 node_cmp (const void *pa, const void *pb)
@@ -931,7 +1331,7 @@ static hash_map<const char *, unsigned> *lto_clone_numbers;
    represented by DECL.  */
 
 static bool
-privatize_symbol_name_1 (symtab_node *node, tree decl)
+privatize_symbol_name_1 (symtab_node *node, tree decl, bool wpa)
 {
   const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
 
@@ -939,11 +1339,19 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
     return false;
 
   name = maybe_rewrite_identifier (name);
-  unsigned &clone_number = lto_clone_numbers->get_or_insert (name);
-  symtab->change_decl_assembler_name (decl,
-				      clone_function_name (
-					  name, "lto_priv", clone_number));
-  clone_number++;
+  if (wpa)
+    {
+      gcc_assert (lto_clone_numbers);
+
+      unsigned &clone_number = lto_clone_numbers->get_or_insert (name);
+      symtab->change_decl_assembler_name (decl,
+					  clone_function_name (
+					      name, "lto_priv", clone_number));
+      clone_number++;
+    }
+  else
+    symtab->change_decl_assembler_name (decl, get_file_function_name
+					(node->asm_name ()));
 
   if (node->lto_file_data)
     lto_record_renamed_decl (node->lto_file_data, name,
@@ -968,7 +1376,9 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
 static bool
 privatize_symbol_name (symtab_node *node)
 {
-  if (!privatize_symbol_name_1 (node, node->decl))
+  bool wpa = !split_outputs;
+
+  if (!privatize_symbol_name_1 (node, node->decl, wpa))
     return false;
 
   return true;
@@ -1117,7 +1527,7 @@ rename_statics (lto_symtab_encoder_t encoder, symtab_node *node)
    all inlinees are added.  */
 
 void
-lto_promote_cross_file_statics (void)
+lto_promote_cross_file_statics (bool promote)
 {
   unsigned i, n_sets;
 
@@ -1147,10 +1557,17 @@ lto_promote_cross_file_statics (void)
 	   lsei_next (&lsei))
         {
           symtab_node *node = lsei_node (lsei);
+	  cgraph_node *cnode = dyn_cast <cgraph_node *> (node);
 
 	  /* If symbol is static, rename it if its assembler name
 	     clashes with anything else in this unit.  */
 	  rename_statics (encoder, node);
+	  if (cnode)
+	    {
+	      bool in_partition = lsei.encoder->nodes[lsei.index].in_partition;
+	      if (!in_partition)
+		cnode->local = false;
+	    }
 
 	  /* No need to promote if symbol already is externally visible ... */
 	  if (node->externally_visible
@@ -1163,8 +1580,12 @@ lto_promote_cross_file_statics (void)
 	      validize_symbol_for_target (node);
 	      continue;
 	    }
-
-          promote_symbol (node);
+	  if (promote)
+	    {
+	      promote_symbol (node);
+	      if (cnode && split_outputs)
+		cnode->local = false;
+	    }
         }
     }
   delete lto_clone_numbers;
@@ -1186,3 +1607,23 @@ lto_promote_statics_nonwpa (void)
     }
   delete lto_clone_numbers;
 }
+
+/* Check whether each symbol is accessed across partitions.  If so, set
+   its used_from_other_partition flag.  */
+
+void
+lto_check_usage_from_other_partitions (void)
+{
+  unsigned int i, j;
+  for (i = 0; i < ltrans_partitions.length (); i++)
+    {
+      vec<lto_encoder_entry> &nodes = (ltrans_partitions[i])->encoder->nodes;
+
+      for (j = 0; j < nodes.length (); j++)
+	{
+	  symtab_node *node = nodes[j].node;
+	  if (node && !nodes[j].in_partition)
+	    node->used_from_other_partition = true;
+	}
+    }
+}
diff --git a/gcc/lto/lto-partition.h b/gcc/lto-partition.h
similarity index 89%
rename from gcc/lto/lto-partition.h
rename to gcc/lto-partition.h
index 42b5ea8c80c..4a1b17fa728 100644
--- a/gcc/lto/lto-partition.h
+++ b/gcc/lto-partition.h
@@ -36,6 +36,8 @@ extern vec<ltrans_partition> ltrans_partitions;
 void lto_1_to_1_map (void);
 void lto_max_map (void);
 void lto_balanced_map (int, int);
-void lto_promote_cross_file_statics (void);
+void lto_promote_cross_file_statics (bool promote);
 void free_ltrans_partitions (void);
 void lto_promote_statics_nonwpa (void);
+void lto_check_usage_from_other_partitions (void);
+void lto_merge_comdat_map (bool, bool, int);
diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in
index 0b73f9ef7bb..46b52cff183 100644
--- a/gcc/lto/Make-lang.in
+++ b/gcc/lto/Make-lang.in
@@ -24,9 +24,9 @@ LTO_EXE = lto1$(exeext)
 LTO_DUMP_EXE = lto-dump$(exeext)
 LTO_DUMP_INSTALL_NAME := $(shell echo lto-dump|sed '$(program_transform_name)')
 # The LTO-specific object files inclued in $(LTO_EXE).
-LTO_OBJS = lto/lto-lang.o lto/lto.o lto/lto-object.o attribs.o lto/lto-partition.o lto/lto-symtab.o lto/lto-common.o
+LTO_OBJS = lto/lto-lang.o lto/lto.o lto/lto-object.o attribs.o lto/lto-symtab.o lto/lto-common.o
 lto_OBJS = $(LTO_OBJS)
-LTO_DUMP_OBJS = lto/lto-lang.o lto/lto-object.o attribs.o lto/lto-partition.o lto/lto-symtab.o lto/lto-dump.o lto/lto-common.o
+LTO_DUMP_OBJS = lto/lto-lang.o lto/lto-object.o attribs.o lto/lto-symtab.o lto/lto-dump.o lto/lto-common.o
 lto_dump_OBJS = $(LTO_DUMP_OBJS)
 
 # this is only useful in a LTO bootstrap, but this does not work right
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 1c37814bde4..803b9920e35 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -515,7 +515,7 @@ do_whole_program_analysis (void)
   /* Find out statics that need to be promoted
      to globals with hidden visibility because they are accessed from multiple
      partitions.  */
-  lto_promote_cross_file_statics ();
+  lto_promote_cross_file_statics (true);
   if (dump_file)
      dump_end (partition_dump_id, dump_file);
   dump_file = NULL;
diff --git a/gcc/params.opt b/gcc/params.opt
index f39e5d1a012..00fc58cd5cc 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -366,6 +366,14 @@ Minimal size of a partition for LTO (in estimated instructions).
 Common Joined UInteger Var(param_lto_partitions) Init(128) IntegerRange(1, 65536) Param
 Number of partitions the program should be split to.
 
+-param=promote-statics=
+Common Joined UInteger Var(param_promote_statics) Init(0) IntegerRange(0, 1) Param
+Allow statics and non-public functions to be promoted as public when compiling in parallel.
+
+-param=balance-partitions=
+Common Joined UInteger Var(param_balance_partitions) Init(1) IntegerRange(0, 1) Param
+When compiling in parallel, try to balance the partitions for compilation performance.
+
 -param=max-average-unrolled-insns=
 Common Joined UInteger Var(param_max_average_unrolled_insns) Init(80) Param Optimization
 The maximum number of instructions to consider to unroll in a loop on average.
diff --git a/gcc/tree.c b/gcc/tree.c
index d0202c3f785..3ca162d5070 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -9595,6 +9595,24 @@ make_anon_name ()
   return id;
 }
 
+/* Filter the input name removing characters that may confuse the linker.  */
+
+static void
+filter_name (char *name)
+{
+  char *p = name;
+
+  while (*p != '\0')
+    {
+      switch (*p)
+	{
+	  case '*':
+	    *p = '_';
+	}
+      p++;
+    }
+}
+
 /* Generate a name for a special-purpose function.
    The generated name may need to be unique across the whole link.
    Changes to this function may also require corresponding changes to
@@ -9651,8 +9669,7 @@ get_file_function_name (const char *type)
       q = (char *) alloca (9 + 19 + len + 1);
       memcpy (q, file, len + 1);
 
-      snprintf (q + len, 9 + 19 + 1, "_%08X_" HOST_WIDE_INT_PRINT_HEX,
-		crc32_string (0, name), get_random_seed (false));
+      snprintf (q + len, 9 + 19 + 1, "_%08X", crc32_string (0, name));
 
       p = q;
     }
@@ -9665,7 +9682,9 @@ get_file_function_name (const char *type)
      Use a global object (which is already required to be unique over
      the program) rather than the file name (which imposes extra
      constraints).  */
+
   sprintf (buf, FILE_FUNCTION_FORMAT, type, p);
+  filter_name (buf);
 
   return get_identifier (buf);
 }
-- 
2.28.0
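
The partitioner in lto_merge_comdat_map is built on a disjoint-set (union-find) structure: every symbol starts in its own set (its aux2 index), COMDAT group members and their frontier are united, and the surviving set representatives are then mapped to dense partition numbers, which is roughly what the compression[] lookup in build_ltrans_partitions appears to do. The following is a minimal standalone sketch of that idea, assuming plain integer node indices and a fixed-size table; ds_init, ds_find, ds_unite and compress_partitions are hypothetical names, not GCC's union_find class.

```c
#include <assert.h>

#define MAX_NODES 64

/* Disjoint-set forest: parent[] links, union by size, path halving.  */
static int parent[MAX_NODES];
static int set_size[MAX_NODES];

/* Put each of the N nodes into its own singleton set.  */
static void
ds_init (int n)
{
  assert (n <= MAX_NODES);
  for (int i = 0; i < n; i++)
    {
      parent[i] = i;
      set_size[i] = 1;
    }
}

/* Return the representative of the set containing X.  */
static int
ds_find (int x)
{
  while (parent[x] != x)
    {
      parent[x] = parent[parent[x]];  /* Path halving.  */
      x = parent[x];
    }
  return x;
}

/* Merge the sets containing A and B: they share a partition from now on.  */
static void
ds_unite (int a, int b)
{
  a = ds_find (a);
  b = ds_find (b);
  if (a == b)
    return;
  if (set_size[a] < set_size[b])
    {
      int t = a;
      a = b;
      b = t;
    }
  parent[b] = a;
  set_size[a] += set_size[b];
}

/* Number the surviving sets 0..K-1 in order of first appearance, store
   each node's partition number in PART, and return K.  */
static int
compress_partitions (int n, int *part)
{
  int compression[MAX_NODES];
  int k = 0;

  for (int i = 0; i < n; i++)
    compression[i] = -1;
  for (int i = 0; i < n; i++)
    {
      int r = ds_find (i);
      if (compression[r] < 0)
        compression[r] = k++;
      part[i] = compression[r];
    }
  return k;
}
```

For example, with five nodes where 0 and 1 share a COMDAT group and node 2 references node 1 (the frontier), calling ds_unite (0, 1) and ds_unite (2, 1) leaves three partitions: {0, 1, 2}, {3} and {4}. Each ds->unite (node1->aux2, set) call in the patch plays the role of ds_unite here.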



* [PATCH 3/6] Implement fork-based parallelism engine
  2020-08-20 22:00 [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine Giuliano Belinassi
  2020-08-20 22:00 ` [PATCH 1/6] Modify gcc driver for parallel compilation Giuliano Belinassi
  2020-08-20 22:00 ` [PATCH 2/6] Implement a new partitioner " Giuliano Belinassi
@ 2020-08-20 22:00 ` Giuliano Belinassi
  2020-08-27 15:25   ` Jan Hubicka
  2020-08-27 15:37   ` Jan Hubicka
  2020-08-20 22:00 ` [PATCH 4/6] Add `+' for Jobserver Integration Giuliano Belinassi
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 31+ messages in thread
From: Giuliano Belinassi @ 2020-08-20 22:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, hubicka

This patch belongs to the "Parallelize GCC with Processes" series.

Here, we implement the parallelism by partitioning the callgraph into
several partitions and then forking the compiler into multiple
processes at the point where LTO would start its LTRANS stage, as
implemented in "maybe_compile_in_parallel". At a high level, this
works as follows:

1. If the partitioner manages to generate multiple partitions, the
compiler calls lto_promote_cross_file_statics to compute the
partition boundary; symbols are promoted to globals only if
promote_statics is true. This option is controlled by the user
through --param=promote-statics, which is disabled by default.

2. The compiler initializes the file passed by the driver through
the hidden "-fsplit-outputs=<file>" option, creating that file.

3. The compiler forks into multiple processes, and each child applies
its allocated partition to the symbol table, removing every node that
is unnecessary for that partition.

4. The parent process waits for all child processes to finish, and then
calls exit (0).
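The spawn/wait loop in maybe_compile_in_parallel keeps at most num_jobs children alive by using two counters: j is the next partition to fork, and i = j - num_jobs is the oldest child that must be reaped before slot j is reused. A minimal standalone sketch of that sliding window, assuming a hypothetical per-partition callback in place of lto_apply_partition_mask and the rest of the compilation:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical work done by one child for one partition; its return
   value becomes the child's exit status.  */
typedef int (*partition_fn) (int partition);

/* Example worker (hypothetical): fails only for partition 2.  */
static int
demo_work (int partition)
{
  return partition == 2 ? 1 : 0;
}

/* Run WORK for partitions 0..PARTITIONS-1 in forked children, keeping at
   most JOBS children alive.  Children are reaped in spawn order.
   Returns the number of partitions that failed.  */
static int
run_partitions (int partitions, int jobs, partition_fn work)
{
  assert (partitions > 0 && jobs > 0);
  pid_t *pids = malloc (partitions * sizeof (pid_t));
  int failures = 0;

  assert (pids != NULL);
  if (jobs > partitions)
    jobs = partitions;   /* No point in more jobs than partitions.  */

  for (int j = 0, i = -jobs; i < partitions; i++, j++)
    {
      /* Slot J is free only once child I = J - JOBS has finished.  */
      if (i >= 0 && pids[i] > 0)
        {
          int wstatus;
          if (waitpid (pids[i], &wstatus, 0) < 0
              || !WIFEXITED (wstatus)   /* e.g. killed by a signal.  */
              || WEXITSTATUS (wstatus) != 0)
            failures++;
        }

      if (j < partitions)
        {
          pids[j] = fork ();
          if (pids[j] == 0)
            _exit (work (j));   /* Child: no atexit handlers, no stdio flush.  */
          if (pids[j] < 0)
            failures++;         /* fork failed; nothing to wait for.  */
        }
    }

  free (pids);
  return failures;
}
```

For instance, run_partitions (4, 2, demo_work) forks four children, never more than two at a time, and reports one failure. Because each waitpid targets pids[i] in spawn order, a child that finishes early does not free its slot until all older children have finished; the patch's loop behaves the same way despite its "as soon as there is a free slot" comment. Waiting for any child with waitpid (-1, ...) would be one way to refill slots sooner.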

Implementing 3., however, required a more detailed analysis to find
a way to correctly remove reachable nodes from the callgraph without
corrupting any other node. LTO does this by simply writing everything
to files and reloading it, but we had to avoid this because it would
result in a huge overhead. We implemented this in
"lto_apply_partition_mask" by classifying each node according to
a dependency analysis:

	* Start by trusting what lto_promote_cross_file_statics
	gave us.

	* Look for nodes that may need additional nodes to be carried
	along with them. For example, inline clones require that their
	bodies be kept, so we have to expand the boundary a little by adding
	all nodes that they call.

	* If a node is in the boundary, we release all unnecessary
	information about it. Varpool nodes in the boundary must be declared
	external, otherwise we end up with multiple instances of the same
	global variable in the program, which results in incorrect linking.

	* Avoid releasing function summaries (ipa-fnsummary) twice.

	* Finally, we had to delay the assembler file initialization,
	delay any early assembler output to the file, and discard the
	already-created RTL of any declaration that has to be renamed.
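The boundary expansion described above (inline clones pulling in the nodes they call) is essentially a worklist closure over call edges. A minimal sketch, assuming a hypothetical adjacency-matrix call graph in which keep_callees[] marks nodes, such as inline clones, whose callees must travel with them; this is not the actual lto_apply_partition_mask code, which walks GCC's cgraph edges.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 32

/* Hypothetical call graph: calls[a][b] is true if node A calls node B.  */
struct callgraph
{
  int n;
  bool calls[MAX_NODES][MAX_NODES];
  bool keep_callees[MAX_NODES];   /* e.g. inline clones.  */
};

/* Mark in BOUNDARY every node outside IN_PARTITION that must be carried
   along with the partition, and return how many nodes were marked.  */
static int
expand_boundary (const struct callgraph *g, const bool *in_partition,
                 bool *boundary)
{
  int worklist[MAX_NODES], top = 0, added = 0;
  bool seen[MAX_NODES] = { false };

  assert (g->n <= MAX_NODES);
  for (int v = 0; v < g->n; v++)
    {
      boundary[v] = false;
      if (in_partition[v])
        {
          seen[v] = true;
          worklist[top++] = v;
        }
    }

  /* Each node is pushed at most once, so the worklist cannot overflow.  */
  while (top > 0)
    {
      int v = worklist[--top];
      if (!g->keep_callees[v])
        continue;
      for (int w = 0; w < g->n; w++)
        if (g->calls[v][w] && !seen[w])
          {
            seen[w] = true;
            boundary[w] = true;
            added++;
            worklist[top++] = w;   /* W may need its own callees too.  */
          }
    }
  return added;
}
```

With a chain 0 -> 1 -> 2 -> 3 where only nodes 0 and 1 keep their callees, a partition containing node 0 pulls nodes 1 and 2 into the boundary but not node 3.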

We also integrated this mechanism with the GNU Make jobserver, as
implemented in jobserver.cc. This works as follows:

	* If -fparallel-jobs=jobserver (or =auto), we query the existence of
	a jobserver by calling jobserver_initialize. This function checks
	whether the file descriptors provided by Make are valid pipes, and
	records whether the read file descriptor has O_NONBLOCK set.

	* Then, the parent process returns the token which Make
	originally gave to it, since a new token must be acquired before
	each child is spawned. To correctly block, there are two cases: (1)
	when select is available on the host, and (2) when it is not. In
	(1), we have to use it, since the read fd will have O_NONBLOCK set.
	In (2), we can simply read from the fd, as it is in blocking mode.

	* Once a child holds a token, it compiles its partition and returns
	the token before finishing. If the compilation crashes, however, the
	parent process detects that the child was terminated by a signal, so
	there is no need for any fancy crash control in the jobserver engine.
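The token handshake can be sketched as follows, assuming the pipe-style "--jobserver-auth=R,W" syntax that jobserver_initialize parses. parse_jobserver_auth, acquire_token and release_token are hypothetical helpers, not functions from the patch or from GNU Make. Unlike the patch, which retries select on EAGAIN, this sketch retries on EINTR, the error POSIX specifies for an interrupted select, and treats EAGAIN from read as another process having taken the token first.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Parse "--jobserver-auth=R,W" out of a MAKEFLAGS value.  Like the patch,
   descriptor 0 is rejected and any other syntax is reported as "no
   jobserver".  */
static bool
parse_jobserver_auth (const char *makeflags, int *rfd, int *wfd)
{
  static const char needle[] = "--jobserver-auth=";
  const char *p;
  int r, w;

  if (makeflags == NULL || (p = strstr (makeflags, needle)) == NULL)
    return false;
  if (sscanf (p + strlen (needle), "%d,%d", &r, &w) != 2 || r <= 0 || w <= 0)
    return false;
  *rfd = r;
  *wfd = w;
  return true;
}

/* Block until one token byte is read from RFD.  Works whether or not Make
   put RFD in O_NONBLOCK mode.  Returns false on EOF or a hard error.  */
static bool
acquire_token (int rfd, char *token)
{
  for (;;)
    {
      fd_set readfds;
      FD_ZERO (&readfds);
      FD_SET (rfd, &readfds);
      if (select (rfd + 1, &readfds, NULL, NULL, NULL) < 0)
        {
          if (errno == EINTR)
            continue;            /* Interrupted by a signal: retry.  */
          return false;
        }

      ssize_t r = read (rfd, token, 1);
      if (r == 1)
        return true;
      if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
        continue;                /* Another process grabbed the token.  */
      return false;              /* EOF or hard error: pipe closed.  */
    }
}

/* Give TOKEN back to the jobserver.  */
static bool
release_token (int wfd, char token)
{
  return write (wfd, &token, 1) == 1;
}
```

Newer GNU Make releases can also advertise a named FIFO (--jobserver-auth=fifo:PATH); neither this sketch nor the patch's sscanf handles that form, so both simply report that no jobserver was found.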

gcc/ChangeLog:
2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>

	* jobserver.cc: New file.
	* jobserver.h: New file.
	* cgraph.c (cgraph_node::maybe_release_dominators): New function.
	* cgraph.h (symtab_node::find_by_order): Declare.
	(symtab_node::find_by_name): Declare.
	(symtab_node::find_by_asm_name): Declare.
	(maybe_release_dominators): Declare.
	* cgraphunit.c (cgraph_node::expand): Quickly return if body removed.
	(ipa_passes): Run all_regular_ipa_passes if split_outputs.
	(is_number): New function.
	(childno): New variable.
	(maybe_compile_in_parallel): New function.
	* ipa-fnsummary.c (pass_ipa_free_fn_summary::gate): Avoid running twice
	when compiling in parallel.
	* ipa-icf.c (sem_item_optimizer::filter_removed_items): Behaviour when
	compiling in parallel should be the same as if in LTO.
	* ipa-visibility.c (localize_node): Same as above.
	* lto-cgraph.c (handle_node_in_boundary): New function.
	(compute_boundary): New function.
	(lto_apply_partition_mask): New function.
	* symtab.c (symbol_table::change_decl_assembler_name): Discard RTL decl
	if name changed.
	(symtab_node::dump_base): Dump aux2.
	(symtab_node::find_by_order): New function.
	(symtab_node::find_by_name): New function.
	(symtab_node::find_by_asm_name): New function.
	* toplev.c (additional_asm_files): New variable.
	(init_additional_asm_names_file): New function.
	(handle_additional_asm): New function.
	(toplev::main): Finalize the jobserver if initialized.
	* toplev.h (init_additional_asm_names_file): Declare.
	(handle_additional_asm): Declare.
	* varasm.c (output_addressed_constants): Avoid writing to asm too
	early.
	(output_constants): Same as above.
	(add_constant_to_table): Same as above.
	(output_constant_def_contents): Same as above.
	(output_addressed_constants): Same as above.
---
 gcc/Makefile.in      |   1 +
 gcc/cgraph.c         |  16 ++++
 gcc/cgraph.h         |  12 +++
 gcc/cgraphunit.c     | 198 ++++++++++++++++++++++++++++++++++++++++++-
 gcc/ipa-fnsummary.c  |   2 +-
 gcc/ipa-icf.c        |   3 +-
 gcc/ipa-visibility.c |   3 +-
 gcc/ipa.c            |   4 +-
 gcc/jobserver.cc     | 168 ++++++++++++++++++++++++++++++++++++
 gcc/jobserver.h      |  33 ++++++++
 gcc/lto-cgraph.c     | 172 +++++++++++++++++++++++++++++++++++++
 gcc/lto-streamer.h   |   4 +
 gcc/symtab.c         |  46 +++++++++-
 gcc/toplev.c         |  58 ++++++++++++-
 gcc/toplev.h         |   3 +
 gcc/varasm.c         |  26 +++---
 16 files changed, 725 insertions(+), 24 deletions(-)
 create mode 100644 gcc/jobserver.cc
 create mode 100644 gcc/jobserver.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index be42b15f4ff..c00617cfc1a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1437,6 +1437,7 @@ OBJS = \
 	ira-color.o \
 	ira-emit.o \
 	ira-lives.o \
+	jobserver.o \
 	jump.o \
 	langhooks.o \
 	lcm.o \
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index c0b45795059..22405098dc5 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -226,6 +226,22 @@ cgraph_node::delete_function_version_by_decl (tree decl)
   decl_node->remove ();
 }
 
+/* Release function dominator info if present.  */
+
+void
+cgraph_node::maybe_release_dominators (void)
+{
+  struct function *fun = DECL_STRUCT_FUNCTION (decl);
+
+  if (fun && fun->cfg)
+    {
+      if (dom_info_available_p (fun, CDI_DOMINATORS))
+	free_dominance_info (fun, CDI_DOMINATORS);
+      if (dom_info_available_p (fun, CDI_POST_DOMINATORS))
+	free_dominance_info (fun, CDI_POST_DOMINATORS);
+    }
+}
+
 /* Record that DECL1 and DECL2 are semantically identical function
    versions.  */
 void
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index b4a7871bd3d..72ac19f9672 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -463,6 +463,15 @@ public:
      Return NULL if there's no such node.  */
   static symtab_node *get_for_asmname (const_tree asmname);
 
+  /* Get symtab node by order.  */
+  static symtab_node *find_by_order (int order);
+
+  /* Get symtab_node by its name.  */
+  static symtab_node *find_by_name (const char *);
+
+  /* Get symtab_node by its ASM name.  */
+  static symtab_node *find_by_asm_name (const char *);
+
   /* Verify symbol table for internal consistency.  */
   static DEBUG_FUNCTION void verify_symtab_nodes (void);
 
@@ -1338,6 +1347,9 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   /* Dump the callgraph to file F.  */
   static void dump_cgraph (FILE *f);
 
+  /* Release function dominator info if available.  */
+  void maybe_release_dominators ();
+
   /* Dump the call graph to stderr.  */
   static inline
   void debug_cgraph (void)
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index d10d635e942..73e4bed3b61 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2258,6 +2258,11 @@ cgraph_node::expand (void)
 {
   location_t saved_loc;
 
+  /* FIXME: Find out why body-removed nodes are marked for output.  */
+  if (body_removed)
+    return;
+
+
   /* We ought to not compile any inline clones.  */
   gcc_assert (!inlined_to);
 
@@ -2658,6 +2663,7 @@ ipa_passes (void)
 
       execute_ipa_summary_passes
 	((ipa_opt_pass_d *) passes->all_regular_ipa_passes);
+
     }
 
   /* Some targets need to handle LTO assembler output specially.  */
@@ -2687,10 +2693,17 @@ ipa_passes (void)
   if (flag_generate_lto || flag_generate_offload)
     targetm.asm_out.lto_end ();
 
-  if (!flag_ltrans
+  if (split_outputs)
+    flag_ltrans = true;
+
+  if ((!flag_ltrans || split_outputs)
       && ((in_lto_p && flag_incremental_link != INCREMENTAL_LINK_LTO)
 	  || !flag_lto || flag_fat_lto_objects))
     execute_ipa_pass_list (passes->all_regular_ipa_passes);
+
+  if (split_outputs)
+    flag_ltrans = false;
+
   invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_END, NULL);
 
   bitmap_obstack_release (NULL);
@@ -2742,6 +2755,185 @@ symbol_table::output_weakrefs (void)
       }
 }
 
+static bool is_number (const char *str)
+{
+  while (*str != '\0')
+    switch (*str++)
+      {
+	case '0':
+	case '1':
+	case '2':
+	case '3':
+	case '4':
+	case '5':
+	case '6':
+	case '7':
+	case '8':
+	case '9':
+	  continue;
+	default:
+	  return false;
+      }
+
+  return true;
+}
+
+/* If forked, which child am I?  */
+
+static int childno = -1;
+
+static bool
+maybe_compile_in_parallel (void)
+{
+  struct symtab_node *node;
+  int partitions, i, j;
+  int *pids;
+
+  bool promote_statics = param_promote_statics;
+  bool balance = param_balance_partitions;
+  bool jobserver = false;
+  bool job_auto = false;
+  int num_jobs = -1;
+
+  if (!flag_parallel_jobs || !split_outputs)
+    return false;
+
+  if (!strcmp (flag_parallel_jobs, "auto"))
+    {
+      jobserver = jobserver_initialize ();
+      job_auto = true;
+    }
+  else if (!strcmp (flag_parallel_jobs, "jobserver"))
+    jobserver = jobserver_initialize ();
+  else if (is_number (flag_parallel_jobs))
+    num_jobs = atoi (flag_parallel_jobs);
+  else
+    gcc_unreachable ();
+
+  if (job_auto && !jobserver)
+    {
+      num_jobs = sysconf (_SC_NPROCESSORS_CONF);
+      if (num_jobs > 2)
+	num_jobs = 2;
+    }
+
+  if (num_jobs < 0 && !jobserver)
+    {
+      inform (UNKNOWN_LOCATION,
+	      "-fparallel-jobs=jobserver, but no GNU Jobserver found");
+      return false;
+    }
+
+  if (jobserver)
+    num_jobs = 2;
+
+  if (num_jobs == 0)
+    {
+      inform (UNKNOWN_LOCATION, "-fparallel-jobs=0 makes no sense");
+      return false;
+    }
+
+  /* Trick the compiler into thinking that we are in WPA.  */
+  flag_wpa = "";
+  symtab_node::checking_verify_symtab_nodes ();
+
+  /* Partition the program so that COMDATs get mapped to the same
+     partition.  If promote_statics is true, it also maps statics
+     to the same partition.  If balance is true, try to balance the
+     partitions for compilation performance.  */
+  lto_merge_comdat_map (balance, promote_statics, num_jobs);
+
+  /* AUX pointers are used by partitioning code to bookkeep number of
+     partitions symbol is in.  This is no longer needed.  */
+  FOR_EACH_SYMBOL (node)
+    node->aux = NULL;
+
+  /* We decided that partitioning is a bad idea.  In this case, just
+     proceed with the default compilation method.  */
+  if (ltrans_partitions.length () <= 1)
+    {
+      flag_wpa = NULL;
+      jobserver_finalize ();
+      return false;
+    }
+
+  /* Find out statics that need to be promoted
+     to globals with hidden visibility because they are accessed from
+     multiple partitions.  */
+  lto_promote_cross_file_statics (promote_statics);
+
+  /* Check if we have variables being referenced across partitions.  */
+  lto_check_usage_from_other_partitions ();
+
+  /* Trick the compiler into thinking we are not in WPA anymore.  */
+  flag_wpa = NULL;
+
+  partitions = ltrans_partitions.length ();
+  pids = XALLOCAVEC (pid_t, partitions);
+
+  /* There is no point in launching more jobs than we have partitions.  */
+  if (num_jobs > partitions)
+    num_jobs = partitions;
+
+  /* Trick the compiler into thinking we are in LTRANS mode.  */
+  flag_ltrans = true;
+
+  init_additional_asm_names_file ();
+
+  /* Flush asm file, so we don't get repeated output as we fork.  */
+  fflush (asm_out_file);
+
+  /* Insert a token for child to consume.  */
+  if (jobserver)
+    {
+      num_jobs = partitions;
+      jobserver_return_token ('p');
+    }
+
+  /* Spawn processes.  Spawn as soon as there is a free slot.  */
+  for (j = 0, i = -num_jobs; i < partitions; i++, j++)
+    {
+      if (i >= 0)
+	{
+	  int wstatus, ret;
+	  ret = waitpid (pids[i], &wstatus, 0);
+
+	  if (ret < 0)
+	    internal_error ("Unable to wait for child %d to finish", i);
+	  else if (WIFEXITED (wstatus))
+	    {
+	      if (WEXITSTATUS (wstatus) != 0)
+		error ("Child %d exited with error", i);
+	    }
+	  else if (WIFSIGNALED (wstatus))
+	    error ("Child %d aborted with error", i);
+	}
+
+      if (j < partitions)
+	{
+	  gcc_assert (ltrans_partitions[j]->symbols > 0);
+
+	  if (jobserver)
+	    jobserver_get_token ();
+
+	  pids[j] = fork ();
+	  if (pids[j] == 0)
+	    {
+	      childno = j;
+	      lto_apply_partition_mask (ltrans_partitions[j]);
+	      return true;
+	    }
+	}
+    }
+
+  /* Take back the token which the parent inserted for the children; they
+     have returned it by now.  */
+  if (jobserver)
+    jobserver_get_token ();
+  exit (0);
+}
+
+
 /* Perform simple optimizations based on callgraph.  */
 
 void
@@ -2768,6 +2960,7 @@ symbol_table::compile (void)
   {
     timevar_start (TV_CGRAPH_IPA_PASSES);
     ipa_passes ();
+    maybe_compile_in_parallel ();
     timevar_stop (TV_CGRAPH_IPA_PASSES);
   }
   /* Do nothing else if any IPA pass found errors or if we are just streaming LTO.  */
@@ -2790,6 +2983,9 @@ symbol_table::compile (void)
   timevar_pop (TV_CGRAPHOPT);
 
   /* Output everything.  */
+  if (split_outputs)
+    handle_additional_asm (childno);
+
   switch_to_section (text_section);
   (*debug_hooks->assembly_start) ();
   if (!quiet_flag)
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 2cfab40156e..bc500df4853 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -4610,7 +4610,7 @@ public:
       gcc_assert (n == 0);
       small_p = param;
     }
-  virtual bool gate (function *) { return true; }
+  virtual bool gate (function *) { return !(flag_ltrans && split_outputs); }
   virtual unsigned int execute (function *)
     {
       ipa_free_fn_summary ();
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 069de9d82fb..6a5657c7507 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -2345,7 +2345,8 @@ sem_item_optimizer::filter_removed_items (void)
         {
 	  cgraph_node *cnode = static_cast <sem_function *>(item)->get_node ();
 
-	  if (in_lto_p && (cnode->alias || cnode->body_removed))
+	  if ((in_lto_p || split_outputs)
+	      && (cnode->alias || cnode->body_removed))
 	    remove_item (item);
 	  else
 	    filtered.safe_push (item);
diff --git a/gcc/ipa-visibility.c b/gcc/ipa-visibility.c
index 7c854f471e8..4d9e11482d3 100644
--- a/gcc/ipa-visibility.c
+++ b/gcc/ipa-visibility.c
@@ -540,7 +540,8 @@ optimize_weakref (symtab_node *node)
 static void
 localize_node (bool whole_program, symtab_node *node)
 {
-  gcc_assert (whole_program || in_lto_p || !TREE_PUBLIC (node->decl));
+  gcc_assert (split_outputs || whole_program || in_lto_p
+	      || !TREE_PUBLIC (node->decl));
 
   /* It is possible that one comdat group contains both hidden and non-hidden
      symbols.  In this case we can privatize all hidden symbol but we need
diff --git a/gcc/ipa.c b/gcc/ipa.c
index 288b58cf73d..b397ea2fed8 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -350,7 +350,7 @@ symbol_table::remove_unreachable_nodes (FILE *file)
 
   /* Mark variables that are obviously needed.  */
   FOR_EACH_DEFINED_VARIABLE (vnode)
-    if (!vnode->can_remove_if_no_refs_p()
+    if (!vnode->can_remove_if_no_refs_p ()
 	&& !vnode->in_other_partition)
       {
 	reachable.add (vnode);
@@ -564,7 +564,7 @@ symbol_table::remove_unreachable_nodes (FILE *file)
 	}
       else
 	gcc_assert (node->clone_of || !node->has_gimple_body_p ()
-		    || in_lto_p || DECL_RESULT (node->decl));
+		    || in_lto_p || split_outputs || DECL_RESULT (node->decl));
     }
 
   /* Inline clones might be kept around so their materializing allows further
diff --git a/gcc/jobserver.cc b/gcc/jobserver.cc
new file mode 100644
index 00000000000..8cb374de86e
--- /dev/null
+++ b/gcc/jobserver.cc
@@ -0,0 +1,168 @@
+/* GNU Jobserver Integration Interface.
+   Copyright (C) 2005-2020 Free Software Foundation, Inc.
+
+   Contributed by Giuliano Belinassi <giuliano.belinassi@usp.br>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "jobserver.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic.h"
+#include "errno.h"
+
+/* Token which make sent when invoking GCC.  */
+#define JOBSERVER_MAKE_TOKEN  '+'
+
+bool jobserver_initialized = false;
+bool nonblock_mode = false;
+static int wfd = -1;
+static int rfd = -1;
+
+jobserver_token_t jobserver_curr_token = JOBSERVER_NULL_TOKEN;
+
+/* When using the GNU Jobserver but the user did not prepend the recursive
+   make token `+' to the GCC invocation, Make can close the file descriptors
+   used to communicate with us, and there is no reliable way to detect this.
+   Therefore the best we can do is crash and alert the user to do the right
+   thing.  */
+static void jobserver_crash ()
+{
+  fatal_error (UNKNOWN_LOCATION,
+	       "-fparallel-jobs=jobserver, but Make jobserver pipe is closed");
+}
+
+/* Initialize this interface.  We try to find whether the Jobserver is active
+   and working.  */
+bool jobserver_initialize ()
+{
+  bool success;
+  const char *makeflags = getenv ("MAKEFLAGS");
+  if (makeflags == NULL)
+    return false;
+
+  const char *needle = "--jobserver-auth=";
+  const char *n = strstr (makeflags, needle);
+  if (n == NULL)
+    return false;
+
+  success = (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
+		    && rfd > 0
+		    && wfd > 0
+		    && is_valid_fd (rfd)
+		    && is_valid_fd (wfd));
+
+  if (!success)
+    return false;
+
+  struct stat statbuf;
+  if (fstat (rfd, &statbuf) < 0
+      || (statbuf.st_mode & S_IFMT) != S_IFIFO
+      || fstat (wfd, &statbuf) < 0
+      || (statbuf.st_mode & S_IFMT) != S_IFIFO)
+    return false;
+
+  int flags = fcntl (rfd, F_GETFL);
+  if (flags < 0)
+    return false;
+
+  /* Mark that rfd is O_NONBLOCK, as we will have to use select.  */
+  if (flags & O_NONBLOCK)
+    nonblock_mode = true;
+
+  return (jobserver_initialized = true);
+}
+
+/* Finalize the jobserver, so this interface could be used again later.  */
+bool jobserver_finalize ()
+{
+  if (!jobserver_initialized)
+    return false;
+
+  close (rfd);
+  close (wfd);
+
+  rfd = wfd = -1;
+  nonblock_mode = false;
+
+  jobserver_initialized = false;
+  return true;
+}
+
+/* Return token to the jobserver.  If c is the NULL token, then return
+   the last token we got.  */
+void jobserver_return_token (jobserver_token_t c)
+{
+  ssize_t w;
+
+  if (c == JOBSERVER_NULL_TOKEN)
+    c = jobserver_curr_token;
+
+  w = write (wfd, &c, sizeof (jobserver_token_t));
+
+  if (w <= 0)
+    jobserver_crash ();
+}
+
+/* TODO: Check if select is available on our system.  */
+#define HAVE_SELECT
+
+/* Retrieve a token from the jobserver.  There are two cases in which we must
+   be careful.  The first is when pselect is available on the system: Make
+   then sets the read fd to nonblocking mode and expects us to wait with
+   select (see posixos.c in the GNU Make source code).  The second is when
+   pselect is not available, in which case Make leaves the read fd in
+   blocking mode.  */
+jobserver_token_t jobserver_get_token ()
+{
+  jobserver_token_t ret = JOBSERVER_NULL_TOKEN;
+  ssize_t r = -1;
+
+  while (r < 0)
+    {
+      if (nonblock_mode)
+	{
+#ifdef HAVE_SELECT
+	  fd_set readfd_set;
+
+	  FD_ZERO (&readfd_set);
+	  FD_SET (rfd, &readfd_set);
+
+	  r = select (rfd + 1, &readfd_set, NULL, NULL, NULL);
+
+	  if (r < 0 && (errno == EINTR || errno == EAGAIN))
+	    continue;
+
+	  gcc_assert (r > 0);
+#else
+	  internal_error ("Make jobserver pipe is nonblocking, but select "
+			  "is not supported on this system");
+#endif
+	}
+
+      r = read (rfd, &ret, sizeof (jobserver_token_t));
+
+      if (!(r > 0 || (r < 0 && (errno == EAGAIN || errno == EINTR))))
+	{
+	  jobserver_crash ();
+	  break;
+	}
+    }
+
+  return (jobserver_curr_token = ret);
+}
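The following sketch illustrates the token protocol that jobserver_get_token and jobserver_return_token implement above. It is a minimal sketch that uses a plain POSIX pipe in place of the real jobserver pipe. The helper names `sketch_get_token` and `sketch_return_token` are hypothetical and are not part of this patch. Unlike jobserver_get_token, the sketch assumes a blocking read end and does no select-based waiting.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: acquire one job slot by reading a single token
   byte from the jobserver read end RFD, assumed here to be blocking.
   Returns the token as an unsigned char value, or -1 on EOF (pipe
   closed) or on a read error.  */
static int
sketch_get_token (int rfd)
{
  char token;
  for (;;)
    {
      ssize_t r = read (rfd, &token, 1);
      if (r == 1)
        return (unsigned char) token;
      if (r < 0 && errno == EINTR)
        continue;               /* Interrupted by a signal: retry.  */
      return -1;                /* EOF or hard error.  */
    }
}

/* Hypothetical helper: release a job slot by writing TOKEN back to the
   write end WFD.  Returns 0 on success and -1 on failure.  */
static int
sketch_return_token (int wfd, char token)
{
  ssize_t w;
  do
    w = write (wfd, &token, 1);
  while (w < 0 && errno == EINTR);
  return w == 1 ? 0 : -1;
}
```

A client reads one byte for each extra job it starts and must write every byte back when that job finishes. Otherwise the build loses those job slots.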
diff --git a/gcc/jobserver.h b/gcc/jobserver.h
new file mode 100644
index 00000000000..2047cf4cdbf
--- /dev/null
+++ b/gcc/jobserver.h
@@ -0,0 +1,33 @@
+/* GNU Jobserver Integration Interface.
+   Copyright (C) 2005-2020 Free Software Foundation, Inc.
+
+   Contributed by Giuliano Belinassi <giuliano.belinassi@usp.br>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define JOBSERVER_NULL_TOKEN ('\0')
+
+typedef char jobserver_token_t;
+
+extern bool jobserver_initialized;
+extern jobserver_token_t jobserver_curr_token;
+
+bool jobserver_initialize ();
+bool jobserver_finalize ();
+jobserver_token_t jobserver_get_token ();
+void jobserver_return_token (jobserver_token_t);
+
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 93a99f3465b..12be8546d9c 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "omp-offload.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "lto-partition.h"
 
 /* True when asm nodes has been output.  */
 bool asm_nodes_output = false;
@@ -2065,3 +2066,174 @@ input_cgraph_opt_summary (vec<symtab_node *> nodes)
 	input_cgraph_opt_section (file_data, data, len, nodes);
     }
 }
+
+/* When analysing a function for removal, a node can be in one of the four
+   states defined below.  */
+
+enum node_partition_state
+{
+  CAN_REMOVE,		/* This node can be removed, or is still to be
+			   analysed.  */
+  IN_CURRENT_PARTITION, /* This node is in current partition and should not be
+			   touched.  */
+  IN_BOUNDARY,		/* This node is in the boundary: it belongs to another
+			   partition or is an external symbol, so its body can
+			   be released.  */
+  IN_BOUNDARY_KEEP_BODY /* This symbol is in another partition, but we may need
+			   its body, for instance for inlining.  */
+};
+
+/* Handle a node that is in the LTRANS boundary, releasing its body and
+   other information if necessary.  */
+
+static void
+handle_node_in_boundary (symtab_node *node, bool keep_body)
+{
+  if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
+    {
+      if (cnode->inlined_to && cnode->inlined_to->aux2 != IN_CURRENT_PARTITION)
+	{
+	  /* If marked to be inlined into a node not in current partition,
+	     then undo the inline.  */
+
+	  if (cnode->callers) /* This edge could be removed.  */
+	    cnode->callers->inline_failed = CIF_UNSPECIFIED;
+	  cnode->inlined_to = NULL;
+	}
+
+      if (cnode->has_gimple_body_p ())
+	{
+	  if (!keep_body)
+	    {
+	      cnode->maybe_release_dominators ();
+	      cnode->remove_callees ();
+	      cnode->remove_all_references ();
+
+	      /* FIXME: Releasing body of clones can release bodies of functions
+		 in current partition.  */
+
+	      /* cnode->release_body ();  */
+	      cnode->body_removed = true;
+	      cnode->definition = false;
+	      cnode->analyzed = false;
+	    }
+	  cnode->cpp_implicit_alias = false;
+	  cnode->alias = false;
+	  cnode->transparent_alias = false;
+	  cnode->thunk.thunk_p = false;
+	  cnode->weakref = false;
+	  /* After early inlining we drop always_inline attributes on
+	     bodies of functions that are still referenced (have their
+	     address taken).  */
+	  DECL_ATTRIBUTES (cnode->decl)
+	    = remove_attribute ("always_inline",
+				DECL_ATTRIBUTES (node->decl));
+
+	  cnode->in_other_partition = true;
+	}
+    }
+  else if (is_a <varpool_node *> (node) && !DECL_EXTERNAL (node->decl))
+    {
+      DECL_EXTERNAL (node->decl) = true;
+      node->in_other_partition = true;
+    }
+}
+
+/* Check the boundary and expand it if necessary, including more nodes or
+   promoting them to a state where their body is required.  */
+
+static void
+compute_boundary (ltrans_partition partition)
+{
+  vec<lto_encoder_entry> &nodes = partition->encoder->nodes;
+  symtab_node *node;
+  cgraph_node *cnode;
+  auto_vec<symtab_node *, 16> mark_to_remove;
+  unsigned int i;
+
+  FOR_EACH_SYMBOL (node)
+    node->aux2 = CAN_REMOVE;
+
+  /* First, record the state the encoder assigned to each node.  */
+  for (i = 0; i < nodes.length (); i++)
+    {
+      node = nodes[i].node;
+      if (nodes[i].in_partition)
+	{
+	  node->aux2 = IN_CURRENT_PARTITION;
+	  node->in_other_partition = false;
+	}
+      else if (nodes[i].body)
+	node->aux2 = IN_BOUNDARY_KEEP_BODY;
+      else
+	node->aux2 = IN_BOUNDARY;
+    }
+
+  /* Then look for nodes that were marked to be inlined.  If a node is marked
+     to be inlined into a node in the current partition, keep its body.
+     Also expand the boundary with the callees of nodes whose body must be
+     kept.  */
+  for (i = 0; i < nodes.length (); i++)
+    {
+      cnode = dyn_cast <cgraph_node *> (nodes[i].node);
+      if (!cnode)
+	continue;
+
+      /* Promote nodes that will be inlined into the current partition.  */
+      if (cnode->inlined_to && cnode->inlined_to->aux2 == IN_CURRENT_PARTITION
+	  && cnode->aux2 == IN_BOUNDARY)
+	cnode->aux2 = IN_BOUNDARY_KEEP_BODY;
+
+      /* Expand the boundary based on boundary nodes that require their
+	 body to be present.  */
+      if (cnode->aux2 == IN_BOUNDARY_KEEP_BODY)
+	for (cgraph_edge *e = cnode->callees; e; e = e->next_callee)
+	  if (e->callee->aux2 == CAN_REMOVE)
+	    e->callee->aux2 = IN_BOUNDARY;
+    }
+}
+
+/* Apply PARTITION to the symbol table, removing every node that is neither
+   in the partition nor in its boundary.  */
+
+void
+lto_apply_partition_mask (ltrans_partition partition)
+{
+  symtab_node *node;
+  auto_vec<symtab_node *, 16> mark_to_remove;
+  unsigned int i;
+
+  compute_boundary (partition);
+
+  FOR_EACH_SYMBOL (node)
+    switch (node->aux2)
+      {
+	case IN_CURRENT_PARTITION:
+	  continue;
+
+	case CAN_REMOVE:
+	  mark_to_remove.safe_push (node);
+	  break;
+
+	case IN_BOUNDARY:
+	  handle_node_in_boundary (node, false);
+	  break;
+
+	case IN_BOUNDARY_KEEP_BODY:
+	  handle_node_in_boundary (node, true);
+	  break;
+
+	default:
+	  gcc_unreachable ();
+      }
+
+  /* Finally remove queued nodes.  */
+  for (i = 0; i < mark_to_remove.length (); i++)
+    {
+      symtab_node *node = mark_to_remove[i];
+      if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
+	cnode->maybe_release_dominators ();
+
+      node->remove ();
+    }
+}
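The second pass of compute_boundary above can be modelled on a toy graph. The sketch below is a simplified stand-in, not GCC code. The `toy_node` type, its index-based `inlined_to` and `callees` fields, and `toy_expand_boundary` are all hypothetical. The sketch starts after the encoder states have already been assigned. It also differs from compute_boundary in one respect: compute_boundary visits only the nodes listed in the encoder, while this toy visits every node once, in index order.

```c
#include <stddef.h>

/* Mirrors the four states of enum node_partition_state.  */
enum toy_state
{
  TOY_CAN_REMOVE,
  TOY_IN_CURRENT,
  TOY_IN_BOUNDARY,
  TOY_IN_BOUNDARY_KEEP_BODY
};

/* Hypothetical, simplified stand-in for a cgraph node.  INLINED_TO is the
   index of the node this one is inlined into, or -1 if none.  CALLEES is
   a -1-terminated list of node indices, or NULL.  */
struct toy_node
{
  enum toy_state state;
  int inlined_to;
  const int *callees;
};

/* Keep the body of boundary nodes that are inlined into the current
   partition, then pull the removable callees of every keep-body node
   into the boundary.  */
static void
toy_expand_boundary (struct toy_node *nodes, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      struct toy_node *node = &nodes[i];

      if (node->inlined_to >= 0
          && nodes[node->inlined_to].state == TOY_IN_CURRENT
          && node->state == TOY_IN_BOUNDARY)
        node->state = TOY_IN_BOUNDARY_KEEP_BODY;

      if (node->state == TOY_IN_BOUNDARY_KEEP_BODY)
        for (const int *c = node->callees; c && *c >= 0; c++)
          if (nodes[*c].state == TOY_CAN_REMOVE)
            nodes[*c].state = TOY_IN_BOUNDARY;
    }
}
```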
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 0129f00cc1a..2ff23d16e52 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -916,6 +916,10 @@ bool reachable_from_this_partition_p (struct cgraph_node *,
 lto_symtab_encoder_t compute_ltrans_boundary (lto_symtab_encoder_t encoder);
 void select_what_to_stream (void);
 
+struct ltrans_partition_def;
+void lto_apply_partition_mask (struct ltrans_partition_def *partition);
+
+
 /* In options-save.c.  */
 void cl_target_option_stream_out (struct output_block *, struct bitpack_d *,
 				  struct cl_target_option *);
diff --git a/gcc/symtab.c b/gcc/symtab.c
index d7dfbb676df..669095af820 100644
--- a/gcc/symtab.c
+++ b/gcc/symtab.c
@@ -297,9 +297,13 @@ symbol_table::change_decl_assembler_name (tree decl, tree name)
 	unlink_from_assembler_name_hash (node, true);
 
       const char *old_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
-      if (TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl))
-	  && DECL_RTL_SET_P (decl))
-	warning (0, "%qD renamed after being referenced in assembly", decl);
+      if (DECL_RTL_SET_P (decl))
+	{
+	  if (TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl)))
+	    warning (0, "%qD renamed after being referenced in assembly",
+		     decl);
+	  SET_DECL_RTL (decl, NULL);
+	}
 
       SET_DECL_ASSEMBLER_NAME (decl, name);
       if (alias)
@@ -965,6 +969,8 @@ symtab_node::dump_base (FILE *f)
   if (lto_file_data)
     fprintf (f, "  Read from file: %s\n",
 	     lto_file_data->file_name);
+
+  fprintf (f, "  AUX2: %d\n", aux2);
 }
 
 /* Dump symtab node to F.  */
@@ -2504,3 +2510,37 @@ symtab_node::output_to_lto_symbol_table_p (void)
     }
   return true;
 }
+
+DEBUG_FUNCTION symtab_node *
+symtab_node::find_by_order (int order)
+{
+  symtab_node *node;
+  FOR_EACH_SYMBOL (node)
+    if (node->order == order)
+      return node;
+
+  return NULL;
+}
+
+DEBUG_FUNCTION symtab_node *
+symtab_node::find_by_name (const char *name)
+{
+  symtab_node *node;
+  FOR_EACH_SYMBOL (node)
+    if (!strcmp (node->name (), name))
+      return node;
+
+  return NULL;
+}
+
+DEBUG_FUNCTION symtab_node *
+symtab_node::find_by_asm_name (const char *asm_name)
+{
+  symtab_node *node;
+  FOR_EACH_SYMBOL (node)
+    if (!strcmp (node->asm_name (), asm_name))
+      return node;
+
+  return NULL;
+
+}
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 07457d08c3a..95450880aab 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dump-context.h"
 #include "print-tree.h"
 #include "optinfo-emit-json.h"
+#include "jobserver.h"
 
 #if defined(DBX_DEBUGGING_INFO) || defined(XCOFF_DEBUGGING_INFO)
 #include "dbxout.h"
@@ -104,7 +105,6 @@ static void do_compile ();
 static void process_options (void);
 static void backend_init (void);
 static int lang_dependent_init (const char *);
-static void init_asm_output (const char *);
 static void finalize (bool);
 
 static void crash_signal (int) ATTRIBUTE_NORETURN;
@@ -177,6 +177,9 @@ FILE *callgraph_info_file = NULL;
 static bitmap callgraph_info_external_printed;
 FILE *stack_usage_file = NULL;
 
+/* Output file to write additional asm filenames.  */
+FILE *additional_asm_filenames = NULL;
+
 /* The current working directory of a translation.  It's generally the
    directory from which compilation was initiated, but a preprocessed
    file may specify the original directory in which it was
@@ -895,6 +898,51 @@ init_asm_output (const char *name)
     }
 }
 
+void
+init_additional_asm_names_file (void)
+{
+  gcc_assert (split_outputs);
+
+  additional_asm_filenames = fopen (split_outputs, "w");
+  if (!additional_asm_filenames)
+    fatal_error (UNKNOWN_LOCATION, "unable to create file %qs", split_outputs);
+
+  fclose (additional_asm_filenames);
+}
+
+/* Redirect asm output to a new temp file and log it in SPLIT_OUTPUTS.  */
+
+void
+handle_additional_asm (int childno)
+{
+  gcc_assert (split_outputs);
+
+  if (childno < 0)
+    return;
+
+  const char *temp_asm_name = make_temp_file (".s");
+  asm_file_name = temp_asm_name;
+
+  if (asm_out_file == stdout)
+    fatal_error (UNKNOWN_LOCATION, "unexpected asm output to stdout");
+
+  fclose (asm_out_file);
+
+  asm_out_file = fopen (temp_asm_name, "w");
+  if (!asm_out_file)
+    fatal_error (UNKNOWN_LOCATION, "unable to create asm output file");
+
+  /* Reopen the file in append mode.  Here we assume that appending writes
+     are atomic, as they are on Linux.  */
+  additional_asm_filenames = fopen (split_outputs, "a");
+  if (!additional_asm_filenames)
+    fatal_error (UNKNOWN_LOCATION,
+		 "unable to open the temporary asm files container");
+
+  fprintf (additional_asm_filenames, "%d %s\n", childno, asm_file_name);
+  fclose (additional_asm_filenames);
+}
+
 /* A helper function; used as the reallocator function for cpp's line
    table.  */
 static void *
@@ -2311,7 +2359,7 @@ do_compile ()
 
           timevar_stop (TV_PHASE_SETUP);
 
-          compile_file ();
+	  compile_file ();
         }
       else
         {
@@ -2477,6 +2525,12 @@ toplev::main (int argc, char **argv)
 
   finalize_plugins ();
 
+  if (jobserver_initialized)
+    {
+      jobserver_return_token (JOBSERVER_NULL_TOKEN);
+      jobserver_finalize ();
+    }
+
   after_memory_report = true;
 
   if (seen_error () || werrorcount)
diff --git a/gcc/toplev.h b/gcc/toplev.h
index d6c316962b0..3abbf74cd02 100644
--- a/gcc/toplev.h
+++ b/gcc/toplev.h
@@ -103,4 +103,7 @@ extern void parse_alignment_opts (void);
 
 extern void initialize_rtl (void);
 
+extern void init_additional_asm_names_file (void);
+extern void handle_additional_asm (int);
+
 #endif /* ! GCC_TOPLEV_H */
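handle_additional_asm appends one `<childno> <asm file name>` line per child to the file named by split_outputs. Whatever consumes that file (presumably the driver, which is not shown in this part of the patch) has to parse those lines back. A minimal reader could look like the sketch below. The names `sketch_read_asm_names` and `asm_name_fn` are hypothetical, and the sketch assumes that the temporary file names contain no whitespace.

```c
#include <stdio.h>

/* Hypothetical callback type: receives each (CHILDNO, ASM_NAME) pair.  */
typedef void (*asm_name_fn) (int childno, const char *asm_name, void *data);

/* Read every "<childno> <asm file name>" line from F, in the format
   written by handle_additional_asm, and pass each entry to FN.  Assumes
   file names contain no whitespace and are shorter than 4096 bytes.
   Returns the number of entries read.  */
static int
sketch_read_asm_names (FILE *f, asm_name_fn fn, void *data)
{
  char name[4096];
  int childno, count = 0;

  while (fscanf (f, "%d %4095s", &childno, name) == 2)
    {
      fn (childno, name, data);
      count++;
    }
  return count;
}
```

Because the children append concurrently, the lines can arrive in any order. Passing CHILDNO to the callback lets the caller reorder the assembly files if needed.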
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 4070f9c17e8..84df52013d7 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -110,7 +110,7 @@ static void decode_addr_const (tree, class addr_const *);
 static hashval_t const_hash_1 (const tree);
 static int compare_constant (const tree, const tree);
 static void output_constant_def_contents (rtx);
-static void output_addressed_constants (tree);
+static void output_addressed_constants (tree, int);
 static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
 					       unsigned int, bool, bool);
 static void globalize_decl (tree);
@@ -2272,7 +2272,7 @@ assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED,
 
   /* Output any data that we will need to use the address of.  */
   if (DECL_INITIAL (decl) && DECL_INITIAL (decl) != error_mark_node)
-    output_addressed_constants (DECL_INITIAL (decl));
+    output_addressed_constants (DECL_INITIAL (decl), 0);
 
   /* dbxout.c needs to know this.  */
   if (sect && (sect->common.flags & SECTION_CODE) != 0)
@@ -3426,11 +3426,11 @@ build_constant_desc (tree exp)
    already have labels.  */
 
 static constant_descriptor_tree *
-add_constant_to_table (tree exp)
+add_constant_to_table (tree exp, int defer)
 {
   /* The hash table methods may call output_constant_def for addressed
      constants, so handle them first.  */
-  output_addressed_constants (exp);
+  output_addressed_constants (exp, defer);
 
   /* Sanity check to catch recursive insertion.  */
   static bool inserting;
@@ -3474,7 +3474,7 @@ add_constant_to_table (tree exp)
 rtx
 output_constant_def (tree exp, int defer)
 {
-  struct constant_descriptor_tree *desc = add_constant_to_table (exp);
+  struct constant_descriptor_tree *desc = add_constant_to_table (exp, defer);
   maybe_output_constant_def_contents (desc, defer);
   return desc->rtl;
 }
@@ -3544,7 +3544,7 @@ output_constant_def_contents (rtx symbol)
 
   /* Make sure any other constants whose addresses appear in EXP
      are assigned label numbers.  */
-  output_addressed_constants (exp);
+  output_addressed_constants (exp, 0);
 
   /* We are no longer deferring this constant.  */
   TREE_ASM_WRITTEN (decl) = TREE_ASM_WRITTEN (exp) = 1;
@@ -3608,7 +3608,7 @@ lookup_constant_def (tree exp)
 tree
 tree_output_constant_def (tree exp)
 {
-  struct constant_descriptor_tree *desc = add_constant_to_table (exp);
+  struct constant_descriptor_tree *desc = add_constant_to_table (exp, 1);
   tree decl = SYMBOL_REF_DECL (XEXP (desc->rtl, 0));
   varpool_node::finalize_decl (decl);
   return decl;
@@ -4327,7 +4327,7 @@ compute_reloc_for_constant (tree exp)
    Indicate whether an ADDR_EXPR has been encountered.  */
 
 static void
-output_addressed_constants (tree exp)
+output_addressed_constants (tree exp, int defer)
 {
   tree tem;
 
@@ -4347,21 +4347,21 @@ output_addressed_constants (tree exp)
 	tem = DECL_INITIAL (tem);
 
       if (CONSTANT_CLASS_P (tem) || TREE_CODE (tem) == CONSTRUCTOR)
-	output_constant_def (tem, 0);
+	output_constant_def (tem, defer);
 
       if (TREE_CODE (tem) == MEM_REF)
-	output_addressed_constants (TREE_OPERAND (tem, 0));
+	output_addressed_constants (TREE_OPERAND (tem, 0), defer);
       break;
 
     case PLUS_EXPR:
     case POINTER_PLUS_EXPR:
     case MINUS_EXPR:
-      output_addressed_constants (TREE_OPERAND (exp, 1));
+      output_addressed_constants (TREE_OPERAND (exp, 1), defer);
       gcc_fallthrough ();
 
     CASE_CONVERT:
     case VIEW_CONVERT_EXPR:
-      output_addressed_constants (TREE_OPERAND (exp, 0));
+      output_addressed_constants (TREE_OPERAND (exp, 0), defer);
       break;
 
     case CONSTRUCTOR:
@@ -4369,7 +4369,7 @@ output_addressed_constants (tree exp)
 	unsigned HOST_WIDE_INT idx;
 	FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, tem)
 	  if (tem != 0)
-	    output_addressed_constants (tem);
+	    output_addressed_constants (tem, defer);
       }
       break;
 
-- 
2.28.0



* [PATCH 4/6] Add `+' for Jobserver Integration
  2020-08-20 22:00 [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine Giuliano Belinassi
                   ` (2 preceding siblings ...)
  2020-08-20 22:00 ` [PATCH 3/6] Implement fork-based parallelism engine Giuliano Belinassi
@ 2020-08-20 22:00 ` Giuliano Belinassi
  2020-08-20 22:33   ` Joseph Myers
  2020-08-20 22:00 ` [PATCH 5/6] Add invoke documentation Giuliano Belinassi
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Giuliano Belinassi @ 2020-08-20 22:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, hubicka

GNU Make expects a `+' token at the beginning of a rule's command line if
that command is to interact with the Jobserver [1]. This commit adds
such a token to the compiler-invoking rules in GCC's Makefiles.

[1] https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html#POSIX-Jobserver
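
With the `+' prefix, Make keeps the jobserver pipe descriptors open
for that command. jobserver_initialize (in the jobserver.c code
earlier in this series) then finds those descriptors through the
`--jobserver-auth=' entry in MAKEFLAGS. A minimal sketch of that
parsing step follows. `sketch_parse_jobserver_auth' is a hypothetical
name, and only the `--jobserver-auth=R,W' form checked by the patch is
handled.

```c
#include <stdio.h>
#include <string.h>

/* Extract the jobserver pipe descriptors from a MAKEFLAGS value that
   contains "--jobserver-auth=R,W".  Mirrors the checks made by
   jobserver_initialize: both descriptors must parse and be positive.
   Returns 1 and stores them in *RFD and *WFD on success, 0 otherwise.
   Other spellings of the jobserver option are not handled.  */
static int
sketch_parse_jobserver_auth (const char *makeflags, int *rfd, int *wfd)
{
  static const char needle[] = "--jobserver-auth=";
  const char *p;

  if (makeflags == NULL || (p = strstr (makeflags, needle)) == NULL)
    return 0;
  return sscanf (p + strlen (needle), "%d,%d", rfd, wfd) == 2
         && *rfd > 0 && *wfd > 0;
}
```

Parsing alone is not enough. Without the `+' token, the descriptors
named in MAKEFLAGS may already be closed, which is why
jobserver_initialize also checks them with is_valid_fd and fstat before
trusting them.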

gcc/ChangeLog:
intl/ChangeLog:
libbacktrace/ChangeLog:
libcpp/ChangeLog:
libdecnumber/ChangeLog:
libiberty/ChangeLog:
zlib/ChangeLog:

2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>

		* Makefile.in: Use `+' on rules calling GCC.
---
 gcc/Makefile.in          |   4 +-
 intl/Makefile.in         |   2 +-
 libbacktrace/Makefile.in |   2 +-
 libcpp/Makefile.in       |   2 +-
 libdecnumber/Makefile.in |   2 +-
 libiberty/Makefile.in    | 212 +++++++++++++++++++--------------------
 zlib/Makefile.in         |  64 ++++++------
 7 files changed, 144 insertions(+), 144 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c00617cfc1a..2e7aa4b6d30 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2703,14 +2703,14 @@ generated_files = config.h tm.h $(TM_P_H) $(TM_D_H) $(TM_H) multilib.h \
 # How to compile object files to run on the build machine.
 
 build/%.o :  # dependencies provided by explicit rule later
-	$(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \
+	+$(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \
 		-o $@ $<
 
 ## build/version.o is compiled by the $(COMPILER_FOR_BUILD) but needs
 ## several C macro definitions, just like version.o
 build/version.o:  version.c version.h \
                   $(REVISION) $(DATESTAMP) $(BASEVER) $(DEVPHASE)
-	$(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \
+	+$(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \
 	-DBASEVER=$(BASEVER_s) -DDATESTAMP=$(DATESTAMP_s) \
 	-DREVISION=$(REVISION_s) \
 	-DDEVPHASE=$(DEVPHASE_s) -DPKGVERSION=$(PKGVERSION_s) \
diff --git a/intl/Makefile.in b/intl/Makefile.in
index 356c8ab9b65..de95846cc1d 100644
--- a/intl/Makefile.in
+++ b/intl/Makefile.in
@@ -131,7 +131,7 @@ libintl.h: $(srcdir)/libgnuintl.h
 .SUFFIXES: .c .y .o
 
 .c.o:
-	$(COMPILE) $<
+	+$(COMPILE) $<
 
 .y.c:
 @BISON3_YES@	echo '#define USE_BISON3' > $(patsubst %.c,%-config.h,$@)
diff --git a/libbacktrace/Makefile.in b/libbacktrace/Makefile.in
index b244ca10a4a..08212bb8ac0 100644
--- a/libbacktrace/Makefile.in
+++ b/libbacktrace/Makefile.in
@@ -1326,7 +1326,7 @@ distclean-compile:
 	-rm -f *.tab.c
 
 .c.o:
-	$(AM_V_CC)$(COMPILE) -c -o $@ $<
+	+$(AM_V_CC)$(COMPILE) -c -o $@ $<
 
 .c.obj:
 	$(AM_V_CC)$(COMPILE) -c -o $@ `$(CYGPATH_W) '$<'`
diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
index 5fbba9b9c76..eaffeedf31a 100644
--- a/libcpp/Makefile.in
+++ b/libcpp/Makefile.in
@@ -223,7 +223,7 @@ endif
 # Implicit rules and I18N
 
 .c.o:
-	$(COMPILE) $<
+	+$(COMPILE) $<
 	$(POSTCOMPILE)
 
 # N.B. We do not attempt to copy these into $(srcdir).
diff --git a/libdecnumber/Makefile.in b/libdecnumber/Makefile.in
index 9da028d7f2f..2192e7434ad 100644
--- a/libdecnumber/Makefile.in
+++ b/libdecnumber/Makefile.in
@@ -191,7 +191,7 @@ COMPILE = source='$<' object='$@' libtool=no $(CC) $(DEFS) $(INCLUDES) $(CPPFLAG
 # Implicit rules
 
 .c.$(objext):
-	$(COMPILE) $<
+	+$(COMPILE) $<
 
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
diff --git a/libiberty/Makefile.in b/libiberty/Makefile.in
index 895f701bcd0..fe85363ba2a 100644
--- a/libiberty/Makefile.in
+++ b/libiberty/Makefile.in
@@ -419,7 +419,7 @@ etags tags TAGS: etags-subdir
 demangle: $(ALL) $(srcdir)/cp-demangle.c
 	@echo "The standalone demangler, now named c++filt, is now"
 	@echo "a part of binutils."
-	$(CC) @DEFS@ $(CFLAGS) $(CPPFLAGS) -I. -I$(INCDIR) $(HDEFINES) \
+	+$(CC) @DEFS@ $(CFLAGS) $(CPPFLAGS) -I. -I$(INCDIR) $(HDEFINES) \
 	  $(srcdir)/cp-demangle.c -DSTANDALONE_DEMANGLER $(TARGETLIB) -o $@
 
 ls:
@@ -739,7 +739,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/dyn-string.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/dyn-string.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/dyn-string.c $(OUTPUT_OPTION)
 
 ./fdmatch.$(objext): $(srcdir)/fdmatch.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -749,7 +749,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/fdmatch.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/fdmatch.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/fdmatch.c $(OUTPUT_OPTION)
 
 ./ffs.$(objext): $(srcdir)/ffs.c
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -758,7 +758,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/ffs.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/ffs.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/ffs.c $(OUTPUT_OPTION)
 
 ./fibheap.$(objext): $(srcdir)/fibheap.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/fibheap.h $(INCDIR)/libiberty.h
@@ -768,7 +768,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/fibheap.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/fibheap.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/fibheap.c $(OUTPUT_OPTION)
 
 ./filedescriptor.$(objext): $(srcdir)/filedescriptor.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -778,7 +778,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/filedescriptor.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/filedescriptor.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/filedescriptor.c $(OUTPUT_OPTION)
 
 
 ./filename_cmp.$(objext): $(srcdir)/filename_cmp.c config.h $(INCDIR)/ansidecl.h \
@@ -790,7 +790,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/filename_cmp.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/filename_cmp.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/filename_cmp.c $(OUTPUT_OPTION)
 
 ./floatformat.$(objext): $(srcdir)/floatformat.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/floatformat.h $(INCDIR)/libiberty.h
@@ -800,7 +800,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/floatformat.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/floatformat.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/floatformat.c $(OUTPUT_OPTION)
 
 ./fnmatch.$(objext): $(srcdir)/fnmatch.c config.h $(INCDIR)/fnmatch.h \
 	$(INCDIR)/safe-ctype.h
@@ -810,7 +810,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/fnmatch.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/fnmatch.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/fnmatch.c $(OUTPUT_OPTION)
 
 ./fopen_unlocked.$(objext): $(srcdir)/fopen_unlocked.c config.h \
 	$(INCDIR)/ansidecl.h $(INCDIR)/libiberty.h
@@ -820,7 +820,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/fopen_unlocked.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/fopen_unlocked.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/fopen_unlocked.c $(OUTPUT_OPTION)
 
 ./getcwd.$(objext): $(srcdir)/getcwd.c config.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -829,7 +829,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/getcwd.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/getcwd.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/getcwd.c $(OUTPUT_OPTION)
 
 ./getopt.$(objext): $(srcdir)/getopt.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/getopt.h
@@ -839,7 +839,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/getopt.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/getopt.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/getopt.c $(OUTPUT_OPTION)
 
 ./getopt1.$(objext): $(srcdir)/getopt1.c config.h $(INCDIR)/getopt.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -848,7 +848,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/getopt1.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/getopt1.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/getopt1.c $(OUTPUT_OPTION)
 
 ./getpagesize.$(objext): $(srcdir)/getpagesize.c config.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -857,7 +857,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/getpagesize.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/getpagesize.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/getpagesize.c $(OUTPUT_OPTION)
 
 ./getpwd.$(objext): $(srcdir)/getpwd.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -867,7 +867,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/getpwd.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/getpwd.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/getpwd.c $(OUTPUT_OPTION)
 
 ./getruntime.$(objext): $(srcdir)/getruntime.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -877,7 +877,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/getruntime.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/getruntime.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/getruntime.c $(OUTPUT_OPTION)
 
 ./gettimeofday.$(objext): $(srcdir)/gettimeofday.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -887,7 +887,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/gettimeofday.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/gettimeofday.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/gettimeofday.c $(OUTPUT_OPTION)
 
 ./hashtab.$(objext): $(srcdir)/hashtab.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/hashtab.h $(INCDIR)/libiberty.h
@@ -897,7 +897,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/hashtab.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/hashtab.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/hashtab.c $(OUTPUT_OPTION)
 
 ./hex.$(objext): $(srcdir)/hex.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h $(INCDIR)/safe-ctype.h
@@ -907,7 +907,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/hex.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/hex.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/hex.c $(OUTPUT_OPTION)
 
 ./index.$(objext): $(srcdir)/index.c
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -916,7 +916,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/index.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/index.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/index.c $(OUTPUT_OPTION)
 
 ./insque.$(objext): $(srcdir)/insque.c
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -925,7 +925,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/insque.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/insque.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/insque.c $(OUTPUT_OPTION)
 
 ./lbasename.$(objext): $(srcdir)/lbasename.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/filenames.h $(INCDIR)/hashtab.h $(INCDIR)/libiberty.h \
@@ -936,7 +936,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/lbasename.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/lbasename.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/lbasename.c $(OUTPUT_OPTION)
 
 ./lrealpath.$(objext): $(srcdir)/lrealpath.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -946,7 +946,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/lrealpath.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/lrealpath.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/lrealpath.c $(OUTPUT_OPTION)
 
 ./make-relative-prefix.$(objext): $(srcdir)/make-relative-prefix.c config.h \
 	$(INCDIR)/ansidecl.h $(INCDIR)/libiberty.h
@@ -956,7 +956,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/make-relative-prefix.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/make-relative-prefix.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/make-relative-prefix.c $(OUTPUT_OPTION)
 
 ./make-temp-file.$(objext): $(srcdir)/make-temp-file.c config.h \
 	$(INCDIR)/ansidecl.h $(INCDIR)/libiberty.h
@@ -966,7 +966,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/make-temp-file.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/make-temp-file.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/make-temp-file.c $(OUTPUT_OPTION)
 
 ./md5.$(objext): $(srcdir)/md5.c config.h $(INCDIR)/ansidecl.h $(INCDIR)/md5.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -975,7 +975,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/md5.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/md5.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/md5.c $(OUTPUT_OPTION)
 
 ./memchr.$(objext): $(srcdir)/memchr.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -984,7 +984,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/memchr.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/memchr.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/memchr.c $(OUTPUT_OPTION)
 
 ./memcmp.$(objext): $(srcdir)/memcmp.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -993,7 +993,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/memcmp.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/memcmp.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/memcmp.c $(OUTPUT_OPTION)
 
 ./memcpy.$(objext): $(srcdir)/memcpy.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1002,7 +1002,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/memcpy.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/memcpy.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/memcpy.c $(OUTPUT_OPTION)
 
 ./memmem.$(objext): $(srcdir)/memmem.c config.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1011,7 +1011,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/memmem.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/memmem.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/memmem.c $(OUTPUT_OPTION)
 
 ./memmove.$(objext): $(srcdir)/memmove.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1020,7 +1020,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/memmove.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/memmove.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/memmove.c $(OUTPUT_OPTION)
 
 ./mempcpy.$(objext): $(srcdir)/mempcpy.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1029,7 +1029,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/mempcpy.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/mempcpy.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/mempcpy.c $(OUTPUT_OPTION)
 
 ./memset.$(objext): $(srcdir)/memset.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1038,7 +1038,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/memset.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/memset.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/memset.c $(OUTPUT_OPTION)
 
 ./mkstemps.$(objext): $(srcdir)/mkstemps.c config.h $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1047,7 +1047,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/mkstemps.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/mkstemps.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/mkstemps.c $(OUTPUT_OPTION)
 
 ./msdos.$(objext): $(srcdir)/msdos.c
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1056,7 +1056,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/msdos.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/msdos.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/msdos.c $(OUTPUT_OPTION)
 
 ./objalloc.$(objext): $(srcdir)/objalloc.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/objalloc.h
@@ -1066,7 +1066,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/objalloc.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/objalloc.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/objalloc.c $(OUTPUT_OPTION)
 
 ./obstack.$(objext): $(srcdir)/obstack.c config.h $(INCDIR)/obstack.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1075,7 +1075,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/obstack.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/obstack.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/obstack.c $(OUTPUT_OPTION)
 
 ./partition.$(objext): $(srcdir)/partition.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h $(INCDIR)/partition.h
@@ -1085,7 +1085,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/partition.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/partition.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/partition.c $(OUTPUT_OPTION)
 
 ./pex-common.$(objext): $(srcdir)/pex-common.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h $(srcdir)/pex-common.h
@@ -1095,7 +1095,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/pex-common.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/pex-common.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/pex-common.c $(OUTPUT_OPTION)
 
 ./pex-djgpp.$(objext): $(srcdir)/pex-djgpp.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h $(srcdir)/pex-common.h
@@ -1105,7 +1105,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/pex-djgpp.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/pex-djgpp.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/pex-djgpp.c $(OUTPUT_OPTION)
 
 ./pex-msdos.$(objext): $(srcdir)/pex-msdos.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h $(srcdir)/pex-common.h \
@@ -1116,7 +1116,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/pex-msdos.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/pex-msdos.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/pex-msdos.c $(OUTPUT_OPTION)
 
 ./pex-one.$(objext): $(srcdir)/pex-one.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1126,7 +1126,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/pex-one.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/pex-one.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/pex-one.c $(OUTPUT_OPTION)
 
 ./pex-unix.$(objext): $(srcdir)/pex-unix.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h $(srcdir)/pex-common.h
@@ -1136,7 +1136,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/pex-unix.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/pex-unix.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/pex-unix.c $(OUTPUT_OPTION)
 
 ./pex-win32.$(objext): $(srcdir)/pex-win32.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h $(srcdir)/pex-common.h
@@ -1146,7 +1146,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/pex-win32.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/pex-win32.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/pex-win32.c $(OUTPUT_OPTION)
 
 ./pexecute.$(objext): $(srcdir)/pexecute.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1156,7 +1156,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/pexecute.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/pexecute.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/pexecute.c $(OUTPUT_OPTION)
 
 ./physmem.$(objext): $(srcdir)/physmem.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1166,7 +1166,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/physmem.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/physmem.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/physmem.c $(OUTPUT_OPTION)
 
 ./putenv.$(objext): $(srcdir)/putenv.c config.h $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1175,7 +1175,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/putenv.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/putenv.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/putenv.c $(OUTPUT_OPTION)
 
 ./random.$(objext): $(srcdir)/random.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1184,7 +1184,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/random.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/random.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/random.c $(OUTPUT_OPTION)
 
 ./regex.$(objext): $(srcdir)/regex.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/xregex.h $(INCDIR)/xregex2.h
@@ -1194,7 +1194,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/regex.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/regex.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/regex.c $(OUTPUT_OPTION)
 
 ./rename.$(objext): $(srcdir)/rename.c config.h $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1203,7 +1203,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/rename.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/rename.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/rename.c $(OUTPUT_OPTION)
 
 ./rindex.$(objext): $(srcdir)/rindex.c
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1212,7 +1212,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/rindex.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/rindex.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/rindex.c $(OUTPUT_OPTION)
 
 ./rust-demangle.$(objext): $(srcdir)/rust-demangle.c config.h \
 	$(INCDIR)/ansidecl.h $(INCDIR)/demangle.h $(INCDIR)/libiberty.h \
@@ -1223,7 +1223,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/rust-demangle.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/rust-demangle.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/rust-demangle.c $(OUTPUT_OPTION)
 
 ./safe-ctype.$(objext): $(srcdir)/safe-ctype.c $(INCDIR)/ansidecl.h \
 	$(INCDIR)/safe-ctype.h
@@ -1233,7 +1233,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/safe-ctype.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/safe-ctype.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/safe-ctype.c $(OUTPUT_OPTION)
 
 ./setenv.$(objext): $(srcdir)/setenv.c config.h $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1242,7 +1242,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/setenv.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/setenv.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/setenv.c $(OUTPUT_OPTION)
 
 ./setproctitle.$(objext): $(srcdir)/setproctitle.c config.h $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1251,7 +1251,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/setproctitle.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/setproctitle.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/setproctitle.c $(OUTPUT_OPTION)
 
 ./sha1.$(objext): $(srcdir)/sha1.c config.h $(INCDIR)/ansidecl.h $(INCDIR)/sha1.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1260,7 +1260,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/sha1.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/sha1.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/sha1.c $(OUTPUT_OPTION)
 
 ./sigsetmask.$(objext): $(srcdir)/sigsetmask.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1269,7 +1269,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/sigsetmask.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/sigsetmask.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/sigsetmask.c $(OUTPUT_OPTION)
 
 ./simple-object-coff.$(objext): $(srcdir)/simple-object-coff.c config.h \
 	$(INCDIR)/ansidecl.h $(INCDIR)/libiberty.h \
@@ -1280,7 +1280,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/simple-object-coff.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/simple-object-coff.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/simple-object-coff.c $(OUTPUT_OPTION)
 
 ./simple-object-elf.$(objext): $(srcdir)/simple-object-elf.c config.h \
 	$(INCDIR)/ansidecl.h $(INCDIR)/libiberty.h \
@@ -1291,7 +1291,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/simple-object-elf.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/simple-object-elf.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/simple-object-elf.c $(OUTPUT_OPTION)
 
 ./simple-object-mach-o.$(objext): $(srcdir)/simple-object-mach-o.c config.h \
 	$(INCDIR)/ansidecl.h $(INCDIR)/libiberty.h \
@@ -1302,7 +1302,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/simple-object-mach-o.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/simple-object-mach-o.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/simple-object-mach-o.c $(OUTPUT_OPTION)
 
 ./simple-object-xcoff.$(objext): $(srcdir)/simple-object-xcoff.c config.h \
 	$(INCDIR)/ansidecl.h $(INCDIR)/libiberty.h \
@@ -1313,7 +1313,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/simple-object-xcoff.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/simple-object-xcoff.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/simple-object-xcoff.c $(OUTPUT_OPTION)
 
 ./simple-object.$(objext): $(srcdir)/simple-object.c config.h \
 	$(INCDIR)/ansidecl.h $(INCDIR)/libiberty.h \
@@ -1324,7 +1324,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/simple-object.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/simple-object.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/simple-object.c $(OUTPUT_OPTION)
 
 ./snprintf.$(objext): $(srcdir)/snprintf.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1333,7 +1333,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/snprintf.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/snprintf.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/snprintf.c $(OUTPUT_OPTION)
 
 ./sort.$(objext): $(srcdir)/sort.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h $(INCDIR)/sort.h
@@ -1343,7 +1343,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/sort.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/sort.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/sort.c $(OUTPUT_OPTION)
 
 ./spaces.$(objext): $(srcdir)/spaces.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1353,7 +1353,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/spaces.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/spaces.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/spaces.c $(OUTPUT_OPTION)
 
 ./splay-tree.$(objext): $(srcdir)/splay-tree.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h $(INCDIR)/splay-tree.h
@@ -1363,7 +1363,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/splay-tree.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/splay-tree.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/splay-tree.c $(OUTPUT_OPTION)
 
 ./stack-limit.$(objext): $(srcdir)/stack-limit.c config.h $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1372,7 +1372,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/stack-limit.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/stack-limit.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/stack-limit.c $(OUTPUT_OPTION)
 
 ./stpcpy.$(objext): $(srcdir)/stpcpy.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1381,7 +1381,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/stpcpy.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/stpcpy.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/stpcpy.c $(OUTPUT_OPTION)
 
 ./stpncpy.$(objext): $(srcdir)/stpncpy.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1390,7 +1390,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/stpncpy.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/stpncpy.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/stpncpy.c $(OUTPUT_OPTION)
 
 ./strcasecmp.$(objext): $(srcdir)/strcasecmp.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1399,7 +1399,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strcasecmp.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strcasecmp.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strcasecmp.c $(OUTPUT_OPTION)
 
 ./strchr.$(objext): $(srcdir)/strchr.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1408,7 +1408,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strchr.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strchr.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strchr.c $(OUTPUT_OPTION)
 
 ./strdup.$(objext): $(srcdir)/strdup.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1417,7 +1417,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strdup.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strdup.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strdup.c $(OUTPUT_OPTION)
 
 ./strerror.$(objext): $(srcdir)/strerror.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1427,7 +1427,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strerror.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strerror.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strerror.c $(OUTPUT_OPTION)
 
 ./strncasecmp.$(objext): $(srcdir)/strncasecmp.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1436,7 +1436,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strncasecmp.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strncasecmp.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strncasecmp.c $(OUTPUT_OPTION)
 
 ./strncmp.$(objext): $(srcdir)/strncmp.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1445,7 +1445,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strncmp.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strncmp.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strncmp.c $(OUTPUT_OPTION)
 
 ./strndup.$(objext): $(srcdir)/strndup.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1454,7 +1454,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strndup.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strndup.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strndup.c $(OUTPUT_OPTION)
 
 ./strnlen.$(objext): $(srcdir)/strnlen.c config.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1463,7 +1463,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strnlen.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strnlen.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strnlen.c $(OUTPUT_OPTION)
 
 ./strrchr.$(objext): $(srcdir)/strrchr.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1472,7 +1472,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strrchr.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strrchr.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strrchr.c $(OUTPUT_OPTION)
 
 ./strsignal.$(objext): $(srcdir)/strsignal.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1482,7 +1482,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strsignal.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strsignal.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strsignal.c $(OUTPUT_OPTION)
 
 ./strstr.$(objext): $(srcdir)/strstr.c
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1491,7 +1491,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strstr.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strstr.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strstr.c $(OUTPUT_OPTION)
 
 ./strtod.$(objext): $(srcdir)/strtod.c $(INCDIR)/ansidecl.h \
 	$(INCDIR)/safe-ctype.h
@@ -1501,7 +1501,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strtod.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strtod.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strtod.c $(OUTPUT_OPTION)
 
 ./strtol.$(objext): $(srcdir)/strtol.c config.h $(INCDIR)/safe-ctype.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1510,7 +1510,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strtol.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strtol.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strtol.c $(OUTPUT_OPTION)
 
 ./strtoll.$(objext): $(srcdir)/strtoll.c config.h $(INCDIR)/safe-ctype.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1519,7 +1519,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strtoll.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strtoll.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strtoll.c $(OUTPUT_OPTION)
 
 ./strtoul.$(objext): $(srcdir)/strtoul.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/safe-ctype.h
@@ -1529,7 +1529,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strtoul.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strtoul.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strtoul.c $(OUTPUT_OPTION)
 
 ./strtoull.$(objext): $(srcdir)/strtoull.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/safe-ctype.h
@@ -1539,7 +1539,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strtoull.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strtoull.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strtoull.c $(OUTPUT_OPTION)
 
 ./strverscmp.$(objext): $(srcdir)/strverscmp.c $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h $(INCDIR)/safe-ctype.h
@@ -1549,7 +1549,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/strverscmp.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/strverscmp.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/strverscmp.c $(OUTPUT_OPTION)
 
 ./timeval-utils.$(objext): $(srcdir)/timeval-utils.c config.h \
 	$(INCDIR)/timeval-utils.h
@@ -1559,7 +1559,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/timeval-utils.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/timeval-utils.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/timeval-utils.c $(OUTPUT_OPTION)
 
 ./tmpnam.$(objext): $(srcdir)/tmpnam.c
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1568,7 +1568,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/tmpnam.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/tmpnam.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/tmpnam.c $(OUTPUT_OPTION)
 
 ./unlink-if-ordinary.$(objext): $(srcdir)/unlink-if-ordinary.c config.h \
 	$(INCDIR)/ansidecl.h $(INCDIR)/libiberty.h
@@ -1578,7 +1578,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/unlink-if-ordinary.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/unlink-if-ordinary.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/unlink-if-ordinary.c $(OUTPUT_OPTION)
 
 ./vasprintf.$(objext): $(srcdir)/vasprintf.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h $(srcdir)/vprintf-support.h
@@ -1588,7 +1588,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/vasprintf.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/vasprintf.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/vasprintf.c $(OUTPUT_OPTION)
 
 ./vfork.$(objext): $(srcdir)/vfork.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1597,7 +1597,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/vfork.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/vfork.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/vfork.c $(OUTPUT_OPTION)
 
 ./vfprintf.$(objext): $(srcdir)/vfprintf.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1606,7 +1606,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/vfprintf.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/vfprintf.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/vfprintf.c $(OUTPUT_OPTION)
 
 ./vprintf-support.$(objext): $(srcdir)/vprintf-support.c config.h \
 	$(INCDIR)/ansidecl.h $(INCDIR)/libiberty.h
@@ -1616,7 +1616,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/vprintf-support.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/vprintf-support.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/vprintf-support.c $(OUTPUT_OPTION)
 
 ./vprintf.$(objext): $(srcdir)/vprintf.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1625,7 +1625,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/vprintf.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/vprintf.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/vprintf.c $(OUTPUT_OPTION)
 
 ./vsnprintf.$(objext): $(srcdir)/vsnprintf.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1635,7 +1635,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/vsnprintf.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/vsnprintf.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/vsnprintf.c $(OUTPUT_OPTION)
 
 ./vsprintf.$(objext): $(srcdir)/vsprintf.c $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1644,7 +1644,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/vsprintf.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/vsprintf.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/vsprintf.c $(OUTPUT_OPTION)
 
 ./waitpid.$(objext): $(srcdir)/waitpid.c config.h $(INCDIR)/ansidecl.h
 	if [ x"$(PICFLAG)" != x ]; then \
@@ -1653,7 +1653,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/waitpid.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/waitpid.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/waitpid.c $(OUTPUT_OPTION)
 
 ./xasprintf.$(objext): $(srcdir)/xasprintf.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1663,7 +1663,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/xasprintf.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/xasprintf.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/xasprintf.c $(OUTPUT_OPTION)
 
 ./xatexit.$(objext): $(srcdir)/xatexit.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1673,7 +1673,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/xatexit.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/xatexit.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/xatexit.c $(OUTPUT_OPTION)
 
 ./xexit.$(objext): $(srcdir)/xexit.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1683,7 +1683,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/xexit.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/xexit.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/xexit.c $(OUTPUT_OPTION)
 
 ./xmalloc.$(objext): $(srcdir)/xmalloc.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1693,7 +1693,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/xmalloc.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/xmalloc.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/xmalloc.c $(OUTPUT_OPTION)
 
 ./xmemdup.$(objext): $(srcdir)/xmemdup.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1703,7 +1703,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/xmemdup.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/xmemdup.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/xmemdup.c $(OUTPUT_OPTION)
 
 ./xstrdup.$(objext): $(srcdir)/xstrdup.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1713,7 +1713,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/xstrdup.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/xstrdup.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/xstrdup.c $(OUTPUT_OPTION)
 
 ./xstrerror.$(objext): $(srcdir)/xstrerror.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1723,7 +1723,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/xstrerror.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/xstrerror.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/xstrerror.c $(OUTPUT_OPTION)
 
 ./xstrndup.$(objext): $(srcdir)/xstrndup.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h
@@ -1733,7 +1733,7 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/xstrndup.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/xstrndup.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/xstrndup.c $(OUTPUT_OPTION)
 
 ./xvasprintf.$(objext): $(srcdir)/xvasprintf.c config.h $(INCDIR)/ansidecl.h \
 	$(INCDIR)/libiberty.h $(srcdir)/vprintf-support.h
@@ -1743,4 +1743,4 @@ $(CONFIGURED_OFILES): stamp-picdir stamp-noasandir
 	if [ x"$(NOASANFLAG)" != x ]; then \
 	  $(COMPILE.c) $(PICFLAG) $(NOASANFLAG) $(srcdir)/xvasprintf.c -o noasan/$@; \
 	else true; fi
-	$(COMPILE.c) $(srcdir)/xvasprintf.c $(OUTPUT_OPTION)
+	+$(COMPILE.c) $(srcdir)/xvasprintf.c $(OUTPUT_OPTION)
diff --git a/zlib/Makefile.in b/zlib/Makefile.in
index 3f5102d1b87..860844954a5 100644
--- a/zlib/Makefile.in
+++ b/zlib/Makefile.in
@@ -579,7 +579,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/zutil.Plo@am__quote@
 
 .c.o:
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(COMPILE) -MT $@ -MD -MP -MF $(DEPDIR)/$*.Tpo -c -o $@ $<
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(COMPILE) -MT $@ -MD -MP -MF $(DEPDIR)/$*.Tpo -c -o $@ $<
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/$*.Tpo $(DEPDIR)/$*.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@@ -600,7 +600,7 @@ distclean-compile:
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(LTCOMPILE) -c -o $@ $<
 
 libz_a-adler32.o: adler32.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-adler32.o -MD -MP -MF $(DEPDIR)/libz_a-adler32.Tpo -c -o libz_a-adler32.o `test -f 'adler32.c' || echo '$(srcdir)/'`adler32.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-adler32.o -MD -MP -MF $(DEPDIR)/libz_a-adler32.Tpo -c -o libz_a-adler32.o `test -f 'adler32.c' || echo '$(srcdir)/'`adler32.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-adler32.Tpo $(DEPDIR)/libz_a-adler32.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='adler32.c' object='libz_a-adler32.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@@ -614,25 +614,25 @@ libz_a-adler32.obj: adler32.c
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-adler32.obj `if test -f 'adler32.c'; then $(CYGPATH_W) 'adler32.c'; else $(CYGPATH_W) '$(srcdir)/adler32.c'; fi`
 
 libz_a-compress.o: compress.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-compress.o -MD -MP -MF $(DEPDIR)/libz_a-compress.Tpo -c -o libz_a-compress.o `test -f 'compress.c' || echo '$(srcdir)/'`compress.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-compress.o -MD -MP -MF $(DEPDIR)/libz_a-compress.Tpo -c -o libz_a-compress.o `test -f 'compress.c' || echo '$(srcdir)/'`compress.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-compress.Tpo $(DEPDIR)/libz_a-compress.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='compress.c' object='libz_a-compress.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-compress.o `test -f 'compress.c' || echo '$(srcdir)/'`compress.c
 
 libz_a-compress.obj: compress.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-compress.obj -MD -MP -MF $(DEPDIR)/libz_a-compress.Tpo -c -o libz_a-compress.obj `if test -f 'compress.c'; then $(CYGPATH_W) 'compress.c'; else $(CYGPATH_W) '$(srcdir)/compress.c'; fi`
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-compress.obj -MD -MP -MF $(DEPDIR)/libz_a-compress.Tpo -c -o libz_a-compress.obj `if test -f 'compress.c'; then $(CYGPATH_W) 'compress.c'; else $(CYGPATH_W) '$(srcdir)/compress.c'; fi`
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-compress.Tpo $(DEPDIR)/libz_a-compress.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='compress.c' object='libz_a-compress.obj' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-compress.obj `if test -f 'compress.c'; then $(CYGPATH_W) 'compress.c'; else $(CYGPATH_W) '$(srcdir)/compress.c'; fi`
 
 libz_a-crc32.o: crc32.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-crc32.o -MD -MP -MF $(DEPDIR)/libz_a-crc32.Tpo -c -o libz_a-crc32.o `test -f 'crc32.c' || echo '$(srcdir)/'`crc32.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-crc32.o -MD -MP -MF $(DEPDIR)/libz_a-crc32.Tpo -c -o libz_a-crc32.o `test -f 'crc32.c' || echo '$(srcdir)/'`crc32.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-crc32.Tpo $(DEPDIR)/libz_a-crc32.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='crc32.c' object='libz_a-crc32.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-crc32.o `test -f 'crc32.c' || echo '$(srcdir)/'`crc32.c
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-crc32.o `test -f 'crc32.c' || echo '$(srcdir)/'`crc32.c
 
 libz_a-crc32.obj: crc32.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-crc32.obj -MD -MP -MF $(DEPDIR)/libz_a-crc32.Tpo -c -o libz_a-crc32.obj `if test -f 'crc32.c'; then $(CYGPATH_W) 'crc32.c'; else $(CYGPATH_W) '$(srcdir)/crc32.c'; fi`
@@ -642,25 +642,25 @@ libz_a-crc32.obj: crc32.c
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-crc32.obj `if test -f 'crc32.c'; then $(CYGPATH_W) 'crc32.c'; else $(CYGPATH_W) '$(srcdir)/crc32.c'; fi`
 
 libz_a-deflate.o: deflate.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-deflate.o -MD -MP -MF $(DEPDIR)/libz_a-deflate.Tpo -c -o libz_a-deflate.o `test -f 'deflate.c' || echo '$(srcdir)/'`deflate.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-deflate.o -MD -MP -MF $(DEPDIR)/libz_a-deflate.Tpo -c -o libz_a-deflate.o `test -f 'deflate.c' || echo '$(srcdir)/'`deflate.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-deflate.Tpo $(DEPDIR)/libz_a-deflate.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='deflate.c' object='libz_a-deflate.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-deflate.o `test -f 'deflate.c' || echo '$(srcdir)/'`deflate.c
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-deflate.o `test -f 'deflate.c' || echo '$(srcdir)/'`deflate.c
 
 libz_a-deflate.obj: deflate.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-deflate.obj -MD -MP -MF $(DEPDIR)/libz_a-deflate.Tpo -c -o libz_a-deflate.obj `if test -f 'deflate.c'; then $(CYGPATH_W) 'deflate.c'; else $(CYGPATH_W) '$(srcdir)/deflate.c'; fi`
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-deflate.obj -MD -MP -MF $(DEPDIR)/libz_a-deflate.Tpo -c -o libz_a-deflate.obj `if test -f 'deflate.c'; then $(CYGPATH_W) 'deflate.c'; else $(CYGPATH_W) '$(srcdir)/deflate.c'; fi`
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-deflate.Tpo $(DEPDIR)/libz_a-deflate.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='deflate.c' object='libz_a-deflate.obj' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-deflate.obj `if test -f 'deflate.c'; then $(CYGPATH_W) 'deflate.c'; else $(CYGPATH_W) '$(srcdir)/deflate.c'; fi`
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-deflate.obj `if test -f 'deflate.c'; then $(CYGPATH_W) 'deflate.c'; else $(CYGPATH_W) '$(srcdir)/deflate.c'; fi`
 
 libz_a-gzread.o: gzread.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-gzread.o -MD -MP -MF $(DEPDIR)/libz_a-gzread.Tpo -c -o libz_a-gzread.o `test -f 'gzread.c' || echo '$(srcdir)/'`gzread.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-gzread.o -MD -MP -MF $(DEPDIR)/libz_a-gzread.Tpo -c -o libz_a-gzread.o `test -f 'gzread.c' || echo '$(srcdir)/'`gzread.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-gzread.Tpo $(DEPDIR)/libz_a-gzread.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='gzread.c' object='libz_a-gzread.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-gzread.o `test -f 'gzread.c' || echo '$(srcdir)/'`gzread.c
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-gzread.o `test -f 'gzread.c' || echo '$(srcdir)/'`gzread.c
 
 libz_a-gzread.obj: gzread.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-gzread.obj -MD -MP -MF $(DEPDIR)/libz_a-gzread.Tpo -c -o libz_a-gzread.obj `if test -f 'gzread.c'; then $(CYGPATH_W) 'gzread.c'; else $(CYGPATH_W) '$(srcdir)/gzread.c'; fi`
@@ -670,11 +670,11 @@ libz_a-gzread.obj: gzread.c
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-gzread.obj `if test -f 'gzread.c'; then $(CYGPATH_W) 'gzread.c'; else $(CYGPATH_W) '$(srcdir)/gzread.c'; fi`
 
 libz_a-gzclose.o: gzclose.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-gzclose.o -MD -MP -MF $(DEPDIR)/libz_a-gzclose.Tpo -c -o libz_a-gzclose.o `test -f 'gzclose.c' || echo '$(srcdir)/'`gzclose.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-gzclose.o -MD -MP -MF $(DEPDIR)/libz_a-gzclose.Tpo -c -o libz_a-gzclose.o `test -f 'gzclose.c' || echo '$(srcdir)/'`gzclose.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-gzclose.Tpo $(DEPDIR)/libz_a-gzclose.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='gzclose.c' object='libz_a-gzclose.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-gzclose.o `test -f 'gzclose.c' || echo '$(srcdir)/'`gzclose.c
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-gzclose.o `test -f 'gzclose.c' || echo '$(srcdir)/'`gzclose.c
 
 libz_a-gzclose.obj: gzclose.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-gzclose.obj -MD -MP -MF $(DEPDIR)/libz_a-gzclose.Tpo -c -o libz_a-gzclose.obj `if test -f 'gzclose.c'; then $(CYGPATH_W) 'gzclose.c'; else $(CYGPATH_W) '$(srcdir)/gzclose.c'; fi`
@@ -698,11 +698,11 @@ libz_a-gzwrite.obj: gzwrite.c
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-gzwrite.obj `if test -f 'gzwrite.c'; then $(CYGPATH_W) 'gzwrite.c'; else $(CYGPATH_W) '$(srcdir)/gzwrite.c'; fi`
 
 libz_a-gzlib.o: gzlib.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-gzlib.o -MD -MP -MF $(DEPDIR)/libz_a-gzlib.Tpo -c -o libz_a-gzlib.o `test -f 'gzlib.c' || echo '$(srcdir)/'`gzlib.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-gzlib.o -MD -MP -MF $(DEPDIR)/libz_a-gzlib.Tpo -c -o libz_a-gzlib.o `test -f 'gzlib.c' || echo '$(srcdir)/'`gzlib.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-gzlib.Tpo $(DEPDIR)/libz_a-gzlib.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='gzlib.c' object='libz_a-gzlib.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-gzlib.o `test -f 'gzlib.c' || echo '$(srcdir)/'`gzlib.c
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-gzlib.o `test -f 'gzlib.c' || echo '$(srcdir)/'`gzlib.c
 
 libz_a-gzlib.obj: gzlib.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-gzlib.obj -MD -MP -MF $(DEPDIR)/libz_a-gzlib.Tpo -c -o libz_a-gzlib.obj `if test -f 'gzlib.c'; then $(CYGPATH_W) 'gzlib.c'; else $(CYGPATH_W) '$(srcdir)/gzlib.c'; fi`
@@ -712,11 +712,11 @@ libz_a-gzlib.obj: gzlib.c
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-gzlib.obj `if test -f 'gzlib.c'; then $(CYGPATH_W) 'gzlib.c'; else $(CYGPATH_W) '$(srcdir)/gzlib.c'; fi`
 
 libz_a-infback.o: infback.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-infback.o -MD -MP -MF $(DEPDIR)/libz_a-infback.Tpo -c -o libz_a-infback.o `test -f 'infback.c' || echo '$(srcdir)/'`infback.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-infback.o -MD -MP -MF $(DEPDIR)/libz_a-infback.Tpo -c -o libz_a-infback.o `test -f 'infback.c' || echo '$(srcdir)/'`infback.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-infback.Tpo $(DEPDIR)/libz_a-infback.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='infback.c' object='libz_a-infback.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-infback.o `test -f 'infback.c' || echo '$(srcdir)/'`infback.c
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-infback.o `test -f 'infback.c' || echo '$(srcdir)/'`infback.c
 
 libz_a-infback.obj: infback.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-infback.obj -MD -MP -MF $(DEPDIR)/libz_a-infback.Tpo -c -o libz_a-infback.obj `if test -f 'infback.c'; then $(CYGPATH_W) 'infback.c'; else $(CYGPATH_W) '$(srcdir)/infback.c'; fi`
@@ -726,11 +726,11 @@ libz_a-infback.obj: infback.c
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-infback.obj `if test -f 'infback.c'; then $(CYGPATH_W) 'infback.c'; else $(CYGPATH_W) '$(srcdir)/infback.c'; fi`
 
 libz_a-inffast.o: inffast.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-inffast.o -MD -MP -MF $(DEPDIR)/libz_a-inffast.Tpo -c -o libz_a-inffast.o `test -f 'inffast.c' || echo '$(srcdir)/'`inffast.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-inffast.o -MD -MP -MF $(DEPDIR)/libz_a-inffast.Tpo -c -o libz_a-inffast.o `test -f 'inffast.c' || echo '$(srcdir)/'`inffast.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-inffast.Tpo $(DEPDIR)/libz_a-inffast.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='inffast.c' object='libz_a-inffast.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-inffast.o `test -f 'inffast.c' || echo '$(srcdir)/'`inffast.c
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-inffast.o `test -f 'inffast.c' || echo '$(srcdir)/'`inffast.c
 
 libz_a-inffast.obj: inffast.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-inffast.obj -MD -MP -MF $(DEPDIR)/libz_a-inffast.Tpo -c -o libz_a-inffast.obj `if test -f 'inffast.c'; then $(CYGPATH_W) 'inffast.c'; else $(CYGPATH_W) '$(srcdir)/inffast.c'; fi`
@@ -740,25 +740,25 @@ libz_a-inffast.obj: inffast.c
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-inffast.obj `if test -f 'inffast.c'; then $(CYGPATH_W) 'inffast.c'; else $(CYGPATH_W) '$(srcdir)/inffast.c'; fi`
 
 libz_a-inflate.o: inflate.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-inflate.o -MD -MP -MF $(DEPDIR)/libz_a-inflate.Tpo -c -o libz_a-inflate.o `test -f 'inflate.c' || echo '$(srcdir)/'`inflate.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-inflate.o -MD -MP -MF $(DEPDIR)/libz_a-inflate.Tpo -c -o libz_a-inflate.o `test -f 'inflate.c' || echo '$(srcdir)/'`inflate.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-inflate.Tpo $(DEPDIR)/libz_a-inflate.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='inflate.c' object='libz_a-inflate.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-inflate.o `test -f 'inflate.c' || echo '$(srcdir)/'`inflate.c
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-inflate.o `test -f 'inflate.c' || echo '$(srcdir)/'`inflate.c
 
 libz_a-inflate.obj: inflate.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-inflate.obj -MD -MP -MF $(DEPDIR)/libz_a-inflate.Tpo -c -o libz_a-inflate.obj `if test -f 'inflate.c'; then $(CYGPATH_W) 'inflate.c'; else $(CYGPATH_W) '$(srcdir)/inflate.c'; fi`
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-inflate.obj -MD -MP -MF $(DEPDIR)/libz_a-inflate.Tpo -c -o libz_a-inflate.obj `if test -f 'inflate.c'; then $(CYGPATH_W) 'inflate.c'; else $(CYGPATH_W) '$(srcdir)/inflate.c'; fi`
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-inflate.Tpo $(DEPDIR)/libz_a-inflate.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='inflate.c' object='libz_a-inflate.obj' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-inflate.obj `if test -f 'inflate.c'; then $(CYGPATH_W) 'inflate.c'; else $(CYGPATH_W) '$(srcdir)/inflate.c'; fi`
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-inflate.obj `if test -f 'inflate.c'; then $(CYGPATH_W) 'inflate.c'; else $(CYGPATH_W) '$(srcdir)/inflate.c'; fi`
 
 libz_a-inftrees.o: inftrees.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-inftrees.o -MD -MP -MF $(DEPDIR)/libz_a-inftrees.Tpo -c -o libz_a-inftrees.o `test -f 'inftrees.c' || echo '$(srcdir)/'`inftrees.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-inftrees.o -MD -MP -MF $(DEPDIR)/libz_a-inftrees.Tpo -c -o libz_a-inftrees.o `test -f 'inftrees.c' || echo '$(srcdir)/'`inftrees.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-inftrees.Tpo $(DEPDIR)/libz_a-inftrees.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='inftrees.c' object='libz_a-inftrees.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-inftrees.o `test -f 'inftrees.c' || echo '$(srcdir)/'`inftrees.c
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-inftrees.o `test -f 'inftrees.c' || echo '$(srcdir)/'`inftrees.c
 
 libz_a-inftrees.obj: inftrees.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-inftrees.obj -MD -MP -MF $(DEPDIR)/libz_a-inftrees.Tpo -c -o libz_a-inftrees.obj `if test -f 'inftrees.c'; then $(CYGPATH_W) 'inftrees.c'; else $(CYGPATH_W) '$(srcdir)/inftrees.c'; fi`
@@ -768,11 +768,11 @@ libz_a-inftrees.obj: inftrees.c
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-inftrees.obj `if test -f 'inftrees.c'; then $(CYGPATH_W) 'inftrees.c'; else $(CYGPATH_W) '$(srcdir)/inftrees.c'; fi`
 
 libz_a-trees.o: trees.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-trees.o -MD -MP -MF $(DEPDIR)/libz_a-trees.Tpo -c -o libz_a-trees.o `test -f 'trees.c' || echo '$(srcdir)/'`trees.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-trees.o -MD -MP -MF $(DEPDIR)/libz_a-trees.Tpo -c -o libz_a-trees.o `test -f 'trees.c' || echo '$(srcdir)/'`trees.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-trees.Tpo $(DEPDIR)/libz_a-trees.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='trees.c' object='libz_a-trees.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-trees.o `test -f 'trees.c' || echo '$(srcdir)/'`trees.c
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-trees.o `test -f 'trees.c' || echo '$(srcdir)/'`trees.c
 
 libz_a-trees.obj: trees.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-trees.obj -MD -MP -MF $(DEPDIR)/libz_a-trees.Tpo -c -o libz_a-trees.obj `if test -f 'trees.c'; then $(CYGPATH_W) 'trees.c'; else $(CYGPATH_W) '$(srcdir)/trees.c'; fi`
@@ -782,11 +782,11 @@ libz_a-trees.obj: trees.c
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-trees.obj `if test -f 'trees.c'; then $(CYGPATH_W) 'trees.c'; else $(CYGPATH_W) '$(srcdir)/trees.c'; fi`
 
 libz_a-uncompr.o: uncompr.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-uncompr.o -MD -MP -MF $(DEPDIR)/libz_a-uncompr.Tpo -c -o libz_a-uncompr.o `test -f 'uncompr.c' || echo '$(srcdir)/'`uncompr.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-uncompr.o -MD -MP -MF $(DEPDIR)/libz_a-uncompr.Tpo -c -o libz_a-uncompr.o `test -f 'uncompr.c' || echo '$(srcdir)/'`uncompr.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-uncompr.Tpo $(DEPDIR)/libz_a-uncompr.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='uncompr.c' object='libz_a-uncompr.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-uncompr.o `test -f 'uncompr.c' || echo '$(srcdir)/'`uncompr.c
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-uncompr.o `test -f 'uncompr.c' || echo '$(srcdir)/'`uncompr.c
 
 libz_a-uncompr.obj: uncompr.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-uncompr.obj -MD -MP -MF $(DEPDIR)/libz_a-uncompr.Tpo -c -o libz_a-uncompr.obj `if test -f 'uncompr.c'; then $(CYGPATH_W) 'uncompr.c'; else $(CYGPATH_W) '$(srcdir)/uncompr.c'; fi`
@@ -796,11 +796,11 @@ libz_a-uncompr.obj: uncompr.c
 @am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-uncompr.obj `if test -f 'uncompr.c'; then $(CYGPATH_W) 'uncompr.c'; else $(CYGPATH_W) '$(srcdir)/uncompr.c'; fi`
 
 libz_a-zutil.o: zutil.c
-@am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-zutil.o -MD -MP -MF $(DEPDIR)/libz_a-zutil.Tpo -c -o libz_a-zutil.o `test -f 'zutil.c' || echo '$(srcdir)/'`zutil.c
+@am__fastdepCC_TRUE@	+$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-zutil.o -MD -MP -MF $(DEPDIR)/libz_a-zutil.Tpo -c -o libz_a-zutil.o `test -f 'zutil.c' || echo '$(srcdir)/'`zutil.c
 @am__fastdepCC_TRUE@	$(AM_V_at)$(am__mv) $(DEPDIR)/libz_a-zutil.Tpo $(DEPDIR)/libz_a-zutil.Po
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	$(AM_V_CC)source='zutil.c' object='libz_a-zutil.o' libtool=no @AMDEPBACKSLASH@
 @AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
-@am__fastdepCC_FALSE@	$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-zutil.o `test -f 'zutil.c' || echo '$(srcdir)/'`zutil.c
+@am__fastdepCC_FALSE@	+$(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -c -o libz_a-zutil.o `test -f 'zutil.c' || echo '$(srcdir)/'`zutil.c
 
 libz_a-zutil.obj: zutil.c
 @am__fastdepCC_TRUE@	$(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libz_a_CFLAGS) $(CFLAGS) -MT libz_a-zutil.obj -MD -MP -MF $(DEPDIR)/libz_a-zutil.Tpo -c -o libz_a-zutil.obj `if test -f 'zutil.c'; then $(CYGPATH_W) 'zutil.c'; else $(CYGPATH_W) '$(srcdir)/zutil.c'; fi`
-- 
2.28.0
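
The `+` prefix added throughout this patch matters because GNU Make is documented to make its jobserver available only to recipes it treats as recursive (those that use `$(MAKE)` or start with `+`). For other recipes, the jobserver option may still be visible in `MAKEFLAGS` while the underlying descriptors were not passed on. The C below is a minimal sketch of how a tool might detect a usable pipe-based jobserver under that assumption. It is not the code from this series; `parse_jobserver_auth`, `fd_is_open` and `jobserver_available` are hypothetical names, and only the `--jobserver-auth=R,W` form (and the older `--jobserver-fds=R,W` spelling) is handled, not the `fifo:PATH` form used by newer GNU Make releases.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: extract the read/write descriptors of a
   pipe-based jobserver from a MAKEFLAGS string.  Returns 1 and fills
   *RFD / *WFD on success, 0 otherwise.  The "fifo:PATH" form is
   deliberately not recognized in this sketch.  */
int
parse_jobserver_auth (const char *makeflags, int *rfd, int *wfd)
{
  static const char *const prefixes[] = {
    "--jobserver-auth=",	/* GNU Make 4.2 and later.  */
    "--jobserver-fds=",		/* Older GNU Make releases.  */
  };

  if (makeflags == NULL)
    return 0;

  for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++)
    {
      const char *p = strstr (makeflags, prefixes[i]);
      int r, w;

      if (p == NULL)
	continue;
      p += strlen (prefixes[i]);
      /* "R,W" are two non-negative descriptor numbers.  */
      if (sscanf (p, "%d,%d", &r, &w) == 2 && r >= 0 && w >= 0)
	{
	  *rfd = r;
	  *wfd = w;
	  return 1;
	}
    }
  return 0;
}

/* Return nonzero if FD refers to an open file descriptor.  */
int
fd_is_open (int fd)
{
  return fcntl (fd, F_GETFD) != -1;
}

/* A jobserver is usable only if MAKEFLAGS names one *and* both of its
   descriptors were actually inherited; the second check covers the
   case of a recipe that Make did not treat as recursive.  */
int
jobserver_available (int *rfd, int *wfd)
{
  return parse_jobserver_auth (getenv ("MAKEFLAGS"), rfd, wfd)
	 && fd_is_open (*rfd) && fd_is_open (*wfd);
}
```

Once `jobserver_available` succeeds, a client acquires each additional job slot by reading one byte from the read descriptor and writes that same byte back to the write descriptor when the job finishes; every process already owns one implicit slot that needs no token.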


* [PATCH 5/6] Add invoke documentation
  2020-08-20 22:00 [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine Giuliano Belinassi
                   ` (3 preceding siblings ...)
  2020-08-20 22:00 ` [PATCH 4/6] Add `+' for Jobserver Integration Giuliano Belinassi
@ 2020-08-20 22:00 ` Giuliano Belinassi
  2020-08-20 22:00 ` [PATCH 6/6] New tests for parallel compilation feature Giuliano Belinassi
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Giuliano Belinassi @ 2020-08-20 22:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, hubicka

Add documentation about how to invoke GCC in order to use parallel
compilation.

gcc/ChangeLog:
2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>

	* doc/invoke.texi: Document -fparallel-jobs=.
---
 gcc/doc/invoke.texi | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 70dc1ab73a1..18cebf99dfd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -504,7 +504,8 @@ Objective-C and Objective-C++ Dialects}.
 -fno-sched-spec  -fno-signed-zeros @gol
 -fno-toplevel-reorder  -fno-trapping-math  -fno-zero-initialized-in-bss @gol
 -fomit-frame-pointer  -foptimize-sibling-calls @gol
--fpartial-inlining  -fpeel-loops  -fpredictive-commoning @gol
+-fpartial-inlining  -fparallel-jobs=@var{n} @gol
+-fpeel-loops  -fpredictive-commoning @gol
 -fprefetch-loop-arrays @gol
 -fprofile-correction @gol
 -fprofile-use  -fprofile-use=@var{path} -fprofile-partial-training @gol
@@ -14511,6 +14512,35 @@ of the function name, it is considered to be a match.  For C99 and C++
 extended identifiers, the function name must be given in UTF-8, not
 using universal character names.
 
+@item -fparallel-jobs=@var{n}
+@opindex fparallel-jobs
+This option is experimental.
+
+This option enables parallel compilation of files using a maximum of
+@var{n} parallel jobs.  When invoked, it tries to distribute the symbols
+within the file into multiple partitions and compile them in parallel.
+
+For now, private symbols are partitioned together with the public symbols
+that reference them, to avoid changing the code layout when
+compiling.  This means that compiling a file
+with very few public symbols will not provide noticeable improvements
+in compilation time.  However, you can use
+@option{--param=promote-statics=1} to allow GCC to automatically
+promote a symbol to be globally available, improving compilation
+performance in exchange for changing the code layout.
+
+You can also specify @option{-fparallel-jobs=jobserver} to use GNU make's
+job server mode to determine the number of parallel jobs.  This
+is useful when the Makefile calling GCC is already executing in parallel.
+You must prepend a @samp{+} to the command recipe in the parent Makefile
+for this to work.  This option likely only works if @env{MAKE} is
+GNU make.  If you specify @option{-fparallel-jobs=auto}, GCC will try to
+automatically detect a running GNU make's job server.
+
+An extra parameter, @option{--param=balance-partitions=0}, can be used to
+avoid balancing created partitions.  This should only be used to debug
+the compiler.
+
 @item -fpatchable-function-entry=@var{N}[,@var{M}]
 @opindex fpatchable-function-entry
 Generate @var{N} NOPs right at the beginning
-- 
2.28.0


* [PATCH 6/6] New tests for parallel compilation feature
  2020-08-20 22:00 [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine Giuliano Belinassi
                   ` (4 preceding siblings ...)
  2020-08-20 22:00 ` [PATCH 5/6] Add invoke documentation Giuliano Belinassi
@ 2020-08-20 22:00 ` Giuliano Belinassi
  2020-08-21 21:08 ` [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine Josh Triplett
  2020-08-24 12:50 ` Richard Biener
  7 siblings, 0 replies; 31+ messages in thread
From: Giuliano Belinassi @ 2020-08-20 22:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, hubicka

Add new tests for the parallel compilation engine.
They mainly cover symbol-promotion clashes and
incorrect early assembler output.

2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>

	* gcc.dg/parallel-early-constant.c: New test.
	* gcc.dg/parallel-static-1.c: New test.
	* gcc.dg/parallel-static-2.c: New test.
	* gcc.dg/parallel-static-clash-1.c: New test.
	* gcc.dg/parallel-static-clash-aux.c: New test.
---
 .../gcc.dg/parallel-early-constant.c          | 22 ++++++++++++++++++
 gcc/testsuite/gcc.dg/parallel-static-1.c      | 21 +++++++++++++++++
 gcc/testsuite/gcc.dg/parallel-static-2.c      | 21 +++++++++++++++++
 .../gcc.dg/parallel-static-clash-1.c          | 23 +++++++++++++++++++
 .../gcc.dg/parallel-static-clash-aux.c        | 14 +++++++++++
 5 files changed, 101 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/parallel-early-constant.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-static-1.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-static-2.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-1.c
 create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-aux.c

diff --git a/gcc/testsuite/gcc.dg/parallel-early-constant.c b/gcc/testsuite/gcc.dg/parallel-early-constant.c
new file mode 100644
index 00000000000..fc8c5a986ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/parallel-early-constant.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-fparallel-jobs=2 --param=balance-partitions=0" } */
+
+#define A "This is a long test that tests the structure initialization"
+#define B A,A
+#define C B,B,B,B
+#define D C,C,C,C
+
+const char *foo1 ()
+{
+  return A;
+}
+
+int foo2 ()
+{
+  return 42;
+}
+
+int main()
+{
+  char *subs[]={ D, D, D, D, D, D, D, D, D, D, D, D, D, D, D};
+}
diff --git a/gcc/testsuite/gcc.dg/parallel-static-1.c b/gcc/testsuite/gcc.dg/parallel-static-1.c
new file mode 100644
index 00000000000..cf1cc7df93d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/parallel-static-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-fparallel-jobs=2 --param=balance-partitions=0" } */
+
+static int global_var;
+
+int foo1(void)
+{
+  global_var = 1;
+}
+
+int foo2(void)
+{
+  global_var = 2;
+}
+
+int main ()
+{
+  foo1 ();
+  foo2 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/parallel-static-2.c b/gcc/testsuite/gcc.dg/parallel-static-2.c
new file mode 100644
index 00000000000..44f5b0d5a02
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/parallel-static-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-fparallel-jobs=2 --param=balance-partitions=0" } */
+
+int foo1(void)
+{
+  static int var;
+  var = 1;
+}
+
+int foo2(void)
+{
+  static int var;
+  var = 2;
+}
+
+int main ()
+{
+  foo1 ();
+  foo2 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/parallel-static-clash-1.c b/gcc/testsuite/gcc.dg/parallel-static-clash-1.c
new file mode 100644
index 00000000000..37a01e28b1b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/parallel-static-clash-1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-options "-fparallel-jobs=2 --param=balance-partitions=0 --param=promote-statics=1" } */
+/* { dg-additional-sources "parallel-static-clash-aux.c" } */
+
+int file2_c ();
+
+static int __attribute__ ((noinline))
+private ()
+{
+  return 42;
+}
+
+int
+file1_c ()
+{
+  return private ();
+}
+
+int
+main ()
+{
+  return file1_c () + file2_c ();
+}
diff --git a/gcc/testsuite/gcc.dg/parallel-static-clash-aux.c b/gcc/testsuite/gcc.dg/parallel-static-clash-aux.c
new file mode 100644
index 00000000000..aac473933a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/parallel-static-clash-aux.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-fparallel-jobs=2 --param=balance-partitions=0" } */
+
+static int __attribute__ ((noinline))
+private ()
+{
+  return -42;
+}
+
+int
+file2_c ()
+{
+  return private ();
+}
-- 
2.28.0


* Re: [PATCH 4/6] Add `+' for Jobserver Integration
  2020-08-20 22:00 ` [PATCH 4/6] Add `+' for Jobserver Integration Giuliano Belinassi
@ 2020-08-20 22:33   ` Joseph Myers
  2020-08-24 13:19     ` Richard Biener
  2020-08-27 15:38     ` Jan Hubicka
  0 siblings, 2 replies; 31+ messages in thread
From: Joseph Myers @ 2020-08-20 22:33 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: gcc-patches, hubicka

On Thu, 20 Aug 2020, Giuliano Belinassi via Gcc-patches wrote:

>  libbacktrace/Makefile.in |   2 +-
>  zlib/Makefile.in         |  64 ++++++------

These directories use makefiles generated by automake.  Rather than 
modifying the generated files, you need to modify the sources (whether 
that's Makefile.am, or code in automake itself - if in automake itself, we 
should wait for an actual new automake release before updating the version 
used in GCC).

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.
  2020-08-20 22:00 [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine Giuliano Belinassi
                   ` (5 preceding siblings ...)
  2020-08-20 22:00 ` [PATCH 6/6] New tests for parallel compilation feature Giuliano Belinassi
@ 2020-08-21 21:08 ` Josh Triplett
  2020-08-22 21:04   ` Giuliano Belinassi
  2020-08-24 12:50 ` Richard Biener
  7 siblings, 1 reply; 31+ messages in thread
From: Josh Triplett @ 2020-08-21 21:08 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: gcc-patches

On Thu, Aug 20, 2020 at 07:00:13PM -0300, Giuliano Belinassi wrote:
> This patch series add a new flag "-fparallel-jobs=" to control if the
> compiler should try to compile the current file in parallel.
[...]
> Bootstrapped and Regtested on Linux x86_64.
> 
> Giuliano Belinassi (6):
>   Modify gcc driver for parallel compilation
>   Implement a new partitioner for parallel compilation
>   Implement fork-based parallelism engine
>   Add `+' for Jobserver Integration
>   Add invoke documentation
>   New tests for parallel compilation feature

Very nice!

I'm interested in testing this on a highly parallel system. What
baseline do these patches apply to?  They don't seem to apply to GCC
trunk.

Also, I tried to bootstrap the current tip of the devel/autopar_devel
branch, but ended up with compiler segfaults that all look like this:
../../gcc/zlib/compress.c:86:1: internal compiler error: Segmentation fault
   86 | }
      | ^

* Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.
  2020-08-21 21:08 ` [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine Josh Triplett
@ 2020-08-22 21:04   ` Giuliano Belinassi
  2020-08-24 16:44     ` Josh Triplett
  0 siblings, 1 reply; 31+ messages in thread
From: Giuliano Belinassi @ 2020-08-22 21:04 UTC (permalink / raw)
  To: Josh Triplett; +Cc: gcc-patches

Hi, Josh

On 08/21, Josh Triplett wrote:
> On Thu, Aug 20, 2020 at 07:00:13PM -0300, Giuliano Belinassi wrote:
> > This patch series add a new flag "-fparallel-jobs=" to control if the
> > compiler should try to compile the current file in parallel.
> [...]
> > Bootstrapped and Regtested on Linux x86_64.
> > 
> > Giuliano Belinassi (6):
> >   Modify gcc driver for parallel compilation
> >   Implement a new partitioner for parallel compilation
> >   Implement fork-based parallelism engine
> >   Add `+' for Jobserver Integration
> >   Add invoke documentation
> >   New tests for parallel compilation feature
> 
> Very nice!

Thank you for your interest in this :)

> 
> I'm interested in testing this on a highly parallel system. What
> baseline do these patches apply to?  They don't seem to apply to GCC
> trunk.

Hummm, this was supposed to work on trunk out of the box. However,
there is a high probability that I messed up something while rebasing.
I will post a version 2 when I get more comments and after fixing
the Makefile issue that Joseph pointed out in another e-mail.

If you want to test it on a highly parallel system, I think it would be
cool to also see how it behaves with --param=promote-statics=1, as it
increases the opportunities for parallelism. :)

> 
> Also, I tried to bootstrap the current tip of the devel/autopar_devel
> branch, but ended up with compiler segfaults that all look like this:
> ../../gcc/zlib/compress.c:86:1: internal compiler error: Segmentation fault
>    86 | }
>       | ^

Well, there was once a bug in this branch when compiling with -flto that
caused the assembler output file not to be properly initialized early
enough, resulting in the LTO LGEN stage writing into an invalid FILE pointer.
I fixed this during rebasing but forgot to push it to the autopar_devel
branch. I have now pushed the recent changes to autopar_devel,
which fix this issue.

In any case, -fparallel-jobs= should NOT be used together with -flto.
Although I used part of the LTO engine to develop this feature,
they are meant for distinct things. I guess I should emit a warning
about that in the next version :)
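A driver-side check for the warning mentioned above could look roughly like the sketch below. It is a standalone illustration, not GCC driver code. The function name is hypothetical, and the sketch assumes that only `-flto`, `-flto=...` and `-fno-lto` toggle LTO, with the last one on the command line winning.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if the command line enables both
   LTO and -fparallel-jobs=, i.e. the combination that should warn.
   Assumes the last of -flto / -flto=... / -fno-lto wins.  */
static bool
lto_conflicts_with_parallel_jobs (int argc, const char *argv[])
{
  bool lto = false;
  bool parallel = false;

  for (int i = 1; i < argc; i++)
    {
      const char *arg = argv[i];

      /* -flto-partition=... and similar do not enable LTO by themselves.  */
      if (strcmp (arg, "-flto") == 0 || strncmp (arg, "-flto=", 6) == 0)
        lto = true;
      else if (strcmp (arg, "-fno-lto") == 0)
        lto = false;
      else if (strncmp (arg, "-fparallel-jobs=", 16) == 0)
        parallel = true;
    }

  return lto && parallel;
}
```

Because the last LTO switch wins, `-flto=auto -fno-lto -fparallel-jobs=auto` would not trigger the warning.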

Also, I just tested bootstrap with

../gcc/configure --disable-multilib --enable-languages=c,c++

on x86_64 linux and it is working.

Thank you,
Giuliano.

* Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.
  2020-08-20 22:00 [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine Giuliano Belinassi
                   ` (6 preceding siblings ...)
  2020-08-21 21:08 ` [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine Josh Triplett
@ 2020-08-24 12:50 ` Richard Biener
  2020-08-24 15:13   ` Giuliano Belinassi
  7 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2020-08-24 12:50 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: GCC Patches, Jan Hubicka

On Fri, Aug 21, 2020 at 12:00 AM Giuliano Belinassi
<giuliano.belinassi@usp.br> wrote:
>
> This patch series add a new flag "-fparallel-jobs=" to control if the
> compiler should try to compile the current file in parallel.
>
> There are three modes which is supported by now:
>
> 1. -fparallel-jobs=<N>: Try to compile the file using a maximum of N
> jobs.
>
> 2. -fparallel-jobs=jobserver: Check if there is a running GNU Make
> Jobserver. If positive, communicate with it in order to launch jobs,
> but alert the user if the jobserver was not found, since it requires
> modifications in the project Makefile.
>
> 3. -fparallel-jobs=auto: Same as 2., but quietly fall back to a maximum
> of 2 jobs if the jobserver was not found.
>
> The parallelization works by using a modified LTO engine, as no IR is
> dumped into the disk, and a new partitioner is employed to find
> symbols which must be partitioned together.
>
> In order to implement the parallelism feature, we:
>
> 1. The driver will pass a hidden -fsplit-outputs=<filename> to cc1*.
>
> 2. After IPA, cc1* will search for symbols in which must be partitioned
> together.  If the user allows GCC to automatically promote symbols to
> globals through "--param=promote-statics=1" for a better parallel
> compilation performance, it will also be done.  However, if it decides
> that partitioning is a bad idea, it will continue with a default serial
> compilation, and the additional <filename> will not be created.  It will
> avoid compiling in parallel if and only if:
>
>   * File size exceeds the minimum file size specified by LTO default
>   --param=lto-min-partition.

less than the minimum size I suppose.

>   * The partitioner is unable to find any point of partitioning in the
>   file.

It might make sense to increase the minimum partition size and also
check the partitioning result against unreasonable bias (one very
large and one very small partition).
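Both suggestions (a larger minimum size and a bias check) could be combined into a single pre-check that runs before forking. The sketch below is illustrative only. The function name, the per-partition size units and the `max_ratio` threshold are assumptions, not existing GCC parameters. A real minimum would presumably be derived from `--param=lto-min-partition`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: decide whether a proposed partitioning is worth
   compiling in parallel.  SIZES holds an estimated size for each of the
   N partitions (e.g. in instructions).  Reject it when there is only one
   partition, when the unit is smaller than MIN_TOTAL, or when the largest
   partition is more than MAX_RATIO times the smallest one.  */
static bool
partitioning_worthwhile (const long *sizes, size_t n,
                         long min_total, long max_ratio)
{
  assert (n == 0 || sizes != NULL);

  if (n < 2)
    return false;

  long total = 0, smallest = sizes[0], largest = sizes[0];
  for (size_t i = 0; i < n; i++)
    {
      total += sizes[i];
      if (sizes[i] < smallest)
        smallest = sizes[i];
      if (sizes[i] > largest)
        largest = sizes[i];
    }

  if (total < min_total)
    return false;

  /* Compare largest/smallest without dividing; an empty partition counts
     as maximally unbalanced.  */
  if (smallest <= 0 || largest > max_ratio * smallest)
    return false;

  return true;
}
```

For example, partition sizes {5000, 10} would be rejected even though their total passes the minimum, because one job would do almost all the work.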

> 3. cc1* will fork itself; one fork for each partition. Each child
> process will apply its partition mask generated by the partitioner
> and write a new assembler name file to <filename> pointed by the driver.

For the first partition there's no fork (but the main process is used) and
the main output file will be used, correct?
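Assuming the answer to the question above is yes, the fork scheme can be sketched as follows. This is a POSIX-only illustration, not the patch's implementation. `compile_partition` is a hypothetical callback that stands in for applying partition I's mask and emitting its assembler file. The parent handles partition 0 itself and then reaps every child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback: compile partition I, return 0 on success.  */
typedef int (*compile_fn) (int i);

/* Compile N partitions: 1..N-1 in forked children, partition 0 in the
   calling process, which keeps writing the main output file.  Returns
   true only if every partition succeeded.  */
static bool
compile_partitions (int n, compile_fn compile_partition)
{
  assert (n >= 1);
  pid_t *pids = malloc ((size_t) n * sizeof *pids);
  bool ok = true;

  if (!pids)
    return false;

  for (int i = 1; i < n; i++)
    {
      pids[i] = fork ();
      if (pids[i] == 0)
        /* Child: _exit skips flushing stdio buffers copied from the parent.  */
        _exit (compile_partition (i) == 0 ? 0 : 1);
      if (pids[i] < 0)
        ok = false;  /* fork failed; a real driver might fall back to serial.  */
    }

  /* The parent compiles partition 0 while the children run.  */
  if (compile_partition (0) != 0)
    ok = false;

  for (int i = 1; i < n; i++)
    {
      int status;

      if (pids[i] < 0)
        continue;
      if (waitpid (pids[i], &status, 0) < 0
          || !WIFEXITED (status) || WEXITSTATUS (status) != 0)
        ok = false;
    }

  free (pids);
  return ok;
}

/* Stand-ins for real work, used only to exercise the sketch.  */
static int
dummy_partition (int i)
{
  (void) i;
  return 0;
}

static int
fail_on_two (int i)
{
  return i == 2;
}
```

In this scheme a failure in any child is reported through its exit status, so the driver side only needs a single success flag.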

> 4. The driver will open each file and partially link them together into
> a single .o file, if -c was requested, else into a binary.  -S and -E
> is unsupported for now and probably will remain so.

That also applies to -save-temps mode I assume which makes
debugging issues a bit tricky and involves manual invocation
of the cc1 command to have the file with the output filenames preserved.

>
> Speedups ranged from 0.95x to 1.9x on a Quad-Core Intel Core-i7 8565U
> when testing with two files in GCC, as stated in the following table.
> The test was the result of a single execution with a previous warm up
> execution. The compiled GCC had checking enabled, and therefore release
> version might have better timings in both sequential and parallel, but the
> speedup may remain the same.
>
> |                |            | Without Static | With Static |   Max   |
> | File           | Sequential |    Promotion   |  Promotion  | Speedup |
> |----------------|------------|----------------|-------------|---------|
> | gimple-match.c |     60s    |       63s      |     34s     |   1.7x  |
> | insn-emit.c    |     37s    |       19s      |     20s     |   1.9x  |
>
> Notice that we have a slowdown in some cases when it is enabled, that
> is why the parallelism feature is enabled with a flag for now.
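For reference, the "Max Speedup" column appears to be the sequential time divided by the faster of the two parallel times. The ratios below are computed only from the numbers in the table. The listed values seem to be truncated, not rounded, to one decimal place (1.76 is shown as 1.7x).

```latex
S = \frac{T_{\mathrm{seq}}}{\min\left(T_{\mathrm{no\,promo}},\, T_{\mathrm{promo}}\right)}

S_{\texttt{gimple-match.c}} = \frac{60}{\min(63, 34)} = \frac{60}{34} \approx 1.76,
\qquad
S_{\texttt{insn-emit.c}} = \frac{37}{\min(19, 20)} = \frac{37}{19} \approx 1.95

\text{slowest case (gimple-match.c, no promotion): } \frac{60}{63} \approx 0.95
```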

One reason why promote-statics is not enabled by default is that
it creates new hidden symbols (LTO does so as well) which might
be undesirable.  If deemed OK in general we could enable it by
default.  Note that originally I wanted to have -fparallel-jobs=auto
be enabled by default which should not end up with visible changes
like this(?)

> Bootstrapped and Regtested on Linux x86_64.
>
> Giuliano Belinassi (6):
>   Modify gcc driver for parallel compilation
>   Implement a new partitioner for parallel compilation
>   Implement fork-based parallelism engine
>   Add `+' for Jobserver Integration
>   Add invoke documentation
>   New tests for parallel compilation feature
>
>  gcc/Makefile.in                               |    6 +-
>  gcc/cgraph.c                                  |   16 +
>  gcc/cgraph.h                                  |   13 +
>  gcc/cgraphunit.c                              |  198 ++-
>  gcc/common.opt                                |    4 +
>  gcc/doc/invoke.texi                           |   32 +-
>  gcc/gcc.c                                     | 1219 +++++++++++++----
>  gcc/ipa-fnsummary.c                           |    2 +-
>  gcc/ipa-icf.c                                 |    3 +-
>  gcc/ipa-visibility.c                          |    3 +-
>  gcc/ipa.c                                     |    4 +-
>  gcc/jobserver.cc                              |  168 +++
>  gcc/jobserver.h                               |   33 +
>  gcc/lto-cgraph.c                              |  172 +++
>  gcc/{lto => }/lto-partition.c                 |  463 ++++++-
>  gcc/{lto => }/lto-partition.h                 |    4 +-
>  gcc/lto-streamer.h                            |    4 +
>  gcc/lto/Make-lang.in                          |    4 +-
>  gcc/lto/lto.c                                 |    2 +-
>  gcc/params.opt                                |    8 +
>  gcc/symtab.c                                  |   46 +-
>  gcc/testsuite/driver/a.c                      |    6 +
>  gcc/testsuite/driver/b.c                      |    6 +
>  gcc/testsuite/driver/driver.exp               |   80 ++
>  gcc/testsuite/driver/empty.c                  |    0
>  gcc/testsuite/driver/foo.c                    |    7 +
>  .../gcc.dg/parallel-early-constant.c          |   22 +
>  gcc/testsuite/gcc.dg/parallel-static-1.c      |   21 +
>  gcc/testsuite/gcc.dg/parallel-static-2.c      |   21 +
>  .../gcc.dg/parallel-static-clash-1.c          |   23 +
>  .../gcc.dg/parallel-static-clash-aux.c        |   14 +
>  gcc/toplev.c                                  |   58 +-
>  gcc/toplev.h                                  |    3 +
>  gcc/tree.c                                    |   23 +-
>  gcc/varasm.c                                  |   26 +-
>  intl/Makefile.in                              |    2 +-
>  libbacktrace/Makefile.in                      |    2 +-
>  libcpp/Makefile.in                            |    2 +-
>  libdecnumber/Makefile.in                      |    2 +-
>  libiberty/Makefile.in                         |  212 +--
>  zlib/Makefile.in                              |   64 +-
>  41 files changed, 2539 insertions(+), 459 deletions(-)
>  create mode 100644 gcc/jobserver.cc
>  create mode 100644 gcc/jobserver.h
>  rename gcc/{lto => }/lto-partition.c (78%)
>  rename gcc/{lto => }/lto-partition.h (89%)
>  create mode 100644 gcc/testsuite/driver/a.c
>  create mode 100644 gcc/testsuite/driver/b.c
>  create mode 100644 gcc/testsuite/driver/driver.exp
>  create mode 100644 gcc/testsuite/driver/empty.c
>  create mode 100644 gcc/testsuite/driver/foo.c
>  create mode 100644 gcc/testsuite/gcc.dg/parallel-early-constant.c
>  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-aux.c
>
> --
> 2.28.0
>

* Re: [PATCH 1/6] Modify gcc driver for parallel compilation
  2020-08-20 22:00 ` [PATCH 1/6] Modify gcc driver for parallel compilation Giuliano Belinassi
@ 2020-08-24 13:17   ` Richard Biener
  2020-08-24 18:06     ` Giuliano Belinassi
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2020-08-24 13:17 UTC (permalink / raw)
  To: Giuliano Belinassi, Joseph S. Myers; +Cc: GCC Patches, Jan Hubicka

On Fri, Aug 21, 2020 at 12:00 AM Giuliano Belinassi
<giuliano.belinassi@usp.br> wrote:
>
> Update the driver for parallel compilation. This process works as
> follows:
>
> When calling gcc, the driver will check if the flag
> "-fparallel-jobs" was provided by the user. If yes, then we will
> check what the desired output is, and whether it can be parallelized.
> There are the following cases:
>
> 1. -S or -E was provided: We can't run in parallel, as the output
>    cannot be easily merged together into one file.
>
> 2. -c was provided: When cc1* forks into multiple processes, it
>    must tell the driver where it stored its generated assembler files.
>    Therefore we pass a hidden "-fsplit-outputs=filename" to the compiler,
>    and we check if "filename" was created by it. If yes, we open it,
>    call the assembler for each generated asm file
>    (this file must not be empty), and link them together with
>    partial linking into a single .o file. This process is done for each
>    object file in the argument list.
>
> 3. -c was not provided, and the final product will be a binary: Here
>    we proceed exactly as in 2., but we skip the partial
>    linking, feeding the generated object files directly into the final link.
>
> For that to work, we had to heavily modify how the "execute" function
> works, extracting common code which is used multiple times, and
> also detecting when the command is a call to a compiler or an
> assembler, as can be seen in append_split_outputs.
>
> Finally, we added some tests which reflect all cases found when
> bootstrapping the compiler, so development of further features to the
> driver gets faster from now on.
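The three cases above reduce to a small decision on the driver's option state. A hedged sketch follows. The enum and function names are hypothetical, and the boolean parameters only stand in for the driver's flags (have_E, and the have_S that this patch adds).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: what the driver does with the split assembler outputs.  */
enum split_mode
{
  SPLIT_NONE,          /* No -fparallel-jobs, or -E/-S: compile serially.  */
  SPLIT_PARTIAL_LINK,  /* -c: assemble each part, then partially link
                          them (e.g. ld -r) into a single .o.  */
  SPLIT_FINAL_LINK     /* Building a binary: feed every part's object
                          file directly to the final link.  */
};

static enum split_mode
choose_split_mode (bool parallel_jobs, bool have_E, bool have_S, bool have_c)
{
  /* Preprocessed or assembler output cannot be merged into one file.  */
  if (!parallel_jobs || have_E || have_S)
    return SPLIT_NONE;
  return have_c ? SPLIT_PARTIAL_LINK : SPLIT_FINAL_LINK;
}
```

Keeping this decision in one place makes it easy to see that case 3 is case 2 minus the partial link.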

Few comments inline, Joseph may want to comment on the overall
structure as driver maintainer (CCed).

I know I asked for the changes on the branch to be squashed but
the diff below is quite unreadable with the ChangeLog not helping
the overall refactoring much.  Is it possible to do some of the
factoring/refactoring without any functionality change to make the
actual diff easier to follow?

Thanks,
Richard.

> gcc/ChangeLog
> 2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>
>         * common.opt (fsplit-outputs): New flag.
>         (fparallel-jobs): New flag.
>         * gcc.c (extra_arg_storer): New class.
>         (have_S): New variable.
>         (struct command): Move from execute.
>         (is_compiler): New function.
>         (is_assembler): New function.
>         (get_number_of_args): New function.
>         (get_file_by_lines): New function.
>         (identify_asm_file): New function.
>         (struct infile): New attribute temp_additional_asm.
>         (current_infile): New variable.
>         (get_path_to_ld): New function.
>         (has_hidden_E): New function.
>         (sort_asm_files): New function.
>         (append_split_outputs): New function.
>         (print_command): New function.
>         (print_commands): New function.
>         (print_argbuf): New function.
>         (handle_verbose): Extracted from execute.
>         (append_valgrind): Same as above.
>         (async_launch_commands): Same as above.
>         (await_commands_to_finish): Same as above.
>         (split_commands): Same as above.
>         (parse_argbuf): Same as above.
>         (execute): Refactor.
>         (fsplit_arg): New function.
>         (alloc_infile): Initialize infiles with 0.
>         (process_command): Remember when -S was passed.
>         (do_spec_on_infiles): Remember current infile being processed.
>         (maybe_run_linker): Replace object files when -o is an executable.
>         (finalize): Deinitialize temp_object_files.
>
> gcc/testsuite/ChangeLog:
> 2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>
>         * driver/driver.exp: New test.
>         * driver/a.c: New file.
>         * driver/b.c: New file.
>         * driver/empty.c: New file.
>         * driver/foo.c: New file.
> ---
>  gcc/common.opt                  |    4 +
>  gcc/gcc.c                       | 1219 ++++++++++++++++++++++++-------
>  gcc/testsuite/driver/a.c        |    6 +
>  gcc/testsuite/driver/b.c        |    6 +
>  gcc/testsuite/driver/driver.exp |   80 ++
>  gcc/testsuite/driver/empty.c    |    0
>  gcc/testsuite/driver/foo.c      |    7 +
>  7 files changed, 1049 insertions(+), 273 deletions(-)
>  create mode 100644 gcc/testsuite/driver/a.c
>  create mode 100644 gcc/testsuite/driver/b.c
>  create mode 100644 gcc/testsuite/driver/driver.exp
>  create mode 100644 gcc/testsuite/driver/empty.c
>  create mode 100644 gcc/testsuite/driver/foo.c
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 4b08e91859f..4aa3ad8c95b 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3465,4 +3465,8 @@ fipa-ra
>  Common Report Var(flag_ipa_ra) Optimization
>  Use caller save register across calls if possible.
>
> +fsplit-outputs=
> +Common Joined Var(split_outputs)
> +-fsplit-outputs=<tempfile>  Filename in which current Compilation Unit will be split to.
> +
>  ; This comment is to ensure we retain the blank line above.
> diff --git a/gcc/gcc.c b/gcc/gcc.c
> index 10bc9881aed..c276a11ca7a 100644
> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -343,6 +343,74 @@ static struct obstack obstack;
>
>  static struct obstack collect_obstack;
>
> +/* This is used to store new argv arrays created dynamically to avoid memory
> +   leaks.  */
> +
> +class extra_arg_storer
> +{
> +  public:
> +
> +    /* Initialize the vec with a default size.  */
> +
> +    extra_arg_storer ()
> +      {
> +       string_vec.create (8);
> +       extra_args.create (64);
> +      }
> +
> +    /* Create new array of strings of size N.  */
> +    const char **create_new (size_t n)
> +      {
> +       const char **ret = XNEWVEC (const char *, n);
> +       extra_args.safe_push (ret);
> +       return ret;
> +      }
> +
> +    char *create_string (size_t n)
> +      {
> +       char *ret = XNEWVEC (char, n);
> +       string_vec.safe_push (ret);
> +       return ret;
> +      }
> +
> +    void store (char *str)
> +      {
> +       string_vec.safe_push (str);
> +      }
> +
> +    ~extra_arg_storer ()
> +      {
> +       release_extra_args ();
> +       release_string_vec ();
> +      }
> +
> +
> +  private:
> +
> +    /* Release all allocated argv arrays.  */
> +    void release_extra_args ()
> +      {
> +       size_t i;
> +
> +       for (i = 0; i < extra_args.length (); i++)
> +         free (extra_args[i]);
> +       extra_args.release ();
> +      }
> +
> +    void release_string_vec ()
> +      {
> +       size_t i;
> +
> +       for (i = 0; i < string_vec.length (); i++)
> +         free (string_vec[i]);
> +       string_vec.release ();
> +      }
> +
> +    /* Data structure to hold all arrays.  */
> +    vec<const char **> extra_args;
> +    vec<char *> string_vec;
> +};
> +
>  /* Forward declaration for prototypes.  */
>  struct path_prefix;
>  struct prefix_list;
> @@ -1993,6 +2061,9 @@ static int have_o = 0;
>  /* Was the option -E passed.  */
>  static int have_E = 0;
>
> +/* Was the option -S passed.  */
> +static int have_S = 0;
> +
>  /* Pointer to output file name passed in with -o. */
>  static const char *output_file = 0;
>
> @@ -3056,158 +3127,522 @@ add_sysrooted_hdrs_prefix (struct path_prefix *pprefix, const char *prefix,
>               require_machine_suffix, os_multilib);
>  }
>
> -
> -/* Execute the command specified by the arguments on the current line of spec.
> -   When using pipes, this includes several piped-together commands
> -   with `|' between them.
> +struct command
> +{
> +  const char *prog;            /* program name.  */
> +  const char **argv;           /* vector of args.  */
> +};
>
> -   Return 0 if successful, -1 if failed.  */
> +#define EMPTY_CMD(x) (!((x).prog))  /* Is the provided CMD empty?  */
> +
> +/* Check if arg is a call to a compiler.  Return false if not, true if yes.  */
> +
> +static bool
> +is_compiler (const char *arg)

This and is_assembler should somehow magically fall out of
specs processing, specifically

> +{
> +  static const char *const compilers[] = {"cc1", "cc1plus", "f771"};

^^ this is incomplete.  Of course I don't know how to auto-infer
these but I think it must be possible from somewhere up the
call-chain?
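Regardless of where the program list finally comes from, is_compiler and is_assembler duplicate the same basename matching, and that code could be shared. Below is a minimal sketch with a hypothetical helper name. It uses strrchr instead of the manual scan. The lists are copied from the patch and are still incomplete, as noted above (for instance, gfortran's f951 and Objective-C's cc1obj are missing).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if the last path component of PATH
   equals one of the N names in LIST.  */
static bool
program_basename_in_list (const char *path, const char *const list[],
                          size_t n)
{
  assert (path != NULL);

  const char *slash = strrchr (path, '/');
  const char *base = slash ? slash + 1 : path;

  for (size_t i = 0; i < n; i++)
    if (strcmp (base, list[i]) == 0)
      return true;
  return false;
}

/* Lists as in the patch; see the note above about completeness.  */
static const char *const compilers[] = {"cc1", "cc1plus", "f771"};
static const char *const assemblers[] = {"as", "gas"};
```

With this helper, is_compiler (arg) would become program_basename_in_list (arg, compilers, ARRAY_SIZE (compilers)).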

> +  const char* ptr = arg;
> +
> +  size_t i;
> +
> +  /* Jump to last '/' of string.  */
> +  while (*arg)
> +    if (*arg++ == '/')
> +      ptr = arg;
> +
> +  /* Look if current character seems valid.  */
> +  gcc_assert (!(*ptr == '\0' ||  *ptr == '/'));
> +
> +  for (i = 0; i < ARRAY_SIZE (compilers); i++)
> +    {
> +      if (!strcmp (ptr, compilers[i]))
> +       return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Check if arg is a call to as.  Return false if not, true if yes.  */
> +
> +static bool
> +is_assembler (const char *arg)
> +{
> +  static const char *const assemblers[] = {"as", "gas"};
> +  const char* ptr = arg;
> +
> +  size_t i;
> +
> +  /* Jump to last '/' of string.  */
> +  while (*arg)
> +    if (*arg++ == '/')
> +      ptr = arg;
> +
> +  /* Look if current character seems valid.  */
> +  gcc_assert (!(*ptr == '\0' ||  *ptr == '/'));
> +
> +  for (i = 0; i < ARRAY_SIZE (assemblers); i++)
> +    {
> +      if (!strcmp (ptr, assemblers[i]))
> +       return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Get argv[] array length.  */
>
>  static int
> -execute (void)
> +get_number_of_args (const char *argv[])
> +{
> +  int argc;
> +
> +  for (argc = 0; argv[argc] != NULL; argc++)
> +    ;
> +
> +  return argc;
> +}
> +
> +static const char *fsplit_arg (extra_arg_storer *);
> +
> +/* Accumulate each line in lines vec.  Return true if file exists, false if
> +   not.  */
> +
> +static bool
> +get_file_by_lines (extra_arg_storer *storer, vec<char *> *lines, const char *name)
> +{
> +  int buf_size = 64, len = 0;
> +  char *buf = XNEWVEC (char, buf_size);
> +
> +
> +  FILE *file = fopen (name, "r");
> +
> +  if (!file)
> +    return false;
> +
> +  while (1)
> +    {
> +      if (!fgets (buf + len, buf_size, file))
> +       {
> +         free (buf); /* Release buffer we created unnecessarily.  */
> +         break;
> +       }
> +
> +      len = strlen (buf);
> +      if (buf[len - 1] == '\n') /* Check if we indeed read the entire line.  */
> +       {
> +         buf[len - 1] = '\0';
> +         /* Yes.  Insert into the lines vector.  */
> +         lines->safe_push (buf);
> +         len = 0;
> +
> +         /* Store the created string for future release.  */
> +         storer->store (buf);
> +         buf = XNEWVEC (char, buf_size);
> +       }
> +      else
> +       {
> +         /* No.  Increase the buffer size and read again.  */
> +         buf = XRESIZEVEC (char, buf, buf_size * 2);
> +       }
> +    }
> +
> +  if (lines->length () == 0)
> +    internal_error ("Empty file: %s", name);
> +
> +  fclose (file);
> +  return true;
> +}
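If I read get_file_by_lines correctly, buf_size is never updated after
XRESIZEVEC, so a line longer than roughly 2 * buf_size characters makes
fgets write past the end of buf; a final line without a trailing '\n' is
silently dropped, and buf leaks when fopen fails.  A minimal sketch of a
reader that grows the capacity together with the buffer (read_line is a
hypothetical helper using plain malloc/realloc rather than
XNEWVEC/XRESIZEVEC):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from FILE into a
   malloc'd string, without the trailing newline.  Returns NULL at end of
   file or on allocation failure.  */
static char *
read_line (FILE *file)
{
  size_t cap = 64, len = 0;
  char *buf = malloc (cap);

  if (!buf)
    return NULL;

  /* Each fgets call is bounded by the space actually left in BUF.  */
  while (fgets (buf + len, (int) (cap - len), file))
    {
      len += strlen (buf + len);
      if (len > 0 && buf[len - 1] == '\n')
        {
          buf[len - 1] = '\0';
          return buf;
        }
      if (len == cap - 1)
        {
          /* Buffer full but no newline yet: grow it and keep reading.  */
          char *bigger = realloc (buf, cap * 2);
          if (!bigger)
            {
              free (buf);
              return NULL;
            }
          buf = bigger;
          cap *= 2;
        }
    }

  /* EOF: keep a last line that had no trailing newline.  */
  if (len > 0)
    return buf;
  free (buf);
  return NULL;
}
```

get_file_by_lines could then just push read_line results until NULL, and
open the file before allocating anything.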
> +
> +static void
> +identify_asm_file (int argc, const char *argv[],
> +                  int *infile_pos, int *outfile_pos)
>  {
>    int i;
> -  int n_commands;              /* # of command.  */
> -  char *string;
> -  struct pex_obj *pex;
> -  struct command
> -  {
> -    const char *prog;          /* program name.  */
> -    const char **argv;         /* vector of args.  */
> -  };
> -  const char *arg;
>
> -  struct command *commands;    /* each command buffer with above info.  */
> +  static const char *asm_extension[] = {"s", "S"};
>
> -  gcc_assert (!processing_spec_function);
> +  bool infile_found = false;
> +  bool outfile_found = false;
>
> -  if (wrapper_string)
> +  for (i = 0; i < argc; i++)
>      {
> -      string = find_a_file (&exec_prefixes,
> -                           argbuf[0], X_OK, false);
> -      if (string)
> -       argbuf[0] = string;
> -      insert_wrapper (wrapper_string);
> +      const char *arg = argv[i];
> +      const char *ext = argv[i];
> +      unsigned j;
> +
> +      /* Jump to last '.' of string.  */
> +      while (*arg)
> +       if (*arg++ == '.')
> +         ext = arg;
> +
> +      if (!infile_found)
> +       for (j = 0; j < ARRAY_SIZE (asm_extension); ++j)
> +           if (!strcmp (ext, asm_extension[j]))
> +             {
> +               infile_found = true;
> +               *infile_pos = i;
> +               break;
> +             }
> +
> +      if (!outfile_found)
> +       if (!strcmp (ext, "-o"))
> +         {
> +           outfile_found = true;
> +           *outfile_pos = i+1;
> +         }
> +
> +      if (infile_found && outfile_found)
> +       return;
>      }
>
> -  /* Count # of piped commands.  */
> -  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> -    if (strcmp (arg, "|") == 0)
> -      n_commands++;
> +  gcc_assert (infile_found && outfile_found);
>
> -  /* Get storage for each command.  */
> -  commands = (struct command *) alloca (n_commands * sizeof (struct command));
> +}
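One possible robustness tweak in identify_asm_file: the argument
following -o is not skipped, so in principle an output name ending in .s
would be taken as the input.  A minimal sketch of the same search using
strrchr that skips the -o operand (find_asm_in_out is a hypothetical
name, and returning false stands in for the gcc_assert):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: set *IN to the index of the first argument ending in
   ".s" or ".S" and *OUT to the index of the operand of the first "-o".
   Returns false if either is missing.  */
static bool
find_asm_in_out (int argc, const char *const argv[], int *in, int *out)
{
  int i;

  *in = *out = -1;
  for (i = 0; i < argc; i++)
    {
      const char *dot = strrchr (argv[i], '.');

      if (*in < 0 && dot
          && (strcmp (dot, ".s") == 0 || strcmp (dot, ".S") == 0))
        *in = i;
      else if (*out < 0 && strcmp (argv[i], "-o") == 0 && i + 1 < argc)
        *out = ++i;   /* Skip the operand so it is never taken as input.  */
    }
  return *in >= 0 && *out >= 0;
}
```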
>
> -  /* Split argbuf into its separate piped processes,
> -     and record info about each one.
> -     Also search for the programs that are to be run.  */
> +/* Language is one of three things:
>
> -  argbuf.safe_push (0);
> +   1) The name of a real programming language.
> +   2) NULL, indicating that no one has figured out
> +   what it is yet.
> +   3) '*', indicating that the file should be passed
> +   to the linker.  */
> +struct infile
> +{
> +  const char *name;
> +  const char *language;
> +  const char *temp_additional_asm;
> +  struct compiler *incompiler;
> +  bool compiled;
> +  bool preprocessed;
> +};
>
> -  commands[0].prog = argbuf[0]; /* first command.  */
> -  commands[0].argv = argbuf.address ();
> +/* Also a vector of input files specified.  */
>
> -  if (!wrapper_string)
> +static struct infile *infiles;
> +static struct infile *current_infile = NULL;
> +
> +int n_infiles;
> +
> +static int n_infiles_alloc;
> +
> +static vec<const char *> temp_object_files;
> +
> +/* Get path to the configured ld.  */
> +
> +static const char *
> +get_path_to_ld (void)
> +{
> +  const char *ret = find_a_file (&exec_prefixes, LINKER_NAME, X_OK, false);
> +  if (!ret)
> +    ret = "ld";
> +
> +  return ret;
> +}
> +
> +/* Check if a hidden -E was passed as argument to something.  */
> +
> +static bool
> +has_hidden_E (int argc, const char *argv[])
> +{
> +  int i;
> +  for (i = 0; i < argc; ++i)
> +    if (!strcmp (argv[i], "-E"))
> +      return true;
> +
> +  return false;
> +}
> +
> +/* Assembler files in the container file are inserted as soon as they are
> +   ready.  Sort them so that builds are reproducible.  */

In principle the list of outputs is predetermined by the
scheduler compiling the partitions, so is there any reason
to write the file with the output names incrementally
rather than in one (sorted) go?

> +static void
> +sort_asm_files (vec <char *> *_lines)
> +{
> +  vec <char *> &lines = *_lines;
> +  int i, n = lines.length ();
> +  char **temp_buf = XALLOCAVEC (char *, n);
> +
> +  for (i = 0; i < n; i++)
> +    temp_buf[i] = lines[i];
> +
> +  for (i = 0; i < n; i++)
>      {
> -      string = find_a_file (&exec_prefixes, commands[0].prog, X_OK, false);
> -      if (string)
> -       commands[0].argv[0] = string;
> +      char *no_str = strtok (temp_buf[i], " ");
> +      char *name = strtok (NULL, "");
> +
> +      int pos = atoi (no_str);
> +      lines[pos] = name;
>      }
> +}
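sort_asm_files trusts each "<index> <name>" line: atoi accepts garbage,
an out-of-range index is not diagnosed, and a duplicated index leaves a
stale entry behind.  If the incremental file stays, a minimal sketch of
the reordering with validation (order_by_index is a hypothetical helper
that stores pointers into the input lines instead of modifying them):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: LINES holds N strings "<index> <name>" in completion
   order.  Point SORTED[index] at each name.  Returns false if an index is
   malformed, out of range or repeated; with N lines and no repeats, every
   slot ends up filled.  */
static bool
order_by_index (const char *const lines[], int n, const char *sorted[])
{
  int i;

  for (i = 0; i < n; i++)
    sorted[i] = NULL;

  for (i = 0; i < n; i++)
    {
      char *end;
      long pos;

      errno = 0;
      pos = strtol (lines[i], &end, 10);
      if (errno != 0 || end == lines[i] || *end != ' '
          || pos < 0 || pos >= n || sorted[pos] != NULL)
        return false;
      sorted[pos] = end + 1;   /* The name follows the single space.  */
    }
  return true;
}
```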
>
> -  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> -    if (arg && strcmp (arg, "|") == 0)
> -      {                                /* each command.  */
> -#if defined (__MSDOS__) || defined (OS2) || defined (VMS)
> -       fatal_error (input_location, "%<-pipe%> not supported");
> -#endif
> -       argbuf[i] = 0; /* Termination of command args.  */
> -       commands[n_commands].prog = argbuf[i + 1];
> -       commands[n_commands].argv
> -         = &(argbuf.address ())[i + 1];
> -       string = find_a_file (&exec_prefixes, commands[n_commands].prog,
> -                             X_OK, false);
> -       if (string)
> -         commands[n_commands].argv[0] = string;
> -       n_commands++;
> -      }
> +/* Append -fsplit-outputs=<tempfile> to all calls to compilers.  Fill in
> +   ADDITIONAL_LD if an additional call to LD is required to merge the
> +   resulting files.  */
>
> -  /* If -v, print what we are about to do, and maybe query.  */
> +static void
> +append_split_outputs (extra_arg_storer *storer,
> +                     struct command *additional_ld,
> +                     struct command **_commands,
> +                     int *_n_commands)
> +{
> +  int i;
>
> -  if (verbose_flag)
> +  struct command *commands = *_commands;
> +  int n_commands = *_n_commands;
> +
> +  const char **argv;
> +  int argc;
> +
> +  if (is_compiler (commands[0].prog))
> +    {
> +      argc = get_number_of_args (commands[0].argv);
> +      argv = storer->create_new (argc + 4);
> +
> +      memcpy (argv, commands[0].argv, argc * sizeof (const char *));
> +
> +      if (!has_hidden_E (argc, commands[0].argv))
> +       {
> +         const char *extra_argument = fsplit_arg (storer);
> +         argv[argc++] = extra_argument;
> +       }
> +
> +      if (have_c)
> +       {
> +         argv[argc++] = "-fPIE";
> +         argv[argc++] = "-fPIC";

Uh, I think this has to go away; it must be a leftover from some
early problems and is no longer necessary?

> +       }
> +
> +      argv[argc]   = NULL;
> +
> +      commands[0].argv = argv;
> +    }
> +
> +  else if (is_assembler (commands[0].prog))
>      {
> -      /* For help listings, put a blank line between sub-processes.  */
> -      if (print_help_list)
> -       fputc ('\n', stderr);
> +      vec<char *> additional_asm_files;
> +
> +      struct command orig;
> +      const char **orig_argv;
> +      int orig_argc;
> +      const char *orig_obj_file;
> +
> +      int infile_pos = -1;
> +      int outfile_pos = -1;
> +
> +      static const char *path_to_ld = NULL;
> +
> +      if (!current_infile->temp_additional_asm)
> +       {
> +         /* Return because we did not create an additional-asm file for this
> +            input.  */
> +
> +         return;
> +       }
> +
> +      additional_asm_files.create (2);
> +
> +      if (!get_file_by_lines (storer, &additional_asm_files,
> +                             current_infile->temp_additional_asm))
> +       {
> +         additional_asm_files.release ();
> +         return; /* File not found.  This means that cc1* decided not to
> +                     parallelize.  */
> +       }
> +
> +      sort_asm_files (&additional_asm_files);
> +
> +      if (n_commands != 1)
> +       fatal_error (input_location,
> +                    "Auto parallelism is unsupported when piping commands");
> +
> +      if (!path_to_ld)
> +       path_to_ld = get_path_to_ld ();
> +
> +      /* Get original command.  */
> +      orig = commands[0];
> +      orig_argv = commands[0].argv;
> +      orig_argc = get_number_of_args (orig.argv);
> +
> +
> +      /* Update commands array to include the extra `as' calls.  */
> +      *_n_commands = additional_asm_files.length ();
> +      n_commands = *_n_commands;
> +
> +      gcc_assert (n_commands > 0);
> +
> +      identify_asm_file (orig_argc, orig_argv, &infile_pos, &outfile_pos);
> +
> +      *_commands = XRESIZEVEC (struct command, *_commands, n_commands);
> +      commands = *_commands;
>
> -      /* Print each piped command as a separate line.  */
>        for (i = 0; i < n_commands; i++)
>         {
> -         const char *const *j;
> +         const char **argv = storer->create_new (orig_argc + 1);
> +         const char *temp_obj = make_temp_file ("additional-obj.o");
> +         record_temp_file (temp_obj, true, true);
> +         record_temp_file (additional_asm_files[i], true, true);
> +
> +         memcpy (argv, orig_argv, (orig_argc + 1) * sizeof (const char *));
> +
> +         orig_obj_file = argv[outfile_pos];
> +
> +         argv[infile_pos]  = additional_asm_files[i];
> +         argv[outfile_pos] = temp_obj;
> +
> +         commands[i].prog = orig.prog;
> +         commands[i].argv = argv;
> +
> +         temp_object_files.safe_push (temp_obj);
> +       }
> +
> +       if (have_c)
> +         {
> +           unsigned int num_temp_objs = temp_object_files.length ();
> +           const char **argv = storer->create_new (num_temp_objs + 5);
> +           unsigned int j;
> +
> +           argv[0] = path_to_ld;
> +           argv[1] = "-o";
> +           argv[2] = orig_obj_file;
> +           argv[3] = "-r";
> +
> +           for (j = 0; j < num_temp_objs; j++)
> +             argv[j + 4] = temp_object_files[j];
> +           argv[j + 4] = NULL;
> +
> +           additional_ld->prog = path_to_ld;
> +           additional_ld->argv = argv;
> +
> +           if (!have_o)
> +             temp_object_files.truncate (0);
> +         }
> +
> +       additional_asm_files.release ();
> +    }
> +}
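The compiler branch above copies commands[0].argv and appends a fixed
number of extra arguments into an array sized argc + 4.  For
illustration, a minimal sketch of a generic copy-and-append helper that
sizes the array from the number of extras, which would keep the
allocation in sync if the -fPIE/-fPIC pair goes away (argv_append is
hypothetical and uses malloc instead of extra_arg_storer):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a new NULL-terminated argv holding the
   entries of ARGV followed by the N_EXTRA strings in EXTRA.  Only the
   pointer array is allocated; the strings are shared with the inputs.  */
static const char **
argv_append (const char *const argv[], const char *const extra[],
             size_t n_extra)
{
  size_t argc = 0, i;
  const char **res;

  while (argv[argc] != NULL)
    argc++;

  /* ARGC originals, N_EXTRA new entries, one terminating NULL.  */
  res = malloc ((argc + n_extra + 1) * sizeof *res);
  if (!res)
    return NULL;
  memcpy (res, argv, argc * sizeof *res);
  for (i = 0; i < n_extra; i++)
    res[argc + i] = extra[i];
  res[argc + n_extra] = NULL;
  return res;
}
```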
> +
> +DEBUG_FUNCTION void
> +print_command (struct command *command)
> +{
> +  const char **argv;
> +
> +  for (argv = command->argv; *argv != NULL; argv++)
> +    fprintf (stdout, " %s", *argv);
> +  fputc ('\n', stdout);
> +}
> +
> +DEBUG_FUNCTION void
> +print_commands (int n, struct command *commands)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i++)
> +    print_command (&commands[i]);
> +}
> +
> +DEBUG_FUNCTION void
> +print_argbuf ()
> +{
> +  int i;
> +  const char *arg;
> +
> +  for (i = 0; argbuf.iterate (i, &arg); i++)
> +    fprintf (stdout, "%s ", arg);
> +  fputc ('\n', stdout);
> +}
> +
> +
> +/* Print what commands will run.  Return 0 if the commands should be run,
> +   nonzero if execution should stop here.  */
>
> -         if (verbose_only_flag)
> +static int
> +handle_verbose (int n_commands, struct command commands[])
> +{
> +  int i;
> +
> +  /* For help listings, put a blank line between sub-processes.  */
> +  if (print_help_list)
> +    fputc ('\n', stderr);
> +
> +  /* Print each piped command as a separate line.  */
> +  for (i = 0; i < n_commands; i++)
> +    {
> +      const char *const *j;
> +
> +      if (verbose_only_flag)
> +       {
> +         for (j = commands[i].argv; *j; j++)
>             {
> -             for (j = commands[i].argv; *j; j++)
> +             const char *p;
> +             for (p = *j; *p; ++p)
> +               if (!ISALNUM ((unsigned char) *p)
> +                   && *p != '_' && *p != '/' && *p != '-' && *p != '.')
> +                 break;
> +             if (*p || !*j)
>                 {
> -                 const char *p;
> +                 fprintf (stderr, " \"");
>                   for (p = *j; *p; ++p)
> -                   if (!ISALNUM ((unsigned char) *p)
> -                       && *p != '_' && *p != '/' && *p != '-' && *p != '.')
> -                     break;
> -                 if (*p || !*j)
>                     {
> -                     fprintf (stderr, " \"");
> -                     for (p = *j; *p; ++p)
> -                       {
> -                         if (*p == '"' || *p == '\\' || *p == '$')
> -                           fputc ('\\', stderr);
> -                         fputc (*p, stderr);
> -                       }
> -                     fputc ('"', stderr);
> +                     if (*p == '"' || *p == '\\' || *p == '$')
> +                       fputc ('\\', stderr);
> +                     fputc (*p, stderr);
>                     }
> -                 /* If it's empty, print "".  */
> -                 else if (!**j)
> -                   fprintf (stderr, " \"\"");
> -                 else
> -                   fprintf (stderr, " %s", *j);
> -               }
> -           }
> -         else
> -           for (j = commands[i].argv; *j; j++)
> +                 fputc ('"', stderr);
> +               }
>               /* If it's empty, print "".  */
> -             if (!**j)
> +             else if (!**j)
>                 fprintf (stderr, " \"\"");
>               else
>                 fprintf (stderr, " %s", *j);
> -
> -         /* Print a pipe symbol after all but the last command.  */
> -         if (i + 1 != n_commands)
> -           fprintf (stderr, " |");
> -         fprintf (stderr, "\n");
> +           }
>         }
> -      fflush (stderr);
> -      if (verbose_only_flag != 0)
> -        {
> -         /* verbose_only_flag should act as if the spec was
> -            executed, so increment execution_count before
> -            returning.  This prevents spurious warnings about
> -            unused linker input files, etc.  */
> -         execution_count++;
> -         return 0;
> -        }
> +      else
> +       for (j = commands[i].argv; *j; j++)
> +         /* If it's empty, print "".  */
> +         if (!**j)
> +           fprintf (stderr, " \"\"");
> +         else
> +           fprintf (stderr, " %s", *j);
> +
> +      /* Print a pipe symbol after all but the last command.  */
> +      if (i + 1 != n_commands)
> +       fprintf (stderr, " |");
> +      fprintf (stderr, "\n");
> +    }
> +  fflush (stderr);
> +  if (verbose_only_flag != 0)
> +    {
> +      /* verbose_only_flag should act as if the spec was
> +        executed, so increment execution_count before
> +        returning.  This prevents spurious warnings about
> +        unused linker input files, etc.  */
> +      execution_count++;
> +      return 1;
> +    }
>  #ifdef DEBUG
> -      fnotice (stderr, "\nGo ahead? (y or n) ");
> -      fflush (stderr);
> -      i = getchar ();
> -      if (i != '\n')
> -       while (getchar () != '\n')
> -         ;
> -
> -      if (i != 'y' && i != 'Y')
> -       return 0;
> +  fnotice (stderr, "\nGo ahead? (y or n) ");
> +  fflush (stderr);
> +  i = getchar ();
> +  if (i != '\n')
> +    while (getchar () != '\n')
> +      ;
> +
> +  if (i != 'y' && i != 'Y')
> +    return 1;
>  #endif /* DEBUG */
> -    }
> +
> +  return 0;
> +}
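For reference, the -### quoting rule that handle_verbose keeps from the
old execute: arguments consisting only of alphanumerics and '_', '/',
'-', '.' are printed bare, an empty argument as "", and anything else
inside double quotes with ", \ and $ escaped by a backslash.  A minimal
sketch of that rule writing into a buffer instead of stderr (quote_arg
is a hypothetical helper):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: write ARG, quoted as described above, into OUT
   (of size OUTSZ).  Returns the length written, or -1 if OUT may be too
   small.  */
static int
quote_arg (const char *arg, char *out, size_t outsz)
{
  const char *p;
  size_t len = 0;

  for (p = arg; *p; ++p)
    if (!isalnum ((unsigned char) *p)
        && *p != '_' && *p != '/' && *p != '-' && *p != '.')
      break;

  if (*arg != '\0' && *p == '\0')
    {
      /* Only safe characters: copy as-is.  */
      size_t n = strlen (arg);
      if (n + 1 > outsz)
        return -1;
      memcpy (out, arg, n + 1);
      return (int) n;
    }

  /* Quoted form (an empty ARG yields "").  Conservative size check:
     every character escaped, two quotes, one NUL.  */
  if (outsz < 2 * strlen (arg) + 3)
    return -1;
  out[len++] = '"';
  for (p = arg; *p; ++p)
    {
      if (*p == '"' || *p == '\\' || *p == '$')
        out[len++] = '\\';
      out[len++] = *p;
    }
  out[len++] = '"';
  out[len] = '\0';
  return (int) len;
}
```

Without -###, handle_verbose prints arguments bare and only special-cases
the empty string.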
>
>  #ifdef ENABLE_VALGRIND_CHECKING
> +
> +/* Append valgrind to each program.  */
> +
> +static void
> +append_valgrind (struct obstack *to_be_released,
> +                int n_commands, struct command commands[])
> +{
> +  int i;
> +
>    /* Run the each command through valgrind.  To simplify prepending the
>       path to valgrind and the option "-q" (for quiet operation unless
>       something triggers), we allocate a separate argv array.  */
> @@ -3221,7 +3656,7 @@ execute (void)
>        for (argc = 0; commands[i].argv[argc] != NULL; argc++)
>         ;
>
> -      argv = XALLOCAVEC (const char *, argc + 3);
> +      argv = obstack_alloc (to_be_released, (argc + 3) * sizeof (const char *));
>
>        argv[0] = VALGRIND_PATH;
>        argv[1] = "-q";
> @@ -3232,15 +3667,16 @@ execute (void)
>        commands[i].argv = argv;
>        commands[i].prog = argv[0];
>      }
> +}
>  #endif
>
> -  /* Run each piped subprocess.  */
> +/* Launch a list of commands asynchronously.  */
>
> -  pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
> -                                  ? PEX_RECORD_TIMES : 0),
> -                 progname, temp_filename);
> -  if (pex == NULL)
> -    fatal_error (input_location, "%<pex_init%> failed: %m");
> +static void
> +async_launch_commands (struct pex_obj *pex,
> +                      int n_commands, struct command commands[])
> +{
> +  int i;
>
>    for (i = 0; i < n_commands; i++)
>      {
> @@ -3267,151 +3703,341 @@ execute (void)
>      }
>
>    execution_count++;
> +}
>
> -  /* Wait for all the subprocesses to finish.  */
>
> -  {
> -    int *statuses;
> -    struct pex_time *times = NULL;
> -    int ret_code = 0;
> +/* Wait for all the subprocesses to finish.  Return 0 on success, -1 on
> +   failure.  */
>
> -    statuses = (int *) alloca (n_commands * sizeof (int));
> -    if (!pex_get_status (pex, n_commands, statuses))
> -      fatal_error (input_location, "failed to get exit status: %m");
> +static int
> +await_commands_to_finish (struct pex_obj *pex,
> +                         int n_commands, struct command commands[])
> +{
>
> -    if (report_times || report_times_to_file)
> -      {
> -       times = (struct pex_time *) alloca (n_commands * sizeof (struct pex_time));
> -       if (!pex_get_times (pex, n_commands, times))
> -         fatal_error (input_location, "failed to get process times: %m");
> -      }
> +  int *statuses;
> +  struct pex_time *times = NULL;
> +  int ret_code = 0, i;
>
> -    pex_free (pex);
> +  statuses = (int *) alloca (n_commands * sizeof (int));
> +  if (!pex_get_status (pex, n_commands, statuses))
> +    fatal_error (input_location, "failed to get exit status: %m");
>
> -    for (i = 0; i < n_commands; ++i)
> -      {
> -       int status = statuses[i];
> +  if (report_times || report_times_to_file)
> +    {
> +      times = (struct pex_time *) alloca (n_commands * sizeof (*times));
> +      if (!pex_get_times (pex, n_commands, times))
> +       fatal_error (input_location, "failed to get process times: %m");
> +    }
>
> -       if (WIFSIGNALED (status))
> -         switch (WTERMSIG (status))
> -           {
> -           case SIGINT:
> -           case SIGTERM:
> -             /* SIGQUIT and SIGKILL are not available on MinGW.  */
> +  for (i = 0; i < n_commands; ++i)
> +    {
> +      int status = statuses[i];
> +
> +      if (WIFSIGNALED (status))
> +       switch (WTERMSIG (status))
> +         {
> +         case SIGINT:
> +         case SIGTERM:
> +           /* SIGQUIT and SIGKILL are not available on MinGW.  */
>  #ifdef SIGQUIT
> -           case SIGQUIT:
> +         case SIGQUIT:
>  #endif
>  #ifdef SIGKILL
> -           case SIGKILL:
> +         case SIGKILL:
>  #endif
> -             /* The user (or environment) did something to the
> -                inferior.  Making this an ICE confuses the user into
> -                thinking there's a compiler bug.  Much more likely is
> -                the user or OOM killer nuked it.  */
> -             fatal_error (input_location,
> -                          "%s signal terminated program %s",
> -                          strsignal (WTERMSIG (status)),
> -                          commands[i].prog);
> -             break;
> +           /* The user (or environment) did something to the
> +              inferior.  Making this an ICE confuses the user into
> +              thinking there's a compiler bug.  Much more likely is
> +              the user or OOM killer nuked it.  */
> +           fatal_error (input_location,
> +                        "%s signal terminated program %s",
> +                        strsignal (WTERMSIG (status)),
> +                        commands[i].prog);
> +           break;
>
>  #ifdef SIGPIPE
> -           case SIGPIPE:
> -             /* SIGPIPE is a special case.  It happens in -pipe mode
> -                when the compiler dies before the preprocessor is
> -                done, or the assembler dies before the compiler is
> -                done.  There's generally been an error already, and
> -                this is just fallout.  So don't generate another
> -                error unless we would otherwise have succeeded.  */
> -             if (signal_count || greatest_status >= MIN_FATAL_STATUS)
> -               {
> -                 signal_count++;
> -                 ret_code = -1;
> -                 break;
> -               }
> +         case SIGPIPE:
> +           /* SIGPIPE is a special case.  It happens in -pipe mode
> +              when the compiler dies before the preprocessor is
> +              done, or the assembler dies before the compiler is
> +              done.  There's generally been an error already, and
> +              this is just fallout.  So don't generate another
> +              error unless we would otherwise have succeeded.  */
> +           if (signal_count || greatest_status >= MIN_FATAL_STATUS)
> +             {
> +               signal_count++;
> +               ret_code = -1;
> +               break;
> +             }
>  #endif
> -             /* FALLTHROUGH */
> +           /* FALLTHROUGH.  */
>
> -           default:
> -             /* The inferior failed to catch the signal.  */
> -             internal_error_no_backtrace ("%s signal terminated program %s",
> -                                          strsignal (WTERMSIG (status)),
> -                                          commands[i].prog);
> -           }
> -       else if (WIFEXITED (status)
> -                && WEXITSTATUS (status) >= MIN_FATAL_STATUS)
> -         {
> -           /* For ICEs in cc1, cc1obj, cc1plus see if it is
> -              reproducible or not.  */
> -           const char *p;
> -           if (flag_report_bug
> -               && WEXITSTATUS (status) == ICE_EXIT_CODE
> -               && i == 0
> -               && (p = strrchr (commands[0].argv[0], DIR_SEPARATOR))
> -               && ! strncmp (p + 1, "cc1", 3))
> -             try_generate_repro (commands[0].argv);
> -           if (WEXITSTATUS (status) > greatest_status)
> -             greatest_status = WEXITSTATUS (status);
> -           ret_code = -1;
> +         default:
> +           /* The inferior failed to catch the signal.  */
> +           internal_error_no_backtrace ("%s signal terminated program %s",
> +                                        strsignal (WTERMSIG (status)),
> +                                        commands[i].prog);
>           }
> +      else if (WIFEXITED (status)
> +              && WEXITSTATUS (status) >= MIN_FATAL_STATUS)
> +       {
> +         /* For ICEs in cc1, cc1obj, cc1plus see if it is
> +            reproducible or not.  */
> +         const char *p;
> +         if (flag_report_bug
> +             && WEXITSTATUS (status) == ICE_EXIT_CODE
> +             && i == 0
> +             && (p = strrchr (commands[0].argv[0], DIR_SEPARATOR))
> +             && ! strncmp (p + 1, "cc1", 3))
> +           try_generate_repro (commands[0].argv);
> +         if (WEXITSTATUS (status) > greatest_status)
> +           greatest_status = WEXITSTATUS (status);
> +         ret_code = -1;
> +       }
>
> -       if (report_times || report_times_to_file)
> -         {
> -           struct pex_time *pt = &times[i];
> -           double ut, st;
> +      if (report_times || report_times_to_file)
> +       {
> +         struct pex_time *pt = &times[i];
> +         double ut, st;
>
> -           ut = ((double) pt->user_seconds
> -                 + (double) pt->user_microseconds / 1.0e6);
> -           st = ((double) pt->system_seconds
> -                 + (double) pt->system_microseconds / 1.0e6);
> +         ut = ((double) pt->user_seconds
> +               + (double) pt->user_microseconds / 1.0e6);
> +         st = ((double) pt->system_seconds
> +               + (double) pt->system_microseconds / 1.0e6);
>
> -           if (ut + st != 0)
> -             {
> -               if (report_times)
> -                 fnotice (stderr, "# %s %.2f %.2f\n",
> -                          commands[i].prog, ut, st);
> +         if (ut + st != 0)
> +           {
> +             if (report_times)
> +               fnotice (stderr, "# %s %.2f %.2f\n",
> +                        commands[i].prog, ut, st);
>
> -               if (report_times_to_file)
> -                 {
> -                   int c = 0;
> -                   const char *const *j;
> +             if (report_times_to_file)
> +               {
> +                 int c = 0;
> +                 const char *const *j;
>
> -                   fprintf (report_times_to_file, "%g %g", ut, st);
> +                 fprintf (report_times_to_file, "%g %g", ut, st);
>
> -                   for (j = &commands[i].prog; *j; j = &commands[i].argv[++c])
> -                     {
> -                       const char *p;
> -                       for (p = *j; *p; ++p)
> -                         if (*p == '"' || *p == '\\' || *p == '$'
> -                             || ISSPACE (*p))
> -                           break;
> +                 for (j = &commands[i].prog; *j; j = &commands[i].argv[++c])
> +                   {
> +                     const char *p;
> +                     for (p = *j; *p; ++p)
> +                       if (*p == '"' || *p == '\\' || *p == '$'
> +                           || ISSPACE (*p))
> +                         break;
>
> -                       if (*p)
> -                         {
> -                           fprintf (report_times_to_file, " \"");
> -                           for (p = *j; *p; ++p)
> -                             {
> -                               if (*p == '"' || *p == '\\' || *p == '$')
> -                                 fputc ('\\', report_times_to_file);
> -                               fputc (*p, report_times_to_file);
> -                             }
> -                           fputc ('"', report_times_to_file);
> -                         }
> -                       else
> -                         fprintf (report_times_to_file, " %s", *j);
> -                     }
> +                     if (*p)
> +                       {
> +                         fprintf (report_times_to_file, " \"");
> +                         for (p = *j; *p; ++p)
> +                           {
> +                             if (*p == '"' || *p == '\\' || *p == '$')
> +                               fputc ('\\', report_times_to_file);
> +                             fputc (*p, report_times_to_file);
> +                           }
> +                         fputc ('"', report_times_to_file);
> +                       }
> +                     else
> +                       fprintf (report_times_to_file, " %s", *j);
> +                   }
>
> -                   fputc ('\n', report_times_to_file);
> -                 }
> -             }
> -         }
> +                 fputc ('\n', report_times_to_file);
> +               }
> +           }
> +       }
> +    }
> +
> +  return ret_code;
> +}
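The signal and exit-status handling moved into await_commands_to_finish
as-is, apart from re-indentation.  For readers of the split-out function,
a minimal POSIX sketch of the same classification (classify_status,
run_child and enum outcome are hypothetical; the SIGPIPE special case and
the -freport-bug handling are left out, and the minimum fatal status is
passed in rather than taken from gcc.c):

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum outcome { OUTCOME_OK, OUTCOME_FAILED, OUTCOME_KILLED, OUTCOME_CRASHED };

/* Hypothetical helper mirroring the checks on each status word.  */
static enum outcome
classify_status (int status, int min_fatal)
{
  if (WIFSIGNALED (status))
    switch (WTERMSIG (status))
      {
      case SIGINT:
      case SIGTERM:
#ifdef SIGQUIT
      case SIGQUIT:
#endif
#ifdef SIGKILL
      case SIGKILL:
#endif
        return OUTCOME_KILLED;    /* User or environment: fatal_error.  */
      default:
        return OUTCOME_CRASHED;   /* Uncaught signal: internal error.  */
      }
  if (WIFEXITED (status) && WEXITSTATUS (status) >= min_fatal)
    return OUTCOME_FAILED;        /* The tool reported errors.  */
  return OUTCOME_OK;
}

/* Fork a child that exits with CODE, or kills itself with SIG if SIG != 0,
   and store its wait status in *STATUS.  */
static bool
run_child (int code, int sig, int *status)
{
  pid_t pid = fork ();

  if (pid < 0)
    return false;
  if (pid == 0)
    {
      if (sig != 0)
        {
          signal (sig, SIG_DFL);
          raise (sig);
        }
      _exit (code);
    }
  return waitpid (pid, status, 0) == pid;
}
```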
> +
> +/* Split a single command with pipes into several commands.  */
> +
> +static void
> +split_commands (vec<const_char_p> *argbuf_p,
> +               int n_commands, struct command commands[])
> +{
> +  int i;
> +  const char *arg;
> +  vec<const_char_p> &argbuf = *argbuf_p;
> +
> +  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> +    if (arg && strcmp (arg, "|") == 0)
> +      {                                /* each command.  */
> +       const char *string;
> +#if defined (__MSDOS__) || defined (OS2) || defined (VMS)
> +       fatal_error (input_location, "%<-pipe%> not supported");
> +#endif
> +       argbuf[i] = 0; /* Termination of command args.  */
> +       commands[n_commands].prog = argbuf[i + 1];
> +       commands[n_commands].argv
> +         = &(argbuf.address ())[i + 1];
> +       string = find_a_file (&exec_prefixes, commands[n_commands].prog,
> +                             X_OK, false);
> +       if (string)
> +         commands[n_commands].argv[0] = string;
> +       n_commands++;
>        }
> +}
> +
> +struct command *
> +parse_argbuf (vec <const_char_p> *argbuf_p, int *n)
> +{
> +  int i, n_commands;
> +  vec<const_char_p> &argbuf = *argbuf_p;
> +  const char *arg;
> +  struct command *commands;
>
> -   if (commands[0].argv[0] != commands[0].prog)
> -     free (CONST_CAST (char *, commands[0].argv[0]));
> +  /* Count # of piped commands.  */
> +  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> +    if (strcmp (arg, "|") == 0)
> +      n_commands++;
>
> -    return ret_code;
> -  }
> +  /* Get storage for each command.  */
> +  commands = XNEWVEC (struct command, n_commands);
> +
> +  /* Split argbuf into its separate piped processes,
> +     and record info about each one.
> +     Also search for the programs that are to be run.  */
> +
> +  argbuf.safe_push (0);
> +
> +  commands[0].prog = argbuf[0]; /* first command.  */
> +  commands[0].argv = argbuf.address ();
> +
> +  split_commands (argbuf_p, n_commands, commands);
> +
> +  *n = n_commands;
> +  return commands;
> +}
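parse_argbuf and split_commands split argbuf in place: each "|" is
overwritten with NULL so every segment becomes its own NULL-terminated
argv.  A minimal sketch of that technique on a plain array, with an
explicit capacity check since here the caller sizes CMDS
(split_on_pipes and struct cmd_sketch are hypothetical; the find_a_file
lookup is omitted):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct cmd_sketch
{
  const char *prog;     /* Program to run.  */
  const char **argv;    /* NULL-terminated arguments, argv[0] == prog.  */
};

/* Hypothetical helper: split the NULL-terminated vector ARGS on "|" tokens,
   overwriting each "|" with NULL so that every segment is its own argv.
   Fills at most MAX entries of CMDS.  Returns the number of commands, or -1
   if there are more than MAX or a "|" is not followed by a program.  */
static int
split_on_pipes (const char **args, struct cmd_sketch cmds[], int max)
{
  int n = 0, i;

  if (args[0] == NULL)
    return 0;
  if (max < 1)
    return -1;
  cmds[n].prog = args[0];
  cmds[n].argv = args;
  n++;

  for (i = 0; args[i] != NULL; i++)
    if (strcmp (args[i], "|") == 0)
      {
        args[i] = NULL;         /* Terminate the previous command.  */
        if (n == max || args[i + 1] == NULL)
          return -1;
        cmds[n].prog = args[i + 1];
        cmds[n].argv = &args[i + 1];
        n++;
      }
  return n;
}
```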
> +
> +/* Execute the command specified by the arguments on the current line of spec.
> +   When using pipes, this includes several piped-together commands
> +   with `|' between them.
> +
> +   Return 0 if successful, -1 if failed.  */
> +
> +static int
> +execute (void)
> +{
> +  struct pex_obj *pex;
> +  struct command *commands;     /* each command buffer with program to call
> +                                   and arguments.  */
> +  int n_commands;               /* # of commands.  */
> +  int ret = 0;
> +
> +  struct command additional_ld = {NULL, NULL};
> +  extra_arg_storer storer;
> +
> +  struct command *commands_batch;
> +  int n;
> +
> +  gcc_assert (!processing_spec_function);
> +
> +  if (wrapper_string)
> +    {
> +      char *string = find_a_file (&exec_prefixes, argbuf[0], X_OK, false);
> +      if (string)
> +       argbuf[0] = string;
> +      insert_wrapper (wrapper_string);
> +    }
> +
> +  /* Parse the argbuf into several commands.  */
> +  commands = parse_argbuf (&argbuf, &n_commands);
> +
> +  if (!have_S && !have_E && flag_parallel_jobs)
> +    append_split_outputs (&storer, &additional_ld, &commands, &n_commands);
> +
> +  if (!wrapper_string)
> +    {
> +      char *string = find_a_file (&exec_prefixes, commands[0].prog,
> +                                 X_OK, false);
> +      if (string)
> +       commands[0].argv[0] = string;
> +    }
> +
> +  /* If -v, print what we are about to do, and maybe query.  */
> +
> +  if (verbose_flag)
> +    {
> +      int ret_verbose = handle_verbose (n_commands, commands);
> +      if (ret_verbose > 0)
> +       {
> +         ret = 0;
> +         goto cleanup;
> +       }
> +    }
> +
> +#ifdef ENABLE_VALGRIND_CHECKING
> +  /* Stack of strings to be released on function return.  */
> +  struct obstack to_be_released;
> +  obstack_init (&to_be_released);
> +  append_valgrind (&to_be_released, n_commands, commands);
> +#endif
> +
> +  /* FIXME: Interact with GNU Jobserver if necessary.  */
> +
> +  commands_batch = commands;
> +  n = flag_parallel_jobs? 1: n_commands;
> +
> +  for (int i = 0; i < n_commands; i += n)
> +    {
> +      /* Run each piped subprocess.  */
> +
> +      pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
> +                                      ? PEX_RECORD_TIMES : 0),
> +                     progname, temp_filename);
> +      if (pex == NULL)
> +       fatal_error (input_location, "%<pex_init%> failed: %m");
> +
> +      /* Launch the commands.  */
> +      async_launch_commands (pex, n, commands_batch);
> +
> +      /* Wait for them to finish.  */
> +      ret |= await_commands_to_finish (pex, n, commands_batch);
> +
> +      commands_batch = commands_batch + n;
> +
> +      /* Cleanup.  */
> +      pex_free (pex);
> +    }
> +
> +
> +  if (ret != 0)
> +    goto cleanup;
> +
> +  /* Run extra ld call.  */
> +  if (!EMPTY_CMD (additional_ld))
> +    {
> +      /* If we are here, we must be sure that we had at least two object
> +        files to link.  */
> +      //gcc_assert (n_commands != 1);
> +
> +      pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
> +                                      ? PEX_RECORD_TIMES : 0),
> +                     progname, temp_filename);
> +
> +      if (verbose_flag)
> +       print_command (&additional_ld);
> +
> +      async_launch_commands (pex, 1, &additional_ld);
> +      ret = await_commands_to_finish (pex, 1, &additional_ld);
> +      pex_free (pex);
> +    }
> +
> +
> +#ifdef ENABLE_VALGRIND_CHECKING
> +  obstack_free (&to_be_released, NULL);
> +#endif
> +
> +cleanup:
> +  if (commands[0].argv[0] != commands[0].prog)
> +    free (CONST_CAST (char *, commands[0].argv[0]));
> +
> +  free (commands);
> +
> +  return ret;
>  }
> +
>
>  /* Find all the switches given to us
>     and make a vector describing them.
> @@ -3480,29 +4106,33 @@ static int n_switches_alloc_debug_check[2];
>
>  static char *debug_check_temp_file[2];
>
> -/* Language is one of three things:
> -
> -   1) The name of a real programming language.
> -   2) NULL, indicating that no one has figured out
> -   what it is yet.
> -   3) '*', indicating that the file should be passed
> -   to the linker.  */
> -struct infile
> +static const char *
> +fsplit_arg (extra_arg_storer *storer)
>  {
> -  const char *name;
> -  const char *language;
> -  struct compiler *incompiler;
> -  bool compiled;
> -  bool preprocessed;
> -};
> +  const char *tempname = make_temp_file ("additional-asm");
> +  const char arg[] = "-fsplit-outputs=";
> +  char *final;
>
> -/* Also a vector of input files specified.  */
> +  size_t n = ARRAY_SIZE (arg) + strlen (tempname);
>
> -static struct infile *infiles;
> +  gcc_assert (current_infile);
>
> -int n_infiles;
> +  current_infile->temp_additional_asm = tempname;
> +
> +  /* Remove the file, since it may not be needed; it is created later if necessary.  */
> +  /* FIXME: This is a little hackish.  */
> +  remove (tempname);
> +
> +  final = storer->create_string (n);
> +
> +  strcpy (final, arg);
> +  strcat (final, tempname);
> +
> +  record_temp_file (tempname, true, true);
> +
> +  return final;
> +}
>
> -static int n_infiles_alloc;
>
>  /* True if undefined environment variables encountered during spec processing
>     are ok to ignore, typically when we're running for --help or --version.  */
> @@ -3683,6 +4313,8 @@ alloc_infile (void)
>      {
>        n_infiles_alloc = 16;
>        infiles = XNEWVEC (struct infile, n_infiles_alloc);
> +      memset (infiles, 0x00, sizeof (*infiles) * n_infiles_alloc);
> +
>      }
>    else if (n_infiles_alloc == n_infiles)
>      {
> @@ -4648,6 +5280,9 @@ process_command (unsigned int decoded_options_count,
>        switch (decoded_options[j].opt_index)
>         {
>         case OPT_S:
> +         have_S = 1;
> +         have_c = 1;
> +         break;
>         case OPT_c:
>         case OPT_E:
>           have_c = 1;
> @@ -6155,11 +6790,14 @@ do_spec_1 (const char *spec, int inswitch, const char *soft_matched_part)
>                   open_at_file ();
>
>                 for (i = 0; (int) i < n_infiles; i++)
> -                 if (compile_input_file_p (&infiles[i]))
> -                   {
> -                     store_arg (infiles[i].name, 0, 0);
> -                     infiles[i].compiled = true;
> -                   }
> +                 {
> +                   current_infile = &infiles[i];
> +                   if (compile_input_file_p (current_infile))
> +                     {
> +                       store_arg (current_infile->name, 0, 0);
> +                       current_infile->compiled = true;
> +                     }
> +                 }
>
>                 if (at_file_supplied)
>                   close_at_file ();
> @@ -6515,7 +7153,7 @@ do_spec_1 (const char *spec, int inswitch, const char *soft_matched_part)
>                      "%{foo=*:bar%*}%{foo=*:one%*two}"
>
>                    matches -foo=hello then it will produce:
> -
> +
>                      barhello onehellotwo
>                 */
>                 if (*p == 0 || *p == '}')
> @@ -8642,6 +9280,7 @@ driver::do_spec_on_infiles () const
>    for (i = 0; (int) i < n_infiles; i++)
>      {
>        int this_file_error = 0;
> +      current_infile = &infiles[i];
>
>        /* Tell do_spec what to substitute for %i.  */
>
> @@ -8761,12 +9400,15 @@ driver::do_spec_on_infiles () const
>        int i;
>
>        for (i = 0; i < n_infiles ; i++)
> -       if (infiles[i].incompiler
> -           || (infiles[i].language && infiles[i].language[0] != '*'))
> -         {
> -           set_input (infiles[i].name);
> -           break;
> -         }
> +       {
> +         current_infile = &infiles[i];
> +         if (infiles[i].incompiler
> +             || (infiles[i].language && infiles[i].language[0] != '*'))
> +           {
> +             set_input (infiles[i].name);
> +             break;
> +           }
> +       }
>      }
>
>    if (!seen_error ())
> @@ -8788,11 +9430,31 @@ driver::maybe_run_linker (const char *argv0) const
>    int linker_was_run = 0;
>    int num_linker_inputs;
>
> -  /* Determine if there are any linker input files.  */
> -  num_linker_inputs = 0;
> -  for (i = 0; (int) i < n_infiles; i++)
> -    if (explicit_link_files[i] || outfiles[i] != NULL)
> -      num_linker_inputs++;
> +  /* Set outfiles to be the temporary object vector.  */
> +  const char **outfiles_holder = outfiles;
> +  int n_infiles_holder = n_infiles;
> +  bool outfiles_switched = false;
> +  if (temp_object_files.length () > 0)
> +    {
> +      /* Insert explicit link files into the temp object vector.  */
> +
> +      for (i = 0; (int) i < n_infiles; i++)
> +       if (explicit_link_files[i] && outfiles[i] != NULL)
> +         temp_object_files.safe_push (outfiles[i]);
> +
> +      num_linker_inputs = n_infiles = temp_object_files.length ();
> +      temp_object_files.safe_push (NULL); /* the NULL sentinel.  */
> +      outfiles = temp_object_files.address ();
> +    }
> +  else /* Fall back to the old method.  */
> +    {
> +
> +      /* Determine if there are any linker input files.  */
> +      num_linker_inputs = 0;
> +      for (i = 0; (int) i < n_infiles; i++)
> +       if (explicit_link_files[i] || outfiles[i] != NULL)
> +         num_linker_inputs++;
> +    }
>
>    /* Arrange for temporary file names created during linking to take
>       on names related with the linker output rather than with the
> @@ -8897,14 +9559,24 @@ driver::maybe_run_linker (const char *argv0) const
>      }
>
>    /* If options said don't run linker,
> -     complain about input files to be given to the linker.  */
> +     complain about input files to be given to the linker.
> +     When fsplit_arg is in use, the linker will run and this check
> +     will not be triggered.  */
>
> -  if (! linker_was_run && !seen_error ())
> +  if (!outfiles_switched && !linker_was_run && !seen_error ()
> +      && temp_object_files.length () == 0)
>      for (i = 0; (int) i < n_infiles; i++)
>        if (explicit_link_files[i]
>           && !(infiles[i].language && infiles[i].language[0] == '*'))
>         warning (0, "%s: linker input file unused because linking not done",
>                  outfiles[i]);
> +
> +  if (outfiles_switched)
> +    {
> +      /* Undo our changes.  */
> +      outfiles = outfiles_holder;
> +      n_infiles = n_infiles_holder;
> +    }
>  }
>
>  /* The end of "main".  */
> @@ -10808,6 +11480,7 @@ driver::finalize ()
>    linker_options.truncate (0);
>    assembler_options.truncate (0);
>    preprocessor_options.truncate (0);
> +  temp_object_files.truncate (0);
>
>    path_prefix_reset (&exec_prefixes);
>    path_prefix_reset (&startfile_prefixes);
> diff --git a/gcc/testsuite/driver/a.c b/gcc/testsuite/driver/a.c
> new file mode 100644
> index 00000000000..c6b8c2eb61e
> --- /dev/null
> +++ b/gcc/testsuite/driver/a.c
> @@ -0,0 +1,6 @@
> +int puts (const char *);
> +
> +void a_func (void)
> +{
> +  puts ("A test");
> +}
> diff --git a/gcc/testsuite/driver/b.c b/gcc/testsuite/driver/b.c
> new file mode 100644
> index 00000000000..76a2cba0bd9
> --- /dev/null
> +++ b/gcc/testsuite/driver/b.c
> @@ -0,0 +1,6 @@
> +int puts (const char *);
> +
> +void a_func (void)
> +{
> +  puts ("Another test");
> +}
> diff --git a/gcc/testsuite/driver/driver.exp b/gcc/testsuite/driver/driver.exp
> new file mode 100644
> index 00000000000..2bbaf07778a
> --- /dev/null
> +++ b/gcc/testsuite/driver/driver.exp
> @@ -0,0 +1,80 @@
> +#   Copyright (C) 2008-2020 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +# GCC testsuite that uses the `dg.exp' driver.
> +
> +# Load support procs.
> +load_lib gcc-dg.exp
> +
> +proc check-for-errors { test input } {
> +    if { [string equal "$input" ""] } then {
> +       pass "$test: std out"
> +    } else {
> +       fail "$test: std out\n$input"
> +    }
> +}
> +
> +if ![check_effective_target_pthread] {
> +  return
> +}
> +
> +# If a testcase doesn't have special options, use these.
> +global DEFAULT_CFLAGS
> +if ![info exists DEFAULT_CFLAGS] then {
> +    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
> +}
> +
> +# Initialize `dg'.
> +dg-init
> +
> +
> +# Test multi-input compilation
> +check-for-errors "Multi-input Compilation" \
> +       [gcc_target_compile "$srcdir/$subdir/a.c $srcdir/$subdir/b.c -c" "" none ""]
> +
> +# Compile file and generate an assembler and object file
> +check-for-errors "Object Generation" \
> +       [gcc_target_compile "$srcdir/$subdir/a.c -c" "a.o" none ""]
> +check-for-errors "Object Generation" \
> +       [gcc_target_compile "$srcdir/$subdir/b.c -c" "b.o" none ""]
> +check-for-errors "Assembler Generation" \
> +       [gcc_target_compile "$srcdir/$subdir/a.c -S" "a.S" none ""]
> +check-for-errors "Assembler Generation" \
> +       [gcc_target_compile "$srcdir/$subdir/b.c -S" "b.S" none ""]
> +
> +# Empty file is a valid program
> +check-for-errors "Empty Program" \
> +       [gcc_target_compile "$srcdir/$subdir/empty.c -c" "empty.o" none ""]
> +
> +# Test object file passthrough
> +check-for-errors "Object file passthrough" \
> +       [gcc_target_compile "$srcdir/$subdir/foo.c a.o" "a.exe" none ""]
> +
> +# Test compilation when assembler is provided
> +check-for-errors "Assembler with Macros" \
> +       [gcc_target_compile "a.S -c" "a.o" none ""]
> +
> +# Clean temporary generated files.
> +set temp_files {"a.o" "a.S" "b.o" "b.S" "empty.o"}
> +
> +foreach f $temp_files {
> +       if { [file exists $f] } {
> +               file delete $f
> +       }
> +}
> +
> +# All done.
> +dg-finish
> diff --git a/gcc/testsuite/driver/empty.c b/gcc/testsuite/driver/empty.c
> new file mode 100644
> index 00000000000..e69de29bb2d
> diff --git a/gcc/testsuite/driver/foo.c b/gcc/testsuite/driver/foo.c
> new file mode 100644
> index 00000000000..a18fd2a3b14
> --- /dev/null
> +++ b/gcc/testsuite/driver/foo.c
> @@ -0,0 +1,7 @@
> +void a_func (void);
> +
> +int main()
> +{
> +  a_func ();
> +  return 0;
> +}
> --
> 2.28.0
>

* Re: [PATCH 4/6] Add `+' for Jobserver Integration
  2020-08-20 22:33   ` Joseph Myers
@ 2020-08-24 13:19     ` Richard Biener
  2020-08-27 15:38     ` Jan Hubicka
  1 sibling, 0 replies; 31+ messages in thread
From: Richard Biener @ 2020-08-24 13:19 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Giuliano Belinassi, GCC Patches, Jan Hubicka

On Fri, Aug 21, 2020 at 12:34 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Thu, 20 Aug 2020, Giuliano Belinassi via Gcc-patches wrote:
>
> >  libbacktrace/Makefile.in |   2 +-
> >  zlib/Makefile.in         |  64 ++++++------
>
> These directories use makefiles generated by automake.  Rather than
> modifying the generated files, you need to modify the sources (whether
> that's Makefile.am, or code in automake itself - if in automake itself, we
> should wait for an actual new automake release before updating the version
> used in GCC).

I also wonder whether for actual production use in GCC we should concentrate
on the known bottle-necks and just amend the rule(s) in gcc/Makefile.in for now?
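For reference, the jobserver modes depend on how GNU Make shares job slots with its children. Per the GNU Make manual, make only exposes the jobserver to commands it treats as recursive (for example, recipe lines prefixed with `+`), which is presumably why the Makefiles are touched at all. Below is a minimal sketch, assuming the POSIX pipe-based protocol where MAKEFLAGS carries `--jobserver-auth=R,W` (GNU Make 4.2 and later) or `--jobserver-fds=R,W` (older releases); the `fifo:PATH` form of GNU Make 4.4 is deliberately not handled. The names `find_jobserver_fds` and `resolve_parallel_jobs` are hypothetical and are not taken from the patch's `jobserver.cc`, and falling back to serial compilation when `jobserver` is requested but missing is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: look for a pipe-based GNU Make jobserver in
   MAKEFLAGS.  On success, store the read/write descriptors and return 1.
   Simplification: only the first matching option is considered, and the
   "fifo:PATH" form is rejected because sscanf cannot parse it as "%d,%d".  */
static int
find_jobserver_fds (const char *makeflags, int *rfd, int *wfd)
{
  static const char *const prefixes[] = { "--jobserver-auth=",
                                          "--jobserver-fds=" };

  if (makeflags == NULL)
    return 0;
  for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++)
    {
      const char *p = strstr (makeflags, prefixes[i]);
      if (p != NULL
          && sscanf (p + strlen (prefixes[i]), "%d,%d", rfd, wfd) == 2)
        return 1;
    }
  return 0;
}

/* Hypothetical resolution of -fparallel-jobs=<N|jobserver|auto>.
   Returns a fixed job limit (>= 1), 0 meaning "request tokens from the
   jobserver", or -1 for an invalid value.  */
static int
resolve_parallel_jobs (const char *value, const char *makeflags)
{
  int rfd, wfd;
  int have_jobserver;

  assert (value != NULL);
  have_jobserver = find_jobserver_fds (makeflags, &rfd, &wfd);

  if (strcmp (value, "jobserver") == 0)
    {
      if (have_jobserver)
        return 0;
      /* Alert the user; falling back to serial mode is an assumption.  */
      fprintf (stderr, "warning: GNU Make jobserver not found\n");
      return 1;
    }
  if (strcmp (value, "auto") == 0)
    return have_jobserver ? 0 : 2;      /* Quiet fallback to 2 jobs.  */

  char *end;
  long n = strtol (value, &end, 10);
  if (end == value || *end != '\0' || n < 1 || n > INT_MAX)
    return -1;
  return (int) n;
}
```

A real implementation would also check that the advertised descriptors are actually open (for example with `fcntl (fd, F_GETFD)`), since make closes them for commands it does not consider recursive.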

Richard.

> --
> Joseph S. Myers
> joseph@codesourcery.com

* Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.
  2020-08-24 12:50 ` Richard Biener
@ 2020-08-24 15:13   ` Giuliano Belinassi
  2020-08-29 11:31     ` Jan Hubicka
  0 siblings, 1 reply; 31+ messages in thread
From: Giuliano Belinassi @ 2020-08-24 15:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Jan Hubicka

Hi, Richi.

On 08/24, Richard Biener wrote:
> On Fri, Aug 21, 2020 at 12:00 AM Giuliano Belinassi
> <giuliano.belinassi@usp.br> wrote:
> >
> > This patch series adds a new flag "-fparallel-jobs=" to control whether the
> > compiler should try to compile the current file in parallel.
> >
> > Three modes are currently supported:
> >
> > 1. -fparallel-jobs=<N>: Try to compile the file using a maximum of N
> > jobs.
> >
> > 2. -fparallel-jobs=jobserver: Check if there is a running GNU Make
> > Jobserver. If positive, communicate with it in order to launch jobs,
> > but alert the user if the jobserver was not found, since it requires
> > modifications in the project Makefile.
> >
> > 3. -fparallel-jobs=auto: Same as 2., but quietly fall back to a maximum
> > of 2 jobs if the jobserver was not found.
> >
> > The parallelization works by using a modified LTO engine, as no IR is
> > dumped into the disk, and a new partitioner is employed to find
> > symbols which must be partitioned together.
> >
> > The parallelism feature is implemented as follows:
> >
> > 1. The driver will pass a hidden -fsplit-outputs=<filename> to cc1*.
> >
> > 2. After IPA, cc1* will search for symbols that must be partitioned
> > together.  If the user allows GCC to automatically promote symbols to
> > globals through "--param=promote-statics=1" for a better parallel
> > compilation performance, it will also be done.  However, if it decides
> > that partitioning is a bad idea, it will continue with a default serial
> > compilation, and the additional <filename> will not be created.  It will
> > avoid compiling in parallel if and only if:
> >
> >   * File size exceeds the minimum file size specified by LTO default
> >   --param=lto-min-partition.
> 
> less than the minimum size I suppose.

True. :)
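With that correction, the two fallback conditions in the cover letter amount to a simple gate. A minimal sketch, assuming the partitioner reports an estimated size per partition and that the whole unit is compared against the value of `--param=lto-min-partition` (passed in here as a plain number); the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical gate: compile in parallel only if the partitioner found
   at least one split point and the unit is not smaller than
   MIN_PARTITION (standing in for --param=lto-min-partition).  SIZES
   holds the estimated size of each partition.  */
static bool
should_compile_in_parallel (const long *sizes, size_t n_partitions,
                            long min_partition)
{
  long total = 0;

  assert (n_partitions == 0 || sizes != NULL);

  /* No partitioning point found: everything ended up in one partition.  */
  if (n_partitions < 2)
    return false;

  for (size_t i = 0; i < n_partitions; i++)
    total += sizes[i];

  /* Too small to be worth forking for.  */
  return total >= min_partition;
}
```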

> 
> >   * The partitioner is unable to find any point of partitioning in the
> >   file.
> 
> It might make sense to increase the minimum partition size and also
> check the partitioning result against unreasonable bias (one very
> large and one very small partition).

I am working on this right now :)
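One possible shape for the bias check suggested above, as a sketch only: reject a partitioning when its smallest partition falls below a minimum size, or when its largest partition is more than a fixed factor larger than its smallest. Both thresholds and the function name are hypothetical; they are not existing GCC params.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check for a badly biased partitioning: reject it when the
   smallest partition is below MIN_PARTITION or when the largest one is
   more than MAX_RATIO times the smallest.  Assumes sizes are small
   enough that MAX_RATIO * smallest does not overflow.  */
static bool
partitioning_is_balanced (const long *sizes, size_t n,
                          long min_partition, long max_ratio)
{
  long smallest, largest;

  assert (sizes != NULL && n > 0 && max_ratio > 0);
  smallest = largest = sizes[0];
  for (size_t i = 1; i < n; i++)
    {
      if (sizes[i] < smallest)
        smallest = sizes[i];
      if (sizes[i] > largest)
        largest = sizes[i];
    }

  if (smallest < min_partition)
    return false;
  return largest <= max_ratio * smallest;
}
```

Checking the smallest partition directly (rather than only the total) targets the "one very large and one very small partition" case, where the fork overhead of the small partition buys almost no parallelism.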

> 
> > 3. cc1* will fork itself, once for each partition.  Each child
> > process will apply the partition mask generated by the partitioner
> > and write the name of its new assembler file to the <filename>
> > specified by the driver.
> 
> For the first partition there's no fork (but the main process is used) and
> the main output file will be used, correct?

No. In this version, forking is disabled only in fallback mode, but
this is certainly possible to implement.
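A minimal POSIX sketch of the variant Richard describes, where the parent handles partition 0 itself and only partitions 1..N-1 are forked. The callback type, `compile_partitions` and `demo_partition` are hypothetical illustrations, not code from the fork-based engine in PATCH 3/6.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-partition callback; returns 0 on success.  */
typedef int (*compile_partition_fn) (int part, void *data);

/* Compile partition 0 in the calling process and partitions 1..N-1 in
   forked children.  Returns 0 if every partition succeeded and 1
   otherwise.  pids[0] is unused.  */
static int
compile_partitions (int n, compile_partition_fn fn, void *data)
{
  pid_t *pids;
  int ret = 0;

  assert (n > 0 && fn != NULL);
  pids = calloc ((size_t) n, sizeof *pids);
  if (pids == NULL)
    return 1;

  /* Flush stdio so buffered output is not duplicated in the children.  */
  fflush (NULL);
  for (int i = 1; i < n; i++)
    {
      pids[i] = fork ();
      if (pids[i] == 0)
        _exit (fn (i, data) == 0 ? 0 : 1);   /* Child: one partition.  */
      if (pids[i] < 0)
        ret = 1;                              /* fork failed.  */
    }

  /* The parent reuses itself for the first partition instead of forking.  */
  if (fn (0, data) != 0)
    ret = 1;

  for (int i = 1; i < n; i++)
    {
      int status;
      if (pids[i] > 0
          && (waitpid (pids[i], &status, 0) < 0
              || !WIFEXITED (status) || WEXITSTATUS (status) != 0))
        ret = 1;
    }
  free (pids);
  return ret;
}

/* Example callback for illustration: fails only for the partition whose
   index is stored in DATA.  */
static int
demo_partition (int part, void *data)
{
  return part == *(const int *) data;
}
```

Reusing the parent for partition 0 saves one fork and would let that partition write to the main output file directly, as Richard suggests.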

> 
> > 4. The driver will open each file and partially link them together into
> > a single .o file if -c was requested, or into a binary otherwise.  -S and -E
> > are unsupported for now and will probably remain so.
> 
> That also applies to -save-temps mode I assume which makes
> debugging issues a bit tricky and involves manual invocation
> of the cc1 command to have the file with the output filenames preserved.

This is currently unsupported, but it seems to be an interesting feature
to have.  I could use the input file name as a baseline for the
temporary files and dump them in the current working directory instead
of asking libiberty for a temporary file.
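A minimal sketch of that naming idea, assuming a hypothetical `<stem>.part<N>.s` scheme in the current directory (the `.partN` infix is illustrative and not something GCC uses today). It strips any leading directories and the last extension from the input name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical -save-temps naming for split outputs: "dir/foo.c" and
   partition 3 give "foo.part3.s" in the current directory.  Returns 0 on
   success, -1 if BUF is too small.  */
static int
save_temps_partition_name (const char *input, int part, char *buf,
                           size_t size)
{
  const char *base, *dot;
  size_t stem_len;
  int n;

  assert (input != NULL && buf != NULL);
  base = strrchr (input, '/');
  base = base ? base + 1 : input;

  /* Drop the last extension, but keep dot-files such as ".hidden".  */
  dot = strrchr (base, '.');
  stem_len = (dot != NULL && dot != base) ? (size_t) (dot - base)
                                          : strlen (base);

  n = snprintf (buf, size, "%.*s.part%d.s", (int) stem_len, base, part);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

Deterministic names like these would let a failing cc1 invocation be rerun by hand with the split outputs preserved, which is the debugging concern raised above.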

> 
> >
> > Speedups ranged from 0.95x to 1.9x on a quad-core Intel Core i7-8565U
> > when compiling two files from GCC, as shown in the following table.
> > Each timing is from a single run after one warm-up run.  The compiler
> > under test had checking enabled, so a release build would likely be
> > faster in both the sequential and parallel cases, though the speedup
> > may stay the same.
> >
> > |                |            | Without Static | With Static |   Max   |
> > | File           | Sequential |    Promotion   |  Promotion  | Speedup |
> > |----------------|------------|----------------|-------------|---------|
> > | gimple-match.c |     60s    |       63s      |     34s     |   1.7x  |
> > | insn-emit.c    |     37s    |       19s      |     20s     |   1.9x  |
> >
> > Note that enabling it causes a slowdown in some cases, which is why
> > the parallelism feature is opt-in via a flag for now.
> 
> One reason why promote-statics is not enabled by default is that
> it creates new hidden symbols (LTO does so as well) which might
> be undesirable.  If deemed OK in general we could enable it by
> default.  Note that originally I wanted to have -fparallel-jobs=auto
> be enabled by default which should not end up with visible changes
> like this(?)

It not only creates hidden symbols, but it also changes the original
symbol name to avoid clashes with other object files.  It would be
very nice to avoid doing this altogether.

There was once an idea (I don't remember whether from Richi or Honza)
to avoid partial linking and instead concatenate the generated
assembler files into a single assembler file and assemble it.  This
would remove the need for symbol promotion, as all symbols would be
in the same assembler file, but I am not sure whether this would work
out of the box (i.e. whether GCC generates assembler that can be
concatenated).
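The concatenation step itself is straightforward; a minimal sketch on stdio streams follows (`concat_asm_files` is a hypothetical name). The likely obstacle is the content rather than the mechanics: GCC numbers internal labels such as `.L2` or `.LC0` per compilation, so assembler produced by separate partitions would probably define the same local labels more than once, and those would need to be renamed or made unique before a plain concatenation could assemble.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: append the contents of each assembler stream in INPUTS
   to OUT, in order.  Returns 0 on success, -1 on an I/O error.  This
   does not rename clashing local labels.  */
static int
concat_asm_files (FILE *out, FILE *const *inputs, size_t n)
{
  char buf[4096];

  assert (out != NULL && (n == 0 || inputs != NULL));
  for (size_t i = 0; i < n; i++)
    {
      size_t len;

      assert (inputs[i] != NULL);
      while ((len = fread (buf, 1, sizeof buf, inputs[i])) > 0)
        if (fwrite (buf, 1, len, out) != len)
          return -1;
      if (ferror (inputs[i]))
        return -1;
    }
  return fflush (out) == 0 ? 0 : -1;
}
```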

> 
> > Bootstrapped and Regtested on Linux x86_64.
> >
> > Giuliano Belinassi (6):
> >   Modify gcc driver for parallel compilation
> >   Implement a new partitioner for parallel compilation
> >   Implement fork-based parallelism engine
> >   Add `+' for Jobserver Integration
> >   Add invoke documentation
> >   New tests for parallel compilation feature
> >
> >  gcc/Makefile.in                               |    6 +-
> >  gcc/cgraph.c                                  |   16 +
> >  gcc/cgraph.h                                  |   13 +
> >  gcc/cgraphunit.c                              |  198 ++-
> >  gcc/common.opt                                |    4 +
> >  gcc/doc/invoke.texi                           |   32 +-
> >  gcc/gcc.c                                     | 1219 +++++++++++++----
> >  gcc/ipa-fnsummary.c                           |    2 +-
> >  gcc/ipa-icf.c                                 |    3 +-
> >  gcc/ipa-visibility.c                          |    3 +-
> >  gcc/ipa.c                                     |    4 +-
> >  gcc/jobserver.cc                              |  168 +++
> >  gcc/jobserver.h                               |   33 +
> >  gcc/lto-cgraph.c                              |  172 +++
> >  gcc/{lto => }/lto-partition.c                 |  463 ++++++-
> >  gcc/{lto => }/lto-partition.h                 |    4 +-
> >  gcc/lto-streamer.h                            |    4 +
> >  gcc/lto/Make-lang.in                          |    4 +-
> >  gcc/lto/lto.c                                 |    2 +-
> >  gcc/params.opt                                |    8 +
> >  gcc/symtab.c                                  |   46 +-
> >  gcc/testsuite/driver/a.c                      |    6 +
> >  gcc/testsuite/driver/b.c                      |    6 +
> >  gcc/testsuite/driver/driver.exp               |   80 ++
> >  gcc/testsuite/driver/empty.c                  |    0
> >  gcc/testsuite/driver/foo.c                    |    7 +
> >  .../gcc.dg/parallel-early-constant.c          |   22 +
> >  gcc/testsuite/gcc.dg/parallel-static-1.c      |   21 +
> >  gcc/testsuite/gcc.dg/parallel-static-2.c      |   21 +
> >  .../gcc.dg/parallel-static-clash-1.c          |   23 +
> >  .../gcc.dg/parallel-static-clash-aux.c        |   14 +
> >  gcc/toplev.c                                  |   58 +-
> >  gcc/toplev.h                                  |    3 +
> >  gcc/tree.c                                    |   23 +-
> >  gcc/varasm.c                                  |   26 +-
> >  intl/Makefile.in                              |    2 +-
> >  libbacktrace/Makefile.in                      |    2 +-
> >  libcpp/Makefile.in                            |    2 +-
> >  libdecnumber/Makefile.in                      |    2 +-
> >  libiberty/Makefile.in                         |  212 +--
> >  zlib/Makefile.in                              |   64 +-
> >  41 files changed, 2539 insertions(+), 459 deletions(-)
> >  create mode 100644 gcc/jobserver.cc
> >  create mode 100644 gcc/jobserver.h
> >  rename gcc/{lto => }/lto-partition.c (78%)
> >  rename gcc/{lto => }/lto-partition.h (89%)
> >  create mode 100644 gcc/testsuite/driver/a.c
> >  create mode 100644 gcc/testsuite/driver/b.c
> >  create mode 100644 gcc/testsuite/driver/driver.exp
> >  create mode 100644 gcc/testsuite/driver/empty.c
> >  create mode 100644 gcc/testsuite/driver/foo.c
> >  create mode 100644 gcc/testsuite/gcc.dg/parallel-early-constant.c
> >  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-aux.c
> >
> > --
> > 2.28.0
> >

* Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.
  2020-08-22 21:04   ` Giuliano Belinassi
@ 2020-08-24 16:44     ` Josh Triplett
  2020-08-24 18:38       ` Giuliano Belinassi
  0 siblings, 1 reply; 31+ messages in thread
From: Josh Triplett @ 2020-08-24 16:44 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: gcc-patches

On Sat, Aug 22, 2020 at 06:04:48PM -0300, Giuliano Belinassi wrote:
> Hi, Josh
> 
> On 08/21, Josh Triplett wrote:
> > On Thu, Aug 20, 2020 at 07:00:13PM -0300, Giuliano Belinassi wrote:
> > > This patch series add a new flag "-fparallel-jobs=" to control if the
> > > compiler should try to compile the current file in parallel.
> > [...]
> > > Bootstrapped and Regtested on Linux x86_64.
> > > 
> > > Giuliano Belinassi (6):
> > >   Modify gcc driver for parallel compilation
> > >   Implement a new partitioner for parallel compilation
> > >   Implement fork-based parallelism engine
> > >   Add `+' for Jobserver Integration
> > >   Add invoke documentation
> > >   New tests for parallel compilation feature
> > 
> > Very nice!
> 
> Thank you for your interest in this :)
> 
> > 
> > I'm interested in testing this on a highly parallel system. What
> > baseline do these patches apply to?  They don't seem to apply to GCC
> > trunk.
> 
> Hummm, this was supposed to work on trunk out of the box. However,
> there is a high probability that I messed something up while rebasing.
> I will post a version 2 of it when I get more comments and when I fix
> the Makefile issue that Joseph pointed out in another e-mail.
> 
> If you want to test it on a highly parallel system, it would be
> interesting to also see how it behaves with --param=promote-statics=1,
> as that increases the opportunity for parallelism. :)

I plan to try several variations, including that.

I'd like to see how it affects the performance of Linux kernel builds.

> > Also, I tried to bootstrap the current tip of the devel/autopar_devel
> > branch, but ended up with compiler segfaults that all look like this:
> > ../../gcc/zlib/compress.c:86:1: internal compiler error: Segmentation fault
> >    86 | }
> >       | ^
> 
> Well, there was once a bug in this branch when compiling with -flto that
> caused the assembler output file not to be initialized early enough,
> resulting in the LTO LGEN stage writing to an invalid FILE pointer.
> I fixed this during rebasing but forgot to push it to the autopar_devel
> branch.  I have now pushed the recent changes, which fix this issue, to
> autopar_devel.

That might explain the problem; I had tried to build gcc with the
bootstrap-lto configuration.

> In any case, -fparallel-jobs= should NOT be used together with -flto.
> Although I used part of the LTO engine to develop this feature, the two
> serve different purposes.  I guess I should emit a warning about that
> in the next version :)

Interesting. Is that something that could change in the future? I'd like
to be able to get some parallelism when creating the object files, and
then more parallelism when doing the final LTO link.

> Also, I just tested bootstrap with
> 
> ../gcc/configure --disable-multilib --enable-languages=c,c++
>
> on x86_64 linux and it is working.

I had used --enable-multilib and --enable-languages=c,c++,lto.  Would
that be expected to work?

Thanks,
Josh

* Re: [PATCH 1/6] Modify gcc driver for parallel compilation
  2020-08-24 13:17   ` Richard Biener
@ 2020-08-24 18:06     ` Giuliano Belinassi
  2020-08-25  6:53       ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Giuliano Belinassi @ 2020-08-24 18:06 UTC (permalink / raw)
  To: Richard Biener; +Cc: Joseph S. Myers, GCC Patches, Jan Hubicka

Hi, Richi.

On 08/24, Richard Biener wrote:
> On Fri, Aug 21, 2020 at 12:00 AM Giuliano Belinassi
> <giuliano.belinassi@usp.br> wrote:
> >
> > Update the driver for parallel compilation.  This process works as
> > follows:
> >
> > When gcc is invoked, the driver checks whether the user passed
> > "-fparallel-jobs".  If so, it checks what the desired output is
> > and whether it can be parallelized.
> > The following cases are handled:
> >
> > 1. -S or -E was provided: We can't run in parallel, as the output
> >    cannot be easily merged into one file.
> >
> > 2. -c was provided: When cc1* forks into multiple processes, it
> >    must tell the driver where it stored its generated assembler files.
> >    Therefore we pass a hidden "-fsplit-outputs=filename" to the compiler,
> >    and we check whether "filename" was created by it.  If so, we open it,
> >    call the assembler for each generated asm file listed in it
> >    (this file must not be empty), and combine the results into a
> >    single .o file with partial linking.  This process is done for each
> >    object file in the argument list.
> >
> > 3. -c was not provided, and the final product will be a binary: Here
> >    we proceed exactly as in 2., but skip the partial linking and
> >    feed the generated object files directly into the final link.
> >
> > For that to work, we had to heavily modify how the "execute" function
> > works, extracting common code which is used multiple times, and
> > also detecting when the command is a call to a compiler or an
> > assembler, as can be seen in append_split_outputs.
> >
> > Finally, we added some tests that reflect all the cases found when
> > bootstrapping the compiler, so that further development of the
> > driver is faster from now on.
> 
> Few comments inline, Joseph may want to comment on the overall
> structure as driver maintainer (CCed).
> 
> I know I asked for the changes on the branch to be squashed but
> the diff below is quite unreadable with the ChangeLog not helping
> the overall refactoring much.  Is it possible to do some of the
> factoring/refactoring without any functionality change to make the
> actual diff easier to follow?

Well, the refactoring is necessary, otherwise I would need to copy and
paste a really huge amount of code.

What I can do (and sounds reasonable to me) is to break this patch into
two parts; one with just refactoring changes, and the other adding the
parallelism engine.
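The three driver cases described in the commit message reduce to a small decision table; a minimal sketch with hypothetical names (`enum split_action`, `choose_split_action`) that do not appear in the patch. Because the patched `process_command` also sets `have_c` for -S and -E, those two must be tested first, mirroring the `!have_S && !have_E && flag_parallel_jobs` guard in `execute`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of how the driver treats each output kind when
   -fparallel-jobs is given.  */
enum split_action
{
  SPLIT_NONE,           /* -S or -E: compile serially, single output.  */
  SPLIT_PARTIAL_LINK,   /* -c: assemble each part, then partially link.  */
  SPLIT_FINAL_LINK      /* Executable: feed all objects to the final link.  */
};

static enum split_action
choose_split_action (bool have_S, bool have_E, bool have_c,
                     int parallel_jobs)
{
  assert (parallel_jobs >= 0);

  /* -S/-E first: have_c is also set for them.  */
  if (parallel_jobs == 0 || have_S || have_E)
    return SPLIT_NONE;
  return have_c ? SPLIT_PARTIAL_LINK : SPLIT_FINAL_LINK;
}
```

Splitting the refactoring from the engine would leave this decision as the main behavioral change in the second patch, which may make it easier to review.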

> 
> Thanks,
> Richard.
> 
> > gcc/ChangeLog
> > 2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> >
> >         * common.opt (fsplit-outputs): New flag.
> >         (fparallel-jobs): New flag.
> >         * gcc.c (extra_arg_storer): New class.
> >         (have_S): New variable.
> >         (struct command): Move from execute.
> >         (is_compiler): New function.
> >         (is_assembler): New function.
> >         (get_number_of_args): New function.
> >         (get_file_by_lines): New function.
> >         (identify_asm_file): New function.
> >         (struct infile): New attribute temp_additional_asm.
> >         (current_infile): New variable.
> >         (get_path_to_ld): New function.
> >         (has_hidden_E): New function.
> >         (sort_asm_files): New function.
> >         (append_split_outputs): New function.
> >         (print_command): New function.
> >         (print_commands): New function.
> >         (print_argbuf): New function.
> >         (handle_verbose): Extracted from execute.
> >         (append_valgrind): Same as above.
> >         (async_launch_commands): Same as above.
> >         (await_commands_to_finish): Same as above.
> >         (split_commands): Same as above.
> >         (parse_argbuf): Same as above.
> >         (execute): Refactor.
> >         (fsplit_arg): New function.
> >         (alloc_infile): Initialize infiles with 0.
> >         (process_command): Remember when -S was passed.
> >         (do_spec_on_infiles): Remember current infile being processed.
> >         (maybe_run_linker): Replace object files when -o is an executable.
> >         (finalize): Deinitialize temp_object_files.
> >
> > gcc/testsuite/ChangeLog:
> > 2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> >
> >         * driver/driver.exp: New test.
> >         * driver/a.c: New file.
> >         * driver/b.c: New file.
> >         * driver/empty.c: New file.
> >         * driver/foo.c: New file.
> > ---
> >  gcc/common.opt                  |    4 +
> >  gcc/gcc.c                       | 1219 ++++++++++++++++++++++++-------
> >  gcc/testsuite/driver/a.c        |    6 +
> >  gcc/testsuite/driver/b.c        |    6 +
> >  gcc/testsuite/driver/driver.exp |   80 ++
> >  gcc/testsuite/driver/empty.c    |    0
> >  gcc/testsuite/driver/foo.c      |    7 +
> >  7 files changed, 1049 insertions(+), 273 deletions(-)
> >  create mode 100644 gcc/testsuite/driver/a.c
> >  create mode 100644 gcc/testsuite/driver/b.c
> >  create mode 100644 gcc/testsuite/driver/driver.exp
> >  create mode 100644 gcc/testsuite/driver/empty.c
> >  create mode 100644 gcc/testsuite/driver/foo.c
> >
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index 4b08e91859f..4aa3ad8c95b 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -3465,4 +3465,8 @@ fipa-ra
> >  Common Report Var(flag_ipa_ra) Optimization
> >  Use caller save register across calls if possible.
> >
> > +fsplit-outputs=
> > +Common Joined Var(split_outputs)
> > +-fsplit-outputs=<tempfile>  File in which to list the outputs of the current compilation unit when it is split.
> > +
> >  ; This comment is to ensure we retain the blank line above.
> > diff --git a/gcc/gcc.c b/gcc/gcc.c
> > index 10bc9881aed..c276a11ca7a 100644
> > --- a/gcc/gcc.c
> > +++ b/gcc/gcc.c
> > @@ -343,6 +343,74 @@ static struct obstack obstack;
> >
> >  static struct obstack collect_obstack;
> >
> > +/* This is used to store new argv arrays created dynamically to avoid memory
> > +   leaks.  */
> > +
> > +class extra_arg_storer
> > +{
> > +  public:
> > +
> > +    /* Initialize the vecs with default sizes.  */
> > +
> > +    extra_arg_storer ()
> > +      {
> > +       string_vec.create (8);
> > +       extra_args.create (64);
> > +      }
> > +
> > +    /* Create new array of strings of size N.  */
> > +    const char **create_new (size_t n)
> > +      {
> > +       const char **ret = XNEWVEC (const char *, n);
> > +       extra_args.safe_push (ret);
> > +       return ret;
> > +      }
> > +
> > +    char *create_string (size_t n)
> > +      {
> > +       char *ret = XNEWVEC (char, n);
> > +       string_vec.safe_push (ret);
> > +       return ret;
> > +      }
> > +
> > +    void store (char *str)
> > +      {
> > +       string_vec.safe_push (str);
> > +      }
> > +
> > +    ~extra_arg_storer ()
> > +      {
> > +       release_extra_args ();
> > +       release_string_vec ();
> > +      }
> > +
> > +
> > +  private:
> > +
> > +    /* Release all allocated argv arrays.  */
> > +    void release_extra_args ()
> > +      {
> > +       size_t i;
> > +
> > +       for (i = 0; i < extra_args.length (); i++)
> > +         free (extra_args[i]);
> > +       extra_args.release ();
> > +      }
> > +
> > +    void release_string_vec ()
> > +      {
> > +       size_t i;
> > +
> > +       for (i = 0; i < string_vec.length (); i++)
> > +         free (string_vec[i]);
> > +       string_vec.release ();
> > +      }
> > +
> > +    /* Data structure to hold all arrays.  */
> > +    vec<const char **> extra_args;
> > +    vec<char *> string_vec;
> > +};
> > +
> >  /* Forward declaration for prototypes.  */
> >  struct path_prefix;
> >  struct prefix_list;
> > @@ -1993,6 +2061,9 @@ static int have_o = 0;
> >  /* Was the option -E passed.  */
> >  static int have_E = 0;
> >
> > +/* Was the option -S passed.  */
> > +static int have_S = 0;
> > +
> >  /* Pointer to output file name passed in with -o. */
> >  static const char *output_file = 0;
> >
> > @@ -3056,158 +3127,522 @@ add_sysrooted_hdrs_prefix (struct path_prefix *pprefix, const char *prefix,
> >               require_machine_suffix, os_multilib);
> >  }
> >
> > -
> > -/* Execute the command specified by the arguments on the current line of spec.
> > -   When using pipes, this includes several piped-together commands
> > -   with `|' between them.
> > +struct command
> > +{
> > +  const char *prog;            /* program name.  */
> > +  const char **argv;           /* vector of args.  */
> > +};
> >
> > -   Return 0 if successful, -1 if failed.  */
> > +#define EMPTY_CMD(x) (!((x).prog))  /* Is the provided CMD empty?  */
> > +
> > +/* Check if arg is a call to a compiler.  Return false if not, true if yes.  */
> > +
> > +static bool
> > +is_compiler (const char *arg)
> 
> This and is_assembler should somehow magically fall out of
> specs processing, specifically
> 
> > +{
> > +  static const char *const compilers[] = {"cc1", "cc1plus", "f771"};
> 
> ^^ this is incomplete.  Of course I don't know how to auto-infer
> these but I think it must be possible from somewhere up the
> call-chain?

I was expecting this to be some sort of issue in the merging process ;)

Well, I remember trying to find a way of doing this once, only to find
out that the compiler name is embedded in a spec string, with no way
to check whether the name is actually a compiler or an assembler.
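Until the list can be derived from the specs, the hand-written table could at least be completed. Below is a sketch of a basename check with a longer table. The table is an assumption about a typical mainline build: cc1, cc1plus, cc1obj and cc1objplus for C, C++ and Objective-C/C++; f951 for gfortran (f771 was the old g77 binary); gnat1, go1 and d21 for Ada, Go and D. The names is_compiler_name and base_name_of are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the component of PATH after its last '/'.
   The driver proper would also have to handle DIR_SEPARATOR_2 on hosts
   that use backslashes.  */
static const char *
base_name_of (const char *path)
{
  const char *slash;

  assert (path != NULL && *path != '\0');
  slash = strrchr (path, '/');
  return slash ? slash + 1 : path;
}

/* Hypothetical replacement for is_compiler: true if PATH names one of the
   compiler-proper binaries listed below.  The list is hand-maintained and
   is an assumption, so it may not match every configuration.  */
static bool
is_compiler_name (const char *path)
{
  static const char *const compilers[] = {
    "cc1", "cc1plus", "cc1obj", "cc1objplus", /* C, C++, Objective-C/C++.  */
    "f951",                                   /* Fortran (gfortran).  */
    "gnat1", "go1", "d21"                     /* Ada, Go, D.  */
  };
  const char *base = base_name_of (path);
  size_t i;

  for (i = 0; i < sizeof compilers / sizeof compilers[0]; i++)
    if (strcmp (base, compilers[i]) == 0)
      return true;
  return false;
}
```

For example, is_compiler_name ("/usr/libexec/gcc/x86_64-pc-linux-gnu/10/cc1plus") is true and is_compiler_name ("/usr/bin/as") is false. On Windows hosts the binaries carry a .exe suffix, which this sketch does not strip.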

> 
> > +  const char* ptr = arg;
> > +
> > +  size_t i;
> > +
> > +  /* Jump to last '/' of string.  */
> > +  while (*arg)
> > +    if (*arg++ == '/')
> > +      ptr = arg;
> > +
> > +  /* Check whether the current character looks valid.  */
> > +  gcc_assert (!(*ptr == '\0' ||  *ptr == '/'));
> > +
> > +  for (i = 0; i < ARRAY_SIZE (compilers); i++)
> > +    {
> > +      if (!strcmp (ptr, compilers[i]))
> > +       return true;
> > +    }
> > +
> > +  return false;
> > +}
> > +
> > +/* Check if arg is a call to as.  Return false if not, true if yes.  */
> > +
> > +static bool
> > +is_assembler (const char *arg)
> > +{
> > +  static const char *const assemblers[] = {"as", "gas"};
> > +  const char* ptr = arg;
> > +
> > +  size_t i;
> > +
> > +  /* Jump to last '/' of string.  */
> > +  while (*arg)
> > +    if (*arg++ == '/')
> > +      ptr = arg;
> > +
> > +  /* Check whether the current character looks valid.  */
> > +  gcc_assert (!(*ptr == '\0' ||  *ptr == '/'));
> > +
> > +  for (i = 0; i < ARRAY_SIZE (assemblers); i++)
> > +    {
> > +      if (!strcmp (ptr, assemblers[i]))
> > +       return true;
> > +    }
> > +
> > +  return false;
> > +}
> > +
> > +/* Get argv[] array length.  */
> >
> >  static int
> > -execute (void)
> > +get_number_of_args (const char *argv[])
> > +{
> > +  int argc;
> > +
> > +  for (argc = 0; argv[argc] != NULL; argc++)
> > +    ;
> > +
> > +  return argc;
> > +}
> > +
> > +static const char *fsplit_arg (extra_arg_storer *);
> > +
> > +/* Accumulate each line in lines vec.  Return true if file exists, false if
> > +   not.  */
> > +
> > +static bool
> > +get_file_by_lines (extra_arg_storer *storer, vec<char *> *lines, const char *name)
> > +{
> > +  int buf_size = 64, len = 0;
> > +  char *buf = XNEWVEC (char, buf_size);
> > +
> > +
> > +  FILE *file = fopen (name, "r");
> > +
> > +  if (!file)
> > +    return false;
> > +
> > +  while (1)
> > +    {
> > +      if (!fgets (buf + len, buf_size, file))
> > +       {
> > +         free (buf); /* Release the buffer we created unnecessarily.  */
> > +         break;
> > +       }
> > +
> > +      len = strlen (buf);
> > +      if (buf[len - 1] == '\n') /* Check if we indeed read the entire line.  */
> > +       {
> > +         buf[len - 1] = '\0';
> > +         /* Yes.  Insert into the lines vector.  */
> > +         lines->safe_push (buf);
> > +         len = 0;
> > +
> > +         /* Store the created string for future release.  */
> > +         storer->store (buf);
> > +         buf = XNEWVEC (char, buf_size);
> > +       }
> > +      else
> > +       {
> > +         /* No.  Increase the buffer size and read again.  */
> > +         buf = XRESIZEVEC (char, buf, buf_size * 2);
> > +       }
> > +    }
> > +
> > +  if (lines->length () == 0)
> > +    internal_error ("Empty file: %s", name);
> > +
> > +  fclose (file);
> > +  return true;
> > +}
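As quoted, get_file_by_lines has a few problems:

 * XRESIZEVEC grows the buffer to buf_size * 2, but buf_size itself is never updated. fgets is also always told it has buf_size bytes at buf + len, rather than the space that actually remains. Together, a line longer than 126 characters writes past the 128-byte buffer.
 * A last line without a trailing '\n' is freed instead of stored.
 * buf leaks when fopen fails.

Below is a sketch of a per-line reader that tracks the remaining space. It uses plain malloc/realloc instead of libiberty's XNEWVEC/XRESIZEVEC so that it stands alone. The name read_whole_line is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line of any length from FILE and return it without the
   trailing newline, in a malloc'd buffer the caller must free.  Return
   NULL at end of file or if memory runs out.  A last line that lacks a
   '\n' is still returned.  */
static char *
read_whole_line (FILE *file)
{
  size_t size = 64, len = 0;
  char *buf;

  assert (file != NULL);
  buf = (char *) malloc (size);
  if (!buf)
    return NULL;

  /* Always hand fgets the space that is actually left in BUF.  */
  while (fgets (buf + len, (int) (size - len), file))
    {
      len += strlen (buf + len);
      if (len > 0 && buf[len - 1] == '\n')
        {
          buf[len - 1] = '\0';
          return buf;
        }
      if (len + 1 == size)
        {
          /* Buffer full without a newline: grow it and record the new
             size so the next fgets call sees the extra room.  */
          char *bigger = (char *) realloc (buf, size * 2);
          if (!bigger)
            {
              free (buf);
              return NULL;
            }
          buf = bigger;
          size *= 2;
        }
    }

  /* End of file (or read error): keep a final unterminated line.  */
  if (len > 0)
    return buf;
  free (buf);
  return NULL;
}
```

With this reader, get_file_by_lines could call it until it returns NULL, pushing each result into LINES and STORER, and then close the FILE. Because nothing is allocated before fopen succeeds, the leak on a missing file also goes away. Inside the driver, XNEWVEC/XRESIZEVEC would replace malloc/realloc and make the NULL checks unnecessary, since they abort on allocation failure.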
> > +
> > +static void
> > +identify_asm_file (int argc, const char *argv[],
> > +                  int *infile_pos, int *outfile_pos)
> >  {
> >    int i;
> > -  int n_commands;              /* # of command.  */
> > -  char *string;
> > -  struct pex_obj *pex;
> > -  struct command
> > -  {
> > -    const char *prog;          /* program name.  */
> > -    const char **argv;         /* vector of args.  */
> > -  };
> > -  const char *arg;
> >
> > -  struct command *commands;    /* each command buffer with above info.  */
> > +  static const char *asm_extension[] = {"s", "S"};
> >
> > -  gcc_assert (!processing_spec_function);
> > +  bool infile_found = false;
> > +  bool outfile_found = false;
> >
> > -  if (wrapper_string)
> > +  for (i = 0; i < argc; i++)
> >      {
> > -      string = find_a_file (&exec_prefixes,
> > -                           argbuf[0], X_OK, false);
> > -      if (string)
> > -       argbuf[0] = string;
> > -      insert_wrapper (wrapper_string);
> > +      const char *arg = argv[i];
> > +      const char *ext = argv[i];
> > +      unsigned j;
> > +
> > +      /* Jump to last '.' of string.  */
> > +      while (*arg)
> > +       if (*arg++ == '.')
> > +         ext = arg;
> > +
> > +      if (!infile_found)
> > +       for (j = 0; j < ARRAY_SIZE (asm_extension); ++j)
> > +           if (!strcmp (ext, asm_extension[j]))
> > +             {
> > +               infile_found = true;
> > +               *infile_pos = i;
> > +               break;
> > +             }
> > +
> > +      if (!outfile_found)
> > +       if (!strcmp (ext, "-o"))
> > +         {
> > +           outfile_found = true;
> > +           *outfile_pos = i+1;
> > +         }
> > +
> > +      if (infile_found && outfile_found)
> > +       return;
> >      }
> >
> > -  /* Count # of piped commands.  */
> > -  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> > -    if (strcmp (arg, "|") == 0)
> > -      n_commands++;
> > +  gcc_assert (infile_found && outfile_found);
> >
> > -  /* Get storage for each command.  */
> > -  commands = (struct command *) alloca (n_commands * sizeof (struct command));
> > +}
> >
> > -  /* Split argbuf into its separate piped processes,
> > -     and record info about each one.
> > -     Also search for the programs that are to be run.  */
> > +/* Language is one of three things:
> >
> > -  argbuf.safe_push (0);
> > +   1) The name of a real programming language.
> > +   2) NULL, indicating that no one has figured out
> > +   what it is yet.
> > +   3) '*', indicating that the file should be passed
> > +   to the linker.  */
> > +struct infile
> > +{
> > +  const char *name;
> > +  const char *language;
> > +  const char *temp_additional_asm;
> > +  struct compiler *incompiler;
> > +  bool compiled;
> > +  bool preprocessed;
> > +};
> >
> > -  commands[0].prog = argbuf[0]; /* first command.  */
> > -  commands[0].argv = argbuf.address ();
> > +/* Also a vector of input files specified.  */
> >
> > -  if (!wrapper_string)
> > +static struct infile *infiles;
> > +static struct infile *current_infile = NULL;
> > +
> > +int n_infiles;
> > +
> > +static int n_infiles_alloc;
> > +
> > +static vec<const char *> temp_object_files;
> > +
> > +/* Get path to the configured ld.  */
> > +
> > +static const char *
> > +get_path_to_ld (void)
> > +{
> > +  const char *ret = find_a_file (&exec_prefixes, LINKER_NAME, X_OK, false);
> > +  if (!ret)
> > +    ret = "ld";
> > +
> > +  return ret;
> > +}
> > +
> > +/* Check if a hidden -E was passed as argument to something.  */
> > +
> > +static bool
> > +has_hidden_E (int argc, const char *argv[])
> > +{
> > +  int i;
> > +  for (i = 0; i < argc; ++i)
> > +    if (!strcmp (argv[i], "-E"))
> > +      return true;
> > +
> > +  return false;
> > +}
> > +
> > +/* Assembler files are listed in the container file as soon as they are ready.
> > +   Sort them so that builds are reproducible.  */
> 
> In principle the list of outputs is pre-determined by the
> scheduler compiling the partitions - is there any reason
> to write the file with the output names only incrementally
> rather than in one (sorted) go?

No. This surely could be done by the main process.
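Writing the list in one go could also remove the need for sort_asm_files and its "<number> <name>" line format. Below is a sketch, assuming the process that schedules the partitions keeps one record per partition as the jobs complete. struct partition_output, compare_by_index and write_sorted_outputs are hypothetical names and not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one partition's assembler output.  */
struct partition_output
{
  int index;          /* Partition number assigned by the scheduler.  */
  const char *name;   /* Assembler file written for that partition.  */
};

/* qsort comparator: ascending partition number.  */
static int
compare_by_index (const void *a, const void *b)
{
  const struct partition_output *pa = (const struct partition_output *) a;
  const struct partition_output *pb = (const struct partition_output *) b;

  return (pa->index > pb->index) - (pa->index < pb->index);
}

/* Sort the N outputs by partition number and write one file name per line
   to OUT, so the list does not depend on the order in which the jobs
   finished.  Return 0 on success, -1 if a name contains a newline (it
   would break the one-name-per-line format) or if a write fails.  */
static int
write_sorted_outputs (FILE *out, struct partition_output *outputs, size_t n)
{
  size_t i;

  assert (out != NULL);
  qsort (outputs, n, sizeof *outputs, compare_by_index);
  for (i = 0; i < n; i++)
    {
      if (strchr (outputs[i].name, '\n'))
        return -1;
      if (fprintf (out, "%s\n", outputs[i].name) < 0)
        return -1;
    }
  return fflush (out) == 0 ? 0 : -1;
}
```

Since the scheduler already knows each partition's number, the driver would then read the names back in file order, with no parsing or reordering.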

> 
> > +static void
> > +sort_asm_files (vec <char *> *_lines)
> > +{
> > +  vec <char *> &lines = *_lines;
> > +  int i, n = lines.length ();
> > +  char **temp_buf = XALLOCAVEC (char *, n);
> > +
> > +  for (i = 0; i < n; i++)
> > +    temp_buf[i] = lines[i];
> > +
> > +  for (i = 0; i < n; i++)
> >      {
> > -      string = find_a_file (&exec_prefixes, commands[0].prog, X_OK, false);
> > -      if (string)
> > -       commands[0].argv[0] = string;
> > +      char *no_str = strtok (temp_buf[i], " ");
> > +      char *name = strtok (NULL, "");
> > +
> > +      int pos = atoi (no_str);
> > +      lines[pos] = name;
> >      }
> > +}
> >
> > -  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> > -    if (arg && strcmp (arg, "|") == 0)
> > -      {                                /* each command.  */
> > -#if defined (__MSDOS__) || defined (OS2) || defined (VMS)
> > -       fatal_error (input_location, "%<-pipe%> not supported");
> > -#endif
> > -       argbuf[i] = 0; /* Termination of command args.  */
> > -       commands[n_commands].prog = argbuf[i + 1];
> > -       commands[n_commands].argv
> > -         = &(argbuf.address ())[i + 1];
> > -       string = find_a_file (&exec_prefixes, commands[n_commands].prog,
> > -                             X_OK, false);
> > -       if (string)
> > -         commands[n_commands].argv[0] = string;
> > -       n_commands++;
> > -      }
> > +/* Append -fsplit-outputs=<tempfile> to all calls to compilers, and expand
> > +   an assembler call into one call per split output.  Fill ADDITIONAL_LD if
> > +   a call to LD is required to merge the resulting object files.  */
> >
> > -  /* If -v, print what we are about to do, and maybe query.  */
> > +static void
> > +append_split_outputs (extra_arg_storer *storer,
> > +                     struct command *additional_ld,
> > +                     struct command **_commands,
> > +                     int *_n_commands)
> > +{
> > +  int i;
> >
> > -  if (verbose_flag)
> > +  struct command *commands = *_commands;
> > +  int n_commands = *_n_commands;
> > +
> > +  const char **argv;
> > +  int argc;
> > +
> > +  if (is_compiler (commands[0].prog))
> > +    {
> > +      argc = get_number_of_args (commands[0].argv);
> > +      argv = storer->create_new (argc + 4);
> > +
> > +      memcpy (argv, commands[0].argv, argc * sizeof (const char *));
> > +
> > +      if (!has_hidden_E (argc, commands[0].argv))
> > +       {
> > +         const char *extra_argument = fsplit_arg (storer);
> > +         argv[argc++] = extra_argument;
> > +       }
> > +
> > +      if (have_c)
> > +       {
> > +         argv[argc++] = "-fPIE";
> > +         argv[argc++] = "-fPIC";
> 
> Uh, I think this has to go away - this must be from some early
> problems and no longer necessary?

Woops. Yeah, I just erased that and bootstrap is still working. :)

> 
> > +       }
> > +
> > +      argv[argc]   = NULL;
> > +
> > +      commands[0].argv = argv;
> > +    }
> > +
> > +  else if (is_assembler (commands[0].prog))
> >      {
> > -      /* For help listings, put a blank line between sub-processes.  */
> > -      if (print_help_list)
> > -       fputc ('\n', stderr);
> > +      vec<char *> additional_asm_files;
> > +
> > +      struct command orig;
> > +      const char **orig_argv;
> > +      int orig_argc;
> > +      const char *orig_obj_file;
> > +
> > +      int infile_pos = -1;
> > +      int outfile_pos = -1;
> > +
> > +      static const char *path_to_ld = NULL;
> > +
> > +      if (!current_infile->temp_additional_asm)
> > +       {
> > +         /* Return because we did not create an additional-asm file for this
> > +            input.  */
> > +
> > +         return;
> > +       }
> > +
> > +      additional_asm_files.create (2);
> > +
> > +      if (!get_file_by_lines (storer, &additional_asm_files,
> > +                             current_infile->temp_additional_asm))
> > +       {
> > +         additional_asm_files.release ();
> > +         return; /* File not found.  This means that cc1* decided not to
> > +                     parallelize.  */
> > +       }
> > +
> > +      sort_asm_files (&additional_asm_files);
> > +
> > +      if (n_commands != 1)
> > +       fatal_error (input_location,
> > +                    "Auto parallelism is unsupported when piping commands");
> > +
> > +      if (!path_to_ld)
> > +       path_to_ld = get_path_to_ld ();
> > +
> > +      /* Get original command.  */
> > +      orig = commands[0];
> > +      orig_argv = commands[0].argv;
> > +      orig_argc = get_number_of_args (orig.argv);
> > +
> > +
> > +      /* Update commands array to include the extra `as' calls.  */
> > +      *_n_commands = additional_asm_files.length ();
> > +      n_commands = *_n_commands;
> > +
> > +      gcc_assert (n_commands > 0);
> > +
> > +      identify_asm_file (orig_argc, orig_argv, &infile_pos, &outfile_pos);
> > +
> > +      *_commands = XRESIZEVEC (struct command, *_commands, n_commands);
> > +      commands = *_commands;
> >
> > -      /* Print each piped command as a separate line.  */
> >        for (i = 0; i < n_commands; i++)
> >         {
> > -         const char *const *j;
> > +         const char **argv = storer->create_new (orig_argc + 1);
> > +         const char *temp_obj = make_temp_file ("additional-obj.o");
> > +         record_temp_file (temp_obj, true, true);
> > +         record_temp_file (additional_asm_files[i], true, true);
> > +
> > +         memcpy (argv, orig_argv, (orig_argc + 1) * sizeof (const char *));
> > +
> > +         orig_obj_file = argv[outfile_pos];
> > +
> > +         argv[infile_pos]  = additional_asm_files[i];
> > +         argv[outfile_pos] = temp_obj;
> > +
> > +         commands[i].prog = orig.prog;
> > +         commands[i].argv = argv;
> > +
> > +         temp_object_files.safe_push (temp_obj);
> > +       }
> > +
> > +       if (have_c)
> > +         {
> > +           unsigned int num_temp_objs = temp_object_files.length ();
> > +           const char **argv = storer->create_new (num_temp_objs + 5);
> > +           unsigned int j;
> > +
> > +           argv[0] = path_to_ld;
> > +           argv[1] = "-o";
> > +           argv[2] = orig_obj_file;
> > +           argv[3] = "-r";
> > +
> > +           for (j = 0; j < num_temp_objs; j++)
> > +             argv[j + 4] = temp_object_files[j];
> > +           argv[j + 4] = NULL;
> > +
> > +           additional_ld->prog = path_to_ld;
> > +           additional_ld->argv = argv;
> > +
> > +           if (!have_o)
> > +             temp_object_files.truncate (0);
> > +         }
> > +
> > +       additional_asm_files.release ();
> > +    }
> > +}
> > +
> > +DEBUG_FUNCTION void
> > +print_command (struct command *command)
> > +{
> > +  const char **argv;
> > +
> > +  for (argv = command->argv; *argv != NULL; argv++)
> > +    fprintf (stdout, " %s", *argv);
> > +  fputc ('\n', stdout);
> > +}
> > +
> > +DEBUG_FUNCTION void
> > +print_commands (int n, struct command *commands)
> > +{
> > +  int i;
> > +
> > +  for (i = 0; i < n; i++)
> > +    print_command (&commands[i]);
> > +}
> > +
> > +DEBUG_FUNCTION void
> > +print_argbuf ()
> > +{
> > +  int i;
> > +  const char *arg;
> > +
> > +  for (i = 0; argbuf.iterate (i, &arg); i++)
> > +    fprintf (stdout, "%s ", arg);
> > +  fputc ('\n', stdout);
> > +}
> > +
> > +
> > +/* Print what commands will run.  Return 0 if the commands should then be
> > +   executed, nonzero if not.  */
> >
> > -         if (verbose_only_flag)
> > +static int
> > +handle_verbose (int n_commands, struct command commands[])
> > +{
> > +  int i;
> > +
> > +  /* For help listings, put a blank line between sub-processes.  */
> > +  if (print_help_list)
> > +    fputc ('\n', stderr);
> > +
> > +  /* Print each piped command as a separate line.  */
> > +  for (i = 0; i < n_commands; i++)
> > +    {
> > +      const char *const *j;
> > +
> > +      if (verbose_only_flag)
> > +       {
> > +         for (j = commands[i].argv; *j; j++)
> >             {
> > -             for (j = commands[i].argv; *j; j++)
> > +             const char *p;
> > +             for (p = *j; *p; ++p)
> > +               if (!ISALNUM ((unsigned char) *p)
> > +                   && *p != '_' && *p != '/' && *p != '-' && *p != '.')
> > +                 break;
> > +             if (*p || !*j)
> >                 {
> > -                 const char *p;
> > +                 fprintf (stderr, " \"");
> >                   for (p = *j; *p; ++p)
> > -                   if (!ISALNUM ((unsigned char) *p)
> > -                       && *p != '_' && *p != '/' && *p != '-' && *p != '.')
> > -                     break;
> > -                 if (*p || !*j)
> >                     {
> > -                     fprintf (stderr, " \"");
> > -                     for (p = *j; *p; ++p)
> > -                       {
> > -                         if (*p == '"' || *p == '\\' || *p == '$')
> > -                           fputc ('\\', stderr);
> > -                         fputc (*p, stderr);
> > -                       }
> > -                     fputc ('"', stderr);
> > +                     if (*p == '"' || *p == '\\' || *p == '$')
> > +                       fputc ('\\', stderr);
> > +                     fputc (*p, stderr);
> >                     }
> > -                 /* If it's empty, print "".  */
> > -                 else if (!**j)
> > -                   fprintf (stderr, " \"\"");
> > -                 else
> > -                   fprintf (stderr, " %s", *j);
> > -               }
> > -           }
> > -         else
> > -           for (j = commands[i].argv; *j; j++)
> > +                 fputc ('"', stderr);
> > +               }
> >               /* If it's empty, print "".  */
> > -             if (!**j)
> > +             else if (!**j)
> >                 fprintf (stderr, " \"\"");
> >               else
> >                 fprintf (stderr, " %s", *j);
> > -
> > -         /* Print a pipe symbol after all but the last command.  */
> > -         if (i + 1 != n_commands)
> > -           fprintf (stderr, " |");
> > -         fprintf (stderr, "\n");
> > +           }
> >         }
> > -      fflush (stderr);
> > -      if (verbose_only_flag != 0)
> > -        {
> > -         /* verbose_only_flag should act as if the spec was
> > -            executed, so increment execution_count before
> > -            returning.  This prevents spurious warnings about
> > -            unused linker input files, etc.  */
> > -         execution_count++;
> > -         return 0;
> > -        }
> > +      else
> > +       for (j = commands[i].argv; *j; j++)
> > +         /* If it's empty, print "".  */
> > +         if (!**j)
> > +           fprintf (stderr, " \"\"");
> > +         else
> > +           fprintf (stderr, " %s", *j);
> > +
> > +      /* Print a pipe symbol after all but the last command.  */
> > +      if (i + 1 != n_commands)
> > +       fprintf (stderr, " |");
> > +      fprintf (stderr, "\n");
> > +    }
> > +  fflush (stderr);
> > +  if (verbose_only_flag != 0)
> > +    {
> > +      /* verbose_only_flag should act as if the spec was
> > +        executed, so increment execution_count before
> > +        returning.  This prevents spurious warnings about
> > +        unused linker input files, etc.  */
> > +      execution_count++;
> > +      return 1;
> > +    }
> >  #ifdef DEBUG
> > -      fnotice (stderr, "\nGo ahead? (y or n) ");
> > -      fflush (stderr);
> > -      i = getchar ();
> > -      if (i != '\n')
> > -       while (getchar () != '\n')
> > -         ;
> > -
> > -      if (i != 'y' && i != 'Y')
> > -       return 0;
> > +  fnotice (stderr, "\nGo ahead? (y or n) ");
> > +  fflush (stderr);
> > +  i = getchar ();
> > +  if (i != '\n')
> > +    while (getchar () != '\n')
> > +      ;
> > +
> > +  if (i != 'y' && i != 'Y')
> > +    return 1;
> >  #endif /* DEBUG */
> > -    }
> > +
> > +  return 0;
> > +}
> >
> >  #ifdef ENABLE_VALGRIND_CHECKING
> > +
> > +/* Append valgrind to each program.  */
> > +
> > +static void
> > +append_valgrind (struct obstack *to_be_released,
> > +                int n_commands, struct command commands[])
> > +{
> > +  int i;
> > +
> >    /* Run each command through valgrind.  To simplify prepending the
> >       path to valgrind and the option "-q" (for quiet operation unless
> >       something triggers), we allocate a separate argv array.  */
> > @@ -3221,7 +3656,7 @@ execute (void)
> >        for (argc = 0; commands[i].argv[argc] != NULL; argc++)
> >         ;
> >
> > -      argv = XALLOCAVEC (const char *, argc + 3);
> > +      argv = (const char **) obstack_alloc (to_be_released, (argc + 3) * sizeof (const char *));
> >
> >        argv[0] = VALGRIND_PATH;
> >        argv[1] = "-q";
> > @@ -3232,15 +3667,16 @@ execute (void)
> >        commands[i].argv = argv;
> >        commands[i].prog = argv[0];
> >      }
> > +}
> >  #endif
> >
> > -  /* Run each piped subprocess.  */
> > +/* Launch a list of commands asynchronously.  */
> >
> > -  pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
> > -                                  ? PEX_RECORD_TIMES : 0),
> > -                 progname, temp_filename);
> > -  if (pex == NULL)
> > -    fatal_error (input_location, "%<pex_init%> failed: %m");
> > +static void
> > +async_launch_commands (struct pex_obj *pex,
> > +                      int n_commands, struct command commands[])
> > +{
> > +  int i;
> >
> >    for (i = 0; i < n_commands; i++)
> >      {
> > @@ -3267,151 +3703,341 @@ execute (void)
> >      }
> >
> >    execution_count++;
> > +}
> >
> > -  /* Wait for all the subprocesses to finish.  */
> >
> > -  {
> > -    int *statuses;
> > -    struct pex_time *times = NULL;
> > -    int ret_code = 0;
> > +/* Wait for all the subprocesses to finish.  Return 0 on success, -1 on
> > +   failure.  */
> >
> > -    statuses = (int *) alloca (n_commands * sizeof (int));
> > -    if (!pex_get_status (pex, n_commands, statuses))
> > -      fatal_error (input_location, "failed to get exit status: %m");
> > +static int
> > +await_commands_to_finish (struct pex_obj *pex,
> > +                         int n_commands, struct command commands[])
> > +{
> >
> > -    if (report_times || report_times_to_file)
> > -      {
> > -       times = (struct pex_time *) alloca (n_commands * sizeof (struct pex_time));
> > -       if (!pex_get_times (pex, n_commands, times))
> > -         fatal_error (input_location, "failed to get process times: %m");
> > -      }
> > +  int *statuses;
> > +  struct pex_time *times = NULL;
> > +  int ret_code = 0, i;
> >
> > -    pex_free (pex);
> > +  statuses = (int *) alloca (n_commands * sizeof (int));
> > +  if (!pex_get_status (pex, n_commands, statuses))
> > +    fatal_error (input_location, "failed to get exit status: %m");
> >
> > -    for (i = 0; i < n_commands; ++i)
> > -      {
> > -       int status = statuses[i];
> > +  if (report_times || report_times_to_file)
> > +    {
> > +      times = (struct pex_time *) alloca (n_commands * sizeof (*times));
> > +      if (!pex_get_times (pex, n_commands, times))
> > +       fatal_error (input_location, "failed to get process times: %m");
> > +    }
> >
> > -       if (WIFSIGNALED (status))
> > -         switch (WTERMSIG (status))
> > -           {
> > -           case SIGINT:
> > -           case SIGTERM:
> > -             /* SIGQUIT and SIGKILL are not available on MinGW.  */
> > +  for (i = 0; i < n_commands; ++i)
> > +    {
> > +      int status = statuses[i];
> > +
> > +      if (WIFSIGNALED (status))
> > +       switch (WTERMSIG (status))
> > +         {
> > +         case SIGINT:
> > +         case SIGTERM:
> > +           /* SIGQUIT and SIGKILL are not available on MinGW.  */
> >  #ifdef SIGQUIT
> > -           case SIGQUIT:
> > +         case SIGQUIT:
> >  #endif
> >  #ifdef SIGKILL
> > -           case SIGKILL:
> > +         case SIGKILL:
> >  #endif
> > -             /* The user (or environment) did something to the
> > -                inferior.  Making this an ICE confuses the user into
> > -                thinking there's a compiler bug.  Much more likely is
> > -                the user or OOM killer nuked it.  */
> > -             fatal_error (input_location,
> > -                          "%s signal terminated program %s",
> > -                          strsignal (WTERMSIG (status)),
> > -                          commands[i].prog);
> > -             break;
> > +           /* The user (or environment) did something to the
> > +              inferior.  Making this an ICE confuses the user into
> > +              thinking there's a compiler bug.  Much more likely is
> > +              the user or OOM killer nuked it.  */
> > +           fatal_error (input_location,
> > +                        "%s signal terminated program %s",
> > +                        strsignal (WTERMSIG (status)),
> > +                        commands[i].prog);
> > +           break;
> >
> >  #ifdef SIGPIPE
> > -           case SIGPIPE:
> > -             /* SIGPIPE is a special case.  It happens in -pipe mode
> > -                when the compiler dies before the preprocessor is
> > -                done, or the assembler dies before the compiler is
> > -                done.  There's generally been an error already, and
> > -                this is just fallout.  So don't generate another
> > -                error unless we would otherwise have succeeded.  */
> > -             if (signal_count || greatest_status >= MIN_FATAL_STATUS)
> > -               {
> > -                 signal_count++;
> > -                 ret_code = -1;
> > -                 break;
> > -               }
> > +         case SIGPIPE:
> > +           /* SIGPIPE is a special case.  It happens in -pipe mode
> > +              when the compiler dies before the preprocessor is
> > +              done, or the assembler dies before the compiler is
> > +              done.  There's generally been an error already, and
> > +              this is just fallout.  So don't generate another
> > +              error unless we would otherwise have succeeded.  */
> > +           if (signal_count || greatest_status >= MIN_FATAL_STATUS)
> > +             {
> > +               signal_count++;
> > +               ret_code = -1;
> > +               break;
> > +             }
> >  #endif
> > -             /* FALLTHROUGH */
> > +           /* FALLTHROUGH.  */
> >
> > -           default:
> > -             /* The inferior failed to catch the signal.  */
> > -             internal_error_no_backtrace ("%s signal terminated program %s",
> > -                                          strsignal (WTERMSIG (status)),
> > -                                          commands[i].prog);
> > -           }
> > -       else if (WIFEXITED (status)
> > -                && WEXITSTATUS (status) >= MIN_FATAL_STATUS)
> > -         {
> > -           /* For ICEs in cc1, cc1obj, cc1plus see if it is
> > -              reproducible or not.  */
> > -           const char *p;
> > -           if (flag_report_bug
> > -               && WEXITSTATUS (status) == ICE_EXIT_CODE
> > -               && i == 0
> > -               && (p = strrchr (commands[0].argv[0], DIR_SEPARATOR))
> > -               && ! strncmp (p + 1, "cc1", 3))
> > -             try_generate_repro (commands[0].argv);
> > -           if (WEXITSTATUS (status) > greatest_status)
> > -             greatest_status = WEXITSTATUS (status);
> > -           ret_code = -1;
> > +         default:
> > +           /* The inferior failed to catch the signal.  */
> > +           internal_error_no_backtrace ("%s signal terminated program %s",
> > +                                        strsignal (WTERMSIG (status)),
> > +                                        commands[i].prog);
> >           }
> > +      else if (WIFEXITED (status)
> > +              && WEXITSTATUS (status) >= MIN_FATAL_STATUS)
> > +       {
> > +         /* For ICEs in cc1, cc1obj, cc1plus see if it is
> > +            reproducible or not.  */
> > +         const char *p;
> > +         if (flag_report_bug
> > +             && WEXITSTATUS (status) == ICE_EXIT_CODE
> > +             && i == 0
> > +             && (p = strrchr (commands[0].argv[0], DIR_SEPARATOR))
> > +             && ! strncmp (p + 1, "cc1", 3))
> > +           try_generate_repro (commands[0].argv);
> > +         if (WEXITSTATUS (status) > greatest_status)
> > +           greatest_status = WEXITSTATUS (status);
> > +         ret_code = -1;
> > +       }
> >
> > -       if (report_times || report_times_to_file)
> > -         {
> > -           struct pex_time *pt = &times[i];
> > -           double ut, st;
> > +      if (report_times || report_times_to_file)
> > +       {
> > +         struct pex_time *pt = &times[i];
> > +         double ut, st;
> >
> > -           ut = ((double) pt->user_seconds
> > -                 + (double) pt->user_microseconds / 1.0e6);
> > -           st = ((double) pt->system_seconds
> > -                 + (double) pt->system_microseconds / 1.0e6);
> > +         ut = ((double) pt->user_seconds
> > +               + (double) pt->user_microseconds / 1.0e6);
> > +         st = ((double) pt->system_seconds
> > +               + (double) pt->system_microseconds / 1.0e6);
> >
> > -           if (ut + st != 0)
> > -             {
> > -               if (report_times)
> > -                 fnotice (stderr, "# %s %.2f %.2f\n",
> > -                          commands[i].prog, ut, st);
> > +         if (ut + st != 0)
> > +           {
> > +             if (report_times)
> > +               fnotice (stderr, "# %s %.2f %.2f\n",
> > +                        commands[i].prog, ut, st);
> >
> > -               if (report_times_to_file)
> > -                 {
> > -                   int c = 0;
> > -                   const char *const *j;
> > +             if (report_times_to_file)
> > +               {
> > +                 int c = 0;
> > +                 const char *const *j;
> >
> > -                   fprintf (report_times_to_file, "%g %g", ut, st);
> > +                 fprintf (report_times_to_file, "%g %g", ut, st);
> >
> > -                   for (j = &commands[i].prog; *j; j = &commands[i].argv[++c])
> > -                     {
> > -                       const char *p;
> > -                       for (p = *j; *p; ++p)
> > -                         if (*p == '"' || *p == '\\' || *p == '$'
> > -                             || ISSPACE (*p))
> > -                           break;
> > +                 for (j = &commands[i].prog; *j; j = &commands[i].argv[++c])
> > +                   {
> > +                     const char *p;
> > +                     for (p = *j; *p; ++p)
> > +                       if (*p == '"' || *p == '\\' || *p == '$'
> > +                           || ISSPACE (*p))
> > +                         break;
> >
> > -                       if (*p)
> > -                         {
> > -                           fprintf (report_times_to_file, " \"");
> > -                           for (p = *j; *p; ++p)
> > -                             {
> > -                               if (*p == '"' || *p == '\\' || *p == '$')
> > -                                 fputc ('\\', report_times_to_file);
> > -                               fputc (*p, report_times_to_file);
> > -                             }
> > -                           fputc ('"', report_times_to_file);
> > -                         }
> > -                       else
> > -                         fprintf (report_times_to_file, " %s", *j);
> > -                     }
> > +                     if (*p)
> > +                       {
> > +                         fprintf (report_times_to_file, " \"");
> > +                         for (p = *j; *p; ++p)
> > +                           {
> > +                             if (*p == '"' || *p == '\\' || *p == '$')
> > +                               fputc ('\\', report_times_to_file);
> > +                             fputc (*p, report_times_to_file);
> > +                           }
> > +                         fputc ('"', report_times_to_file);
> > +                       }
> > +                     else
> > +                       fprintf (report_times_to_file, " %s", *j);
> > +                   }
> >
> > -                   fputc ('\n', report_times_to_file);
> > -                 }
> > -             }
> > -         }
> > +                 fputc ('\n', report_times_to_file);
> > +               }
> > +           }
> > +       }
> > +    }
> > +
> > +  return ret_code;
> > +}
> > +
> > +/* Split a single command with pipes into several commands.  */
> > +
> > +static void
> > +split_commands (vec<const_char_p> *argbuf_p,
> > +               int n_commands, struct command commands[])
> > +{
> > +  int i;
> > +  const char *arg;
> > +  vec<const_char_p> &argbuf = *argbuf_p;
> > +
> > +  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> > +    if (arg && strcmp (arg, "|") == 0)
> > +      {                                /* each command.  */
> > +       const char *string;
> > +#if defined (__MSDOS__) || defined (OS2) || defined (VMS)
> > +       fatal_error (input_location, "%<-pipe%> not supported");
> > +#endif
> > +       argbuf[i] = 0; /* Termination of command args.  */
> > +       commands[n_commands].prog = argbuf[i + 1];
> > +       commands[n_commands].argv
> > +         = &(argbuf.address ())[i + 1];
> > +       string = find_a_file (&exec_prefixes, commands[n_commands].prog,
> > +                             X_OK, false);
> > +       if (string)
> > +         commands[n_commands].argv[0] = string;
> > +       n_commands++;
> >        }
> > +}
> > +
> > +struct command *
> > +parse_argbuf (vec <const_char_p> *argbuf_p, int *n)
> > +{
> > +  int i, n_commands;
> > +  vec<const_char_p> &argbuf = *argbuf_p;
> > +  const char *arg;
> > +  struct command *commands;
> >
> > -   if (commands[0].argv[0] != commands[0].prog)
> > -     free (CONST_CAST (char *, commands[0].argv[0]));
> > +  /* Count # of piped commands.  */
> > +  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> > +    if (strcmp (arg, "|") == 0)
> > +      n_commands++;
> >
> > -    return ret_code;
> > -  }
> > +  /* Get storage for each command.  */
> > +  commands = XNEWVEC (struct command, n_commands);
> > +
> > +  /* Split argbuf into its separate piped processes,
> > +     and record info about each one.
> > +     Also search for the programs that are to be run.  */
> > +
> > +  argbuf.safe_push (0);
> > +
> > +  commands[0].prog = argbuf[0]; /* first command.  */
> > +  commands[0].argv = argbuf.address ();
> > +
> > +  split_commands (argbuf_p, n_commands, commands);
> > +
> > +  *n = n_commands;
> > +  return commands;
> > +}
> > +
> > +/* Execute the command specified by the arguments on the current line of spec.
> > +   When using pipes, this includes several piped-together commands
> > +   with `|' between them.
> > +
> > +   Return 0 if successful, -1 if failed.  */
> > +
> > +static int
> > +execute (void)
> > +{
> > +  struct pex_obj *pex;
> > +  struct command *commands;     /* each command buffer with program to call
> > +                                   and arguments.  */
> > +  int n_commands;               /* # of command.  */
> > +  int ret = 0;
> > +
> > +  struct command additional_ld = {NULL, NULL};
> > +  extra_arg_storer storer;
> > +
> > +  struct command *commands_batch;
> > +  int n;
> > +
> > +  gcc_assert (!processing_spec_function);
> > +
> > +  if (wrapper_string)
> > +    {
> > +      char *string = find_a_file (&exec_prefixes, argbuf[0], X_OK, false);
> > +      if (string)
> > +       argbuf[0] = string;
> > +      insert_wrapper (wrapper_string);
> > +    }
> > +
> > +  /* Parse the argbuf into several commands.  */
> > +  commands = parse_argbuf (&argbuf, &n_commands);
> > +
> > +  if (!have_S && !have_E && flag_parallel_jobs)
> > +    append_split_outputs (&storer, &additional_ld, &commands, &n_commands);
> > +
> > +  if (!wrapper_string)
> > +    {
> > +      char *string = find_a_file (&exec_prefixes, commands[0].prog,
> > +                                 X_OK, false);
> > +      if (string)
> > +       commands[0].argv[0] = string;
> > +    }
> > +
> > +  /* If -v, print what we are about to do, and maybe query.  */
> > +
> > +  if (verbose_flag)
> > +    {
> > +      int ret_verbose = handle_verbose (n_commands, commands);
> > +      if (ret_verbose > 0)
> > +       {
> > +         ret = 0;
> > +         goto cleanup;
> > +       }
> > +    }
> > +
> > +#ifdef ENABLE_VALGRIND_CHECKING
> > +  /* Stack of strings to be released on function return.  */
> > +  struct obstack to_be_released;
> > +  obstack_init (&to_be_released);
> > +  append_valgrind (&to_be_released, n_commands, commands);
> > +#endif
> > +
> > +  /* FIXME: Interact with GNU Jobserver if necessary.  */
> > +
> > +  commands_batch = commands;
> > +  n = flag_parallel_jobs? 1: n_commands;
> > +
> > +  for (int i = 0; i < n_commands; i += n)
> > +    {
> > +      /* Run each piped subprocess.  */
> > +
> > +      pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
> > +                                      ? PEX_RECORD_TIMES : 0),
> > +                     progname, temp_filename);
> > +      if (pex == NULL)
> > +       fatal_error (input_location, "%<pex_init%> failed: %m");
> > +
> > +      /* Launch the commands.  */
> > +      async_launch_commands (pex, n, commands_batch);
> > +
> > +      /* Await them to be done.  */
> > +      ret |= await_commands_to_finish (pex, n, commands_batch);
> > +
> > +      commands_batch = commands_batch + n;
> > +
> > +      /* Cleanup.  */
> > +      pex_free (pex);
> > +    }
> > +
> > +
> > +  if (ret != 0)
> > +    goto cleanup;
> > +
> > +  /* Run extra ld call.  */
> > +  if (!EMPTY_CMD (additional_ld))
> > +    {
> > +      /* If we are here, we must be sure that we had at least two object
> > +        files to link.  */
> > +      //gcc_assert (n_commands != 1);
> > +
> > +      pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
> > +                                      ? PEX_RECORD_TIMES : 0),
> > +                     progname, temp_filename);
> > +
> > +      if (verbose_flag)
> > +       print_command (&additional_ld);
> > +
> > +      async_launch_commands (pex, 1, &additional_ld);
> > +      ret = await_commands_to_finish (pex, 1, &additional_ld);
> > +      pex_free (pex);
> > +    }
> > +
> > +
> > +#ifdef ENABLE_VALGRIND_CHECKING
> > +  obstack_free (&to_be_released, NULL);
> > +#endif
> > +
> > +cleanup:
> > +  if (commands[0].argv[0] != commands[0].prog)
> > +    free (CONST_CAST (char *, commands[0].argv[0]));
> > +
> > +  free (commands);
> > +
> > +  return ret;
> >  }
> > +
> >
> >  /* Find all the switches given to us
> >     and make a vector describing them.
> > @@ -3480,29 +4106,33 @@ static int n_switches_alloc_debug_check[2];
> >
> >  static char *debug_check_temp_file[2];
> >
> > -/* Language is one of three things:
> > -
> > -   1) The name of a real programming language.
> > -   2) NULL, indicating that no one has figured out
> > -   what it is yet.
> > -   3) '*', indicating that the file should be passed
> > -   to the linker.  */
> > -struct infile
> > +static const char *
> > +fsplit_arg (extra_arg_storer *storer)
> >  {
> > -  const char *name;
> > -  const char *language;
> > -  struct compiler *incompiler;
> > -  bool compiled;
> > -  bool preprocessed;
> > -};
> > +  const char *tempname = make_temp_file ("additional-asm");
> > +  const char arg[] = "-fsplit-outputs=";
> > +  char *final;
> >
> > -/* Also a vector of input files specified.  */
> > +  size_t n = ARRAY_SIZE (arg) + strlen (tempname);
> >
> > -static struct infile *infiles;
> > +  gcc_assert (current_infile);
> >
> > -int n_infiles;
> > +  current_infile->temp_additional_asm = tempname;
> > +
> > +  /* Remove file, once we may not even need it and create it later.  */
> > +  /* FIXME: This is a little hackish.  */
> > +  remove (tempname);
> > +
> > +  final = storer->create_string (n);
> > +
> > +  strcpy (final, arg);
> > +  strcat (final, tempname);
> > +
> > +  record_temp_file (tempname, true, true);
> > +
> > +  return final;
> > +}
> >
> > -static int n_infiles_alloc;
> >
> >  /* True if undefined environment variables encountered during spec processing
> >     are ok to ignore, typically when we're running for --help or --version.  */
> > @@ -3683,6 +4313,8 @@ alloc_infile (void)
> >      {
> >        n_infiles_alloc = 16;
> >        infiles = XNEWVEC (struct infile, n_infiles_alloc);
> > +      memset (infiles, 0x00, sizeof (*infiles) * n_infiles_alloc);
> > +
> >      }
> >    else if (n_infiles_alloc == n_infiles)
> >      {
> > @@ -4648,6 +5280,9 @@ process_command (unsigned int decoded_options_count,
> >        switch (decoded_options[j].opt_index)
> >         {
> >         case OPT_S:
> > +         have_S = 1;
> > +         have_c = 1;
> > +         break;
> >         case OPT_c:
> >         case OPT_E:
> >           have_c = 1;
> > @@ -6155,11 +6790,14 @@ do_spec_1 (const char *spec, int inswitch, const char *soft_matched_part)
> >                   open_at_file ();
> >
> >                 for (i = 0; (int) i < n_infiles; i++)
> > -                 if (compile_input_file_p (&infiles[i]))
> > -                   {
> > -                     store_arg (infiles[i].name, 0, 0);
> > -                     infiles[i].compiled = true;
> > -                   }
> > +                 {
> > +                   current_infile = &infiles[i];
> > +                   if (compile_input_file_p (current_infile))
> > +                     {
> > +                       store_arg (current_infile->name, 0, 0);
> > +                       current_infile->compiled = true;
> > +                     }
> > +                 }
> >
> >                 if (at_file_supplied)
> >                   close_at_file ();
> > @@ -6515,7 +7153,7 @@ do_spec_1 (const char *spec, int inswitch, const char *soft_matched_part)
> >                      "%{foo=*:bar%*}%{foo=*:one%*two}"
> >
> >                    matches -foo=hello then it will produce:
> > -
> > +
> >                      barhello onehellotwo
> >                 */
> >                 if (*p == 0 || *p == '}')
> > @@ -8642,6 +9280,7 @@ driver::do_spec_on_infiles () const
> >    for (i = 0; (int) i < n_infiles; i++)
> >      {
> >        int this_file_error = 0;
> > +      current_infile = &infiles[i];
> >
> >        /* Tell do_spec what to substitute for %i.  */
> >
> > @@ -8761,12 +9400,15 @@ driver::do_spec_on_infiles () const
> >        int i;
> >
> >        for (i = 0; i < n_infiles ; i++)
> > -       if (infiles[i].incompiler
> > -           || (infiles[i].language && infiles[i].language[0] != '*'))
> > -         {
> > -           set_input (infiles[i].name);
> > -           break;
> > -         }
> > +       {
> > +         current_infile = &infiles[i];
> > +         if (infiles[i].incompiler
> > +             || (infiles[i].language && infiles[i].language[0] != '*'))
> > +           {
> > +             set_input (infiles[i].name);
> > +             break;
> > +           }
> > +       }
> >      }
> >
> >    if (!seen_error ())
> > @@ -8788,11 +9430,31 @@ driver::maybe_run_linker (const char *argv0) const
> >    int linker_was_run = 0;
> >    int num_linker_inputs;
> >
> > -  /* Determine if there are any linker input files.  */
> > -  num_linker_inputs = 0;
> > -  for (i = 0; (int) i < n_infiles; i++)
> > -    if (explicit_link_files[i] || outfiles[i] != NULL)
> > -      num_linker_inputs++;
> > +  /* Set outfiles to be the temporary object vector.  */
> > +  const char **outfiles_holder = outfiles;
> > +  int n_infiles_holder = n_infiles;
> > +  bool outfiles_switched = false;
> > +  if (temp_object_files.length () > 0)
> > +    {
> > +      /* Insert explicit link files into the temp object vector.  */
> > +
> > +      for (i = 0; (int) i < n_infiles; i++)
> > +       if (explicit_link_files[i] && outfiles[i] != NULL)
> > +         temp_object_files.safe_push (outfiles[i]);
> > +
> > +      num_linker_inputs = n_infiles = temp_object_files.length ();
> > +      temp_object_files.safe_push (NULL); /* the NULL sentinel.  */
> > +      outfiles = temp_object_files.address ();
> > +    }
> > +  else /* Fall back to the old method.  */
> > +    {
> > +
> > +      /* Determine if there are any linker input files.  */
> > +      num_linker_inputs = 0;
> > +      for (i = 0; (int) i < n_infiles; i++)
> > +       if (explicit_link_files[i] || outfiles[i] != NULL)
> > +         num_linker_inputs++;
> > +    }
> >
> >    /* Arrange for temporary file names created during linking to take
> >       on names related with the linker output rather than with the
> > @@ -8897,14 +9559,24 @@ driver::maybe_run_linker (const char *argv0) const
> >      }
> >
> >    /* If options said don't run linker,
> > -     complain about input files to be given to the linker.  */
> > +     complain about input files to be given to the linker.
> > +     When fsplit-arg is active, the linker will run and this if
> > +     will not be triggered.  */
> >
> > -  if (! linker_was_run && !seen_error ())
> > +  if (!outfiles_switched && !linker_was_run && !seen_error ()
> > +      && temp_object_files.length () == 0)
> >      for (i = 0; (int) i < n_infiles; i++)
> >        if (explicit_link_files[i]
> >           && !(infiles[i].language && infiles[i].language[0] == '*'))
> >         warning (0, "%s: linker input file unused because linking not done",
> >                  outfiles[i]);
> > +
> > +  if (outfiles_switched)
> > +    {
> > +      /* Undo our changes.  */
> > +      outfiles = outfiles_holder;
> > +      n_infiles = n_infiles_holder;
> > +    }
> >  }
> >
> >  /* The end of "main".  */
> > @@ -10808,6 +11480,7 @@ driver::finalize ()
> >    linker_options.truncate (0);
> >    assembler_options.truncate (0);
> >    preprocessor_options.truncate (0);
> > +  temp_object_files.truncate (0);
> >
> >    path_prefix_reset (&exec_prefixes);
> >    path_prefix_reset (&startfile_prefixes);
> > diff --git a/gcc/testsuite/driver/a.c b/gcc/testsuite/driver/a.c
> > new file mode 100644
> > index 00000000000..c6b8c2eb61e
> > --- /dev/null
> > +++ b/gcc/testsuite/driver/a.c
> > @@ -0,0 +1,6 @@
> > +int puts (const char *);
> > +
> > +void a_func (void)
> > +{
> > +  puts ("A test");
> > +}
> > diff --git a/gcc/testsuite/driver/b.c b/gcc/testsuite/driver/b.c
> > new file mode 100644
> > index 00000000000..76a2cba0bd9
> > --- /dev/null
> > +++ b/gcc/testsuite/driver/b.c
> > @@ -0,0 +1,6 @@
> > +int puts (const char *);
> > +
> > +void a_func (void)
> > +{
> > +  puts ("Another test");
> > +}
> > diff --git a/gcc/testsuite/driver/driver.exp b/gcc/testsuite/driver/driver.exp
> > new file mode 100644
> > index 00000000000..2bbaf07778a
> > --- /dev/null
> > +++ b/gcc/testsuite/driver/driver.exp
> > @@ -0,0 +1,80 @@
> > +#   Copyright (C) 2008-2020 Free Software Foundation, Inc.
> > +
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 3 of the License, or
> > +# (at your option) any later version.
> > +#
> > +# This program is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with GCC; see the file COPYING3.  If not see
> > +# <http://www.gnu.org/licenses/>.
> > +
> > +# GCC testsuite that uses the `dg.exp' driver.
> > +
> > +# Load support procs.
> > +load_lib gcc-dg.exp
> > +
> > +proc check-for-errors { test input } {
> > +    if { [string equal "$input" ""] } then {
> > +       pass "$test: std out"
> > +    } else {
> > +       fail "$test: std out\n$input"
> > +    }
> > +}
> > +
> > +if ![check_effective_target_pthread] {
> > +  return
> > +}
> > +
> > +# If a testcase doesn't have special options, use these.
> > +global DEFAULT_CFLAGS
> > +if ![info exists DEFAULT_CFLAGS] then {
> > +    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
> > +}
> > +
> > +# Initialize `dg'.
> > +dg-init
> > +
> > +
> > +# Test multi-input compilation
> > +check-for-errors "Multi-input Compilation" \
> > +       [gcc_target_compile "$srcdir/$subdir/a.c $srcdir/$subdir/b.c -c" "" none ""]
> > +
> > +# Compile file and generate an assembler and object file
> > +check-for-errors "Object Generation" \
> > +       [gcc_target_compile "$srcdir/$subdir/a.c -c" "a.o" none ""]
> > +check-for-errors "Object Generation" \
> > +       [gcc_target_compile "$srcdir/$subdir/b.c -c" "b.o" none ""]
> > +check-for-errors "Assembler Generation" \
> > +       [gcc_target_compile "$srcdir/$subdir/a.c -S" "a.S" none ""]
> > +check-for-errors "Assembler Generation" \
> > +       [gcc_target_compile "$srcdir/$subdir/b.c -S" "b.S" none ""]
> > +
> > +# Empty file is a valid program
> > +check-for-errors "Empty Program" \
> > +       [gcc_target_compile "$srcdir/$subdir/empty.c -c" "empty.o" none ""]
> > +
> > +# Test object file passthrough
> > +check-for-errors "Object file passthrough" \
> > +       [gcc_target_compile "$srcdir/$subdir/foo.c a.o" "a.exe" none ""]
> > +
> > +# Test compilation when assembler is provided
> > +check-for-errors "Assembler with Macros" \
> > +       [gcc_target_compile "a.S -c" "a.o" none ""]
> > +
> > +# Clean temporary generated files.
> > +set temp_files {"a.o" "a.S" "b.o" "b.S" "empty.o"}
> > +
> > +foreach f $temp_files {
> > +       if { [file exists $f] } {
> > +               file delete $f
> > +       }
> > +}
> > +
> > +# All done.
> > +dg-finish
> > diff --git a/gcc/testsuite/driver/empty.c b/gcc/testsuite/driver/empty.c
> > new file mode 100644
> > index 00000000000..e69de29bb2d
> > diff --git a/gcc/testsuite/driver/foo.c b/gcc/testsuite/driver/foo.c
> > new file mode 100644
> > index 00000000000..a18fd2a3b14
> > --- /dev/null
> > +++ b/gcc/testsuite/driver/foo.c
> > @@ -0,0 +1,7 @@
> > +void a_func (void);
> > +
> > +int main()
> > +{
> > +  a_func ();
> > +  return 0;
> > +}
> > --
> > 2.28.0
> >
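The `parse_argbuf`/`split_commands` pair above counts the `|` tokens in `argbuf`, then overwrites each `|` with a null pointer so that every piped sub-command gets its own NULL-terminated `argv` slice of the same buffer. What follows is a minimal standalone sketch of that idea, not the driver's code. `struct command` here is a simplified stand-in, `split_piped_commands` is a hypothetical name, and the `find_a_file` lookup of each program on the exec prefixes is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the driver's struct command (hypothetical).  */
struct command
{
  const char *prog;   /* program name as it appeared in the spec */
  const char **argv;  /* NULL-terminated slice of the shared buffer */
};

/* Split the NULL-terminated vector ARGV at every "|" token.  Each "|"
   is overwritten with NULL, so each command's argv ends in place.
   Fills at most MAX entries of COMMANDS; returns how many were found.  */
static int
split_piped_commands (const char **argv, struct command *commands, int max)
{
  int n = 0;

  assert (argv != NULL && commands != NULL);
  if (argv[0] == NULL || max <= 0)
    return 0;

  /* The first command always starts at the beginning of the buffer.  */
  commands[n].prog = argv[0];
  commands[n].argv = argv;
  n++;

  for (int i = 0; argv[i] != NULL; i++)
    if (strcmp (argv[i], "|") == 0)
      {
        argv[i] = NULL;                 /* terminate the previous command */
        if (argv[i + 1] == NULL || n == max)
          break;                        /* trailing "|" or no room left */
        commands[n].prog = argv[i + 1];
        commands[n].argv = &argv[i + 1];
        n++;
      }
  return n;
}
```

For example, splitting `{"cc1", "-quiet", "a.c", "|", "as", "-o", "a.o", NULL}` yields two commands, `cc1 -quiet a.c` and `as -o a.o`, both pointing into the original array. Reusing the buffer in place avoids copying arguments, at the cost of mutating it.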


* Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.
  2020-08-24 16:44     ` Josh Triplett
@ 2020-08-24 18:38       ` Giuliano Belinassi
  2020-08-25  7:03         ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Giuliano Belinassi @ 2020-08-24 18:38 UTC
  To: Josh Triplett; +Cc: gcc-patches

Hi, Josh.

On 08/24, Josh Triplett wrote:
> On Sat, Aug 22, 2020 at 06:04:48PM -0300, Giuliano Belinassi wrote:
> > Hi, Josh
> > 
> > On 08/21, Josh Triplett wrote:
> > > On Thu, Aug 20, 2020 at 07:00:13PM -0300, Giuliano Belinassi wrote:
> > > > This patch series add a new flag "-fparallel-jobs=" to control if the
> > > > compiler should try to compile the current file in parallel.
> > > [...]
> > > > Bootstrapped and Regtested on Linux x86_64.
> > > > 
> > > > Giuliano Belinassi (6):
> > > >   Modify gcc driver for parallel compilation
> > > >   Implement a new partitioner for parallel compilation
> > > >   Implement fork-based parallelism engine
> > > >   Add `+' for Jobserver Integration
> > > >   Add invoke documentation
> > > >   New tests for parallel compilation feature
> > > 
> > > Very nice!
> > 
> > Thank you for your interest in this :)
> > 
> > > 
> > > I'm interested in testing this on a highly parallel system. What
> > > baseline do these patches apply to?  They don't seem to apply to GCC
> > > trunk.
> > 
> > Hmm, this was supposed to work on trunk out of the box. However,
> > there is a high probability that I messed something up while rebasing.
> > I will post a version 2 once I get more comments and have fixed
> > the Makefile issue that Joseph pointed out in another e-mail.
> > 
> > If you want to test it on a highly parallel system, I think it would
> > also be interesting to see how it behaves with --param=promote-statics=1,
> > as it increases the opportunities for parallelism. :)
> 
> I plan to try several variations, including that.
> 
> I'd like to see how it affects the performance of Linux kernel builds.

Well, I expect little to no impact on that.  I ran an experiment back in
2018 looking for parallelism bottlenecks in the kernel build, and what I
found was that the developers did a good job of balancing the file sizes.

This was run on a machine with 4x AMD Opteron CPUs (64 cores in total):
https://www.ime.usp.br/~belinass/64cores-kernel-experiment.svg

As you can see from this image, the jobs end at almost the same time.

> 
> > > Also, I tried to bootstrap the current tip of the devel/autopar_devel
> > > branch, but ended up with compiler segfaults that all look like this:
> > > ../../gcc/zlib/compress.c:86:1: internal compiler error: Segmentation fault
> > >    86 | }
> > >       | ^
> > 
> > Well, there was once a bug in this branch when compiling with -flto that
> > caused the assembler output file not to be properly initialized early
> > enough, resulting in the LTO LGEN stage writing into an invalid FILE pointer.
> > I fixed this during rebasing but forgot to push it to the autopar_devel
> > branch. In any case, I have just pushed the recent changes to autopar_devel,
> > which fix this issue.
> 
> That might explain the problem; I had tried to build gcc with the
> bootstrap-lto configuration.
> 
> > In any case, -fparallel-jobs= should NOT be used together with -flto.
> > Although I used part of the LTO engine to develop this feature,
> > they are meant for distinct things. I guess I should emit a warning
> > about that in the next version :)
> 
> Interesting. Is that something that could change in the future? I'd like
> to be able to get some parallelism when creating the object files, and
> then more parallelism when doing the final LTO link.

Well, if by "final LTO link" you mean LTO's Whole Program Analysis,
that is quite a challenging task to parallelize :)

As for "creating the object files", if you mean LTO's LGEN stage, I think
it is not possible for now because, as far as I understand, LTO
object files are just containers for an intermediate language and
do not support partial linking.

However, I would not expect LGEN to bottleneck the compilation of any
project. Most compilation time is spent in optimization, that is,
IPA and intra-procedural passes.

> 
> > Also, I just tested bootstrap with
> > 
> > ../gcc/configure --disable-multilib --enable-languages=c,c++
> >
> > on x86_64 linux and it is working.
> 
> I'd used --enable-multilib, and --enable-languages=c,c++,lto . Would
> that be expected to work?

Yes. If it doesn't, that is a bug :)

> 
> Thanks,
> Josh


* Re: [PATCH 1/6] Modify gcc driver for parallel compilation
  2020-08-24 18:06     ` Giuliano Belinassi
@ 2020-08-25  6:53       ` Richard Biener
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Biener @ 2020-08-25  6:53 UTC
  To: Giuliano Belinassi; +Cc: Joseph S. Myers, GCC Patches, Jan Hubicka

On Mon, Aug 24, 2020 at 8:06 PM Giuliano Belinassi
<giuliano.belinassi@usp.br> wrote:
>
> Hi, Richi.
>
> On 08/24, Richard Biener wrote:
> > On Fri, Aug 21, 2020 at 12:00 AM Giuliano Belinassi
> > <giuliano.belinassi@usp.br> wrote:
> > >
> > > Update the driver for parallel compilation. This process works as
> > > follows:
> > >
> > > When calling gcc, the driver will check if the flag
> > > "-fparallel-jobs" was provided by the user. If so, it then
> > > checks what the desired output is, and whether it can be parallelized.
> > > The following cases are handled:
> > >
> > > 1. -S or -E was provided: We can't run in parallel, as the outputs
> > >    cannot easily be merged together into one file.
> > >
> > > 2. -c was provided: When cc1* forks into multiple processes, it
> > >    must tell the driver where it stored its generated assembler files.
> > >    Therefore we pass a hidden "-fsplit-outputs=filename" to the compiler,
> > >    and we check if "filename" was created by it. If yes, we open it,
> > >    call the assembler for each generated asm file
> > >    (this file must not be empty), and link them together with
> > >    partial linking into a single .o file. This process is done for each
> > >    object file in the argument list.
> > >
> > > 3. -c was not provided, and the final product will be a binary: Here
> > >    we proceed exactly as 2., but we avoid doing the partial
> > >    linking, feeding the generated object files directly into the final link.
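The three cases above amount to a small decision plus a file-reading step. The sketch below is an illustration under stated assumptions, not the patch's implementation. It assumes the split-outputs file lists one assembler file per line (the ChangeLog mentions `get_file_by_lines` and `identify_asm_file`, but their exact behaviour is not shown here), and `choose_plan`, `read_asm_list` and `enum link_plan` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: what the driver does with the outputs of one input file.  */
enum link_plan
{
  PLAN_SERIAL,        /* normal single-output path, no splitting */
  PLAN_PARTIAL_LINK,  /* -c: assemble each piece, then ld -r into one .o */
  PLAN_FINAL_LINK     /* no -c: feed the pieces' objects to the final link */
};

/* HAVE_S_OR_E: -S or -E was given.  HAVE_C: -c was given.
   N_ASM: number of assembler files listed in the split-outputs file,
   or 0 if cc1 never created that file.  */
static enum link_plan
choose_plan (int have_s_or_e, int have_c, int n_asm)
{
  if (have_s_or_e || n_asm == 0)
    return PLAN_SERIAL;
  return have_c ? PLAN_PARTIAL_LINK : PLAN_FINAL_LINK;
}

/* Read up to MAX file names, one per line, from F into NAMES.
   Blank lines are skipped (an assumption of this sketch).  Each entry
   is malloc'd and must be freed by the caller.  Returns the count.  */
static int
read_asm_list (FILE *f, char **names, int max)
{
  char line[4096];
  int n = 0;

  assert (f != NULL && names != NULL);
  while (n < max && fgets (line, sizeof line, f) != NULL)
    {
      line[strcspn (line, "\r\n")] = '\0';  /* drop the line terminator */
      if (line[0] == '\0')
        continue;
      names[n] = malloc (strlen (line) + 1);
      if (names[n] == NULL)
        break;                              /* out of memory: stop early */
      strcpy (names[n], line);
      n++;
    }
  return n;
}
```

With `PLAN_PARTIAL_LINK`, each listed file would be assembled and the resulting objects combined by a relocatable (`ld -r`) link into the single `.o` the user asked for; with `PLAN_FINAL_LINK`, the objects would go straight to the final link, skipping the extra `ld -r` step.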
> > >
> > > For that to work, we had to heavily modify how the "execute" function
> > > works, extracting common code which is used multiple times, and
> > > also detecting when the command is a call to a compiler or an
> > > assembler, as can be seen in append_split_outputs.
> > >
> > > Finally, we added some tests which reflect all cases found when
> > > bootstrapping the compiler, so that developing further driver
> > > features is faster from now on.
> >
> > Few comments inline, Joseph may want to comment on the overall
> > structure as driver maintainer (CCed).
> >
> > I know I asked for the changes on the branch to be squashed but
> > the diff below is quite unreadable with the ChangeLog not helping
> > the overall refactoring much.  Is it possible to do some of the
> > factoring/refactoring without any functionality change to make the
> > actual diff easier to follow?
>
> Well, the refactoring is necessary, otherwise I would need to copy and
> paste a really huge amount of code.
>
> What I can do (and what sounds reasonable to me) is break this patch into
> two parts: one with just the refactoring changes, and the other adding the
> parallelism engine.
>
> >
> > Thanks,
> > Richard.
> >
> > > gcc/ChangeLog
> > > 2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> > >
> > >         * common.opt (fsplit-outputs): New flag.
> > >         (fparallel-jobs): New flag.
> > >         * gcc.c (extra_arg_storer): New class.
> > >         (have_S): New variable.
> > >         (struct command): Move from execute.
> > >         (is_compiler): New function.
> > >         (is_assembler): New function.
> > >         (get_number_of_args): New function.
> > >         (get_file_by_lines): New function.
> > >         (identify_asm_file): New function.
> > >         (struct infile): New attribute temp_additional_asm.
> > >         (current_infile): New variable.
> > >         (get_path_to_ld): New function.
> > >         (has_hidden_E): New function.
> > >         (sort_asm_files): New function.
> > >         (append_split_outputs): New function.
> > >         (print_command): New function.
> > >         (print_commands): New function.
> > >         (print_argbuf): New function.
> > >         (handle_verbose): Extracted from execute.
> > >         (append_valgrind): Same as above.
> > >         (async_launch_commands): Same as above.
> > >         (await_commands_to_finish): Same as above.
> > >         (split_commands): Same as above.
> > >         (parse_argbuf): Same as above.
> > >         (execute): Refactor.
> > >         (fsplit_arg): New function.
> > >         (alloc_infile): Initialize infiles with 0.
> > >         (process_command): Remember when -S was passed.
> > >         (do_spec_on_infiles): Remember current infile being processed.
> > >         (maybe_run_linker): Replace object files when -o is an executable.
> > >         (finalize): Deinitialize temp_object_files.
> > >
> > > gcc/testsuite/ChangeLog:
> > > 2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> > >
> > >         * driver/driver.exp: New test.
> > >         * driver/a.c: New file.
> > >         * driver/b.c: New file.
> > >         * driver/empty.c: New file.
> > >         * driver/foo.c: New file.
> > > ---
> > >  gcc/common.opt                  |    4 +
> > >  gcc/gcc.c                       | 1219 ++++++++++++++++++++++++-------
> > >  gcc/testsuite/driver/a.c        |    6 +
> > >  gcc/testsuite/driver/b.c        |    6 +
> > >  gcc/testsuite/driver/driver.exp |   80 ++
> > >  gcc/testsuite/driver/empty.c    |    0
> > >  gcc/testsuite/driver/foo.c      |    7 +
> > >  7 files changed, 1049 insertions(+), 273 deletions(-)
> > >  create mode 100644 gcc/testsuite/driver/a.c
> > >  create mode 100644 gcc/testsuite/driver/b.c
> > >  create mode 100644 gcc/testsuite/driver/driver.exp
> > >  create mode 100644 gcc/testsuite/driver/empty.c
> > >  create mode 100644 gcc/testsuite/driver/foo.c
> > >
> > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > index 4b08e91859f..4aa3ad8c95b 100644
> > > --- a/gcc/common.opt
> > > +++ b/gcc/common.opt
> > > @@ -3465,4 +3465,8 @@ fipa-ra
> > >  Common Report Var(flag_ipa_ra) Optimization
> > >  Use caller save register across calls if possible.
> > >
> > > +fsplit-outputs=
> > > +Common Joined Var(split_outputs)
> > > +-fsplit-outputs=<tempfile>  Filename in which current Compilation Unit will be split to.
> > > +
> > >  ; This comment is to ensure we retain the blank line above.
> > > diff --git a/gcc/gcc.c b/gcc/gcc.c
> > > index 10bc9881aed..c276a11ca7a 100644
> > > --- a/gcc/gcc.c
> > > +++ b/gcc/gcc.c
> > > @@ -343,6 +343,74 @@ static struct obstack obstack;
> > >
> > >  static struct obstack collect_obstack;
> > >
> > > +/* This is used to store new argv arrays created dynamically to avoid memory
> > > +   leaks.  */
> > > +
> > > +class extra_arg_storer
> > > +{
> > > +  public:
> > > +
> > > +    /* Initialize the vec with a default size.  */
> > > +
> > > +    extra_arg_storer ()
> > > +      {
> > > +       string_vec.create (8);
> > > +       extra_args.create (64);
> > > +      }
> > > +
> > > +    /* Create new array of strings of size N.  */
> > > +    const char **create_new (size_t n)
> > > +      {
> > > +       const char **ret = XNEWVEC (const char *, n);
> > > +       extra_args.safe_push (ret);
> > > +       return ret;
> > > +      }
> > > +
> > > +    char *create_string (size_t n)
> > > +      {
> > > +       char *ret = XNEWVEC (char, n);
> > > +       string_vec.safe_push (ret);
> > > +       return ret;
> > > +      }
> > > +
> > > +    void store (char *str)
> > > +      {
> > > +       string_vec.safe_push (str);
> > > +      }
> > > +
> > > +    ~extra_arg_storer ()
> > > +      {
> > > +       release_extra_args ();
> > > +       release_string_vec ();
> > > +      }
> > > +
> > > +
> > > +  private:
> > > +
> > > +    /* Release all allocated argv arrays.  */
> > > +    void release_extra_args ()
> > > +      {
> > > +       size_t i;
> > > +
> > > +       for (i = 0; i < extra_args.length (); i++)
> > > +         free (extra_args[i]);
> > > +       extra_args.release ();
> > > +      }
> > > +
> > > +    void release_string_vec ()
> > > +      {
> > > +       size_t i;
> > > +
> > > +       for (i = 0; i < string_vec.length (); i++)
> > > +         free (string_vec[i]);
> > > +       string_vec.release ();
> > > +      }
> > > +
> > > +    /* Data structure to hold all arrays.  */
> > > +    vec<const char **> extra_args;
> > > +    vec<char *> string_vec;
> > > +};
> > > +
> > >  /* Forward declaration for prototypes.  */
> > >  struct path_prefix;
> > >  struct prefix_list;
> > > @@ -1993,6 +2061,9 @@ static int have_o = 0;
> > >  /* Was the option -E passed.  */
> > >  static int have_E = 0;
> > >
> > > +/* Was the option -S passed.  */
> > > +static int have_S = 0;
> > > +
> > >  /* Pointer to output file name passed in with -o. */
> > >  static const char *output_file = 0;
> > >
> > > @@ -3056,158 +3127,522 @@ add_sysrooted_hdrs_prefix (struct path_prefix *pprefix, const char *prefix,
> > >               require_machine_suffix, os_multilib);
> > >  }
> > >
> > > -
> > > -/* Execute the command specified by the arguments on the current line of spec.
> > > -   When using pipes, this includes several piped-together commands
> > > -   with `|' between them.
> > > +struct command
> > > +{
> > > +  const char *prog;            /* program name.  */
> > > +  const char **argv;           /* vector of args.  */
> > > +};
> > >
> > > -   Return 0 if successful, -1 if failed.  */
> > > +#define EMPTY_CMD(x) (!((x).prog))  /* Is the provided CMD empty?  */
> > > +
> > > +/* Check if arg is a call to a compiler.  Return false if not, true if yes.  */
> > > +
> > > +static bool
> > > +is_compiler (const char *arg)
> >
> > This and is_assembler should somehow magically fall out of
> > specs processing, specifically
> >
> > > +{
> > > +  static const char *const compilers[] = {"cc1", "cc1plus", "f771"};
> >
> > ^^ this is incomplete.  Of course I don't know how to auto-infer
> > these but I think it must be possible from somewhere up the
> > call-chain?
>
> I was expecting this to be some sort of issue in the merging process ;)
>
> Well, I remember trying to find a way of doing this once, only to find
> out that the compiler name is embedded in a spec string, with no way
> to check whether the name actually refers to a compiler or an assembler.

Hmm, I see.  I guess this might say that the current approach is flawed
and the logic should instead be fully contained in specs processing itself,
at least parts of it, say setting whether we're a compilation or not.

OTOH, -fcompare-debug handling already has quite some explicit code
in gcc.c and that's probably "closest" to what we need (processing multiple
outputs on the same input).

But things like appending -fsplit-outputs=<tem> on -fparallel-jobs=
can be done via specs.

As said it's a bit hard to follow the actual changes in the patch because
of the refactoring.  I hope Joseph knows better what can and cannot
be done at the specs level.
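For reference while that is sorted out: is_compiler and is_assembler in the patch duplicate the same "strip the directory, compare the basename against a fixed list" logic. A minimal standalone sketch of that lookup factored into one helper follows. The names path_basename and basename_in_list are hypothetical, it assumes POSIX '/' separators only (as the patch does; libiberty's lbasename also handles DOS-style paths), and the two lists are copied unchanged from the patch, so it inherits the incompleteness noted above (for instance cc1obj and gfortran's f951 are absent). It does not address whether the classification should instead come from specs processing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

/* Return the component of PATH after its last '/', or PATH itself if
   it contains no '/'.  Only POSIX-style separators are recognized.  */
static const char *
path_basename (const char *path)
{
  assert (path != NULL);
  const char *slash = strrchr (path, '/');
  return slash ? slash + 1 : path;
}

/* Return true if the basename of PROG equals one of the N entries in
   NAMES.  Hypothetical helper replacing the duplicated loops in
   is_compiler and is_assembler.  */
static bool
basename_in_list (const char *prog, const char *const names[], size_t n)
{
  const char *base = path_basename (prog);
  for (size_t i = 0; i < n; i++)
    if (strcmp (base, names[i]) == 0)
      return true;
  return false;
}

/* Lists copied verbatim from the patch under review.  */
static const char *const compilers[] = { "cc1", "cc1plus", "f771" };
static const char *const assemblers[] = { "as", "gas" };
```

With this, is_compiler (prog) reduces to basename_in_list (prog, compilers, ARRAY_SIZE (compilers)), and likewise for the assembler list.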

> >
> > > +  const char* ptr = arg;
> > > +
> > > +  size_t i;
> > > +
> > > +  /* Jump to last '/' of string.  */
> > > +  while (*arg)
> > > +    if (*arg++ == '/')
> > > +      ptr = arg;
> > > +
> > > +  /* Look if current character seems valid.  */
> > > +  gcc_assert (!(*ptr == '\0' ||  *ptr == '/'));
> > > +
> > > +  for (i = 0; i < ARRAY_SIZE (compilers); i++)
> > > +    {
> > > +      if (!strcmp (ptr, compilers[i]))
> > > +       return true;
> > > +    }
> > > +
> > > +  return false;
> > > +}
> > > +
> > > +/* Check if arg is a call to as.  Return false if not, true if yes.  */
> > > +
> > > +static bool
> > > +is_assembler (const char *arg)
> > > +{
> > > +  static const char *const assemblers[] = {"as", "gas"};
> > > +  const char* ptr = arg;
> > > +
> > > +  size_t i;
> > > +
> > > +  /* Jump to last '/' of string.  */
> > > +  while (*arg)
> > > +    if (*arg++ == '/')
> > > +      ptr = arg;
> > > +
> > > +  /* Look if current character seems valid.  */
> > > +  gcc_assert (!(*ptr == '\0' ||  *ptr == '/'));
> > > +
> > > +  for (i = 0; i < ARRAY_SIZE (assemblers); i++)
> > > +    {
> > > +      if (!strcmp (ptr, assemblers[i]))
> > > +       return true;
> > > +    }
> > > +
> > > +  return false;
> > > +}
> > > +
> > > +/* Get argv[] array length.  */
> > >
> > >  static int
> > > -execute (void)
> > > +get_number_of_args (const char *argv[])
> > > +{
> > > +  int argc;
> > > +
> > > +  for (argc = 0; argv[argc] != NULL; argc++)
> > > +    ;
> > > +
> > > +  return argc;
> > > +}
> > > +
> > > +static const char *fsplit_arg (extra_arg_storer *);
> > > +
> > > +/* Accumulate each line in lines vec.  Return true if file exists, false if
> > > +   not.  */
> > > +
> > > +static bool
> > > +get_file_by_lines (extra_arg_storer *storer, vec<char *> *lines, const char *name)
> > > +{
> > > +  int buf_size = 64, len = 0;
> > > +  char *buf = XNEWVEC (char, buf_size);
> > > +
> > > +
> > > +  FILE *file = fopen (name, "r");
> > > +
> > > +  if (!file)
> > > +    return false;
> > > +
> > > +  while (1)
> > > +    {
> > > +      if (!fgets (buf + len, buf_size, file))
> > > +       {
> > > +         free (buf); /* Release buffer we created unnecessarily.  */
> > > +         break;
> > > +       }
> > > +
> > > +      len = strlen (buf);
> > > +      if (buf[len - 1] == '\n') /* Check if we indeed read the entire line.  */
> > > +       {
> > > +         buf[len - 1] = '\0';
> > > +         /* Yes.  Insert into the lines vector.  */
> > > +         lines->safe_push (buf);
> > > +         len = 0;
> > > +
> > > +         /* Store the created string for future release.  */
> > > +         storer->store (buf);
> > > +         buf = XNEWVEC (char, buf_size);
> > > +       }
> > > +      else
> > > +       {
> > > +         /* No.  Increase the buffer size and read again.  */
> > > +         buf = XRESIZEVEC (char, buf, buf_size * 2);
> > > +       }
> > > +    }
> > > +
> > > +  if (lines->length () == 0)
> > > +    internal_error ("Empty file: %s", name);
> > > +
> > > +  fclose (file);
> > > +  return true;
> > > +}
> > > +
> > > +static void
> > > +identify_asm_file (int argc, const char *argv[],
> > > +                  int *infile_pos, int *outfile_pos)
> > >  {
> > >    int i;
> > > -  int n_commands;              /* # of command.  */
> > > -  char *string;
> > > -  struct pex_obj *pex;
> > > -  struct command
> > > -  {
> > > -    const char *prog;          /* program name.  */
> > > -    const char **argv;         /* vector of args.  */
> > > -  };
> > > -  const char *arg;
> > >
> > > -  struct command *commands;    /* each command buffer with above info.  */
> > > +  static const char *asm_extension[] = {"s", "S"};
> > >
> > > -  gcc_assert (!processing_spec_function);
> > > +  bool infile_found = false;
> > > +  bool outfile_found = false;
> > >
> > > -  if (wrapper_string)
> > > +  for (i = 0; i < argc; i++)
> > >      {
> > > -      string = find_a_file (&exec_prefixes,
> > > -                           argbuf[0], X_OK, false);
> > > -      if (string)
> > > -       argbuf[0] = string;
> > > -      insert_wrapper (wrapper_string);
> > > +      const char *arg = argv[i];
> > > +      const char *ext = argv[i];
> > > +      unsigned j;
> > > +
> > > +      /* Jump to last '.' of string.  */
> > > +      while (*arg)
> > > +       if (*arg++ == '.')
> > > +         ext = arg;
> > > +
> > > +      if (!infile_found)
> > > +       for (j = 0; j < ARRAY_SIZE (asm_extension); ++j)
> > > +           if (!strcmp (ext, asm_extension[j]))
> > > +             {
> > > +               infile_found = true;
> > > +               *infile_pos = i;
> > > +               break;
> > > +             }
> > > +
> > > +      if (!outfile_found)
> > > +       if (!strcmp (ext, "-o"))
> > > +         {
> > > +           outfile_found = true;
> > > +           *outfile_pos = i+1;
> > > +         }
> > > +
> > > +      if (infile_found && outfile_found)
> > > +       return;
> > >      }
> > >
> > > -  /* Count # of piped commands.  */
> > > -  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> > > -    if (strcmp (arg, "|") == 0)
> > > -      n_commands++;
> > > +  gcc_assert (infile_found && outfile_found);
> > >
> > > -  /* Get storage for each command.  */
> > > -  commands = (struct command *) alloca (n_commands * sizeof (struct command));
> > > +}
> > >
> > > -  /* Split argbuf into its separate piped processes,
> > > -     and record info about each one.
> > > -     Also search for the programs that are to be run.  */
> > > +/* Language is one of three things:
> > >
> > > -  argbuf.safe_push (0);
> > > +   1) The name of a real programming language.
> > > +   2) NULL, indicating that no one has figured out
> > > +   what it is yet.
> > > +   3) '*', indicating that the file should be passed
> > > +   to the linker.  */
> > > +struct infile
> > > +{
> > > +  const char *name;
> > > +  const char *language;
> > > +  const char *temp_additional_asm;
> > > +  struct compiler *incompiler;
> > > +  bool compiled;
> > > +  bool preprocessed;
> > > +};
> > >
> > > -  commands[0].prog = argbuf[0]; /* first command.  */
> > > -  commands[0].argv = argbuf.address ();
> > > +/* Also a vector of input files specified.  */
> > >
> > > -  if (!wrapper_string)
> > > +static struct infile *infiles;
> > > +static struct infile *current_infile = NULL;
> > > +
> > > +int n_infiles;
> > > +
> > > +static int n_infiles_alloc;
> > > +
> > > +static vec<const char *> temp_object_files;
> > > +
> > > +/* Get path to the configured ld.  */
> > > +
> > > +static const char *
> > > +get_path_to_ld (void)
> > > +{
> > > +  const char *ret = find_a_file (&exec_prefixes, LINKER_NAME, X_OK, false);
> > > +  if (!ret)
> > > +    ret = "ld";
> > > +
> > > +  return ret;
> > > +}
> > > +
> > > +/* Check if a hidden -E was passed as argument to something.  */
> > > +
> > > +static bool
> > > +has_hidden_E (int argc, const char *argv[])
> > > +{
> > > +  int i;
> > > +  for (i = 0; i < argc; ++i)
> > > +    if (!strcmp (argv[i], "-E"))
> > > +      return true;
> > > +
> > > +  return false;
> > > +}
> > > +
> > > +/* Assembler file names are appended to the container file as soon as
> > > +   they are ready.  Sort them so that builds are reproducible.  */
> >
> > In principle the list of outputs is pre-determined by the
> > scheduler compiling the partitions - is there any reason
> > to write the file with the output names only incrementally
> > rather than in one (sorted) go?
>
> No. This surely could be done by the main process.

Good.  It looks less fragile to do there.
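Along the lines suggested above, once the main process knows every partition's assembler output it can write the whole list in one pass, ordered by partition index, so nothing depends on which partition finished first and no "N name" parsing (as in sort_asm_files) is needed on the reading side. A minimal sketch, assuming the scheduler can hand over an (index, file name) pair per partition; the struct partition_output type and write_sorted_asm_list are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assembler output produced for partition INDEX (hypothetical type).  */
struct partition_output
{
  int index;
  const char *asm_name;
};

/* qsort comparator: order by partition index, falling back to the file
   name so the ordering is total even if indices were ever duplicated.  */
static int
compare_by_index (const void *a, const void *b)
{
  const struct partition_output *x = a;
  const struct partition_output *y = b;
  if (x->index != y->index)
    return (x->index > y->index) - (x->index < y->index);
  return strcmp (x->asm_name, y->asm_name);
}

/* Sort the N OUTPUTS in place by partition index and write one file
   name per line to OUT.  Return 0 on success, -1 on a write error.  */
static int
write_sorted_asm_list (FILE *out, struct partition_output *outputs, size_t n)
{
  assert (out != NULL);
  qsort (outputs, n, sizeof *outputs, compare_by_index);
  for (size_t i = 0; i < n; i++)
    if (fprintf (out, "%s\n", outputs[i].asm_name) < 0)
      return -1;
  return fflush (out) == 0 ? 0 : -1;
}
```

Writing the file only after all partitions are done also means a reader never sees a partially written list.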

> >
> > > +static void
> > > +sort_asm_files (vec <char *> *_lines)
> > > +{
> > > +  vec <char *> &lines = *_lines;
> > > +  int i, n = lines.length ();
> > > +  char **temp_buf = XALLOCAVEC (char *, n);
> > > +
> > > +  for (i = 0; i < n; i++)
> > > +    temp_buf[i] = lines[i];
> > > +
> > > +  for (i = 0; i < n; i++)
> > >      {
> > > -      string = find_a_file (&exec_prefixes, commands[0].prog, X_OK, false);
> > > -      if (string)
> > > -       commands[0].argv[0] = string;
> > > +      char *no_str = strtok (temp_buf[i], " ");
> > > +      char *name = strtok (NULL, "");
> > > +
> > > +      int pos = atoi (no_str);
> > > +      lines[pos] = name;
> > >      }
> > > +}
> > >
> > > -  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> > > -    if (arg && strcmp (arg, "|") == 0)
> > > -      {                                /* each command.  */
> > > -#if defined (__MSDOS__) || defined (OS2) || defined (VMS)
> > > -       fatal_error (input_location, "%<-pipe%> not supported");
> > > -#endif
> > > -       argbuf[i] = 0; /* Termination of command args.  */
> > > -       commands[n_commands].prog = argbuf[i + 1];
> > > -       commands[n_commands].argv
> > > -         = &(argbuf.address ())[i + 1];
> > > -       string = find_a_file (&exec_prefixes, commands[n_commands].prog,
> > > -                             X_OK, false);
> > > -       if (string)
> > > -         commands[n_commands].argv[0] = string;
> > > -       n_commands++;
> > > -      }
> > > +/* Append -fsplit-outputs=<tempfile> to all calls to compilers.  For calls
> > > +   to the assembler, run one `as' per partition and, if -c was passed, set
> > > +   ADDITIONAL_LD to the `ld -r' call that merges the resulting objects.  */
> > >
> > > -  /* If -v, print what we are about to do, and maybe query.  */
> > > +static void
> > > +append_split_outputs (extra_arg_storer *storer,
> > > +                     struct command *additional_ld,
> > > +                     struct command **_commands,
> > > +                     int *_n_commands)
> > > +{
> > > +  int i;
> > >
> > > -  if (verbose_flag)
> > > +  struct command *commands = *_commands;
> > > +  int n_commands = *_n_commands;
> > > +
> > > +  const char **argv;
> > > +  int argc;
> > > +
> > > +  if (is_compiler (commands[0].prog))
> > > +    {
> > > +      argc = get_number_of_args (commands[0].argv);
> > > +      argv = storer->create_new (argc + 4);
> > > +
> > > +      memcpy (argv, commands[0].argv, argc * sizeof (const char *));
> > > +
> > > +      if (!has_hidden_E (argc, commands[0].argv))
> > > +       {
> > > +         const char *extra_argument = fsplit_arg (storer);
> > > +         argv[argc++] = extra_argument;
> > > +       }
> > > +
> > > +      if (have_c)
> > > +       {
> > > +         argv[argc++] = "-fPIE";
> > > +         argv[argc++] = "-fPIC";
> >
> > Uh, I think this has to go away - this must be from some early
> > problems and no longer necessary?
>
> Woops. Yeah, I just erased that and bootstrap is still working. :)

Phew ;)
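With -fPIE/-fPIC gone, the compiler branch of append_split_outputs only needs to copy the NULL-terminated argv into a larger array, append the -fsplit-outputs=<tempfile> argument, and re-terminate it, so two extra slots suffice instead of four. A minimal standalone sketch of that step; argv_append is a hypothetical name, plain malloc stands in for extra_arg_storer::create_new, and the strings themselves are shared rather than duplicated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly malloc'd copy of the NULL-terminated vector ARGV with
   EXTRA appended and a terminating NULL.  The caller owns (and frees)
   the returned array, not the strings it points to.  Returns NULL if
   allocation fails.  */
static const char **
argv_append (const char *const *argv, const char *extra)
{
  assert (argv != NULL && extra != NULL);

  size_t argc = 0;
  while (argv[argc] != NULL)
    argc++;

  /* ARGC existing entries, plus EXTRA, plus the terminating NULL.  */
  const char **out = malloc ((argc + 2) * sizeof *out);
  if (out == NULL)
    return NULL;

  memcpy (out, argv, argc * sizeof *out);
  out[argc] = extra;
  out[argc + 1] = NULL;
  return out;
}
```

Sizing the array from the arguments actually appended keeps the allocation and the number of writes in one place, which is easy to get out of sync when options are added or removed later.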

> >
> > > +       }
> > > +
> > > +      argv[argc]   = NULL;
> > > +
> > > +      commands[0].argv = argv;
> > > +    }
> > > +
> > > +  else if (is_assembler (commands[0].prog))
> > >      {
> > > -      /* For help listings, put a blank line between sub-processes.  */
> > > -      if (print_help_list)
> > > -       fputc ('\n', stderr);
> > > +      vec<char *> additional_asm_files;
> > > +
> > > +      struct command orig;
> > > +      const char **orig_argv;
> > > +      int orig_argc;
> > > +      const char *orig_obj_file;
> > > +
> > > +      int infile_pos = -1;
> > > +      int outfile_pos = -1;
> > > +
> > > +      static const char *path_to_ld = NULL;
> > > +
> > > +      if (!current_infile->temp_additional_asm)
> > > +       {
> > > +         /* Return because we did not create an additional-asm file for this
> > > +            input.  */
> > > +
> > > +         return;
> > > +       }
> > > +
> > > +      additional_asm_files.create (2);
> > > +
> > > +      if (!get_file_by_lines (storer, &additional_asm_files,
> > > +                             current_infile->temp_additional_asm))
> > > +       {
> > > +         additional_asm_files.release ();
> > > +         return; /* File not found.  This means that cc1* decided not to
> > > +                     parallelize.  */
> > > +       }
> > > +
> > > +      sort_asm_files (&additional_asm_files);
> > > +
> > > +      if (n_commands != 1)
> > > +       fatal_error (input_location,
> > > +                    "Auto parallelism is unsupported when piping commands");
> > > +
> > > +      if (!path_to_ld)
> > > +       path_to_ld = get_path_to_ld ();
> > > +
> > > +      /* Get original command.  */
> > > +      orig = commands[0];
> > > +      orig_argv = commands[0].argv;
> > > +      orig_argc = get_number_of_args (orig.argv);
> > > +
> > > +
> > > +      /* Update commands array to include the extra `as' calls.  */
> > > +      *_n_commands = additional_asm_files.length ();
> > > +      n_commands = *_n_commands;
> > > +
> > > +      gcc_assert (n_commands > 0);
> > > +
> > > +      identify_asm_file (orig_argc, orig_argv, &infile_pos, &outfile_pos);
> > > +
> > > +      *_commands = XRESIZEVEC (struct command, *_commands, n_commands);
> > > +      commands = *_commands;
> > >
> > > -      /* Print each piped command as a separate line.  */
> > >        for (i = 0; i < n_commands; i++)
> > >         {
> > > -         const char *const *j;
> > > +         const char **argv = storer->create_new (orig_argc + 1);
> > > +         const char *temp_obj = make_temp_file ("additional-obj.o");
> > > +         record_temp_file (temp_obj, true, true);
> > > +         record_temp_file (additional_asm_files[i], true, true);
> > > +
> > > +         memcpy (argv, orig_argv, (orig_argc + 1) * sizeof (const char *));
> > > +
> > > +         orig_obj_file = argv[outfile_pos];
> > > +
> > > +         argv[infile_pos]  = additional_asm_files[i];
> > > +         argv[outfile_pos] = temp_obj;
> > > +
> > > +         commands[i].prog = orig.prog;
> > > +         commands[i].argv = argv;
> > > +
> > > +         temp_object_files.safe_push (temp_obj);
> > > +       }
> > > +
> > > +       if (have_c)
> > > +         {
> > > +           unsigned int num_temp_objs = temp_object_files.length ();
> > > +           const char **argv = storer->create_new (num_temp_objs + 5);
> > > +           unsigned int j;
> > > +
> > > +           argv[0] = path_to_ld;
> > > +           argv[1] = "-o";
> > > +           argv[2] = orig_obj_file;
> > > +           argv[3] = "-r";
> > > +
> > > +           for (j = 0; j < num_temp_objs; j++)
> > > +             argv[j + 4] = temp_object_files[j];
> > > +           argv[j + 4] = NULL;
> > > +
> > > +           additional_ld->prog = path_to_ld;
> > > +           additional_ld->argv = argv;
> > > +
> > > +           if (!have_o)
> > > +             temp_object_files.truncate (0);
> > > +         }
> > > +
> > > +       additional_asm_files.release ();
> > > +    }
> > > +}
> > > +
> > > +DEBUG_FUNCTION void
> > > +print_command (struct command *command)
> > > +{
> > > +  const char **argv;
> > > +
> > > +  for (argv = command->argv; *argv != NULL; argv++)
> > > +    fprintf (stdout, " %s", *argv);
> > > +  fputc ('\n', stdout);
> > > +}
> > > +
> > > +DEBUG_FUNCTION void
> > > +print_commands (int n, struct command *commands)
> > > +{
> > > +  int i;
> > > +
> > > +  for (i = 0; i < n; i++)
> > > +    print_command (&commands[i]);
> > > +}
> > > +
> > > +DEBUG_FUNCTION void
> > > +print_argbuf ()
> > > +{
> > > +  int i;
> > > +  const char *arg;
> > > +
> > > +  for (i = 0; argbuf.iterate (i, &arg); i++)
> > > +    fprintf (stdout, "%s ", arg);
> > > +  fputc ('\n', stdout);
> > > +}
> > > +
> > > +
> > > +/* Print what commands will run.  Return 0 if success, anything else on
> > > +   error.  */
> > >
> > > -         if (verbose_only_flag)
> > > +static int
> > > +handle_verbose (int n_commands, struct command commands[])
> > > +{
> > > +  int i;
> > > +
> > > +  /* For help listings, put a blank line between sub-processes.  */
> > > +  if (print_help_list)
> > > +    fputc ('\n', stderr);
> > > +
> > > +  /* Print each piped command as a separate line.  */
> > > +  for (i = 0; i < n_commands; i++)
> > > +    {
> > > +      const char *const *j;
> > > +
> > > +      if (verbose_only_flag)
> > > +       {
> > > +         for (j = commands[i].argv; *j; j++)
> > >             {
> > > -             for (j = commands[i].argv; *j; j++)
> > > +             const char *p;
> > > +             for (p = *j; *p; ++p)
> > > +               if (!ISALNUM ((unsigned char) *p)
> > > +                   && *p != '_' && *p != '/' && *p != '-' && *p != '.')
> > > +                 break;
> > > +             if (*p || !*j)
> > >                 {
> > > -                 const char *p;
> > > +                 fprintf (stderr, " \"");
> > >                   for (p = *j; *p; ++p)
> > > -                   if (!ISALNUM ((unsigned char) *p)
> > > -                       && *p != '_' && *p != '/' && *p != '-' && *p != '.')
> > > -                     break;
> > > -                 if (*p || !*j)
> > >                     {
> > > -                     fprintf (stderr, " \"");
> > > -                     for (p = *j; *p; ++p)
> > > -                       {
> > > -                         if (*p == '"' || *p == '\\' || *p == '$')
> > > -                           fputc ('\\', stderr);
> > > -                         fputc (*p, stderr);
> > > -                       }
> > > -                     fputc ('"', stderr);
> > > +                     if (*p == '"' || *p == '\\' || *p == '$')
> > > +                       fputc ('\\', stderr);
> > > +                     fputc (*p, stderr);
> > >                     }
> > > -                 /* If it's empty, print "".  */
> > > -                 else if (!**j)
> > > -                   fprintf (stderr, " \"\"");
> > > -                 else
> > > -                   fprintf (stderr, " %s", *j);
> > > -               }
> > > -           }
> > > -         else
> > > -           for (j = commands[i].argv; *j; j++)
> > > +                 fputc ('"', stderr);
> > > +               }
> > >               /* If it's empty, print "".  */
> > > -             if (!**j)
> > > +             else if (!**j)
> > >                 fprintf (stderr, " \"\"");
> > >               else
> > >                 fprintf (stderr, " %s", *j);
> > > -
> > > -         /* Print a pipe symbol after all but the last command.  */
> > > -         if (i + 1 != n_commands)
> > > -           fprintf (stderr, " |");
> > > -         fprintf (stderr, "\n");
> > > +           }
> > >         }
> > > -      fflush (stderr);
> > > -      if (verbose_only_flag != 0)
> > > -        {
> > > -         /* verbose_only_flag should act as if the spec was
> > > -            executed, so increment execution_count before
> > > -            returning.  This prevents spurious warnings about
> > > -            unused linker input files, etc.  */
> > > -         execution_count++;
> > > -         return 0;
> > > -        }
> > > +      else
> > > +       for (j = commands[i].argv; *j; j++)
> > > +         /* If it's empty, print "".  */
> > > +         if (!**j)
> > > +           fprintf (stderr, " \"\"");
> > > +         else
> > > +           fprintf (stderr, " %s", *j);
> > > +
> > > +      /* Print a pipe symbol after all but the last command.  */
> > > +      if (i + 1 != n_commands)
> > > +       fprintf (stderr, " |");
> > > +      fprintf (stderr, "\n");
> > > +    }
> > > +  fflush (stderr);
> > > +  if (verbose_only_flag != 0)
> > > +    {
> > > +      /* verbose_only_flag should act as if the spec was
> > > +        executed, so increment execution_count before
> > > +        returning.  This prevents spurious warnings about
> > > +        unused linker input files, etc.  */
> > > +      execution_count++;
> > > +      return 1;
> > > +    }
> > >  #ifdef DEBUG
> > > -      fnotice (stderr, "\nGo ahead? (y or n) ");
> > > -      fflush (stderr);
> > > -      i = getchar ();
> > > -      if (i != '\n')
> > > -       while (getchar () != '\n')
> > > -         ;
> > > -
> > > -      if (i != 'y' && i != 'Y')
> > > -       return 0;
> > > +  fnotice (stderr, "\nGo ahead? (y or n) ");
> > > +  fflush (stderr);
> > > +  i = getchar ();
> > > +  if (i != '\n')
> > > +    while (getchar () != '\n')
> > > +      ;
> > > +
> > > +  if (i != 'y' && i != 'Y')
> > > +    return 1;
> > >  #endif /* DEBUG */
> > > -    }
> > > +
> > > +  return 0;
> > > +}
> > >
> > >  #ifdef ENABLE_VALGRIND_CHECKING
> > > +
> > > +/* Append valgrind to each program.  */
> > > +
> > > +static void
> > > +append_valgrind (struct obstack *to_be_released,
> > > +                int n_commands, struct command commands[])
> > > +{
> > > +  int i;
> > > +
> > >    /* Run the each command through valgrind.  To simplify prepending the
> > >       path to valgrind and the option "-q" (for quiet operation unless
> > >       something triggers), we allocate a separate argv array.  */
> > > @@ -3221,7 +3656,7 @@ execute (void)
> > >        for (argc = 0; commands[i].argv[argc] != NULL; argc++)
> > >         ;
> > >
> > > -      argv = XALLOCAVEC (const char *, argc + 3);
> > > +      argv = obstack_alloc (to_be_released, (argc + 3) * sizeof (const char *));
> > >
> > >        argv[0] = VALGRIND_PATH;
> > >        argv[1] = "-q";
> > > @@ -3232,15 +3667,16 @@ execute (void)
> > >        commands[i].argv = argv;
> > >        commands[i].prog = argv[0];
> > >      }
> > > +}
> > >  #endif
> > >
> > > -  /* Run each piped subprocess.  */
> > > +/* Launch a list of commands asynchronously.  */
> > >
> > > -  pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
> > > -                                  ? PEX_RECORD_TIMES : 0),
> > > -                 progname, temp_filename);
> > > -  if (pex == NULL)
> > > -    fatal_error (input_location, "%<pex_init%> failed: %m");
> > > +static void
> > > +async_launch_commands (struct pex_obj *pex,
> > > +                      int n_commands, struct command commands[])
> > > +{
> > > +  int i;
> > >
> > >    for (i = 0; i < n_commands; i++)
> > >      {
> > > @@ -3267,151 +3703,341 @@ execute (void)
> > >      }
> > >
> > >    execution_count++;
> > > +}
> > >
> > > -  /* Wait for all the subprocesses to finish.  */
> > >
> > > -  {
> > > -    int *statuses;
> > > -    struct pex_time *times = NULL;
> > > -    int ret_code = 0;
> > > +/* Wait for all the subprocesses to finish.  Return 0 on success, -1 on
> > > +   failure.  */
> > >
> > > -    statuses = (int *) alloca (n_commands * sizeof (int));
> > > -    if (!pex_get_status (pex, n_commands, statuses))
> > > -      fatal_error (input_location, "failed to get exit status: %m");
> > > +static int
> > > +await_commands_to_finish (struct pex_obj *pex,
> > > +                         int n_commands, struct command commands[])
> > > +{
> > >
> > > -    if (report_times || report_times_to_file)
> > > -      {
> > > -       times = (struct pex_time *) alloca (n_commands * sizeof (struct pex_time));
> > > -       if (!pex_get_times (pex, n_commands, times))
> > > -         fatal_error (input_location, "failed to get process times: %m");
> > > -      }
> > > +  int *statuses;
> > > +  struct pex_time *times = NULL;
> > > +  int ret_code = 0, i;
> > >
> > > -    pex_free (pex);
> > > +  statuses = (int *) alloca (n_commands * sizeof (int));
> > > +  if (!pex_get_status (pex, n_commands, statuses))
> > > +    fatal_error (input_location, "failed to get exit status: %m");
> > >
> > > -    for (i = 0; i < n_commands; ++i)
> > > -      {
> > > -       int status = statuses[i];
> > > +  if (report_times || report_times_to_file)
> > > +    {
> > > +      times = (struct pex_time *) alloca (n_commands * sizeof (*times));
> > > +      if (!pex_get_times (pex, n_commands, times))
> > > +       fatal_error (input_location, "failed to get process times: %m");
> > > +    }
> > >
> > > -       if (WIFSIGNALED (status))
> > > -         switch (WTERMSIG (status))
> > > -           {
> > > -           case SIGINT:
> > > -           case SIGTERM:
> > > -             /* SIGQUIT and SIGKILL are not available on MinGW.  */
> > > +  for (i = 0; i < n_commands; ++i)
> > > +    {
> > > +      int status = statuses[i];
> > > +
> > > +      if (WIFSIGNALED (status))
> > > +       switch (WTERMSIG (status))
> > > +         {
> > > +         case SIGINT:
> > > +         case SIGTERM:
> > > +           /* SIGQUIT and SIGKILL are not available on MinGW.  */
> > >  #ifdef SIGQUIT
> > > -           case SIGQUIT:
> > > +         case SIGQUIT:
> > >  #endif
> > >  #ifdef SIGKILL
> > > -           case SIGKILL:
> > > +         case SIGKILL:
> > >  #endif
> > > -             /* The user (or environment) did something to the
> > > -                inferior.  Making this an ICE confuses the user into
> > > -                thinking there's a compiler bug.  Much more likely is
> > > -                the user or OOM killer nuked it.  */
> > > -             fatal_error (input_location,
> > > -                          "%s signal terminated program %s",
> > > -                          strsignal (WTERMSIG (status)),
> > > -                          commands[i].prog);
> > > -             break;
> > > +           /* The user (or environment) did something to the
> > > +              inferior.  Making this an ICE confuses the user into
> > > +              thinking there's a compiler bug.  Much more likely is
> > > +              the user or OOM killer nuked it.  */
> > > +           fatal_error (input_location,
> > > +                        "%s signal terminated program %s",
> > > +                        strsignal (WTERMSIG (status)),
> > > +                        commands[i].prog);
> > > +           break;
> > >
> > >  #ifdef SIGPIPE
> > > -           case SIGPIPE:
> > > -             /* SIGPIPE is a special case.  It happens in -pipe mode
> > > -                when the compiler dies before the preprocessor is
> > > -                done, or the assembler dies before the compiler is
> > > -                done.  There's generally been an error already, and
> > > -                this is just fallout.  So don't generate another
> > > -                error unless we would otherwise have succeeded.  */
> > > -             if (signal_count || greatest_status >= MIN_FATAL_STATUS)
> > > -               {
> > > -                 signal_count++;
> > > -                 ret_code = -1;
> > > -                 break;
> > > -               }
> > > +         case SIGPIPE:
> > > +           /* SIGPIPE is a special case.  It happens in -pipe mode
> > > +              when the compiler dies before the preprocessor is
> > > +              done, or the assembler dies before the compiler is
> > > +              done.  There's generally been an error already, and
> > > +              this is just fallout.  So don't generate another
> > > +              error unless we would otherwise have succeeded.  */
> > > +           if (signal_count || greatest_status >= MIN_FATAL_STATUS)
> > > +             {
> > > +               signal_count++;
> > > +               ret_code = -1;
> > > +               break;
> > > +             }
> > >  #endif
> > > -             /* FALLTHROUGH */
> > > +           /* FALLTHROUGH.  */
> > >
> > > -           default:
> > > -             /* The inferior failed to catch the signal.  */
> > > -             internal_error_no_backtrace ("%s signal terminated program %s",
> > > -                                          strsignal (WTERMSIG (status)),
> > > -                                          commands[i].prog);
> > > -           }
> > > -       else if (WIFEXITED (status)
> > > -                && WEXITSTATUS (status) >= MIN_FATAL_STATUS)
> > > -         {
> > > -           /* For ICEs in cc1, cc1obj, cc1plus see if it is
> > > -              reproducible or not.  */
> > > -           const char *p;
> > > -           if (flag_report_bug
> > > -               && WEXITSTATUS (status) == ICE_EXIT_CODE
> > > -               && i == 0
> > > -               && (p = strrchr (commands[0].argv[0], DIR_SEPARATOR))
> > > -               && ! strncmp (p + 1, "cc1", 3))
> > > -             try_generate_repro (commands[0].argv);
> > > -           if (WEXITSTATUS (status) > greatest_status)
> > > -             greatest_status = WEXITSTATUS (status);
> > > -           ret_code = -1;
> > > +         default:
> > > +           /* The inferior failed to catch the signal.  */
> > > +           internal_error_no_backtrace ("%s signal terminated program %s",
> > > +                                        strsignal (WTERMSIG (status)),
> > > +                                        commands[i].prog);
> > >           }
> > > +      else if (WIFEXITED (status)
> > > +              && WEXITSTATUS (status) >= MIN_FATAL_STATUS)
> > > +       {
> > > +         /* For ICEs in cc1, cc1obj, cc1plus see if it is
> > > +            reproducible or not.  */
> > > +         const char *p;
> > > +         if (flag_report_bug
> > > +             && WEXITSTATUS (status) == ICE_EXIT_CODE
> > > +             && i == 0
> > > +             && (p = strrchr (commands[0].argv[0], DIR_SEPARATOR))
> > > +             && ! strncmp (p + 1, "cc1", 3))
> > > +           try_generate_repro (commands[0].argv);
> > > +         if (WEXITSTATUS (status) > greatest_status)
> > > +           greatest_status = WEXITSTATUS (status);
> > > +         ret_code = -1;
> > > +       }
> > >
> > > -       if (report_times || report_times_to_file)
> > > -         {
> > > -           struct pex_time *pt = &times[i];
> > > -           double ut, st;
> > > +      if (report_times || report_times_to_file)
> > > +       {
> > > +         struct pex_time *pt = &times[i];
> > > +         double ut, st;
> > >
> > > -           ut = ((double) pt->user_seconds
> > > -                 + (double) pt->user_microseconds / 1.0e6);
> > > -           st = ((double) pt->system_seconds
> > > -                 + (double) pt->system_microseconds / 1.0e6);
> > > +         ut = ((double) pt->user_seconds
> > > +               + (double) pt->user_microseconds / 1.0e6);
> > > +         st = ((double) pt->system_seconds
> > > +               + (double) pt->system_microseconds / 1.0e6);
> > >
> > > -           if (ut + st != 0)
> > > -             {
> > > -               if (report_times)
> > > -                 fnotice (stderr, "# %s %.2f %.2f\n",
> > > -                          commands[i].prog, ut, st);
> > > +         if (ut + st != 0)
> > > +           {
> > > +             if (report_times)
> > > +               fnotice (stderr, "# %s %.2f %.2f\n",
> > > +                        commands[i].prog, ut, st);
> > >
> > > -               if (report_times_to_file)
> > > -                 {
> > > -                   int c = 0;
> > > -                   const char *const *j;
> > > +             if (report_times_to_file)
> > > +               {
> > > +                 int c = 0;
> > > +                 const char *const *j;
> > >
> > > -                   fprintf (report_times_to_file, "%g %g", ut, st);
> > > +                 fprintf (report_times_to_file, "%g %g", ut, st);
> > >
> > > -                   for (j = &commands[i].prog; *j; j = &commands[i].argv[++c])
> > > -                     {
> > > -                       const char *p;
> > > -                       for (p = *j; *p; ++p)
> > > -                         if (*p == '"' || *p == '\\' || *p == '$'
> > > -                             || ISSPACE (*p))
> > > -                           break;
> > > +                 for (j = &commands[i].prog; *j; j = &commands[i].argv[++c])
> > > +                   {
> > > +                     const char *p;
> > > +                     for (p = *j; *p; ++p)
> > > +                       if (*p == '"' || *p == '\\' || *p == '$'
> > > +                           || ISSPACE (*p))
> > > +                         break;
> > >
> > > -                       if (*p)
> > > -                         {
> > > -                           fprintf (report_times_to_file, " \"");
> > > -                           for (p = *j; *p; ++p)
> > > -                             {
> > > -                               if (*p == '"' || *p == '\\' || *p == '$')
> > > -                                 fputc ('\\', report_times_to_file);
> > > -                               fputc (*p, report_times_to_file);
> > > -                             }
> > > -                           fputc ('"', report_times_to_file);
> > > -                         }
> > > -                       else
> > > -                         fprintf (report_times_to_file, " %s", *j);
> > > -                     }
> > > +                     if (*p)
> > > +                       {
> > > +                         fprintf (report_times_to_file, " \"");
> > > +                         for (p = *j; *p; ++p)
> > > +                           {
> > > +                             if (*p == '"' || *p == '\\' || *p == '$')
> > > +                               fputc ('\\', report_times_to_file);
> > > +                             fputc (*p, report_times_to_file);
> > > +                           }
> > > +                         fputc ('"', report_times_to_file);
> > > +                       }
> > > +                     else
> > > +                       fprintf (report_times_to_file, " %s", *j);
> > > +                   }
> > >
> > > -                   fputc ('\n', report_times_to_file);
> > > -                 }
> > > -             }
> > > -         }
> > > +                 fputc ('\n', report_times_to_file);
> > > +               }
> > > +           }
> > > +       }
> > > +    }
> > > +
> > > +  return ret_code;
> > > +}
> > > +
> > > +/* Split a single command with pipes into several commands.  */
> > > +
> > > +static void
> > > +split_commands (vec<const_char_p> *argbuf_p,
> > > +               int n_commands, struct command commands[])
> > > +{
> > > +  int i;
> > > +  const char *arg;
> > > +  vec<const_char_p> &argbuf = *argbuf_p;
> > > +
> > > +  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> > > +    if (arg && strcmp (arg, "|") == 0)
> > > +      {                                /* each command.  */
> > > +       const char *string;
> > > +#if defined (__MSDOS__) || defined (OS2) || defined (VMS)
> > > +       fatal_error (input_location, "%<-pipe%> not supported");
> > > +#endif
> > > +       argbuf[i] = 0; /* Termination of command args.  */
> > > +       commands[n_commands].prog = argbuf[i + 1];
> > > +       commands[n_commands].argv
> > > +         = &(argbuf.address ())[i + 1];
> > > +       string = find_a_file (&exec_prefixes, commands[n_commands].prog,
> > > +                             X_OK, false);
> > > +       if (string)
> > > +         commands[n_commands].argv[0] = string;
> > > +       n_commands++;
> > >        }
> > > +}
> > > +
> > > +struct command *
> > > +parse_argbuf (vec <const_char_p> *argbuf_p, int *n)
> > > +{
> > > +  int i, n_commands;
> > > +  vec<const_char_p> &argbuf = *argbuf_p;
> > > +  const char *arg;
> > > +  struct command *commands;
> > >
> > > -   if (commands[0].argv[0] != commands[0].prog)
> > > -     free (CONST_CAST (char *, commands[0].argv[0]));
> > > +  /* Count # of piped commands.  */
> > > +  for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> > > +    if (strcmp (arg, "|") == 0)
> > > +      n_commands++;
> > >
> > > -    return ret_code;
> > > -  }
> > > +  /* Get storage for each command.  */
> > > +  commands = XNEWVEC (struct command, n_commands);
> > > +
> > > +  /* Split argbuf into its separate piped processes,
> > > +     and record info about each one.
> > > +     Also search for the programs that are to be run.  */
> > > +
> > > +  argbuf.safe_push (0);
> > > +
> > > +  commands[0].prog = argbuf[0]; /* first command.  */
> > > +  commands[0].argv = argbuf.address ();
> > > +
> > > +  split_commands (argbuf_p, n_commands, commands);
> > > +
> > > +  *n = n_commands;
> > > +  return commands;
> > > +}
> > > +
> > > +/* Execute the command specified by the arguments on the current line of spec.
> > > +   When using pipes, this includes several piped-together commands
> > > +   with `|' between them.
> > > +
> > > +   Return 0 if successful, -1 if failed.  */
> > > +
> > > +static int
> > > +execute (void)
> > > +{
> > > +  struct pex_obj *pex;
> > > +  struct command *commands;     /* each command buffer with program to call
> > > +                                   and arguments.  */
> > > +  int n_commands;               /* # of command.  */
> > > +  int ret = 0;
> > > +
> > > +  struct command additional_ld = {NULL, NULL};
> > > +  extra_arg_storer storer;
> > > +
> > > +  struct command *commands_batch;
> > > +  int n;
> > > +
> > > +  gcc_assert (!processing_spec_function);
> > > +
> > > +  if (wrapper_string)
> > > +    {
> > > +      char *string = find_a_file (&exec_prefixes, argbuf[0], X_OK, false);
> > > +      if (string)
> > > +       argbuf[0] = string;
> > > +      insert_wrapper (wrapper_string);
> > > +    }
> > > +
> > > +  /* Parse the argbuf into several commands.  */
> > > +  commands = parse_argbuf (&argbuf, &n_commands);
> > > +
> > > +  if (!have_S && !have_E && flag_parallel_jobs)
> > > +    append_split_outputs (&storer, &additional_ld, &commands, &n_commands);
> > > +
> > > +  if (!wrapper_string)
> > > +    {
> > > +      char *string = find_a_file (&exec_prefixes, commands[0].prog,
> > > +                                 X_OK, false);
> > > +      if (string)
> > > +       commands[0].argv[0] = string;
> > > +    }
> > > +
> > > +  /* If -v, print what we are about to do, and maybe query.  */
> > > +
> > > +  if (verbose_flag)
> > > +    {
> > > +      int ret_verbose = handle_verbose (n_commands, commands);
> > > +      if (ret_verbose > 0)
> > > +       {
> > > +         ret = 0;
> > > +         goto cleanup;
> > > +       }
> > > +    }
> > > +
> > > +#ifdef ENABLE_VALGRIND_CHECKING
> > > +  /* Stack of strings to be released on function return.  */
> > > +  struct obstack to_be_released;
> > > +  obstack_init (&to_be_released);
> > > +  append_valgrind (&to_be_released, n_commands, commands);
> > > +#endif
> > > +
> > > +  /* FIXME: Interact with GNU Jobserver if necessary.  */
> > > +
> > > +  commands_batch = commands;
> > > +  n = flag_parallel_jobs? 1: n_commands;
> > > +
> > > +  for (int i = 0; i < n_commands; i += n)
> > > +    {
> > > +      /* Run each piped subprocess.  */
> > > +
> > > +      pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
> > > +                                      ? PEX_RECORD_TIMES : 0),
> > > +                     progname, temp_filename);
> > > +      if (pex == NULL)
> > > +       fatal_error (input_location, "%<pex_init%> failed: %m");
> > > +
> > > +      /* Launch the commands.  */
> > > +      async_launch_commands (pex, n, commands_batch);
> > > +
> > > +      /* Await them to be done.  */
> > > +      ret |= await_commands_to_finish (pex, n, commands_batch);
> > > +
> > > +      commands_batch = commands_batch + n;
> > > +
> > > +      /* Cleanup.  */
> > > +      pex_free (pex);
> > > +    }
> > > +
> > > +
> > > +  if (ret != 0)
> > > +    goto cleanup;
> > > +
> > > +  /* Run extra ld call.  */
> > > +  if (!EMPTY_CMD (additional_ld))
> > > +    {
> > > +      /* If we are here, we must be sure that we had at least two object
> > > +        files to link.  */
> > > +      //gcc_assert (n_commands != 1);
> > > +
> > > +      pex = pex_init (PEX_USE_PIPES | ((report_times || report_times_to_file)
> > > +                                      ? PEX_RECORD_TIMES : 0),
> > > +                     progname, temp_filename);
> > > +
> > > +      if (verbose_flag)
> > > +       print_command (&additional_ld);
> > > +
> > > +      async_launch_commands (pex, 1, &additional_ld);
> > > +      ret = await_commands_to_finish (pex, 1, &additional_ld);
> > > +      pex_free (pex);
> > > +    }
> > > +
> > > +
> > > +#ifdef ENABLE_VALGRIND_CHECKING
> > > +  obstack_free (&to_be_released, NULL);
> > > +#endif
> > > +
> > > +cleanup:
> > > +  if (commands[0].argv[0] != commands[0].prog)
> > > +    free (CONST_CAST (char *, commands[0].argv[0]));
> > > +
> > > +  free (commands);
> > > +
> > > +  return ret;
> > >  }
> > > +
> > >
> > >  /* Find all the switches given to us
> > >     and make a vector describing them.
> > > @@ -3480,29 +4106,33 @@ static int n_switches_alloc_debug_check[2];
> > >
> > >  static char *debug_check_temp_file[2];
> > >
> > > -/* Language is one of three things:
> > > -
> > > -   1) The name of a real programming language.
> > > -   2) NULL, indicating that no one has figured out
> > > -   what it is yet.
> > > -   3) '*', indicating that the file should be passed
> > > -   to the linker.  */
> > > -struct infile
> > > +static const char *
> > > +fsplit_arg (extra_arg_storer *storer)
> > >  {
> > > -  const char *name;
> > > -  const char *language;
> > > -  struct compiler *incompiler;
> > > -  bool compiled;
> > > -  bool preprocessed;
> > > -};
> > > +  const char *tempname = make_temp_file ("additional-asm");
> > > +  const char arg[] = "-fsplit-outputs=";
> > > +  char *final;
> > >
> > > -/* Also a vector of input files specified.  */
> > > +  size_t n = ARRAY_SIZE (arg) + strlen (tempname);
> > >
> > > -static struct infile *infiles;
> > > +  gcc_assert (current_infile);
> > >
> > > -int n_infiles;
> > > +  current_infile->temp_additional_asm = tempname;
> > > +
> > > +  /* Remove file, once we may not even need it and create it later.  */
> > > +  /* FIXME: This is a little hackish.  */
> > > +  remove (tempname);
> > > +
> > > +  final = storer->create_string (n);
> > > +
> > > +  strcpy (final, arg);
> > > +  strcat (final, tempname);
> > > +
> > > +  record_temp_file (tempname, true, true);
> > > +
> > > +  return final;
> > > +}
> > >
> > > -static int n_infiles_alloc;
> > >
> > >  /* True if undefined environment variables encountered during spec processing
> > >     are ok to ignore, typically when we're running for --help or --version.  */
> > > @@ -3683,6 +4313,8 @@ alloc_infile (void)
> > >      {
> > >        n_infiles_alloc = 16;
> > >        infiles = XNEWVEC (struct infile, n_infiles_alloc);
> > > +      memset (infiles, 0x00, sizeof (*infiles) * n_infiles_alloc);
> > > +
> > >      }
> > >    else if (n_infiles_alloc == n_infiles)
> > >      {
> > > @@ -4648,6 +5280,9 @@ process_command (unsigned int decoded_options_count,
> > >        switch (decoded_options[j].opt_index)
> > >         {
> > >         case OPT_S:
> > > +         have_S = 1;
> > > +         have_c = 1;
> > > +         break;
> > >         case OPT_c:
> > >         case OPT_E:
> > >           have_c = 1;
> > > @@ -6155,11 +6790,14 @@ do_spec_1 (const char *spec, int inswitch, const char *soft_matched_part)
> > >                   open_at_file ();
> > >
> > >                 for (i = 0; (int) i < n_infiles; i++)
> > > -                 if (compile_input_file_p (&infiles[i]))
> > > -                   {
> > > -                     store_arg (infiles[i].name, 0, 0);
> > > -                     infiles[i].compiled = true;
> > > -                   }
> > > +                 {
> > > +                   current_infile = &infiles[i];
> > > +                   if (compile_input_file_p (current_infile))
> > > +                     {
> > > +                       store_arg (current_infile->name, 0, 0);
> > > +                       current_infile->compiled = true;
> > > +                     }
> > > +                 }
> > >
> > >                 if (at_file_supplied)
> > >                   close_at_file ();
> > > @@ -6515,7 +7153,7 @@ do_spec_1 (const char *spec, int inswitch, const char *soft_matched_part)
> > >                      "%{foo=*:bar%*}%{foo=*:one%*two}"
> > >
> > >                    matches -foo=hello then it will produce:
> > > -
> > > +
> > >                      barhello onehellotwo
> > >                 */
> > >                 if (*p == 0 || *p == '}')
> > > @@ -8642,6 +9280,7 @@ driver::do_spec_on_infiles () const
> > >    for (i = 0; (int) i < n_infiles; i++)
> > >      {
> > >        int this_file_error = 0;
> > > +      current_infile = &infiles[i];
> > >
> > >        /* Tell do_spec what to substitute for %i.  */
> > >
> > > @@ -8761,12 +9400,15 @@ driver::do_spec_on_infiles () const
> > >        int i;
> > >
> > >        for (i = 0; i < n_infiles ; i++)
> > > -       if (infiles[i].incompiler
> > > -           || (infiles[i].language && infiles[i].language[0] != '*'))
> > > -         {
> > > -           set_input (infiles[i].name);
> > > -           break;
> > > -         }
> > > +       {
> > > +         current_infile = &infiles[i];
> > > +         if (infiles[i].incompiler
> > > +             || (infiles[i].language && infiles[i].language[0] != '*'))
> > > +           {
> > > +             set_input (infiles[i].name);
> > > +             break;
> > > +           }
> > > +       }
> > >      }
> > >
> > >    if (!seen_error ())
> > > @@ -8788,11 +9430,31 @@ driver::maybe_run_linker (const char *argv0) const
> > >    int linker_was_run = 0;
> > >    int num_linker_inputs;
> > >
> > > -  /* Determine if there are any linker input files.  */
> > > -  num_linker_inputs = 0;
> > > -  for (i = 0; (int) i < n_infiles; i++)
> > > -    if (explicit_link_files[i] || outfiles[i] != NULL)
> > > -      num_linker_inputs++;
> > > +  /* Set outfiles to be the temporary object vector.  */
> > > +  const char **outfiles_holder = outfiles;
> > > +  int n_infiles_holder = n_infiles;
> > > +  bool outfiles_switched = false;
> > > +  if (temp_object_files.length () > 0)
> > > +    {
> > > +      /* Insert explicit link files into the temp object vector.  */
> > > +
> > > +      for (i = 0; (int) i < n_infiles; i++)
> > > +       if (explicit_link_files[i] && outfiles[i] != NULL)
> > > +         temp_object_files.safe_push (outfiles[i]);
> > > +
> > > +      num_linker_inputs = n_infiles = temp_object_files.length ();
> > > +      temp_object_files.safe_push (NULL); /* the NULL sentinel.  */
> > > +      outfiles = temp_object_files.address ();
> > > +    }
> > > +  else /* Fall back to the old method.  */
> > > +    {
> > > +
> > > +      /* Determine if there are any linker input files.  */
> > > +      num_linker_inputs = 0;
> > > +      for (i = 0; (int) i < n_infiles; i++)
> > > +       if (explicit_link_files[i] || outfiles[i] != NULL)
> > > +         num_linker_inputs++;
> > > +    }
> > >
> > >    /* Arrange for temporary file names created during linking to take
> > >       on names related with the linker output rather than with the
> > > @@ -8897,14 +9559,24 @@ driver::maybe_run_linker (const char *argv0) const
> > >      }
> > >
> > >    /* If options said don't run linker,
> > > -     complain about input files to be given to the linker.  */
> > > +     complain about input files to be given to the linker.
> > > +     When fsplit-arg is active, the linker will run and this if
> > > +     will not be triggered.  */
> > >
> > > -  if (! linker_was_run && !seen_error ())
> > > +  if (!outfiles_switched && !linker_was_run && !seen_error ()
> > > +      && temp_object_files.length () == 0)
> > >      for (i = 0; (int) i < n_infiles; i++)
> > >        if (explicit_link_files[i]
> > >           && !(infiles[i].language && infiles[i].language[0] == '*'))
> > >         warning (0, "%s: linker input file unused because linking not done",
> > >                  outfiles[i]);
> > > +
> > > +  if (outfiles_switched)
> > > +    {
> > > +      /* Undo our changes.  */
> > > +      outfiles = outfiles_holder;
> > > +      n_infiles = n_infiles_holder;
> > > +    }
> > >  }
> > >
> > >  /* The end of "main".  */
> > > @@ -10808,6 +11480,7 @@ driver::finalize ()
> > >    linker_options.truncate (0);
> > >    assembler_options.truncate (0);
> > >    preprocessor_options.truncate (0);
> > > +  temp_object_files.truncate (0);
> > >
> > >    path_prefix_reset (&exec_prefixes);
> > >    path_prefix_reset (&startfile_prefixes);
> > > diff --git a/gcc/testsuite/driver/a.c b/gcc/testsuite/driver/a.c
> > > new file mode 100644
> > > index 00000000000..c6b8c2eb61e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/driver/a.c
> > > @@ -0,0 +1,6 @@
> > > +int puts (const char *);
> > > +
> > > +void a_func (void)
> > > +{
> > > +  puts ("A test");
> > > +}
> > > diff --git a/gcc/testsuite/driver/b.c b/gcc/testsuite/driver/b.c
> > > new file mode 100644
> > > index 00000000000..76a2cba0bd9
> > > --- /dev/null
> > > +++ b/gcc/testsuite/driver/b.c
> > > @@ -0,0 +1,6 @@
> > > +int puts (const char *);
> > > +
> > > +void a_func (void)
> > > +{
> > > +  puts ("Another test");
> > > +}
> > > diff --git a/gcc/testsuite/driver/driver.exp b/gcc/testsuite/driver/driver.exp
> > > new file mode 100644
> > > index 00000000000..2bbaf07778a
> > > --- /dev/null
> > > +++ b/gcc/testsuite/driver/driver.exp
> > > @@ -0,0 +1,80 @@
> > > +#   Copyright (C) 2008-2020 Free Software Foundation, Inc.
> > > +
> > > +# This program is free software; you can redistribute it and/or modify
> > > +# it under the terms of the GNU General Public License as published by
> > > +# the Free Software Foundation; either version 3 of the License, or
> > > +# (at your option) any later version.
> > > +#
> > > +# This program is distributed in the hope that it will be useful,
> > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > +# GNU General Public License for more details.
> > > +#
> > > +# You should have received a copy of the GNU General Public License
> > > +# along with GCC; see the file COPYING3.  If not see
> > > +# <http://www.gnu.org/licenses/>.
> > > +
> > > +# GCC testsuite that uses the `dg.exp' driver.
> > > +
> > > +# Load support procs.
> > > +load_lib gcc-dg.exp
> > > +
> > > +proc check-for-errors { test input } {
> > > +    if { [string equal "$input" ""] } then {
> > > +       pass "$test: std out"
> > > +    } else {
> > > +       fail "$test: std out\n$input"
> > > +    }
> > > +}
> > > +
> > > +if ![check_effective_target_pthread] {
> > > +  return
> > > +}
> > > +
> > > +# If a testcase doesn't have special options, use these.
> > > +global DEFAULT_CFLAGS
> > > +if ![info exists DEFAULT_CFLAGS] then {
> > > +    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
> > > +}
> > > +
> > > +# Initialize `dg'.
> > > +dg-init
> > > +
> > > +
> > > +# Test multi-input compilation
> > > +check-for-errors "Multi-input Compilation" \
> > > +       [gcc_target_compile "$srcdir/$subdir/a.c $srcdir/$subdir/b.c -c" "" none ""]
> > > +
> > > +# Compile file and generate an assembler and object file
> > > +check-for-errors "Object Generation" \
> > > +       [gcc_target_compile "$srcdir/$subdir/a.c -c" "a.o" none ""]
> > > +check-for-errors "Object Generation" \
> > > +       [gcc_target_compile "$srcdir/$subdir/b.c -c" "a.o" none ""]
> > > +check-for-errors "Assembler Generation" \
> > > +       [gcc_target_compile "$srcdir/$subdir/a.c -S" "a.S" none ""]
> > > +check-for-errors "Assembler Generation" \
> > > +       [gcc_target_compile "$srcdir/$subdir/b.c -S" "b.S" none ""]
> > > +
> > > +# Empty file is a valid program
> > > +check-for-errors "Empty Program" \
> > > +       [gcc_target_compile "$srcdir/$subdir/empty.c -c" "empty.o" none ""]
> > > +
> > > +# Test object file passthrough
> > > +check-for-errors "Object file passthrough" \
> > > +       [gcc_target_compile "$srcdir/$subdir/foo.c a.o" "a.exe" none ""]
> > > +
> > > +# Test compilation when assembler is provided
> > > +check-for-errors "Assembler with Macros" \
> > > +       [gcc_target_compile "a.S -c" "a.o" none ""]
> > > +
> > > +# Clean temporary generated files.
> > > +set temp_files {"a.o" "a.S" "b.o" "b.S" "empty.o"}
> > > +
> > > +foreach f $temp_files {
> > > +       if { [file exists $f] } {
> > > +               file delete $f
> > > +       }
> > > +}
> > > +
> > > +# All done.
> > > +dg-finish
> > > diff --git a/gcc/testsuite/driver/empty.c b/gcc/testsuite/driver/empty.c
> > > new file mode 100644
> > > index 00000000000..e69de29bb2d
> > > diff --git a/gcc/testsuite/driver/foo.c b/gcc/testsuite/driver/foo.c
> > > new file mode 100644
> > > index 00000000000..a18fd2a3b14
> > > --- /dev/null
> > > +++ b/gcc/testsuite/driver/foo.c
> > > @@ -0,0 +1,7 @@
> > > +void a_func (void);
> > > +
> > > +int main()
> > > +{
> > > +  a_func ();
> > > +  return 0;
> > > +}
> > > --
> > > 2.28.0
> > >
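The loop in the `execute ()` hunk above chooses a stride `n = flag_parallel_jobs ? 1 : n_commands`. With that stride, all commands are launched as one piped `pex` object when parallel jobs are off, and one command per `pex` object otherwise. The standalone sketch below mirrors only that stride logic and spawns no processes. `cmd_sketch` and `launch_batch` are hypothetical stand-ins for the driver's `struct command` and the `async_launch_commands`/`await_commands_to_finish` pair.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver's `struct command'.  */
struct cmd_sketch
{
  const char *prog;
};

/* Hypothetical stand-in for async_launch_commands () followed by
   await_commands_to_finish (): instead of spawning processes it only
   records how many commands were launched together.  */
static int
launch_batch (int n, const struct cmd_sketch *batch,
              int *batch_sizes, int *n_batches)
{
  (void) batch;                 /* A real driver would run these.  */
  batch_sizes[(*n_batches)++] = n;
  return 0;                     /* 0 means every command succeeded.  */
}

/* Same stride logic as the loop in execute (): one batch holding every
   command when parallel jobs are off, one command per batch when they
   are on.  Assumes batch_sizes has room for n_commands entries.  */
static int
run_commands (const struct cmd_sketch *commands, int n_commands,
              int parallel_jobs, int *batch_sizes, int *n_batches)
{
  int ret = 0;
  int n = parallel_jobs ? 1 : n_commands;
  const struct cmd_sketch *batch = commands;

  assert (n_commands > 0);
  *n_batches = 0;
  for (int i = 0; i < n_commands; i += n)
    {
      ret |= launch_batch (n, batch, batch_sizes, n_batches);
      batch += n;
    }
  return ret;
}
```

As in the patch, failures are OR-ed into `ret`, so one failing batch makes the whole call fail without stopping the remaining batches.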

* Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.
  2020-08-24 18:38       ` Giuliano Belinassi
@ 2020-08-25  7:03         ` Richard Biener
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Biener @ 2020-08-25  7:03 UTC
  To: Giuliano Belinassi; +Cc: Josh Triplett, GCC Patches

On Mon, Aug 24, 2020 at 8:39 PM Giuliano Belinassi via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi, Josh.
>
> On 08/24, Josh Triplett wrote:
> > On Sat, Aug 22, 2020 at 06:04:48PM -0300, Giuliano Belinassi wrote:
> > > Hi, Josh
> > >
> > > On 08/21, Josh Triplett wrote:
> > > > On Thu, Aug 20, 2020 at 07:00:13PM -0300, Giuliano Belinassi wrote:
> > > > > This patch series add a new flag "-fparallel-jobs=" to control if the
> > > > > compiler should try to compile the current file in parallel.
> > > > [...]
> > > > > Bootstrapped and Regtested on Linux x86_64.
> > > > >
> > > > > Giuliano Belinassi (6):
> > > > >   Modify gcc driver for parallel compilation
> > > > >   Implement a new partitioner for parallel compilation
> > > > >   Implement fork-based parallelism engine
> > > > >   Add `+' for Jobserver Integration
> > > > >   Add invoke documentation
> > > > >   New tests for parallel compilation feature
> > > >
> > > > Very nice!
> > >
> > > Thank you for your interest in this :)
> > >
> > > >
> > > > I'm interested in testing this on a highly parallel system. What
> > > > baseline do these patches apply to?  They don't seem to apply to GCC
> > > > trunk.
> > >
> > > Hmm, this was supposed to work on trunk out of the box. However,
> > > there is a high probability that I messed something up while rebasing.
> > > I will post a version 2 once I get more comments and fix the Makefile
> > > issue that Joseph pointed out in another e-mail.
> > >
> > > If you want to test it on a highly parallel system, I think it would be
> > > cool to also see how it behaves with --param=promote-statics=1, as it
> > > increases parallelism opportunities. :)
> >
> > I plan to try several variations, including that.
> >
> > I'd like to see how it affects the performance of Linux kernel builds.
>
> Well, I expect little to no impact on that.  I ran an experiment back
> in 2018 looking for parallelism bottlenecks in the kernel, and what I
> found was that the developers did a good job of balancing the file sizes.
>
> This was run on a machine with 4x AMD Opteron CPUs (64 cores in total):
> https://www.ime.usp.br/~belinass/64cores-kernel-experiment.svg
>
> As you can see from this image, the jobs end at almost the same time.
>
> >
> > > > Also, I tried to bootstrap the current tip of the devel/autopar_devel
> > > > branch, but ended up with compiler segfaults that all look like this:
> > > > ../../gcc/zlib/compress.c:86:1: internal compiler error: Segmentation fault
> > > >    86 | }
> > > >       | ^
> > >
> > > Well, there was once a bug in this branch when compiling with -flto that
> > > caused the assembler output file not to be initialized early enough,
> > > resulting in the LTO LGEN stage writing to an invalid FILE pointer.
> > > I fixed this during rebasing but forgot to push it to the autopar_devel
> > > branch. In any case, I have just pushed the recent changes to
> > > autopar_devel, which fix this issue.
> >
> > That might explain the problem; I had tried to build gcc with the
> > bootstrap-lto configuration.
> >
> > > In any case, -fparallel-jobs= should NOT be used together with -flto.
> > > Although I used part of the LTO engine to develop this feature,
> > > they are meant for distinct things. I guess I should emit a warning
> > > about that in the next version :)
> >
> > Interesting. Is that something that could change in the future? I'd like
> > to be able to get some parallelism when creating the object files, and
> > then more parallelism when doing the final LTO link.
>
> Well, if by "final LTO link" you mean LTO's Whole Program Analysis,
> that is quite a challenging task to parallelize :)
>
> As for "creating object files", if you mean LTO LGEN, I think it
> is not possible for now because, as far as I understand, LTO
> object files are just containers for an intermediate language and
> do not support partial linking.

It was designed to allow partially linked LTO IR files, but IIRC support
for this may have rotted a bit.  But since most of the compile time
for LTO LGEN is spent in the front ends parsing the code (and that's
incredibly hard if not impossible to parallelize), splitting this task
is not going to bring much improvement.

> However, I would not expect LGEN to bottleneck the compilation of any
> project. Most compilation time is spent in optimization, that is,
> IPA and intra-procedural passes.

Indeed.

Btw, you can "emulate" what -fparallel-jobs=N does via

> gcc -c t.c -o t.il.o -flto -fno-fat-lto-objects
> gcc -o t.o t.il.o -r -flinker-output=nolto-rel -flto=N

with the twist that the partitioning done by the LTO link step
might not be exactly the same as the one done by -fparallel-jobs
(surprisingly we needed a different partitioner).  What -fparallel-jobs
improves over the manual -flto way above is that it completely
elides LTO IR streaming but otherwise it operates in the same
manner.

There's a regression with -fparallel-jobs when you use -g which
we still need to address since with -fparallel-jobs you get
duplicate DWARF for most of the "early" source-level debug info.

I guess for the final report of the GSoC project it would be nice
to include the two-step -flto "parallelization" in the tables comparing
compile speed.  At least for gimple-match.o it provided a
reasonable speedup (wall-clock) as well.

> >
> > > Also, I just tested bootstrap with
> > >
> > > ../gcc/configure --disable-multilib --enable-languages=c,c++
> > >
> > > on x86_64 linux and it is working.
> >
> > I'd used --enable-multilib and --enable-languages=c,c++,lto. Would
> > that be expected to work?
>
> Yes. If it doesn't, that is a bug :)
>
> >
> > Thanks,
> > Josh

* Re: [PATCH 2/6] Implement a new partitioner for parallel compilation
  2020-08-20 22:00 ` [PATCH 2/6] Implement a new partitioner " Giuliano Belinassi
@ 2020-08-27 15:18   ` Jan Hubicka
  2020-08-27 21:42     ` Giuliano Belinassi
  2020-08-31  9:25   ` Richard Biener
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Hubicka @ 2020-08-27 15:18 UTC
  To: Giuliano Belinassi; +Cc: gcc-patches, richard.guenther

> When using the LTO infrastructure to compile files in parallel, we
> can't simply use any of the existing LTO partitioners, since extra
> dependency analysis is required to ensure that some nodes are
> correctly partitioned together.
> 
> Therefore, here we implement a new partitioner called
> "lto_merge_comdat_map" that performs all of the required analysis.
> The partitioner works as follows:
> 
> 1. We create a number of disjoint sets and insert each node into a
>    separate set, which may be merged together in the future.
> 
> 2. Find COMDAT groups, and mark them to be partitioned together.
> 
> 3. Check all nodes that would require any COMDAT group to be
>    copied to its partition (which we name "COMDAT frontier"),
>    and mark them to be partitioned together.
>    This avoids duplication of COMDAT groups and crashes in the LTO
>    partitioning infrastructure.

What kind of crash you get here?
> 
> 4. Check if the user allows the partitioner to promote non-public
>    functions or variables to global to improve parallelization
>    opportunities at the cost of modifying the output code layout.
> 
> 5. Balance generated partitions for performance unless told not to.
> 
> The choice of 1. was by design, so we could use a union-find
> data structure, which is known for being very fast on set union
> operations.
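For readers unfamiliar with the structure that step 1 relies on, here is a minimal union-find sketch in plain C with path compression and union by rank. It is an illustration only: the names `uf_sets`, `uf_init`, `uf_find`, `uf_unite` and `uf_free` are hypothetical and are not the `union_find` class added by the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical disjoint-set sketch; not the patch's union_find class.  */
typedef struct
{
  int *parent;   /* parent[i] == i means I is the root of its set.  */
  int *rank;     /* Upper bound on tree height, used by union by rank.  */
  int n;
} uf_sets;

/* Put each of the N elements into its own singleton set.
   Returns 1 on success, 0 if allocation fails.  */
static int
uf_init (uf_sets *uf, int n)
{
  uf->parent = malloc ((size_t) n * sizeof (int));
  uf->rank = calloc ((size_t) n, sizeof (int));
  uf->n = n;
  if (!uf->parent || !uf->rank)
    {
      free (uf->parent);
      free (uf->rank);
      return 0;
    }
  for (int i = 0; i < n; i++)
    uf->parent[i] = i;
  return 1;
}

/* Return the representative (root) of the set containing X.  */
static int
uf_find (uf_sets *uf, int x)
{
  assert (x >= 0 && x < uf->n);
  int root = x;
  while (uf->parent[root] != root)
    root = uf->parent[root];
  /* Path compression: make every node on the walked path point at ROOT.  */
  while (uf->parent[x] != root)
    {
      int next = uf->parent[x];
      uf->parent[x] = root;
      x = next;
    }
  return root;
}

/* Merge the sets containing A and B (no-op if already merged).  */
static void
uf_unite (uf_sets *uf, int a, int b)
{
  int ra = uf_find (uf, a), rb = uf_find (uf, b);
  if (ra == rb)
    return;
  /* Union by rank: hang the shallower tree below the deeper one.  */
  if (uf->rank[ra] < uf->rank[rb])
    {
      int t = ra;
      ra = rb;
      rb = t;
    }
  uf->parent[rb] = ra;
  if (uf->rank[ra] == uf->rank[rb])
    uf->rank[ra]++;
}

static void
uf_free (uf_sets *uf)
{
  free (uf->parent);
  free (uf->rank);
}
```

With both heuristics, a sequence of finds and unites runs in near-constant amortized time per operation, which is why merging COMDAT groups, frontiers and statics one edge at a time stays cheap.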

In the LTO partitioner the groups of objects that "must go together"
are discovered when the first object is placed into the partition (via
add_to_partition), because with the LTO rules it is always possible to
discover all members of the group starting from any random element via
graph walking.

I guess it is the same with your partitioner?  I basically wonder how much
code can be shared and what needs to be duplicated.
It is not very nice to have partitioning implemented twice; it is a bit of
a subtle problem when it comes to details, so I would be happier if we
brought lto/lto-partition.c into the middle end and updated/cleaned it
up as needed.
> 
> For 3. to work properly, we also had to modify
> lto_promote_cross_file_statics to handle this case.
> 
> The parameters --param=promote-statics and --param=balance-partitions
> control 4. and 5., respectively.
> 
> gcc/ChangeLog:
> 2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> 
> 	* Makefile.in: Add lto-partition.o
> 	* cgraph.h (struct symtab_node::aux2): New variable.
> 	* lto-partition.c: Move from gcc/lto/lto-partition.c
> 	(add_symbol_to_partition_1): Only compute insn size
> 	if information is available.
> 	(node_cmp): Same as above.
> 	(class union_find): New.
> 	(ds_print_roots): New function.
> 	(balance_partitions): New function.
> 	(build_ltrans_partitions): New function.
> 	(merge_comdat_nodes): New function.
> 	(merge_static_calls): New function.
> 	(merge_contained_symbols): New function.
> 	(lto_merge_comdat_map): New function.
> 	(privatize_symbol_name_1): Handle when WPA is not enabled.
> 	(privatize_symbol_name): Same as above.
> 	(lto_promote_cross_file_statics): New parameter to select when
> 	to promote to global.
> 	(lto_check_usage_from_other_partitions): New function.
> 	* lto-partition.h: Move from gcc/lto/lto-partition.h
> 	(lto_promote_cross_file_statics): Update prototype.
> 	(lto_check_usage_from_other_partitions): Declare.
> 	(lto_merge_comdat_map): Declare.
> 
> gcc/lto/ChangeLog:
> 2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> 
> 	* lto-partition.c: Move to gcc/lto-partition.c.
> 	* lto-partition.h: Move to gcc/lto-partition.h.
> 	* lto.c: Update call to lto_promote_cross_file_statics.
> 	* Makefile.in: Remove lto-partition.o.
> ---
>  gcc/Makefile.in               |   1 +
>  gcc/cgraph.h                  |   1 +
>  gcc/{lto => }/lto-partition.c | 463 +++++++++++++++++++++++++++++++++-
>  gcc/{lto => }/lto-partition.h |   4 +-
>  gcc/lto/Make-lang.in          |   4 +-
>  gcc/lto/lto.c                 |   2 +-
>  gcc/params.opt                |   8 +
>  gcc/tree.c                    |  23 +-
>  8 files changed, 489 insertions(+), 17 deletions(-)
>  rename gcc/{lto => }/lto-partition.c (78%)
>  rename gcc/{lto => }/lto-partition.h (89%)
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 79e854aa938..be42b15f4ff 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1457,6 +1457,7 @@ OBJS = \
>  	lra-spills.o \
>  	lto-cgraph.o \
>  	lto-streamer.o \
> +	lto-partition.o \
>  	lto-streamer-in.o \
>  	lto-streamer-out.o \
>  	lto-section-in.o \
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 0211f08964f..b4a7871bd3d 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -615,6 +615,7 @@ public:
>    struct lto_file_decl_data * lto_file_data;
>  
>    PTR GTY ((skip)) aux;
> +  int aux2;

We do not really want to add more pass-specific data to the symbol table
(since it is critical wrt memory use during the WPA stage).
It is possible to attach extra info using symbol-summary.h.
>  
>    /* Comdat group the symbol is in.  Can be private if GGC allowed that.  */
>    tree x_comdat_group;
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto-partition.c
> similarity index 78%
> rename from gcc/lto/lto-partition.c
> rename to gcc/lto-partition.c
> index 8e0488ab13e..ca962e69b5d 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto-partition.c
> @@ -170,7 +170,11 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
>      {
>        struct cgraph_edge *e;
>        if (!node->alias && c == SYMBOL_PARTITION)
> -	part->insns += ipa_size_summaries->get (cnode)->size;
> +	{
> +	  /* FIXME: Find out why this is being returned NULL in some cases.  */
> +	  if (ipa_size_summaries->get (cnode))
> +	    part->insns += ipa_size_summaries->get (cnode)->size;

It returns NULL when the symbol summary is not available.  Either the symbol
is not defined (in which case it should not be SYMBOL_PARTITION) or the
summary was not computed (probably because of -O0?).
> +/* Quickly balance partitions, trying to reach target_size in each of
> +   them.  Returns true if something was done, or false if we decided
> +   that it is not worth.  */
> +
> +static bool
> +balance_partitions (union_find *ds, int n, int jobs)

Generally, function parameters should be documented, so I would learn what N means ;)
> +{
> +  int *sizes, i, j;
> +  int total_size = 0, max_size = -1;
> +  int target_size;
> +  const int eps = 0;
> +
> +  symtab_node *node;
> +
> +  sizes = (int *) alloca (n * sizeof (*sizes));
> +  memset (sizes, 0, n * sizeof (*sizes));

Also, we avoid using alloca for arrays that can grow beyond the stack limit.
I assume this has the size of the symbol table, which means that you want to
use the vec.h API and allocate on the heap.
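Inside GCC the suggestion above would use vec.h; as a plain-C illustration of the same idea, the sketch below uses calloc instead, sized by the symbol count, and also accumulates into 64-bit counters. The helper `compute_root_sizes` and its flat `root_of`/`symbol_size` arrays are hypothetical stand-ins for the symbol-table walk in balance_partitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: add each symbol's size to the slot of its set
   representative.  ROOT_OF[i] is the union-find root of symbol I and
   SYMBOL_SIZE[i] its estimated size.  The array has one slot per
   symbol, so it is heap-allocated rather than taken from alloca.
   Returns NULL on allocation failure; the caller must free the result.  */
static int64_t *
compute_root_sizes (const int *root_of, const int *symbol_size, int n)
{
  int64_t *sizes = calloc ((size_t) n, sizeof (*sizes));
  if (!sizes)
    return NULL;
  for (int i = 0; i < n; i++)
    {
      assert (root_of[i] >= 0 && root_of[i] < n);
      sizes[root_of[i]] += symbol_size[i];
    }
  return sizes;
}
```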
> + 
> +  /* Compute costs.  */
> +  i = 0;
> +  FOR_EACH_SYMBOL (node)
> +    {
> +      int root = ds->find (i);
> +
> +      if (cgraph_node *cnode = dyn_cast<cgraph_node *> (node))
> +	{
> +	  ipa_size_summary *summary = ipa_size_summaries->get (cnode);
> +	  if (summary)
> +	    sizes[root] += summary->size;
> +	  else
> +	    sizes[root] += 10;
> +	}
> +      else
> +	sizes[root] += 10;

Do you have testcases where the summary is missing?
> +
> +
> +      i++;
> +    }
> +
> +  /* Compute the total size and maximum size.  */
> +  for (i = 0; i < n; ++i)
> +    {
> +      total_size += sizes[i];
> +      max_size    = MAX (max_size, sizes[i]);

Also, I think we should start using 64-bit values for total sizes of
units.  In some extreme cases they already get close to overflow.
> +    }
> +
> +  /* Quick return if total size is small.  */
> +  if (total_size < param_min_partition_size)
> +    return false;
> +
> +  target_size = total_size / (jobs + 1);
> +
> +  /* Unite small partitions.  */
> +  for (i = 0, j = 0; j < n; ++j)
> +    {
> +      if (sizes[j] == 0)
> +	continue;
> +
> +      if (i == -1)
> +	i = j;
> +      else
> +	{
> +	  if (sizes[i] + sizes[j] < target_size + eps)
> +	    {
> +	      ds->unite (i, j);
> +	      sizes[i] += sizes[j];
> +	      sizes[j] = 0;
> +	    }
> +	  else
> +	      i = j;
> +	}
> +    }
> +  return true;

Note that partitioning is not free, since a reference to a static var or a
call from one partition to another will lead to more expensive
relocations/instructions being used (since gas will not be able to relax
it to IP-relative addressing where available).

For that reason the LTO partitioner has logic tracking the boundary and
minimizing it.  I think we should merge them and also clean up
lto/lto-partition.c; the partitioner was a late-night experiment that
I intended as a first proof of concept to be rewritten later, but we
never got to implementing anything much smarter.  On the other hand, we
added a lot of extra hacks to it (order preserving and other things), so
it deserves some TLC.

Also, I think you unite partitions in FOR_EACH_* order, which is not
really meaningful; the code layout is controlled by node->order values.
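A minimal sketch of the "unite small partitions" step with the two review points applied: sizes are 64-bit, and sets are visited in layout order (for example, sorted by the smallest node->order they contain) instead of FOR_EACH_SYMBOL order. Everything here is hypothetical: `greedy_merge`, `roots_by_order` and `merge_into` are illustrative names, and the caller is assumed to compute `target_size` (the quoted patch uses total_size / (jobs + 1)). Note also that the quoted loop initializes `i = 0`, so its `i == -1` branch appears never to be taken. This sketch starts from an explicit "no current partition" state instead.

```c
#include <assert.h>
#include <stdint.h>

/* Greedily fold consecutive sets (in ROOTS_BY_ORDER order) into the
   current partition while the result stays strictly below TARGET_SIZE.
   SIZES is indexed by root.  MERGE_INTO[r] receives the root that set R
   was merged into (R itself if it starts a partition).  Returns the
   number of resulting partitions.  */
static int
greedy_merge (const int *roots_by_order, int n_roots, int64_t *sizes,
              int64_t target_size, int *merge_into)
{
  int current = -1;   /* Root of the partition being filled, or -1.  */
  int n_partitions = 0;

  for (int k = 0; k < n_roots; k++)
    {
      int r = roots_by_order[k];
      assert (sizes[r] > 0);
      if (current >= 0 && sizes[current] + sizes[r] < target_size)
        {
          /* Still below target: fold R into the current partition.  */
          sizes[current] += sizes[r];
          sizes[r] = 0;
          merge_into[r] = current;
        }
      else
        {
          /* Start a new partition at R.  */
          current = r;
          merge_into[r] = r;
          n_partitions++;
        }
    }
  return n_partitions;
}
```

Because only neighbours in layout order are merged, symbols that are adjacent in the final output tend to stay in the same partition.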

> +}
> +
> +/* Builds the LTRANS partitions, or return if not needed.  */
> +
> +static int
> +build_ltrans_partitions (union_find *ds, int n)
> +{
> +  int i, n_partitions;
> +  symtab_node *node;
> +
> +  int *compression = (int *) alloca (n * sizeof (*compression));
> +  for (i = 0; i < n; ++i)
> +    compression[i] = -1; /* Invalid value.  */
> +
> +  i = 0, n_partitions = 0;
> +  FOR_EACH_SYMBOL (node)
> +    {
> +      int root = ds->find (i);
> +      node->aux2 = root;
> +      node->aux = NULL;
> +
> +      if (node->get_partitioning_class () == SYMBOL_PARTITION
> +	  && compression[root] < 0)
> +	compression[root] = n_partitions++;
> +      i++;
> +    }

What exactly is compression used for?
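Reading the quoted build_ltrans_partitions, `compression` appears to renumber union-find roots (arbitrary indices in [0, n)) into dense partition numbers [0, k) in first-seen order, so that each surviving set can index `ltrans_partitions` directly. A standalone sketch of that interpretation follows; `compress_roots` is a hypothetical name, and `is_partitioned[i]` stands in for `get_partitioning_class () == SYMBOL_PARTITION`.

```c
#include <assert.h>

/* Map each root that owns at least one partitioned symbol to a dense
   index 0, 1, 2, ... in first-seen order.  COMPRESSION must have N
   slots; roots with no partitioned symbol keep the value -1.
   Returns the number of partitions K.  */
static int
compress_roots (const int *root_of, const int *is_partitioned, int n,
                int *compression)
{
  int k = 0;
  for (int i = 0; i < n; i++)
    compression[i] = -1;        /* -1: no partition assigned yet.  */
  for (int i = 0; i < n; i++)
    {
      int root = root_of[i];
      assert (root >= 0 && root < n);
      if (is_partitioned[i] && compression[root] < 0)
        compression[root] = k++;
    }
  return k;
}
```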
> +
> +  if (dump_file)
> +    fprintf (dump_file, "n_partitions = %d\n", n_partitions);
> +
> +  if (n_partitions <= 1)
> +    return false;
> +
> +  /* Create LTRANS partitions.  */
> +  ltrans_partitions.create (n_partitions);
> +  for (i = 0; i < n_partitions; i++)
> +    new_partition ("");
> +
> +  FOR_EACH_SYMBOL (node)
> +    {
> +      if (node->get_partitioning_class () != SYMBOL_PARTITION
> +	  || symbol_partitioned_p (node))
> +	  continue;
> +
> +      int p = compression[node->aux2];
> +      if (dump_file)
> +	fprintf (dump_file, "p = %d\t;; %s\n", p, node->dump_name ());
> +      add_symbol_to_partition (ltrans_partitions[p], node);
> +    }
> +
> +  return true;
> +}
> +
> +/* Partition COMDAT groups together, and also bring together nodes that
> +   requires them. Such nodes that are not in the COMDAT group that have
> +   references to COMDAT grouped nodes are called the COMDAT frontier.  */
> +
> +static bool
> +merge_comdat_nodes (symtab_node *node, int set)
> +{
> +  enum symbol_partitioning_class c = node->get_partitioning_class ();
> +  bool ret = false;
> +  symtab_node *node1;
> +  cgraph_edge *e;
> +
> +  /* If node is already analysed, quickly return.  */
> +  if (node->aux)
> +    return false;
> +
> +  /* Mark as analysed.  */
> +  node->aux = (void *) 1;
> +
> +
> +  /* Aglomerate the COMDAT group into the same partition.  */
> +  if (node->same_comdat_group)
> +    {
> +      for (node1 = node->same_comdat_group;
> +	   node1 != node; node1 = node1->same_comdat_group)
> +	if (!node->alias)
> +	  {
> +	    ds->unite (node1->aux2, set);
> +	    merge_comdat_nodes (node1, set);
> +	    ret = true;
> +	  }
> +    }
> +
> +  /* Look at nodes that can reach the COMDAT group, and aglomerate to the
> +     same partition.  These nodes are called the "COMDAT Frontier".  The
> +     idea is that every unpartitioned node that reaches a COMDAT group MUST
> +     go through the COMDAT frontier before reaching it.  Therefore, only
> +     nodes in the frontier are exported.  */
> +  if (node->same_comdat_group || c == SYMBOL_DUPLICATE)
> +    {
> +      int i;
> +      struct ipa_ref *ref = NULL;
> +
> +      if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
> +	{
> +	  /* Add all inline clones and callees that are duplicated.  */
> +	  for (e = cnode->callers; e; e = e->next_caller)
> +	    if (!e->inline_failed || c == SYMBOL_DUPLICATE)
> +	      {
> +		ds->unite (set, e->caller->aux2);
> +		merge_comdat_nodes (e->caller, set);
> +		ret = true;
> +	      }
> +
> +	  /* Add all thunks associated with the function.  */
> +	  for (e = cnode->callees; e; e = e->next_callee)
> +	    if (e->caller->thunk.thunk_p && !e->caller->inlined_to)
> +	      {
> +		ds->unite (set, e->callee->aux2);
> +		merge_comdat_nodes (e->callee, set);
> +		ret = true;
> +	      }
> +	}

I do not think it is strictly necessary to merge a comdat function with
all its users.  If the comdat is some easy accessor, this may prevent a lot
of merging.

All you need to do is to place it into one of the partitions and set the
"used" flag on the symbols used in other partitions, so it does not get
optimized away when unused.  Other partitions should refer to it w/o
problems.

I am not sure what exactly the goal is here about not changing code
layout.

> +/* Partition the program into several partitions with a restriction that
> +   COMDATS are partitioned together with all nodes requiring them.  If
> +   promote_statics is false, we also partition together static functions
> +   and nodes that call eachother, so non-public functions are not promoted
> +   to globals.  */
> +
> +void
> +lto_merge_comdat_map (bool balance, bool promote_statics, int jobs)

BTW, I think it is an odd name for a partitioner.  Comdat is just one of the
problems it deals with.  But I would still like to see this merged with
the LTO partitioning logic, so we could also use all kinds of
partitioners in both modes.

It all looks quite nice, but let's work on avoiding the code duplication
here...

Honza


* Re: [PATCH 3/6] Implement fork-based parallelism engine
  2020-08-20 22:00 ` [PATCH 3/6] Implement fork-based parallelism engine Giuliano Belinassi
@ 2020-08-27 15:25   ` Jan Hubicka
  2020-08-27 15:37   ` Jan Hubicka
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Hubicka @ 2020-08-27 15:25 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: gcc-patches, richard.guenther

> We also implemented a GNU Make Jobserver integration to this mechanism,
> as implemented in jobserver.cc. This works as follows:
> diff --git a/gcc/jobserver.cc b/gcc/jobserver.cc
> new file mode 100644
> index 00000000000..8cb374de86e
> --- /dev/null
> +++ b/gcc/jobserver.cc

I wonder if this can go in separately and be used first to throttle down
the number of streaming processes in WPA?

See the TODO at the beginning of do_whole_program_analysis and
the logic in stream_out_partitions.  Adding your API to take tokens for
every new partition being streamed (with the exception of the first one) is
probably very easy.

Note that there is also logic for detecting the jobserver in toplev.c that
may be merged with your logic.
In the longer run I think your jobserver.cc fits better in libiberty, but we
could keep it GCC-only until other tools want to integrate it.
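The token scheme suggested above maps onto the pipe-based GNU Make jobserver protocol: every process already owns one implicit job slot, each additional concurrent job needs one byte read from the jobserver's read end, and that same byte must be written back when the job finishes. That is why the first streamed partition would need no token. The sketch below is an assumption-laden illustration, not the patch's jobserver.cc API: `jobserver_acquire` and `jobserver_release` are hypothetical, and parsing the descriptors out of MAKEFLAGS (`--jobserver-auth=R,W`) is left out.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Block until a token can be read from READ_FD.  Returns the token
   byte (0..255), or -1 on error or end of file.  */
static int
jobserver_acquire (int read_fd)
{
  char token;
  for (;;)
    {
      ssize_t r = read (read_fd, &token, 1);
      if (r == 1)
        return (unsigned char) token;
      if (r < 0 && errno == EINTR)
        continue;               /* Interrupted by a signal: retry.  */
      return -1;
    }
}

/* Write a previously acquired TOKEN back to WRITE_FD.
   Returns 0 on success, -1 on failure.  */
static int
jobserver_release (int write_fd, int token)
{
  assert (token >= 0 && token <= 255);
  char c = (char) token;
  for (;;)
    {
      ssize_t w = write (write_fd, &c, 1);
      if (w == 1)
        return 0;
      if (w < 0 && errno == EINTR)
        continue;
      return -1;
    }
}
```

A streaming loop would call `jobserver_acquire` before starting every partition after the first and `jobserver_release` once that partition's child has finished.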

Honza


* Re: [PATCH 3/6] Implement fork-based parallelism engine
  2020-08-20 22:00 ` [PATCH 3/6] Implement fork-based parallelism engine Giuliano Belinassi
  2020-08-27 15:25   ` Jan Hubicka
@ 2020-08-27 15:37   ` Jan Hubicka
  2020-08-27 18:27     ` Giuliano Belinassi
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Hubicka @ 2020-08-27 15:37 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: gcc-patches, richard.guenther

> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index c0b45795059..22405098dc5 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -226,6 +226,22 @@ cgraph_node::delete_function_version_by_decl (tree decl)
>    decl_node->remove ();
>  }
>  
> +/* Release function dominator info if present.  */
> +
> +void
> +cgraph_node::maybe_release_dominators (void)
> +{
> +  struct function *fun = DECL_STRUCT_FUNCTION (decl);
> +
> +  if (fun && fun->cfg)
> +    {
> +      if (dom_info_available_p (fun, CDI_DOMINATORS))
> +	free_dominance_info (fun, CDI_DOMINATORS);
> +      if (dom_info_available_p (fun, CDI_POST_DOMINATORS))
> +	free_dominance_info (fun, CDI_POST_DOMINATORS);
> +    }
> +}

I am not sure if that needs to be a member function, but if so we want to
merge it with other places in cgraph.c and cgraphunit.c where dominators
are freed.  I do not think you need to check availability.
> +
>  /* Record that DECL1 and DECL2 are semantically identical function
>     versions.  */
>  void
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index b4a7871bd3d..72ac19f9672 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -463,6 +463,15 @@ public:
>       Return NULL if there's no such node.  */
>    static symtab_node *get_for_asmname (const_tree asmname);
>  
> +  /* Get symtab node by order.  */
> +  static symtab_node *find_by_order (int order);

This is quadratic and moreover seems unused. Why do you add it?
> +
> +  /* Get symtab_node by its name.  */
> +  static symtab_node *find_by_name (const char *);

Similarly here; note that names are not really meaningful as lookup
keys, since they get duplicated.
> +
> +  /* Get symtab_node by its ASM name.  */
> +  static symtab_node *find_by_asm_name (const char *);

For this we have get_for_asmname (which also populates the asm name hash
as needed and is not quadratic).
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index d10d635e942..73e4bed3b61 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -2258,6 +2258,11 @@ cgraph_node::expand (void)
>  {
>    location_t saved_loc;
>  
> +  /* FIXME: Find out why body-removed nodes are marked for output.  */
> +  if (body_removed)
> +    return;

Indeed, we should know :)
> +
> +
>    /* We ought to not compile any inline clones.  */
>    gcc_assert (!inlined_to);
>  
> @@ -2658,6 +2663,7 @@ ipa_passes (void)
>  
>        execute_ipa_summary_passes
>  	((ipa_opt_pass_d *) passes->all_regular_ipa_passes);
> +
This seems accidental.
>      }
>  
>    /* Some targets need to handle LTO assembler output specially.  */
> @@ -2687,10 +2693,17 @@ ipa_passes (void)
>    if (flag_generate_lto || flag_generate_offload)
>      targetm.asm_out.lto_end ();
>  
> -  if (!flag_ltrans
> +  if (split_outputs)
> +    flag_ltrans = true;
> +
> +  if ((!flag_ltrans || split_outputs)
>        && ((in_lto_p && flag_incremental_link != INCREMENTAL_LINK_LTO)
>  	  || !flag_lto || flag_fat_lto_objects))
>      execute_ipa_pass_list (passes->all_regular_ipa_passes);
> +
> +  if (split_outputs)
> +    flag_ltrans = false;
> +
>    invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_END, NULL);
>  
>    bitmap_obstack_release (NULL);
> @@ -2742,6 +2755,185 @@ symbol_table::output_weakrefs (void)
>        }
>  }
>  
> +static bool is_number (const char *str)
> +{
> +  while (*str != '\0')
> +    switch (*str++)
> +      {
> +	case '0':
> +	case '1':
> +	case '2':
> +	case '3':
> +	case '4':
> +	case '5':
> +	case '6':
> +	case '7':
> +	case '8':
> +	case '9':
> +	  continue;
> +	default:
> +	  return false;
> +      }
> +
> +  return true;
> +}

This looks odd; we have other places where we parse numbers from the
command line :)
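As an illustration of the point above, the hand-written `is_number` followed by `atoi` could be replaced by a single `strtol` call that also rejects empty strings, trailing garbage and out-of-range values (the quoted `is_number` accepts the empty string, and `atoi` has no overflow check). `parse_job_count` is a hypothetical name; inside GCC the existing command-line number helpers would be the natural choice instead.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse ARG as a non-negative decimal job count.  Returns the value, or
   -1 if ARG is empty, has a sign or trailing characters, or does not
   fit in an int.  */
static int
parse_job_count (const char *arg)
{
  char *end;
  long val;

  assert (arg != NULL);
  if (*arg < '0' || *arg > '9')
    return -1;                  /* Rejects "", "-3", " 4", "auto".  */
  errno = 0;
  val = strtol (arg, &end, 10);
  if (errno == ERANGE || *end != '\0' || val > INT_MAX)
    return -1;
  return (int) val;
}
```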
> +
> +/* If forked, which child am I?  */
> +
> +static int childno = -1;
> +
> +static bool
> +maybe_compile_in_parallel (void)
> +{
> +  struct symtab_node *node;
> +  int partitions, i, j;
> +  int *pids;
> +
> +  bool promote_statics = param_promote_statics;
> +  bool balance = param_balance_partitions;
> +  bool jobserver = false;
> +  bool job_auto = false;
> +  int num_jobs = -1;
> +
> +  if (!flag_parallel_jobs || !split_outputs)
> +    return false;
> +
> +  if (!strcmp (flag_parallel_jobs, "auto"))
> +    {
> +      jobserver = jobserver_initialize ();
> +      job_auto = true;
> +    }
> +  else if (!strcmp (flag_parallel_jobs, "jobserver"))
> +    jobserver = jobserver_initialize ();
> +  else if (is_number (flag_parallel_jobs))
> +    num_jobs = atoi (flag_parallel_jobs);
> +  else
> +    gcc_unreachable ();
> +
> +  if (job_auto && !jobserver)
> +    {
> +      num_jobs = sysconf (_SC_NPROCESSORS_CONF);
> +      if (num_jobs > 2)
> +	num_jobs = 2;
> +    }
> +
> +  if (num_jobs < 0 && !jobserver)
> +    {
> +      inform (UNKNOWN_LOCATION,
> +	      "-fparallel-jobs=jobserver, but no GNU Jobserver found");
> +      return false;
> +    }
> +
> +  if (jobserver)
> +    num_jobs = 2;
> +
> +  if (num_jobs == 0)
> +    {
> +      inform (UNKNOWN_LOCATION, "-fparallel-jobs=0 makes no sense");
> +      return false;
> +    }
> +
> +  /* Trick the compiler to think that we are in WPA.  */
> +  flag_wpa = "";
> +  symtab_node::checking_verify_symtab_nodes ();

Messing with WPA/ltrans flags is not a good idea.  You already have
split_outputs for the parallel build.  I sort of see why you set ltrans,
but why WPA?
> +
> +  /* Partition the program so that COMDATs get mapped to the same
> +     partition.  If promote_statics is true, it also maps statics
> +     to the same partition.  If balance is true, try to balance the
> +     partitions for compilation performance.  */
> +  lto_merge_comdat_map (balance, promote_statics, num_jobs);
> +
> +  /* AUX pointers are used by partitioning code to bookkeep number of
> +     partitions symbol is in.  This is no longer needed.  */
> +  FOR_EACH_SYMBOL (node)
> +    node->aux = NULL;
> +
> +  /* We decided that partitioning is a bad idea.  In this case, just
> +     proceed with the default compilation method.  */
> +  if (ltrans_partitions.length () <= 1)
> +    {
> +      flag_wpa = NULL;
> +      jobserver_finalize ();
> +      return false;
> +    }
> +
> +  /* Find out statics that need to be promoted
> +     to globals with hidden visibility because they are accessed from
> +     multiple partitions.  */
> +  lto_promote_cross_file_statics (promote_statics);
> +
> +  /* Check if we have variables being referenced across partitions.  */
> +  lto_check_usage_from_other_partitions ();
> +
> +  /* Trick the compiler to think we are not in WPA anymore.  */
> +  flag_wpa = NULL;
> +
> +  partitions = ltrans_partitions.length ();
> +  pids = XALLOCAVEC (pid_t, partitions);
> +
> +  /* There is no point in launching more jobs than we have partitions.  */
> +  if (num_jobs > partitions)
> +    num_jobs = partitions;
> +
> +  /* Trick the compiler to think we are in LTRANS mode.  */
> +  flag_ltrans = true;
> +
> +  init_additional_asm_names_file ();
> +
> +  /* Flush asm file, so we don't get repeated output as we fork.  */
> +  fflush (asm_out_file);
> +
> +  /* Insert a token for child to consume.  */
> +  if (jobserver)
> +    {
> +      num_jobs = partitions;
> +      jobserver_return_token ('p');
> +    }
> +
> +  /* Spawn processes.  Spawn as soon as there is a free slot.  */
> +  for (j = 0, i = -num_jobs; i < partitions; i++, j++)
> +    {
> +      if (i >= 0)
> +	{
> +	  int wstatus, ret;
> +	  ret = waitpid (pids[i], &wstatus, 0);
> +
> +	  if (ret < 0)
> +	    internal_error ("Unable to wait child %d to finish", i);
> +	  else if (WIFEXITED (wstatus))
> +	    {
> +	      if (WEXITSTATUS (wstatus) != 0)
> +		error ("Child %d exited with error", i);
> +	    }
> +	  else if (WIFSIGNALED (wstatus))
> +	    error ("Child %d aborted with error", i);
> +	}
> +
> +      if (j < partitions)
> +	{
> +	  gcc_assert (ltrans_partitions[j]->symbols > 0);
> +
> +	  if (jobserver)
> +	    jobserver_get_token ();
> +
> +	  pids[j] = fork ();
> +	  if (pids[j] == 0)
> +	    {
> +	      childno = j;
> +	      lto_apply_partition_mask (ltrans_partitions[j]);
> +	      return true;
> +	    }
> +	}
> +    }
> +
> +  /* Get the token which parent inserted for the childs, which they returned by
> +     now.  */
> +  if (jobserver)
> +    jobserver_get_token ();
> +  exit (0);
> +}
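The spawn loop in maybe_compile_in_parallel keeps at most num_jobs children alive, but it waits for them in spawn order, so a finished child's slot is not reused until every earlier child has also exited. The sketch below shows the same bounded-fork pattern using `waitpid (-1, ...)`, so whichever child finishes first frees its slot. It is a generic illustration, not a drop-in replacement: `run_bounded` and `demo_work` are hypothetical, and it assumes the caller has no other child processes. Children leave through `_exit` so the parent's unflushed stdio buffers are not written twice.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run WORK (0) .. WORK (N_TASKS - 1), each in its own child process,
   with at most MAX_JOBS children alive at once.  WORK's return value
   is the child's exit status.  Returns the number of failed children,
   or -1 if fork or waitpid fails.  */
static int
run_bounded (int n_tasks, int max_jobs, int (*work) (int))
{
  int running = 0, failures = 0, next = 0, spawn_failed = 0;

  assert (max_jobs > 0);
  while (next < n_tasks || running > 0)
    {
      if (next < n_tasks && running < max_jobs)
        {
          pid_t pid = fork ();
          if (pid < 0)
            {
              /* Cannot spawn more: stop launching, but reap the rest.  */
              spawn_failed = 1;
              n_tasks = next;
              continue;
            }
          if (pid == 0)
            _exit (work (next));        /* Child: one task, then exit.  */
          running++;
          next++;
          continue;
        }

      /* All slots busy or nothing left to spawn: reap any child.  */
      int status;
      pid_t done = waitpid (-1, &status, 0);
      if (done < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      running--;
      if (!WIFEXITED (status) || WEXITSTATUS (status) != 0)
        failures++;
    }
  return spawn_failed ? -1 : failures;
}

/* Hypothetical task for the usage example: task 2 fails.  */
static int
demo_work (int task)
{
  return task == 2 ? 1 : 0;
}
```

For example, `run_bounded (5, 2, demo_work)` runs five tasks, two at a time, and reports one failure.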
> +
> +
>  /* Perform simple optimizations based on callgraph.  */
>  
>  void
> @@ -2768,6 +2960,7 @@ symbol_table::compile (void)
>    {
>      timevar_start (TV_CGRAPH_IPA_PASSES);
>      ipa_passes ();
> +    maybe_compile_in_parallel ();
>      timevar_stop (TV_CGRAPH_IPA_PASSES);
>    }
>    /* Do nothing else if any IPA pass found errors or if we are just streaming LTO.  */
> @@ -2790,6 +2983,9 @@ symbol_table::compile (void)
>    timevar_pop (TV_CGRAPHOPT);
>  
>    /* Output everything.  */
> +  if (split_outputs)
> +    handle_additional_asm (childno);
What is this doing?
> +
>    switch_to_section (text_section);
>    (*debug_hooks->assembly_start) ();
>    if (!quiet_flag)
> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
> index 2cfab40156e..bc500df4853 100644
> --- a/gcc/ipa-fnsummary.c
> +++ b/gcc/ipa-fnsummary.c
> @@ -4610,7 +4610,7 @@ public:
>        gcc_assert (n == 0);
>        small_p = param;
>      }
> -  virtual bool gate (function *) { return true; }
> +  virtual bool gate (function *) { return !(flag_ltrans && split_outputs); }
>    virtual unsigned int execute (function *)
>      {
>        ipa_free_fn_summary ();
> diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
> index 069de9d82fb..6a5657c7507 100644
> --- a/gcc/ipa-icf.c
> +++ b/gcc/ipa-icf.c
> @@ -2345,7 +2345,8 @@ sem_item_optimizer::filter_removed_items (void)
>          {
>  	  cgraph_node *cnode = static_cast <sem_function *>(item)->get_node ();
>  
> -	  if (in_lto_p && (cnode->alias || cnode->body_removed))
> +	  if ((in_lto_p || split_outputs)
> +	      && (cnode->alias || cnode->body_removed))


And I wonder why you need these. IPA passes are run before we split,
right?
>  	    remove_item (item);
>  	  else
>  	    filtered.safe_push (item);
> diff --git a/gcc/ipa-visibility.c b/gcc/ipa-visibility.c
> index 7c854f471e8..4d9e11482d3 100644
> --- a/gcc/ipa-visibility.c
> +++ b/gcc/ipa-visibility.c
> @@ -540,7 +540,8 @@ optimize_weakref (symtab_node *node)
>  static void
>  localize_node (bool whole_program, symtab_node *node)
>  {
> -  gcc_assert (whole_program || in_lto_p || !TREE_PUBLIC (node->decl));
> +  gcc_assert (split_outputs || whole_program || in_lto_p
> +	      || !TREE_PUBLIC (node->decl));
>  
>    /* It is possible that one comdat group contains both hidden and non-hidden
>       symbols.  In this case we can privatize all hidden symbol but we need
> diff --git a/gcc/ipa.c b/gcc/ipa.c
> index 288b58cf73d..b397ea2fed8 100644
> --- a/gcc/ipa.c
> +++ b/gcc/ipa.c
> @@ -350,7 +350,7 @@ symbol_table::remove_unreachable_nodes (FILE *file)
>  
>    /* Mark variables that are obviously needed.  */
>    FOR_EACH_DEFINED_VARIABLE (vnode)
> -    if (!vnode->can_remove_if_no_refs_p()
> +    if (!vnode->can_remove_if_no_refs_p ()
>  	&& !vnode->in_other_partition)
>        {
>  	reachable.add (vnode);
> @@ -564,7 +564,7 @@ symbol_table::remove_unreachable_nodes (FILE *file)
>  	}
>        else
>  	gcc_assert (node->clone_of || !node->has_gimple_body_p ()
> -		    || in_lto_p || DECL_RESULT (node->decl));
> +		    || in_lto_p || split_outputs || DECL_RESULT (node->decl));
>      }
>  
>    /* Inline clones might be kept around so their materializing allows further
> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> index 93a99f3465b..12be8546d9c 100644
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "omp-offload.h"
>  #include "stringpool.h"
>  #include "attribs.h"
> +#include "lto-partition.h"
>  
>  /* True when asm nodes has been output.  */
>  bool asm_nodes_output = false;
> @@ -2065,3 +2066,174 @@ input_cgraph_opt_summary (vec<symtab_node *> nodes)
>  	input_cgraph_opt_section (file_data, data, len, nodes);
>      }
>  }
> +
> +/* When analysing function for removal, we have mainly three states, as
> +   defined below.  */
> +
> +enum node_partition_state
> +{
> +  CAN_REMOVE,		/* This node can be removed, or is still to be
> +			   analysed.  */
> +  IN_CURRENT_PARTITION, /* This node is in current partition and should not be
> +			   touched.  */
> +  IN_BOUNDARY,		/* This node is in boundary, therefore being in other
> +			   partition or is a external symbol, and its body can
> +			   be released.  */
> +  IN_BOUNDARY_KEEP_BODY /* This symbol is in other partition but we may need its
> +			   body for inlining, for instance.  */
> +};
> +
> +/* Handle node that are in the LTRANS boundary, releasing its body and
> +   other informations if necessary.  */
> +
> +static void
> +handle_node_in_boundary (symtab_node *node, bool keep_body)
> +{
> +  if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
> +    {
> +      if (cnode->inlined_to && cnode->inlined_to->aux2 != IN_CURRENT_PARTITION)
> +	{
> +	  /* If marked to be inlined into a node not in current partition,
> +	     then undo the inline.  */
> +
> +	  if (cnode->callers) /* This edge could be removed.  */
> +	    cnode->callers->inline_failed = CIF_UNSPECIFIED;
> +	  cnode->inlined_to = NULL;
> +	}
> +
> +      if (cnode->has_gimple_body_p ())
> +	{
> +	  if (!keep_body)
> +	    {
> +	      cnode->maybe_release_dominators ();
> +	      cnode->remove_callees ();
> +	      cnode->remove_all_references ();
> +
> +	      /* FIXME: Releasing body of clones can release bodies of functions
> +		 in current partition.  */
> +
> +	      /* cnode->release_body ();  */
> +	      cnode->body_removed = true;
> +	      cnode->definition = false;
> +	      cnode->analyzed = false;
> +	    }
> +	  cnode->cpp_implicit_alias = false;
> +	  cnode->alias = false;
> +	  cnode->transparent_alias = false;
> +	  cnode->thunk.thunk_p = false;
> +	  cnode->weakref = false;
> +	  /* After early inlining we drop always_inline attributes on
> +	     bodies of functions that are still referenced (have their
> +	     address taken).  */
> +	  DECL_ATTRIBUTES (cnode->decl)
> +	    = remove_attribute ("always_inline",
> +				DECL_ATTRIBUTES (node->decl));
> +
> +	  cnode->in_other_partition = true;
> +	}
> +    }
> +  else if (is_a <varpool_node *> (node) && !DECL_EXTERNAL (node->decl))
> +    {
> +      DECL_EXTERNAL (node->decl) = true;
> +      node->in_other_partition = true;
> +    }
> +}
> +
> +/* Check the boundary and expands it if necessary, including more nodes or
> +   promoting then to a state where their body is required.  */
> +
> +static void
> +compute_boundary (ltrans_partition partition)
> +{
> +  vec<lto_encoder_entry> &nodes = partition->encoder->nodes;
> +  symtab_node *node;
> +  cgraph_node *cnode;
> +  auto_vec<symtab_node *, 16> mark_to_remove;
> +  unsigned int i;
> +
> +  FOR_EACH_SYMBOL (node)
> +    node->aux2 = CAN_REMOVE;

There is boundary computation in lto-cgraph.c, so we should merge the
logic...
If you keep the lto-partition data structures, it will compute the boundary
for you and you can just remove the rest (I got that working at one
point).

I am also not sure about the copy-on-write effect of this.  It may be better
to keep things around and just teach late passes not to compile things
in other partitions, but that is definitely for an incremental change.
> +void
> +init_additional_asm_names_file (void)
> +{
> +  gcc_assert (split_outputs);
> +
> +  additional_asm_filenames = fopen (split_outputs, "w");
> +  if (!additional_asm_filenames)
> +    error ("Unable to create a temporary write-only file.");
> +
> +  fclose (additional_asm_filenames);
> +}

Aha, that is what it does :)
I wonder if creating the file conditionally (and late) can lead to a
race condition where we use the same tmp file name for another build
executed in parallel by make.
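One common way to close the race raised above is to reserve the name at the moment it is chosen: POSIX `mkstemp` picks a unique name *and* creates the file (with O_CREAT | O_EXCL semantics) in one step, so no concurrent build can claim the same name in between. The sketch assumes a hypothetical `reserve_temp_file` helper and directory argument. Whether the driver, rather than cc1, should do this early is a design question the sketch does not settle.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a unique, empty file under DIR and return its malloc'd path,
   or NULL on failure.  Because the file exists as soon as the name is
   chosen, another process cannot be handed the same name.  */
static char *
reserve_temp_file (const char *dir)
{
  static const char suffix[] = "/cc-split-XXXXXX";
  assert (dir != NULL);
  size_t len = strlen (dir) + sizeof (suffix);   /* Includes the NUL.  */
  char *path = malloc (len);
  if (!path)
    return NULL;
  snprintf (path, len, "%s%s", dir, suffix);
  int fd = mkstemp (path);      /* Fills in XXXXXX and creates the file.  */
  if (fd < 0)
    {
      free (path);
      return NULL;
    }
  close (fd);                   /* Keep the file so the name stays ours.  */
  return path;
}
```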
> +
> +/* Reinitialize the assembler file and store it in the additional asm file.  */
> +
> +void
> +handle_additional_asm (int childno)
> +{
> +  gcc_assert (split_outputs);
> +
> +  if (childno < 0)
> +    return;
> +
> +  const char *temp_asm_name = make_temp_file (".s");
> +  asm_file_name = temp_asm_name;
> +
> +  if (asm_out_file == stdout)
> +    fatal_error (UNKNOWN_LOCATION, "Unexpected asm output to stdout");
> +
> +  fclose (asm_out_file);
> +
> +  asm_out_file = fopen (temp_asm_name, "w");
> +  if (!asm_out_file)
> +    fatal_error (UNKNOWN_LOCATION, "Unable to create asm output file");
> +
> +  /* Reopen file as append mode.  Here we assume that write to append file is
> +     atomic, as it is in Linux.  */
> +  additional_asm_filenames = fopen (split_outputs, "a");
> +  if (!additional_asm_filenames)
> +    fatal_error (UNKNOWN_LOCATION,
> +		 "Unable to open the temporary asm files container");
> +
> +  fprintf (additional_asm_filenames, "%d %s\n", childno, asm_file_name);
> +  fclose (additional_asm_filenames);
> +}
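handle_additional_asm relies on appends to the shared container file being atomic. POSIX guarantees that, for a descriptor opened with O_APPEND, the offset is moved to end-of-file and the write happens with no intervening modification. That guarantee, however, applies per `write ()` call. With `fopen (..., "a")` plus `fprintf`, a short record is usually flushed in a single write at `fclose`, but stdio does not promise that. A sketch that formats the whole record first and emits it with one `write ()` follows; `append_asm_record` is a hypothetical name, and the record layout `"%d %s\n"` is taken from the quoted patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Append one "CHILDNO ASM_NAME\n" record to CONTAINER with a single
   write () on an O_APPEND descriptor, so records written concurrently
   by sibling processes do not overwrite each other.
   Returns 0 on success, -1 on failure.  */
static int
append_asm_record (const char *container, int childno, const char *asm_name)
{
  char buf[4096];

  assert (childno >= 0);
  int len = snprintf (buf, sizeof buf, "%d %s\n", childno, asm_name);
  if (len < 0 || (size_t) len >= sizeof buf)
    return -1;                          /* Record would be truncated.  */

  int fd = open (container, O_WRONLY | O_APPEND | O_CREAT, 0600);
  if (fd < 0)
    return -1;
  ssize_t w = write (fd, buf, (size_t) len);
  int ok = (w == len);
  if (close (fd) != 0)
    ok = 0;
  return ok ? 0 : -1;
}
```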
> +
>  /* A helper function; used as the reallocator function for cpp's line
>     table.  */
>  static void *
> @@ -2311,7 +2359,7 @@ do_compile ()
>  
>            timevar_stop (TV_PHASE_SETUP);
>  
> -          compile_file ();
> +	  compile_file ();
>          }
>        else
>          {
> @@ -2477,6 +2525,12 @@ toplev::main (int argc, char **argv)
>  
>    finalize_plugins ();
>  
> +  if (jobserver_initialized)
> +    {
> +      jobserver_return_token (JOBSERVER_NULL_TOKEN);
> +      jobserver_finalize ();
> +    }
> +
>    after_memory_report = true;
>  
>    if (seen_error () || werrorcount)
> diff --git a/gcc/toplev.h b/gcc/toplev.h
> index d6c316962b0..3abbf74cd02 100644
> --- a/gcc/toplev.h
> +++ b/gcc/toplev.h
> @@ -103,4 +103,7 @@ extern void parse_alignment_opts (void);
>  
>  extern void initialize_rtl (void);
>  
> +extern void init_additional_asm_names_file (void);
> +extern void handle_additional_asm (int);
> +
>  #endif /* ! GCC_TOPLEV_H */
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 4070f9c17e8..84df52013d7 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -110,7 +110,7 @@ static void decode_addr_const (tree, class addr_const *);
>  static hashval_t const_hash_1 (const tree);
>  static int compare_constant (const tree, const tree);
>  static void output_constant_def_contents (rtx);
> -static void output_addressed_constants (tree);
> +static void output_addressed_constants (tree, int);
>  static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
>  					       unsigned int, bool, bool);
>  static void globalize_decl (tree);
> @@ -2272,7 +2272,7 @@ assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED,
>  
>    /* Output any data that we will need to use the address of.  */
>    if (DECL_INITIAL (decl) && DECL_INITIAL (decl) != error_mark_node)
> -    output_addressed_constants (DECL_INITIAL (decl));
> +    output_addressed_constants (DECL_INITIAL (decl), 0);
>  
>    /* dbxout.c needs to know this.  */
>    if (sect && (sect->common.flags & SECTION_CODE) != 0)
> @@ -3426,11 +3426,11 @@ build_constant_desc (tree exp)
>     already have labels.  */
>  
>  static constant_descriptor_tree *
> -add_constant_to_table (tree exp)
> +add_constant_to_table (tree exp, int defer)
>  {
>    /* The hash table methods may call output_constant_def for addressed
>       constants, so handle them first.  */
> -  output_addressed_constants (exp);
> +  output_addressed_constants (exp, defer);
>  
>    /* Sanity check to catch recursive insertion.  */
>    static bool inserting;
> @@ -3474,7 +3474,7 @@ add_constant_to_table (tree exp)
>  rtx
>  output_constant_def (tree exp, int defer)
>  {
> -  struct constant_descriptor_tree *desc = add_constant_to_table (exp);
> +  struct constant_descriptor_tree *desc = add_constant_to_table (exp, defer);
>    maybe_output_constant_def_contents (desc, defer);
>    return desc->rtl;
>  }
> @@ -3544,7 +3544,7 @@ output_constant_def_contents (rtx symbol)
>  
>    /* Make sure any other constants whose addresses appear in EXP
>       are assigned label numbers.  */
> -  output_addressed_constants (exp);
> +  output_addressed_constants (exp, 0);
>  
>    /* We are no longer deferring this constant.  */
>    TREE_ASM_WRITTEN (decl) = TREE_ASM_WRITTEN (exp) = 1;
> @@ -3608,7 +3608,7 @@ lookup_constant_def (tree exp)
>  tree
>  tree_output_constant_def (tree exp)
>  {
> -  struct constant_descriptor_tree *desc = add_constant_to_table (exp);
> +  struct constant_descriptor_tree *desc = add_constant_to_table (exp, 1);
>    tree decl = SYMBOL_REF_DECL (XEXP (desc->rtl, 0));
>    varpool_node::finalize_decl (decl);
>    return decl;
> @@ -4327,7 +4327,7 @@ compute_reloc_for_constant (tree exp)
>     Indicate whether an ADDR_EXPR has been encountered.  */
>  
>  static void
> -output_addressed_constants (tree exp)
> +output_addressed_constants (tree exp, int defer)
>  {
>    tree tem;
>  
> @@ -4347,21 +4347,21 @@ output_addressed_constants (tree exp)
>  	tem = DECL_INITIAL (tem);
>  
>        if (CONSTANT_CLASS_P (tem) || TREE_CODE (tem) == CONSTRUCTOR)
> -	output_constant_def (tem, 0);
> +	output_constant_def (tem, defer);
>  
>        if (TREE_CODE (tem) == MEM_REF)
> -	output_addressed_constants (TREE_OPERAND (tem, 0));
> +	output_addressed_constants (TREE_OPERAND (tem, 0), defer);
>        break;
>  
>      case PLUS_EXPR:
>      case POINTER_PLUS_EXPR:
>      case MINUS_EXPR:
> -      output_addressed_constants (TREE_OPERAND (exp, 1));
> +      output_addressed_constants (TREE_OPERAND (exp, 1), defer);
>        gcc_fallthrough ();
>  
>      CASE_CONVERT:
>      case VIEW_CONVERT_EXPR:
> -      output_addressed_constants (TREE_OPERAND (exp, 0));
> +      output_addressed_constants (TREE_OPERAND (exp, 0), defer);
>        break;
>  
>      case CONSTRUCTOR:
> @@ -4369,7 +4369,7 @@ output_addressed_constants (tree exp)
>  	unsigned HOST_WIDE_INT idx;
>  	FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, tem)
>  	  if (tem != 0)
> -	    output_addressed_constants (tem);
> +	    output_addressed_constants (tem, defer);
>        }
>        break;

Nice job :)
Honza


* Re: [PATCH 4/6] Add `+' for Jobserver Integration
  2020-08-20 22:33   ` Joseph Myers
  2020-08-24 13:19     ` Richard Biener
@ 2020-08-27 15:38     ` Jan Hubicka
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Hubicka @ 2020-08-27 15:38 UTC
  To: Joseph Myers; +Cc: Giuliano Belinassi, gcc-patches

> On Thu, 20 Aug 2020, Giuliano Belinassi via Gcc-patches wrote:
> 
> >  libbacktrace/Makefile.in |   2 +-
> >  zlib/Makefile.in         |  64 ++++++------
> 
> These directories use makefiles generated by automake.  Rather than 
> modifying the generated files, you need to modify the sources (whether 
> that's Makefile.am, or code in automake itself - if in automake itself, we 
> should wait for an actual new automake release before updating the version 
> used in GCC).

We really want to fix make to allow tools to connect to the jobserver
without the "+" prefix on the recipe line.  It affects other things, like
dry runs (make -n), and is generally a sad hack.
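For context on why the "+" is needed: as far as I understand GNU make's documented behaviour, make advertises its jobserver through MAKEFLAGS but only keeps the jobserver descriptors open for commands it treats as recursive (those invoking $(MAKE) or prefixed with "+"). A tool like cc1 then has to find the descriptors and exchange one-byte tokens over the pipe; every process make starts already owns one implicit slot, so tokens are only needed for additional parallel jobs. The sketch below is hypothetical (parse_jobserver_fds, jobserver_acquire and jobserver_release are illustrative names, not the patch's functions). It handles only the pipe forms "--jobserver-auth=R,W" (make 4.2 and later, as far as I know) and "--jobserver-fds=R,W" (older releases), not the "fifo:PATH" form used by newer make versions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Look for the jobserver pipe descriptors in a MAKEFLAGS string.
   Returns true and fills *RFD/*WFD on success.  */
static bool
parse_jobserver_fds (const char *makeflags, int *rfd, int *wfd)
{
  static const char *const keys[] = { "--jobserver-auth=",
                                      "--jobserver-fds=" };
  assert (rfd != NULL && wfd != NULL);
  if (!makeflags)
    return false;

  for (size_t k = 0; k < sizeof keys / sizeof keys[0]; k++)
    {
      const char *p = strstr (makeflags, keys[k]);
      /* "fifo:PATH" (newer make) fails the "%d,%d" match and is skipped.  */
      if (p && sscanf (p + strlen (keys[k]), "%d,%d", rfd, wfd) == 2
          && *rfd >= 0 && *wfd >= 0)
        return true;
    }
  return false;
}

/* Take one job slot: block until a token byte can be read.  */
static int
jobserver_acquire (int rfd, char *token)
{
  ssize_t n;
  do
    n = read (rfd, token, 1);
  while (n < 0 && errno == EINTR);
  return n == 1 ? 0 : -1;
}

/* Give the slot back by writing the same byte to the pipe.  */
static int
jobserver_release (int wfd, char token)
{
  ssize_t n;
  do
    n = write (wfd, &token, 1);
  while (n < 0 && errno == EINTR);
  return n == 1 ? 0 : -1;
}
```

A caller would typically pass getenv ("MAKEFLAGS"), acquire a token before starting each child beyond the implicit one, and release it once that child has been reaped.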

Honza
> 
> -- 
> Joseph S. Myers
> joseph@codesourcery.com


* Re: [PATCH 3/6] Implement fork-based parallelism engine
  2020-08-27 15:37   ` Jan Hubicka
@ 2020-08-27 18:27     ` Giuliano Belinassi
  2020-08-29 11:41       ` Jan Hubicka
  2020-08-31  9:33       ` Richard Biener
  0 siblings, 2 replies; 31+ messages in thread
From: Giuliano Belinassi @ 2020-08-27 18:27 UTC
  To: Jan Hubicka; +Cc: gcc-patches, richard.guenther

Hi, Honza.

Thank you for your detailed review!

On 08/27, Jan Hubicka wrote:
> > diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> > index c0b45795059..22405098dc5 100644
> > --- a/gcc/cgraph.c
> > +++ b/gcc/cgraph.c
> > @@ -226,6 +226,22 @@ cgraph_node::delete_function_version_by_decl (tree decl)
> >    decl_node->remove ();
> >  }
> >  
> > +/* Release function dominator info if present.  */
> > +
> > +void
> > +cgraph_node::maybe_release_dominators (void)
> > +{
> > +  struct function *fun = DECL_STRUCT_FUNCTION (decl);
> > +
> > +  if (fun && fun->cfg)
> > +    {
> > +      if (dom_info_available_p (fun, CDI_DOMINATORS))
> > +	free_dominance_info (fun, CDI_DOMINATORS);
> > +      if (dom_info_available_p (fun, CDI_POST_DOMINATORS))
> > +	free_dominance_info (fun, CDI_POST_DOMINATORS);
> > +    }
> > +}
> 
> I am not sure if that needs to be a member function, but if so we want to
> merge it with other places in cgraph.c and cgraphunit.c where dominators
> are freed.  I do not think you need to check availability.

This is necessary to remove some nodes from the callgraph.  For some
reason, if I call node->remove () while the node still has dominance info
available, some assertions in the compiler fail.

However, with regard to code layout, this can be moved to lto-cgraph.c,
as it is only used there.

> > +
> >  /* Record that DECL1 and DECL2 are semantically identical function
> >     versions.  */
> >  void
> > diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> > index b4a7871bd3d..72ac19f9672 100644
> > --- a/gcc/cgraph.h
> > +++ b/gcc/cgraph.h
> > @@ -463,6 +463,15 @@ public:
> >       Return NULL if there's no such node.  */
> >    static symtab_node *get_for_asmname (const_tree asmname);
> >  
> > +  /* Get symtab node by order.  */
> > +  static symtab_node *find_by_order (int order);
> 
> This is quadratic and moreover seems unused. Why do you add it?

I added this for debugging, since I used this a lot inside GDB.
Sure, I can remove this without any problems, or print a warning
for the developer to avoid calling this in production code.

> > +
> > +  /* Get symtab_node by its name.  */
> > +  static symtab_node *find_by_name (const char *);
> 
> Similarly here, note that names are not really very meaningful as lookup
> keys, since they get duplicated.
> > +
> > +  /* Get symtab_node by its ASM name.  */
> > +  static symtab_node *find_by_asm_name (const char *);
> 
> For this we have get_for_asmname (which also populates asm name hash as
> needed and is not quadratic)

Cool. I will surely remove this then :)

> > diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> > index d10d635e942..73e4bed3b61 100644
> > --- a/gcc/cgraphunit.c
> > +++ b/gcc/cgraphunit.c
> > @@ -2258,6 +2258,11 @@ cgraph_node::expand (void)
> >  {
> >    location_t saved_loc;
> >  
> > +  /* FIXME: Find out why body-removed nodes are marked for output.  */
> > +  if (body_removed)
> > +    return;
> 
> Indeed, we should know :)

Looks like this was due to an early problem.  I removed this and bootstrap
is working OK.

> > +
> > +
> >    /* We ought to not compile any inline clones.  */
> >    gcc_assert (!inlined_to);
> >  
> > @@ -2658,6 +2663,7 @@ ipa_passes (void)
> >  
> >        execute_ipa_summary_passes
> >  	((ipa_opt_pass_d *) passes->all_regular_ipa_passes);
> > +
> This seems accidental.

Yes.

> >      }
> >  
> >    /* Some targets need to handle LTO assembler output specially.  */
> > @@ -2687,10 +2693,17 @@ ipa_passes (void)
> >    if (flag_generate_lto || flag_generate_offload)
> >      targetm.asm_out.lto_end ();
> >  
> > -  if (!flag_ltrans
> > +  if (split_outputs)
> > +    flag_ltrans = true;
> > +
> > +  if ((!flag_ltrans || split_outputs)
> >        && ((in_lto_p && flag_incremental_link != INCREMENTAL_LINK_LTO)
> >  	  || !flag_lto || flag_fat_lto_objects))
> >      execute_ipa_pass_list (passes->all_regular_ipa_passes);
> > +
> > +  if (split_outputs)
> > +    flag_ltrans = false;
> > +
> >    invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_END, NULL);
> >  
> >    bitmap_obstack_release (NULL);
> > @@ -2742,6 +2755,185 @@ symbol_table::output_weakrefs (void)
> >        }
> >  }
> >  
> > +static bool is_number (const char *str)
> > +{
> > +  while (*str != '\0')
> > +    switch (*str++)
> > +      {
> > +	case '0':
> > +	case '1':
> > +	case '2':
> > +	case '3':
> > +	case '4':
> > +	case '5':
> > +	case '6':
> > +	case '7':
> > +	case '8':
> > +	case '9':
> > +	  continue;
> > +	default:
> > +	  return false;
> > +      }
> > +
> > +  return true;
> > +}
> 
> This looks odd, we have other places where we parse number from command
> line :)

isdigit () is poisoned in GCC, but I guess I should look at how -flto=
does this.
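If reusing the -flto= parsing code turns out to be awkward, the value can also be validated without isdigit () by checking the first character explicitly and letting strtol () handle the rest, which also catches empty strings, trailing junk and overflow that a digit loop plus atoi () would not. A hypothetical sketch (parse_parallel_jobs and enum parallel_mode are illustrative names, not GCC's); inside GCC itself, ISDIGIT from libiberty's safe-ctype.h is the usual locale-independent replacement for isdigit ().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

enum parallel_mode { PJ_INVALID, PJ_AUTO, PJ_JOBSERVER, PJ_FIXED };

/* Classify the -fparallel-jobs= argument; for a number, store it in
   *NUM_JOBS.  Zero is rejected, matching the "makes no sense" case.  */
static enum parallel_mode
parse_parallel_jobs (const char *arg, int *num_jobs)
{
  assert (arg != NULL && num_jobs != NULL);
  if (strcmp (arg, "auto") == 0)
    return PJ_AUTO;
  if (strcmp (arg, "jobserver") == 0)
    return PJ_JOBSERVER;

  /* strtol accepts leading spaces and signs, so require a digit first.  */
  if (*arg < '0' || *arg > '9')
    return PJ_INVALID;

  char *end;
  errno = 0;
  long n = strtol (arg, &end, 10);
  /* Reject trailing junk, overflow and values outside 1..INT_MAX.  */
  if (*end != '\0' || errno == ERANGE || n < 1 || n > INT_MAX)
    return PJ_INVALID;

  *num_jobs = (int) n;
  return PJ_FIXED;
}
```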

> > +
> > +/* If forked, which child am I?  */
> > +
> > +static int childno = -1;
> > +
> > +static bool
> > +maybe_compile_in_parallel (void)
> > +{
> > +  struct symtab_node *node;
> > +  int partitions, i, j;
> > +  int *pids;
> > +
> > +  bool promote_statics = param_promote_statics;
> > +  bool balance = param_balance_partitions;
> > +  bool jobserver = false;
> > +  bool job_auto = false;
> > +  int num_jobs = -1;
> > +
> > +  if (!flag_parallel_jobs || !split_outputs)
> > +    return false;
> > +
> > +  if (!strcmp (flag_parallel_jobs, "auto"))
> > +    {
> > +      jobserver = jobserver_initialize ();
> > +      job_auto = true;
> > +    }
> > +  else if (!strcmp (flag_parallel_jobs, "jobserver"))
> > +    jobserver = jobserver_initialize ();
> > +  else if (is_number (flag_parallel_jobs))
> > +    num_jobs = atoi (flag_parallel_jobs);
> > +  else
> > +    gcc_unreachable ();
> > +
> > +  if (job_auto && !jobserver)
> > +    {
> > +      num_jobs = sysconf (_SC_NPROCESSORS_CONF);
> > +      if (num_jobs > 2)
> > +	num_jobs = 2;
> > +    }
> > +
> > +  if (num_jobs < 0 && !jobserver)
> > +    {
> > +      inform (UNKNOWN_LOCATION,
> > +	      "-fparallel-jobs=jobserver, but no GNU Jobserver found");
> > +      return false;
> > +    }
> > +
> > +  if (jobserver)
> > +    num_jobs = 2;
> > +
> > +  if (num_jobs == 0)
> > +    {
> > +      inform (UNKNOWN_LOCATION, "-fparallel-jobs=0 makes no sense");
> > +      return false;
> > +    }
> > +
> > +  /* Trick the compiler to think that we are in WPA.  */
> > +  flag_wpa = "";
> > +  symtab_node::checking_verify_symtab_nodes ();
> 
> Messing with WPA/ltrans flags is not a good idea.  You already have
> split_output for parallel build.  I sort of see why you set ltrans, but
> why WPA?

Some assertions expect flag_wpa to be set.  Sure, I can add
split_outputs to these assertions.
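The spawn loop in maybe_compile_in_parallel keeps at most num_jobs children alive, but it waits for children in the order they were forked, so one slow partition can delay the next fork even when other children have already finished. Below is a standalone sketch of the same bounded fork/wait pattern that instead reaps whichever child exits first with waitpid (-1, ...). It assumes a hypothetical partition_fn callback standing in for lto_apply_partition_mask plus the rest of the compilation. Children leave via _exit () so inherited stdio buffers are not flushed twice, the same concern behind the fflush (asm_out_file) before forking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-partition work; its return value becomes the child's
   exit status (0 = success).  */
typedef int (*partition_fn) (int partition);

/* Run PARTITIONS jobs with at most MAX_JOBS children alive at a time.
   Returns the number of children that failed or died from a signal, or
   -1 if fork or waitpid failed.  waitpid (-1, ...) reaps any child of
   this process, so a caller with other children should track pids.  */
static int
run_partitions (int partitions, int max_jobs, partition_fn work)
{
  assert (max_jobs > 0 && work != NULL);
  int running = 0, failures = 0, next = 0, spawn_failed = 0;

  while (next < partitions || running > 0)
    {
      /* Fill a free slot before blocking on any child.  */
      if (next < partitions && running < max_jobs)
        {
          pid_t pid = fork ();
          if (pid < 0)
            {
              /* Stop launching work, but still reap running children.  */
              spawn_failed = 1;
              next = partitions;
              continue;
            }
          if (pid == 0)
            _exit (work (next));   /* Child: skip atexit/stdio flushing.  */
          next++;
          running++;
          continue;
        }

      /* Every slot is busy or no work is left: reap whichever child
         exits first rather than the oldest one.  */
      int status;
      pid_t done = waitpid (-1, &status, 0);
      if (done < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      running--;
      if (!WIFEXITED (status) || WEXITSTATUS (status) != 0)
        failures++;
    }
  return spawn_failed ? -1 : failures;
}

/* Example worker: pretend partition 5 fails.  */
static int
demo_partition (int partition)
{
  return partition == 5 ? 1 : 0;
}
```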

> > +
> > +  /* Partition the program so that COMDATs get mapped to the same
> > +     partition.  If promote_statics is true, it also maps statics
> > +     to the same partition.  If balance is true, try to balance the
> > +     partitions for compilation performance.  */
> > +  lto_merge_comdat_map (balance, promote_statics, num_jobs);
> > +
> > +  /* AUX pointers are used by partitioning code to bookkeep number of
> > +     partitions symbol is in.  This is no longer needed.  */
> > +  FOR_EACH_SYMBOL (node)
> > +    node->aux = NULL;
> > +
> > +  /* We decided that partitioning is a bad idea.  In this case, just
> > +     proceed with the default compilation method.  */
> > +  if (ltrans_partitions.length () <= 1)
> > +    {
> > +      flag_wpa = NULL;
> > +      jobserver_finalize ();
> > +      return false;
> > +    }
> > +
> > +  /* Find out statics that need to be promoted
> > +     to globals with hidden visibility because they are accessed from
> > +     multiple partitions.  */
> > +  lto_promote_cross_file_statics (promote_statics);
> > +
> > +  /* Check if we have variables being referenced across partitions.  */
> > +  lto_check_usage_from_other_partitions ();
> > +
> > +  /* Trick the compiler to think we are not in WPA anymore.  */
> > +  flag_wpa = NULL;
> > +
> > +  partitions = ltrans_partitions.length ();
> > +  pids = XALLOCAVEC (pid_t, partitions);
> > +
> > +  /* There is no point in launching more jobs than we have partitions.  */
> > +  if (num_jobs > partitions)
> > +    num_jobs = partitions;
> > +
> > +  /* Trick the compiler to think we are in LTRANS mode.  */
> > +  flag_ltrans = true;
> > +
> > +  init_additional_asm_names_file ();
> > +
> > +  /* Flush asm file, so we don't get repeated output as we fork.  */
> > +  fflush (asm_out_file);
> > +
> > +  /* Insert a token for child to consume.  */
> > +  if (jobserver)
> > +    {
> > +      num_jobs = partitions;
> > +      jobserver_return_token ('p');
> > +    }
> > +
> > +  /* Spawn processes.  Spawn as soon as there is a free slot.  */
> > +  for (j = 0, i = -num_jobs; i < partitions; i++, j++)
> > +    {
> > +      if (i >= 0)
> > +	{
> > +	  int wstatus, ret;
> > +	  ret = waitpid (pids[i], &wstatus, 0);
> > +
> > +	  if (ret < 0)
> > +	    internal_error ("Unable to wait child %d to finish", i);
> > +	  else if (WIFEXITED (wstatus))
> > +	    {
> > +	      if (WEXITSTATUS (wstatus) != 0)
> > +		error ("Child %d exited with error", i);
> > +	    }
> > +	  else if (WIFSIGNALED (wstatus))
> > +	    error ("Child %d aborted with error", i);
> > +	}
> > +
> > +      if (j < partitions)
> > +	{
> > +	  gcc_assert (ltrans_partitions[j]->symbols > 0);
> > +
> > +	  if (jobserver)
> > +	    jobserver_get_token ();
> > +
> > +	  pids[j] = fork ();
> > +	  if (pids[j] == 0)
> > +	    {
> > +	      childno = j;
> > +	      lto_apply_partition_mask (ltrans_partitions[j]);
> > +	      return true;
> > +	    }
> > +	}
> > +    }
> > +
> > +  /* Get the token which parent inserted for the childs, which they returned by
> > +     now.  */
> > +  if (jobserver)
> > +    jobserver_get_token ();
> > +  exit (0);
> > +}
> > +
> > +
> >  /* Perform simple optimizations based on callgraph.  */
> >  
> >  void
> > @@ -2768,6 +2960,7 @@ symbol_table::compile (void)
> >    {
> >      timevar_start (TV_CGRAPH_IPA_PASSES);
> >      ipa_passes ();
> > +    maybe_compile_in_parallel ();
> >      timevar_stop (TV_CGRAPH_IPA_PASSES);
> >    }
> >    /* Do nothing else if any IPA pass found errors or if we are just streaming LTO.  */
> > @@ -2790,6 +2983,9 @@ symbol_table::compile (void)
> >    timevar_pop (TV_CGRAPHOPT);
> >  
> >    /* Output everything.  */
> > +  if (split_outputs)
> > +    handle_additional_asm (childno);
> What is this doing?

It creates an auxiliary file in which we write the name of every
assembler output.  This will change: as Richi already pointed out, it is
a better idea to write them from the main process rather than from the
children.
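handle_additional_asm relies on appends to the list file being atomic. POSIX specifies that for a descriptor opened with O_APPEND the offset is moved to end of file before each write () with no intervening modification, so issuing each record as a single write () is the conservative way to lean on that guarantee; going through stdio leaves the number of write () calls up to the library's buffering. Because children can finish in any order, the reader should also index records by child number rather than by line position. A hypothetical sketch of both halves (append_asm_record and read_asm_records are illustrative names); like the patch's format, it assumes file names contain no whitespace.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Append one "CHILDNO NAME\n" record with a single write () on an
   O_APPEND descriptor.  Returns 0 on success, -1 on error.  */
static int
append_asm_record (const char *list_file, int childno, const char *asm_name)
{
  char line[4096];
  int len = snprintf (line, sizeof line, "%d %s\n", childno, asm_name);
  if (len < 0 || (size_t) len >= sizeof line)
    return -1;                  /* Record would be truncated.  */

  int fd = open (list_file, O_WRONLY | O_APPEND | O_CREAT, 0600);
  if (fd < 0)
    return -1;
  ssize_t n = write (fd, line, (size_t) len);
  int ok = (n == len);
  return (close (fd) == 0 && ok) ? 0 : -1;
}

/* Read the records back, storing each name (up to 255 bytes) at its
   child's index.  Returns the number of records stored, or -1.  */
static int
read_asm_records (const char *list_file, char names[][256], int max_children)
{
  assert (names != NULL && max_children > 0);
  FILE *f = fopen (list_file, "r");
  if (!f)
    return -1;

  int childno, count = 0;
  char name[256];
  while (fscanf (f, "%d %255s", &childno, name) == 2)
    if (childno >= 0 && childno < max_children)
      {
        strcpy (names[childno], name);
        count++;
      }
  fclose (f);
  return count;
}
```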

> > +
> >    switch_to_section (text_section);
> >    (*debug_hooks->assembly_start) ();
> >    if (!quiet_flag)
> > diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
> > index 2cfab40156e..bc500df4853 100644
> > --- a/gcc/ipa-fnsummary.c
> > +++ b/gcc/ipa-fnsummary.c
> > @@ -4610,7 +4610,7 @@ public:
> >        gcc_assert (n == 0);
> >        small_p = param;
> >      }
> > -  virtual bool gate (function *) { return true; }
> > +  virtual bool gate (function *) { return !(flag_ltrans && split_outputs); }
> >    virtual unsigned int execute (function *)
> >      {
> >        ipa_free_fn_summary ();
> > diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
> > index 069de9d82fb..6a5657c7507 100644
> > --- a/gcc/ipa-icf.c
> > +++ b/gcc/ipa-icf.c
> > @@ -2345,7 +2345,8 @@ sem_item_optimizer::filter_removed_items (void)
> >          {
> >  	  cgraph_node *cnode = static_cast <sem_function *>(item)->get_node ();
> >  
> > -	  if (in_lto_p && (cnode->alias || cnode->body_removed))
> > +	  if ((in_lto_p || split_outputs)
> > +	      && (cnode->alias || cnode->body_removed))
> 
> 
> And I wonder why you need these. IPA passes are run before we split,
> right?

This was due to an early problem.  I removed this and bootstrap is
still working.

> >  	    remove_item (item);
> >  	  else
> >  	    filtered.safe_push (item);
> > diff --git a/gcc/ipa-visibility.c b/gcc/ipa-visibility.c
> > index 7c854f471e8..4d9e11482d3 100644
> > --- a/gcc/ipa-visibility.c
> > +++ b/gcc/ipa-visibility.c
> > @@ -540,7 +540,8 @@ optimize_weakref (symtab_node *node)
> >  static void
> >  localize_node (bool whole_program, symtab_node *node)
> >  {
> > -  gcc_assert (whole_program || in_lto_p || !TREE_PUBLIC (node->decl));
> > +  gcc_assert (split_outputs || whole_program || in_lto_p
> > +	      || !TREE_PUBLIC (node->decl));
> >  
> >    /* It is possible that one comdat group contains both hidden and non-hidden
> >       symbols.  In this case we can privatize all hidden symbol but we need
> > diff --git a/gcc/ipa.c b/gcc/ipa.c
> > index 288b58cf73d..b397ea2fed8 100644
> > --- a/gcc/ipa.c
> > +++ b/gcc/ipa.c
> > @@ -350,7 +350,7 @@ symbol_table::remove_unreachable_nodes (FILE *file)
> >  
> >    /* Mark variables that are obviously needed.  */
> >    FOR_EACH_DEFINED_VARIABLE (vnode)
> > -    if (!vnode->can_remove_if_no_refs_p()
> > +    if (!vnode->can_remove_if_no_refs_p ()
> >  	&& !vnode->in_other_partition)
> >        {
> >  	reachable.add (vnode);
> > @@ -564,7 +564,7 @@ symbol_table::remove_unreachable_nodes (FILE *file)
> >  	}
> >        else
> >  	gcc_assert (node->clone_of || !node->has_gimple_body_p ()
> > -		    || in_lto_p || DECL_RESULT (node->decl));
> > +		    || in_lto_p || split_outputs || DECL_RESULT (node->decl));
> >      }
> >  
> >    /* Inline clones might be kept around so their materializing allows further
> > diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> > index 93a99f3465b..12be8546d9c 100644
> > --- a/gcc/lto-cgraph.c
> > +++ b/gcc/lto-cgraph.c
> > @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "omp-offload.h"
> >  #include "stringpool.h"
> >  #include "attribs.h"
> > +#include "lto-partition.h"
> >  
> >  /* True when asm nodes has been output.  */
> >  bool asm_nodes_output = false;
> > @@ -2065,3 +2066,174 @@ input_cgraph_opt_summary (vec<symtab_node *> nodes)
> >  	input_cgraph_opt_section (file_data, data, len, nodes);
> >      }
> >  }
> > +
> > +/* When analysing function for removal, we have mainly three states, as
> > +   defined below.  */
> > +
> > +enum node_partition_state
> > +{
> > +  CAN_REMOVE,		/* This node can be removed, or is still to be
> > +			   analysed.  */
> > +  IN_CURRENT_PARTITION, /* This node is in current partition and should not be
> > +			   touched.  */
> > +  IN_BOUNDARY,		/* This node is in boundary, therefore being in other
> > +			   partition or is a external symbol, and its body can
> > +			   be released.  */
> > +  IN_BOUNDARY_KEEP_BODY /* This symbol is in other partition but we may need its
> > +			   body for inlining, for instance.  */
> > +};
> > +
> > +/* Handle node that are in the LTRANS boundary, releasing its body and
> > +   other informations if necessary.  */
> > +
> > +static void
> > +handle_node_in_boundary (symtab_node *node, bool keep_body)
> > +{
> > +  if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
> > +    {
> > +      if (cnode->inlined_to && cnode->inlined_to->aux2 != IN_CURRENT_PARTITION)
> > +	{
> > +	  /* If marked to be inlined into a node not in current partition,
> > +	     then undo the inline.  */
> > +
> > +	  if (cnode->callers) /* This edge could be removed.  */
> > +	    cnode->callers->inline_failed = CIF_UNSPECIFIED;
> > +	  cnode->inlined_to = NULL;
> > +	}
> > +
> > +      if (cnode->has_gimple_body_p ())
> > +	{
> > +	  if (!keep_body)
> > +	    {
> > +	      cnode->maybe_release_dominators ();
> > +	      cnode->remove_callees ();
> > +	      cnode->remove_all_references ();
> > +
> > +	      /* FIXME: Releasing body of clones can release bodies of functions
> > +		 in current partition.  */
> > +
> > +	      /* cnode->release_body ();  */
> > +	      cnode->body_removed = true;
> > +	      cnode->definition = false;
> > +	      cnode->analyzed = false;
> > +	    }
> > +	  cnode->cpp_implicit_alias = false;
> > +	  cnode->alias = false;
> > +	  cnode->transparent_alias = false;
> > +	  cnode->thunk.thunk_p = false;
> > +	  cnode->weakref = false;
> > +	  /* After early inlining we drop always_inline attributes on
> > +	     bodies of functions that are still referenced (have their
> > +	     address taken).  */
> > +	  DECL_ATTRIBUTES (cnode->decl)
> > +	    = remove_attribute ("always_inline",
> > +				DECL_ATTRIBUTES (node->decl));
> > +
> > +	  cnode->in_other_partition = true;
> > +	}
> > +    }
> > +  else if (is_a <varpool_node *> (node) && !DECL_EXTERNAL (node->decl))
> > +    {
> > +      DECL_EXTERNAL (node->decl) = true;
> > +      node->in_other_partition = true;
> > +    }
> > +}
> > +
> > +/* Check the boundary and expands it if necessary, including more nodes or
> > +   promoting then to a state where their body is required.  */
> > +
> > +static void
> > +compute_boundary (ltrans_partition partition)
> > +{
> > +  vec<lto_encoder_entry> &nodes = partition->encoder->nodes;
> > +  symtab_node *node;
> > +  cgraph_node *cnode;
> > +  auto_vec<symtab_node *, 16> mark_to_remove;
> > +  unsigned int i;
> > +
> > +  FOR_EACH_SYMBOL (node)
> > +    node->aux2 = CAN_REMOVE;
> 
> There is boundary computation in lto-cgraph.c so we should merge the
> logic...

Agree.

> If you keep the lto-partition datastructures it will compute boundary
> for you and you can just remove the rest (I got that working at one
> point).

This is interesting, because I could not get that working out of the
box.  lto_promote_cross_file_statics did not provide a fully working
boundary from which I could simply remove everything else.  If you take a
closer look, you will see that I am using the already computed boundary
as a base and extending it with the extra things required (mainly the
bodies of inline clones).

> 
> I am also not sure about the copy-on-write effect of this. It may be better
> to keep things around and just teach late passes to not compile things
> in other partitions, but that is definitely for an incremental change.

Well, this looks like a lot of work :)

> > +void
> > +init_additional_asm_names_file (void)
> > +{
> > +  gcc_assert (split_outputs);
> > +
> > +  additional_asm_filenames = fopen (split_outputs, "w");
> > +  if (!additional_asm_filenames)
> > +    error ("Unable to create a temporary write-only file.");
> > +
> > +  fclose (additional_asm_filenames);
> > +}
> 
> Aha, that is what it does :)
> I wonder whether creating the file conditionally (and late) could lead to
> a race condition where another build executed in parallel by make ends up
> using the same temporary file name.

Hmmm. True. Added to my TODO list :)
Well, I never had any issues with race conditions here, even after
stressing it, but that is certainly no proof that it is free of race
conditions :)
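A common way to close this kind of window is to reserve the name atomically at the moment it is chosen. POSIX mkstemp () creates the file as if by open () with O_CREAT | O_EXCL, so a concurrent build cannot be handed the same name, and the empty file can later be reopened by whichever process fills it in. A minimal sketch, assuming a hypothetical helper reserve_temp_name () and a caller-supplied directory (neither is part of the patch):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Reserve a unique temporary file name by creating the file right away.
   mkstemp () replaces the trailing "XXXXXX" and fails rather than reuse
   an existing file.  Returns a malloc'd path, or NULL on failure.  */
static char *
reserve_temp_name (const char *dir)
{
  assert (dir != NULL);
  size_t len = strlen (dir) + sizeof ("/cc-split-XXXXXX");
  char *path = malloc (len);
  if (!path)
    return NULL;
  snprintf (path, len, "%s/cc-split-XXXXXX", dir);

  int fd = mkstemp (path);
  if (fd < 0)
    {
      free (path);
      return NULL;
    }
  /* Keep the (empty) file on disk so the name stays reserved; later
     writers reopen it by name, e.g. with fopen (path, "a").  */
  close (fd);
  return path;
}
```

The key design choice is that the name and the file come into existence in one step, instead of choosing a name early and creating the file later.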

> > +
> > +/* Reinitialize the assembler file and store it in the additional asm file.  */
> > +
> > +void
> > +handle_additional_asm (int childno)
> > +{
> > +  gcc_assert (split_outputs);
> > +
> > +  if (childno < 0)
> > +    return;
> > +
> > +  const char *temp_asm_name = make_temp_file (".s");
> > +  asm_file_name = temp_asm_name;
> > +
> > +  if (asm_out_file == stdout)
> > +    fatal_error (UNKNOWN_LOCATION, "Unexpected asm output to stdout");
> > +
> > +  fclose (asm_out_file);
> > +
> > +  asm_out_file = fopen (temp_asm_name, "w");
> > +  if (!asm_out_file)
> > +    fatal_error (UNKNOWN_LOCATION, "Unable to create asm output file");
> > +
> > +  /* Reopen file as append mode.  Here we assume that write to append file is
> > +     atomic, as it is in Linux.  */
> > +  additional_asm_filenames = fopen (split_outputs, "a");
> > +  if (!additional_asm_filenames)
> > +    fatal_error (UNKNOWN_LOCATION,
> > +		 "Unable to open the temporary asm files container");
> > +
> > +  fprintf (additional_asm_filenames, "%d %s\n", childno, asm_file_name);
> > +  fclose (additional_asm_filenames);
> > +}
> > +
> >  /* A helper function; used as the reallocator function for cpp's line
> >     table.  */
> >  static void *
> > @@ -2311,7 +2359,7 @@ do_compile ()
> >  
> >            timevar_stop (TV_PHASE_SETUP);
> >  
> > -          compile_file ();
> > +	  compile_file ();
> >          }
> >        else
> >          {
> > @@ -2477,6 +2525,12 @@ toplev::main (int argc, char **argv)
> >  
> >    finalize_plugins ();
> >  
> > +  if (jobserver_initialized)
> > +    {
> > +      jobserver_return_token (JOBSERVER_NULL_TOKEN);
> > +      jobserver_finalize ();
> > +    }
> > +
> >    after_memory_report = true;
> >  
> >    if (seen_error () || werrorcount)
> > diff --git a/gcc/toplev.h b/gcc/toplev.h
> > index d6c316962b0..3abbf74cd02 100644
> > --- a/gcc/toplev.h
> > +++ b/gcc/toplev.h
> > @@ -103,4 +103,7 @@ extern void parse_alignment_opts (void);
> >  
> >  extern void initialize_rtl (void);
> >  
> > +extern void init_additional_asm_names_file (void);
> > +extern void handle_additional_asm (int);
> > +
> >  #endif /* ! GCC_TOPLEV_H */
> > diff --git a/gcc/varasm.c b/gcc/varasm.c
> > index 4070f9c17e8..84df52013d7 100644
> > --- a/gcc/varasm.c
> > +++ b/gcc/varasm.c
> > @@ -110,7 +110,7 @@ static void decode_addr_const (tree, class addr_const *);
> >  static hashval_t const_hash_1 (const tree);
> >  static int compare_constant (const tree, const tree);
> >  static void output_constant_def_contents (rtx);
> > -static void output_addressed_constants (tree);
> > +static void output_addressed_constants (tree, int);
> >  static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
> >  					       unsigned int, bool, bool);
> >  static void globalize_decl (tree);
> > @@ -2272,7 +2272,7 @@ assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED,
> >  
> >    /* Output any data that we will need to use the address of.  */
> >    if (DECL_INITIAL (decl) && DECL_INITIAL (decl) != error_mark_node)
> > -    output_addressed_constants (DECL_INITIAL (decl));
> > +    output_addressed_constants (DECL_INITIAL (decl), 0);
> >  
> >    /* dbxout.c needs to know this.  */
> >    if (sect && (sect->common.flags & SECTION_CODE) != 0)
> > @@ -3426,11 +3426,11 @@ build_constant_desc (tree exp)
> >     already have labels.  */
> >  
> >  static constant_descriptor_tree *
> > -add_constant_to_table (tree exp)
> > +add_constant_to_table (tree exp, int defer)
> >  {
> >    /* The hash table methods may call output_constant_def for addressed
> >       constants, so handle them first.  */
> > -  output_addressed_constants (exp);
> > +  output_addressed_constants (exp, defer);
> >  
> >    /* Sanity check to catch recursive insertion.  */
> >    static bool inserting;
> > @@ -3474,7 +3474,7 @@ add_constant_to_table (tree exp)
> >  rtx
> >  output_constant_def (tree exp, int defer)
> >  {
> > -  struct constant_descriptor_tree *desc = add_constant_to_table (exp);
> > +  struct constant_descriptor_tree *desc = add_constant_to_table (exp, defer);
> >    maybe_output_constant_def_contents (desc, defer);
> >    return desc->rtl;
> >  }
> > @@ -3544,7 +3544,7 @@ output_constant_def_contents (rtx symbol)
> >  
> >    /* Make sure any other constants whose addresses appear in EXP
> >       are assigned label numbers.  */
> > -  output_addressed_constants (exp);
> > +  output_addressed_constants (exp, 0);
> >  
> >    /* We are no longer deferring this constant.  */
> >    TREE_ASM_WRITTEN (decl) = TREE_ASM_WRITTEN (exp) = 1;
> > @@ -3608,7 +3608,7 @@ lookup_constant_def (tree exp)
> >  tree
> >  tree_output_constant_def (tree exp)
> >  {
> > -  struct constant_descriptor_tree *desc = add_constant_to_table (exp);
> > +  struct constant_descriptor_tree *desc = add_constant_to_table (exp, 1);
> >    tree decl = SYMBOL_REF_DECL (XEXP (desc->rtl, 0));
> >    varpool_node::finalize_decl (decl);
> >    return decl;
> > @@ -4327,7 +4327,7 @@ compute_reloc_for_constant (tree exp)
> >     Indicate whether an ADDR_EXPR has been encountered.  */
> >  
> >  static void
> > -output_addressed_constants (tree exp)
> > +output_addressed_constants (tree exp, int defer)
> >  {
> >    tree tem;
> >  
> > @@ -4347,21 +4347,21 @@ output_addressed_constants (tree exp)
> >  	tem = DECL_INITIAL (tem);
> >  
> >        if (CONSTANT_CLASS_P (tem) || TREE_CODE (tem) == CONSTRUCTOR)
> > -	output_constant_def (tem, 0);
> > +	output_constant_def (tem, defer);
> >  
> >        if (TREE_CODE (tem) == MEM_REF)
> > -	output_addressed_constants (TREE_OPERAND (tem, 0));
> > +	output_addressed_constants (TREE_OPERAND (tem, 0), defer);
> >        break;
> >  
> >      case PLUS_EXPR:
> >      case POINTER_PLUS_EXPR:
> >      case MINUS_EXPR:
> > -      output_addressed_constants (TREE_OPERAND (exp, 1));
> > +      output_addressed_constants (TREE_OPERAND (exp, 1), defer);
> >        gcc_fallthrough ();
> >  
> >      CASE_CONVERT:
> >      case VIEW_CONVERT_EXPR:
> > -      output_addressed_constants (TREE_OPERAND (exp, 0));
> > +      output_addressed_constants (TREE_OPERAND (exp, 0), defer);
> >        break;
> >  
> >      case CONSTRUCTOR:
> > @@ -4369,7 +4369,7 @@ output_addressed_constants (tree exp)
> >  	unsigned HOST_WIDE_INT idx;
> >  	FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, tem)
> >  	  if (tem != 0)
> > -	    output_addressed_constants (tem);
> > +	    output_addressed_constants (tem, defer);
> >        }
> >        break;
> 
> Nice job :)
> Honza

Thank you,
Giuliano.


* Re: [PATCH 2/6] Implement a new partitioner for parallel compilation
  2020-08-27 15:18   ` Jan Hubicka
@ 2020-08-27 21:42     ` Giuliano Belinassi
  0 siblings, 0 replies; 31+ messages in thread
From: Giuliano Belinassi @ 2020-08-27 21:42 UTC
  To: Jan Hubicka; +Cc: gcc-patches, richard.guenther

Hi, Honza.

Again, thank you for your detailed review!

On 08/27, Jan Hubicka wrote:
> > When using the LTO infrastructure to compile files in parallel, we
> > can't simply use any of the LTO partitioners, since extra dependency
> > analysis is required to ensure that some nodes are correctly
> > partitioned together.
> > 
> > Therefore, here we implement a new partitioner called
> > "lto_merge_comdat_map" that performs all of the required analysis.
> > The partitioner works as follows:
> > 
> > 1. We create a number of disjoint sets and insert each node into a
> >    separate set, which may be merged together in the future.
> > 
> > 2. Find COMDAT groups, and mark them to be partitioned together.
> > 
> > 3. Check all nodes that would require any COMDAT group to be
> >    copied to their partition (which we call the "COMDAT frontier"),
> >    and mark them to be partitioned together.
> >    This avoids duplication of COMDAT groups and crashes in the LTO
> >    partitioning infrastructure.
> 
> What kind of crash do you get here?

This assertion:

	  bool added = add_symbol_to_partition_1 (part, node1);
	  gcc_assert (added);

It checks that the COMDAT node was not already inserted into some other
partition.

> > 
> > 4. Check if the user allows the partitioner to promote non-public
> >    functions or variables to globals to improve parallelization
> >    opportunities at the cost of modifying the output code layout.
> > 
> > 5. Balance the generated partitions for performance unless told not to.
> > 
> > The choice of 1. was by design, so we could use a union-find
> > data structure, which is known for being very fast on set union
> > operations.
> 
> In the LTO partitioner, the groups of objects that "must go together"
> are discovered when the first object is placed into the partition (via
> add_to_partition), because with the LTO rules it is always possible to
> discover all members of the group starting from any random element via
> graph walking.
> 
> I guess it is the same with your partitioner?  I basically wonder how much
> code can be shared and what needs to be duplicated.
> It is not very nice to have partitioning implemented twice - it is a bit of a
> subtle problem when it comes to details, so I would be happier if we
> brought lto/lto-partition.c into the middle end and updated/cleaned it
> up as needed.

They are almost the same thing: both group together the nodes that
must be in the same partition before deciding how to partition
them.

Things are a little different in that this partitioner starts
with n partitions and merges nodes together as we decide that these
nodes need to be in the same partition.  The advantage of this is that
merging partitions is quite cheap, but the drawback is that I can't
undo partitions easily. You can also see that I only use
add_node_to_partition after it decides which nodes should be in the
partition.
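
The cheap-merge/no-undo trade-off above comes straight from the union-find structure. Below is a minimal sketch with path compression and union by rank. The class name `disjoint_sets` is hypothetical, and this is not the patch's `union_find` class. `unite ()` only rewires one parent pointer, which is why merging is cheap. Because that rewiring is not recorded, the merge cannot be undone.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

/* Hypothetical disjoint-set sketch: each node starts in its own set.  */
class disjoint_sets
{
public:
  explicit disjoint_sets (int n) : parent (n), rank (n, 0)
  {
    /* Every element is initially the root of a singleton set.  */
    std::iota (parent.begin (), parent.end (), 0);
  }

  /* Return the representative of X's set, compressing the path.  */
  int find (int x)
  {
    assert (x >= 0 && x < (int) parent.size ());
    int root = x;
    while (parent[root] != root)
      root = parent[root];
    /* Point every node on the walked path directly at the root.  */
    while (parent[x] != root)
      {
        int next = parent[x];
        parent[x] = root;
        x = next;
      }
    return root;
  }

  /* Merge the sets containing A and B, attaching the shallower tree.  */
  void unite (int a, int b)
  {
    int ra = find (a), rb = find (b);
    if (ra == rb)
      return;
    if (rank[ra] < rank[rb])
      std::swap (ra, rb);
    parent[rb] = ra;
    if (rank[ra] == rank[rb])
      rank[ra]++;
  }

private:
  std::vector<int> parent;
  std::vector<int> rank;
};
```

With n symbols, each `find ()` runs in effectively constant amortized time, so all the "must go together" merges stay close to linear in the size of the symbol table.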

I think that if there is a way to avoid failing the assertion I
mentioned above, we could even get rid of this step and use the
balanced_map partitioner.

> > 
> > For 3. to work properly, we also had to modify
> > lto_promote_cross_file_statics to handle this case.
> > 
> > The parameters --param=promote-statics and --param=balance-partitions
> > control 4. and 5., respectively.
> > 
> > gcc/ChangeLog:
> > 2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> > 
> > 	* Makefile.in: Add lto-partition.o
> > 	* cgraph.h (struct symtab_node::aux2): New variable.
> > 	* lto-partition.c: Move from gcc/lto/lto-partition.c
> > 	(add_symbol_to_partition_1): Only compute insn size
> > 	if information is available.
> > 	(node_cmp): Same as above.
> > 	(class union_find): New.
> > 	(ds_print_roots): New function.
> > 	(balance_partitions): New function.
> > 	(build_ltrans_partitions): New function.
> > 	(merge_comdat_nodes): New function.
> > 	(merge_static_calls): New function.
> > 	(merge_contained_symbols): New function.
> > 	(lto_merge_comdat_map): New function.
> > 	(privatize_symbol_name_1): Handle when WPA is not enabled.
> > 	(privatize_symbol_name): Same as above.
> > 	(lto_promote_cross_file_statics): New parameter to select when
> > 	to promote to global.
> > 	(lto_check_usage_from_other_partitions): New function.
> > 	* lto-partition.h: Move from gcc/lto/lto-partition.h
> > 	(lto_promote_cross_file_statics): Update prototype.
> > 	(lto_check_usage_from_other_partitions): Declare.
> > 	(lto_merge_comdat_map): Declare.
> > 
> > gcc/lto/ChangeLog:
> > 2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>
> > 
> > 	* lto-partition.c: Move to gcc/lto-partition.c.
> > 	* lto-partition.h: Move to gcc/lto-partition.h.
> > 	* lto.c: Update call to lto_promote_cross_file_statics.
> > 	* Makefile.in: Remove lto-partition.o.
> > ---
> >  gcc/Makefile.in               |   1 +
> >  gcc/cgraph.h                  |   1 +
> >  gcc/{lto => }/lto-partition.c | 463 +++++++++++++++++++++++++++++++++-
> >  gcc/{lto => }/lto-partition.h |   4 +-
> >  gcc/lto/Make-lang.in          |   4 +-
> >  gcc/lto/lto.c                 |   2 +-
> >  gcc/params.opt                |   8 +
> >  gcc/tree.c                    |  23 +-
> >  8 files changed, 489 insertions(+), 17 deletions(-)
> >  rename gcc/{lto => }/lto-partition.c (78%)
> >  rename gcc/{lto => }/lto-partition.h (89%)
> > 
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index 79e854aa938..be42b15f4ff 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1457,6 +1457,7 @@ OBJS = \
> >  	lra-spills.o \
> >  	lto-cgraph.o \
> >  	lto-streamer.o \
> > +	lto-partition.o \
> >  	lto-streamer-in.o \
> >  	lto-streamer-out.o \
> >  	lto-section-in.o \
> > diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> > index 0211f08964f..b4a7871bd3d 100644
> > --- a/gcc/cgraph.h
> > +++ b/gcc/cgraph.h
> > @@ -615,6 +615,7 @@ public:
> >    struct lto_file_decl_data * lto_file_data;
> >  
> >    PTR GTY ((skip)) aux;
> > +  int aux2;
> 
> We do not really want to add more pass-specific data to the symbol table
> (since it is critical wrt memory use during the WPA stage).
> It is possible to attach extra info using symbol-summary.h.

How about if I refactor add_node_to_partition to not use the aux
pointer, but a single bit for marking whether a node was already
partitioned? I counted 18 bits used in symtab_node, so there is plenty
of space left in this bitfield :)

Then I can use the aux pointer variable for storing the partitioning.
This will save me the cost of hash accesses.
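
Here is a minimal sketch of that proposal, built on a heavily reduced, hypothetical stand-in for symtab_node rather than the real class. A one-bit flag records "already partitioned", which frees the pointer-sized aux slot to carry the set id. No extra aux2 field is needed. The field `other_flags` is only a placeholder for the existing flag bits.

```cpp
#include <cassert>
#include <cstdint>

/* Hypothetical toy node, not GCC's symtab_node.  */
struct toy_symtab_node
{
  unsigned partitioned : 1;   /* Set once the node is placed.  */
  unsigned other_flags : 17;  /* Placeholder for existing flag bits.  */
  void *aux;                  /* Reused to hold the set/partition id.  */
};

/* Store a small non-negative id in the pointer-sized AUX slot.  */
static inline void
set_partition_id (toy_symtab_node *node, std::intptr_t id)
{
  assert (id >= 0);
  node->aux = reinterpret_cast<void *> (id);
}

/* Read the id back out of AUX.  */
static inline std::intptr_t
get_partition_id (const toy_symtab_node *node)
{
  return reinterpret_cast<std::intptr_t> (node->aux);
}
```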

> >  
> >    /* Comdat group the symbol is in.  Can be private if GGC allowed that.  */
> >    tree x_comdat_group;
> > diff --git a/gcc/lto/lto-partition.c b/gcc/lto-partition.c
> > similarity index 78%
> > rename from gcc/lto/lto-partition.c
> > rename to gcc/lto-partition.c
> > index 8e0488ab13e..ca962e69b5d 100644
> > --- a/gcc/lto/lto-partition.c
> > +++ b/gcc/lto-partition.c
> > @@ -170,7 +170,11 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
> >      {
> >        struct cgraph_edge *e;
> >        if (!node->alias && c == SYMBOL_PARTITION)
> > -	part->insns += ipa_size_summaries->get (cnode)->size;
> > +	{
> > +	  /* FIXME: Find out why this is being returned NULL in some cases.  */
> > +	  if (ipa_size_summaries->get (cnode))
> > +	    part->insns += ipa_size_summaries->get (cnode)->size;
> 
> It returns NULL when the symbol summary is not available.  Either the symbol is
> not defined (in which case it should not be SYMBOL_PARTITION) or it is
> not computed (probably because of -O0?)
> > +/* Quickly balance partitions, trying to reach target_size in each of
> > +   them.  Returns true if something was done, or false if we decided
> > +   that it is not worth.  */
> > +
> > +static bool
> > +balance_partitions (union_find *ds, int n, int jobs)
> 
> Generally, parameters should be documented, so I would learn what N means ;)

Woops.  n is just the number of nodes.

> > +{
> > +  int *sizes, i, j;
> > +  int total_size = 0, max_size = -1;
> > +  int target_size;
> > +  const int eps = 0;
> > +
> > +  symtab_node *node;
> > +
> > +  sizes = (int *) alloca (n * sizeof (*sizes));
> > +  memset (sizes, 0, n * sizeof (*sizes));
> 
> And we avoid using alloca for arrays that can grow beyond the stack limit.
> I assume this has the size of the symbol table, which means that you want to
> use the vec.h API and allocate on the heap.

Okay.

> > + 
> > +  /* Compute costs.  */
> > +  i = 0;
> > +  FOR_EACH_SYMBOL (node)
> > +    {
> > +      int root = ds->find (i);
> > +
> > +      if (cgraph_node *cnode = dyn_cast<cgraph_node *> (node))
> > +	{
> > +	  ipa_size_summary *summary = ipa_size_summaries->get (cnode);
> > +	  if (summary)
> > +	    sizes[root] += summary->size;
> > +	  else
> > +	    sizes[root] += 10;
> > +	}
> > +      else
> > +	sizes[root] += 10;
> 
> Do you have testcases where the summary is missing?

This may be related to an earlier problem.  I will just remove these and
check whether I hit them again.

> > +
> > +
> > +      i++;
> > +    }
> > +
> > +  /* Compute the total size and maximum size.  */
> > +  for (i = 0; i < n; ++i)
> > +    {
> > +      total_size += sizes[i];
> > +      max_size    = MAX (max_size, sizes[i]);
> 
> Also, I think we should start using 64-bit values for the total sizes of
> units. In some extreme cases they already get close to overflow.

Okay :)

> > +    }
> > +
> > +  /* Quick return if total size is small.  */
> > +  if (total_size < param_min_partition_size)
> > +    return false;
> > +
> > +  target_size = total_size / (jobs + 1);
> > +
> > +  /* Unite small partitions.  */
> > +  for (i = 0, j = 0; j < n; ++j)
> > +    {
> > +      if (sizes[j] == 0)
> > +	continue;
> > +
> > +      if (i == -1)
> > +	i = j;
> > +      else
> > +	{
> > +	  if (sizes[i] + sizes[j] < target_size + eps)
> > +	    {
> > +	      ds->unite (i, j);
> > +	      sizes[i] += sizes[j];
> > +	      sizes[j] = 0;
> > +	    }
> > +	  else
> > +	      i = j;
> > +	}
> > +    }
> > +  return true;
> 
> Note that partitioning is not free, since a reference to a static var or a
> call from one partition to another will lead to more expensive
> relocations/instructions being used (since gas will not be able to relax
> it to IP-relative addressing where available).
> 
> For that reason the LTO partitioner has logic tracking the boundary and
> minimizing it.  I think we should merge them and also clean up
> lto/lto-partition.c - the partitioner was one late-night experiment that
> I intended as a first proof of concept to be rewritten later, but we
> never got around to implementing anything much smarter. On the other hand, we
> added a lot of extra hacks to it (order preserving and other things), so
> it deserves some TLC.

You mean the balanced_map or the add_node_to_partition?

> 
> Also I think you unite partitions in the FOR_EACH_* order that is not
> really meaningful, the code layout is controlled by node->order values.

This was mainly to improve compilation performance and did not take the
code layout into account. Maybe the best strategy is to "fix" the LTO
partitioner so that we could use it for this, instead of implementing
a new one.
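
For illustration, here is a minimal sketch of the "unite small partitions" loop with two adjustments discussed in this thread. First, sizes are accumulated in 64-bit integers. Second, the running cursor starts out unset (-1). As far as I can tell, the quoted loop initializes `i = 0` while testing `i == -1`, so its first non-empty slot could be united with itself. The function name and the `Unite` callback are hypothetical. Like the quoted code, the sketch walks sets in index order rather than by node->order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

/* Greedily fold each non-empty set into the current group while the
   combined size stays below TARGET.  UNITE merges two sets (e.g. a
   union-find).  Returns the number of resulting groups.  */
template <typename Unite>
int
merge_small_partitions (std::vector<int64_t> &sizes, int64_t target,
                        Unite unite)
{
  assert (target > 0);
  int cur = -1;   /* Group currently being filled; -1 means none yet.  */
  int groups = 0;
  for (int j = 0; j < (int) sizes.size (); ++j)
    {
      if (sizes[j] == 0)
        continue;               /* Not a set root, or an empty set.  */
      if (cur != -1 && sizes[cur] + sizes[j] < target)
        {
          unite (cur, j);
          sizes[cur] += sizes[j];
          sizes[j] = 0;
        }
      else
        {
          cur = j;              /* Start a new group at J.  */
          groups++;
        }
    }
  return groups;
}
```

With sizes {10, 0, 10, 10, 50, 5} and a target of 25, only sets 0 and 2 are united. That leaves four groups.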

> 
> > +}
> > +
> > +/* Builds the LTRANS partitions, or return if not needed.  */
> > +
> > +static int
> > +build_ltrans_partitions (union_find *ds, int n)
> > +{
> > +  int i, n_partitions;
> > +  symtab_node *node;
> > +
> > +  int *compression = (int *) alloca (n * sizeof (*compression));
> > +  for (i = 0; i < n; ++i)
> > +    compression[i] = -1; /* Invalid value.  */
> > +
> > +  i = 0, n_partitions = 0;
> > +  FOR_EACH_SYMBOL (node)
> > +    {
> > +      int root = ds->find (i);
> > +      node->aux2 = root;
> > +      node->aux = NULL;
> > +
> > +      if (node->get_partitioning_class () == SYMBOL_PARTITION
> > +	  && compression[root] < 0)
> > +	compression[root] = n_partitions++;
> > +      i++;
> > +    }
> 
> What exactly compression is used for?

This is coordinate compression.  Partitions generated by my partitioner
are identified by an integer in 0, ..., n-1, where n is the
number of nodes in the graph, but we may have only m partitions, where
0 < m <= n.  What this does is map these 0, ..., n-1 identifiers to
unique numbers in 0, ..., m-1.  If you take a closer look, the
algorithm resembles counting sort.
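
Here is a minimal, self-contained sketch of the same coordinate-compression idea. Sparse set roots in 0..n-1 are mapped to dense ids 0..m-1 in order of first appearance. The function name `compress_ids` is hypothetical and only mirrors the role of the `compression` array in build_ltrans_partitions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/* Map each root in ROOTS (values in 0..N-1) to a dense id in 0..m-1,
   assigned in order of first appearance.  Stores m in *N_DENSE.  */
std::vector<int>
compress_ids (const std::vector<int> &roots, int n, int *n_dense)
{
  std::vector<int> compression (n, -1);   /* -1: root not seen yet.  */
  std::vector<int> dense (roots.size ());
  int m = 0;
  for (std::size_t i = 0; i < roots.size (); ++i)
    {
      int r = roots[i];
      assert (r >= 0 && r < n);
      if (compression[r] < 0)
        compression[r] = m++;
      dense[i] = compression[r];
    }
  *n_dense = m;
  return dense;
}
```

For roots {4, 4, 1, 4, 7, 1} with n = 8, this yields m = 3 and dense ids {0, 0, 1, 0, 2, 1}. It takes one linear pass plus an O(n) table, as in counting sort.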

> > +
> > +  if (dump_file)
> > +    fprintf (dump_file, "n_partitions = %d\n", n_partitions);
> > +
> > +  if (n_partitions <= 1)
> > +    return false;
> > +
> > +  /* Create LTRANS partitions.  */
> > +  ltrans_partitions.create (n_partitions);
> > +  for (i = 0; i < n_partitions; i++)
> > +    new_partition ("");
> > +
> > +  FOR_EACH_SYMBOL (node)
> > +    {
> > +      if (node->get_partitioning_class () != SYMBOL_PARTITION
> > +	  || symbol_partitioned_p (node))
> > +	  continue;
> > +
> > +      int p = compression[node->aux2];
> > +      if (dump_file)
> > +	fprintf (dump_file, "p = %d\t;; %s\n", p, node->dump_name ());
> > +      add_symbol_to_partition (ltrans_partitions[p], node);
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Partition COMDAT groups together, and also bring together nodes that
> > +   requires them. Such nodes that are not in the COMDAT group that have
> > +   references to COMDAT grouped nodes are called the COMDAT frontier.  */
> > +
> > +static bool
> > +merge_comdat_nodes (symtab_node *node, int set)
> > +{
> > +  enum symbol_partitioning_class c = node->get_partitioning_class ();
> > +  bool ret = false;
> > +  symtab_node *node1;
> > +  cgraph_edge *e;
> > +
> > +  /* If node is already analysed, quickly return.  */
> > +  if (node->aux)
> > +    return false;
> > +
> > +  /* Mark as analysed.  */
> > +  node->aux = (void *) 1;
> > +
> > +
> > +  /* Aglomerate the COMDAT group into the same partition.  */
> > +  if (node->same_comdat_group)
> > +    {
> > +      for (node1 = node->same_comdat_group;
> > +	   node1 != node; node1 = node1->same_comdat_group)
> > +	if (!node->alias)
> > +	  {
> > +	    ds->unite (node1->aux2, set);
> > +	    merge_comdat_nodes (node1, set);
> > +	    ret = true;
> > +	  }
> > +    }
> > +
> > +  /* Look at nodes that can reach the COMDAT group, and aglomerate to the
> > +     same partition.  These nodes are called the "COMDAT Frontier".  The
> > +     idea is that every unpartitioned node that reaches a COMDAT group MUST
> > +     go through the COMDAT frontier before reaching it.  Therefore, only
> > +     nodes in the frontier are exported.  */
> > +  if (node->same_comdat_group || c == SYMBOL_DUPLICATE)
> > +    {
> > +      int i;
> > +      struct ipa_ref *ref = NULL;
> > +
> > +      if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
> > +	{
> > +	  /* Add all inline clones and callees that are duplicated.  */
> > +	  for (e = cnode->callers; e; e = e->next_caller)
> > +	    if (!e->inline_failed || c == SYMBOL_DUPLICATE)
> > +	      {
> > +		ds->unite (set, e->caller->aux2);
> > +		merge_comdat_nodes (e->caller, set);
> > +		ret = true;
> > +	      }
> > +
> > +	  /* Add all thunks associated with the function.  */
> > +	  for (e = cnode->callees; e; e = e->next_callee)
> > +	    if (e->caller->thunk.thunk_p && !e->caller->inlined_to)
> > +	      {
> > +		ds->unite (set, e->callee->aux2);
> > +		merge_comdat_nodes (e->callee, set);
> > +		ret = true;
> > +	      }
> > +	}
> 
> I do not think it is strictly necessary to merge comdat function with
> all users. If the comdat is some easy accessor this may prevent a lot of
> merging.
> 
> All you need to do is to place it into one of the partitions and set the
> "used" flag on the symbols used in other partitions, so it does not get
> optimized away when unused.  Other partitions should refer to it w/o
> problems.
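
Here is a rough sketch of the alternative described above, using a hypothetical toy model rather than GCC's symtab API. Each symbol (including a COMDAT) keeps a single partition, so there is no merging with all of its users. Only the targets of references that cross partitions are flagged, so they are kept even if unused locally. The types and function names are illustrative assumptions.

```cpp
#include <cassert>
#include <vector>

/* Hypothetical reference edge: symbol FROM refers to symbol TO.  */
struct toy_ref { int from, to; };

/* Flag every symbol referenced from a partition other than its own.  */
std::vector<bool>
mark_cross_partition_uses (const std::vector<int> &partition_of,
                           const std::vector<toy_ref> &refs)
{
  std::vector<bool> used_elsewhere (partition_of.size (), false);
  for (const toy_ref &r : refs)
    {
      assert (r.from >= 0 && r.from < (int) partition_of.size ());
      assert (r.to >= 0 && r.to < (int) partition_of.size ());
      if (partition_of[r.from] != partition_of[r.to])
        used_elsewhere[r.to] = true;
    }
  return used_elsewhere;
}
```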

I guess I will need your help in the future with this, then.  In any
case, the speedups were quite OK even with this, so I will probably leave it
as is, so it can be improved incrementally.

> 
> I am not sure what exactly the goal is here about not changing code
> layout.
> 
> > +/* Partition the program into several partitions with a restriction that
> > +   COMDATS are partitioned together with all nodes requiring them.  If
> > +   promote_statics is false, we also partition together static functions
> > +   and nodes that call eachother, so non-public functions are not promoted
> > +   to globals.  */
> > +
> > +void
> > +lto_merge_comdat_map (bool balance, bool promote_statics, int jobs)
> 
> BTW, I think it is an odd name for a partitioner. COMDAT is just one of the
> problems it deals with. But I would still like to see this merged with
> the LTO partitioning logic, so we could also use all kinds of
> partitioners in both modes.

Well... it does merge COMDATs into the same partition :D

> 
> It all looks quite nice, but lets work on avoiding the code duplication
> here...

Thank you for your detailed review.
Giuliano.

> 
> Honza

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.
  2020-08-24 15:13   ` Giuliano Belinassi
@ 2020-08-29 11:31     ` Jan Hubicka
  2020-08-31  8:15       ` Richard Biener
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Hubicka @ 2020-08-29 11:31 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: Richard Biener, GCC Patches

> 
> It not only creates hidden symbols, but it also changes the original
> symbol name to avoid clashes with other object files. It would be
> very nice to avoid doing this at all.
> 
> There was once an idea (I don't remember if from Richi or Honza) to
> avoid using partial linking, but instead concatenate the generated
> assembler files into a single assembler file and assemble it.  This
> would remove the requirement of symbol promotion, as they would be
> in the same assembler file, but I am not sure if this would work out
> of the box (i.e. if GCC generates assembler that could be concatenated
> together).

Not out of the box, because we number labels, which will clash, but we
also do smarter things like producing constant pools based on whole-unit
knowledge.

But it should not be technically that hard: simply initialize asm output
to one file before the fork, in each fork arrange a separate file and avoid
printing the end-of-file stuff, then concatenate in the main compiler
and work out the places where this breaks...

Honza
> 
> > 
> > > Bootstrapped and Regtested on Linux x86_64.
> > >
> > > Giuliano Belinassi (6):
> > >   Modify gcc driver for parallel compilation
> > >   Implement a new partitioner for parallel compilation
> > >   Implement fork-based parallelism engine
> > >   Add `+' for Jobserver Integration
> > >   Add invoke documentation
> > >   New tests for parallel compilation feature
> > >
> > >  gcc/Makefile.in                               |    6 +-
> > >  gcc/cgraph.c                                  |   16 +
> > >  gcc/cgraph.h                                  |   13 +
> > >  gcc/cgraphunit.c                              |  198 ++-
> > >  gcc/common.opt                                |    4 +
> > >  gcc/doc/invoke.texi                           |   32 +-
> > >  gcc/gcc.c                                     | 1219 +++++++++++++----
> > >  gcc/ipa-fnsummary.c                           |    2 +-
> > >  gcc/ipa-icf.c                                 |    3 +-
> > >  gcc/ipa-visibility.c                          |    3 +-
> > >  gcc/ipa.c                                     |    4 +-
> > >  gcc/jobserver.cc                              |  168 +++
> > >  gcc/jobserver.h                               |   33 +
> > >  gcc/lto-cgraph.c                              |  172 +++
> > >  gcc/{lto => }/lto-partition.c                 |  463 ++++++-
> > >  gcc/{lto => }/lto-partition.h                 |    4 +-
> > >  gcc/lto-streamer.h                            |    4 +
> > >  gcc/lto/Make-lang.in                          |    4 +-
> > >  gcc/lto/lto.c                                 |    2 +-
> > >  gcc/params.opt                                |    8 +
> > >  gcc/symtab.c                                  |   46 +-
> > >  gcc/testsuite/driver/a.c                      |    6 +
> > >  gcc/testsuite/driver/b.c                      |    6 +
> > >  gcc/testsuite/driver/driver.exp               |   80 ++
> > >  gcc/testsuite/driver/empty.c                  |    0
> > >  gcc/testsuite/driver/foo.c                    |    7 +
> > >  .../gcc.dg/parallel-early-constant.c          |   22 +
> > >  gcc/testsuite/gcc.dg/parallel-static-1.c      |   21 +
> > >  gcc/testsuite/gcc.dg/parallel-static-2.c      |   21 +
> > >  .../gcc.dg/parallel-static-clash-1.c          |   23 +
> > >  .../gcc.dg/parallel-static-clash-aux.c        |   14 +
> > >  gcc/toplev.c                                  |   58 +-
> > >  gcc/toplev.h                                  |    3 +
> > >  gcc/tree.c                                    |   23 +-
> > >  gcc/varasm.c                                  |   26 +-
> > >  intl/Makefile.in                              |    2 +-
> > >  libbacktrace/Makefile.in                      |    2 +-
> > >  libcpp/Makefile.in                            |    2 +-
> > >  libdecnumber/Makefile.in                      |    2 +-
> > >  libiberty/Makefile.in                         |  212 +--
> > >  zlib/Makefile.in                              |   64 +-
> > >  41 files changed, 2539 insertions(+), 459 deletions(-)
> > >  create mode 100644 gcc/jobserver.cc
> > >  create mode 100644 gcc/jobserver.h
> > >  rename gcc/{lto => }/lto-partition.c (78%)
> > >  rename gcc/{lto => }/lto-partition.h (89%)
> > >  create mode 100644 gcc/testsuite/driver/a.c
> > >  create mode 100644 gcc/testsuite/driver/b.c
> > >  create mode 100644 gcc/testsuite/driver/driver.exp
> > >  create mode 100644 gcc/testsuite/driver/empty.c
> > >  create mode 100644 gcc/testsuite/driver/foo.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/parallel-early-constant.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-1.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-2.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-1.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-aux.c
> > >
> > > --
> > > 2.28.0
> > >

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/6] Implement fork-based parallelism engine
  2020-08-27 18:27     ` Giuliano Belinassi
@ 2020-08-29 11:41       ` Jan Hubicka
  2020-08-31  9:33       ` Richard Biener
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Hubicka @ 2020-08-29 11:41 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: gcc-patches, richard.guenther

> Hi, Honza.
> 
> Thank you for your detailed review!
> 
> On 08/27, Jan Hubicka wrote:
> > > diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> > > index c0b45795059..22405098dc5 100644
> > > --- a/gcc/cgraph.c
> > > +++ b/gcc/cgraph.c
> > > @@ -226,6 +226,22 @@ cgraph_node::delete_function_version_by_decl (tree decl)
> > >    decl_node->remove ();
> > >  }
> > >  
> > > +/* Release function dominator info if present.  */
> > > +
> > > +void
> > > +cgraph_node::maybe_release_dominators (void)
> > > +{
> > > +  struct function *fun = DECL_STRUCT_FUNCTION (decl);
> > > +
> > > +  if (fun && fun->cfg)
> > > +    {
> > > +      if (dom_info_available_p (fun, CDI_DOMINATORS))
> > > +	free_dominance_info (fun, CDI_DOMINATORS);
> > > +      if (dom_info_available_p (fun, CDI_POST_DOMINATORS))
> > > +	free_dominance_info (fun, CDI_POST_DOMINATORS);
> > > +    }
> > > +}
> > 
> > I am not sure if that needs to be a member function, but if so we want to
> > merge it with other places in cgraph.c and cgraphunit.c where dominators
> > are freed.  I do not think you need to check availability.
> 
> This is necessary to remove some nodes from the callgraph.  For some
> reason, if I call node->remove () while it still has dominance info
> available, it will fail some assertions in the compiler.
> 
> However, with regard to code layout, this can be moved to lto-cgraph.c,
> as it is only used there.

node->remove () is supposed to work here.  I realize that we currently remove
random nodes only via the unreachable code removal, which also
does node->remove_body (), so there may be some bug.
> > > +  /* Get symtab node by order.  */
> > > +  static symtab_node *find_by_order (int order);
> > 
> > This is quadratic and moreover seems unused. Why do you add it?
> 
> I added this for debugging, since I used this a lot inside GDB.
> Sure, I can remove this without any problems, or print a warning
> for the developer to avoid calling this in production code.

We could keep this as a separate patch, since it is independent of the rest
of the changes.  Having more debug_* functions is fine with me, but I
would like to avoid having non-debug members, since sooner or later
someone will use them in non-debug code.
> > > +  /* FIXME: Find out why body-removed nodes are marked for output.  */
> > > +  if (body_removed)
> > > +    return;
> > 
> > Indeed, we should know :)
> 
> Looks like this was due to an earlier problem. I removed this and bootstrap
> is working OK.

Cool, random changes like this look suspicious :)
> > 
> > This looks odd, we have other places where we parse number from command
> > line :)
> 
> isdigit () is poisoned in GCC. But I guess I should look how -flto=
> does this.

It seems we simply call atoi and treat 0 as invalid.
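
Here is a minimal sketch of decoding the three documented -fparallel-jobs= forms (N, jobserver, auto) in that style. Keywords are checked first, then atoi, with any result <= 0 rejected. The enum and function names are hypothetical. Note that atoi also accepts trailing garbage such as "4x" as 4.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

/* Hypothetical decoded form of -fparallel-jobs=<arg>.  */
enum parallel_jobs_mode { PJ_INVALID, PJ_COUNT, PJ_JOBSERVER, PJ_AUTO };

/* Decode ARG; *JOBS receives N for the numeric form, else 0.  */
parallel_jobs_mode
parse_parallel_jobs (const char *arg, int *jobs)
{
  assert (arg && jobs);
  *jobs = 0;
  if (std::strcmp (arg, "jobserver") == 0)
    return PJ_JOBSERVER;
  if (std::strcmp (arg, "auto") == 0)
    return PJ_AUTO;
  int n = std::atoi (arg);    /* Returns 0 for non-numeric input.  */
  if (n <= 0)
    return PJ_INVALID;
  *jobs = n;
  return PJ_COUNT;
}
```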
> > 
> > Messing with WPA/ltrans flags is not good idea.  You already have
> > split_output for parallel build.  I sort of see why you set ltrans, but
> > why WPA?
> 
> Some assertions expected flag_wpa to be present.  Sure, I can add
> split_outputs in these assertions.

Yep, that would be cleaner. 
> > 
> > There is boundary computation in lto-cgraph.c so we should merge the
> > logic...
> 
> Agree.
> 
> > If you keep the lto-partition datastructures it will compute boundary
> > for you and you can just remove the rest (I got that working at one
> > point).
> 
> This is interesting, because I could not get that working out of the
> box. lto_promote_cross_file_statics did not provide a fully working
> boundary that would let me simply remove everything else. If you take a
> closer look, you will see that I am using the already computed boundary
> as a base, and extending it with the extra stuff required
> (mainly inline clone bodies).

What problems exactly you run into when you remove the extra code?
> > 
> > Aha, that is what it does :)
> > I wonder if creating the file conditionally (and late) can not lead to
> > race condition where we will use same tmp file name for other build
> > executed in parallel by make.
> 
> Hmm. True. Added to my TODO list :)
> Well, I never had any sort of issues with race conditions here, even
> after stressing it, but this certainly is not proof that it is free of
> race conditions :)

I really do not know how this works, since we create other files with
delay too.  I suppose we get a tmp filename that is tested to be unique
and then we simply assume that we can derive names from it.
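
Here is a minimal POSIX sketch of that scheme, under stated assumptions. The base name is reserved by creating it atomically with mkstemp (). A derived name (base + suffix) is then opened with O_CREAT | O_EXCL, so a clash with another process fails loudly instead of being silently reused. The function name and the "/tmp/cc-split-" prefix are hypothetical, not what the GCC driver actually uses.

```cpp
#include <cassert>
#include <fcntl.h>
#include <stdlib.h>
#include <string>
#include <unistd.h>

/* Reserve a unique base via mkstemp (), then exclusively create
   BASE + SUFFIX.  Returns the derived fd, or -1 on failure.  */
int
create_derived_tmp (std::string *base, const char *suffix)
{
  assert (base && suffix);
  char templ[] = "/tmp/cc-split-XXXXXX";   /* Hypothetical prefix.  */
  int base_fd = mkstemp (templ);           /* Atomic, unique creation.  */
  if (base_fd < 0)
    return -1;
  close (base_fd);   /* The empty file stays as the name reservation.  */
  *base = templ;
  std::string derived = *base + suffix;
  /* O_EXCL: fail rather than reuse a file someone else created.  */
  return open (derived.c_str (), O_CREAT | O_EXCL | O_WRONLY, 0600);
}
```

The reserved base file must outlive every derived name. Otherwise another mkstemp () caller could later pick the same base.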

Honza

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.
  2020-08-29 11:31     ` Jan Hubicka
@ 2020-08-31  8:15       ` Richard Biener
  2020-08-31 11:44         ` Jan Hubicka
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Biener @ 2020-08-31  8:15 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Giuliano Belinassi, GCC Patches

On Sat, Aug 29, 2020 at 1:31 PM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> >
> > It not only creates hidden symbols, but it also changes the original
> > symbol name to avoid clashes with other object files. It would be
> > very nice to avoid doing this at all.
> >
> > There was once an idea (I don't remember if from Richi or Honza) to
> > avoid using partial linking, but instead concatenate the generated
> > assembler files into a single assembler file and assemble it.  This
> > would remove the requirement of symbol promotion, as they would be
> > in the same assembler file, but I am not sure if this would work out
> > of the box (i.e. if GCC generates assembler that could be concatenated
> > together).
>
> Not out of the box, because we number labels, which will clash, but we
> also do smarter things like producing constant pools based on whole-unit
> knowledge.
>
> But it should not be technically that hard: simply initialize asm output
> to one file before the fork, in each fork arrange a separate file and avoid
> printing the end-of-file stuff, then concatenate in the main compiler
> and work out the places where this breaks...

Yeah.  Or even refactor the output machinery so that in theory
we can create asm fragments [into memory] for functions and variables
and only at the end concat/output them in a desired order  (cf.
-fno-toplevel-reorder
wrt toporder code-generation we'd prefer).  That would also be one (small)
step towards eventually being able to thread the RTL pipeline.
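
Here is a minimal sketch of the "asm fragments in memory" idea. Each symbol's assembly is buffered with its original order number, and everything is emitted at the end sorted by that order, as -fno-toplevel-reorder would require. The fragment type and function are hypothetical and do not reflect GCC's actual output machinery.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

/* Hypothetical buffered output for one function or variable.  */
struct asm_fragment
{
  int order;         /* Original position of the symbol in the unit.  */
  std::string text;  /* Assembly produced for that symbol.  */
};

/* Concatenate FRAGS in ascending ORDER, regardless of production order.  */
std::string
concat_fragments (std::vector<asm_fragment> frags)
{
  /* Stable sort keeps fragments with equal order in production order.  */
  std::stable_sort (frags.begin (), frags.end (),
                    [] (const asm_fragment &a, const asm_fragment &b)
                    { return a.order < b.order; });
  std::string out;
  for (const asm_fragment &f : frags)
    {
      assert (!f.text.empty ());
      out += f.text;
    }
  return out;
}
```

Because fragments can be produced in any order, this shape would also allow them to come from several workers.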

Note that in theory the auto-parallel work could be leveraged to
elide LTRANS streaming if we'd drive LTRANS compile from WPA
instead of from LTO wrapper.  We could simply do the forking,
apply the partition and then load bodies from the original IL files
(with the caveat of needing GC to truncate the decls and types
memory from WPA).

But yes, avoiding the partial link would be nice - it would also
be possible to support parallelizing -S compiles.  And it would
avoid the symbol renaming.

Richard.

> Honza

* Re: [PATCH 2/6] Implement a new partitioner for parallel compilation
  2020-08-20 22:00 ` [PATCH 2/6] Implement a new partitioner " Giuliano Belinassi
  2020-08-27 15:18   ` Jan Hubicka
@ 2020-08-31  9:25   ` Richard Biener
  1 sibling, 0 replies; 31+ messages in thread
From: Richard Biener @ 2020-08-31  9:25 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: GCC Patches, Jan Hubicka

On Fri, Aug 21, 2020 at 12:00 AM Giuliano Belinassi
<giuliano.belinassi@usp.br> wrote:
>
> When using the LTO infrastructure to compile files in parallel, we
> can't simply use any of the existing LTO partitioners, since extra
> dependency analysis is required to ensure that some nodes are
> correctly partitioned together.
>
> Therefore, here we implement a new partitioner called
> "lto_merge_comdat_map" that performs all of the required analysis.
> The partitioner works as follows:
>
> 1. We create a number of disjoint sets and insert each node into a
>    separate set, which may be merged with others later.
>
> 2. Find COMDAT groups, and mark them to be partitioned together.
>
> 3. Check all nodes that would require any COMDAT group to be
>    copied into their partition (which we call the "COMDAT frontier"),
>    and mark them to be partitioned together.
>    This avoids duplicating COMDAT groups and crashes in the LTO
>    partitioning infrastructure.
>
> 4. Check if the user allows the partitioner to promote non-public
>    functions or variables to globals, which improves parallelization
>    opportunities at the cost of modifying the output code layout.
>
> 5. Balance the generated partitions for performance, unless told not to.
>
> The choice of 1. was by design, so we could use a union-find
> data structure, which is known for being very fast at set union
> operations.
>
> For 3. to work properly, we also had to modify
> lto_promote_cross_file_statics to handle this case.
>
> The parameters --param=promote-statics and --param=balance-partitions
> control 4. and 5., respectively.

Just a few comments on top of Honza's remarks:

> gcc/ChangeLog:
> 2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>
>         * Makefile.in: Add lto-partition.o
>         * cgraph.h (struct symtab_node::aux2): New variable.
>         * lto-partition.c: Move from gcc/lto/lto-partition.c
>         (add_symbol_to_partition_1): Only compute insn size
>         if information is available.
>         (node_cmp): Same as above.
>         (class union_find): New.
>         (ds_print_roots): New function.
>         (balance_partitions): New function.
>         (build_ltrans_partitions): New function.
>         (merge_comdat_nodes): New function.
>         (merge_static_calls): New function.
>         (merge_contained_symbols): New function.
>         (lto_merge_comdat_map): New function.
>         (privatize_symbol_name_1): Handle when WPA is not enabled.
>         (privatize_symbol_name): Same as above.
>         (lto_promote_cross_file_statics): New parameter to select when
>         to promote to global.
>         (lto_check_usage_from_other_partitions): New function.
>         * lto-partition.h: Move from gcc/lto/lto-partition.h
>         (lto_promote_cross_file_statics): Update prototype.
>         (lto_check_usage_from_other_partitions): Declare.
>         (lto_merge_comdat_map): Declare.
>
> gcc/lto/ChangeLog:
> 2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>
>
>         * lto-partition.c: Move to gcc/lto-partition.c.
>         * lto-partition.h: Move to gcc/lto-partition.h.
>         * lto.c: Update call to lto_promote_cross_file_statics.
>         * Makefile.in: Remove lto-partition.o.
> ---
>  gcc/Makefile.in               |   1 +
>  gcc/cgraph.h                  |   1 +
>  gcc/{lto => }/lto-partition.c | 463 +++++++++++++++++++++++++++++++++-
>  gcc/{lto => }/lto-partition.h |   4 +-
>  gcc/lto/Make-lang.in          |   4 +-
>  gcc/lto/lto.c                 |   2 +-
>  gcc/params.opt                |   8 +
>  gcc/tree.c                    |  23 +-
>  8 files changed, 489 insertions(+), 17 deletions(-)
>  rename gcc/{lto => }/lto-partition.c (78%)
>  rename gcc/{lto => }/lto-partition.h (89%)
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 79e854aa938..be42b15f4ff 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1457,6 +1457,7 @@ OBJS = \
>         lra-spills.o \
>         lto-cgraph.o \
>         lto-streamer.o \
> +       lto-partition.o \
>         lto-streamer-in.o \
>         lto-streamer-out.o \
>         lto-section-in.o \
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 0211f08964f..b4a7871bd3d 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -615,6 +615,7 @@ public:
>    struct lto_file_decl_data * lto_file_data;
>
>    PTR GTY ((skip)) aux;
> +  int aux2;
>
>    /* Comdat group the symbol is in.  Can be private if GGC allowed that.  */
>    tree x_comdat_group;
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto-partition.c
> similarity index 78%
> rename from gcc/lto/lto-partition.c
> rename to gcc/lto-partition.c
> index 8e0488ab13e..ca962e69b5d 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto-partition.c
> @@ -170,7 +170,11 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
>      {
>        struct cgraph_edge *e;
>        if (!node->alias && c == SYMBOL_PARTITION)
> -       part->insns += ipa_size_summaries->get (cnode)->size;
> +       {
> +         /* FIXME: Find out why this is being returned NULL in some cases.  */
> +         if (ipa_size_summaries->get (cnode))
> +           part->insns += ipa_size_summaries->get (cnode)->size;
> +       }
>
>        /* Add all inline clones and callees that are duplicated.  */
>        for (e = cnode->callees; e; e = e->next_callee)
> @@ -372,6 +376,402 @@ lto_max_map (void)
>      new_partition ("empty");
>  }
>
> +/* Class implementing a union-find algorithm.  */
> +
> +class union_find
> +{
> +public:
> +
> +  int *parent;
> +  int *rank;
> +  int n;
> +  int successful_unions;
> +
> +  union_find (int num_nodes)
> +    {
> +      n = num_nodes;
> +      parent = XNEWVEC (int, n);
> +      rank   = XNEWVEC (int, n);
> +
> +      for (int i = 0; i < n; ++i)
> +       parent[i] = i;
> +
> +      memset (rank, 0, n*sizeof(*rank));
> +      successful_unions = 0;
> +    }
> +
> +  ~union_find ()
> +    {
> +      free (parent);
> +      free (rank);
> +    }
> +
> +  int find (int x)
> +    {
> +      while (parent[x] != x)
> +       {
> +         parent[x] = parent[parent[x]];
> +         x = parent[x];
> +       }
> +      return x;
> +    }
> +
> +  void unite (int x, int y)
> +    {
> +      int x_root = find (x);
> +      int y_root = find (y);
> +
> +      if (x_root == y_root) /* If x and y are in same set.  */
> +       return;
> +
> +      successful_unions++;
> +
> +      if (rank[x_root] > rank[y_root]) /* Get which ones have greater rank.  */
> +       {
> +         x_root ^= y_root; /* Swap.  */
> +         y_root ^= x_root;
> +         x_root ^= y_root;

You can use std::swap (x_root, y_root); which is nicer to read.
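
A minimal standalone sketch of unite () written that way; the struct name union_find_sketch is made up, and it uses std::vector instead of XNEWVEC so it compiles on its own.  Note that it swaps when x_root has the *lower* rank, so the shallower tree is always hung below the deeper root.  The quoted code swaps in the opposite case, which still merges the correct sets but does not give the usual union-by-rank depth guarantee.

```cpp
#include <utility>
#include <vector>

// Standalone union-find with path halving and union by rank.
struct union_find_sketch
{
  std::vector<int> parent, rank;
  int successful_unions = 0;

  explicit union_find_sketch (int n) : parent (n), rank (n, 0)
  {
    for (int i = 0; i < n; ++i)
      parent[i] = i;  // Every element starts in its own set.
  }

  int find (int x)
  {
    while (parent[x] != x)
      {
        parent[x] = parent[parent[x]];  // Path halving.
        x = parent[x];
      }
    return x;
  }

  void unite (int x, int y)
  {
    int x_root = find (x);
    int y_root = find (y);
    if (x_root == y_root)
      return;  // Already in the same set.
    successful_unions++;

    // Make x_root the root of the deeper (higher-rank) tree.
    if (rank[x_root] < rank[y_root])
      std::swap (x_root, y_root);

    parent[y_root] = x_root;  // Attach the shallower tree below it.
    if (rank[x_root] == rank[y_root])
      rank[x_root]++;
  }
};
```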

> +       }
> +
> +      parent[y_root] = x_root;
> +      if (rank[x_root] == rank[y_root])
> +       rank[x_root]++;
> +    }
> +
> +  void print_roots ()
> +    {
> +      int i;
> +      for (i = 0; i < n; ++i)
> +       printf ("%d, ", find (i));
> +      printf ("\n");
> +    }
> +};
> +
> +static union_find *ds;
> +
> +DEBUG_FUNCTION void ds_print_roots (void)
> +{
> +  ds->print_roots ();
> +}
> +
> +static bool
> +privatize_symbol_name (symtab_node *);
> +
> +static void
> +promote_symbol (symtab_node *);
> +
> +/* Quickly balance partitions, trying to reach target_size in each of
> +   them.  Returns true if something was done, or false if we decided
> +   that it is not worth.  */
> +
> +static bool
> +balance_partitions (union_find *ds, int n, int jobs)
> +{
> +  int *sizes, i, j;
> +  int total_size = 0, max_size = -1;
> +  int target_size;
> +  const int eps = 0;
> +
> +  symtab_node *node;
> +
> +  sizes = (int *) alloca (n * sizeof (*sizes));
> +  memset (sizes, 0, n * sizeof (*sizes));
> +
> +  /* Compute costs.  */
> +  i = 0;
> +  FOR_EACH_SYMBOL (node)
> +    {
> +      int root = ds->find (i);
> +
> +      if (cgraph_node *cnode = dyn_cast<cgraph_node *> (node))
> +       {
> +         ipa_size_summary *summary = ipa_size_summaries->get (cnode);
> +         if (summary)
> +           sizes[root] += summary->size;
> +         else
> +           sizes[root] += 10;
> +       }
> +      else
> +       sizes[root] += 10;
> +
> +
> +      i++;
> +    }
> +
> +  /* Compute the total size and maximum size.  */
> +  for (i = 0; i < n; ++i)
> +    {
> +      total_size += sizes[i];
> +      max_size    = MAX (max_size, sizes[i]);
> +    }
> +
> +  /* Quick return if total size is small.  */
> +  if (total_size < param_min_partition_size)
> +    return false;
> +
> +  target_size = total_size / (jobs + 1);
> +
> +  /* Unite small partitions.  */
> +  for (i = 0, j = 0; j < n; ++j)
> +    {
> +      if (sizes[j] == 0)
> +       continue;
> +
> +      if (i == -1)
> +       i = j;
> +      else
> +       {
> +         if (sizes[i] + sizes[j] < target_size + eps)
> +           {
> +             ds->unite (i, j);
> +             sizes[i] += sizes[j];
> +             sizes[j] = 0;
> +           }
> +         else
> +             i = j;
> +       }
> +    }
> +  return true;
> +}
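
The "Unite small partitions" loop seems to intend to start with no open partition (the `if (i == -1)` test), but i is initialised to 0, so that branch is dead and, when sizes[0] is non-zero, the first iteration can "merge" slot 0 with itself and reset sizes[0] to 0.  A minimal standalone sketch of the loop with the open index starting at -1; greedy_merge and its leader output are hypothetical and stand in for the ds->unite () calls:

```cpp
#include <vector>

// Hypothetical helper: greedily fold consecutive non-empty partitions
// together while the combined size stays below TARGET.  SIZES is updated
// in place; the result maps each index to the partition it ended up in
// (-1 for empty slots).
std::vector<int>
greedy_merge (std::vector<int> &sizes, int target)
{
  const int n = static_cast<int> (sizes.size ());
  std::vector<int> leader (n, -1);
  int open = -1;  // No partition is open yet (the patch's i == -1 case).

  for (int j = 0; j < n; ++j)
    {
      if (sizes[j] == 0)
        continue;  // Not a root: nothing to place.

      if (open != -1 && sizes[open] + sizes[j] < target)
        {
          // Fold J into the open partition (ds->unite (i, j) in the patch).
          sizes[open] += sizes[j];
          sizes[j] = 0;
          leader[j] = open;
        }
      else
        {
          // Nothing is open, or J does not fit: start a new partition at J.
          open = j;
          leader[j] = j;
        }
    }
  return leader;
}
```

For example, sizes {30, 0, 40, 50, 20} with a target of 100 yields two partitions of size 70, rooted at indices 0 and 3.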
> +
> +/* Builds the LTRANS partitions, or return if not needed.  */
> +
> +static int
> +build_ltrans_partitions (union_find *ds, int n)
> +{
> +  int i, n_partitions;
> +  symtab_node *node;
> +
> +  int *compression = (int *) alloca (n * sizeof (*compression));
> +  for (i = 0; i < n; ++i)
> +    compression[i] = -1; /* Invalid value.  */
> +
> +  i = 0, n_partitions = 0;
> +  FOR_EACH_SYMBOL (node)
> +    {
> +      int root = ds->find (i);
> +      node->aux2 = root;
> +      node->aux = NULL;
> +
> +      if (node->get_partitioning_class () == SYMBOL_PARTITION
> +         && compression[root] < 0)
> +       compression[root] = n_partitions++;
> +      i++;
> +    }
> +
> +  if (dump_file)
> +    fprintf (dump_file, "n_partitions = %d\n", n_partitions);
> +
> +  if (n_partitions <= 1)
> +    return false;
> +
> +  /* Create LTRANS partitions.  */
> +  ltrans_partitions.create (n_partitions);
> +  for (i = 0; i < n_partitions; i++)
> +    new_partition ("");
> +
> +  FOR_EACH_SYMBOL (node)
> +    {
> +      if (node->get_partitioning_class () != SYMBOL_PARTITION
> +         || symbol_partitioned_p (node))
> +         continue;
> +
> +      int p = compression[node->aux2];
> +      if (dump_file)
> +       fprintf (dump_file, "p = %d\t;; %s\n", p, node->dump_name ());
> +      add_symbol_to_partition (ltrans_partitions[p], node);
> +    }
> +
> +  return true;
> +}
> +
> +/* Partition COMDAT groups together, and also bring together nodes that
> +   requires them. Such nodes that are not in the COMDAT group that have
> +   references to COMDAT grouped nodes are called the COMDAT frontier.  */
> +
> +static bool
> +merge_comdat_nodes (symtab_node *node, int set)
> +{
> +  enum symbol_partitioning_class c = node->get_partitioning_class ();
> +  bool ret = false;
> +  symtab_node *node1;
> +  cgraph_edge *e;
> +
> +  /* If node is already analysed, quickly return.  */
> +  if (node->aux)
> +    return false;
> +
> +  /* Mark as analysed.  */
> +  node->aux = (void *) 1;
> +
> +
> +  /* Aglomerate the COMDAT group into the same partition.  */
> +  if (node->same_comdat_group)
> +    {
> +      for (node1 = node->same_comdat_group;
> +          node1 != node; node1 = node1->same_comdat_group)
> +       if (!node->alias)
> +         {
> +           ds->unite (node1->aux2, set);
> +           merge_comdat_nodes (node1, set);
> +           ret = true;
> +         }
> +    }
> +
> +  /* Look at nodes that can reach the COMDAT group, and aglomerate to the
> +     same partition.  These nodes are called the "COMDAT Frontier".  The
> +     idea is that every unpartitioned node that reaches a COMDAT group MUST
> +     go through the COMDAT frontier before reaching it.  Therefore, only
> +     nodes in the frontier are exported.  */
> +  if (node->same_comdat_group || c == SYMBOL_DUPLICATE)
> +    {
> +      int i;
> +      struct ipa_ref *ref = NULL;
> +
> +      if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
> +       {
> +         /* Add all inline clones and callees that are duplicated.  */
> +         for (e = cnode->callers; e; e = e->next_caller)
> +           if (!e->inline_failed || c == SYMBOL_DUPLICATE)
> +             {
> +               ds->unite (set, e->caller->aux2);
> +               merge_comdat_nodes (e->caller, set);
> +               ret = true;
> +             }
> +
> +         /* Add all thunks associated with the function.  */
> +         for (e = cnode->callees; e; e = e->next_callee)
> +           if (e->caller->thunk.thunk_p && !e->caller->inlined_to)
> +             {
> +               ds->unite (set, e->callee->aux2);
> +               merge_comdat_nodes (e->callee, set);
> +               ret = true;
> +             }
> +       }
> +
> +      for (i = 0; node->iterate_referring (i, ref); i++)
> +       {
> +         symtab_node *node1 = ref->referring;
> +         ds->unite (node1->aux2, set);
> +         ret = true;
> +
> +         if (node1->get_partitioning_class () == SYMBOL_DUPLICATE)
> +           merge_comdat_nodes (node1, set);
> +       }
> +    }
> +
> +  return ret;
> +}
> +
> +/* Bring together static nodes that are called by static functions, so
> +   promotion of statics to globals are not required.  This *MIGHT* negatively
> +   impact the number of partitions, and even generate very umbalanced
> +   partitions that can't be fixed.  */
> +
> +static bool
> +merge_static_calls (symtab_node *node, int set)
> +{
> +  bool ret = false;
> +  enum symbol_partitioning_class c = node->get_partitioning_class ();
> +
> +  if (node->aux)
> +    return false;
> +
> +  node->aux = (void *) 1;
> +
> +
> +  if (!TREE_PUBLIC (node->decl) || c == SYMBOL_DUPLICATE)
> +    {
> +      int i;
> +      struct ipa_ref *ref = NULL;
> +
> +      if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
> +       {
> +         for (cgraph_edge *e = cnode->callers; e; e = e->next_caller)
> +           {
> +             /* FIXME: In theory, inlined functions should be a criteria to not
> +                merge partitions.  */
> +             ds->unite (node->aux2, e->caller->aux2);
> +             merge_static_calls (e->caller, set);
> +             ret = true;
> +           }
> +
> +       }
> +
> +      for (i = 0; node->iterate_referring (i, ref); ++i)
> +       {
> +         symtab_node *node1 = ref->referring;
> +         ds->unite (node1->aux2, set);
> +         merge_static_calls (node1, set);
> +         ret = true;
> +       }
> +    }
> +
> +  return ret;
> +}
> +
> +static bool
> +merge_contained_symbols (symtab_node *node, int set)
> +{
> +  bool ret = false;
> +  symtab_node *node1;
> +
> +  while ((node1 = contained_in_symbol (node)) != node)
> +    {
> +      node = node1;
> +      ds->unite (node->aux2, set);
> +      ret = true;
> +    }
> +
> +  return ret;
> +}
> +
> +/* Partition the program into several partitions with a restriction that
> +   COMDATS are partitioned together with all nodes requiring them.  If
> +   promote_statics is false, we also partition together static functions
> +   and nodes that call eachother, so non-public functions are not promoted
> +   to globals.  */
> +
> +void
> +lto_merge_comdat_map (bool balance, bool promote_statics, int jobs)
> +{
> +  symtab_node *node;
> +  int n = 0;
> +
> +  /* Initialize each not into its own distinct disjoint sets.  */
> +  FOR_EACH_SYMBOL (node)
> +    node->aux2 = n++;
> +
> +  union_find disjoint_sets = union_find (n);
> +  ds = &disjoint_sets;
> +
> +  /* First look at COMDATs.  */
> +  FOR_EACH_SYMBOL (node)
> +    {
> +      if (node->same_comdat_group)
> +       merge_comdat_nodes (node, node->aux2);
> +      merge_contained_symbols (node, node->aux2);
> +    }
> +
> +  FOR_EACH_SYMBOL (node)
> +    node->aux = NULL;
> +
> +  /* Then look at STATICs, if needed.  */
> +  if (!promote_statics)
> +    FOR_EACH_SYMBOL (node)
> +      if (!TREE_PUBLIC (node->decl))
> +       merge_static_calls (node, node->aux2);
> +
> +  FOR_EACH_SYMBOL (node)
> +    node->aux = NULL;
> +
> +  if (balance && !balance_partitions (&disjoint_sets, n, jobs))
> +    return;
> +
> +  build_ltrans_partitions (&disjoint_sets, n);
> +}
> +
> +
>  /* Helper function for qsort; sort nodes by order.  */
>  static int
>  node_cmp (const void *pa, const void *pb)
> @@ -931,7 +1331,7 @@ static hash_map<const char *, unsigned> *lto_clone_numbers;
>     represented by DECL.  */
>
>  static bool
> -privatize_symbol_name_1 (symtab_node *node, tree decl)
> +privatize_symbol_name_1 (symtab_node *node, tree decl, bool wpa)
>  {
>    const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>
> @@ -939,11 +1339,19 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
>      return false;
>
>    name = maybe_rewrite_identifier (name);
> -  unsigned &clone_number = lto_clone_numbers->get_or_insert (name);
> -  symtab->change_decl_assembler_name (decl,
> -                                     clone_function_name (
> -                                         name, "lto_priv", clone_number));
> -  clone_number++;
> +  if (wpa)
> +    {
> +      gcc_assert (lto_clone_numbers);
> +
> +      unsigned &clone_number = lto_clone_numbers->get_or_insert (name);
> +      symtab->change_decl_assembler_name (decl,
> +                                         clone_function_name (
> +                                             name, "lto_priv", clone_number));
> +      clone_number++;
> +    }
> +  else
> +    symtab->change_decl_assembler_name (decl, get_file_function_name
> +                                       (node->asm_name ()));
>
>    if (node->lto_file_data)
>      lto_record_renamed_decl (node->lto_file_data, name,
> @@ -968,7 +1376,9 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
>  static bool
>  privatize_symbol_name (symtab_node *node)
>  {
> -  if (!privatize_symbol_name_1 (node, node->decl))
> +  bool wpa = !split_outputs;
> +
> +  if (!privatize_symbol_name_1 (node, node->decl, wpa))
>      return false;
>
>    return true;
> @@ -1117,7 +1527,7 @@ rename_statics (lto_symtab_encoder_t encoder, symtab_node *node)
>     all inlinees are added.  */
>
>  void
> -lto_promote_cross_file_statics (void)
> +lto_promote_cross_file_statics (bool promote)
>  {
>    unsigned i, n_sets;
>
> @@ -1147,10 +1557,17 @@ lto_promote_cross_file_statics (void)
>            lsei_next (&lsei))
>          {
>            symtab_node *node = lsei_node (lsei);
> +         cgraph_node *cnode = dyn_cast <cgraph_node *> (node);
>
>           /* If symbol is static, rename it if its assembler name
>              clashes with anything else in this unit.  */
>           rename_statics (encoder, node);
> +         if (cnode)
> +           {
> +             bool in_partition = lsei.encoder->nodes[lsei.index].in_partition;
> +             if (!in_partition)
> +               cnode->local = false;
> +           }
>
>           /* No need to promote if symbol already is externally visible ... */
>           if (node->externally_visible
> @@ -1163,8 +1580,12 @@ lto_promote_cross_file_statics (void)
>               validize_symbol_for_target (node);
>               continue;
>             }
> -
> -          promote_symbol (node);
> +         if (promote)
> +           {
> +             promote_symbol (node);
> +             if (cnode && split_outputs)
> +               cnode->local = false;
> +           }
>          }
>      }
>    delete lto_clone_numbers;
> @@ -1186,3 +1607,23 @@ lto_promote_statics_nonwpa (void)
>      }
>    delete lto_clone_numbers;
>  }
> +
> +/* Check if a variable is accessed across partitions.  If yesm then update
> +   used_from_other_partition.  */
> +
> +void
> +lto_check_usage_from_other_partitions (void)
> +{
> +  unsigned int i, j;
> +  for (i = 0; i < ltrans_partitions.length (); i++)
> +    {
> +      vec<lto_encoder_entry> &nodes = (ltrans_partitions[i])->encoder->nodes;
> +
> +      for (j = 0; j < nodes.length (); j++)
> +       {
> +         symtab_node *node = nodes[j].node;
> +         if (node && !nodes[j].in_partition)
> +           node->used_from_other_partition = true;
> +       }
> +    }
> +}
> diff --git a/gcc/lto/lto-partition.h b/gcc/lto-partition.h
> similarity index 89%
> rename from gcc/lto/lto-partition.h
> rename to gcc/lto-partition.h
> index 42b5ea8c80c..4a1b17fa728 100644
> --- a/gcc/lto/lto-partition.h
> +++ b/gcc/lto-partition.h
> @@ -36,6 +36,8 @@ extern vec<ltrans_partition> ltrans_partitions;
>  void lto_1_to_1_map (void);
>  void lto_max_map (void);
>  void lto_balanced_map (int, int);
> -void lto_promote_cross_file_statics (void);
> +void lto_promote_cross_file_statics (bool promote);
>  void free_ltrans_partitions (void);
>  void lto_promote_statics_nonwpa (void);
> +void lto_check_usage_from_other_partitions (void);
> +void lto_merge_comdat_map (bool, bool, int);
> diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in
> index 0b73f9ef7bb..46b52cff183 100644
> --- a/gcc/lto/Make-lang.in
> +++ b/gcc/lto/Make-lang.in
> @@ -24,9 +24,9 @@ LTO_EXE = lto1$(exeext)
>  LTO_DUMP_EXE = lto-dump$(exeext)
>  LTO_DUMP_INSTALL_NAME := $(shell echo lto-dump|sed '$(program_transform_name)')
>  # The LTO-specific object files inclued in $(LTO_EXE).
> -LTO_OBJS = lto/lto-lang.o lto/lto.o lto/lto-object.o attribs.o lto/lto-partition.o lto/lto-symtab.o lto/lto-common.o
> +LTO_OBJS = lto/lto-lang.o lto/lto.o lto/lto-object.o attribs.o lto/lto-symtab.o lto/lto-common.o
>  lto_OBJS = $(LTO_OBJS)
> -LTO_DUMP_OBJS = lto/lto-lang.o lto/lto-object.o attribs.o lto/lto-partition.o lto/lto-symtab.o lto/lto-dump.o lto/lto-common.o
> +LTO_DUMP_OBJS = lto/lto-lang.o lto/lto-object.o attribs.o lto/lto-symtab.o lto/lto-dump.o lto/lto-common.o
>  lto_dump_OBJS = $(LTO_DUMP_OBJS)
>
>  # this is only useful in a LTO bootstrap, but this does not work right
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index 1c37814bde4..803b9920e35 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -515,7 +515,7 @@ do_whole_program_analysis (void)
>    /* Find out statics that need to be promoted
>       to globals with hidden visibility because they are accessed from multiple
>       partitions.  */
> -  lto_promote_cross_file_statics ();
> +  lto_promote_cross_file_statics (true);
>    if (dump_file)
>       dump_end (partition_dump_id, dump_file);
>    dump_file = NULL;
> diff --git a/gcc/params.opt b/gcc/params.opt
> index f39e5d1a012..00fc58cd5cc 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -366,6 +366,14 @@ Minimal size of a partition for LTO (in estimated instructions).
>  Common Joined UInteger Var(param_lto_partitions) Init(128) IntegerRange(1, 65536) Param
>  Number of partitions the program should be split to.
>
> +-param=promote-statics=
> +Common Joined UInteger Var(param_promote_statics) Init(0) IntegerRange(0, 1) Param
> +Allow statics and non-public functions to be promoted as public when compiling in parallel.
> +
> +-param=balance-partitions=
> +Common Joined UInteger Var(param_balance_partitions) Init(1) IntegerRange(0, 1) Param
> +When compiling in parallel, try to balance the partitions for compilation performance.
> +
>  -param=max-average-unrolled-insns=
>  Common Joined UInteger Var(param_max_average_unrolled_insns) Init(80) Param Optimization
>  The maximum number of instructions to consider to unroll in a loop on average.
> diff --git a/gcc/tree.c b/gcc/tree.c
> index d0202c3f785..3ca162d5070 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -9595,6 +9595,24 @@ make_anon_name ()
>    return id;
>  }
>
> +/* Filter the input name removing characters that may confuse the linker.  */
> +
> +static void
> +filter_name (char *name)
> +{
> +  char *p = name;
> +
> +  while (*p != '\0')
> +    {
> +      switch (*p)
> +       {
> +         case '*':
> +           *p = '_';
> +       }
> +      p++;
> +    }
> +}
> +
>  /* Generate a name for a special-purpose function.
>     The generated name may need to be unique across the whole link.
>     Changes to this function may also require corresponding changes to
> @@ -9651,8 +9669,7 @@ get_file_function_name (const char *type)
>        q = (char *) alloca (9 + 19 + len + 1);
>        memcpy (q, file, len + 1);
>
> -      snprintf (q + len, 9 + 19 + 1, "_%08X_" HOST_WIDE_INT_PRINT_HEX,
> -               crc32_string (0, name), get_random_seed (false));
> +      snprintf (q + len, 9 + 19 + 1, "_%08X", crc32_string (0, name));
>
>        p = q;
>      }
> @@ -9665,7 +9682,9 @@ get_file_function_name (const char *type)
>       Use a global object (which is already required to be unique over
>       the program) rather than the file name (which imposes extra
>       constraints).  */
> +
>    sprintf (buf, FILE_FUNCTION_FORMAT, type, p);
> +  filter_name (buf);

I wonder why you need this - none of the other callers of
get_file_function_name do so?

I'd rather have you not use get_file_function_name but instead
modify the partitioner's symbol promotion code to append a
hash computed once by the partitioning code.  Like maybe simply
hash the cgraph structure somehow:  for each cgraph node in
UID order, hash the UID and merge the hashes of all callees (or if we
have some simple way, hash node UIDs in PRE order of the
graph?).
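
A minimal sketch of that idea, assuming a made-up toy_node type in place of cgraph_node and a boost-style hash_combine mix standing in for GCC's own inchash facility (toy_node, mix, hash_callgraph and promoted_name are all hypothetical names).  The suffix depends only on the graph shape, so it is reproducible from run to run, unlike a random seed.

```cpp
#include <algorithm>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a call-graph node.
struct toy_node
{
  int uid;
  std::vector<int> callee_uids;
};

// Boost-style hash_combine on 64-bit values.
static uint64_t
mix (uint64_t h, uint64_t v)
{
  return h ^ (v + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2));
}

// Visit nodes in UID order, hash each UID and fold in its callees' UIDs
// (sorted, so the order of the edge list does not matter).
uint64_t
hash_callgraph (std::vector<toy_node> nodes)
{
  std::sort (nodes.begin (), nodes.end (),
             [] (const toy_node &a, const toy_node &b) { return a.uid < b.uid; });
  uint64_t h = 0;
  for (toy_node &n : nodes)
    {
      h = mix (h, static_cast<uint64_t> (n.uid));
      std::sort (n.callee_uids.begin (), n.callee_uids.end ());
      for (int c : n.callee_uids)
        h = mix (h, static_cast<uint64_t> (c));
    }
  return h;
}

// Append the per-unit hash (computed once) to a promoted static's name.
std::string
promoted_name (const std::string &name, uint64_t unit_hash)
{
  char buf[17];
  std::snprintf (buf, sizeof buf, "%016" PRIx64, unit_hash);
  return name + "_" + buf;
}
```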

I wonder why the LTO code does not run into collisions - maybe
we do not try hard enough?  Guess doing LTO bootstrap with a modified
-flto that randomly turns itself on/off might show some cases.

Btw, I'd default to the symbol promotion being enabled for
explicit -fparallel.

Richard.

>
>    return get_identifier (buf);
>  }
> --
> 2.28.0
>

* Re: [PATCH 3/6] Implement fork-based parallelism engine
  2020-08-27 18:27     ` Giuliano Belinassi
  2020-08-29 11:41       ` Jan Hubicka
@ 2020-08-31  9:33       ` Richard Biener
  1 sibling, 0 replies; 31+ messages in thread
From: Richard Biener @ 2020-08-31  9:33 UTC (permalink / raw)
  To: Giuliano Belinassi; +Cc: Jan Hubicka, GCC Patches

On Thu, Aug 27, 2020 at 8:27 PM Giuliano Belinassi
<giuliano.belinassi@usp.br> wrote:
>
> Hi, Honza.
>
> Thank you for your detailed review!
>
> On 08/27, Jan Hubicka wrote:
> > > diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> > > index c0b45795059..22405098dc5 100644
> > > --- a/gcc/cgraph.c
> > > +++ b/gcc/cgraph.c
> > > @@ -226,6 +226,22 @@ cgraph_node::delete_function_version_by_decl (tree decl)
> > >    decl_node->remove ();
> > >  }
> > >
> > > +/* Release function dominator info if present.  */
> > > +
> > > +void
> > > +cgraph_node::maybe_release_dominators (void)
> > > +{
> > > +  struct function *fun = DECL_STRUCT_FUNCTION (decl);
> > > +
> > > +  if (fun && fun->cfg)
> > > +    {
> > > +      if (dom_info_available_p (fun, CDI_DOMINATORS))
> > > +   free_dominance_info (fun, CDI_DOMINATORS);
> > > +      if (dom_info_available_p (fun, CDI_POST_DOMINATORS))
> > > +   free_dominance_info (fun, CDI_POST_DOMINATORS);
> > > +    }
> > > +}
> >
> > I am not sure if that needs to be a member function, but if so we want to
> > merge it with other places in cgraph.c and cgraphunit.c where dominators
> > are freed.  I do not think you need to check availability.
>
> This is necessary to remove some nodes from the callgraph.  For some
> reason, if I call node->remove () while it still has the dominance info
> available, it fails some assertions in the compiler.
>
> However, with regard to code layout, this can be moved to lto-cgraph.c,
> as it is only used there.
>
> > > +
> > >  /* Record that DECL1 and DECL2 are semantically identical function
> > >     versions.  */
> > >  void
> > > diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> > > index b4a7871bd3d..72ac19f9672 100644
> > > --- a/gcc/cgraph.h
> > > +++ b/gcc/cgraph.h
> > > @@ -463,6 +463,15 @@ public:
> > >       Return NULL if there's no such node.  */
> > >    static symtab_node *get_for_asmname (const_tree asmname);
> > >
> > > +  /* Get symtab node by order.  */
> > > +  static symtab_node *find_by_order (int order);
> >
> > This is quadratic and moreover seems unused. Why do you add it?
>
> I added this for debugging, since I used this a lot inside GDB.
> Sure, I can remove this without any problems, or print a warning
> for the developer to avoid calling this in production code.
>
> > > +
> > > +  /* Get symtab_node by its name.  */
> > > +  static symtab_node *find_by_name (const char *);
> >
> > Similarly here, note that names are not really very meaningful as lookup
> > things, since they get duplicated.
> > > +
> > > +  /* Get symtab_node by its ASM name.  */
> > > +  static symtab_node *find_by_asm_name (const char *);
> >
> > For this we have get_for_asmname (which also populates asm name hash as
> > needed and is not quadratic)
>
> Cool. I will surely remove this then :)
>
> > > diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> > > index d10d635e942..73e4bed3b61 100644
> > > --- a/gcc/cgraphunit.c
> > > +++ b/gcc/cgraphunit.c
> > > @@ -2258,6 +2258,11 @@ cgraph_node::expand (void)
> > >  {
> > >    location_t saved_loc;
> > >
> > > +  /* FIXME: Find out why body-removed nodes are marked for output.  */
> > > +  if (body_removed)
> > > +    return;
> >
> > Indeed, we should know :)
>
> Looks like this was due to an early problem. I removed this and bootstrap
> is working OK.
>
> > > +
> > > +
> > >    /* We ought to not compile any inline clones.  */
> > >    gcc_assert (!inlined_to);
> > >
> > > @@ -2658,6 +2663,7 @@ ipa_passes (void)
> > >
> > >        execute_ipa_summary_passes
> > >     ((ipa_opt_pass_d *) passes->all_regular_ipa_passes);
> > > +
> > This seems accidental.
>
> Yes.
>
> > >      }
> > >
> > >    /* Some targets need to handle LTO assembler output specially.  */
> > > @@ -2687,10 +2693,17 @@ ipa_passes (void)
> > >    if (flag_generate_lto || flag_generate_offload)
> > >      targetm.asm_out.lto_end ();
> > >
> > > -  if (!flag_ltrans
> > > +  if (split_outputs)
> > > +    flag_ltrans = true;
> > > +
> > > +  if ((!flag_ltrans || split_outputs)
> > >        && ((in_lto_p && flag_incremental_link != INCREMENTAL_LINK_LTO)
> > >       || !flag_lto || flag_fat_lto_objects))
> > >      execute_ipa_pass_list (passes->all_regular_ipa_passes);
> > > +
> > > +  if (split_outputs)
> > > +    flag_ltrans = false;
> > > +
> > >    invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_END, NULL);
> > >
> > >    bitmap_obstack_release (NULL);
> > > @@ -2742,6 +2755,185 @@ symbol_table::output_weakrefs (void)
> > >        }
> > >  }
> > >
> > > +static bool is_number (const char *str)
> > > +{
> > > +  while (*str != '\0')
> > > +    switch (*str++)
> > > +      {
> > > +   case '0':
> > > +   case '1':
> > > +   case '2':
> > > +   case '3':
> > > +   case '4':
> > > +   case '5':
> > > +   case '6':
> > > +   case '7':
> > > +   case '8':
> > > +   case '9':
> > > +     continue;
> > > +   default:
> > > +     return false;
> > > +      }
> > > +
> > > +  return true;
> > > +}
> >
> > This looks odd; we have other places where we parse numbers from the
> > command line :)
>
> isdigit () is poisoned in GCC. But I guess I should look at how -flto=
> does this.

You need to use the all-caps variants, ISDIGIT in this case.
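For illustration, a standalone sketch of validating the -fparallel-jobs= argument. Inside GCC the digit test would use ISDIGIT from libiberty's safe-ctype.h; this sketch compares against '0'..'9' directly, which is locale-independent because the C standard guarantees the decimal digits are contiguous. Unlike is_number above, it also rejects an empty string (which atoi would turn into 0) and values that overflow int. The enum and function names are hypothetical, not part of the patch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of the -fparallel-jobs= argument.  */
enum jobs_mode { JOBS_INVALID, JOBS_AUTO, JOBS_JOBSERVER, JOBS_NUMBER };

/* Classify ARG.  On JOBS_NUMBER, *N receives the job count (always >= 1).  */
static enum jobs_mode
parse_parallel_jobs (const char *arg, int *n)
{
  if (strcmp (arg, "auto") == 0)
    return JOBS_AUTO;
  if (strcmp (arg, "jobserver") == 0)
    return JOBS_JOBSERVER;

  /* Digits only: rejects "", "-1", "+2", " 3" and "4x".  Inside GCC this
     test would be written with ISDIGIT from safe-ctype.h.  */
  if (*arg == '\0')
    return JOBS_INVALID;
  for (const char *p = arg; *p != '\0'; p++)
    if (*p < '0' || *p > '9')
      return JOBS_INVALID;

  /* strtol reports overflow through errno instead of silently wrapping.  */
  errno = 0;
  long v = strtol (arg, NULL, 10);
  if (errno == ERANGE || v > INT_MAX || v == 0)
    return JOBS_INVALID;  /* Zero jobs makes no sense either.  */

  *n = (int) v;
  return JOBS_NUMBER;
}
```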

> > > +
> > > +/* If forked, which child am I?  */
> > > +
> > > +static int childno = -1;
> > > +
> > > +static bool
> > > +maybe_compile_in_parallel (void)
> > > +{
> > > +  struct symtab_node *node;
> > > +  int partitions, i, j;
> > > +  int *pids;
> > > +
> > > +  bool promote_statics = param_promote_statics;
> > > +  bool balance = param_balance_partitions;
> > > +  bool jobserver = false;
> > > +  bool job_auto = false;
> > > +  int num_jobs = -1;
> > > +
> > > +  if (!flag_parallel_jobs || !split_outputs)
> > > +    return false;
> > > +
> > > +  if (!strcmp (flag_parallel_jobs, "auto"))
> > > +    {
> > > +      jobserver = jobserver_initialize ();
> > > +      job_auto = true;
> > > +    }
> > > +  else if (!strcmp (flag_parallel_jobs, "jobserver"))
> > > +    jobserver = jobserver_initialize ();
> > > +  else if (is_number (flag_parallel_jobs))
> > > +    num_jobs = atoi (flag_parallel_jobs);
> > > +  else
> > > +    gcc_unreachable ();
> > > +
> > > +  if (job_auto && !jobserver)
> > > +    {
> > > +      num_jobs = sysconf (_SC_NPROCESSORS_CONF);
> > > +      if (num_jobs > 2)
> > > +   num_jobs = 2;
> > > +    }
> > > +
> > > +  if (num_jobs < 0 && !jobserver)
> > > +    {
> > > +      inform (UNKNOWN_LOCATION,
> > > +         "-fparallel-jobs=jobserver, but no GNU Jobserver found");
> > > +      return false;
> > > +    }
> > > +
> > > +  if (jobserver)
> > > +    num_jobs = 2;
> > > +
> > > +  if (num_jobs == 0)
> > > +    {
> > > +      inform (UNKNOWN_LOCATION, "-fparallel-jobs=0 makes no sense");
> > > +      return false;
> > > +    }
> > > +
> > > +  /* Trick the compiler to think that we are in WPA.  */
> > > +  flag_wpa = "";
> > > +  symtab_node::checking_verify_symtab_nodes ();
> >
> > Messing with the WPA/ltrans flags is not a good idea.  You already have
> > split_output for the parallel build.  I sort of see why you set ltrans,
> > but why WPA?
>
> Some assertions expect flag_wpa to be set.  Sure, I can add
> split_outputs to these assertions.
>
> > > +
> > > +  /* Partition the program so that COMDATs get mapped to the same
> > > +     partition.  If promote_statics is true, it also maps statics
> > > +     to the same partition.  If balance is true, try to balance the
> > > +     partitions for compilation performance.  */
> > > +  lto_merge_comdat_map (balance, promote_statics, num_jobs);
> > > +
> > > +  /* AUX pointers are used by partitioning code to bookkeep number of
> > > +     partitions symbol is in.  This is no longer needed.  */
> > > +  FOR_EACH_SYMBOL (node)
> > > +    node->aux = NULL;
> > > +
> > > +  /* We decided that partitioning is a bad idea.  In this case, just
> > > +     proceed with the default compilation method.  */
> > > +  if (ltrans_partitions.length () <= 1)
> > > +    {
> > > +      flag_wpa = NULL;
> > > +      jobserver_finalize ();
> > > +      return false;
> > > +    }
> > > +
> > > +  /* Find out statics that need to be promoted
> > > +     to globals with hidden visibility because they are accessed from
> > > +     multiple partitions.  */
> > > +  lto_promote_cross_file_statics (promote_statics);
> > > +
> > > +  /* Check if we have variables being referenced across partitions.  */
> > > +  lto_check_usage_from_other_partitions ();
> > > +
> > > +  /* Trick the compiler to think we are not in WPA anymore.  */
> > > +  flag_wpa = NULL;
> > > +
> > > +  partitions = ltrans_partitions.length ();
> > > +  pids = XALLOCAVEC (pid_t, partitions);
> > > +
> > > +  /* There is no point in launching more jobs than we have partitions.  */
> > > +  if (num_jobs > partitions)
> > > +    num_jobs = partitions;
> > > +
> > > +  /* Trick the compiler to think we are in LTRANS mode.  */
> > > +  flag_ltrans = true;
> > > +
> > > +  init_additional_asm_names_file ();
> > > +
> > > +  /* Flush asm file, so we don't get repeated output as we fork.  */
> > > +  fflush (asm_out_file);
> > > +
> > > +  /* Insert a token for child to consume.  */
> > > +  if (jobserver)
> > > +    {
> > > +      num_jobs = partitions;
> > > +      jobserver_return_token ('p');
> > > +    }
> > > +
> > > +  /* Spawn processes.  Spawn as soon as there is a free slot.  */
> > > +  for (j = 0, i = -num_jobs; i < partitions; i++, j++)
> > > +    {
> > > +      if (i >= 0)
> > > +   {
> > > +     int wstatus, ret;
> > > +     ret = waitpid (pids[i], &wstatus, 0);
> > > +
> > > +     if (ret < 0)
> > > +       internal_error ("Unable to wait child %d to finish", i);
> > > +     else if (WIFEXITED (wstatus))
> > > +       {
> > > +         if (WEXITSTATUS (wstatus) != 0)
> > > +           error ("Child %d exited with error", i);
> > > +       }
> > > +     else if (WIFSIGNALED (wstatus))
> > > +       error ("Child %d aborted with error", i);
> > > +   }
> > > +
> > > +      if (j < partitions)
> > > +   {
> > > +     gcc_assert (ltrans_partitions[j]->symbols > 0);
> > > +
> > > +     if (jobserver)
> > > +       jobserver_get_token ();
> > > +
> > > +     pids[j] = fork ();
> > > +     if (pids[j] == 0)
> > > +       {
> > > +         childno = j;
> > > +         lto_apply_partition_mask (ltrans_partitions[j]);
> > > +         return true;
> > > +       }
> > > +   }
> > > +    }
> > > +
> > > +  /* Take back the token the parent inserted for the children, which they
> > > +     have returned by now.  */
> > > +  if (jobserver)
> > > +    jobserver_get_token ();
> > > +  exit (0);
> > > +}
> > > +
> > > +
> > >  /* Perform simple optimizations based on callgraph.  */
> > >
> > >  void
> > > @@ -2768,6 +2960,7 @@ symbol_table::compile (void)
> > >    {
> > >      timevar_start (TV_CGRAPH_IPA_PASSES);
> > >      ipa_passes ();
> > > +    maybe_compile_in_parallel ();
> > >      timevar_stop (TV_CGRAPH_IPA_PASSES);
> > >    }
> > >    /* Do nothing else if any IPA pass found errors or if we are just streaming LTO.  */
> > > @@ -2790,6 +2983,9 @@ symbol_table::compile (void)
> > >    timevar_pop (TV_CGRAPHOPT);
> > >
> > >    /* Output everything.  */
> > > +  if (split_outputs)
> > > +    handle_additional_asm (childno);
> > What is this doing?
>
> It creates an auxiliary file into which we write the name of every
> assembler output file.  This will change, as it is a better idea to
> write them from the main process rather than from the child, as Richi
> already pointed out.
>
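A minimal POSIX sketch of the direction described above, assuming the parent chooses each child's output name before forking and writes the "childno name" list itself once every child has been reaped, so children never touch a shared file. The spawning loop reuses the sliding window of maybe_compile_in_parallel (wait for child J - JOBS before starting child J). compile_partition, run_partitions, MAX_PARTS and the "partN.s" names are hypothetical stand-ins, not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PARTS 64

/* Hypothetical stand-in for compiling partition PART into OUT_NAME.  */
static int
compile_partition (int part, const char *out_name)
{
  (void) part;
  (void) out_name;
  return 0;  /* exit status: 0 means success */
}

/* Compile PARTS partitions in forked children, at most JOBS at a time.
   The parent names every output before forking and writes the
   "childno name" list to LIST itself after reaping all children.
   Returns the number of failed children, or -1 on bad arguments.  */
static int
run_partitions (int parts, int jobs, FILE *list)
{
  pid_t pids[MAX_PARTS];
  char names[MAX_PARTS][32];
  int ok[MAX_PARTS];
  int failed = 0;

  if (parts < 1 || parts > MAX_PARTS || jobs < 1)
    return -1;
  if (jobs > parts)
    jobs = parts;

  /* Flush stdio before forking, as the patch does with asm_out_file.  */
  fflush (NULL);

  /* Sliding window: before starting child J, reap child I = J - JOBS,
     so at most JOBS children are alive at any time.  */
  for (int j = 0, i = -jobs; i < parts; i++, j++)
    {
      if (i >= 0 && pids[i] > 0)
        {
          int status;
          ok[i] = waitpid (pids[i], &status, 0) == pids[i]
                  && WIFEXITED (status) && WEXITSTATUS (status) == 0;
        }
      if (j < parts)
        {
          snprintf (names[j], sizeof names[j], "part%d.s", j);
          ok[j] = 0;
          pids[j] = fork ();  /* -1 on failure: counted as failed below */
          if (pids[j] == 0)
            _exit (compile_partition (j, names[j]));  /* child skips stdio flush */
        }
    }

  /* Only the parent touches the list, in partition order.  */
  for (int j = 0; j < parts; j++)
    {
      if (ok[j])
        fprintf (list, "%d %s\n", j, names[j]);
      else
        failed++;
    }
  fflush (list);
  return failed;
}
```

With parts = 5 and jobs = 2, the list ends up with five lines, "0 part0.s" through "4 part4.s", in partition order regardless of which child finished first.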
> > > +
> > >    switch_to_section (text_section);
> > >    (*debug_hooks->assembly_start) ();
> > >    if (!quiet_flag)
> > > diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
> > > index 2cfab40156e..bc500df4853 100644
> > > --- a/gcc/ipa-fnsummary.c
> > > +++ b/gcc/ipa-fnsummary.c
> > > @@ -4610,7 +4610,7 @@ public:
> > >        gcc_assert (n == 0);
> > >        small_p = param;
> > >      }
> > > -  virtual bool gate (function *) { return true; }
> > > +  virtual bool gate (function *) { return !(flag_ltrans && split_outputs); }
> > >    virtual unsigned int execute (function *)
> > >      {
> > >        ipa_free_fn_summary ();
> > > diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
> > > index 069de9d82fb..6a5657c7507 100644
> > > --- a/gcc/ipa-icf.c
> > > +++ b/gcc/ipa-icf.c
> > > @@ -2345,7 +2345,8 @@ sem_item_optimizer::filter_removed_items (void)
> > >          {
> > >       cgraph_node *cnode = static_cast <sem_function *>(item)->get_node ();
> > >
> > > -     if (in_lto_p && (cnode->alias || cnode->body_removed))
> > > +     if ((in_lto_p || split_outputs)
> > > +         && (cnode->alias || cnode->body_removed))
> >
> >
> > And I wonder why you need these. IPA passes are run before we split,
> > right?
>
> This was due to an early problem.  I removed this and bootstrap is
> still working.
>
> > >         remove_item (item);
> > >       else
> > >         filtered.safe_push (item);
> > > diff --git a/gcc/ipa-visibility.c b/gcc/ipa-visibility.c
> > > index 7c854f471e8..4d9e11482d3 100644
> > > --- a/gcc/ipa-visibility.c
> > > +++ b/gcc/ipa-visibility.c
> > > @@ -540,7 +540,8 @@ optimize_weakref (symtab_node *node)
> > >  static void
> > >  localize_node (bool whole_program, symtab_node *node)
> > >  {
> > > -  gcc_assert (whole_program || in_lto_p || !TREE_PUBLIC (node->decl));
> > > +  gcc_assert (split_outputs || whole_program || in_lto_p
> > > +         || !TREE_PUBLIC (node->decl));
> > >
> > >    /* It is possible that one comdat group contains both hidden and non-hidden
> > >       symbols.  In this case we can privatize all hidden symbol but we need
> > > diff --git a/gcc/ipa.c b/gcc/ipa.c
> > > index 288b58cf73d..b397ea2fed8 100644
> > > --- a/gcc/ipa.c
> > > +++ b/gcc/ipa.c
> > > @@ -350,7 +350,7 @@ symbol_table::remove_unreachable_nodes (FILE *file)
> > >
> > >    /* Mark variables that are obviously needed.  */
> > >    FOR_EACH_DEFINED_VARIABLE (vnode)
> > > -    if (!vnode->can_remove_if_no_refs_p()
> > > +    if (!vnode->can_remove_if_no_refs_p ()
> > >     && !vnode->in_other_partition)
> > >        {
> > >     reachable.add (vnode);
> > > @@ -564,7 +564,7 @@ symbol_table::remove_unreachable_nodes (FILE *file)
> > >     }
> > >        else
> > >     gcc_assert (node->clone_of || !node->has_gimple_body_p ()
> > > -               || in_lto_p || DECL_RESULT (node->decl));
> > > +               || in_lto_p || split_outputs || DECL_RESULT (node->decl));
> > >      }
> > >
> > >    /* Inline clones might be kept around so their materializing allows further
> > > diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> > > index 93a99f3465b..12be8546d9c 100644
> > > --- a/gcc/lto-cgraph.c
> > > +++ b/gcc/lto-cgraph.c
> > > @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "omp-offload.h"
> > >  #include "stringpool.h"
> > >  #include "attribs.h"
> > > +#include "lto-partition.h"
> > >
> > >  /* True when asm nodes has been output.  */
> > >  bool asm_nodes_output = false;
> > > @@ -2065,3 +2066,174 @@ input_cgraph_opt_summary (vec<symtab_node *> nodes)
> > >     input_cgraph_opt_section (file_data, data, len, nodes);
> > >      }
> > >  }
> > > +
> > > +/* When analysing a function for removal, we have four states, as
> > > +   defined below.  */
> > > +
> > > +enum node_partition_state
> > > +{
> > > +  CAN_REMOVE,              /* This node can be removed, or is still to be
> > > +                      analysed.  */
> > > +  IN_CURRENT_PARTITION, /* This node is in current partition and should not be
> > > +                      touched.  */
> > > +  IN_BOUNDARY,             /* This node is in boundary, therefore being in other
> > > +                      partition or is a external symbol, and its body can
> > > +                      be released.  */
> > > +  IN_BOUNDARY_KEEP_BODY /* This symbol is in other partition but we may need its
> > > +                      body for inlining, for instance.  */
> > > +};
> > > +
> > > +/* Handle a node that is in the LTRANS boundary, releasing its body and
> > > +   other information if necessary.  */
> > > +
> > > +static void
> > > +handle_node_in_boundary (symtab_node *node, bool keep_body)
> > > +{
> > > +  if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
> > > +    {
> > > +      if (cnode->inlined_to && cnode->inlined_to->aux2 != IN_CURRENT_PARTITION)
> > > +   {
> > > +     /* If marked to be inlined into a node not in current partition,
> > > +        then undo the inline.  */
> > > +
> > > +     if (cnode->callers) /* This edge could be removed.  */
> > > +       cnode->callers->inline_failed = CIF_UNSPECIFIED;
> > > +     cnode->inlined_to = NULL;
> > > +   }
> > > +
> > > +      if (cnode->has_gimple_body_p ())
> > > +   {
> > > +     if (!keep_body)
> > > +       {
> > > +         cnode->maybe_release_dominators ();
> > > +         cnode->remove_callees ();
> > > +         cnode->remove_all_references ();
> > > +
> > > +         /* FIXME: Releasing body of clones can release bodies of functions
> > > +            in current partition.  */
> > > +
> > > +         /* cnode->release_body ();  */
> > > +         cnode->body_removed = true;
> > > +         cnode->definition = false;
> > > +         cnode->analyzed = false;
> > > +       }
> > > +     cnode->cpp_implicit_alias = false;
> > > +     cnode->alias = false;
> > > +     cnode->transparent_alias = false;
> > > +     cnode->thunk.thunk_p = false;
> > > +     cnode->weakref = false;
> > > +     /* After early inlining we drop always_inline attributes on
> > > +        bodies of functions that are still referenced (have their
> > > +        address taken).  */
> > > +     DECL_ATTRIBUTES (cnode->decl)
> > > +       = remove_attribute ("always_inline",
> > > +                           DECL_ATTRIBUTES (node->decl));
> > > +
> > > +     cnode->in_other_partition = true;
> > > +   }
> > > +    }
> > > +  else if (is_a <varpool_node *> (node) && !DECL_EXTERNAL (node->decl))
> > > +    {
> > > +      DECL_EXTERNAL (node->decl) = true;
> > > +      node->in_other_partition = true;
> > > +    }
> > > +}
> > > +
> > > +/* Check the boundary and expand it if necessary, including more nodes or
> > > +   promoting them to a state where their body is required.  */
> > > +
> > > +static void
> > > +compute_boundary (ltrans_partition partition)
> > > +{
> > > +  vec<lto_encoder_entry> &nodes = partition->encoder->nodes;
> > > +  symtab_node *node;
> > > +  cgraph_node *cnode;
> > > +  auto_vec<symtab_node *, 16> mark_to_remove;
> > > +  unsigned int i;
> > > +
> > > +  FOR_EACH_SYMBOL (node)
> > > +    node->aux2 = CAN_REMOVE;
> >
> > There is boundary computation in lto-cgraph.c so we should merge the
> > logic...
>
> Agree.
>
> > If you keep the lto-partition data structures, it will compute the
> > boundary for you and you can just remove the rest (I got that working
> > at one point).
>
> This is interesting, because I could not get that working out of the
> box. The boundary computed by lto_promote_cross_statics was not complete
> enough for me to simply remove everything else. If you take a closer
> look, you will see that I am using the already computed boundary as a
> base, and extending it with the extra things required (mainly the
> bodies of inline clones).
>
> >
> > I am also not sure about the copy-on-write effect of this. It may be
> > better to keep things around and just teach late passes not to compile
> > things in other partitions, but that is definitely for an incremental
> > change.
>
> Well, this looks like a lot of work :)
>
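As a toy model of the boundary expansion discussed in this exchange (not GCC's compute_ltrans_boundary, and with a simplified notion of clones): symbols of the current partition are kept, symbols they reference from other partitions enter the boundary without a body, and any node that a current-partition clone was cloned from keeps its body, since in this model a clone shares its body with its origin. The state names mirror the node_partition_state enum in the patch; struct toy_node and toy_compute_boundary are hypothetical.

```c
#include <stddef.h>

enum node_state
{
  CAN_REMOVE, IN_CURRENT_PARTITION, IN_BOUNDARY, IN_BOUNDARY_KEEP_BODY
};

#define MAX_REFS 4

/* Toy symbol: its partition, the node it was cloned from (-1 if none),
   and the nodes it references (calls or address uses).  */
struct toy_node
{
  int partition;
  int clone_of;
  int nrefs;
  int refs[MAX_REFS];
};

/* Fill STATE[0..N-1] for partition PART.  */
static void
toy_compute_boundary (const struct toy_node *nodes, size_t n, int part,
                      enum node_state *state)
{
  for (size_t i = 0; i < n; i++)
    state[i] = nodes[i].partition == part ? IN_CURRENT_PARTITION : CAN_REMOVE;

  for (size_t i = 0; i < n; i++)
    {
      if (state[i] != IN_CURRENT_PARTITION)
        continue;

      /* Symbols defined elsewhere but referenced from this partition only
         need a declaration: they join the boundary with the body dropped.  */
      for (int k = 0; k < nodes[i].nrefs; k++)
        {
          int r = nodes[i].refs[k];
          if (state[r] == CAN_REMOVE)
            state[r] = IN_BOUNDARY;
        }

      /* Every clone ancestor outside the partition must keep its body,
         because (in this model) the clone's body lives there.  */
      for (int c = nodes[i].clone_of; c >= 0; c = nodes[c].clone_of)
        if (state[c] != IN_CURRENT_PARTITION)
          state[c] = IN_BOUNDARY_KEEP_BODY;
    }
}
```

The two marking steps only ever upgrade a state (CAN_REMOVE to IN_BOUNDARY to IN_BOUNDARY_KEEP_BODY), so the result does not depend on the order in which nodes are visited.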
> > > +void
> > > +init_additional_asm_names_file (void)
> > > +{
> > > +  gcc_assert (split_outputs);
> > > +
> > > +  additional_asm_filenames = fopen (split_outputs, "w");
> > > +  if (!additional_asm_filenames)
> > > +    error ("Unable to create a temporary write-only file.");
> > > +
> > > +  fclose (additional_asm_filenames);
> > > +}
> >
> > Aha, that is what it does :)
> > I wonder whether creating the file conditionally (and late) could lead
> > to a race condition where we use the same tmp file name as another
> > build executed in parallel by make.
>
> Hmm. True. Added to my TODO list :)
> I never had any race condition issues here, even after stress testing
> it, but that is certainly not proof that it is free of race
> conditions :)
>
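One way to rule out the name clash Honza describes is to reserve the file at the moment its name is chosen. A sketch assuming POSIX mkstemp, which generates a unique name and creates the file with O_CREAT | O_EXCL semantics in a single call; libiberty's make_temp_file, already used in handle_additional_asm below, likewise creates the file it names. reserve_list_file and the "cc-split-" prefix are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a new, uniquely named empty file in DIR and return its malloc'd
   path, or NULL on failure.  The caller frees the path.  */
static char *
reserve_list_file (const char *dir)
{
  static const char suffix[] = "/cc-split-XXXXXX";  /* mkstemp needs 6 X's */
  size_t len = strlen (dir) + sizeof suffix;        /* sizeof counts the NUL */
  char *path = malloc (len);
  if (path == NULL)
    return NULL;
  snprintf (path, len, "%s%s", dir, suffix);

  /* Picks the name and creates the file in one step, so no other build
     can be handed the same name.  */
  int fd = mkstemp (path);
  if (fd < 0)
    {
      free (path);
      return NULL;
    }
  close (fd);  /* The file now exists; later writers open it by name.  */
  return path;
}
```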
> > > +
> > > +/* Reinitialize the assembler file and store it in the additional asm file.  */
> > > +
> > > +void
> > > +handle_additional_asm (int childno)
> > > +{
> > > +  gcc_assert (split_outputs);
> > > +
> > > +  if (childno < 0)
> > > +    return;
> > > +
> > > +  const char *temp_asm_name = make_temp_file (".s");
> > > +  asm_file_name = temp_asm_name;
> > > +
> > > +  if (asm_out_file == stdout)
> > > +    fatal_error (UNKNOWN_LOCATION, "Unexpected asm output to stdout");
> > > +
> > > +  fclose (asm_out_file);
> > > +
> > > +  asm_out_file = fopen (temp_asm_name, "w");
> > > +  if (!asm_out_file)
> > > +    fatal_error (UNKNOWN_LOCATION, "Unable to create asm output file");
> > > +
> > > +  /* Reopen file as append mode.  Here we assume that write to append file is
> > > +     atomic, as it is in Linux.  */
> > > +  additional_asm_filenames = fopen (split_outputs, "a");
> > > +  if (!additional_asm_filenames)
> > > +    fatal_error (UNKNOWN_LOCATION,
> > > +            "Unable to open the temporary asm files container");
> > > +
> > > +  fprintf (additional_asm_filenames, "%d %s\n", childno, asm_file_name);
> > > +  fclose (additional_asm_filenames);
> > > +}
> > > +
> > >  /* A helper function; used as the reallocator function for cpp's line
> > >     table.  */
> > >  static void *
> > > @@ -2311,7 +2359,7 @@ do_compile ()
> > >
> > >            timevar_stop (TV_PHASE_SETUP);
> > >
> > > -          compile_file ();
> > > +     compile_file ();
> > >          }
> > >        else
> > >          {
> > > @@ -2477,6 +2525,12 @@ toplev::main (int argc, char **argv)
> > >
> > >    finalize_plugins ();
> > >
> > > +  if (jobserver_initialized)
> > > +    {
> > > +      jobserver_return_token (JOBSERVER_NULL_TOKEN);
> > > +      jobserver_finalize ();
> > > +    }
> > > +
> > >    after_memory_report = true;
> > >
> > >    if (seen_error () || werrorcount)
> > > diff --git a/gcc/toplev.h b/gcc/toplev.h
> > > index d6c316962b0..3abbf74cd02 100644
> > > --- a/gcc/toplev.h
> > > +++ b/gcc/toplev.h
> > > @@ -103,4 +103,7 @@ extern void parse_alignment_opts (void);
> > >
> > >  extern void initialize_rtl (void);
> > >
> > > +extern void init_additional_asm_names_file (void);
> > > +extern void handle_additional_asm (int);
> > > +
> > >  #endif /* ! GCC_TOPLEV_H */
> > > diff --git a/gcc/varasm.c b/gcc/varasm.c
> > > index 4070f9c17e8..84df52013d7 100644
> > > --- a/gcc/varasm.c
> > > +++ b/gcc/varasm.c
> > > @@ -110,7 +110,7 @@ static void decode_addr_const (tree, class addr_const *);
> > >  static hashval_t const_hash_1 (const tree);
> > >  static int compare_constant (const tree, const tree);
> > >  static void output_constant_def_contents (rtx);
> > > -static void output_addressed_constants (tree);
> > > +static void output_addressed_constants (tree, int);
> > >  static unsigned HOST_WIDE_INT output_constant (tree, unsigned HOST_WIDE_INT,
> > >                                            unsigned int, bool, bool);
> > >  static void globalize_decl (tree);
> > > @@ -2272,7 +2272,7 @@ assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED,
> > >
> > >    /* Output any data that we will need to use the address of.  */
> > >    if (DECL_INITIAL (decl) && DECL_INITIAL (decl) != error_mark_node)
> > > -    output_addressed_constants (DECL_INITIAL (decl));
> > > +    output_addressed_constants (DECL_INITIAL (decl), 0);
> > >
> > >    /* dbxout.c needs to know this.  */
> > >    if (sect && (sect->common.flags & SECTION_CODE) != 0)
> > > @@ -3426,11 +3426,11 @@ build_constant_desc (tree exp)
> > >     already have labels.  */
> > >
> > >  static constant_descriptor_tree *
> > > -add_constant_to_table (tree exp)
> > > +add_constant_to_table (tree exp, int defer)
> > >  {
> > >    /* The hash table methods may call output_constant_def for addressed
> > >       constants, so handle them first.  */
> > > -  output_addressed_constants (exp);
> > > +  output_addressed_constants (exp, defer);
> > >
> > >    /* Sanity check to catch recursive insertion.  */
> > >    static bool inserting;
> > > @@ -3474,7 +3474,7 @@ add_constant_to_table (tree exp)
> > >  rtx
> > >  output_constant_def (tree exp, int defer)
> > >  {
> > > -  struct constant_descriptor_tree *desc = add_constant_to_table (exp);
> > > +  struct constant_descriptor_tree *desc = add_constant_to_table (exp, defer);
> > >    maybe_output_constant_def_contents (desc, defer);
> > >    return desc->rtl;
> > >  }
> > > @@ -3544,7 +3544,7 @@ output_constant_def_contents (rtx symbol)
> > >
> > >    /* Make sure any other constants whose addresses appear in EXP
> > >       are assigned label numbers.  */
> > > -  output_addressed_constants (exp);
> > > +  output_addressed_constants (exp, 0);
> > >
> > >    /* We are no longer deferring this constant.  */
> > >    TREE_ASM_WRITTEN (decl) = TREE_ASM_WRITTEN (exp) = 1;
> > > @@ -3608,7 +3608,7 @@ lookup_constant_def (tree exp)
> > >  tree
> > >  tree_output_constant_def (tree exp)
> > >  {
> > > -  struct constant_descriptor_tree *desc = add_constant_to_table (exp);
> > > +  struct constant_descriptor_tree *desc = add_constant_to_table (exp, 1);
> > >    tree decl = SYMBOL_REF_DECL (XEXP (desc->rtl, 0));
> > >    varpool_node::finalize_decl (decl);
> > >    return decl;
> > > @@ -4327,7 +4327,7 @@ compute_reloc_for_constant (tree exp)
> > >     Indicate whether an ADDR_EXPR has been encountered.  */
> > >
> > >  static void
> > > -output_addressed_constants (tree exp)
> > > +output_addressed_constants (tree exp, int defer)
> > >  {
> > >    tree tem;
> > >
> > > @@ -4347,21 +4347,21 @@ output_addressed_constants (tree exp)
> > >     tem = DECL_INITIAL (tem);
> > >
> > >        if (CONSTANT_CLASS_P (tem) || TREE_CODE (tem) == CONSTRUCTOR)
> > > -   output_constant_def (tem, 0);
> > > +   output_constant_def (tem, defer);
> > >
> > >        if (TREE_CODE (tem) == MEM_REF)
> > > -   output_addressed_constants (TREE_OPERAND (tem, 0));
> > > +   output_addressed_constants (TREE_OPERAND (tem, 0), defer);
> > >        break;
> > >
> > >      case PLUS_EXPR:
> > >      case POINTER_PLUS_EXPR:
> > >      case MINUS_EXPR:
> > > -      output_addressed_constants (TREE_OPERAND (exp, 1));
> > > +      output_addressed_constants (TREE_OPERAND (exp, 1), defer);
> > >        gcc_fallthrough ();
> > >
> > >      CASE_CONVERT:
> > >      case VIEW_CONVERT_EXPR:
> > > -      output_addressed_constants (TREE_OPERAND (exp, 0));
> > > +      output_addressed_constants (TREE_OPERAND (exp, 0), defer);
> > >        break;
> > >
> > >      case CONSTRUCTOR:
> > > @@ -4369,7 +4369,7 @@ output_addressed_constants (tree exp)
> > >     unsigned HOST_WIDE_INT idx;
> > >     FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, tem)
> > >       if (tem != 0)
> > > -       output_addressed_constants (tem);
> > > +       output_addressed_constants (tem, defer);
> > >        }
> > >        break;
> >
> > Nice job :)
> > Honza
>
> Thank you,
> Giuliano.
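The comment in handle_additional_asm above assumes that appends to the shared list file are atomic. A sketch of narrowing that assumption, assuming POSIX: format the whole line first, then issue a single write() on a descriptor opened with O_APPEND. POSIX specifies that with O_APPEND the offset is moved to end of file before each write with no intervening modification, so appenders do not overwrite each other; the Linux open(2) man page warns that this does not hold on NFS. append_asm_name is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Append "CHILDNO NAME\n" to the existing file LIST_PATH.  Returns 0 on
   success, -1 on failure.  */
static int
append_asm_name (const char *list_path, int childno, const char *name)
{
  char line[4096];
  int len, fd, rc;

  /* A newline inside NAME would break the one-entry-per-line format.  */
  if (strchr (name, '\n') != NULL)
    return -1;

  /* Build the complete line first so it goes out in a single write().  */
  len = snprintf (line, sizeof line, "%d %s\n", childno, name);
  if (len < 0 || (size_t) len >= sizeof line)
    return -1;

  /* O_APPEND: seek-to-end and write happen as one step, so on local file
     systems concurrent appenders do not overwrite each other's lines.  */
  fd = open (list_path, O_WRONLY | O_APPEND);
  if (fd < 0)
    return -1;
  rc = write (fd, line, (size_t) len) == (ssize_t) len ? 0 : -1;
  if (close (fd) != 0)
    rc = -1;
  return rc;
}
```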

* Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.
  2020-08-31  8:15       ` Richard Biener
@ 2020-08-31 11:44         ` Jan Hubicka
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Hubicka @ 2020-08-31 11:44 UTC
  To: Richard Biener; +Cc: Giuliano Belinassi, GCC Patches

> 
> Yeah.  Or even refactor the output machinery so that in theory
> we can create asm fragments [into memory] for functions and variables
> and only at the end concat/output them in a desired order (cf.
> -fno-toplevel-reorder wrt toporder code-generation we'd prefer).
> That would also be one (small) step towards eventually being able to
> thread the RTL pipeline.

That would be nice.  We also have Martin's WIP code layout pass
that would benefit from this.
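Richard's suggestion can be sketched with POSIX open_memstream: each function's assembly is emitted into its own in-memory stream, in whatever order it is produced, and only at the end are the fragments written out in the desired order. struct fragment, emit_fragment and output_in_order are hypothetical, and the emitted text is a placeholder rather than real GCC output.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* One in-memory assembly fragment per symbol.  */
struct fragment
{
  char *buf;
  size_t len;
};

/* Hypothetical per-symbol emitter writing to memory, not asm_out_file.
   Returns 0 on success, -1 on failure.  */
static int
emit_fragment (struct fragment *frag, const char *sym)
{
  FILE *s = open_memstream (&frag->buf, &frag->len);
  if (s == NULL)
    {
      frag->buf = NULL;
      frag->len = 0;
      return -1;
    }
  fprintf (s, "%s:\n\tret\n", sym);  /* placeholder body */
  /* buf and len are only final once the stream is closed.  */
  return fclose (s) == 0 ? 0 : -1;
}

/* Write the N fragments to OUT in the order given by ORDER, then free them.  */
static void
output_in_order (FILE *out, struct fragment *frags, const size_t *order,
                 size_t n)
{
  for (size_t i = 0; i < n; i++)
    fwrite (frags[order[i]].buf, 1, frags[order[i]].len, out);
  for (size_t i = 0; i < n; i++)
    free (frags[i].buf);
}
```

Because the final order is only decided in output_in_order, the fragments could in principle come from different workers, while the emitted file still follows, for example, source order as -fno-toplevel-reorder requires.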
> 
> Note that in theory the auto-parallel work could be leveraged to
> elide LTRANS streaming if we'd drive LTRANS compile from WPA
> instead of from LTO wrapper.  We could simply do the forking,
> apply the partition and then load bodies from the original IL files
> (with the caveat of needing GC to truncate the decls and types
> memory from WPA).

Yep, as mentioned in another mail, I implemented that at one point, but
it was memory hungry and not faster than ltrans streaming, so I put it on
hold :)
I suppose that, the same way as for non-WPA builds, there is a unit size
up to which one thread works best; from some point partitioning helps,
and for even bigger units streaming starts to help, just because it
repickles data with better locality.

Honza
> 
> But yes, avoiding the partial link would be nice - it would also
> be possible to support parallelizing -S compiles.  And it would
> avoid the symbol renaming.
> 
> Richard.
> 
> > Honza
> > >
> > > >
> > > > > Bootstrapped and Regtested on Linux x86_64.
> > > > >
> > > > > Giuliano Belinassi (6):
> > > > >   Modify gcc driver for parallel compilation
> > > > >   Implement a new partitioner for parallel compilation
> > > > >   Implement fork-based parallelism engine
> > > > >   Add `+' for Jobserver Integration
> > > > >   Add invoke documentation
> > > > >   New tests for parallel compilation feature
> > > > >
> > > > >  gcc/Makefile.in                               |    6 +-
> > > > >  gcc/cgraph.c                                  |   16 +
> > > > >  gcc/cgraph.h                                  |   13 +
> > > > >  gcc/cgraphunit.c                              |  198 ++-
> > > > >  gcc/common.opt                                |    4 +
> > > > >  gcc/doc/invoke.texi                           |   32 +-
> > > > >  gcc/gcc.c                                     | 1219 +++++++++++++----
> > > > >  gcc/ipa-fnsummary.c                           |    2 +-
> > > > >  gcc/ipa-icf.c                                 |    3 +-
> > > > >  gcc/ipa-visibility.c                          |    3 +-
> > > > >  gcc/ipa.c                                     |    4 +-
> > > > >  gcc/jobserver.cc                              |  168 +++
> > > > >  gcc/jobserver.h                               |   33 +
> > > > >  gcc/lto-cgraph.c                              |  172 +++
> > > > >  gcc/{lto => }/lto-partition.c                 |  463 ++++++-
> > > > >  gcc/{lto => }/lto-partition.h                 |    4 +-
> > > > >  gcc/lto-streamer.h                            |    4 +
> > > > >  gcc/lto/Make-lang.in                          |    4 +-
> > > > >  gcc/lto/lto.c                                 |    2 +-
> > > > >  gcc/params.opt                                |    8 +
> > > > >  gcc/symtab.c                                  |   46 +-
> > > > >  gcc/testsuite/driver/a.c                      |    6 +
> > > > >  gcc/testsuite/driver/b.c                      |    6 +
> > > > >  gcc/testsuite/driver/driver.exp               |   80 ++
> > > > >  gcc/testsuite/driver/empty.c                  |    0
> > > > >  gcc/testsuite/driver/foo.c                    |    7 +
> > > > >  .../gcc.dg/parallel-early-constant.c          |   22 +
> > > > >  gcc/testsuite/gcc.dg/parallel-static-1.c      |   21 +
> > > > >  gcc/testsuite/gcc.dg/parallel-static-2.c      |   21 +
> > > > >  .../gcc.dg/parallel-static-clash-1.c          |   23 +
> > > > >  .../gcc.dg/parallel-static-clash-aux.c        |   14 +
> > > > >  gcc/toplev.c                                  |   58 +-
> > > > >  gcc/toplev.h                                  |    3 +
> > > > >  gcc/tree.c                                    |   23 +-
> > > > >  gcc/varasm.c                                  |   26 +-
> > > > >  intl/Makefile.in                              |    2 +-
> > > > >  libbacktrace/Makefile.in                      |    2 +-
> > > > >  libcpp/Makefile.in                            |    2 +-
> > > > >  libdecnumber/Makefile.in                      |    2 +-
> > > > >  libiberty/Makefile.in                         |  212 +--
> > > > >  zlib/Makefile.in                              |   64 +-
> > > > >  41 files changed, 2539 insertions(+), 459 deletions(-)
> > > > >  create mode 100644 gcc/jobserver.cc
> > > > >  create mode 100644 gcc/jobserver.h
> > > > >  rename gcc/{lto => }/lto-partition.c (78%)
> > > > >  rename gcc/{lto => }/lto-partition.h (89%)
> > > > >  create mode 100644 gcc/testsuite/driver/a.c
> > > > >  create mode 100644 gcc/testsuite/driver/b.c
> > > > >  create mode 100644 gcc/testsuite/driver/driver.exp
> > > > >  create mode 100644 gcc/testsuite/driver/empty.c
> > > > >  create mode 100644 gcc/testsuite/driver/foo.c
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/parallel-early-constant.c
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-1.c
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-2.c
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-1.c
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/parallel-static-clash-aux.c
> > > > >
> > > > > --
> > > > > 2.28.0
> > > > >

end of thread (newest message: 2020-08-31 11:44 UTC)

Thread overview: 31+ messages
2020-08-20 22:00 [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine Giuliano Belinassi
2020-08-20 22:00 ` [PATCH 1/6] Modify gcc driver for parallel compilation Giuliano Belinassi
2020-08-24 13:17   ` Richard Biener
2020-08-24 18:06     ` Giuliano Belinassi
2020-08-25  6:53       ` Richard Biener
2020-08-20 22:00 ` [PATCH 2/6] Implement a new partitioner " Giuliano Belinassi
2020-08-27 15:18   ` Jan Hubicka
2020-08-27 21:42     ` Giuliano Belinassi
2020-08-31  9:25   ` Richard Biener
2020-08-20 22:00 ` [PATCH 3/6] Implement fork-based parallelism engine Giuliano Belinassi
2020-08-27 15:25   ` Jan Hubicka
2020-08-27 15:37   ` Jan Hubicka
2020-08-27 18:27     ` Giuliano Belinassi
2020-08-29 11:41       ` Jan Hubicka
2020-08-31  9:33       ` Richard Biener
2020-08-20 22:00 ` [PATCH 4/6] Add `+' for Jobserver Integration Giuliano Belinassi
2020-08-20 22:33   ` Joseph Myers
2020-08-24 13:19     ` Richard Biener
2020-08-27 15:38     ` Jan Hubicka
2020-08-20 22:00 ` [PATCH 5/6] Add invoke documentation Giuliano Belinassi
2020-08-20 22:00 ` [PATCH 6/6] New tests for parallel compilation feature Giuliano Belinassi
2020-08-21 21:08 ` [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine Josh Triplett
2020-08-22 21:04   ` Giuliano Belinassi
2020-08-24 16:44     ` Josh Triplett
2020-08-24 18:38       ` Giuliano Belinassi
2020-08-25  7:03         ` Richard Biener
2020-08-24 12:50 ` Richard Biener
2020-08-24 15:13   ` Giuliano Belinassi
2020-08-29 11:31     ` Jan Hubicka
2020-08-31  8:15       ` Richard Biener
2020-08-31 11:44         ` Jan Hubicka
