public inbox for gcc-patches@gcc.gnu.org
* nvptx offloading patches [3/n], RFD
@ 2014-11-01 11:58 Bernd Schmidt
  2014-11-03 22:28 ` Jeff Law
  2015-02-04 11:38 ` nvptx offloading patches [3/n], RFD Jakub Jelinek
  0 siblings, 2 replies; 42+ messages in thread
From: Bernd Schmidt @ 2014-11-01 11:58 UTC (permalink / raw)
  To: GCC Patches; +Cc: Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 881 bytes --]

This is not against current trunk; it applies to gomp-4_0-branch where 
it is one of the necessary parts to make offloading x86->nvptx work. The 
issue is that the LTO file format depends on the machine_modes enum, so 
it needs to match between host and offload target. The easiest way to do 
this is to just use the host's modes.def when compiling an offload compiler.

Ports that want to be hosts for offloading may need to modify their 
modes.def. The patch below contains changes to i386-modes.def which 
modifies XFmode depending on a target switch. I'm not actually entirely 
sure what to do about this. Do we want to make this flag an error when 
offloading is enabled? Or maybe add float format support to the 
-foffload-abi option?

Thoughts? Ok for the first part of the patch once the other offloading 
patches have gone in (bootstrapped and tested on x86_64-linux)?


Bernd

[-- Attachment #2: modes.diff --]
[-- Type: text/x-patch, Size: 2263 bytes --]

	* config.gcc (offload_host_cpu_type): Compute.
	(extra_modes): Use it to pick the offload host CPU's modes.def when
	building an offload target.
	* config/i386/i386-modes.def (XF): Skip adjustments when building an
	offload compiler.

Index: gomp-4_0-branch/gcc/config.gcc
===================================================================
--- gomp-4_0-branch.orig/gcc/config.gcc
+++ gomp-4_0-branch/gcc/config.gcc
@@ -483,15 +483,26 @@ tilepro*-*-*)
 	;;
 esac
 
+offload_host_cpu_type=${cpu_type}
+if test "x${enable_as_accelerator}" != "xno"
+then
+	offload_host_cpu_type=`echo ${enable_as_accelerator_for} | sed 's/-.*$//'`
+fi
+case ${offload_host_cpu_type} in
+x86_64)
+          offload_host_cpu_type=i386
+	  ;;
+esac
+
 tm_file=${cpu_type}/${cpu_type}.h
 if test -f ${srcdir}/config/${cpu_type}/${cpu_type}-protos.h
 then
 	tm_p_file=${cpu_type}/${cpu_type}-protos.h
 fi
 extra_modes=
-if test -f ${srcdir}/config/${cpu_type}/${cpu_type}-modes.def
+if test -f ${srcdir}/config/${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
 then
-	extra_modes=${cpu_type}/${cpu_type}-modes.def
+	extra_modes=${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
 fi
 if test -f ${srcdir}/config/${cpu_type}/${cpu_type}.opt
 then
Index: gomp-4_0-branch/gcc/config/i386/i386-modes.def
===================================================================
--- gomp-4_0-branch.orig/gcc/config/i386/i386-modes.def
+++ gomp-4_0-branch/gcc/config/i386/i386-modes.def
@@ -24,6 +24,9 @@ along with GCC; see the file COPYING3.
 FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
 
+/* This file may be used when building a compiler for an offload target.
+   Assume that no special floating point options are used.  */
+#ifndef ACCEL_COMPILER
 /* In ILP32 mode, XFmode has size 12 and alignment 4.
    In LP64 mode, XFmode has size and alignment 16.  */
 ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
@@ -33,6 +36,7 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_
 			  : &ieee_extended_intel_96_format));
 ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
 ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
+#endif
 
 /* Add any extra modes needed to represent the condition code.
 


* Re: nvptx offloading patches [3/n], RFD
  2014-11-01 11:58 nvptx offloading patches [3/n], RFD Bernd Schmidt
@ 2014-11-03 22:28 ` Jeff Law
  2014-11-04 12:38   ` nvptx offloading patches [3/n], i386 bits RFD Bernd Schmidt
  2015-02-04 11:38 ` nvptx offloading patches [3/n], RFD Jakub Jelinek
  1 sibling, 1 reply; 42+ messages in thread
From: Jeff Law @ 2014-11-03 22:28 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches; +Cc: Ilya Verbin

On 11/01/14 05:57, Bernd Schmidt wrote:
> This is not against current trunk; it applies to gomp-4_0-branch where
> it is one of the necessary parts to make offloading x86->nvptx work. The
> issue is that the LTO file format depends on the machine_modes enum, it
> needs to match between host and offload target. The easiest way to do
> this is to just use the host-modes.def when compiling an offload compiler.
>
> Ports that want to be hosts for offloading may need to modify their
> modes.def. The patch below contains changes to i386-modes.def which
> modifies XFmode depending on a target switch. I'm not actually entirely
> sure what to do about this. Do we want to make this flag an error when
> offloading is enabled? Or maybe add float format support to the
> -foffload-abi option?
>
> Thoughts? Ok for the first part of the patch once the other offloading
> patches have gone in (bootstrapped and tested on x86_64-linux)?
It feels like we've got another real distinction to make.  We've had 
host, build & target and they're all independent.  It feels like we need 
an offload target as well, and a better separation between target and 
offload target.  Then we need to figure out the places where we've got 
bleed-out.

Not sure how to deal with this beyond the immediate term other than by 
using a hack like this. Though I'd prefer to avoid the #ifdef as it 
seems to me this shouldn't be baked in at build/configure time.

jeff


* Re: nvptx offloading patches [3/n], i386 bits RFD
  2014-11-03 22:28 ` Jeff Law
@ 2014-11-04 12:38   ` Bernd Schmidt
  2014-11-04 18:58     ` Uros Bizjak
  2014-11-04 21:50     ` Jeff Law
  0 siblings, 2 replies; 42+ messages in thread
From: Bernd Schmidt @ 2014-11-04 12:38 UTC (permalink / raw)
  To: Jeff Law, GCC Patches; +Cc: Ilya Verbin, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 2411 bytes --]

On 11/03/2014 11:27 PM, Jeff Law wrote:
> On 11/01/14 05:57, Bernd Schmidt wrote:
>> This is not against current trunk; it applies to gomp-4_0-branch where
>> it is one of the necessary parts to make offloading x86->nvptx work. The
>> issue is that the LTO file format depends on the machine_modes enum, it
>> needs to match between host and offload target. The easiest way to do
>> this is to just use the host-modes.def when compiling an offload
>> compiler.
>>
>> Ports that want to be hosts for offloading may need to modify their
>> modes.def. The patch below contains changes to i386-modes.def which
>> modifies XFmode depending on a target switch. I'm not actually entirely
>> sure what to do about this. Do we want to make this flag an error when
>> offloading is enabled? Or maybe add float format support to the
>> -foffload-abi option?
>>
>> Thoughts? Ok for the first part of the patch once the other offloading
>> patches have gone in (bootstrapped and tested on x86_64-linux)?
> It feels like we've got another real distinction to make.  We've had
> host, build & target and they're all independent.  It feels like we need
> offload target and better separate between target and offload target.
> Then we need to figure out the places where we've got bleed-out.

Is this a question of terminology? I agree that saying "offload host" 
when we'd normally be calling it the "target" is confusing, but it's 
difficult to come up with better names.

Offload host and target are not quite independent - when it's 
implemented through LTO at least there has to be a fairly close 
agreement even down to machine modes, which is why a patch like this is 
necessary.

> Not sure how to deal with this any further out than the immediate term
> than using a hack like this. Though I'd prefer to avoid the #ifdef as it
> seems to me this shouldn't be baked in at build/configure time.

Yeah, I'm not expecting the i386 part to go in quite as-is. For 
reference I'm including the offload-abi patch - Ilya is submitting this 
along with other option changes. One possibility would be to print and 
recognize strings such as lp64D128 or lp64D96 which would include 
information about the size of long double. Somehow though I can't really 
bring myself to believe that -mlong-double-128 is a real use case with 
offloading so we might just disallow the combination.

CCing Uros in case he has an opinion.


Bernd


[-- Attachment #2: offload-opts2.diff --]
[-- Type: text/x-patch, Size: 22307 bytes --]

Index: gcc/common.opt
===================================================================
--- gcc/common.opt.orig
+++ gcc/common.opt
@@ -1601,6 +1601,19 @@ fnon-call-exceptions
 Common Report Var(flag_non_call_exceptions) Optimization
 Support synchronous non-call exceptions
 
+foffload-abi=
+Common Joined RejectNegative Enum(offload_abi) Var(flag_offload_abi) Init(OFFLOAD_ABI_UNSET)
+-foffload-abi=[lp64|ilp32]	Set the ABI to use in an offload compiler
+
+Enum
+Name(offload_abi) Type(enum offload_abi) UnknownError(unknown offload ABI %qs)
+
+EnumValue
+Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
+
+EnumValue
+Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
+
 fomit-frame-pointer
 Common Report Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c.orig
+++ gcc/config/i386/i386.c
@@ -4261,6 +4261,15 @@ ix86_option_override (void)
   register_pass (&insert_vzeroupper_info);
 }
 
+/* Implement the TARGET_OFFLOAD_OPTIONS hook.  */
+static char *
+ix86_offload_options (void)
+{
+  if (TARGET_LP64)
+    return xstrdup ("-foffload-abi=lp64");
+  return xstrdup ("-foffload-abi=ilp32");
+}
+
 /* Update register usage after having seen the compiler flags.  */
 
 static void
@@ -47142,6 +47151,10 @@ ix86_atomic_assign_expand_fenv (tree *ho
 #define TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P \
   ix86_float_exceptions_rounding_supported_p
 
+#undef TARGET_OFFLOAD_OPTIONS
+#define TARGET_OFFLOAD_OPTIONS \
+  ix86_offload_options
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 \f
 #include "gt-i386.h"
Index: gcc/coretypes.h
===================================================================
--- gcc/coretypes.h.orig
+++ gcc/coretypes.h
@@ -111,6 +111,12 @@ enum tls_model {
   TLS_MODEL_LOCAL_EXEC
 };
 
+enum offload_abi {
+  OFFLOAD_ABI_UNSET,
+  OFFLOAD_ABI_LP64,
+  OFFLOAD_ABI_ILP32
+};
+
 /* Types of unwind/exception handling info that can be generated.  */
 
 enum unwind_info_type
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi.orig
+++ gcc/doc/tm.texi
@@ -11446,3 +11446,12 @@ Used when offloaded functions are seen i
 sections are available.  It is called once for each symbol that must be
 recorded in the offload function and variable table.
 @end deftypefn
+
+@deftypefn {Target Hook} {char *} TARGET_OFFLOAD_OPTIONS (void)
+Used when writing out the list of options into an LTO file.  It should
+translate any relevant target-specific options (such as the ABI in use)
+into one of the @option{-foffload} options that exist as a common interface
+to express such options. It should return a string containing these options,
+separated by spaces, which the caller will free.
+
+@end deftypefn
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in.orig
+++ gcc/doc/tm.texi.in
@@ -8427,3 +8427,5 @@ and the associated definitions of those
 @hook TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 
 @hook TARGET_RECORD_OFFLOAD_SYMBOL
+
+@hook TARGET_OFFLOAD_OPTIONS
Index: gcc/hooks.c
===================================================================
--- gcc/hooks.c.orig
+++ gcc/hooks.c
@@ -373,12 +373,19 @@ hook_tree_tree_tree_tree_3rd_identity (t
   return c;
 }
 
-/* Generic hook that takes no arguments and returns a NULL string.  */
+/* Generic hook that takes no arguments and returns a NULL const string.  */
 const char *
 hook_constcharptr_void_null (void)
 {
   return NULL;
 }
+
+/* Generic hook that takes no arguments and returns a NULL string.  */
+char *
+hook_charptr_void_null (void)
+{
+  return NULL;
+}
 
 /* Generic hook that takes a tree and returns a NULL string.  */
 const char *
Index: gcc/hooks.h
===================================================================
--- gcc/hooks.h.orig
+++ gcc/hooks.h
@@ -101,6 +101,7 @@ extern rtx hook_rtx_rtx_identity (rtx);
 extern rtx hook_rtx_rtx_null (rtx);
 extern rtx hook_rtx_tree_int_null (tree, int);
 
+extern char *hook_charptr_void_null (void);
 extern const char *hook_constcharptr_void_null (void);
 extern const char *hook_constcharptr_const_tree_null (const_tree);
 extern const char *hook_constcharptr_const_rtx_null (const_rtx);
Index: gcc/lto-opts.c
===================================================================
--- gcc/lto-opts.c.orig
+++ gcc/lto-opts.c
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
 #include "common/common-target.h"
 #include "diagnostic.h"
 #include "lto-streamer.h"
+#include "lto-section-names.h"
 #include "toplev.h"
 
 /* Append the option piece OPT to the COLLECT_GCC_OPTIONS string
@@ -130,6 +131,22 @@ lto_write_options (void)
     append_to_collect_gcc_options (&temporary_obstack, &first_p,
 			       "-fno-strict-overflow");
 
+  if (strcmp (section_name_prefix, OMP_SECTION_NAME_PREFIX) == 0)
+    {
+      char *offload_opts = targetm.offload_options ();
+      char *offload_ptr = offload_opts;
+      while (offload_ptr)
+	{
+	  char *next = strchr (offload_ptr, ' ');
+	  if (next)
+	    *next++ = '\0';
+	  append_to_collect_gcc_options (&temporary_obstack, &first_p,
+					 offload_ptr);
+	  offload_ptr = next;
+	}
+      free (offload_opts);
+    }
+
   /* Output explicitly passed options.  */
   for (i = 1; i < save_decoded_options_count; ++i)
     {
@@ -153,6 +170,10 @@ lto_write_options (void)
       if (!(cl_options[option->opt_index].flags & (CL_COMMON|CL_TARGET|CL_LTO)))
 	continue;
 
+      if ((cl_options[option->opt_index].flags & CL_TARGET)
+	  && strcmp (section_name_prefix, OMP_SECTION_NAME_PREFIX) == 0)
+	continue;
+
       /* Drop options created from the gcc driver that will be rejected
 	 when passed on to the driver again.  */
       if (cl_options[option->opt_index].cl_reject_driver)
Index: gcc/lto-wrapper.c
===================================================================
--- gcc/lto-wrapper.c.orig
+++ gcc/lto-wrapper.c
@@ -282,6 +282,17 @@ merge_and_complain (struct cl_decoded_op
 		   foption->orig_option_with_args_text);
 	  break;
 
+	case OPT_foffload_abi_:
+	  for (j = 0; j < *decoded_options_count; ++j)
+	    if ((*decoded_options)[j].opt_index == foption->opt_index)
+	      break;
+	  if (j == *decoded_options_count)
+	    append_option (decoded_options, decoded_options_count, foption);
+	  else if (foption->value != (*decoded_options)[j].value)
+	    fatal ("Option %s not used consistently in all LTO input files",
+		   foption->orig_option_with_args_text);
+	  break;
+
 	case OPT_O:
 	case OPT_Ofast:
 	case OPT_Og:
@@ -414,6 +425,109 @@ parse_env_var (const char *str, char ***
   return num;
 }
 
+static void
+append_compiler_options (obstack *argv_obstack, struct cl_decoded_option *opts,
+			 unsigned int count)
+{
+  /* Append compiler driver arguments as far as they were merged.  */
+  for (unsigned int j = 1; j < count; ++j)
+    {
+      struct cl_decoded_option *option = &opts[j];
+
+      /* File options have been properly filtered by lto-opts.c.  */
+      switch (option->opt_index)
+	{
+	  /* Drop arguments that we want to take from the link line.  */
+	case OPT_flto_:
+	case OPT_flto:
+	case OPT_flto_partition_:
+	  continue;
+
+	default:
+	  break;
+	}
+
+      /* For now do what the original LTO option code was doing - pass
+	 on any CL_TARGET flag and a few selected others.  */
+      switch (option->opt_index)
+	{
+	case OPT_fPIC:
+	case OPT_fpic:
+	case OPT_fPIE:
+	case OPT_fpie:
+	case OPT_fcommon:
+	case OPT_fexceptions:
+	case OPT_fnon_call_exceptions:
+	case OPT_fgnu_tm:
+	case OPT_freg_struct_return:
+	case OPT_fpcc_struct_return:
+	case OPT_fshort_double:
+	case OPT_ffp_contract_:
+	case OPT_fwrapv:
+	case OPT_ftrapv:
+	case OPT_fstrict_overflow:
+	case OPT_foffload_abi_:
+	case OPT_O:
+	case OPT_Ofast:
+	case OPT_Og:
+	case OPT_Os:
+	  break;
+
+	default:
+	  if (!(cl_options[option->opt_index].flags & CL_TARGET))
+	    continue;
+	}
+
+      /* Pass the option on.  */
+      for (unsigned int i = 0; i < option->canonical_option_num_elements; ++i)
+	obstack_ptr_grow (argv_obstack, option->canonical_option[i]);
+    }
+}
+
+static void
+append_linker_options (obstack *argv_obstack, struct cl_decoded_option *opts,
+		       unsigned int count, bool include_target_options)
+{
+  /* Append linker driver arguments.  Compiler options from the linker
+     driver arguments will override / merge with those from the compiler.  */
+  for (unsigned int j = 1; j < count; ++j)
+    {
+      struct cl_decoded_option *option = &opts[j];
+
+      /* Do not pass on frontend specific flags not suitable for lto.  */
+      if (!(cl_options[option->opt_index].flags
+	    & (CL_COMMON|CL_TARGET|CL_DRIVER|CL_LTO)))
+	continue;
+
+      if ((cl_options[option->opt_index].flags & CL_TARGET)
+	  && !include_target_options)
+	continue;
+
+      switch (option->opt_index)
+	{
+	case OPT_o:
+	case OPT_flto_:
+	case OPT_flto:
+	  /* We've handled these LTO options, do not pass them on.  */
+	  continue;
+
+	case OPT_freg_struct_return:
+	case OPT_fpcc_struct_return:
+	case OPT_fshort_double:
+	  /* Ignore these, they are determined by the input files.
+	     ???  We fail to diagnose a possible mismatch here.  */
+	  continue;
+
+	default:
+	  break;
+	}
+
+      /* Pass the option on.  */
+      for (unsigned int i = 0; i < option->canonical_option_num_elements; ++i)
+	obstack_ptr_grow (argv_obstack, option->canonical_option[i]);
+    }
+}
+
 /* Check whether NAME can be accessed in MODE.  This is like access,
    except that it never considers directories to be executable.  */
 
@@ -440,7 +554,11 @@ access_check (const char *name, int mode
 
 static char*
 prepare_target_image (const char *target, const char *compiler_path,
-		      unsigned in_argc, char *in_argv[])
+		      unsigned in_argc, char *in_argv[],
+		      struct cl_decoded_option *compiler_opts,
+		      unsigned int compiler_opt_count,
+		      struct cl_decoded_option * /*linker_opts */,
+		      unsigned int /*linker_opt_count*/)
 {
   const char **argv;
   struct obstack argv_obstack;
@@ -469,8 +587,6 @@ prepare_target_image (const char *target
   /* Generate temp file name.  */
   filename = make_temp_file (".target.o");
 
-  /* --------------------------------------  */
-  /* Run gcc for target.  */
   obstack_init (&argv_obstack);
   obstack_ptr_grow (&argv_obstack, compiler);
   obstack_ptr_grow (&argv_obstack, "-o");
@@ -479,6 +595,8 @@ prepare_target_image (const char *target
   for (i = 1; i < in_argc; ++i)
     if (strncmp (in_argv[i], "-fresolution=", sizeof ("-fresolution=") - 1))
       obstack_ptr_grow (&argv_obstack, in_argv[i]);
+
+  append_compiler_options (&argv_obstack, compiler_opts, compiler_opt_count);
   obstack_ptr_grow (&argv_obstack, NULL);
 
   argv = XOBFINISH (&argv_obstack, const char **);
@@ -501,7 +619,11 @@ prepare_target_image (const char *target
    array.  */
 
 static void
-compile_images_for_openmp_targets (unsigned in_argc, char *in_argv[])
+compile_images_for_openmp_targets (unsigned in_argc, char *in_argv[],
+				   struct cl_decoded_option *compiler_opts,
+				   unsigned int compiler_opt_count,
+				   struct cl_decoded_option *linker_opts,
+				   unsigned int linker_opt_count)
 {
   char *target_names;
   char **names;
@@ -523,8 +645,10 @@ compile_images_for_openmp_targets (unsig
   offload_names = XCNEWVEC (char *, num_targets + 1);
   for (unsigned i = 0; i < num_targets; i++)
     {
-      offload_names[i] = prepare_target_image (names[i], compiler_path,
-					       in_argc, in_argv);
+      offload_names[i]
+	= prepare_target_image (names[i], compiler_path, in_argc, in_argv,
+				compiler_opts, compiler_opt_count,
+				linker_opts, linker_opt_count);
       if (!offload_names[i])
 	fatal_perror ("Problem with building target image for %s.\n", names[i]);
     }
@@ -592,6 +716,74 @@ find_ompbeginend (void)
   free_array_of_ptrs ((void**) paths, n_paths);
 }
 
+/* A subroutine of run_gcc.  Examine the open file FD for lto sections with
+   name prefix PREFIX, at FILE_OFFSET, and store any options we find in OPTS
+   and OPT_COUNT.  Return true if we found a matching section, false
+   otherwise.  COLLECT_GCC holds the value of the environment variable with
+   the same name.  */
+
+static bool
+find_and_merge_options (int fd, off_t file_offset, const char *prefix,
+			struct cl_decoded_option **opts,
+			unsigned int *opt_count, const char *collect_gcc)
+{
+  off_t offset, length;
+  char *data;
+  char *fopts;
+  const char *errmsg;
+  int err;
+  struct cl_decoded_option *fdecoded_options = *opts;
+  unsigned int fdecoded_options_count = *opt_count;
+
+  simple_object_read *sobj;
+  sobj = simple_object_start_read (fd, file_offset, "__GNU_LTO",
+				   &errmsg, &err);
+  if (!sobj)
+    return false;
+
+  char *secname = XNEWVEC (char, strlen (prefix) + 6);
+  strcpy (secname, prefix);
+  strcat (secname, ".opts");
+  if (!simple_object_find_section (sobj, secname, &offset, &length,
+				   &errmsg, &err))
+    {
+      simple_object_release_read (sobj);
+      return false;
+    }
+
+  lseek (fd, file_offset + offset, SEEK_SET);
+  data = (char *)xmalloc (length);
+  read (fd, data, length);
+  fopts = data;
+  do
+    {
+      struct cl_decoded_option *f2decoded_options;
+      unsigned int f2decoded_options_count;
+      get_options_from_collect_gcc_options (collect_gcc,
+					    fopts, CL_LANG_ALL,
+					    &f2decoded_options,
+					    &f2decoded_options_count);
+      if (!fdecoded_options)
+	{
+	  fdecoded_options = f2decoded_options;
+	  fdecoded_options_count = f2decoded_options_count;
+	}
+      else
+	merge_and_complain (&fdecoded_options,
+			    &fdecoded_options_count,
+			    f2decoded_options, f2decoded_options_count);
+
+      fopts += strlen (fopts) + 1;
+    }
+  while (fopts - data < length);
+
+  free (data);
+  simple_object_release_read (sobj);
+  *opts = fdecoded_options;
+  *opt_count = fdecoded_options_count;
+  return true;
+}
+
 /* Execute gcc. ARGC is the number of arguments. ARGV contains the arguments. */
 
 static void
@@ -607,7 +799,9 @@ run_gcc (unsigned argc, char *argv[])
   int jobserver = 0;
   bool no_partition = false;
   struct cl_decoded_option *fdecoded_options = NULL;
+  struct cl_decoded_option *omp_fdecoded_options = NULL;
   unsigned int fdecoded_options_count = 0;
+  unsigned int omp_fdecoded_options_count = 0;
   struct cl_decoded_option *decoded_options;
   unsigned int decoded_options_count;
   struct obstack argv_obstack;
@@ -629,18 +823,13 @@ run_gcc (unsigned argc, char *argv[])
   /* Look at saved options in the IL files.  */
   for (i = 1; i < argc; ++i)
     {
-      char *data, *p;
-      char *fopts;
+      char *p;
       int fd;
-      const char *errmsg;
-      int err;
-      off_t file_offset = 0, offset, length;
+      off_t file_offset = 0;
       long loffset;
-      simple_object_read *sobj;
       int consumed;
-      struct cl_decoded_option *f2decoded_options;
-      unsigned int f2decoded_options_count;
       char *filename = argv[i];
+
       if ((p = strrchr (argv[i], '@'))
 	  && p != argv[i] 
 	  && sscanf (p, "@%li%n", &loffset, &consumed) >= 1
@@ -654,51 +843,16 @@ run_gcc (unsigned argc, char *argv[])
       fd = open (argv[i], O_RDONLY);
       if (fd == -1)
 	continue;
-      sobj = simple_object_start_read (fd, file_offset, "__GNU_LTO", 
-	  			       &errmsg, &err);
-      if (!sobj)
-	{
-	  close (fd);
-	  continue;
-	}
-      if (!simple_object_find_section (sobj, LTO_SECTION_NAME_PREFIX "." "opts",
-				       &offset, &length, &errmsg, &err))
-	{
-	  simple_object_release_read (sobj);
-	  close (fd);
-	  continue;
-	}
-      /* We may choose not to write out this .opts section in the future.  In
-	 that case we'll have to use something else to look for.  */
-      if (simple_object_find_section (sobj, OMP_SECTION_NAME_PREFIX "." "opts",
-				      &offset, &length, &errmsg, &err))
-	have_offload = true;
-      lseek (fd, file_offset + offset, SEEK_SET);
-      data = (char *)xmalloc (length);
-      read (fd, data, length);
-      fopts = data;
-      do
-	{
-	  get_options_from_collect_gcc_options (collect_gcc,
-						fopts, CL_LANG_ALL,
-						&f2decoded_options,
-						&f2decoded_options_count);
-	  if (!fdecoded_options)
-	    {
-	      fdecoded_options = f2decoded_options;
-	      fdecoded_options_count = f2decoded_options_count;
-	    }
-	  else
-	    merge_and_complain (&fdecoded_options,
-				&fdecoded_options_count,
-				f2decoded_options, f2decoded_options_count);
-
-	  fopts += strlen (fopts) + 1;
-	}
-      while (fopts - data < length);
-
-      free (data);
-      simple_object_release_read (sobj);
+      bool omp_found;
+      find_and_merge_options (fd, file_offset, LTO_SECTION_NAME_PREFIX,
+			      &fdecoded_options, &fdecoded_options_count,
+			      collect_gcc);
+      omp_found = find_and_merge_options (fd, file_offset,
+					  OMP_SECTION_NAME_PREFIX,
+					  &omp_fdecoded_options,
+					  &omp_fdecoded_options_count,
+					  collect_gcc);
+      have_offload |= omp_found;
       close (fd);
     }
 
@@ -708,76 +862,21 @@ run_gcc (unsigned argc, char *argv[])
   obstack_ptr_grow (&argv_obstack, "-xlto");
   obstack_ptr_grow (&argv_obstack, "-c");
 
-  /* Append compiler driver arguments as far as they were merged.  */
-  for (j = 1; j < fdecoded_options_count; ++j)
-    {
-      struct cl_decoded_option *option = &fdecoded_options[j];
-
-      /* File options have been properly filtered by lto-opts.c.  */
-      switch (option->opt_index)
-	{
-	  /* Drop arguments that we want to take from the link line.  */
-	  case OPT_flto_:
-	  case OPT_flto:
-	  case OPT_flto_partition_:
-	      continue;
-
-	  default:
-	      break;
-	}
-
-      /* For now do what the original LTO option code was doing - pass
-	 on any CL_TARGET flag and a few selected others.  */
-      switch (option->opt_index)
-	{
-	case OPT_fPIC:
-	case OPT_fpic:
-	case OPT_fPIE:
-	case OPT_fpie:
-	case OPT_fcommon:
-	case OPT_fexceptions:
-	case OPT_fnon_call_exceptions:
-	case OPT_fgnu_tm:
-	case OPT_freg_struct_return:
-	case OPT_fpcc_struct_return:
-	case OPT_fshort_double:
-	case OPT_ffp_contract_:
-	case OPT_fwrapv:
-	case OPT_ftrapv:
-	case OPT_fstrict_overflow:
-	case OPT_O:
-	case OPT_Ofast:
-	case OPT_Og:
-	case OPT_Os:
-	  break;
-
-	default:
-	  if (!(cl_options[option->opt_index].flags & CL_TARGET))
-	    continue;
-	}
+  append_compiler_options (&argv_obstack, fdecoded_options,
+			   fdecoded_options_count);
+  append_linker_options (&argv_obstack, decoded_options, decoded_options_count,
+		      true);
 
-      /* Pass the option on.  */
-      for (i = 0; i < option->canonical_option_num_elements; ++i)
-	obstack_ptr_grow (&argv_obstack, option->canonical_option[i]);
-    }
-
-  /* Append linker driver arguments.  Compiler options from the linker
-     driver arguments will override / merge with those from the compiler.  */
+  /* Scan linker driver arguments for things that are of relevance to us.  */
   for (j = 1; j < decoded_options_count; ++j)
     {
       struct cl_decoded_option *option = &decoded_options[j];
 
-      /* Do not pass on frontend specific flags not suitable for lto.  */
-      if (!(cl_options[option->opt_index].flags
-	    & (CL_COMMON|CL_TARGET|CL_DRIVER|CL_LTO)))
-	continue;
-
       switch (option->opt_index)
 	{
 	case OPT_o:
 	  linker_output = option->arg;
-	  /* We generate new intermediate output, drop this arg.  */
-	  continue;
+	  break;
 
 	case OPT_save_temps:
 	  debug = 1;
@@ -808,23 +907,11 @@ run_gcc (unsigned argc, char *argv[])
 
 	case OPT_flto:
 	  lto_mode = LTO_MODE_WHOPR;
-	  /* We've handled these LTO options, do not pass them on.  */
-	  continue;
-
-	case OPT_freg_struct_return:
-	case OPT_fpcc_struct_return:
-	case OPT_fshort_double:
-	  /* Ignore these, they are determined by the input files.
-	     ???  We fail to diagnose a possible mismatch here.  */
-	  continue;
+	  break;
 
 	default:
 	  break;
 	}
-
-      /* Pass the option on.  */
-      for (i = 0; i < option->canonical_option_num_elements; ++i)
-	obstack_ptr_grow (&argv_obstack, option->canonical_option[i]);
     }
 
   if (no_partition)
@@ -897,7 +984,7 @@ run_gcc (unsigned argc, char *argv[])
       else
 	ltrans_output_file = make_temp_file (".ltrans.out");
       list_option_full = (char *) xmalloc (sizeof (char) *
-		         (strlen (ltrans_output_file) + list_option_len + 1));
+					   (strlen (ltrans_output_file) + list_option_len + 1));
       tmp = list_option_full;
 
       obstack_ptr_grow (&argv_obstack, tmp);
@@ -952,7 +1039,7 @@ run_gcc (unsigned argc, char *argv[])
 	  size_t len;
 
 	  buf = input_name;
-cont:
+	cont:
 	  if (!fgets (buf, piece, stream))
 	    break;
 	  len = strlen (input_name);
@@ -1003,8 +1090,8 @@ cont:
 	  if (linker_output)
 	    {
 	      char *dumpbase
-		  = (char *) xmalloc (strlen (linker_output)
-				      + sizeof (DUMPBASE_SUFFIX) + 1);
+		= (char *) xmalloc (strlen (linker_output)
+				    + sizeof (DUMPBASE_SUFFIX) + 1);
 	      snprintf (dumpbase,
 			strlen (linker_output) + sizeof (DUMPBASE_SUFFIX),
 			"%s.ltrans%u", linker_output, i);
@@ -1079,7 +1166,10 @@ cont:
 	}
       if (have_offload)
 	{
-	  compile_images_for_openmp_targets (argc, argv);
+	  compile_images_for_openmp_targets (argc, argv, omp_fdecoded_options,
+					     omp_fdecoded_options_count,
+					     decoded_options,
+					     decoded_options_count);
 	  if (offload_names)
 	    {
 	      find_ompbeginend ();
Index: gcc/target.def
===================================================================
--- gcc/target.def.orig
+++ gcc/target.def
@@ -1795,6 +1795,15 @@ actions then, you should have @code{TARG
  void, (void),
  hook_void_void)
 
+DEFHOOK
+(offload_options,
+ "Used when writing out the list of options into an LTO file.  It should\n\
+translate any relevant target-specific options (such as the ABI in use)\n\
+into one of the @option{-foffload} options that exist as a common interface\n\
+to express such options. It should return a string containing these options,\n\
+separated by spaces, which the caller will free.\n",
+char *, (void), hook_charptr_void_null)
+
 DEFHOOK_UNDOC
 (eh_return_filter_mode,
  "Return machine mode for filter value.",


* Re: nvptx offloading patches [3/n], i386 bits RFD
  2014-11-04 12:38   ` nvptx offloading patches [3/n], i386 bits RFD Bernd Schmidt
@ 2014-11-04 18:58     ` Uros Bizjak
  2014-11-04 21:50     ` Jeff Law
  1 sibling, 0 replies; 42+ messages in thread
From: Uros Bizjak @ 2014-11-04 18:58 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, GCC Patches, Ilya Verbin, H.J. Lu

On Tue, Nov 4, 2014 at 1:35 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:

>> Not sure how to deal with this any further out than the immediate term
>> than using a hack like this. Though I'd prefer to avoid the #ifdef as it
>> seems to me this shouldn't be baked in at build/configure time.
>
>
> Yeah, I'm not expecting the i386 part to go in quite as-is. For reference
> I'm including the offload-abi patch - Ilya is submitting this along with
> other option changes. One possibility would be to print and recognize
> strings such as lp64D128 or lp64D96 which would include information about
> the size of long double. Somehow though I can't really bring myself to
> believe that -mlong-double-128 is a real use case with offloading so we might
> just disallow the combination.
>
> CCing Uros in case he has an opinion.

-mlong-double-128 was introduced for Android in:

2014-02-03  H.J. Lu  <hongjiu.lu@intel.com>

    * config/i386/i386.c (flag_opts): Add -mlong-double-128.
    (ix86_option_override_internal): Default long double to 64-bit for
    32-bit Bionic and to 128-bit for 64-bit Bionic.

    [...]

IMO, if it causes trouble for offloading, anything other than the default
-mlong-double-80 should be disallowed.

Uros.

* Re: nvptx offloading patches [3/n], i386 bits RFD
  2014-11-04 12:38   ` nvptx offloading patches [3/n], i386 bits RFD Bernd Schmidt
  2014-11-04 18:58     ` Uros Bizjak
@ 2014-11-04 21:50     ` Jeff Law
  2014-11-05  0:23       ` Bernd Schmidt
  1 sibling, 1 reply; 42+ messages in thread
From: Jeff Law @ 2014-11-04 21:50 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches; +Cc: Ilya Verbin, Uros Bizjak

On 11/04/14 05:35, Bernd Schmidt wrote:
>>> Ports that want to be hosts for offloading may need to modify their
>>> modes.def. The patch below contains changes to i386-modes.def which
>>> modifies XFmode depending on a target switch. I'm not actually entirely
>>> sure what to do about this. Do we want to make this flag an error when
>>> offloading is enabled? Or maybe add float format support to the
>>> -foffload-abi option?
>>>
>>> Thoughts? Ok for the first part of the patch once the other offloading
>>> patches have gone in (bootstrapped and tested on x86_64-linux)?
>> It feels like we've got another real distinction to make.  We've had
>> host, build & target and they're all independent.  It feels like we need
>> offload target and better separate between target and offload target.
>> Then we need to figure out the places where we've got bleed-out.
>
> Is this a question of terminology? I agree that saying "offload host"
> when we'd normally be calling it the "target" is confusing, but it's
> difficult to come up with better names.
No, I don't think it's terminology.  It's really that in effect we have 
two targets.  One is a normal CPU, the other is a GPU.

i.e., there's nothing that says we won't have a GPU that's being driven by 
an ARM or PPC.  What I want to avoid is GPU-isms getting sprinkled into 
the x86 (or any other) backend.

The problem is we don't have any infrastructure in place for this kind 
of situation.  So we start off with a few hacks and hopefully we're able 
to see some commonality and start to see how to handle the 
multi-architecture target issues a bit better.

Jeff

* Re: nvptx offloading patches [3/n], i386 bits RFD
  2014-11-04 21:50     ` Jeff Law
@ 2014-11-05  0:23       ` Bernd Schmidt
  2014-11-14 18:42         ` Bernd Schmidt
  0 siblings, 1 reply; 42+ messages in thread
From: Bernd Schmidt @ 2014-11-05  0:23 UTC (permalink / raw)
  To: Jeff Law, GCC Patches; +Cc: Ilya Verbin, Uros Bizjak

On 11/04/2014 10:50 PM, Jeff Law wrote:
> No, I don't think it's terminology.  It's really that in effect we have
> two targets.  One is a normal CPU, the other is a GPU.
>
> ie, there's nothing that says we won't have a GPU that's being driven by
> an ARM or PPC.  What I want to avoid is GPU-isms getting sprinkled into
> the x86 (or any other) backend.
>
> The problem is we don't have any infrastructure in place for this kind
> of situation.  So we start off with a few hacks and hopefully we're able
> to see some commonality and start to see how to handle the
> multi-architecture target issues a bit better.

FWIW the three non-ptx patches I sent plus the -foffload-abi stuff are 
the only ones necessary to make offloading through the LTO path work 
(this was against the gomp-4_0-branch with earlier versions of the 
offload patches Ilya's been posting; I haven't had a chance to test 
everything together in trunk yet). That doesn't seem like a large number
of changes.

For other targets I don't expect the situation to be too different. ARM 
has a similar float mode issue for HFmode, and things like 
m{big,little}-endian may have to be handled. I expect these can be 
handled with -foffload-abi machinery.

So, looking ahead - I'm imagining extra switches along the lines of
  -foffload-abi-hflt={arm,ieee,...}
  -foffload-abi-ldbl={64,x86,128}
  -foffload-abi-endian={big,little}

On some targets it might make sense to disallow offloading if certain 
switches are used. Uros seems to agree that on x86 the -mlong-double-128 
switch isn't very interesting. I'm thinking about how to deal with such 
a situation - maybe an offload_abi_valid hook that gets called whenever 
we find that we want to stream out offloaded functions. That would then 
sorry out (or maybe just warn) if the hook returns false.

I can do either or both, whatever the consensus turns out to be.


Bernd
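
The offload_abi_valid idea and the imagined -foffload-abi-ldbl switch above
can be sketched roughly as follows. This is only an illustration under
stated assumptions: HostAbi, LongDouble, and both functions are hypothetical
names, not GCC code, and the -foffload-abi-ldbl spelling is the one proposed
in this message, not an existing option. The validity rule follows Uros's
suggestion that only the default 80-bit long double be allowed.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the host's long double choice
// (-mlong-double-64 / -mlong-double-80 / -mlong-double-128).
enum class LongDouble { Bits64, X87Bits80, Bits128 };

// Hypothetical summary of the host ABI facts an offload compiler must match.
struct HostAbi {
  bool lp64;        // true for LP64, false for ILP32
  LongDouble ldbl;  // selected long double format
};

// Stand-in for the proposed offload_abi_valid hook: accept only the
// default x87 80-bit long double, as suggested in the thread.
bool offload_abi_valid(const HostAbi &abi) {
  return abi.ldbl == LongDouble::X87Bits80;
}

// Stand-in for translating target options into the common -foffload
// interface.  The -foffload-abi-ldbl switch is only a proposal here.
std::string offload_abi_options(const HostAbi &abi) {
  std::string opts = abi.lp64 ? "-foffload-abi=lp64" : "-foffload-abi=ilp32";
  switch (abi.ldbl) {
    case LongDouble::Bits64:    opts += " -foffload-abi-ldbl=64";  break;
    case LongDouble::X87Bits80: opts += " -foffload-abi-ldbl=x86"; break;
    case LongDouble::Bits128:   opts += " -foffload-abi-ldbl=128"; break;
  }
  return opts;
}
```

A caller that is about to stream out offloaded functions would check
offload_abi_valid first and report an error (or a warning) when it returns
false; otherwise it would record the string from offload_abi_options.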

* Re: nvptx offloading patches [3/n], i386 bits RFD
  2014-11-05  0:23       ` Bernd Schmidt
@ 2014-11-14 18:42         ` Bernd Schmidt
  0 siblings, 0 replies; 42+ messages in thread
From: Bernd Schmidt @ 2014-11-14 18:42 UTC (permalink / raw)
  To: Jeff Law, GCC Patches; +Cc: Ilya Verbin, Uros Bizjak

On 11/05/2014 01:19 AM, Bernd Schmidt wrote:
> On 11/04/2014 10:50 PM, Jeff Law wrote:
>> No, I don't think it's terminology.  It's really that in effect we have
>> two targets.  One is a normal CPU, the other is a GPU.
>>
>> ie, there's nothing that says we won't have a GPU that's being driven by
>> an ARM or PPC.  What I want to avoid is GPU-isms getting sprinkled into
>> the x86 (or any other) backend.
>>
>> The problem is we don't have any infrastructure in place for this kind
>> of situation.  So we start off with a few hacks and hopefully we're able
>> to see some commonality and start to see how to handle the
>> multi-architecture target issues a bit better.
>
> FWIW the three non-ptx patches I sent plus the -foffload-abi stuff are
> the only ones necessary to make offloading through the LTO path work
> (this was against the gomp-4_0-branch with earlier versions of the
> offload patches Ilya's been posting; I haven't had a chance to test
> everything together in trunk yet). That doesn't seem like a large amount
> of changes.
>
> For other targets I don't expect the situation to be too different. ARM
> has a similar float mode issue for HFmode, and things like
> m{big,little}-endian may have to be handled. I expect these can be
> handled with -foffload-abi machinery.
>
> So, looking ahead - I'm imagining extra switches along the lines of
>   -foffload-abi-hflt={arm,ieee,...}
>   -foffload-abi-ldbl={64,x86,128}
>   -foffload-abi-endian={big,little}
>
> On some targets it might make sense to disallow offloading if certain
> switches are used. Uros seems to agree that on x86 the -mlong-double-128
> switch isn't very interesting. I'm thinking about how to deal with such
> a situation - maybe an offload_abi_valid hook that gets called whenever
> we find that we want to stream out offloaded functions. That would then
> sorry out (or maybe just warn) if the hook returns false.
>
> I can do either or both, whatever the consensus turns out to be.

The discussion on these patches kind of stalled, so a gentle ping - it 
would be nice to integrate these now that Ilya's offload patches are in.


Bernd


* Re: nvptx offloading patches [3/n], RFD
  2014-11-01 11:58 nvptx offloading patches [3/n], RFD Bernd Schmidt
  2014-11-03 22:28 ` Jeff Law
@ 2015-02-04 11:38 ` Jakub Jelinek
  2015-02-09 10:20   ` Richard Biener
  1 sibling, 1 reply; 42+ messages in thread
From: Jakub Jelinek @ 2015-02-04 11:38 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches, Ilya Verbin

On Sat, Nov 01, 2014 at 12:57:45PM +0100, Bernd Schmidt wrote:
> This is not against current trunk; it applies to gomp-4_0-branch where it is
> one of the necessary parts to make offloading x86->nvptx work. The issue is
> that the LTO file format depends on the machine_modes enum, it needs to
> match between host and offload target. The easiest way to do this is to just
> use the host-modes.def when compiling an offload compiler.
> 
> Ports that want to be hosts for offloading may need to modify their
> modes.def. The patch below contains changes to i386-modes.def which modifies
> XFmode depending on a target switch. I'm not actually entirely sure what to
> do about this. Do we want to make this flag an error when offloading is
> enabled? Or maybe add float format support to the -foffload-abi option?
> 
> Thoughts? Ok for the first part of the patch once the other offloading
> patches have gone in (bootstrapped and tested on x86_64-linux)?

I don't like this at all.

IMHO we should instead stream out, in the offloading LTO sections, some kind
of mode description table (perhaps limited to the modes actually streamed).
When reading the offloading LTO sections back, the offloading compiler would
remap the modes to its own modes where a mapping between the two exists,
choose some other mapping (e.g. map vector modes the host has but the
offloading target lacks to, say, BLKmode), or otherwise give up on offloading
(say, if you attempt to stream floating point modes the offloading target
doesn't support).

So perhaps stream, for each used mode, the mode value, corresponding mode
class, size, precision, inner mode, and nunits, and for floating point modes
presumably somehow encode the real_format (perhaps just add a name <->
struct real_format mapping for the real.c modes, and map anything else
to "unknown").

	Jakub
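
A minimal sketch of the remapping policy described above, under stated
assumptions: ModeDesc, ModeClass, the toy target table, and the format-name
strings are all hypothetical and only model the idea; they are not GCC data
structures. The inner mode is left out to keep the example short. A host
mode maps to the target mode with the same description, unmatched vector
modes fall back to a BLKmode stand-in, and anything else is reported as
unsupported.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified mode classes.
enum class ModeClass { Int, Float, VectorInt, VectorFloat };

// Hypothetical per-mode description, as it might be streamed by the host.
struct ModeDesc {
  std::string name;
  ModeClass cls;
  unsigned size;            // size in bytes
  unsigned precision;       // precision in bits
  unsigned nunits;          // number of units (1 for scalars)
  std::string real_format;  // format name for float modes, else empty
};

constexpr int kBlkMode = -1;  // stand-in for BLKmode
constexpr int kNoMode = -2;   // no mapping: give up on offloading

// Two modes match when every described property agrees (names may differ).
bool same_layout(const ModeDesc &a, const ModeDesc &b) {
  return a.cls == b.cls && a.size == b.size && a.precision == b.precision
         && a.nunits == b.nunits && a.real_format == b.real_format;
}

// Map one host mode to an index into the target's mode list.
int remap_mode(const ModeDesc &host, const std::vector<ModeDesc> &target) {
  for (std::size_t i = 0; i < target.size(); ++i)
    if (same_layout(host, target[i]))
      return static_cast<int>(i);
  if (host.cls == ModeClass::VectorInt || host.cls == ModeClass::VectorFloat)
    return kBlkMode;  // the target lacks this vector mode
  return kNoMode;     // e.g. an unsupported floating point format
}

// Hypothetical offload target with only a few scalar modes.
std::vector<ModeDesc> toy_target_modes() {
  return {
    {"SI", ModeClass::Int, 4, 32, 1, ""},
    {"DI", ModeClass::Int, 8, 64, 1, ""},
    {"SF", ModeClass::Float, 4, 32, 1, "ieee_single"},
    {"DF", ModeClass::Float, 8, 64, 1, "ieee_double"},
  };
}
```

With this toy table, a host double maps to the target's DF entry, a host
four-element integer vector maps to the BLKmode stand-in, and an 80-bit
extended float has no mapping, which is the case where offloading would be
abandoned.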

* Re: nvptx offloading patches [3/n], RFD
  2015-02-04 11:38 ` nvptx offloading patches [3/n], RFD Jakub Jelinek
@ 2015-02-09 10:20   ` Richard Biener
  2015-02-16 21:08     ` Jakub Jelinek
  0 siblings, 1 reply; 42+ messages in thread
From: Richard Biener @ 2015-02-09 10:20 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Bernd Schmidt, GCC Patches, Ilya Verbin

On Wed, Feb 4, 2015 at 12:38 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Sat, Nov 01, 2014 at 12:57:45PM +0100, Bernd Schmidt wrote:
>> This is not against current trunk; it applies to gomp-4_0-branch where it is
>> one of the necessary parts to make offloading x86->nvptx work. The issue is
>> that the LTO file format depends on the machine_modes enum, it needs to
>> match between host and offload target. The easiest way to do this is to just
>> use the host-modes.def when compiling an offload compiler.
>>
>> Ports that want to be hosts for offloading may need to modify their
>> modes.def. The patch below contains changes to i386-modes.def which modifies
>> XFmode depending on a target switch. I'm not actually entirely sure what to
>> do about this. Do we want to make this flag an error when offloading is
>> enabled? Or maybe add float format support to the -foffload-abi option?
>>
>> Thoughts? Ok for the first part of the patch once the other offloading
>> patches have gone in (bootstrapped and tested on x86_64-linux)?
>
> I don't like this at all.
>
> IMHO instead we should stream in the offloading LTO sections some kind of mode
> description table (perhaps limited to the modes actually ever streamed),
> and when reading back the offloading LTO sections, let the offloading
> compiler remap the modes to its own modes where there is a mapping in
> between the two, choose some other mapping (e.g. map various vector modes
> the host has but offloading target does not to say BLKmode), or give up
> otherwise with offloading (say if you attempt to stream floating point modes
> the offloading target doesn't support etc.).
>
> So perhaps stream for each used mode the mode value, corresponding mode
> class, size, precision, inner mode, nunits, and for floating point modes
> supposedly somehow encode the real_format (perhaps just add a name <->
> struct real_format mapping for the real.c modes, and map anything else
> to "unknown").

I think (also communicated that on IRC) we should instead try not streaming
machine-modes at all but generating them at stream-in time via layout_type
or layout_decl.

Richard.
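
The alternative above (not streaming modes, but recomputing them from the
type at stream-in time) might look roughly like this. It is a sketch under
stated assumptions and not what layout_type actually does: Kind, TypeDesc,
TargetMode, and layout_mode are hypothetical names, and the selection rule
(smallest integer mode that is wide enough, exact precision for reals) is a
simplification chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified type kinds.
enum class Kind { Integer, Real };

// What the reader knows about a type without any streamed mode.
struct TypeDesc {
  Kind kind;
  unsigned precision;  // in bits
};

// One mode offered by the reading (offload) compiler.
struct TargetMode {
  std::string name;
  Kind kind;
  unsigned precision;  // in bits
};

// Hypothetical mode list of an offload target.
std::vector<TargetMode> toy_offload_modes() {
  return {
    {"QI", Kind::Integer, 8},  {"HI", Kind::Integer, 16},
    {"SI", Kind::Integer, 32}, {"DI", Kind::Integer, 64},
    {"SF", Kind::Real, 32},    {"DF", Kind::Real, 64},
  };
}

// Pick a mode for TYPE from the reader's own mode list.  Integers take the
// narrowest mode that is wide enough; reals need an exact precision match.
// Returns the mode name, or an empty string when no mode fits.
std::string layout_mode(const TypeDesc &type,
                        const std::vector<TargetMode> &modes) {
  const TargetMode *best = nullptr;
  for (const TargetMode &m : modes) {
    if (m.kind != type.kind)
      continue;
    bool fits = type.kind == Kind::Real ? m.precision == type.precision
                                        : m.precision >= type.precision;
    if (fits && (!best || m.precision < best->precision))
      best = &m;
  }
  return best ? best->name : std::string();
}
```

The point of the design is that the mode enum value never crosses the
host/offload boundary; only type properties do, so the two compilers need
not agree on their modes.def contents.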

>         Jakub

* Re: nvptx offloading patches [3/n], RFD
  2015-02-09 10:20   ` Richard Biener
@ 2015-02-16 21:08     ` Jakub Jelinek
  2015-02-16 21:35       ` Richard Biener
  2015-02-17 13:32       ` Ilya Verbin
  0 siblings, 2 replies; 42+ messages in thread
From: Jakub Jelinek @ 2015-02-16 21:08 UTC (permalink / raw)
  To: Richard Biener, Jan Hubicka, Ilya Verbin, Bernd Schmidt, Thomas Schwinge
  Cc: gcc-patches

Hi!

On Mon, Feb 09, 2015 at 11:20:00AM +0100, Richard Biener wrote:
> I think (also communicated that on IRC) we should instead try not streaming
> machine-modes at all but generating them at stream-in time via layout_type
> or layout_decl.

Here is a WIP prototype for streaming out a machine mode description
table and streaming it back in.
In the end, I'd like to stream this out only for lto_stream_offload_p and
stream it in only in ACCEL_COMPILER, when available, but I wanted
to see what it does even for native LTO.
For that it doesn't work very well, because it seems that the wpa phase
doesn't stream in some sections and stream them out again, but instead
somehow copies them directly to the output object, so the mode table
isn't aware of the modes used in the sections that were bypassed this way.

Anyway, the questions are whether we use the wpa stage for offloading at all
these days, and whether there is a way for ACCEL_COMPILER to distinguish
between LTO sections written by the host compiler and LTO sections
perhaps created by the offloading compiler when trying to LTO the thing (if
it does that at all).  Because obviously the LTO written by the host compiler
(in .gnu.offload_lto_*) would need the machine modes translated, while
LTO already streamed by the ACCEL_COMPILER (if any) generally would already
use the offloading target machine modes and therefore should be treated as
native LTO (.gnu.lto_*).

If we don't try to write .gnu.offload_lto_* again, I think the following
patch, with the additional change of not calling lto_write_mode_table for
!lto_stream_offload_p and not calling lto_input_mode_table for
!ACCEL_COMPILER (instead building a single shared identity table), might
actually work.

Thoughts on this?

Bernd/Thomas, do you plan to commit the other approved patches soon?
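
As a reading aid for the patch below, here is a small standalone model of
the lookup it introduces. It is a sketch under stated assumptions: ModeTable,
make_identity_table, translate_mode, and table_for_section are hypothetical
names and do not appear in the patch. The model shows a 256-entry byte table
that is the identity for native LTO and a real translation only when an
offload compiler reads sections written by the host.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One byte per possible on-disk mode value, like the 1 << 8 entry table
// in the patch.
using ModeTable = std::array<std::uint8_t, 256>;

// Identity mapping for the first MAX_MODE entries; the rest stay zero.
ModeTable make_identity_table(unsigned max_mode) {
  ModeTable table{};
  for (unsigned m = 0; m < max_mode && m < table.size(); ++m)
    table[m] = static_cast<std::uint8_t>(m);
  return table;
}

// Translate an on-disk mode value through TABLE.
unsigned translate_mode(const ModeTable &table, std::uint8_t on_disk) {
  return table[on_disk];
}

// Only an offload compiler reading host-written offload sections needs
// translation; everything else can share the identity table.
const ModeTable &table_for_section(bool accel_compiler, bool offload_section,
                                   const ModeTable &identity,
                                   const ModeTable &remap) {
  return (accel_compiler && offload_section) ? remap : identity;
}
```

In this model, translate_mode plays the role of the table lookup done when
unpacking a mode, and table_for_section captures the proposed rule of
building one shared identity table whenever the reader is not an offload
compiler.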

--- gcc/passes.c.jj	2015-02-16 20:14:09.477345693 +0100
+++ gcc/passes.c	2015-02-16 20:26:23.659299189 +0100
@@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
 
   gcc_assert (!flag_wpa);
@@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
   lto_symtab_encoder_iterator lsei;
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
   for (lsei = lsei_start_function_in_partition (encoder);
        !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
--- gcc/tree-streamer.h.jj	2015-02-16 20:14:09.446346202 +0100
+++ gcc/tree-streamer.h	2015-02-16 21:14:50.701615850 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 
 #include "streamer-hooks.h"
 #include "lto-streamer.h"
+#include "data-streamer.h"
 #include "hash-map.h"
 
 /* Cache of pickled nodes.  Used to avoid writing the same node more
@@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
 void streamer_write_builtin (struct output_block *, tree);
 
 /* In tree-streamer.c.  */
+extern unsigned char streamer_mode_table[1 << 8];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
 				 hashval_t, unsigned *);
@@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
   return cache->hashes[ix];
 }
 
+static inline void
+bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
+{
+  streamer_mode_table[mode] = 1;
+  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
+}
+
+static inline machine_mode
+bp_unpack_machine_mode (struct bitpack_d *bp)
+{
+  return (machine_mode)
+	   ((struct lto_input_block *)
+	    bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
+}
 
 #endif  /* GCC_TREE_STREAMER_H  */
--- gcc/lto-streamer-out.c.jj	2015-02-16 20:14:09.046352765 +0100
+++ gcc/lto-streamer-out.c	2015-02-16 20:26:23.665299091 +0100
@@ -2642,6 +2642,96 @@ produce_symtab (struct output_block *ob)
 }
 
 
+/* Init the streamer_mode_table for output, where we collect info on what
+   machine_mode values have been streamed.  */
+void
+lto_output_init_mode_table (void)
+{
+  memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
+}
+
+
+/* Write the mode table.  */
+static void
+lto_write_mode_table (void)
+{
+  struct output_block *ob;
+  ob = create_output_block (LTO_section_mode_table);
+  bitpack_d bp = bitpack_create (ob->main_stream);
+
+  /* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
+     also the inner mode marked.  */
+  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+    if (streamer_mode_table[i])
+      {
+	machine_mode m = (machine_mode) i;
+	if (GET_MODE_INNER (m) != VOIDmode)
+	  streamer_mode_table[(int) GET_MODE_INNER (m)] = 1;
+      }
+  /* First stream modes that have GET_MODE_INNER (m) == VOIDmode,
+     so that we can refer to them afterwards.  */
+  for (int pass = 0; pass < 2; pass++)
+    for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+      if (streamer_mode_table[i] && i != (int) VOIDmode && i != (int) BLKmode)
+	{
+	  machine_mode m = (machine_mode) i;
+	  if ((GET_MODE_INNER (m) == VOIDmode) ^ (pass == 0))
+	    continue;
+	  bp_pack_value (&bp, m, 8);
+	  bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
+	  bp_pack_value (&bp, GET_MODE_SIZE (m), 8);
+	  bp_pack_value (&bp, GET_MODE_PRECISION (m), 16);
+	  bp_pack_value (&bp, GET_MODE_INNER (m), 8);
+	  bp_pack_value (&bp, GET_MODE_NUNITS (m), 8);
+	  switch (GET_MODE_CLASS (m))
+	    {
+	    case MODE_FRACT:
+	    case MODE_UFRACT:
+	    case MODE_ACCUM:
+	    case MODE_UACCUM:
+	      bp_pack_value (&bp, GET_MODE_IBIT (m), 8);
+	      bp_pack_value (&bp, GET_MODE_FBIT (m), 8);
+	      break;
+	    case MODE_FLOAT:
+	    case MODE_DECIMAL_FLOAT:
+	      bp_pack_string (ob, &bp, REAL_MODE_FORMAT (m)->name, true);
+	      break;
+	    default:
+	      break;
+	    }
+	  bp_pack_string (ob, &bp, GET_MODE_NAME (m), true);
+	}
+  bp_pack_value (&bp, VOIDmode, 8);
+
+  streamer_write_bitpack (&bp);
+
+  char *section_name
+    = lto_get_section_name (LTO_section_mode_table, NULL, NULL);
+  lto_begin_section (section_name, !flag_wpa);
+  free (section_name);
+
+  /* The entire header stream is computed here.  */
+  struct lto_simple_header_with_strings header;
+  memset (&header, 0, sizeof (header));
+
+  /* Write the header.  */
+  header.major_version = LTO_major_version;
+  header.minor_version = LTO_minor_version;
+
+  header.main_size = ob->main_stream->total_size;
+  header.string_size = ob->string_stream->total_size;
+  lto_write_data (&header, sizeof header);
+
+  /* Write the mode table stream and its string table out to the asm
+     file as a block of text.  */
+  lto_write_stream (ob->main_stream);
+  lto_write_stream (ob->string_stream);
+
+  lto_end_section ();
+  destroy_output_block (ob);
+}
+
+
 /* This pass is run after all of the functions are serialized and all
    of the IPA passes have written their serialized forms.  This pass
    causes the vector of all of the global decls and types used from
@@ -2749,4 +2839,5 @@ produce_asm_for_decls (void)
   lto_symtab_encoder_delete (ob->decl_state->symtab_node_encoder);
   lto_function_decl_states.release ();
   destroy_output_block (ob);
+  lto_write_mode_table ();
 }
--- gcc/lto-section-in.c.jj	2015-02-16 20:14:09.567344217 +0100
+++ gcc/lto-section-in.c	2015-02-16 20:26:23.655299255 +0100
@@ -89,7 +89,8 @@ const char *lto_section_name[LTO_N_SECTI
   "inline",
   "ipcp_trans",
   "icf",
-  "offload_table"
+  "offload_table",
+  "mode_table"
 };
 
 
@@ -262,7 +263,8 @@ lto_create_simple_input_block (struct lt
     return NULL;
 
   *datar = data;
-  return new lto_input_block (data + main_offset, header->main_size);
+  return new lto_input_block (data + main_offset, header->main_size,
+			      file_data->mode_table);
 }
 
 
--- gcc/tree-streamer-out.c.jj	2015-02-16 20:14:09.248349451 +0100
+++ gcc/tree-streamer-out.c	2015-02-16 20:26:23.661299156 +0100
@@ -190,7 +190,7 @@ static void
 pack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
 {
   struct fixed_value fv = TREE_FIXED_CST (expr);
-  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, fv.mode);
+  bp_pack_machine_mode (bp, fv.mode);
   bp_pack_var_len_int (bp, fv.data.low);
   bp_pack_var_len_int (bp, fv.data.high);
 }
@@ -201,7 +201,7 @@ pack_ts_fixed_cst_value_fields (struct b
 static void
 pack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, DECL_MODE (expr));
+  bp_pack_machine_mode (bp, DECL_MODE (expr));
   bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
   bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
   bp_pack_value (bp, DECL_IGNORED_P (expr), 1);
@@ -325,7 +325,7 @@ pack_ts_function_decl_value_fields (stru
 static void
 pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, TYPE_MODE (expr));
+  bp_pack_machine_mode (bp, TYPE_MODE (expr));
   bp_pack_value (bp, TYPE_STRING_FLAG (expr), 1);
   bp_pack_value (bp, TYPE_NO_FORCE_BLK (expr), 1);
   bp_pack_value (bp, TYPE_NEEDS_CONSTRUCTING (expr), 1);
--- gcc/real.h.jj	2015-02-16 20:14:09.278348958 +0100
+++ gcc/real.h	2015-02-16 20:26:23.666299074 +0100
@@ -155,6 +155,7 @@ struct real_format
   bool has_signed_zero;
   bool qnan_msb_set;
   bool canonical_nan_lsbs_set;
+  const char *name;
 };
 
 
--- gcc/lto-streamer.h.jj	2015-02-16 20:14:09.390347121 +0100
+++ gcc/lto-streamer.h	2015-02-16 20:26:23.658299206 +0100
@@ -248,6 +248,7 @@ enum lto_section_type
   LTO_section_ipcp_transform,
   LTO_section_ipa_icf,
   LTO_section_offload_table,
+  LTO_section_mode_table,
   LTO_N_SECTION_TYPES		/* Must be last.  */
 };
 
@@ -312,12 +313,15 @@ class lto_input_block
 public:
   /* Special constructor for the string table, it abuses this to
      do random access but use the uhwi decoder.  */
-  lto_input_block (const char *data_, unsigned int p_, unsigned int len_)
-      : data (data_), p (p_), len (len_) {}
-  lto_input_block (const char *data_, unsigned int len_)
-      : data (data_), p (0), len (len_) {}
+  lto_input_block (const char *data_, unsigned int p_, unsigned int len_,
+		   const unsigned char *mode_table_)
+      : data (data_), mode_table (mode_table_), p (p_), len (len_) {}
+  lto_input_block (const char *data_, unsigned int len_,
+		   const unsigned char *mode_table_)
+      : data (data_), mode_table (mode_table_), p (0), len (len_) {}
 
   const char *data;
+  const unsigned char *mode_table;
   unsigned int p;
   unsigned int len;
 };
@@ -527,6 +531,9 @@ struct GTY(()) lto_file_decl_data
 
   /* Map assigning declarations their resolutions.  */
   hash_map<tree, ld_plugin_symbol_resolution> * GTY((skip)) resolution_map;
+
+  /* Mode translation table.  */
+  const unsigned char *mode_table;
 };
 
 typedef struct lto_file_decl_data *lto_file_decl_data_ptr;
@@ -775,6 +782,7 @@ extern void lto_input_variable_construct
 extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
 					      const char *);
 extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
+extern void lto_input_mode_table (struct lto_file_decl_data *);
 extern struct data_in *lto_data_in_create (struct lto_file_decl_data *,
 				    const char *, unsigned,
 				    vec<ld_plugin_symbol_resolution_t> );
@@ -807,6 +815,7 @@ void lto_output_decl_state_refs (struct
 			         struct lto_output_stream *,
 			         struct lto_out_decl_state *);
 void lto_output_location (struct output_block *, struct bitpack_d *, location_t);
+void lto_output_init_mode_table (void);
 
 
 /* In lto-cgraph.c  */
--- gcc/ipa-prop.c.jj	2015-02-16 20:14:09.832339869 +0100
+++ gcc/ipa-prop.c	2015-02-16 20:26:23.663299123 +0100
@@ -4868,7 +4868,7 @@ ipa_prop_read_section (struct lto_file_d
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size);
+			   header->main_size, file_data->mode_table);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
@@ -5089,7 +5089,7 @@ read_replacements_section (struct lto_fi
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size);
+			   header->main_size, file_data->mode_table);
 
   data_in = lto_data_in_create (file_data, (const char *) data + string_offset,
 				header->string_size, vNULL);
--- gcc/data-streamer-in.c.jj	2015-02-16 20:14:09.800340394 +0100
+++ gcc/data-streamer-in.c	2015-02-16 20:26:23.654299271 +0100
@@ -70,7 +70,7 @@ string_for_index (struct data_in *data_i
     }
 
   /* Get the string stored at location LOC in DATA_IN->STRINGS.  */
-  lto_input_block str_tab (data_in->strings, loc - 1, data_in->strings_len);
+  lto_input_block str_tab (data_in->strings, loc - 1, data_in->strings_len, NULL);
   len = streamer_read_uhwi (&str_tab);
   *rlen = len;
 
--- gcc/tree-streamer-in.c.jj	2015-02-16 20:14:09.524344922 +0100
+++ gcc/tree-streamer-in.c	2015-02-16 20:26:23.661299156 +0100
@@ -224,7 +224,7 @@ static void
 unpack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
 {
   FIXED_VALUE_TYPE *fp = ggc_alloc<fixed_value> ();
-  fp->mode = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
+  fp->mode = bp_unpack_machine_mode (bp);
   fp->data.low = bp_unpack_var_len_int (bp);
   fp->data.high = bp_unpack_var_len_int (bp);
   TREE_FIXED_CST_PTR (expr) = fp;
@@ -236,7 +236,7 @@ unpack_ts_fixed_cst_value_fields (struct
 static void
 unpack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  DECL_MODE (expr) = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
+  DECL_MODE (expr) = bp_unpack_machine_mode (bp);
   DECL_NONLOCAL (expr) = (unsigned) bp_unpack_value (bp, 1);
   DECL_VIRTUAL_P (expr) = (unsigned) bp_unpack_value (bp, 1);
   DECL_IGNORED_P (expr) = (unsigned) bp_unpack_value (bp, 1);
@@ -373,7 +373,7 @@ unpack_ts_type_common_value_fields (stru
 {
   machine_mode mode;
 
-  mode = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
+  mode = bp_unpack_machine_mode (bp);
   SET_TYPE_MODE (expr, mode);
   TYPE_STRING_FLAG (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_NO_FORCE_BLK (expr) = (unsigned) bp_unpack_value (bp, 1);
--- gcc/ipa-inline-analysis.c.jj	2015-02-16 20:14:09.777340771 +0100
+++ gcc/ipa-inline-analysis.c	2015-02-16 20:26:23.655299255 +0100
@@ -4190,7 +4190,8 @@ inline_read_section (struct lto_file_dec
   unsigned int i, count2, j;
   unsigned int f_count;
 
-  lto_input_block ib ((const char *) data + main_offset, header->main_size);
+  lto_input_block ib ((const char *) data + main_offset, header->main_size,
+		      file_data->mode_table);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
--- gcc/ipa-icf.c.jj	2015-02-16 20:14:09.306348499 +0100
+++ gcc/ipa-icf.c	2015-02-16 20:26:23.663299123 +0100
@@ -1500,7 +1500,7 @@ sem_item_optimizer::read_section (lto_fi
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset, 0,
-			   header->main_size);
+			   header->main_size, file_data->mode_table);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
--- gcc/tree-streamer.c.jj	2015-02-16 20:14:09.440346300 +0100
+++ gcc/tree-streamer.c	2015-02-16 20:26:23.660299173 +0100
@@ -53,6 +53,14 @@ along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "tree-streamer.h"
 
+/* Table indexed by machine_mode, used for 2 different purposes.
+   During streaming out we record a non-zero value in it for all modes
+   that were streamed out.
+   During streaming in, we translate the on-disk mode using this
+   table.  For normal LTO it is set to identity, for ACCEL_COMPILER
+   depending on the mode_table content.  */
+unsigned char streamer_mode_table[1 << 8];
+
 /* Check that all the TS_* structures handled by the streamer_write_* and
    streamer_read_* routines are exactly ALL the structures defined in
    treestruct.def.  */
--- gcc/lto/lto.c.jj	2015-02-16 20:14:09.133351338 +0100
+++ gcc/lto/lto.c	2015-02-16 20:26:23.666299074 +0100
@@ -1877,7 +1877,7 @@ lto_read_decls (struct lto_file_decl_dat
   uint32_t num_decl_states;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size);
+			   header->main_size, decl_data->mode_table);
 
   data_in = lto_data_in_create (decl_data, (const char *) data + string_offset,
 				header->string_size, resolutions);
@@ -2219,6 +2219,7 @@ lto_file_finalize (struct lto_file_decl_
 
   file_data->renaming_hash_table = lto_create_renaming_table ();
   file_data->file_name = file->filename;
+  lto_input_mode_table (file_data);
   data = lto_get_section_data (file_data, LTO_section_decls, NULL, &len);
   if (data == NULL)
     {
--- gcc/lto-cgraph.c.jj	2015-02-16 20:14:09.099351895 +0100
+++ gcc/lto-cgraph.c	2015-02-16 20:26:23.664299107 +0100
@@ -2113,7 +2113,7 @@ input_cgraph_opt_section (struct lto_fil
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size);
+			   header->main_size, file_data->mode_table);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
--- gcc/lto-streamer-in.c.jj	2015-02-16 20:14:09.346347843 +0100
+++ gcc/lto-streamer-in.c	2015-02-16 21:15:42.033774537 +0100
@@ -1116,10 +1116,12 @@ lto_read_body_or_constructor (struct lto
 
       /* Set up the struct function.  */
       from = data_in->reader_cache->nodes.length ();
-      lto_input_block ib_main (data + main_offset, header->main_size);
+      lto_input_block ib_main (data + main_offset, header->main_size,
+			       file_data->mode_table);
       if (TREE_CODE (node->decl) == FUNCTION_DECL)
 	{
-	  lto_input_block ib_cfg (data + cfg_offset, header->cfg_size);
+	  lto_input_block ib_cfg (data + cfg_offset, header->cfg_size,
+				  file_data->mode_table);
 	  input_function (fn_decl, data_in, &ib_main, &ib_cfg);
 	}
       else
@@ -1384,7 +1386,8 @@ lto_input_toplevel_asms (struct lto_file
 
   string_offset = sizeof (*header) + header->main_size;
 
-  lto_input_block ib (data + sizeof (*header), header->main_size);
+  lto_input_block ib (data + sizeof (*header), header->main_size,
+		      file_data->mode_table);
 
   data_in = lto_data_in_create (file_data, data + string_offset,
 			      header->string_size, vNULL);
@@ -1403,6 +1406,124 @@ lto_input_toplevel_asms (struct lto_file
 }
 
 
+/* Input mode table.  */
+
+void
+lto_input_mode_table (struct lto_file_decl_data *file_data)
+{
+  size_t len;
+  const char *data = lto_get_section_data (file_data, LTO_section_mode_table,
+					   NULL, &len);
+  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
+
+  file_data->mode_table = table;
+  if (! data)
+    {
+      for (unsigned int m = 0; m < MAX_MACHINE_MODE; m++)
+	table[m] = m;
+      return;
+    }
+
+  const struct lto_simple_header_with_strings *header
+    = (const struct lto_simple_header_with_strings *) data;
+  int string_offset;
+  struct data_in *data_in;
+  string_offset = sizeof (*header) + header->main_size;
+
+  lto_input_block ib (data + sizeof (*header), header->main_size, NULL);
+  data_in = lto_data_in_create (file_data, data + string_offset,
+			        header->string_size, vNULL);
+  bitpack_d bp = streamer_read_bitpack (&ib);
+
+  table[VOIDmode] = VOIDmode;
+  table[BLKmode] = BLKmode;
+  unsigned int m;
+  while ((m = bp_unpack_value (&bp, 8)) != VOIDmode)
+    {
+      enum mode_class mclass
+	= bp_unpack_enum (&bp, mode_class, MAX_MODE_CLASS);
+      unsigned int size = bp_unpack_value (&bp, 8);
+      unsigned int prec = bp_unpack_value (&bp, 16);
+      machine_mode inner = (machine_mode) table[bp_unpack_value (&bp, 8)];
+      unsigned int nunits = bp_unpack_value (&bp, 8);
+      unsigned int ibit = 0, fbit = 0;
+      unsigned int real_fmt_len = 0;
+      const char *real_fmt_name = NULL;
+      switch (mclass)
+	{
+	case MODE_FRACT:
+        case MODE_UFRACT:
+        case MODE_ACCUM:
+        case MODE_UACCUM:
+          ibit = bp_unpack_value (&bp, 8);
+          fbit = bp_unpack_value (&bp, 8);
+          break;
+	case MODE_FLOAT:
+	case MODE_DECIMAL_FLOAT:
+	  real_fmt_name = bp_unpack_indexed_string (data_in, &bp,
+						    &real_fmt_len);
+	  break;
+	default:
+	  break;
+	}
+      /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
+	 if not found, fallback to all modes.  */
+      int pass;
+      for (pass = 0; pass < 2; pass++)
+	for (machine_mode mr = pass ? VOIDmode
+				    : GET_CLASS_NARROWEST_MODE (mclass);
+	     pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
+	     pass ? mr = (machine_mode) (mr + 1)
+		  : mr = GET_MODE_WIDER_MODE (mr))
+	  if (GET_MODE_CLASS (mr) != mclass
+	      || GET_MODE_SIZE (mr) != size
+	      || GET_MODE_PRECISION (mr) != prec
+	      || GET_MODE_INNER (mr) != inner
+	      || GET_MODE_IBIT (mr) != ibit
+	      || GET_MODE_FBIT (mr) != fbit
+	      || GET_MODE_NUNITS (mr) != nunits)
+	    continue;
+	  else if ((mclass == MODE_FLOAT || mclass == MODE_DECIMAL_FLOAT)
+		   && strcmp (REAL_MODE_FORMAT (mr)->name, real_fmt_name) != 0)
+	    continue;
+	  else
+	    {
+	      table[m] = mr;
+	      pass = 2;
+	      break;
+	    }
+      unsigned int len;
+      const char *mname = bp_unpack_indexed_string (data_in, &bp, &len);
+      if (pass == 2)
+	{
+	  switch (mclass)
+	    {
+	    case MODE_VECTOR_INT:
+	    case MODE_VECTOR_FLOAT:
+	    case MODE_VECTOR_FRACT:
+	    case MODE_VECTOR_UFRACT:
+	    case MODE_VECTOR_ACCUM:
+	    case MODE_VECTOR_UACCUM:
+	      /* For unsupported vector modes just use BLKmode,
+		 if the scalar mode is supported.  */
+	      if (inner != VOIDmode)
+		{
+		  table[m] = BLKmode;
+		  break;
+		}
+	      /* FALLTHRU */
+	    default:
+	      error ("unsupported mode %s\n", mname);
+	      break;
+	    }
+	}
+    }
+  lto_data_in_delete (data_in);
+
+  lto_free_section_data (file_data, LTO_section_mode_table, NULL, data, len);
+}
+
+
 /* Initialization for the LTO reader.  */
 
 void
--- gcc/config/pdp11/pdp11.c.jj	2015-02-16 20:14:09.818340098 +0100
+++ gcc/config/pdp11/pdp11.c	2015-02-16 20:26:23.653299288 +0100
@@ -107,7 +107,8 @@ const struct real_format pdp11_f_format
     false,
     false,
     false,
-    false
+    false,
+    "pdp11_f"
   };
 
 const struct real_format pdp11_d_format =
@@ -128,7 +129,8 @@ const struct real_format pdp11_d_format
     false,
     false,
     false,
-    false
+    false,
+    "pdp11_d"
   };
 
 static void
--- gcc/real.c.jj	2015-02-16 20:14:09.208350107 +0100
+++ gcc/real.c	2015-02-16 20:26:23.657299222 +0100
@@ -3031,7 +3031,8 @@ const struct real_format ieee_single_for
     true,
     true,
     true,
-    false
+    false,
+    "ieee_single"
   };
 
 const struct real_format mips_single_format =
@@ -3052,7 +3053,8 @@ const struct real_format mips_single_for
     true,
     true,
     false,
-    true
+    true,
+    "mips_single"
   };
 
 const struct real_format motorola_single_format =
@@ -3073,7 +3075,8 @@ const struct real_format motorola_single
     true,
     true,
     true,
-    true
+    true,
+    "motorola_single"
   };
 
 /*  SPU Single Precision (Extended-Range Mode) format is the same as IEEE
@@ -3105,7 +3108,8 @@ const struct real_format spu_single_form
     true,
     true,
     false,
-    false
+    false,
+    "spu_single"
   };
 \f
 /* IEEE double-precision format.  */
@@ -3314,7 +3318,8 @@ const struct real_format ieee_double_for
     true,
     true,
     true,
-    false
+    false,
+    "ieee_double"
   };
 
 const struct real_format mips_double_format =
@@ -3335,7 +3340,8 @@ const struct real_format mips_double_for
     true,
     true,
     false,
-    true
+    true,
+    "mips_double"
   };
 
 const struct real_format motorola_double_format =
@@ -3356,7 +3362,8 @@ const struct real_format motorola_double
     true,
     true,
     true,
-    true
+    true,
+    "motorola_double"
   };
 \f
 /* IEEE extended real format.  This comes in three flavors: Intel's as
@@ -3700,7 +3707,8 @@ const struct real_format ieee_extended_m
     true,
     true,
     true,
-    true
+    true,
+    "ieee_extended_motorola"
   };
 
 const struct real_format ieee_extended_intel_96_format =
@@ -3721,7 +3729,8 @@ const struct real_format ieee_extended_i
     true,
     true,
     true,
-    false
+    false,
+    "ieee_extended_intel_96"
   };
 
 const struct real_format ieee_extended_intel_128_format =
@@ -3742,7 +3751,8 @@ const struct real_format ieee_extended_i
     true,
     true,
     true,
-    false
+    false,
+    "ieee_extended_intel_128"
   };
 
 /* The following caters to i386 systems that set the rounding precision
@@ -3765,7 +3775,8 @@ const struct real_format ieee_extended_i
     true,
     true,
     true,
-    false
+    false,
+    "ieee_extended_intel_96_round_53"
   };
 \f
 /* IBM 128-bit extended precision format: a pair of IEEE double precision
@@ -3853,7 +3864,8 @@ const struct real_format ibm_extended_fo
     true,
     true,
     true,
-    false
+    false,
+    "ibm_extended"
   };
 
 const struct real_format mips_extended_format =
@@ -3874,7 +3886,8 @@ const struct real_format mips_extended_f
     true,
     true,
     false,
-    true
+    true,
+    "mips_extended"
   };
 
 \f
@@ -4137,7 +4150,8 @@ const struct real_format ieee_quad_forma
     true,
     true,
     true,
-    false
+    false,
+    "ieee_quad"
   };
 
 const struct real_format mips_quad_format =
@@ -4158,7 +4172,8 @@ const struct real_format mips_quad_forma
     true,
     true,
     false,
-    true
+    true,
+    "mips_quad"
   };
 \f
 /* Descriptions of VAX floating point formats can be found beginning at
@@ -4458,7 +4473,8 @@ const struct real_format vax_f_format =
     false,
     false,
     false,
-    false
+    false,
+    "vax_f"
   };
 
 const struct real_format vax_d_format =
@@ -4479,7 +4495,8 @@ const struct real_format vax_d_format =
     false,
     false,
     false,
-    false
+    false,
+    "vax_d"
   };
 
 const struct real_format vax_g_format =
@@ -4500,7 +4517,8 @@ const struct real_format vax_g_format =
     false,
     false,
     false,
-    false
+    false,
+    "vax_g"
   };
 \f
 /* Encode real R into a single precision DFP value in BUF.  */
@@ -4576,7 +4594,8 @@ const struct real_format decimal_single_
     true,
     true,
     true,
-    false
+    false,
+    "decimal_single"
   };
 
 /* Double precision decimal floating point (IEEE 754). */
@@ -4598,7 +4617,8 @@ const struct real_format decimal_double_
     true,
     true,
     true,
-    false
+    false,
+    "decimal_double"
   };
 
 /* Quad precision decimal floating point (IEEE 754). */
@@ -4620,7 +4640,8 @@ const struct real_format decimal_quad_fo
     true,
     true,
     true,
-    false
+    false,
+    "decimal_quad"
   };
 \f
 /* Encode half-precision floats.  This routine is used both for the IEEE
@@ -4757,7 +4778,8 @@ const struct real_format ieee_half_forma
     true,
     true,
     true,
-    false
+    false,
+    "ieee_half"
   };
 
 /* ARM's alternative half-precision format, similar to IEEE but with
@@ -4781,7 +4803,8 @@ const struct real_format arm_half_format
     true,
     true,
     false,
-    false
+    false,
+    "arm_half"
   };
 \f
 /* A synthetic "format" for internal arithmetic.  It's the size of the
@@ -4826,7 +4849,8 @@ const struct real_format real_internal_f
     false,
     true,
     true,
-    false
+    false,
+    "real_internal"
   };
 \f
 /* Calculate X raised to the integer exponent N in mode MODE and store


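The lookup done by lto_input_mode_table above can be illustrated outside
GCC with a simplified sketch.  Everything in it (ModeDesc, build_mode_table,
kUnsupported) is hypothetical and not part of the patch; it only shows the
idea of mapping host mode numbers to target mode numbers by comparing mode
properties rather than raw enum values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one row of a machine-mode table.
struct ModeDesc {
  std::string name;      // e.g. "SF"; not used for matching here
  int mclass;            // stand-in for enum mode_class
  unsigned size;         // size in bytes
  unsigned precision;    // precision in bits
  std::string real_fmt;  // real format name, empty for non-float modes
};

// Hypothetical sentinel meaning "the target has no equivalent mode".
constexpr std::uint8_t kUnsupported = 0xff;

// Build table[host_index] = target_index by comparing properties,
// never by comparing the numeric mode values themselves.
std::vector<std::uint8_t>
build_mode_table (const std::vector<ModeDesc> &host,
                  const std::vector<ModeDesc> &target)
{
  // Target indices must fit in a byte and must not collide with the sentinel.
  assert (target.size () < kUnsupported);

  std::vector<std::uint8_t> table (host.size (), kUnsupported);
  for (std::size_t h = 0; h < host.size (); ++h)
    for (std::size_t t = 0; t < target.size (); ++t)
      if (host[h].mclass == target[t].mclass
          && host[h].size == target[t].size
          && host[h].precision == target[t].precision
          && host[h].real_fmt == target[t].real_fmt)
        {
          table[h] = static_cast<std::uint8_t> (t);
          break;
        }
  return table;
}
```

A reader of the stream would then translate every mode number it unpacks
through the returned table, and treat kUnsupported as an error or as a
fallback case.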
	Jakub


* Re: nvptx offloading patches [3/n], RFD
  2015-02-16 21:08     ` Jakub Jelinek
@ 2015-02-16 21:35       ` Richard Biener
  2015-02-16 21:44         ` Jakub Jelinek
  2015-02-17 13:32       ` Ilya Verbin
  1 sibling, 1 reply; 42+ messages in thread
From: Richard Biener @ 2015-02-16 21:35 UTC (permalink / raw)
  To: Jakub Jelinek, Jan Hubicka, Ilya Verbin, Bernd Schmidt, Thomas Schwinge
  Cc: gcc-patches

On February 16, 2015 10:08:12 PM CET, Jakub Jelinek <jakub@redhat.com> wrote:
>Hi!
>
>On Mon, Feb 09, 2015 at 11:20:00AM +0100, Richard Biener wrote:
>> I think (also communicated that on IRC) we should instead try not streaming
>> machine-modes at all but generating them at stream-in time via layout_type
>> or layout_decl.
>
>Here is a WIP prototype for being able to stream a machine mode description
>table and streaming it back in.
>In the end, I'd like to stream this out only for lto_stream_offload_p and
>stream it in only for ACCEL_COMPILER reading in when available, but wanted
>to see what it does even for native LTO.
>For that it doesn't work very well, because it seems that wpa phase
>doesn't stream in some sections and stream them out again, but instead
>somehow copies them directly to the output object, so the mode table
>isn't aware of the modes used in there that were bypassed this way.
>
>Anyway, the question is if for offloading we use wpa stage at all these days
>or not at all, if there is a way for ACCEL_COMPILER to differentiate
>somehow between LTO sections written by the host compiler and LTO sections
>perhaps created by the offloading compiler when trying to LTO the thing (if
>it does it at all).  Because obviously the host compiler written LTO
>(in .gnu.offload_lto_*) would need the machine modes translated, while
>LTO streamed already by the ACCEL_COMPILER (if any) generally would already
>use the offloading target machine modes and therefore should be treated as
>native lto (.gnu.lto_*).
>
>If we don't try to write .gnu.offload_lto_* again, I think following patch
>with additionally not calling lto_write_mode_table for !lto_stream_offload_p
>and not calling lto_input_mode_table for !ACCEL_COMPILER - instead build
>a single shared identity table - might actually work.
>
>Thoughts on this?

Seeing the real format string you introduce, I wonder if identifying modes by their names wouldn't work in 99% of all cases (apart from PSImode maybe).
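
A minimal sketch of what name-based identification could look like, assuming a target-side registry keyed by mode name.  ModeRegistry and lookup_mode_by_name are hypothetical names used only for illustration, not existing GCC interfaces:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical target-side registry: mode name -> local mode number.
using ModeRegistry = std::map<std::string, int>;

// Return the local mode number for NAME, or -1 when the target has no
// mode with that name.
int
lookup_mode_by_name (const ModeRegistry &target, const std::string &name)
{
  assert (!name.empty ());
  ModeRegistry::const_iterator it = target.find (name);
  return it == target.end () ? -1 : it->second;
}
```

The weak spot is that a name alone does not pin down the layout: as noted at the start of this thread, XFmode on i386 changes depending on a target switch, so two compilers can agree on the name and still disagree on the format.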

Also, in most cases we can construct the machine mode from the type.  Where that is not possible, we could stream the extra info that is necessary instead.

Overall this feels like a hack, BTW :)  Can't we assign machine mode enum IDs in a target-independent way?  I mean, they don't have to be densely allocated?

Richard.

>Bernd/Thomas, do you plan to commit the other approved patches soon?
>
>--- gcc/passes.c.jj	2015-02-16 20:14:09.477345693 +0100
>+++ gcc/passes.c	2015-02-16 20:26:23.659299189 +0100
>@@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
>   struct lto_out_decl_state *state = lto_new_out_decl_state ();
>   state->symtab_node_encoder = encoder;
> 
>+  lto_output_init_mode_table ();
>   lto_push_out_decl_state (state);
> 
>   gcc_assert (!flag_wpa);
>@@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
>   lto_symtab_encoder_iterator lsei;
>   state->symtab_node_encoder = encoder;
> 
>+  lto_output_init_mode_table ();
>   lto_push_out_decl_state (state);
>   for (lsei = lsei_start_function_in_partition (encoder);
>        !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
>--- gcc/tree-streamer.h.jj	2015-02-16 20:14:09.446346202 +0100
>+++ gcc/tree-streamer.h	2015-02-16 21:14:50.701615850 +0100
>@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
> 
> #include "streamer-hooks.h"
> #include "lto-streamer.h"
>+#include "data-streamer.h"
> #include "hash-map.h"
> 
> /* Cache of pickled nodes.  Used to avoid writing the same node more
>@@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
> void streamer_write_builtin (struct output_block *, tree);
> 
> /* In tree-streamer.c.  */
>+extern unsigned char streamer_mode_table[1 << 8];
> void streamer_check_handled_ts_structures (void);
> bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
> 				 hashval_t, unsigned *);
>@@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
>   return cache->hashes[ix];
> }
> 
>+static inline void
>+bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
>+{
>+  streamer_mode_table[mode] = 1;
>+  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
>+}
>+
>+static inline machine_mode
>+bp_unpack_machine_mode (struct bitpack_d *bp)
>+{
>+  return (machine_mode)
>+	   ((struct lto_input_block *)
>+	    bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
>+}
> 
> #endif  /* GCC_TREE_STREAMER_H  */
>--- gcc/lto-streamer-out.c.jj	2015-02-16 20:14:09.046352765 +0100
>+++ gcc/lto-streamer-out.c	2015-02-16 20:26:23.665299091 +0100
>@@ -2642,6 +2642,96 @@ produce_symtab (struct output_block *ob)
> }
> 
> 
>+/* Init the streamer_mode_table for output, where we collect info on what
>+   machine_mode values have been streamed.  */
>+void
>+lto_output_init_mode_table (void)
>+{
>+  memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
>+}
>+
>+
>+/* Write the mode table.  */
>+static void
>+lto_write_mode_table (void)
>+{
>+  struct output_block *ob;
>+  ob = create_output_block (LTO_section_mode_table);
>+  bitpack_d bp = bitpack_create (ob->main_stream);
>+
>+  /* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
>+     also the inner mode marked.  */
>+  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
>+    if (streamer_mode_table[i])
>+      {
>+	machine_mode m = (machine_mode) i;
>+	if (GET_MODE_INNER (m) != VOIDmode)
>+	  streamer_mode_table[(int) GET_MODE_INNER (m)] = 1;
>+      }
>+  /* First stream modes that have GET_MODE_INNER (m) == VOIDmode,
>+     so that we can refer to them afterwards.  */
>+  for (int pass = 0; pass < 2; pass++)
>+    for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
>+      if (streamer_mode_table[i] && i != (int) VOIDmode && i != (int) BLKmode)
>+	{
>+	  machine_mode m = (machine_mode) i;
>+	  if ((GET_MODE_INNER (m) == VOIDmode) ^ (pass == 0))
>+	    continue;
>+	  bp_pack_value (&bp, m, 8);
>+	  bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
>+	  bp_pack_value (&bp, GET_MODE_SIZE (m), 8);
>+	  bp_pack_value (&bp, GET_MODE_PRECISION (m), 16);
>+	  bp_pack_value (&bp, GET_MODE_INNER (m), 8);
>+	  bp_pack_value (&bp, GET_MODE_NUNITS (m), 8);
>+	  switch (GET_MODE_CLASS (m))
>+	    {
>+	    case MODE_FRACT:
>+	    case MODE_UFRACT:
>+	    case MODE_ACCUM:
>+	    case MODE_UACCUM:
>+	      bp_pack_value (&bp, GET_MODE_IBIT (m), 8);
>+	      bp_pack_value (&bp, GET_MODE_FBIT (m), 8);
>+	      break;
>+	    case MODE_FLOAT:
>+	    case MODE_DECIMAL_FLOAT:
>+	      bp_pack_string (ob, &bp, REAL_MODE_FORMAT (m)->name, true);
>+	      break;
>+	    default:
>+	      break;
>+	    }
>+	  bp_pack_string (ob, &bp, GET_MODE_NAME (m), true);
>+	}
>+  bp_pack_value (&bp, VOIDmode, 8);
>+
>+  streamer_write_bitpack (&bp);
>+
>+  char *section_name
>+    = lto_get_section_name (LTO_section_mode_table, NULL, NULL);
>+  lto_begin_section (section_name, !flag_wpa);
>+  free (section_name);
>+
>+  /* The entire header stream is computed here.  */
>+  struct lto_simple_header_with_strings header;
>+  memset (&header, 0, sizeof (header));
>+
>+  /* Write the header.  */
>+  header.major_version = LTO_major_version;
>+  header.minor_version = LTO_minor_version;
>+
>+  header.main_size = ob->main_stream->total_size;
>+  header.string_size = ob->string_stream->total_size;
>+  lto_write_data (&header, sizeof header);
>+
>+  /* Put all of the gimple and the string table out the asm file as a
>+     block of text.  */
>+  lto_write_stream (ob->main_stream);
>+  lto_write_stream (ob->string_stream);
>+
>+  lto_end_section ();
>+  destroy_output_block (ob);
>+}
>+
>+
> /* This pass is run after all of the functions are serialized and all
>    of the IPA passes have written their serialized forms.  This pass
>    causes the vector of all of the global decls and types used from
>@@ -2749,4 +2839,5 @@ produce_asm_for_decls (void)
>   lto_symtab_encoder_delete (ob->decl_state->symtab_node_encoder);
>   lto_function_decl_states.release ();
>   destroy_output_block (ob);
>+  lto_write_mode_table ();
> }
>--- gcc/lto-section-in.c.jj	2015-02-16 20:14:09.567344217 +0100
>+++ gcc/lto-section-in.c	2015-02-16 20:26:23.655299255 +0100
>@@ -89,7 +89,8 @@ const char *lto_section_name[LTO_N_SECTI
>   "inline",
>   "ipcp_trans",
>   "icf",
>-  "offload_table"
>+  "offload_table",
>+  "mode_table"
> };
> 
> 
>@@ -262,7 +263,8 @@ lto_create_simple_input_block (struct lt
>     return NULL;
> 
>   *datar = data;
>-  return new lto_input_block (data + main_offset, header->main_size);
>+  return new lto_input_block (data + main_offset, header->main_size,
>+			      file_data->mode_table);
> }
> 
> 
>--- gcc/tree-streamer-out.c.jj	2015-02-16 20:14:09.248349451 +0100
>+++ gcc/tree-streamer-out.c	2015-02-16 20:26:23.661299156 +0100
>@@ -190,7 +190,7 @@ static void
> pack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
> {
>   struct fixed_value fv = TREE_FIXED_CST (expr);
>-  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, fv.mode);
>+  bp_pack_machine_mode (bp, fv.mode);
>   bp_pack_var_len_int (bp, fv.data.low);
>   bp_pack_var_len_int (bp, fv.data.high);
> }
>@@ -201,7 +201,7 @@ pack_ts_fixed_cst_value_fields (struct b
> static void
> pack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
> {
>-  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, DECL_MODE (expr));
>+  bp_pack_machine_mode (bp, DECL_MODE (expr));
>   bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
>   bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
>   bp_pack_value (bp, DECL_IGNORED_P (expr), 1);
>@@ -325,7 +325,7 @@ pack_ts_function_decl_value_fields (stru
> static void
> pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
> {
>-  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, TYPE_MODE (expr));
>+  bp_pack_machine_mode (bp, TYPE_MODE (expr));
>   bp_pack_value (bp, TYPE_STRING_FLAG (expr), 1);
>   bp_pack_value (bp, TYPE_NO_FORCE_BLK (expr), 1);
>   bp_pack_value (bp, TYPE_NEEDS_CONSTRUCTING (expr), 1);
>--- gcc/real.h.jj	2015-02-16 20:14:09.278348958 +0100
>+++ gcc/real.h	2015-02-16 20:26:23.666299074 +0100
>@@ -155,6 +155,7 @@ struct real_format
>   bool has_signed_zero;
>   bool qnan_msb_set;
>   bool canonical_nan_lsbs_set;
>+  const char *name;
> };
> 
> 
>--- gcc/lto-streamer.h.jj	2015-02-16 20:14:09.390347121 +0100
>+++ gcc/lto-streamer.h	2015-02-16 20:26:23.658299206 +0100
>@@ -248,6 +248,7 @@ enum lto_section_type
>   LTO_section_ipcp_transform,
>   LTO_section_ipa_icf,
>   LTO_section_offload_table,
>+  LTO_section_mode_table,
>   LTO_N_SECTION_TYPES		/* Must be last.  */
> };
> 
>@@ -312,12 +313,15 @@ class lto_input_block
> public:
>   /* Special constructor for the string table, it abuses this to
>      do random access but use the uhwi decoder.  */
>-  lto_input_block (const char *data_, unsigned int p_, unsigned int len_)
>-      : data (data_), p (p_), len (len_) {}
>-  lto_input_block (const char *data_, unsigned int len_)
>-      : data (data_), p (0), len (len_) {}
>+  lto_input_block (const char *data_, unsigned int p_, unsigned int len_,
>+		   const unsigned char *mode_table_)
>+      : data (data_), mode_table (mode_table_), p (p_), len (len_) {}
>+  lto_input_block (const char *data_, unsigned int len_,
>+		   const unsigned char *mode_table_)
>+      : data (data_), mode_table (mode_table_), p (0), len (len_) {}
> 
>   const char *data;
>+  const unsigned char *mode_table;
>   unsigned int p;
>   unsigned int len;
> };
>@@ -527,6 +531,9 @@ struct GTY(()) lto_file_decl_data
> 
>   /* Map assigning declarations their resolutions.  */
>   hash_map<tree, ld_plugin_symbol_resolution> * GTY((skip)) resolution_map;
>+
>+  /* Mode translation table.  */
>+  const unsigned char *mode_table;
> };
> 
> typedef struct lto_file_decl_data *lto_file_decl_data_ptr;
>@@ -775,6 +782,7 @@ extern void lto_input_variable_construct
> extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
> 					      const char *);
> extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
>+extern void lto_input_mode_table (struct lto_file_decl_data *);
> extern struct data_in *lto_data_in_create (struct lto_file_decl_data *,
> 				    const char *, unsigned,
> 				    vec<ld_plugin_symbol_resolution_t> );
>@@ -807,6 +815,7 @@ void lto_output_decl_state_refs (struct
> 			         struct lto_output_stream *,
> 			         struct lto_out_decl_state *);
> void lto_output_location (struct output_block *, struct bitpack_d *, location_t);
>+void lto_output_init_mode_table (void);
> 
> 
> /* In lto-cgraph.c  */
>--- gcc/ipa-prop.c.jj	2015-02-16 20:14:09.832339869 +0100
>+++ gcc/ipa-prop.c	2015-02-16 20:26:23.663299123 +0100
>@@ -4868,7 +4868,7 @@ ipa_prop_read_section (struct lto_file_d
>   unsigned int count;
> 
>   lto_input_block ib_main ((const char *) data + main_offset,
>-			   header->main_size);
>+			   header->main_size, file_data->mode_table);
> 
>   data_in =
>    lto_data_in_create (file_data, (const char *) data + string_offset,
>@@ -5089,7 +5089,7 @@ read_replacements_section (struct lto_fi
>   unsigned int count;
> 
>   lto_input_block ib_main ((const char *) data + main_offset,
>-			   header->main_size);
>+			   header->main_size, file_data->mode_table);
> 
>   data_in = lto_data_in_create (file_data, (const char *) data + string_offset,
> 				header->string_size, vNULL);
>--- gcc/data-streamer-in.c.jj	2015-02-16 20:14:09.800340394 +0100
>+++ gcc/data-streamer-in.c	2015-02-16 20:26:23.654299271 +0100
>@@ -70,7 +70,7 @@ string_for_index (struct data_in *data_i
>     }
> 
>   /* Get the string stored at location LOC in DATA_IN->STRINGS.  */
>-  lto_input_block str_tab (data_in->strings, loc - 1, data_in->strings_len);
>+  lto_input_block str_tab (data_in->strings, loc - 1, data_in->strings_len, NULL);
>   len = streamer_read_uhwi (&str_tab);
>   *rlen = len;
> 
>--- gcc/tree-streamer-in.c.jj	2015-02-16 20:14:09.524344922 +0100
>+++ gcc/tree-streamer-in.c	2015-02-16 20:26:23.661299156 +0100
>@@ -224,7 +224,7 @@ static void
> unpack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
> {
>   FIXED_VALUE_TYPE *fp = ggc_alloc<fixed_value> ();
>-  fp->mode = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
>+  fp->mode = bp_unpack_machine_mode (bp);
>   fp->data.low = bp_unpack_var_len_int (bp);
>   fp->data.high = bp_unpack_var_len_int (bp);
>   TREE_FIXED_CST_PTR (expr) = fp;
>@@ -236,7 +236,7 @@ unpack_ts_fixed_cst_value_fields (struct
> static void
> unpack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
> {
>-  DECL_MODE (expr) = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
>+  DECL_MODE (expr) = bp_unpack_machine_mode (bp);
>   DECL_NONLOCAL (expr) = (unsigned) bp_unpack_value (bp, 1);
>   DECL_VIRTUAL_P (expr) = (unsigned) bp_unpack_value (bp, 1);
>   DECL_IGNORED_P (expr) = (unsigned) bp_unpack_value (bp, 1);
>@@ -373,7 +373,7 @@ unpack_ts_type_common_value_fields (stru
> {
>   machine_mode mode;
> 
>-  mode = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
>+  mode = bp_unpack_machine_mode (bp);
>   SET_TYPE_MODE (expr, mode);
>   TYPE_STRING_FLAG (expr) = (unsigned) bp_unpack_value (bp, 1);
>   TYPE_NO_FORCE_BLK (expr) = (unsigned) bp_unpack_value (bp, 1);
>--- gcc/ipa-inline-analysis.c.jj	2015-02-16 20:14:09.777340771 +0100
>+++ gcc/ipa-inline-analysis.c	2015-02-16 20:26:23.655299255 +0100
>@@ -4190,7 +4190,8 @@ inline_read_section (struct lto_file_dec
>   unsigned int i, count2, j;
>   unsigned int f_count;
> 
>-  lto_input_block ib ((const char *) data + main_offset, header->main_size);
>+  lto_input_block ib ((const char *) data + main_offset, header->main_size,
>+		      file_data->mode_table);
> 
>   data_in =
>    lto_data_in_create (file_data, (const char *) data + string_offset,
>--- gcc/ipa-icf.c.jj	2015-02-16 20:14:09.306348499 +0100
>+++ gcc/ipa-icf.c	2015-02-16 20:26:23.663299123 +0100
>@@ -1500,7 +1500,7 @@ sem_item_optimizer::read_section (lto_fi
>   unsigned int count;
> 
>   lto_input_block ib_main ((const char *) data + main_offset, 0,
>-			   header->main_size);
>+			   header->main_size, file_data->mode_table);
> 
>   data_in =
>    lto_data_in_create (file_data, (const char *) data + string_offset,
>--- gcc/tree-streamer.c.jj	2015-02-16 20:14:09.440346300 +0100
>+++ gcc/tree-streamer.c	2015-02-16 20:26:23.660299173 +0100
>@@ -53,6 +53,14 @@ along with GCC; see the file COPYING3.
> #include "cgraph.h"
> #include "tree-streamer.h"
> 
>+/* Table indexed by machine_mode, used for 2 different purposes.
>+   During streaming out we record there non-zero value for all modes
>+   that were streamed out.
>+   During streaming in, we translate the on the disk mode using this
>+   table.  For normal LTO it is set to identity, for ACCEL_COMPILER
>+   depending on the mode_table content.  */
>+unsigned char streamer_mode_table[1 << 8];
>+
> /* Check that all the TS_* structures handled by the streamer_write_* and
>    streamer_read_* routines are exactly ALL the structures defined in
>    treestruct.def.  */
>--- gcc/lto/lto.c.jj	2015-02-16 20:14:09.133351338 +0100
>+++ gcc/lto/lto.c	2015-02-16 20:26:23.666299074 +0100
>@@ -1877,7 +1877,7 @@ lto_read_decls (struct lto_file_decl_dat
>   uint32_t num_decl_states;
> 
>   lto_input_block ib_main ((const char *) data + main_offset,
>-			   header->main_size);
>+			   header->main_size, decl_data->mode_table);
> 
>   data_in = lto_data_in_create (decl_data, (const char *) data + string_offset,
> 				header->string_size, resolutions);
>@@ -2219,6 +2219,7 @@ lto_file_finalize (struct lto_file_decl_
> 
>   file_data->renaming_hash_table = lto_create_renaming_table ();
>   file_data->file_name = file->filename;
>+  lto_input_mode_table (file_data);
>   data = lto_get_section_data (file_data, LTO_section_decls, NULL, &len);
>   if (data == NULL)
>     {
>--- gcc/lto-cgraph.c.jj	2015-02-16 20:14:09.099351895 +0100
>+++ gcc/lto-cgraph.c	2015-02-16 20:26:23.664299107 +0100
>@@ -2113,7 +2113,7 @@ input_cgraph_opt_section (struct lto_fil
>   unsigned int count;
> 
>   lto_input_block ib_main ((const char *) data + main_offset,
>-			   header->main_size);
>+			   header->main_size, file_data->mode_table);
> 
>   data_in =
>    lto_data_in_create (file_data, (const char *) data + string_offset,
>--- gcc/lto-streamer-in.c.jj	2015-02-16 20:14:09.346347843 +0100
>+++ gcc/lto-streamer-in.c	2015-02-16 21:15:42.033774537 +0100
>@@ -1116,10 +1116,12 @@ lto_read_body_or_constructor (struct lto
> 
>       /* Set up the struct function.  */
>       from = data_in->reader_cache->nodes.length ();
>-      lto_input_block ib_main (data + main_offset, header->main_size);
>+      lto_input_block ib_main (data + main_offset, header->main_size,
>+			       file_data->mode_table);
>       if (TREE_CODE (node->decl) == FUNCTION_DECL)
> 	{
>-	  lto_input_block ib_cfg (data + cfg_offset, header->cfg_size);
>+	  lto_input_block ib_cfg (data + cfg_offset, header->cfg_size,
>+				  file_data->mode_table);
> 	  input_function (fn_decl, data_in, &ib_main, &ib_cfg);
> 	}
>       else
>@@ -1384,7 +1386,8 @@ lto_input_toplevel_asms (struct lto_file
> 
>   string_offset = sizeof (*header) + header->main_size;
> 
>-  lto_input_block ib (data + sizeof (*header), header->main_size);
>+  lto_input_block ib (data + sizeof (*header), header->main_size,
>+		      file_data->mode_table);
> 
>   data_in = lto_data_in_create (file_data, data + string_offset,
> 			      header->string_size, vNULL);
>@@ -1403,6 +1406,124 @@ lto_input_toplevel_asms (struct lto_file
> }
> 
> 
>+/* Input mode table.  */
>+
>+void
>+lto_input_mode_table (struct lto_file_decl_data *file_data)
>+{
>+  size_t len;
>+  const char *data = lto_get_section_data (file_data, LTO_section_mode_table,
>+					   NULL, &len);
>+  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
>+
>+  file_data->mode_table = table;
>+  if (! data)
>+    {
>+      for (unsigned int m = 0; m < MAX_MACHINE_MODE; m++)
>+	table[m] = m;
>+      return;
>+    }
>+
>+  const struct lto_simple_header_with_strings *header
>+    = (const struct lto_simple_header_with_strings *) data;
>+  int string_offset;
>+  struct data_in *data_in;
>+  string_offset = sizeof (*header) + header->main_size;
>+
>+  lto_input_block ib (data + sizeof (*header), header->main_size, NULL);
>+  data_in = lto_data_in_create (file_data, data + string_offset,
>+			        header->string_size, vNULL);
>+  bitpack_d bp = streamer_read_bitpack (&ib);
>+
>+  table[VOIDmode] = VOIDmode;
>+  table[BLKmode] = BLKmode;
>+  unsigned int m;
>+  while ((m = bp_unpack_value (&bp, 8)) != VOIDmode)
>+    {
>+      enum mode_class mclass
>+	= bp_unpack_enum (&bp, mode_class, MAX_MODE_CLASS);
>+      unsigned int size = bp_unpack_value (&bp, 8);
>+      unsigned int prec = bp_unpack_value (&bp, 16);
>+      machine_mode inner = (machine_mode) table[bp_unpack_value (&bp, 8)];
>+      unsigned int nunits = bp_unpack_value (&bp, 8);
>+      unsigned int ibit = 0, fbit = 0;
>+      unsigned int real_fmt_len = 0;
>+      const char *real_fmt_name = NULL;
>+      switch (mclass)
>+	{
>+	case MODE_FRACT:
>+        case MODE_UFRACT:
>+        case MODE_ACCUM:
>+        case MODE_UACCUM:
>+          ibit = bp_unpack_value (&bp, 8);
>+          fbit = bp_unpack_value (&bp, 8);
>+          break;
>+	case MODE_FLOAT:
>+	case MODE_DECIMAL_FLOAT:
>+	  real_fmt_name = bp_unpack_indexed_string (data_in, &bp,
>+						    &real_fmt_len);
>+	  break;
>+	default:
>+	  break;
>+	}
>+      /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
>+	 if not found, fallback to all modes.  */
>+      int pass;
>+      for (pass = 0; pass < 2; pass++)
>+	for (machine_mode mr = pass ? VOIDmode
>+				    : GET_CLASS_NARROWEST_MODE (mclass);
>+	     pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
>+	     pass ? mr = (machine_mode) (mr + 1)
>+		  : mr = GET_MODE_WIDER_MODE (mr))
>+	  if (GET_MODE_CLASS (mr) != mclass
>+	      || GET_MODE_SIZE (mr) != size
>+	      || GET_MODE_PRECISION (mr) != prec
>+	      || GET_MODE_INNER (mr) != inner
>+	      || GET_MODE_IBIT (mr) != ibit
>+	      || GET_MODE_FBIT (mr) != fbit
>+	      || GET_MODE_NUNITS (mr) != nunits)
>+	    continue;
>+	  else if ((mclass == MODE_FLOAT || mclass == MODE_DECIMAL_FLOAT)
>+		   && strcmp (REAL_MODE_FORMAT (mr)->name, real_fmt_name) != 0)
>+	    continue;
>+	  else
>+	    {
>+	      table[m] = mr;
>+	      pass = 2;
>+	      break;
>+	    }
>+      unsigned int len;
>+      const char *mname = bp_unpack_indexed_string (data_in, &bp, &len);
>+      if (pass == 2)
>+	{
>+	  switch (mclass)
>+	    {
>+	    case MODE_VECTOR_INT:
>+	    case MODE_VECTOR_FLOAT:
>+	    case MODE_VECTOR_FRACT:
>+	    case MODE_VECTOR_UFRACT:
>+	    case MODE_VECTOR_ACCUM:
>+	    case MODE_VECTOR_UACCUM:
>+	      /* For unsupported vector modes just use BLKmode,
>+		 if the scalar mode is supported.  */
>+	      if (inner != VOIDmode)
>+		{
>+		  table[m] = BLKmode;
>+		  break;
>+		}
>+	      /* FALLTHRU */
>+	    default:
>+	      error ("unsupported mode %s\n", mname);
>+	      break;
>+	    }
>+	}
>+    }
>+  lto_data_in_delete (data_in);
>+
>+  lto_free_section_data (file_data, LTO_section_mode_table, NULL, data, len);
>+}
>+
>+
> /* Initialization for the LTO reader.  */
> 
> void
>--- gcc/config/pdp11/pdp11.c.jj	2015-02-16 20:14:09.818340098 +0100
>+++ gcc/config/pdp11/pdp11.c	2015-02-16 20:26:23.653299288 +0100
>@@ -107,7 +107,8 @@ const struct real_format pdp11_f_format
>     false,
>     false,
>     false,
>-    false
>+    false,
>+    "pdp11_f"
>   };
> 
> const struct real_format pdp11_d_format =
>@@ -128,7 +129,8 @@ const struct real_format pdp11_d_format
>     false,
>     false,
>     false,
>-    false
>+    false,
>+    "pdp11_d"
>   };
> 
> static void
>--- gcc/real.c.jj	2015-02-16 20:14:09.208350107 +0100
>+++ gcc/real.c	2015-02-16 20:26:23.657299222 +0100
>@@ -3031,7 +3031,8 @@ const struct real_format ieee_single_for
>     true,
>     true,
>     true,
>-    false
>+    false,
>+    "ieee_single"
>   };
> 
> const struct real_format mips_single_format =
>@@ -3052,7 +3053,8 @@ const struct real_format mips_single_for
>     true,
>     true,
>     false,
>-    true
>+    true,
>+    "mips_single"
>   };
> 
> const struct real_format motorola_single_format =
>@@ -3073,7 +3075,8 @@ const struct real_format motorola_single
>     true,
>     true,
>     true,
>-    true
>+    true,
>+    "motorola_single"
>   };
> 
> /*  SPU Single Precision (Extended-Range Mode) format is the same as IEEE
>@@ -3105,7 +3108,8 @@ const struct real_format spu_single_form
>     true,
>     true,
>     false,
>-    false
>+    false,
>+    "spu_single"
>   };
> \f
> /* IEEE double-precision format.  */
>@@ -3314,7 +3318,8 @@ const struct real_format ieee_double_for
>     true,
>     true,
>     true,
>-    false
>+    false,
>+    "ieee_double"
>   };
> 
> const struct real_format mips_double_format =
>@@ -3335,7 +3340,8 @@ const struct real_format mips_double_for
>     true,
>     true,
>     false,
>-    true
>+    true,
>+    "mips_double"
>   };
> 
> const struct real_format motorola_double_format =
>@@ -3356,7 +3362,8 @@ const struct real_format motorola_double
>     true,
>     true,
>     true,
>-    true
>+    true,
>+    "motorola_double"
>   };
> \f
> /* IEEE extended real format.  This comes in three flavors: Intel's as
>@@ -3700,7 +3707,8 @@ const struct real_format ieee_extended_m
>     true,
>     true,
>     true,
>-    true
>+    true,
>+    "ieee_extended_motorola"
>   };
> 
> const struct real_format ieee_extended_intel_96_format =
>@@ -3721,7 +3729,8 @@ const struct real_format ieee_extended_i
>     true,
>     true,
>     true,
>-    false
>+    false,
>+    "ieee_extended_intel_96"
>   };
> 
> const struct real_format ieee_extended_intel_128_format =
>@@ -3742,7 +3751,8 @@ const struct real_format ieee_extended_i
>     true,
>     true,
>     true,
>-    false
>+    false,
>+    "ieee_extended_intel_128"
>   };
> 
> /* The following caters to i386 systems that set the rounding precision
>@@ -3765,7 +3775,8 @@ const struct real_format ieee_extended_i
>     true,
>     true,
>     true,
>-    false
>+    false,
>+    "ieee_extended_intel_96_round_53"
>   };
> \f
> /* IBM 128-bit extended precision format: a pair of IEEE double precision
>@@ -3853,7 +3864,8 @@ const struct real_format ibm_extended_fo
>     true,
>     true,
>     true,
>-    false
>+    false,
>+    "ibm_extended"
>   };
> 
> const struct real_format mips_extended_format =
>@@ -3874,7 +3886,8 @@ const struct real_format mips_extended_f
>     true,
>     true,
>     false,
>-    true
>+    true,
>+    "mips_extended"
>   };
> 
> \f
>@@ -4137,7 +4150,8 @@ const struct real_format ieee_quad_forma
>     true,
>     true,
>     true,
>-    false
>+    false,
>+    "ieee_quad"
>   };
> 
> const struct real_format mips_quad_format =
>@@ -4158,7 +4172,8 @@ const struct real_format mips_quad_forma
>     true,
>     true,
>     false,
>-    true
>+    true,
>+    "mips_quad"
>   };
> \f
> /* Descriptions of VAX floating point formats can be found beginning at
>@@ -4458,7 +4473,8 @@ const struct real_format vax_f_format =
>     false,
>     false,
>     false,
>-    false
>+    false,
>+    "vax_f"
>   };
> 
> const struct real_format vax_d_format =
>@@ -4479,7 +4495,8 @@ const struct real_format vax_d_format =
>     false,
>     false,
>     false,
>-    false
>+    false,
>+    "vax_d"
>   };
> 
> const struct real_format vax_g_format =
>@@ -4500,7 +4517,8 @@ const struct real_format vax_g_format =
>     false,
>     false,
>     false,
>-    false
>+    false,
>+    "vax_g"
>   };
> \f
> /* Encode real R into a single precision DFP value in BUF.  */
>@@ -4576,7 +4594,8 @@ const struct real_format decimal_single_
>     true,
>     true,
>     true,
>-    false
>+    false,
>+    "decimal_single"
>   };
> 
> /* Double precision decimal floating point (IEEE 754). */
>@@ -4598,7 +4617,8 @@ const struct real_format decimal_double_
>     true,
>     true,
>     true,
>-    false
>+    false,
>+    "decimal_double"
>   };
> 
> /* Quad precision decimal floating point (IEEE 754). */
>@@ -4620,7 +4640,8 @@ const struct real_format decimal_quad_fo
>     true,
>     true,
>     true,
>-    false
>+    false,
>+    "decimal_quad"
>   };
> \f
> /* Encode half-precision floats.  This routine is used both for the IEEE
>@@ -4757,7 +4778,8 @@ const struct real_format ieee_half_forma
>     true,
>     true,
>     true,
>-    false
>+    false,
>+    "ieee_half"
>   };
> 
> /* ARM's alternative half-precision format, similar to IEEE but with
>@@ -4781,7 +4803,8 @@ const struct real_format arm_half_format
>     true,
>     true,
>     false,
>-    false
>+    false,
>+    "arm_half"
>   };
> \f
> /* A synthetic "format" for internal arithmetic.  It's the size of the
>@@ -4826,7 +4849,8 @@ const struct real_format real_internal_f
>     false,
>     true,
>     true,
>-    false
>+    false,
>+    "real_internal"
>   };
> \f
> /* Calculate X raised to the integer exponent N in mode MODE and store
>
>
>	Jakub


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: nvptx offloading patches [3/n], RFD
  2015-02-16 21:35       ` Richard Biener
@ 2015-02-16 21:44         ` Jakub Jelinek
  2015-02-17 10:00           ` Richard Biener
  2015-02-18  9:05           ` nvptx offloading patches [3/n], RFD Thomas Schwinge
  0 siblings, 2 replies; 42+ messages in thread
From: Jakub Jelinek @ 2015-02-16 21:44 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jan Hubicka, Ilya Verbin, Bernd Schmidt, Thomas Schwinge, gcc-patches

On Mon, Feb 16, 2015 at 10:35:30PM +0100, Richard Biener wrote:
> Seeing the real format string you introduce I wonder if identifying modes
> by their names wouldn't work in 99% of all cases (apart from PSImode
> maybe).

There are various corner cases.  Plus, of course, floating-point modes can
differ between targets, sometimes insignificantly but sometimes very
significantly.  SFmode on one target might be completely different from
SFmode on another target.
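
To illustrate why matching on the mode name alone is not enough, here is a
minimal C++ sketch.  It is not part of the patch: the FloatMode struct and
both helper functions are hypothetical, and pairing the name "SF" with the
"vax_f" format is only an assumed example.  The format strings themselves
are the ones the patch adds to struct real_format.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified view of a floating-point machine mode.
struct FloatMode {
  const char *mode_name;    // e.g. "SF"
  const char *format_name;  // e.g. "ieee_single", as in real_format::name
  unsigned bits;            // precision in bits
};

// Name-only comparison: any two modes called "SF" are treated as equal.
bool same_by_name(const FloatMode &a, const FloatMode &b) {
  return std::strcmp(a.mode_name, b.mode_name) == 0;
}

// Comparison on precision plus the real format name, which is the extra
// property the patch streams for MODE_FLOAT and MODE_DECIMAL_FLOAT.
bool same_by_format(const FloatMode &a, const FloatMode &b) {
  assert(a.format_name && b.format_name);
  return a.bits == b.bits
         && std::strcmp(a.format_name, b.format_name) == 0;
}
```

With two 32-bit modes that are both named "SF" but use "ieee_single" and
"vax_f" respectively, same_by_name reports a match while same_by_format
does not.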

> Also for most cases we can construct the machine mode from the type.  Or
> where that is not possible stream the extra info that is necessary
> instead.

I thought we had already discussed that on IRC.  E.g. decimal modes are
identified only by mode and nothing else, and it doesn't look like the mode
can be easily derived from types in many cases (I spent quite some time on
that).

> Overall feels like a hack BTW :)  can't we assign machine mode enum IDs in
> a target independent way?  I mean, it doesn't have to be densely
> allocated?

We iterate over modes and have tons of tables indexed by mode, so if we
introduce gaps, we'll make the compiler bigger and slower.
If this is limited to the offloading path, as in the attached updated
patch, the overhead for native LTO should not be measurable.
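
The translation-table idea can be sketched in isolation.  The following C++
is a simplified model, not the patch itself: ModeDesc, same_mode,
build_mode_table and kUnsupported are hypothetical names, and the
descriptor carries only a subset of the properties the patch streams.  It
is meant to show that both compilers keep their own dense mode numbering
and that only the reading side does a property-based lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical subset of the per-mode properties written to the stream.
struct ModeDesc {
  std::string name;      // used only for diagnostics, never for matching
  int mode_class;        // e.g. 0 = integer, 1 = float
  unsigned size;         // size in bytes
  unsigned precision;    // precision in bits
  std::string real_fmt;  // real format name, empty for non-float modes
};

// Marker for "no equivalent mode on the reading side" (assumed value).
constexpr std::uint8_t kUnsupported = 0xff;

// Two modes are considered equivalent when their properties agree.
bool same_mode(const ModeDesc &a, const ModeDesc &b) {
  return a.mode_class == b.mode_class && a.size == b.size
         && a.precision == b.precision && a.real_fmt == b.real_fmt;
}

// Map each host mode index that was actually streamed (used[h] is true)
// to the index of an equivalent target mode.  Both enums stay dense.
std::vector<std::uint8_t>
build_mode_table(const std::vector<ModeDesc> &host,
                 const std::vector<bool> &used,
                 const std::vector<ModeDesc> &target) {
  assert(host.size() == used.size());
  assert(target.size() < kUnsupported);
  std::vector<std::uint8_t> table(host.size(), kUnsupported);
  for (std::size_t h = 0; h < host.size(); ++h) {
    if (!used[h])
      continue;  // never streamed, so it needs no translation
    for (std::size_t t = 0; t < target.size(); ++t)
      if (same_mode(host[h], target[t])) {
        table[h] = static_cast<std::uint8_t>(t);
        break;
      }
  }
  return table;
}
```

In this model a native LTO reader would simply use an identity table, so
the only cost it pays is one extra indexed load per unpacked mode.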

--- gcc/passes.c.jj	2015-02-16 22:18:33.219702315 +0100
+++ gcc/passes.c	2015-02-16 22:19:20.842917807 +0100
@@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
 
   gcc_assert (!flag_wpa);
@@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
   lto_symtab_encoder_iterator lsei;
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
   for (lsei = lsei_start_function_in_partition (encoder);
        !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
--- gcc/tree-streamer.h.jj	2015-02-16 22:18:33.222702266 +0100
+++ gcc/tree-streamer.h	2015-02-16 22:19:20.843917791 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 
 #include "streamer-hooks.h"
 #include "lto-streamer.h"
+#include "data-streamer.h"
 #include "hash-map.h"
 
 /* Cache of pickled nodes.  Used to avoid writing the same node more
@@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
 void streamer_write_builtin (struct output_block *, tree);
 
 /* In tree-streamer.c.  */
+extern unsigned char streamer_mode_table[1 << 8];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
 				 hashval_t, unsigned *);
@@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
   return cache->hashes[ix];
 }
 
+static inline void
+bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
+{
+  streamer_mode_table[mode] = 1;
+  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
+}
+
+static inline machine_mode
+bp_unpack_machine_mode (struct bitpack_d *bp)
+{
+  return (machine_mode)
+	   ((struct lto_input_block *)
+	    bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
+}
 
 #endif  /* GCC_TREE_STREAMER_H  */
--- gcc/lto-streamer-out.c.jj	2015-02-16 22:18:33.204702562 +0100
+++ gcc/lto-streamer-out.c	2015-02-16 22:20:06.659163066 +0100
@@ -2642,6 +2642,96 @@ produce_symtab (struct output_block *ob)
 }
 
 
+/* Init the streamer_mode_table for output, where we collect info on what
+   machine_mode values have been streamed.  */
+void
+lto_output_init_mode_table (void)
+{
+  memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
+}
+
+
+/* Write the mode table.  */
+static void
+lto_write_mode_table (void)
+{
+  struct output_block *ob;
+  ob = create_output_block (LTO_section_mode_table);
+  bitpack_d bp = bitpack_create (ob->main_stream);
+
+  /* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
+     also the inner mode marked.  */
+  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+    if (streamer_mode_table[i])
+      {
+	machine_mode m = (machine_mode) i;
+	if (GET_MODE_INNER (m) != VOIDmode)
+	  streamer_mode_table[(int) GET_MODE_INNER (m)] = 1;
+      }
+  /* First stream modes that have GET_MODE_INNER (m) == VOIDmode,
+     so that we can refer to them afterwards.  */
+  for (int pass = 0; pass < 2; pass++)
+    for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+      if (streamer_mode_table[i] && i != (int) VOIDmode && i != (int) BLKmode)
+	{
+	  machine_mode m = (machine_mode) i;
+	  if ((GET_MODE_INNER (m) == VOIDmode) ^ (pass == 0))
+	    continue;
+	  bp_pack_value (&bp, m, 8);
+	  bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
+	  bp_pack_value (&bp, GET_MODE_SIZE (m), 8);
+	  bp_pack_value (&bp, GET_MODE_PRECISION (m), 16);
+	  bp_pack_value (&bp, GET_MODE_INNER (m), 8);
+	  bp_pack_value (&bp, GET_MODE_NUNITS (m), 8);
+	  switch (GET_MODE_CLASS (m))
+	    {
+	    case MODE_FRACT:
+	    case MODE_UFRACT:
+	    case MODE_ACCUM:
+	    case MODE_UACCUM:
+	      bp_pack_value (&bp, GET_MODE_IBIT (m), 8);
+	      bp_pack_value (&bp, GET_MODE_FBIT (m), 8);
+	      break;
+	    case MODE_FLOAT:
+	    case MODE_DECIMAL_FLOAT:
+	      bp_pack_string (ob, &bp, REAL_MODE_FORMAT (m)->name, true);
+	      break;
+	    default:
+	      break;
+	    }
+	  bp_pack_string (ob, &bp, GET_MODE_NAME (m), true);
+	}
+  bp_pack_value (&bp, VOIDmode, 8);
+
+  streamer_write_bitpack (&bp);
+
+  char *section_name
+    = lto_get_section_name (LTO_section_mode_table, NULL, NULL);
+  lto_begin_section (section_name, !flag_wpa);
+  free (section_name);
+
+  /* The entire header stream is computed here.  */
+  struct lto_simple_header_with_strings header;
+  memset (&header, 0, sizeof (header));
+
+  /* Write the header.  */
+  header.major_version = LTO_major_version;
+  header.minor_version = LTO_minor_version;
+
+  header.main_size = ob->main_stream->total_size;
+  header.string_size = ob->string_stream->total_size;
+  lto_write_data (&header, sizeof header);
+
+  /* Put all of the gimple and the string table out the asm file as a
+     block of text.  */
+  lto_write_stream (ob->main_stream);
+  lto_write_stream (ob->string_stream);
+
+  lto_end_section ();
+  destroy_output_block (ob);
+}
+
+
 /* This pass is run after all of the functions are serialized and all
    of the IPA passes have written their serialized forms.  This pass
    causes the vector of all of the global decls and types used from
@@ -2749,4 +2839,6 @@ produce_asm_for_decls (void)
   lto_symtab_encoder_delete (ob->decl_state->symtab_node_encoder);
   lto_function_decl_states.release ();
   destroy_output_block (ob);
+  if (lto_stream_offload_p)
+    lto_write_mode_table ();
 }
--- gcc/config/pdp11/pdp11.c.jj	2015-02-16 22:18:33.209702480 +0100
+++ gcc/config/pdp11/pdp11.c	2015-02-16 22:19:20.845917758 +0100
@@ -107,7 +107,8 @@ const struct real_format pdp11_f_format
     false,
     false,
     false,
-    false
+    false,
+    "pdp11_f"
   };
 
 const struct real_format pdp11_d_format =
@@ -128,7 +129,8 @@ const struct real_format pdp11_d_format
     false,
     false,
     false,
-    false
+    false,
+    "pdp11_d"
   };
 
 static void
--- gcc/lto-section-in.c.jj	2015-02-16 22:18:33.202702595 +0100
+++ gcc/lto-section-in.c	2015-02-16 22:19:20.845917758 +0100
@@ -89,7 +89,8 @@ const char *lto_section_name[LTO_N_SECTI
   "inline",
   "ipcp_trans",
   "icf",
-  "offload_table"
+  "offload_table",
+  "mode_table"
 };
 
 
@@ -262,7 +263,8 @@ lto_create_simple_input_block (struct lt
     return NULL;
 
   *datar = data;
-  return new lto_input_block (data + main_offset, header->main_size);
+  return new lto_input_block (data + main_offset, header->main_size,
+			      file_data->mode_table);
 }
 
 
--- gcc/tree-streamer-out.c.jj	2015-02-16 22:18:33.222702266 +0100
+++ gcc/tree-streamer-out.c	2015-02-16 22:19:20.845917758 +0100
@@ -190,7 +190,7 @@ static void
 pack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
 {
   struct fixed_value fv = TREE_FIXED_CST (expr);
-  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, fv.mode);
+  bp_pack_machine_mode (bp, fv.mode);
   bp_pack_var_len_int (bp, fv.data.low);
   bp_pack_var_len_int (bp, fv.data.high);
 }
@@ -201,7 +201,7 @@ pack_ts_fixed_cst_value_fields (struct b
 static void
 pack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, DECL_MODE (expr));
+  bp_pack_machine_mode (bp, DECL_MODE (expr));
   bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
   bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
   bp_pack_value (bp, DECL_IGNORED_P (expr), 1);
@@ -325,7 +325,7 @@ pack_ts_function_decl_value_fields (stru
 static void
 pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, TYPE_MODE (expr));
+  bp_pack_machine_mode (bp, TYPE_MODE (expr));
   bp_pack_value (bp, TYPE_STRING_FLAG (expr), 1);
   bp_pack_value (bp, TYPE_NO_FORCE_BLK (expr), 1);
   bp_pack_value (bp, TYPE_NEEDS_CONSTRUCTING (expr), 1);
--- gcc/real.h.jj	2015-02-16 22:18:33.220702299 +0100
+++ gcc/real.h	2015-02-16 22:19:20.846917741 +0100
@@ -155,6 +155,7 @@ struct real_format
   bool has_signed_zero;
   bool qnan_msb_set;
   bool canonical_nan_lsbs_set;
+  const char *name;
 };
 
 
--- gcc/lto-streamer.h.jj	2015-02-16 22:18:33.211702447 +0100
+++ gcc/lto-streamer.h	2015-02-16 22:19:20.846917741 +0100
@@ -248,6 +248,7 @@ enum lto_section_type
   LTO_section_ipcp_transform,
   LTO_section_ipa_icf,
   LTO_section_offload_table,
+  LTO_section_mode_table,
   LTO_N_SECTION_TYPES		/* Must be last.  */
 };
 
@@ -312,12 +313,15 @@ class lto_input_block
 public:
   /* Special constructor for the string table, it abuses this to
      do random access but use the uhwi decoder.  */
-  lto_input_block (const char *data_, unsigned int p_, unsigned int len_)
-      : data (data_), p (p_), len (len_) {}
-  lto_input_block (const char *data_, unsigned int len_)
-      : data (data_), p (0), len (len_) {}
+  lto_input_block (const char *data_, unsigned int p_, unsigned int len_,
+		   const unsigned char *mode_table_)
+      : data (data_), mode_table (mode_table_), p (p_), len (len_) {}
+  lto_input_block (const char *data_, unsigned int len_,
+		   const unsigned char *mode_table_)
+      : data (data_), mode_table (mode_table_), p (0), len (len_) {}
 
   const char *data;
+  const unsigned char *mode_table;
   unsigned int p;
   unsigned int len;
 };
@@ -527,6 +531,9 @@ struct GTY(()) lto_file_decl_data
 
   /* Map assigning declarations their resolutions.  */
   hash_map<tree, ld_plugin_symbol_resolution> * GTY((skip)) resolution_map;
+
+  /* Mode translation table.  */
+  const unsigned char *mode_table;
 };
 
 typedef struct lto_file_decl_data *lto_file_decl_data_ptr;
@@ -775,6 +782,7 @@ extern void lto_input_variable_construct
 extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
 					      const char *);
 extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
+extern void lto_input_mode_table (struct lto_file_decl_data *);
 extern struct data_in *lto_data_in_create (struct lto_file_decl_data *,
 				    const char *, unsigned,
 				    vec<ld_plugin_symbol_resolution_t> );
@@ -807,6 +815,7 @@ void lto_output_decl_state_refs (struct
 			         struct lto_output_stream *,
 			         struct lto_out_decl_state *);
 void lto_output_location (struct output_block *, struct bitpack_d *, location_t);
+void lto_output_init_mode_table (void);
 
 
 /* In lto-cgraph.c  */
--- gcc/ipa-prop.c.jj	2015-02-16 22:18:33.219702315 +0100
+++ gcc/ipa-prop.c	2015-02-16 22:19:20.848917709 +0100
@@ -4868,7 +4868,7 @@ ipa_prop_read_section (struct lto_file_d
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size);
+			   header->main_size, file_data->mode_table);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
@@ -5089,7 +5089,7 @@ read_replacements_section (struct lto_fi
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size);
+			   header->main_size, file_data->mode_table);
 
   data_in = lto_data_in_create (file_data, (const char *) data + string_offset,
 				header->string_size, vNULL);
--- gcc/data-streamer-in.c.jj	2015-02-16 22:18:33.224702233 +0100
+++ gcc/data-streamer-in.c	2015-02-16 22:19:20.848917709 +0100
@@ -70,7 +70,7 @@ string_for_index (struct data_in *data_i
     }
 
   /* Get the string stored at location LOC in DATA_IN->STRINGS.  */
-  lto_input_block str_tab (data_in->strings, loc - 1, data_in->strings_len);
+  lto_input_block str_tab (data_in->strings, loc - 1, data_in->strings_len, NULL);
   len = streamer_read_uhwi (&str_tab);
   *rlen = len;
 
--- gcc/tree-streamer-in.c.jj	2015-02-16 22:18:33.220702299 +0100
+++ gcc/tree-streamer-in.c	2015-02-16 22:19:20.849917692 +0100
@@ -224,7 +224,7 @@ static void
 unpack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
 {
   FIXED_VALUE_TYPE *fp = ggc_alloc<fixed_value> ();
-  fp->mode = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
+  fp->mode = bp_unpack_machine_mode (bp);
   fp->data.low = bp_unpack_var_len_int (bp);
   fp->data.high = bp_unpack_var_len_int (bp);
   TREE_FIXED_CST_PTR (expr) = fp;
@@ -236,7 +236,7 @@ unpack_ts_fixed_cst_value_fields (struct
 static void
 unpack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  DECL_MODE (expr) = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
+  DECL_MODE (expr) = bp_unpack_machine_mode (bp);
   DECL_NONLOCAL (expr) = (unsigned) bp_unpack_value (bp, 1);
   DECL_VIRTUAL_P (expr) = (unsigned) bp_unpack_value (bp, 1);
   DECL_IGNORED_P (expr) = (unsigned) bp_unpack_value (bp, 1);
@@ -373,7 +373,7 @@ unpack_ts_type_common_value_fields (stru
 {
   machine_mode mode;
 
-  mode = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
+  mode = bp_unpack_machine_mode (bp);
   SET_TYPE_MODE (expr, mode);
   TYPE_STRING_FLAG (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_NO_FORCE_BLK (expr) = (unsigned) bp_unpack_value (bp, 1);
--- gcc/ipa-inline-analysis.c.jj	2015-02-16 22:18:33.223702249 +0100
+++ gcc/ipa-inline-analysis.c	2015-02-16 22:19:20.850917676 +0100
@@ -4190,7 +4190,8 @@ inline_read_section (struct lto_file_dec
   unsigned int i, count2, j;
   unsigned int f_count;
 
-  lto_input_block ib ((const char *) data + main_offset, header->main_size);
+  lto_input_block ib ((const char *) data + main_offset, header->main_size,
+		      file_data->mode_table);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
--- gcc/ipa-icf.c.jj	2015-02-16 22:18:33.222702266 +0100
+++ gcc/ipa-icf.c	2015-02-16 22:19:20.851917659 +0100
@@ -1500,7 +1500,7 @@ sem_item_optimizer::read_section (lto_fi
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset, 0,
-			   header->main_size);
+			   header->main_size, file_data->mode_table);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
--- gcc/real.c.jj	2015-02-16 22:18:33.220702299 +0100
+++ gcc/real.c	2015-02-16 22:19:20.853917626 +0100
@@ -3031,7 +3031,8 @@ const struct real_format ieee_single_for
     true,
     true,
     true,
-    false
+    false,
+    "ieee_single"
   };
 
 const struct real_format mips_single_format =
@@ -3052,7 +3053,8 @@ const struct real_format mips_single_for
     true,
     true,
     false,
-    true
+    true,
+    "mips_single"
   };
 
 const struct real_format motorola_single_format =
@@ -3073,7 +3075,8 @@ const struct real_format motorola_single
     true,
     true,
     true,
-    true
+    true,
+    "motorola_single"
   };
 
 /*  SPU Single Precision (Extended-Range Mode) format is the same as IEEE
@@ -3105,7 +3108,8 @@ const struct real_format spu_single_form
     true,
     true,
     false,
-    false
+    false,
+    "spu_single"
   };
 \f
 /* IEEE double-precision format.  */
@@ -3314,7 +3318,8 @@ const struct real_format ieee_double_for
     true,
     true,
     true,
-    false
+    false,
+    "ieee_double"
   };
 
 const struct real_format mips_double_format =
@@ -3335,7 +3340,8 @@ const struct real_format mips_double_for
     true,
     true,
     false,
-    true
+    true,
+    "mips_double"
   };
 
 const struct real_format motorola_double_format =
@@ -3356,7 +3362,8 @@ const struct real_format motorola_double
     true,
     true,
     true,
-    true
+    true,
+    "motorola_double"
   };
 \f
 /* IEEE extended real format.  This comes in three flavors: Intel's as
@@ -3700,7 +3707,8 @@ const struct real_format ieee_extended_m
     true,
     true,
     true,
-    true
+    true,
+    "ieee_extended_motorola"
   };
 
 const struct real_format ieee_extended_intel_96_format =
@@ -3721,7 +3729,8 @@ const struct real_format ieee_extended_i
     true,
     true,
     true,
-    false
+    false,
+    "ieee_extended_intel_96"
   };
 
 const struct real_format ieee_extended_intel_128_format =
@@ -3742,7 +3751,8 @@ const struct real_format ieee_extended_i
     true,
     true,
     true,
-    false
+    false,
+    "ieee_extended_intel_128"
   };
 
 /* The following caters to i386 systems that set the rounding precision
@@ -3765,7 +3775,8 @@ const struct real_format ieee_extended_i
     true,
     true,
     true,
-    false
+    false,
+    "ieee_extended_intel_96_round_53"
   };
 \f
 /* IBM 128-bit extended precision format: a pair of IEEE double precision
@@ -3853,7 +3864,8 @@ const struct real_format ibm_extended_fo
     true,
     true,
     true,
-    false
+    false,
+    "ibm_extended"
   };
 
 const struct real_format mips_extended_format =
@@ -3874,7 +3886,8 @@ const struct real_format mips_extended_f
     true,
     true,
     false,
-    true
+    true,
+    "mips_extended"
   };
 
 \f
@@ -4137,7 +4150,8 @@ const struct real_format ieee_quad_forma
     true,
     true,
     true,
-    false
+    false,
+    "ieee_quad"
   };
 
 const struct real_format mips_quad_format =
@@ -4158,7 +4172,8 @@ const struct real_format mips_quad_forma
     true,
     true,
     false,
-    true
+    true,
+    "mips_quad"
   };
 \f
 /* Descriptions of VAX floating point formats can be found beginning at
@@ -4458,7 +4473,8 @@ const struct real_format vax_f_format =
     false,
     false,
     false,
-    false
+    false,
+    "vax_f"
   };
 
 const struct real_format vax_d_format =
@@ -4479,7 +4495,8 @@ const struct real_format vax_d_format =
     false,
     false,
     false,
-    false
+    false,
+    "vax_d"
   };
 
 const struct real_format vax_g_format =
@@ -4500,7 +4517,8 @@ const struct real_format vax_g_format =
     false,
     false,
     false,
-    false
+    false,
+    "vax_g"
   };
 \f
 /* Encode real R into a single precision DFP value in BUF.  */
@@ -4576,7 +4594,8 @@ const struct real_format decimal_single_
     true,
     true,
     true,
-    false
+    false,
+    "decimal_single"
   };
 
 /* Double precision decimal floating point (IEEE 754). */
@@ -4598,7 +4617,8 @@ const struct real_format decimal_double_
     true,
     true,
     true,
-    false
+    false,
+    "decimal_double"
   };
 
 /* Quad precision decimal floating point (IEEE 754). */
@@ -4620,7 +4640,8 @@ const struct real_format decimal_quad_fo
     true,
     true,
     true,
-    false
+    false,
+    "decimal_quad"
   };
 \f
 /* Encode half-precision floats.  This routine is used both for the IEEE
@@ -4757,7 +4778,8 @@ const struct real_format ieee_half_forma
     true,
     true,
     true,
-    false
+    false,
+    "ieee_half"
   };
 
 /* ARM's alternative half-precision format, similar to IEEE but with
@@ -4781,7 +4803,8 @@ const struct real_format arm_half_format
     true,
     true,
     false,
-    false
+    false,
+    "arm_half"
   };
 \f
 /* A synthetic "format" for internal arithmetic.  It's the size of the
@@ -4826,7 +4849,8 @@ const struct real_format real_internal_f
     false,
     true,
     true,
-    false
+    false,
+    "real_internal"
   };
 \f
 /* Calculate X raised to the integer exponent N in mode MODE and store
--- gcc/tree-streamer.c.jj	2015-02-16 22:18:33.221702282 +0100
+++ gcc/tree-streamer.c	2015-02-16 22:19:20.853917626 +0100
@@ -53,6 +53,14 @@ along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "tree-streamer.h"
 
+/* Table indexed by machine_mode, used for 2 different purposes.
+   During streaming out we record there non-zero value for all modes
+   that were streamed out.
+   During streaming in, we translate the on the disk mode using this
+   table.  For normal LTO it is set to identity, for ACCEL_COMPILER
+   depending on the mode_table content.  */
+unsigned char streamer_mode_table[1 << 8];
+
 /* Check that all the TS_* structures handled by the streamer_write_* and
    streamer_read_* routines are exactly ALL the structures defined in
    treestruct.def.  */
--- gcc/lto/lto.c.jj	2015-02-16 22:18:33.221702282 +0100
+++ gcc/lto/lto.c	2015-02-16 22:35:56.213523202 +0100
@@ -85,6 +85,8 @@ static int lto_parallelism;
 
 static GTY(()) tree first_personality_decl;
 
+static GTY(()) const unsigned char *lto_mode_identity_table;
+
 /* Returns a hash code for P.  */
 
 static hashval_t
@@ -1877,7 +1879,7 @@ lto_read_decls (struct lto_file_decl_dat
   uint32_t num_decl_states;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size);
+			   header->main_size, decl_data->mode_table);
 
   data_in = lto_data_in_create (decl_data, (const char *) data + string_offset,
 				header->string_size, resolutions);
@@ -2219,6 +2221,11 @@ lto_file_finalize (struct lto_file_decl_
 
   file_data->renaming_hash_table = lto_create_renaming_table ();
   file_data->file_name = file->filename;
+#ifdef ACCEL_COMPILER
+  lto_input_mode_table (file_data);
+#else
+  file_data->mode_table = lto_mode_identity_table;
+#endif
   data = lto_get_section_data (file_data, LTO_section_decls, NULL, &len);
   if (data == NULL)
     {
@@ -3394,6 +3401,13 @@ lto_init (void)
   memset (&lto_stats, 0, sizeof (lto_stats));
   bitmap_obstack_initialize (NULL);
   gimple_register_cfg_hooks ();
+#ifndef ACCEL_COMPILER
+  unsigned char *table
+    = ggc_vec_alloc<unsigned char> (MAX_MACHINE_MODE);
+  for (int m = 0; m < MAX_MACHINE_MODE; m++)
+    table[m] = m;
+  lto_mode_identity_table = table;
+#endif
 }
 
 
--- gcc/lto-cgraph.c.jj	2015-02-16 22:18:33.211702447 +0100
+++ gcc/lto-cgraph.c	2015-02-16 22:19:20.855917593 +0100
@@ -2113,7 +2113,7 @@ input_cgraph_opt_section (struct lto_fil
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size);
+			   header->main_size, file_data->mode_table);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
--- gcc/lto-streamer-in.c.jj	2015-02-16 22:18:33.204702562 +0100
+++ gcc/lto-streamer-in.c	2015-02-16 22:26:53.355464202 +0100
@@ -1116,10 +1116,12 @@ lto_read_body_or_constructor (struct lto
 
       /* Set up the struct function.  */
       from = data_in->reader_cache->nodes.length ();
-      lto_input_block ib_main (data + main_offset, header->main_size);
+      lto_input_block ib_main (data + main_offset, header->main_size,
+			       file_data->mode_table);
       if (TREE_CODE (node->decl) == FUNCTION_DECL)
 	{
-	  lto_input_block ib_cfg (data + cfg_offset, header->cfg_size);
+	  lto_input_block ib_cfg (data + cfg_offset, header->cfg_size,
+				  file_data->mode_table);
 	  input_function (fn_decl, data_in, &ib_main, &ib_cfg);
 	}
       else
@@ -1384,7 +1386,8 @@ lto_input_toplevel_asms (struct lto_file
 
   string_offset = sizeof (*header) + header->main_size;
 
-  lto_input_block ib (data + sizeof (*header), header->main_size);
+  lto_input_block ib (data + sizeof (*header), header->main_size,
+		      file_data->mode_table);
 
   data_in = lto_data_in_create (file_data, data + string_offset,
 			      header->string_size, vNULL);
@@ -1403,6 +1406,123 @@ lto_input_toplevel_asms (struct lto_file
 }
 
 
+/* Input mode table.  */
+
+void
+lto_input_mode_table (struct lto_file_decl_data *file_data)
+{
+  size_t len;
+  const char *data = lto_get_section_data (file_data, LTO_section_mode_table,
+					   NULL, &len);
+  if (! data)
+    {
+      internal_error ("cannot read LTO mode table from %s",
+		      file_data->file_name);
+      return;
+    }
+
+  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
+  file_data->mode_table = table;
+  const struct lto_simple_header_with_strings *header
+    = (const struct lto_simple_header_with_strings *) data;
+  int string_offset;
+  struct data_in *data_in;
+  string_offset = sizeof (*header) + header->main_size;
+
+  lto_input_block ib (data + sizeof (*header), header->main_size, NULL);
+  data_in = lto_data_in_create (file_data, data + string_offset,
+				header->string_size, vNULL);
+  bitpack_d bp = streamer_read_bitpack (&ib);
+
+  table[VOIDmode] = VOIDmode;
+  table[BLKmode] = BLKmode;
+  unsigned int m;
+  while ((m = bp_unpack_value (&bp, 8)) != VOIDmode)
+    {
+      enum mode_class mclass
+	= bp_unpack_enum (&bp, mode_class, MAX_MODE_CLASS);
+      unsigned int size = bp_unpack_value (&bp, 8);
+      unsigned int prec = bp_unpack_value (&bp, 16);
+      machine_mode inner = (machine_mode) table[bp_unpack_value (&bp, 8)];
+      unsigned int nunits = bp_unpack_value (&bp, 8);
+      unsigned int ibit = 0, fbit = 0;
+      unsigned int real_fmt_len = 0;
+      const char *real_fmt_name = NULL;
+      switch (mclass)
+	{
+	case MODE_FRACT:
+	case MODE_UFRACT:
+	case MODE_ACCUM:
+	case MODE_UACCUM:
+	  ibit = bp_unpack_value (&bp, 8);
+	  fbit = bp_unpack_value (&bp, 8);
+	  break;
+	case MODE_FLOAT:
+	case MODE_DECIMAL_FLOAT:
+	  real_fmt_name = bp_unpack_indexed_string (data_in, &bp,
+						    &real_fmt_len);
+	  break;
+	default:
+	  break;
+	}
+      /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
+	 if not found, fallback to all modes.  */
+      int pass;
+      for (pass = 0; pass < 2; pass++)
+	for (machine_mode mr = pass ? VOIDmode
+				    : GET_CLASS_NARROWEST_MODE (mclass);
+	     pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
+	     pass ? mr = (machine_mode) (mr + 1)
+		  : mr = GET_MODE_WIDER_MODE (mr))
+	  if (GET_MODE_CLASS (mr) != mclass
+	      || GET_MODE_SIZE (mr) != size
+	      || GET_MODE_PRECISION (mr) != prec
+	      || GET_MODE_INNER (mr) != inner
+	      || GET_MODE_IBIT (mr) != ibit
+	      || GET_MODE_FBIT (mr) != fbit
+	      || GET_MODE_NUNITS (mr) != nunits)
+	    continue;
+	  else if ((mclass == MODE_FLOAT || mclass == MODE_DECIMAL_FLOAT)
+		   && strcmp (REAL_MODE_FORMAT (mr)->name, real_fmt_name) != 0)
+	    continue;
+	  else
+	    {
+	      table[m] = mr;
+	      pass = 2;
+	      break;
+	    }
+      unsigned int mname_len;
+      const char *mname = bp_unpack_indexed_string (data_in, &bp, &mname_len);
+      if (pass == 2)
+	{
+	  switch (mclass)
+	    {
+	    case MODE_VECTOR_INT:
+	    case MODE_VECTOR_FLOAT:
+	    case MODE_VECTOR_FRACT:
+	    case MODE_VECTOR_UFRACT:
+	    case MODE_VECTOR_ACCUM:
+	    case MODE_VECTOR_UACCUM:
+	      /* For unsupported vector modes just use BLKmode,
+		 if the scalar mode is supported.  */
+	      if (inner != VOIDmode)
+		{
+		  table[m] = BLKmode;
+		  break;
+		}
+	      /* FALLTHRU */
+	    default:
+	      error ("unsupported mode %s\n", mname);
+	      break;
+	    }
+	}
+    }
+  lto_data_in_delete (data_in);
+
+  lto_free_section_data (file_data, LTO_section_mode_table, NULL, data, len);
+}
+
+
 /* Initialization for the LTO reader.  */
 
 void


	Jakub


* Re: nvptx offloading patches [3/n], RFD
  2015-02-16 21:44         ` Jakub Jelinek
@ 2015-02-17 10:00           ` Richard Biener
  2015-02-18 10:00             ` Jakub Jelinek
  2015-02-18  9:05           ` nvptx offloading patches [3/n], RFD Thomas Schwinge
  1 sibling, 1 reply; 42+ messages in thread
From: Richard Biener @ 2015-02-17 10:00 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jan Hubicka, Ilya Verbin, Bernd Schmidt, Thomas Schwinge, GCC Patches

On Mon, Feb 16, 2015 at 10:43 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Feb 16, 2015 at 10:35:30PM +0100, Richard Biener wrote:
>> Seeing the real format string you introduce I wonder if identifying modes
>> by their names wouldn't work in 99% of all cases (apart from PSImode
>> maybe).
>
> There are various corner cases.  Plus of course sometimes insignificant, but
> sometimes very significant, floating mode changes.  SFmode on one target
> might be completely different from another target.

But we can't deal with arbitrary target differences anyway - otherwise
we have generated wrong code already.

>> Also for most cases we can construct the machine mode from the type.  Or
>> where that is not possible stream the extra info that is necessary
>> instead.
>
> I thought we've discussed that already on IRC.  E.g. decimal modes are
> identified only by mode and nothing else, and it doesn't look like it
> can be easily derived from types in many cases (spent quite some time on
> that).

Sure.  Still, modes and types overlap quite a bit in the information they
carry, so we might be able to do more compact streaming (and at the same
time not rely on the machine-mode enum).  The machine modes are of course
very compact to stream (they are basically a common set of all possible
types), and your mapping introduces a kind of cache for common type
properties.
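
As a rough illustration of the mapping idea discussed here, the sketch below
matches each host mode against the target's modes by properties rather than by
enum value.  ModeDesc and build_mode_table are hypothetical names invented for
this example, the property set is reduced compared to what the patch streams,
and entry 0 merely plays the role of VOIDmode; it is not the patch's code.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for the per-mode properties that get
// streamed.  The real patch also streams inner mode, nunits, ibit and fbit.
struct ModeDesc {
  unsigned id;         // enum value in the compiler that owns this mode
  int mclass;          // stand-in for enum mode_class
  unsigned size;       // size in bytes
  unsigned precision;  // precision in bits
  const char *fmt;     // real format name, "" for non-float modes
};

// Build a 256-entry table mapping host enum values to target enum values.
// Host modes without a matching target mode stay 0 (VOIDmode in this sketch).
std::vector<unsigned char>
build_mode_table(const std::vector<ModeDesc> &host_used,
                 const std::vector<ModeDesc> &target_modes)
{
  std::vector<unsigned char> table(256, 0);
  for (const ModeDesc &h : host_used)
    for (const ModeDesc &t : target_modes)
      if (h.mclass == t.mclass
          && h.size == t.size
          && h.precision == t.precision
          && std::strcmp(h.fmt, t.fmt) == 0)
        {
          table[h.id] = static_cast<unsigned char>(t.id);
          break;  // first match wins
        }
  return table;
}
```

Comparing the format name as well as size and precision is what keeps, say, an
80-bit Intel extended mode from being mapped onto an unrelated float mode that
happens to have the same size.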

I know that Honza wanted to make trees slimmer by taking into account
more (redundant) information from the modes associated with trees.

I'm just looking for a way to make this less of a hack (and the LTO IL
less target dependent).  Not for GCC 5 for which something like your
patch is probably ok, but for the future.

>> Overall feels like a hack BTW :)  can't we assign machine mode enum IDs in
>> a target independent way?  I mean, it doesn't have to be densely
>> allocated?
>
> We iterate over modes, we have tons of tables indexed by modes, so if we
> introduce gaps, we'll make the compiler bigger and slower.
> If this is limited to the offloading path, like in the attached updated
> patch, the overhead for native LTO should be not measurable.

Sure.
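
To make the overhead argument concrete, here is a minimal sketch, assuming a
fixed 256-entry byte table (the patch itself sizes the identity table by
MAX_MACHINE_MODE).  ModeTable, make_identity_table and translate_mode are
hypothetical names for illustration only.  The reader does one indexed load per
streamed mode, and for native LTO that load goes through an identity table.

```cpp
#include <array>
#include <cassert>

using ModeTable = std::array<unsigned char, 256>;

// Native LTO case: every on-disk mode value maps to itself.
ModeTable make_identity_table()
{
  ModeTable t{};
  for (unsigned i = 0; i < t.size(); ++i)
    t[i] = static_cast<unsigned char>(i);
  return t;
}

// Reader side: a single table lookup, whichever table is installed.
inline unsigned translate_mode(const ModeTable &table, unsigned on_disk)
{
  return table[on_disk & 0xff];
}
```

An offload compiler would install a table filled from the streamed mode
section instead; the lookup itself stays the same.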

Thanks,
Richard.

> --- gcc/passes.c.jj     2015-02-16 22:18:33.219702315 +0100
> +++ gcc/passes.c        2015-02-16 22:19:20.842917807 +0100
> @@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
>    struct lto_out_decl_state *state = lto_new_out_decl_state ();
>    state->symtab_node_encoder = encoder;
>
> +  lto_output_init_mode_table ();
>    lto_push_out_decl_state (state);
>
>    gcc_assert (!flag_wpa);
> @@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
>    lto_symtab_encoder_iterator lsei;
>    state->symtab_node_encoder = encoder;
>
> +  lto_output_init_mode_table ();
>    lto_push_out_decl_state (state);
>    for (lsei = lsei_start_function_in_partition (encoder);
>         !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
> --- gcc/tree-streamer.h.jj      2015-02-16 22:18:33.222702266 +0100
> +++ gcc/tree-streamer.h 2015-02-16 22:19:20.843917791 +0100
> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
>
>  #include "streamer-hooks.h"
>  #include "lto-streamer.h"
> +#include "data-streamer.h"
>  #include "hash-map.h"
>
>  /* Cache of pickled nodes.  Used to avoid writing the same node more
> @@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
>  void streamer_write_builtin (struct output_block *, tree);
>
>  /* In tree-streamer.c.  */
> +extern unsigned char streamer_mode_table[1 << 8];
>  void streamer_check_handled_ts_structures (void);
>  bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
>                                  hashval_t, unsigned *);
> @@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
>    return cache->hashes[ix];
>  }
>
> +static inline void
> +bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
> +{
> +  streamer_mode_table[mode] = 1;
> +  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
> +}
> +
> +static inline machine_mode
> +bp_unpack_machine_mode (struct bitpack_d *bp)
> +{
> +  return (machine_mode)
> +          ((struct lto_input_block *)
> +           bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
> +}
>
>  #endif  /* GCC_TREE_STREAMER_H  */
> --- gcc/lto-streamer-out.c.jj   2015-02-16 22:18:33.204702562 +0100
> +++ gcc/lto-streamer-out.c      2015-02-16 22:20:06.659163066 +0100
> @@ -2642,6 +2642,96 @@ produce_symtab (struct output_block *ob)
>  }
>
>
> +/* Init the streamer_mode_table for output, where we collect info on what
> +   machine_mode values have been streamed.  */
> +void
> +lto_output_init_mode_table (void)
> +{
> +  memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
> +}
> +
> +
> +/* Write the mode table.  */
> +static void
> +lto_write_mode_table (void)
> +{
> +  struct output_block *ob;
> +  ob = create_output_block (LTO_section_mode_table);
> +  bitpack_d bp = bitpack_create (ob->main_stream);
> +
> +  /* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
> +     also the inner mode marked.  */
> +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> +    if (streamer_mode_table[i])
> +      {
> +       machine_mode m = (machine_mode) i;
> +       if (GET_MODE_INNER (m) != VOIDmode)
> +         streamer_mode_table[(int) GET_MODE_INNER (m)] = 1;
> +      }
> +  /* First stream modes that have GET_MODE_INNER (m) == VOIDmode,
> +     so that we can refer to them afterwards.  */
> +  for (int pass = 0; pass < 2; pass++)
> +    for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> +      if (streamer_mode_table[i] && i != (int) VOIDmode && i != (int) BLKmode)
> +       {
> +         machine_mode m = (machine_mode) i;
> +         if ((GET_MODE_INNER (m) == VOIDmode) ^ (pass == 0))
> +           continue;
> +         bp_pack_value (&bp, m, 8);
> +         bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
> +         bp_pack_value (&bp, GET_MODE_SIZE (m), 8);
> +         bp_pack_value (&bp, GET_MODE_PRECISION (m), 16);
> +         bp_pack_value (&bp, GET_MODE_INNER (m), 8);
> +         bp_pack_value (&bp, GET_MODE_NUNITS (m), 8);
> +         switch (GET_MODE_CLASS (m))
> +           {
> +           case MODE_FRACT:
> +           case MODE_UFRACT:
> +           case MODE_ACCUM:
> +           case MODE_UACCUM:
> +             bp_pack_value (&bp, GET_MODE_IBIT (m), 8);
> +             bp_pack_value (&bp, GET_MODE_FBIT (m), 8);
> +             break;
> +           case MODE_FLOAT:
> +           case MODE_DECIMAL_FLOAT:
> +             bp_pack_string (ob, &bp, REAL_MODE_FORMAT (m)->name, true);
> +             break;
> +           default:
> +             break;
> +           }
> +         bp_pack_string (ob, &bp, GET_MODE_NAME (m), true);
> +       }
> +  bp_pack_value (&bp, VOIDmode, 8);
> +
> +  streamer_write_bitpack (&bp);
> +
> +  char *section_name
> +    = lto_get_section_name (LTO_section_mode_table, NULL, NULL);
> +  lto_begin_section (section_name, !flag_wpa);
> +  free (section_name);
> +
> +  /* The entire header stream is computed here.  */
> +  struct lto_simple_header_with_strings header;
> +  memset (&header, 0, sizeof (header));
> +
> +  /* Write the header.  */
> +  header.major_version = LTO_major_version;
> +  header.minor_version = LTO_minor_version;
> +
> +  header.main_size = ob->main_stream->total_size;
> +  header.string_size = ob->string_stream->total_size;
> +  lto_write_data (&header, sizeof header);
> +
> +  /* Put all of the gimple and the string table out the asm file as a
> +     block of text.  */
> +  lto_write_stream (ob->main_stream);
> +  lto_write_stream (ob->string_stream);
> +
> +  lto_end_section ();
> +  destroy_output_block (ob);
> +}
> +
> +
>  /* This pass is run after all of the functions are serialized and all
>     of the IPA passes have written their serialized forms.  This pass
>     causes the vector of all of the global decls and types used from
> @@ -2749,4 +2839,6 @@ produce_asm_for_decls (void)
>    lto_symtab_encoder_delete (ob->decl_state->symtab_node_encoder);
>    lto_function_decl_states.release ();
>    destroy_output_block (ob);
> +  if (lto_stream_offload_p)
> +    lto_write_mode_table ();
>  }
> --- gcc/config/pdp11/pdp11.c.jj 2015-02-16 22:18:33.209702480 +0100
> +++ gcc/config/pdp11/pdp11.c    2015-02-16 22:19:20.845917758 +0100
> @@ -107,7 +107,8 @@ const struct real_format pdp11_f_format
>      false,
>      false,
>      false,
> -    false
> +    false,
> +    "pdp11_f"
>    };
>
>  const struct real_format pdp11_d_format =
> @@ -128,7 +129,8 @@ const struct real_format pdp11_d_format
>      false,
>      false,
>      false,
> -    false
> +    false,
> +    "pdp11_d"
>    };
>
>  static void
> --- gcc/lto-section-in.c.jj     2015-02-16 22:18:33.202702595 +0100
> +++ gcc/lto-section-in.c        2015-02-16 22:19:20.845917758 +0100
> @@ -89,7 +89,8 @@ const char *lto_section_name[LTO_N_SECTI
>    "inline",
>    "ipcp_trans",
>    "icf",
> -  "offload_table"
> +  "offload_table",
> +  "mode_table"
>  };
>
>
> @@ -262,7 +263,8 @@ lto_create_simple_input_block (struct lt
>      return NULL;
>
>    *datar = data;
> -  return new lto_input_block (data + main_offset, header->main_size);
> +  return new lto_input_block (data + main_offset, header->main_size,
> +                             file_data->mode_table);
>  }
>
>
> --- gcc/tree-streamer-out.c.jj  2015-02-16 22:18:33.222702266 +0100
> +++ gcc/tree-streamer-out.c     2015-02-16 22:19:20.845917758 +0100
> @@ -190,7 +190,7 @@ static void
>  pack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
>  {
>    struct fixed_value fv = TREE_FIXED_CST (expr);
> -  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, fv.mode);
> +  bp_pack_machine_mode (bp, fv.mode);
>    bp_pack_var_len_int (bp, fv.data.low);
>    bp_pack_var_len_int (bp, fv.data.high);
>  }
> @@ -201,7 +201,7 @@ pack_ts_fixed_cst_value_fields (struct b
>  static void
>  pack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
>  {
> -  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, DECL_MODE (expr));
> +  bp_pack_machine_mode (bp, DECL_MODE (expr));
>    bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
>    bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
>    bp_pack_value (bp, DECL_IGNORED_P (expr), 1);
> @@ -325,7 +325,7 @@ pack_ts_function_decl_value_fields (stru
>  static void
>  pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
>  {
> -  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, TYPE_MODE (expr));
> +  bp_pack_machine_mode (bp, TYPE_MODE (expr));
>    bp_pack_value (bp, TYPE_STRING_FLAG (expr), 1);
>    bp_pack_value (bp, TYPE_NO_FORCE_BLK (expr), 1);
>    bp_pack_value (bp, TYPE_NEEDS_CONSTRUCTING (expr), 1);
> --- gcc/real.h.jj       2015-02-16 22:18:33.220702299 +0100
> +++ gcc/real.h  2015-02-16 22:19:20.846917741 +0100
> @@ -155,6 +155,7 @@ struct real_format
>    bool has_signed_zero;
>    bool qnan_msb_set;
>    bool canonical_nan_lsbs_set;
> +  const char *name;
>  };
>
>
> --- gcc/lto-streamer.h.jj       2015-02-16 22:18:33.211702447 +0100
> +++ gcc/lto-streamer.h  2015-02-16 22:19:20.846917741 +0100
> @@ -248,6 +248,7 @@ enum lto_section_type
>    LTO_section_ipcp_transform,
>    LTO_section_ipa_icf,
>    LTO_section_offload_table,
> +  LTO_section_mode_table,
>    LTO_N_SECTION_TYPES          /* Must be last.  */
>  };
>
> @@ -312,12 +313,15 @@ class lto_input_block
>  public:
>    /* Special constructor for the string table, it abuses this to
>       do random access but use the uhwi decoder.  */
> -  lto_input_block (const char *data_, unsigned int p_, unsigned int len_)
> -      : data (data_), p (p_), len (len_) {}
> -  lto_input_block (const char *data_, unsigned int len_)
> -      : data (data_), p (0), len (len_) {}
> +  lto_input_block (const char *data_, unsigned int p_, unsigned int len_,
> +                  const unsigned char *mode_table_)
> +      : data (data_), mode_table (mode_table_), p (p_), len (len_) {}
> +  lto_input_block (const char *data_, unsigned int len_,
> +                  const unsigned char *mode_table_)
> +      : data (data_), mode_table (mode_table_), p (0), len (len_) {}
>
>    const char *data;
> +  const unsigned char *mode_table;
>    unsigned int p;
>    unsigned int len;
>  };
> @@ -527,6 +531,9 @@ struct GTY(()) lto_file_decl_data
>
>    /* Map assigning declarations their resolutions.  */
>    hash_map<tree, ld_plugin_symbol_resolution> * GTY((skip)) resolution_map;
> +
> +  /* Mode translation table.  */
> +  const unsigned char *mode_table;
>  };
>
>  typedef struct lto_file_decl_data *lto_file_decl_data_ptr;
> @@ -775,6 +782,7 @@ extern void lto_input_variable_construct
>  extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
>                                               const char *);
>  extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
> +extern void lto_input_mode_table (struct lto_file_decl_data *);
>  extern struct data_in *lto_data_in_create (struct lto_file_decl_data *,
>                                     const char *, unsigned,
>                                     vec<ld_plugin_symbol_resolution_t> );
> @@ -807,6 +815,7 @@ void lto_output_decl_state_refs (struct
>                                  struct lto_output_stream *,
>                                  struct lto_out_decl_state *);
>  void lto_output_location (struct output_block *, struct bitpack_d *, location_t);
> +void lto_output_init_mode_table (void);
>
>
>  /* In lto-cgraph.c  */
> --- gcc/ipa-prop.c.jj   2015-02-16 22:18:33.219702315 +0100
> +++ gcc/ipa-prop.c      2015-02-16 22:19:20.848917709 +0100
> @@ -4868,7 +4868,7 @@ ipa_prop_read_section (struct lto_file_d
>    unsigned int count;
>
>    lto_input_block ib_main ((const char *) data + main_offset,
> -                          header->main_size);
> +                          header->main_size, file_data->mode_table);
>
>    data_in =
>      lto_data_in_create (file_data, (const char *) data + string_offset,
> @@ -5089,7 +5089,7 @@ read_replacements_section (struct lto_fi
>    unsigned int count;
>
>    lto_input_block ib_main ((const char *) data + main_offset,
> -                          header->main_size);
> +                          header->main_size, file_data->mode_table);
>
>    data_in = lto_data_in_create (file_data, (const char *) data + string_offset,
>                                 header->string_size, vNULL);
> --- gcc/data-streamer-in.c.jj   2015-02-16 22:18:33.224702233 +0100
> +++ gcc/data-streamer-in.c      2015-02-16 22:19:20.848917709 +0100
> @@ -70,7 +70,7 @@ string_for_index (struct data_in *data_i
>      }
>
>    /* Get the string stored at location LOC in DATA_IN->STRINGS.  */
> -  lto_input_block str_tab (data_in->strings, loc - 1, data_in->strings_len);
> +  lto_input_block str_tab (data_in->strings, loc - 1, data_in->strings_len, NULL);
>    len = streamer_read_uhwi (&str_tab);
>    *rlen = len;
>
> --- gcc/tree-streamer-in.c.jj   2015-02-16 22:18:33.220702299 +0100
> +++ gcc/tree-streamer-in.c      2015-02-16 22:19:20.849917692 +0100
> @@ -224,7 +224,7 @@ static void
>  unpack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
>  {
>    FIXED_VALUE_TYPE *fp = ggc_alloc<fixed_value> ();
> -  fp->mode = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
> +  fp->mode = bp_unpack_machine_mode (bp);
>    fp->data.low = bp_unpack_var_len_int (bp);
>    fp->data.high = bp_unpack_var_len_int (bp);
>    TREE_FIXED_CST_PTR (expr) = fp;
> @@ -236,7 +236,7 @@ unpack_ts_fixed_cst_value_fields (struct
>  static void
>  unpack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
>  {
> -  DECL_MODE (expr) = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
> +  DECL_MODE (expr) = bp_unpack_machine_mode (bp);
>    DECL_NONLOCAL (expr) = (unsigned) bp_unpack_value (bp, 1);
>    DECL_VIRTUAL_P (expr) = (unsigned) bp_unpack_value (bp, 1);
>    DECL_IGNORED_P (expr) = (unsigned) bp_unpack_value (bp, 1);
> @@ -373,7 +373,7 @@ unpack_ts_type_common_value_fields (stru
>  {
>    machine_mode mode;
>
> -  mode = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
> +  mode = bp_unpack_machine_mode (bp);
>    SET_TYPE_MODE (expr, mode);
>    TYPE_STRING_FLAG (expr) = (unsigned) bp_unpack_value (bp, 1);
>    TYPE_NO_FORCE_BLK (expr) = (unsigned) bp_unpack_value (bp, 1);
> --- gcc/ipa-inline-analysis.c.jj        2015-02-16 22:18:33.223702249 +0100
> +++ gcc/ipa-inline-analysis.c   2015-02-16 22:19:20.850917676 +0100
> @@ -4190,7 +4190,8 @@ inline_read_section (struct lto_file_dec
>    unsigned int i, count2, j;
>    unsigned int f_count;
>
> -  lto_input_block ib ((const char *) data + main_offset, header->main_size);
> +  lto_input_block ib ((const char *) data + main_offset, header->main_size,
> +                     file_data->mode_table);
>
>    data_in =
>      lto_data_in_create (file_data, (const char *) data + string_offset,
> --- gcc/ipa-icf.c.jj    2015-02-16 22:18:33.222702266 +0100
> +++ gcc/ipa-icf.c       2015-02-16 22:19:20.851917659 +0100
> @@ -1500,7 +1500,7 @@ sem_item_optimizer::read_section (lto_fi
>    unsigned int count;
>
>    lto_input_block ib_main ((const char *) data + main_offset, 0,
> -                          header->main_size);
> +                          header->main_size, file_data->mode_table);
>
>    data_in =
>      lto_data_in_create (file_data, (const char *) data + string_offset,
> --- gcc/real.c.jj       2015-02-16 22:18:33.220702299 +0100
> +++ gcc/real.c  2015-02-16 22:19:20.853917626 +0100
> @@ -3031,7 +3031,8 @@ const struct real_format ieee_single_for
>      true,
>      true,
>      true,
> -    false
> +    false,
> +    "ieee_single"
>    };
>
>  const struct real_format mips_single_format =
> @@ -3052,7 +3053,8 @@ const struct real_format mips_single_for
>      true,
>      true,
>      false,
> -    true
> +    true,
> +    "mips_single"
>    };
>
>  const struct real_format motorola_single_format =
> @@ -3073,7 +3075,8 @@ const struct real_format motorola_single
>      true,
>      true,
>      true,
> -    true
> +    true,
> +    "motorola_single"
>    };
>
>  /*  SPU Single Precision (Extended-Range Mode) format is the same as IEEE
> @@ -3105,7 +3108,8 @@ const struct real_format spu_single_form
>      true,
>      true,
>      false,
> -    false
> +    false,
> +    "spu_single"
>    };
>
>  /* IEEE double-precision format.  */
> @@ -3314,7 +3318,8 @@ const struct real_format ieee_double_for
>      true,
>      true,
>      true,
> -    false
> +    false,
> +    "ieee_double"
>    };
>
>  const struct real_format mips_double_format =
> @@ -3335,7 +3340,8 @@ const struct real_format mips_double_for
>      true,
>      true,
>      false,
> -    true
> +    true,
> +    "mips_double"
>    };
>
>  const struct real_format motorola_double_format =
> @@ -3356,7 +3362,8 @@ const struct real_format motorola_double
>      true,
>      true,
>      true,
> -    true
> +    true,
> +    "motorola_double"
>    };
>
>  /* IEEE extended real format.  This comes in three flavors: Intel's as
> @@ -3700,7 +3707,8 @@ const struct real_format ieee_extended_m
>      true,
>      true,
>      true,
> -    true
> +    true,
> +    "ieee_extended_motorola"
>    };
>
>  const struct real_format ieee_extended_intel_96_format =
> @@ -3721,7 +3729,8 @@ const struct real_format ieee_extended_i
>      true,
>      true,
>      true,
> -    false
> +    false,
> +    "ieee_extended_intel_96"
>    };
>
>  const struct real_format ieee_extended_intel_128_format =
> @@ -3742,7 +3751,8 @@ const struct real_format ieee_extended_i
>      true,
>      true,
>      true,
> -    false
> +    false,
> +    "ieee_extended_intel_128"
>    };
>
>  /* The following caters to i386 systems that set the rounding precision
> @@ -3765,7 +3775,8 @@ const struct real_format ieee_extended_i
>      true,
>      true,
>      true,
> -    false
> +    false,
> +    "ieee_extended_intel_96_round_53"
>    };
>
>  /* IBM 128-bit extended precision format: a pair of IEEE double precision
> @@ -3853,7 +3864,8 @@ const struct real_format ibm_extended_fo
>      true,
>      true,
>      true,
> -    false
> +    false,
> +    "ibm_extended"
>    };
>
>  const struct real_format mips_extended_format =
> @@ -3874,7 +3886,8 @@ const struct real_format mips_extended_f
>      true,
>      true,
>      false,
> -    true
> +    true,
> +    "mips_extended"
>    };
>
>
> @@ -4137,7 +4150,8 @@ const struct real_format ieee_quad_forma
>      true,
>      true,
>      true,
> -    false
> +    false,
> +    "ieee_quad"
>    };
>
>  const struct real_format mips_quad_format =
> @@ -4158,7 +4172,8 @@ const struct real_format mips_quad_forma
>      true,
>      true,
>      false,
> -    true
> +    true,
> +    "mips_quad"
>    };
>
>  /* Descriptions of VAX floating point formats can be found beginning at
> @@ -4458,7 +4473,8 @@ const struct real_format vax_f_format =
>      false,
>      false,
>      false,
> -    false
> +    false,
> +    "vax_f"
>    };
>
>  const struct real_format vax_d_format =
> @@ -4479,7 +4495,8 @@ const struct real_format vax_d_format =
>      false,
>      false,
>      false,
> -    false
> +    false,
> +    "vax_d"
>    };
>
>  const struct real_format vax_g_format =
> @@ -4500,7 +4517,8 @@ const struct real_format vax_g_format =
>      false,
>      false,
>      false,
> -    false
> +    false,
> +    "vax_g"
>    };
>
>  /* Encode real R into a single precision DFP value in BUF.  */
> @@ -4576,7 +4594,8 @@ const struct real_format decimal_single_
>      true,
>      true,
>      true,
> -    false
> +    false,
> +    "decimal_single"
>    };
>
>  /* Double precision decimal floating point (IEEE 754). */
> @@ -4598,7 +4617,8 @@ const struct real_format decimal_double_
>      true,
>      true,
>      true,
> -    false
> +    false,
> +    "decimal_double"
>    };
>
>  /* Quad precision decimal floating point (IEEE 754). */
> @@ -4620,7 +4640,8 @@ const struct real_format decimal_quad_fo
>      true,
>      true,
>      true,
> -    false
> +    false,
> +    "decimal_quad"
>    };
>
>  /* Encode half-precision floats.  This routine is used both for the IEEE
> @@ -4757,7 +4778,8 @@ const struct real_format ieee_half_forma
>      true,
>      true,
>      true,
> -    false
> +    false,
> +    "ieee_half"
>    };
>
>  /* ARM's alternative half-precision format, similar to IEEE but with
> @@ -4781,7 +4803,8 @@ const struct real_format arm_half_format
>      true,
>      true,
>      false,
> -    false
> +    false,
> +    "arm_half"
>    };
>
>  /* A synthetic "format" for internal arithmetic.  It's the size of the
> @@ -4826,7 +4849,8 @@ const struct real_format real_internal_f
>      false,
>      true,
>      true,
> -    false
> +    false,
> +    "real_internal"
>    };
>
>  /* Calculate X raised to the integer exponent N in mode MODE and store
> --- gcc/tree-streamer.c.jj      2015-02-16 22:18:33.221702282 +0100
> +++ gcc/tree-streamer.c 2015-02-16 22:19:20.853917626 +0100
> @@ -53,6 +53,14 @@ along with GCC; see the file COPYING3.
>  #include "cgraph.h"
>  #include "tree-streamer.h"
>
> +/* Table indexed by machine_mode, used for two different purposes.
> +   During streaming out, we record a non-zero value for every mode
> +   that was streamed out.
> +   During streaming in, we translate the on-disk mode using this
> +   table.  For normal LTO it is the identity; for ACCEL_COMPILER it
> +   depends on the mode_table content.  */
> +unsigned char streamer_mode_table[1 << 8];
> +
>  /* Check that all the TS_* structures handled by the streamer_write_* and
>     streamer_read_* routines are exactly ALL the structures defined in
>     treestruct.def.  */
> --- gcc/lto/lto.c.jj    2015-02-16 22:18:33.221702282 +0100
> +++ gcc/lto/lto.c       2015-02-16 22:35:56.213523202 +0100
> @@ -85,6 +85,8 @@ static int lto_parallelism;
>
>  static GTY(()) tree first_personality_decl;
>
> +static GTY(()) const unsigned char *lto_mode_identity_table;
> +
>  /* Returns a hash code for P.  */
>
>  static hashval_t
> @@ -1877,7 +1879,7 @@ lto_read_decls (struct lto_file_decl_dat
>    uint32_t num_decl_states;
>
>    lto_input_block ib_main ((const char *) data + main_offset,
> -                          header->main_size);
> +                          header->main_size, decl_data->mode_table);
>
>    data_in = lto_data_in_create (decl_data, (const char *) data + string_offset,
>                                 header->string_size, resolutions);
> @@ -2219,6 +2221,11 @@ lto_file_finalize (struct lto_file_decl_
>
>    file_data->renaming_hash_table = lto_create_renaming_table ();
>    file_data->file_name = file->filename;
> +#ifdef ACCEL_COMPILER
> +  lto_input_mode_table (file_data);
> +#else
> +  file_data->mode_table = lto_mode_identity_table;
> +#endif
>    data = lto_get_section_data (file_data, LTO_section_decls, NULL, &len);
>    if (data == NULL)
>      {
> @@ -3394,6 +3401,13 @@ lto_init (void)
>    memset (&lto_stats, 0, sizeof (lto_stats));
>    bitmap_obstack_initialize (NULL);
>    gimple_register_cfg_hooks ();
> +#ifndef ACCEL_COMPILER
> +  unsigned char *table
> +    = ggc_vec_alloc<unsigned char> (MAX_MACHINE_MODE);
> +  for (int m = 0; m < MAX_MACHINE_MODE; m++)
> +    table[m] = m;
> +  lto_mode_identity_table = table;
> +#endif
>  }
>
>
> --- gcc/lto-cgraph.c.jj 2015-02-16 22:18:33.211702447 +0100
> +++ gcc/lto-cgraph.c    2015-02-16 22:19:20.855917593 +0100
> @@ -2113,7 +2113,7 @@ input_cgraph_opt_section (struct lto_fil
>    unsigned int count;
>
>    lto_input_block ib_main ((const char *) data + main_offset,
> -                          header->main_size);
> +                          header->main_size, file_data->mode_table);
>
>    data_in =
>      lto_data_in_create (file_data, (const char *) data + string_offset,
> --- gcc/lto-streamer-in.c.jj    2015-02-16 22:18:33.204702562 +0100
> +++ gcc/lto-streamer-in.c       2015-02-16 22:26:53.355464202 +0100
> @@ -1116,10 +1116,12 @@ lto_read_body_or_constructor (struct lto
>
>        /* Set up the struct function.  */
>        from = data_in->reader_cache->nodes.length ();
> -      lto_input_block ib_main (data + main_offset, header->main_size);
> +      lto_input_block ib_main (data + main_offset, header->main_size,
> +                              file_data->mode_table);
>        if (TREE_CODE (node->decl) == FUNCTION_DECL)
>         {
> -         lto_input_block ib_cfg (data + cfg_offset, header->cfg_size);
> +         lto_input_block ib_cfg (data + cfg_offset, header->cfg_size,
> +                                 file_data->mode_table);
>           input_function (fn_decl, data_in, &ib_main, &ib_cfg);
>         }
>        else
> @@ -1384,7 +1386,8 @@ lto_input_toplevel_asms (struct lto_file
>
>    string_offset = sizeof (*header) + header->main_size;
>
> -  lto_input_block ib (data + sizeof (*header), header->main_size);
> +  lto_input_block ib (data + sizeof (*header), header->main_size,
> +                     file_data->mode_table);
>
>    data_in = lto_data_in_create (file_data, data + string_offset,
>                               header->string_size, vNULL);
> @@ -1403,6 +1406,123 @@ lto_input_toplevel_asms (struct lto_file
>  }
>
>
> +/* Input mode table.  */
> +
> +void
> +lto_input_mode_table (struct lto_file_decl_data *file_data)
> +{
> +  size_t len;
> +  const char *data = lto_get_section_data (file_data, LTO_section_mode_table,
> +                                          NULL, &len);
> +  if (! data)
> +    {
> +      internal_error ("cannot read LTO mode table from %s",
> +                     file_data->file_name);
> +      return;
> +    }
> +
> +  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
> +  file_data->mode_table = table;
> +  const struct lto_simple_header_with_strings *header
> +    = (const struct lto_simple_header_with_strings *) data;
> +  int string_offset;
> +  struct data_in *data_in;
> +  string_offset = sizeof (*header) + header->main_size;
> +
> +  lto_input_block ib (data + sizeof (*header), header->main_size, NULL);
> +  data_in = lto_data_in_create (file_data, data + string_offset,
> +                               header->string_size, vNULL);
> +  bitpack_d bp = streamer_read_bitpack (&ib);
> +
> +  table[VOIDmode] = VOIDmode;
> +  table[BLKmode] = BLKmode;
> +  unsigned int m;
> +  while ((m = bp_unpack_value (&bp, 8)) != VOIDmode)
> +    {
> +      enum mode_class mclass
> +       = bp_unpack_enum (&bp, mode_class, MAX_MODE_CLASS);
> +      unsigned int size = bp_unpack_value (&bp, 8);
> +      unsigned int prec = bp_unpack_value (&bp, 16);
> +      machine_mode inner = (machine_mode) table[bp_unpack_value (&bp, 8)];
> +      unsigned int nunits = bp_unpack_value (&bp, 8);
> +      unsigned int ibit = 0, fbit = 0;
> +      unsigned int real_fmt_len = 0;
> +      const char *real_fmt_name = NULL;
> +      switch (mclass)
> +       {
> +       case MODE_FRACT:
> +       case MODE_UFRACT:
> +       case MODE_ACCUM:
> +       case MODE_UACCUM:
> +         ibit = bp_unpack_value (&bp, 8);
> +         fbit = bp_unpack_value (&bp, 8);
> +         break;
> +       case MODE_FLOAT:
> +       case MODE_DECIMAL_FLOAT:
> +         real_fmt_name = bp_unpack_indexed_string (data_in, &bp,
> +                                                   &real_fmt_len);
> +         break;
> +       default:
> +         break;
> +       }
> +      /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
> +        if not found, fallback to all modes.  */
> +      int pass;
> +      for (pass = 0; pass < 2; pass++)
> +       for (machine_mode mr = pass ? VOIDmode
> +                                   : GET_CLASS_NARROWEST_MODE (mclass);
> +            pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> +            pass ? mr = (machine_mode) (mr + 1)
> +                 : mr = GET_MODE_WIDER_MODE (mr))
> +         if (GET_MODE_CLASS (mr) != mclass
> +             || GET_MODE_SIZE (mr) != size
> +             || GET_MODE_PRECISION (mr) != prec
> +             || GET_MODE_INNER (mr) != inner
> +             || GET_MODE_IBIT (mr) != ibit
> +             || GET_MODE_FBIT (mr) != fbit
> +             || GET_MODE_NUNITS (mr) != nunits)
> +           continue;
> +         else if ((mclass == MODE_FLOAT || mclass == MODE_DECIMAL_FLOAT)
> +                  && strcmp (REAL_MODE_FORMAT (mr)->name, real_fmt_name) != 0)
> +           continue;
> +         else
> +           {
> +             table[m] = mr;
> +             pass = 2;
> +             break;
> +           }
> +      unsigned int mname_len;
> +      const char *mname = bp_unpack_indexed_string (data_in, &bp, &mname_len);
> +      if (pass == 2)
> +       {
> +         switch (mclass)
> +           {
> +           case MODE_VECTOR_INT:
> +           case MODE_VECTOR_FLOAT:
> +           case MODE_VECTOR_FRACT:
> +           case MODE_VECTOR_UFRACT:
> +           case MODE_VECTOR_ACCUM:
> +           case MODE_VECTOR_UACCUM:
> +             /* For unsupported vector modes just use BLKmode,
> +                if the scalar mode is supported.  */
> +             if (inner != VOIDmode)
> +               {
> +                 table[m] = BLKmode;
> +                 break;
> +               }
> +             /* FALLTHRU */
> +           default:
> +             error ("unsupported mode %s\n", mname);
> +             break;
> +           }
> +       }
> +    }
> +  lto_data_in_delete (data_in);
> +
> +  lto_free_section_data (file_data, LTO_section_mode_table, NULL, data, len);
> +}
> +
> +
>  /* Initialization for the LTO reader.  */
>
>  void
>
>
>         Jakub


* Re: nvptx offloading patches [3/n], RFD
  2015-02-16 21:08     ` Jakub Jelinek
  2015-02-16 21:35       ` Richard Biener
@ 2015-02-17 13:32       ` Ilya Verbin
  2015-02-17 15:39         ` Jakub Jelinek
  1 sibling, 1 reply; 42+ messages in thread
From: Ilya Verbin @ 2015-02-17 13:32 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Jan Hubicka, Bernd Schmidt, Thomas Schwinge, gcc-patches

On Mon, Feb 16, 2015 at 22:08:12 +0100, Jakub Jelinek wrote:
> Anyway, the question is if for offloading we use wpa stage at all these days
> or not at all, if there is a way for ACCEL_COMPILER to differentiate
> somehow between LTO sections written by the host compiler and LTO sections
> perhaps created by the offloading compiler when trying to LTO the thing (if
> it does it at all).  Because obviously the host compiler written LTO
> (in .gnu.offload_lto_*) would need the machine modes translated, while
> LTO streamed already by the ACCEL_COMPILER (if any) generally would already
> use the offloading target machine modes and therefore should be treated as
> native lto (.gnu.lto_*). 

Currently both intelmic and nvptx offloading compilers are executed in
non-partitioned LTO mode.  I don't know whether we need to support WHOPR
(WPA+LTRANS) mode.  Maybe it would be useful for programs with a large
number of target regions?  But I think this is not needed for GCC 5.
> 
> If we don't try to write .gnu.offload_lto_* again, I think following patch
> with additionally not calling lto_write_mode_table for !lto_stream_offload_p
> and not calling lto_input_mode_table for !ACCEL_COMPILER - instead build
> a single shared identity table - might actually work.
> 
> Thoughts on this?

Probably the ACCEL_COMPILER in WPA mode (flag_wpa) can read .gnu.offload_lto_*
sections and produce temporary partitions with .gnu.lto_* sections.  And the
ACCEL_COMPILER in LTRANS mode (flag_ltrans) will read .gnu.lto_* sections?
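
For illustration only, the distinction discussed above can be sketched
outside of GCC: sections written by the host compiler go through a table
built by matching mode properties against the target's modes, while
sections already written by the accelerator compiler could share an
identity table.  Everything named sketch_* or SK_* below is hypothetical
and not part of the patch; the real code works on machine_mode and also
compares ibit/fbit and the float format name.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for mode classes.  */
enum sketch_class { SK_CLASS_INT, SK_CLASS_FLOAT, SK_CLASS_VECTOR_INT };

/* Index 0 plays the role of VOIDmode, index 1 the role of BLKmode.  */
#define SK_VOID 0
#define SK_BLK 1

struct sketch_mode
{
  const char *name;
  enum sketch_class mclass;
  unsigned size;        /* Size in bytes.  */
  unsigned precision;   /* Precision in bits.  */
  unsigned nunits;      /* Number of elements.  */
  unsigned char inner;  /* Index of the element mode in the same list.  */
};

/* Fill TABLE so that TABLE[h] is the index of the TARGET mode matching
   host mode H.  Unmatched vector modes whose element mode is supported
   fall back to SK_BLK; anything else unmatched stays SK_VOID.  Assumes
   element modes precede the vector modes that use them, and fewer than
   256 modes.  Returns the number of unsupported modes.  */
int
sketch_build_mode_table (const struct sketch_mode *host, size_t n_host,
                         const struct sketch_mode *target, size_t n_target,
                         unsigned char *table)
{
  int unsupported = 0;
  table[SK_VOID] = SK_VOID;
  table[SK_BLK] = SK_BLK;
  for (size_t h = 2; h < n_host; h++)
    {
      /* Translate the element mode first, as the target list may be
         ordered differently from the host list.  */
      unsigned char inner = table[host[h].inner];
      table[h] = SK_VOID;
      for (size_t t = 2; t < n_target; t++)
        if (target[t].mclass == host[h].mclass
            && target[t].size == host[h].size
            && target[t].precision == host[h].precision
            && target[t].nunits == host[h].nunits
            && target[t].inner == inner)
          {
            table[h] = (unsigned char) t;
            break;
          }
      if (table[h] != SK_VOID)
        continue;
      if (host[h].mclass == SK_CLASS_VECTOR_INT && inner != SK_VOID)
        table[h] = SK_BLK;
      else
        unsupported++;
    }
  return unsupported;
}
```

Passing the same list as both host and target yields the identity table,
which is what sections that never left the accelerator compiler would
need.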

  -- Ilya


* Re: nvptx offloading patches [3/n], RFD
  2015-02-17 13:32       ` Ilya Verbin
@ 2015-02-17 15:39         ` Jakub Jelinek
  2015-02-17 16:21           ` Joseph Myers
  0 siblings, 1 reply; 42+ messages in thread
From: Jakub Jelinek @ 2015-02-17 15:39 UTC (permalink / raw)
  To: Ilya Verbin, Bernd Schmidt, Thomas Schwinge
  Cc: Richard Biener, Jan Hubicka, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2574 bytes --]

On Tue, Feb 17, 2015 at 04:32:06PM +0300, Ilya Verbin wrote:
> > If we don't try to write .gnu.offload_lto_* again, I think following patch
> > with additionally not calling lto_write_mode_table for !lto_stream_offload_p
> > and not calling lto_input_mode_table for !ACCEL_COMPILER - instead build
> > a single shared identity table - might actually work.
> > 
> > Thoughts on this?
> 
> Probably the ACCEL_COMPILER in WPA mode (flag_wpa) can read .gnu.offload_lto_*
> sections and produce temporary partitions with .gnu.lto_* sections.  And the
> ACCEL_COMPILER in LTRANS mode (flag_ltrans) will read .gnu.lto_* sections?

FYI, I have tested my mode_table patch with the intelmic emul offloading and
saw no regressions.

Then I wanted to try nvptx offloading, but ran into various issues.

I had two patches from Bernd applied (already approved, why haven't they
been installed?), had to tweak the first one so that it applies, then my
mode_table patch.
I've built nvptx-tools and configured:
../configure --target=nvptx-none --enable-as-accelerator-for=x86_64-pc-linux-gnu --with-build-time-tools=/usr/src/gcc/objnvptxinst/usr/local/nvptx-none/bin --disable-sjlj-exceptions --enable-newlib-io-long-long
make -j16
This failed miserably, because of missing mkoffload.o dependencies, patch attached
(ok for trunk?; it does what intelmic mkoffload.o does; I've tried to add
| $(generated_files) dependency instead, but that somehow didn't work).

The second attempt with that fixed died because for some reason
nvptx-none-as wants to verify its output using ptxas by default.  Can that
be made configurable?  E.g. for building nvptx offloading in distros, I
believe it will be better if everything can be built without the
proprietary CUDA bits, which would then only be needed when actually
running the offloaded code.  Could the verification be done by default only
if ptxas is found in $PATH, and skipped otherwise?
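
A minimal sketch of such a check, assuming a POSIX host; this is not how
nvptx-tools is implemented, and program_in_path is a hypothetical helper
name.  It walks the colon-separated PATH entries and tests each candidate
with access().  As a simplification, empty PATH entries (meaning the
current directory) are skipped, and a directory named like the program
would also count as a hit.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return true if something executable named PROG exists in one of the
   colon-separated directories listed in PATH_VAR.  */
bool
program_in_path (const char *prog, const char *path_var)
{
  if (path_var == NULL || *path_var == '\0')
    return false;

  /* strtok_r modifies its argument, so work on a copy.  */
  char *dirs = strdup (path_var);
  if (dirs == NULL)
    return false;

  bool found = false;
  char *save = NULL;
  for (char *dir = strtok_r (dirs, ":", &save);
       dir != NULL && !found;
       dir = strtok_r (NULL, ":", &save))
    {
      char candidate[4096];
      int n = snprintf (candidate, sizeof candidate, "%s/%s", dir, prog);
      /* Ignore entries that would not fit into the buffer.  */
      if (n > 0 && (size_t) n < sizeof candidate
          && access (candidate, X_OK) == 0)
        found = true;
    }
  free (dirs);
  return found;
}
```

The assembler could then default to verification only when
program_in_path ("ptxas", getenv ("PATH")) is true, with an explicit
option to force it on or off.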

Third attempt failed with:
../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such file or directory
compilation terminated.
../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
make[2]: *** [realloc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it
would be built in-tree, is that not the case (at least wiki/Offloading
mentions that).  Or is it just that libgcc can't really have dependencies on
newlib headers as newlib is built after libgcc?

	Jakub

[-- Attachment #2: bernds1 --]
[-- Type: text/plain, Size: 8549 bytes --]

    	* cgraph.h (clone_function_name_1): Declare.
    	* cgraphclones.c (clone_function_name_1): New function.
    	(clone_function_name): Use it.
    	* lto-partition.c: Include "stringpool.h".
    	(must_not_rename, maybe_rewrite_identifier,
    	validize_symbol_for_target): New static functions.
    	(privatize_symbol_name): Use must_not_rename.
    	(promote_symbol): Call validize_symbol_for_target.
    	(lto_promote_cross_file_statics): Likewise.
    	(lto_promote_statics_nonwpa): Likewise.

--- gcc/cgraph.h.jj	2015-02-16 11:19:03.474984223 +0100
+++ gcc/cgraph.h	2015-02-17 13:54:00.413964133 +0100
@@ -2206,6 +2206,7 @@ basic_block init_lowered_empty_function
 
 /* In cgraphclones.c  */
 
+tree clone_function_name_1 (const char *, const char *);
 tree clone_function_name (tree decl, const char *);
 
 void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
--- gcc/cgraphclones.c.jj	2015-02-17 10:07:53.208582797 +0100
+++ gcc/cgraphclones.c	2015-02-17 13:54:00.413964133 +0100
@@ -533,19 +533,19 @@ cgraph_node::create_clone (tree decl, gc
   return new_node;
 }
 
-/* Return a new assembler name for a clone of DECL with SUFFIX.  */
-
 static GTY(()) unsigned int clone_fn_id_num;
 
+/* Return a new assembler name for a clone with SUFFIX of a decl named
+   NAME.  */
+
 tree
-clone_function_name (tree decl, const char *suffix)
+clone_function_name_1 (const char *name, const char *suffix)
 {
-  tree name = DECL_ASSEMBLER_NAME (decl);
-  size_t len = IDENTIFIER_LENGTH (name);
+  size_t len = strlen (name);
   char *tmp_name, *prefix;
 
   prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
-  memcpy (prefix, IDENTIFIER_POINTER (name), len);
+  memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
 #ifndef NO_DOT_IN_LABEL
   prefix[len] = '.';
@@ -558,6 +558,16 @@ clone_function_name (tree decl, const ch
   return get_identifier (tmp_name);
 }
 
+/* Return a new assembler name for a clone of DECL with SUFFIX.  */
+
+tree
+clone_function_name (tree decl, const char *suffix)
+{
+  tree name = DECL_ASSEMBLER_NAME (decl);
+  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
+}
+
+
 /* Create callgraph node clone with new declaration.  The actual body will
    be copied later at compilation stage.
 
--- gcc/lto/lto-partition.c.jj	2015-01-15 14:05:08.706092596 +0100
+++ gcc/lto/lto-partition.c	2015-02-17 14:01:13.182718693 +0100
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.
 #include "ipa-inline.h"
 #include "ipa-utils.h"
 #include "lto-partition.h"
+#include "stringpool.h"
 
 vec<ltrans_partition> ltrans_partitions;
 
@@ -783,29 +784,11 @@ lto_balanced_map (int n_lto_partitions)
   free (order);
 }
 
-/* Mangle NODE symbol name into a local name.  
-   This is necessary to do
-   1) if two or more static vars of same assembler name
-      are merged into single ltrans unit.
-   2) if prevoiusly static var was promoted hidden to avoid possible conflict
-      with symbols defined out of the LTO world.
-*/
-
+/* Return true if we must not change the name of the NODE.  The name as
+   extracted from the corresponding decl should be passed in NAME.  */
 static bool
-privatize_symbol_name (symtab_node *node)
+must_not_rename (symtab_node *node, const char *name)
 {
-  tree decl = node->decl;
-  cgraph_node *cnode = dyn_cast <cgraph_node *> (node);
-  const char *name;
-
-  /* If we want to privatize instrumentation clone
-     then we need to change original function name
-     which is used via transparent alias chain.  */
-  if (cnode && cnode->instrumentation_clone)
-    decl = cnode->orig_decl;
-
-  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
-
   /* Our renaming machinery do not handle more than one change of assembler name.
      We should not need more than one anyway.  */
   if (node->lto_file_data
@@ -813,9 +796,9 @@ privatize_symbol_name (symtab_node *node
     {
       if (symtab->dump_file)
 	fprintf (symtab->dump_file,
-		"Not privatizing symbol name: %s. It privatized already.\n",
-		name);
-      return false;
+		 "Not privatizing symbol name: %s. It privatized already.\n",
+		 name);
+      return true;
     }
   /* Avoid mangling of already mangled clones. 
      ???  should have a flag whether a symbol has a 'private' name already,
@@ -825,18 +808,108 @@ privatize_symbol_name (symtab_node *node
     {
       if (symtab->dump_file)
 	fprintf (symtab->dump_file,
-		"Not privatizing symbol name: %s. Has unique name.\n",
-		name);
-      return false;
+		 "Not privatizing symbol name: %s. Has unique name.\n",
+		 name);
+      return true;
+    }
+  return false;
+}
+
+/* If we are an offload compiler, we may have to rewrite symbols to be
+   valid on this target.  Return either PTR or a modified version of it.  */
+
+static const char *
+maybe_rewrite_identifier (const char *ptr)
+{
+#if defined ACCEL_COMPILER && (defined NO_DOT_IN_LABEL || defined NO_DOLLAR_IN_LABEL)
+#ifndef NO_DOT_IN_LABEL
+  char valid = '.';
+  const char reject[] = "$";
+#elif !defined NO_DOLLAR_IN_LABEL
+  char valid = '$';
+  const char reject[] = ".";
+#else
+  char valid = '_';
+  const char reject[] = ".$";
+#endif
+
+  char *copy = NULL;
+  const char *match = ptr;
+  for (;;)
+    {
+      size_t off = strcspn (match, reject);
+      if (match[off] == '\0')
+	break;
+      if (copy == NULL)
+	{
+	  copy = xstrdup (ptr);
+	  match = copy;
+	}
+      copy[off] = valid;
+    }
+  return match;
+#else
+  return ptr;
+#endif
+}
+
+/* Ensure that the symbol in NODE is valid for the target, and if not,
+   rewrite it.  */
+
+static void
+validize_symbol_for_target (symtab_node *node)
+{
+  tree decl = node->decl;
+  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  if (must_not_rename (node, name))
+    return;
+
+  const char *name2 = maybe_rewrite_identifier (name);
+  if (name2 != name)
+    {
+      symtab->change_decl_assembler_name (decl, get_identifier (name2));
+      if (node->lto_file_data)
+	lto_record_renamed_decl (node->lto_file_data, name,
+				 IDENTIFIER_POINTER
+				 (DECL_ASSEMBLER_NAME (decl)));
     }
+}
+
+/* Mangle NODE symbol name into a local name.  
+   This is necessary to do
+   1) if two or more static vars of same assembler name
+      are merged into single ltrans unit.
+   2) if previously static var was promoted hidden to avoid possible conflict
+      with symbols defined out of the LTO world.  */
+
+static bool
+privatize_symbol_name (symtab_node *node)
+{
+  tree decl = node->decl;
+  cgraph_node *cnode = dyn_cast <cgraph_node *> (node);
+
+  /* If we want to privatize instrumentation clone
+     then we need to change original function name
+     which is used via transparent alias chain.  */
+  if (cnode && cnode->instrumentation_clone)
+    decl = cnode->orig_decl;
+
+  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  if (must_not_rename (node, name))
+    return false;
+
+  name = maybe_rewrite_identifier (name);
   symtab->change_decl_assembler_name (decl,
-				      clone_function_name (decl, "lto_priv"));
+				      clone_function_name_1 (name,
+							     "lto_priv"));
   if (node->lto_file_data)
     lto_record_renamed_decl (node->lto_file_data, name,
 			     IDENTIFIER_POINTER
 			     (DECL_ASSEMBLER_NAME (decl)));
   /* We could change name which is a target of transparent alias
-     chain of instrumented function name.  Fix alias chain if so  .*/
+     chain of instrumented function name.  Fix alias chain if so.  */
   if (cnode)
     {
       tree iname = NULL_TREE;
@@ -868,7 +941,10 @@ promote_symbol (symtab_node *node)
   if (DECL_VISIBILITY (node->decl) == VISIBILITY_HIDDEN
       && DECL_VISIBILITY_SPECIFIED (node->decl)
       && TREE_PUBLIC (node->decl))
-    return;
+    {
+      validize_symbol_for_target (node);
+      return;
+    }
 
   gcc_checking_assert (!TREE_PUBLIC (node->decl)
 		       && !DECL_EXTERNAL (node->decl));
@@ -1007,7 +1083,10 @@ lto_promote_cross_file_statics (void)
 	      /* ... or if we do not partition it. This mean that it will
 		 appear in every partition refernecing it.  */
 	      || node->get_partitioning_class () != SYMBOL_PARTITION)
-	    continue;
+	    {
+	      validize_symbol_for_target (node);
+	      continue;
+	    }
 
           promote_symbol (node);
         }
@@ -1022,5 +1101,8 @@ lto_promote_statics_nonwpa (void)
 {
   symtab_node *node;
   FOR_EACH_SYMBOL (node)
-    rename_statics (NULL, node);
+    {
+      rename_statics (NULL, node);
+      validize_symbol_for_target (node);
+    }
 }

[-- Attachment #3: bernds2 --]
[-- Type: text/plain, Size: 4545 bytes --]

	* tree-streamer-in.c (unpack_ts_decl_common_value_fields,
	unpack_ts_type_common_value_fields): If ACCEL_COMPILER,
	restrict alignments to absolute_biggest_alignment.
	* config/i386/i386.c (TARGET_ABSOLUTE_BIGGEST_ALIGNMENT):
	Define.
	* doc/tm.texi.in (TARGET_ABSOLUTE_BIGGEST_ALIGNMENT): Add.
	* doc/tm.texi: Regenerate.
	* target.def (absolute_biggest_alignment): New DEFHOOKPOD.

Index: gcc/tree-streamer-in.c
===================================================================
--- gcc/tree-streamer-in.c.orig
+++ gcc/tree-streamer-in.c
@@ -217,7 +217,10 @@ unpack_ts_decl_common_value_fields (stru
   DECL_EXTERNAL (expr) = (unsigned) bp_unpack_value (bp, 1);
   DECL_GIMPLE_REG_P (expr) = (unsigned) bp_unpack_value (bp, 1);
   DECL_ALIGN (expr) = (unsigned) bp_unpack_var_len_unsigned (bp);
-
+#ifdef ACCEL_COMPILER
+  if (DECL_ALIGN (expr) > targetm.absolute_biggest_alignment)
+    DECL_ALIGN (expr) = targetm.absolute_biggest_alignment;
+#endif
   if (TREE_CODE (expr) == LABEL_DECL)
     {
       EH_LANDING_PAD_NR (expr) = (int) bp_unpack_var_len_unsigned (bp);
@@ -359,6 +362,10 @@ unpack_ts_type_common_value_fields (stru
   TYPE_READONLY (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_PRECISION (expr) = bp_unpack_var_len_unsigned (bp);
   TYPE_ALIGN (expr) = bp_unpack_var_len_unsigned (bp);
+#ifdef ACCEL_COMPILER
+  if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
+    TYPE_ALIGN (expr) = targetm.absolute_biggest_alignment;
+#endif
   TYPE_ALIAS_SET (expr) = bp_unpack_var_len_int (bp);
 }
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c.orig
+++ gcc/config/i386/i386.c
@@ -47623,6 +47623,9 @@ ix86_atomic_assign_expand_fenv (tree *ho
 #undef TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS
 #define TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS true
 
+#undef TARGET_ABSOLUTE_BIGGEST_ALIGNMENT
+#define TARGET_ABSOLUTE_BIGGEST_ALIGNMENT 512
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 \f
 #include "gt-i386.h"
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h.orig
+++ gcc/config/i386/i386.h
@@ -784,7 +784,10 @@ extern const char *host_detect_local_cpu
    rounder than this.
 
    Pentium+ prefers DFmode values to be aligned to 64 bit boundary
-   and Pentium Pro XFmode values at 128 bit boundaries.  */
+   and Pentium Pro XFmode values at 128 bit boundaries.
+
+   When increasing the maximum, also update
+   TARGET_ABSOLUTE_BIGGEST_ALIGNMENT.  */
 
 #define BIGGEST_ALIGNMENT \
   (TARGET_AVX512F ? 512 : (TARGET_AVX ? 256 : 128))
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi.orig
+++ gcc/doc/tm.texi
@@ -1003,6 +1003,12 @@ bits.  Note that this is not the biggest
 just the biggest alignment that, when violated, may cause a fault.
 @end defmac
 
+@deftypevr {Target Hook} HOST_WIDE_INT TARGET_ABSOLUTE_BIGGEST_ALIGNMENT
+If defined, this target hook specifies the absolute biggest alignment
+that a type or variable can have on this machine, otherwise,
+@code{BIGGEST_ALIGNMENT} is used.
+@end deftypevr
+
 @defmac MALLOC_ABI_ALIGNMENT
 Alignment, in bits, a C conformant malloc implementation has to
 provide.  If not defined, the default value is @code{BITS_PER_WORD}.
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in.orig
+++ gcc/doc/tm.texi.in
@@ -957,6 +957,8 @@ bits.  Note that this is not the biggest
 just the biggest alignment that, when violated, may cause a fault.
 @end defmac
 
+@hook TARGET_ABSOLUTE_BIGGEST_ALIGNMENT
+
 @defmac MALLOC_ABI_ALIGNMENT
 Alignment, in bits, a C conformant malloc implementation has to
 provide.  If not defined, the default value is @code{BITS_PER_WORD}.
Index: gcc/target.def
===================================================================
--- gcc/target.def.orig
+++ gcc/target.def
@@ -1760,6 +1760,13 @@ HOOK_VECTOR_END (vectorize)
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
 
+DEFHOOKPOD
+(absolute_biggest_alignment,
+ "If defined, this target hook specifies the absolute biggest alignment\n\
+that a type or variable can have on this machine, otherwise,\n\
+@code{BIGGEST_ALIGNMENT} is used.",
+ HOST_WIDE_INT, BIGGEST_ALIGNMENT)
+
 /* Allow target specific overriding of option settings after options have
   been changed by an attribute or pragma or when it is reset at the
   end of the code affected by an attribute or pragma.  */


[-- Attachment #4: V492 --]
[-- Type: text/plain, Size: 514 bytes --]

2015-02-17  Jakub Jelinek  <jakub@redhat.com>

	* config/nvptx/t-nvptx (mkoffload.o): Compile after insn-modes.h
	is generated.

--- gcc/config/nvptx/t-nvptx.jj	2015-01-28 21:24:56.000000000 +0100
+++ gcc/config/nvptx/t-nvptx	2015-02-17 16:08:39.026676002 +0100
@@ -1,6 +1,6 @@
 CFLAGS-mkoffload.o += $(DRIVER_DEFINES) \
 	-DGCC_INSTALL_NAME=\"$(GCC_INSTALL_NAME)\"
-mkoffload.o: $(srcdir)/config/nvptx/mkoffload.c
+mkoffload.o: $(srcdir)/config/nvptx/mkoffload.c | insn-modes.h
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 


* Re: nvptx offloading patches [3/n], RFD
  2015-02-17 15:39         ` Jakub Jelinek
@ 2015-02-17 16:21           ` Joseph Myers
  2015-02-17 16:40             ` Jakub Jelinek
  0 siblings, 1 reply; 42+ messages in thread
From: Joseph Myers @ 2015-02-17 16:21 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Ilya Verbin, Bernd Schmidt, Thomas Schwinge, Richard Biener,
	Jan Hubicka, gcc-patches

On Tue, 17 Feb 2015, Jakub Jelinek wrote:

> Third attempt failed with:
> ../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such file or directory
> compilation terminated.
> ../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
> make[2]: *** [realloc.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
> make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
> I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it
> would be built in-tree, is that not the case (at least wiki/Offloading
> mentions that).  Or is it just that libgcc can't really have dependencies on
> newlib headers as newlib is built after libgcc?

I've committed this patch to fix this last issue (the header dependence, 
that is; I don't know about the in-tree build).
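
As a standalone illustration of the idea (not the libgcc code itself): a
realloc that needs only the freestanding <stddef.h> plus a compiler
builtin, so that it can be compiled before any libc headers exist.  The
sketch_* names and the fixed pool are hypothetical stand-ins for the real
allocator, and the code assumes a C11 compiler that provides
__builtin_memcpy (GCC or Clang).

```c
#include <stddef.h>

/* Hypothetical fixed pool standing in for the real allocator.  */
static _Alignas (max_align_t) unsigned char sketch_pool[4096];
static size_t sketch_pool_used;

/* Each block is preceded by a header slot that stores its size.  */
#define SKETCH_HEADER sizeof (max_align_t)

void *
sketch_malloc (size_t size)
{
  if (size > sizeof sketch_pool - SKETCH_HEADER)
    return NULL;
  /* Round the payload up so that the next header stays aligned.  */
  size_t payload = (size + SKETCH_HEADER - 1) / SKETCH_HEADER * SKETCH_HEADER;
  size_t total = SKETCH_HEADER + payload;
  if (total > sizeof sketch_pool - sketch_pool_used)
    return NULL;
  unsigned char *base = sketch_pool + sketch_pool_used;
  sketch_pool_used += total;
  __builtin_memcpy (base, &size, sizeof size);
  return base + SKETCH_HEADER;
}

void *
sketch_realloc (void *ptr, size_t newsz)
{
  void *newptr = sketch_malloc (newsz);
  if (newptr == NULL || ptr == NULL)
    return newptr;
  /* Recover the old size from the header in front of PTR.  */
  size_t oldsz;
  __builtin_memcpy (&oldsz, (unsigned char *) ptr - SKETCH_HEADER,
                    sizeof oldsz);
  if (oldsz != 0)
    __builtin_memcpy (newptr, ptr, oldsz > newsz ? newsz : oldsz);
  /* This sketch never releases the old block.  */
  return newptr;
}
```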

2015-02-17  Joseph Myers  <joseph@codesourcery.com>

	* config/nvptx/realloc.c: Include <stddef.h> instead of <stdlib.h>
	and <string.h>.
	(__nvptx_realloc): Call __builtin_memcpy instead of memcpy.

Index: libgcc/config/nvptx/realloc.c
===================================================================
--- libgcc/config/nvptx/realloc.c	(revision 220763)
+++ libgcc/config/nvptx/realloc.c	(working copy)
@@ -21,8 +21,7 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include <stdlib.h>
-#include <string.h>
+#include <stddef.h>
 #include "nvptx-malloc.h"
 
 void *
@@ -44,7 +43,7 @@ __nvptx_realloc (void *ptr, size_t newsz)
       oldsz = *sp;
     }
   if (oldsz != 0)
-    memcpy (newptr, ptr, oldsz > newsz ? newsz : oldsz);
+    __builtin_memcpy (newptr, ptr, oldsz > newsz ? newsz : oldsz);
 
   __nvptx_free (ptr);
   return newptr;

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: nvptx offloading patches [3/n], RFD
  2015-02-17 16:21           ` Joseph Myers
@ 2015-02-17 16:40             ` Jakub Jelinek
  2015-02-18  9:12               ` Thomas Schwinge
  2015-02-19 10:20               ` nvptx offloading patches [3/n], RFD Bernd Schmidt
  0 siblings, 2 replies; 42+ messages in thread
From: Jakub Jelinek @ 2015-02-17 16:40 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Ilya Verbin, Bernd Schmidt, Thomas Schwinge, Richard Biener,
	Jan Hubicka, gcc-patches

On Tue, Feb 17, 2015 at 04:21:06PM +0000, Joseph Myers wrote:
> On Tue, 17 Feb 2015, Jakub Jelinek wrote:
> 
> > Third attempt failed with:
> > ../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such file or directory
> > compilation terminated.
> > ../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
> > make[2]: *** [realloc.o] Error 1
> > make[2]: *** Waiting for unfinished jobs....
> > make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
> > I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it
> > would be built in-tree, is that not the case (at least wiki/Offloading
> > mentions that).  Or is it just that libgcc can't really have dependencies on
> > newlib headers as newlib is built after libgcc?
> 
> I've committed this patch to fix this last issue (the header dependence, 
> that is; I don't know about the in-tree build).

Thanks.  libgcc now builds fine, but the in-tree build fails:
configure:4261: checking for C compiler default output file name
configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem /usr/local/nvptx-none/sys-include    -g -O2   conftest.c  >&5
error opening libc.a
collect2: error: ld returned 1 exit status
very early during in-tree newlib configure.

	Jakub


* Re: nvptx offloading patches [3/n], RFD
  2015-02-16 21:44         ` Jakub Jelinek
  2015-02-17 10:00           ` Richard Biener
@ 2015-02-18  9:05           ` Thomas Schwinge
  1 sibling, 0 replies; 42+ messages in thread
From: Thomas Schwinge @ 2015-02-18  9:05 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener
  Cc: Jan Hubicka, Ilya Verbin, Bernd Schmidt, gcc-patches


[-- Attachment #1.1: Type: text/plain, Size: 835 bytes --]

Hi!

On Mon, 16 Feb 2015 22:08:12 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Feb 09, 2015 at 11:20:00AM +0100, Richard Biener wrote:
> > I think (also communicated that on IRC) we should instead try not streaming
> > machine-modes at all but generating them at stream-in time via layout_type
> > or layout_decl.
> 
> Here is a WIP prototype for being able to stream a machine mode description
> table and streaming it back in.  [...]

Many thanks for that!  (I had modified Bernd's patch to be less
intrusive, see attached, but of course that didn't resolve its design
problem.)

On Mon, 16 Feb 2015 22:43:49 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> [updated patch]

No regressions with
--enable-offload-targets=nvptx-none=[...],x86_64-intelmicemul-linux-gnu=[...].


Grüße,
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: modes-less-intrusive.patch --]
[-- Type: text/x-diff, Size: 3087 bytes --]

commit 97a1ad0d3a96321ded8fad5e3a3cc75b46970bfa
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Fri Feb 13 19:51:09 2015 +0100

    Use the offload host CPU's modes.def when building an offloading compiler: make it less intrusive.

diff --git gcc/config.gcc gcc/config.gcc
index ebf0ee6..265ac0e 100644
--- gcc/config.gcc
+++ gcc/config.gcc
@@ -482,15 +482,15 @@ tilepro*-*-*)
 	;;
 esac
 
-offload_host_cpu_type=${cpu_type}
-if test "x${enable_as_accelerator}" != "xno"
-then
-	offload_host_cpu_type=`echo ${enable_as_accelerator_for} | sed 's/-.*$//'`
-fi
-case ${offload_host_cpu_type} in
-x86_64)
-          offload_host_cpu_type=i386
-	  ;;
+modes_cpu_type=${cpu_type}
+case ${enable_as_accelerator}:${target} in
+yes:nvptx-*-*)
+	modes_cpu_type=`echo ${enable_as_accelerator_for} | sed 's/-.*$//'`
+	case ${modes_cpu_type} in
+	x86_64)
+		modes_cpu_type=i386
+		;;
+	esac
 esac
 
 tm_file=${cpu_type}/${cpu_type}.h
@@ -499,9 +499,9 @@ then
 	tm_p_file=${cpu_type}/${cpu_type}-protos.h
 fi
 extra_modes=
-if test -f ${srcdir}/config/${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
+if test -f ${srcdir}/config/${modes_cpu_type}/${modes_cpu_type}-modes.def
 then
-	extra_modes=${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
+	extra_modes=${modes_cpu_type}/${modes_cpu_type}-modes.def
 fi
 if test -f ${srcdir}/config/${cpu_type}/${cpu_type}.opt
 then
diff --git gcc/config/i386/i386-modes.def gcc/config/i386/i386-modes.def
index 766681b..0b6a1f1 100644
--- gcc/config/i386/i386-modes.def
+++ gcc/config/i386/i386-modes.def
@@ -24,9 +24,6 @@ along with GCC; see the file COPYING3.  If not see
 FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
 
-/* This file may be used when building a compiler for an offload target.
-   Assume that no special floating point options are used.  */
-#ifndef ACCEL_COMPILER
 /* In ILP32 mode, XFmode has size 12 and alignment 4.
    In LP64 mode, XFmode has size and alignment 16.  */
 ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
@@ -36,7 +33,6 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
 			  : &ieee_extended_intel_96_format));
 ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
 ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
-#endif
 
 /* Add any extra modes needed to represent the condition code.
 
diff --git gcc/config/nvptx/nvptx.h gcc/config/nvptx/nvptx.h
index 9a9954b..c0d97ee 100644
--- gcc/config/nvptx/nvptx.h
+++ gcc/config/nvptx/nvptx.h
@@ -64,6 +64,14 @@
 #define DOUBLE_TYPE_SIZE 64
 #define LONG_DOUBLE_TYPE_SIZE 64
 
+#ifdef ACCEL_COMPILER
+/* For ../i386/i386-modes.def.  */
+/* See ../i386/unix.h:TARGET_SUBTARGET64_DEFAULT.  */
+# define TARGET_128BIT_LONG_DOUBLE (TARGET_ABI64)
+/* See ../i386/i386.h:TARGET_96_ROUND_53_LONG_DOUBLE.  */
+# define TARGET_96_ROUND_53_LONG_DOUBLE 0
+#endif
+
 #undef SIZE_TYPE
 #define SIZE_TYPE (TARGET_ABI64 ? "long unsigned int" : "unsigned int")
 #undef PTRDIFF_TYPE

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* Re: nvptx offloading patches [3/n], RFD
  2015-02-17 16:40             ` Jakub Jelinek
@ 2015-02-18  9:12               ` Thomas Schwinge
  2015-02-18 10:27                 ` Jakub Jelinek
  2015-02-18 11:34                 ` Jakub Jelinek
  2015-02-19 10:20               ` nvptx offloading patches [3/n], RFD Bernd Schmidt
  1 sibling, 2 replies; 42+ messages in thread
From: Thomas Schwinge @ 2015-02-18  9:12 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Ilya Verbin, Bernd Schmidt, Richard Biener, Jan Hubicka,
	gcc-patches, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 1466 bytes --]

Hi!

On Tue, 17 Feb 2015 17:40:33 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Feb 17, 2015 at 04:21:06PM +0000, Joseph Myers wrote:
> > On Tue, 17 Feb 2015, Jakub Jelinek wrote:
> > > I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it
> > > would be built in-tree, is that not the case (at least wiki/Offloading
> > > mentions that).

> configure:4261: checking for C compiler default output file name
> configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem /usr/local/nvptx-none/sys-include    -g -O2   conftest.c  >&5
> error opening libc.a
> collect2: error: ld returned 1 exit status
> very early during in-tree newlib configure.

Do you literally have »nvptx-newlib symlinked into the gcc tree as
newlib«?  If yes, then that should explain the problem: as I wrote in
<http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E>,
you need to »add a symbolic link to nvptx-newlib's newlib directory to
the directory containing the GCC sources«, so do not link [GCC]/newlib ->
[nvptx-newlib], but [GCC]/newlib -> [nvptx-newlib]/newlib.  Does that
resolve the issue?


Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: nvptx offloading patches [3/n], RFD
  2015-02-17 10:00           ` Richard Biener
@ 2015-02-18 10:00             ` Jakub Jelinek
  2015-02-25  8:51               ` Patch ping Jakub Jelinek
  0 siblings, 1 reply; 42+ messages in thread
From: Jakub Jelinek @ 2015-02-18 10:00 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jan Hubicka, Ilya Verbin, Bernd Schmidt, Thomas Schwinge, GCC Patches

On Tue, Feb 17, 2015 at 11:00:14AM +0100, Richard Biener wrote:
> I'm just looking for a way to make this less of a hack (and the LTO IL
> less target dependent).  Not for GCC 5 for which something like your
> patch is probably ok, but for the future.

So, given Ilya's and Thomas' testing, is this acceptable for now, and
perhaps we can try to do something better for GCC 6?

Here is the patch with full ChangeLog:

2015-02-18  Jakub Jelinek  <jakub@redhat.com>

	* passes.c (ipa_write_summaries_1): Call lto_output_init_mode_table.
	(ipa_write_optimization_summaries): Likewise.
	* tree-streamer.h: Include data-streamer.h.
	(streamer_mode_table): Declare extern variable.
	(bp_pack_machine_mode, bp_unpack_machine_mode): New inline functions.
	* lto-streamer-out.c (lto_output_init_mode_table,
	lto_write_mode_table): New functions.
	(produce_asm_for_decls): Call lto_write_mode_table when streaming
	offloading LTO.
	* lto-section-in.c (lto_section_name): Add "mode_table" entry.
	(lto_create_simple_input_block): Add mode_table argument to the
	lto_input_block constructors.
	* ipa-prop.c (ipa_prop_read_section, read_replacements_section):
	Likewise.
	* data-streamer-in.c (string_for_index): Likewise.
	* ipa-inline-analysis.c (inline_read_section): Likewise.
	* ipa-icf.c (sem_item_optimizer::read_section): Likewise.
	* lto-cgraph.c (input_cgraph_opt_section): Likewise.
	* lto-streamer-in.c (lto_read_body_or_constructor,
	lto_input_toplevel_asms): Likewise.
	(lto_input_mode_table): New function.
	* tree-streamer-out.c (pack_ts_fixed_cst_value_fields,
	pack_ts_decl_common_value_fields, pack_ts_type_common_value_fields):
	Use bp_pack_machine_mode.
	* real.h (struct real_format): Add name field.
	* lto-streamer.h (enum lto_section_type): Add LTO_section_mode_table.
	(class lto_input_block): Add mode_table member.
	(lto_input_block::lto_input_block): Add mode_table_ argument,
	initialize mode_table.
	(struct lto_file_decl_data): Add mode_table field.
	(lto_input_mode_table, lto_output_init_mode_table): New prototypes.
	* tree-streamer-in.c (unpack_ts_fixed_cst_value_fields,
	unpack_ts_decl_common_value_fields,
	unpack_ts_type_common_value_fields): Call bp_unpack_machine_mode.
	* tree-streamer.c (streamer_mode_table): New variable.
	* real.c (ieee_single_format, mips_single_format,
	motorola_single_format, spu_single_format, ieee_double_format,
	mips_double_format, motorola_double_format,
	ieee_extended_motorola_format, ieee_extended_intel_96_format,
	ieee_extended_intel_128_format, ieee_extended_intel_96_round_53_format,
	ibm_extended_format, mips_extended_format, ieee_quad_format,
	mips_quad_format, vax_f_format, vax_d_format, vax_g_format,
	decimal_single_format, decimal_double_format, decimal_quad_format,
	ieee_half_format, arm_half_format, real_internal_format): Add name
	field.
	* config/pdp11/pdp11.c (pdp11_f_format, pdp11_d_format): Likewise.
lto/
	* lto.c (lto_mode_identity_table): New variable.
	(lto_read_decls): Add mode_table argument to the lto_input_block
	constructor.
	(lto_file_finalize): Initialize mode_table.
	(lto_init): Initialize lto_mode_identity_table.
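
As a reading aid for the ChangeLog above: the writer marks every machine_mode
it streams, emits a description of each marked mode into a "mode_table"
section, and the offload compiler matches those descriptions against its own
modes to build a 256-entry translation table.  The following is only a
minimal, self-contained sketch of that idea, not code from the patch.
ModeDesc, Entry, write_mode_table and read_mode_table are hypothetical names
chosen for illustration; the real code goes through the bitpack API, also
streams the mode class, inner mode, nunits and ibit/fbit, and maps
unsupported vector modes to BLKmode.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified description of one machine mode.
struct ModeDesc {
  std::string name;       // e.g. "SI", "DF"; used only for diagnostics
  bool is_float;          // stands in for the mode class
  unsigned size;          // size in bytes
  unsigned precision;     // precision in bits
  std::string float_fmt;  // real_format name, empty for non-float modes
};

// One record of the (hypothetical) mode table section: the writer's mode
// number plus the properties the reader needs to find an equivalent mode.
struct Entry {
  std::uint8_t host_num;
  ModeDesc desc;
};

// Writer side: describe only the modes that were actually streamed.
// Index 0 plays the role of VOIDmode and is never written.
std::vector<Entry> write_mode_table(const std::vector<ModeDesc> &host,
                                    const std::array<bool, 256> &used) {
  assert(host.size() <= 256);
  std::vector<Entry> section;
  for (std::size_t i = 1; i < host.size(); i++)
    if (used[i])
      section.push_back({static_cast<std::uint8_t>(i), host[i]});
  return section;
}

// Reader side: map each writer mode number to a reader mode number by
// comparing properties rather than enum values.  Unmatched modes keep
// entry 0 and their names are collected in UNSUPPORTED.
std::array<std::uint8_t, 256>
read_mode_table(const std::vector<Entry> &section,
                const std::vector<ModeDesc> &target,
                std::vector<std::string> &unsupported) {
  assert(target.size() <= 256);
  std::array<std::uint8_t, 256> table{};
  for (const Entry &e : section) {
    bool found = false;
    for (std::size_t t = 1; t < target.size() && !found; t++) {
      const ModeDesc &m = target[t];
      if (m.is_float == e.desc.is_float && m.size == e.desc.size
          && m.precision == e.desc.precision
          && m.float_fmt == e.desc.float_fmt) {
        table[e.host_num] = static_cast<std::uint8_t>(t);
        found = true;
      }
    }
    if (!found)
      unsupported.push_back(e.desc.name);
  }
  return table;
}
```

In this sketch, a mode read from the stream would then be translated with
table[n], which corresponds to what bp_unpack_machine_mode does with
lto_input_block::mode_table in the patch below.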

--- gcc/passes.c.jj	2015-02-16 22:18:33.219702315 +0100
+++ gcc/passes.c	2015-02-16 22:19:20.842917807 +0100
@@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
 
   gcc_assert (!flag_wpa);
@@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
   lto_symtab_encoder_iterator lsei;
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
   for (lsei = lsei_start_function_in_partition (encoder);
        !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
--- gcc/tree-streamer.h.jj	2015-02-16 22:18:33.222702266 +0100
+++ gcc/tree-streamer.h	2015-02-16 22:19:20.843917791 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 
 #include "streamer-hooks.h"
 #include "lto-streamer.h"
+#include "data-streamer.h"
 #include "hash-map.h"
 
 /* Cache of pickled nodes.  Used to avoid writing the same node more
@@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
 void streamer_write_builtin (struct output_block *, tree);
 
 /* In tree-streamer.c.  */
+extern unsigned char streamer_mode_table[1 << 8];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
 				 hashval_t, unsigned *);
@@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
   return cache->hashes[ix];
 }
 
+static inline void
+bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
+{
+  streamer_mode_table[mode] = 1;
+  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
+}
+
+static inline machine_mode
+bp_unpack_machine_mode (struct bitpack_d *bp)
+{
+  return (machine_mode)
+	   ((struct lto_input_block *)
+	    bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
+}
 
 #endif  /* GCC_TREE_STREAMER_H  */
--- gcc/lto-streamer-out.c.jj	2015-02-16 22:18:33.204702562 +0100
+++ gcc/lto-streamer-out.c	2015-02-16 22:20:06.659163066 +0100
@@ -2642,6 +2642,96 @@ produce_symtab (struct output_block *ob)
 }
 
 
+/* Init the streamer_mode_table for output, where we collect info on what
+   machine_mode values have been streamed.  */
+void
+lto_output_init_mode_table (void)
+{
+  memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
+}
+
+
+/* Write the mode table.  */
+static void
+lto_write_mode_table (void)
+{
+  struct output_block *ob;
+  ob = create_output_block (LTO_section_mode_table);
+  bitpack_d bp = bitpack_create (ob->main_stream);
+
+  /* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
+     also the inner mode marked.  */
+  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+    if (streamer_mode_table[i])
+      {
+	machine_mode m = (machine_mode) i;
+	if (GET_MODE_INNER (m) != VOIDmode)
+	  streamer_mode_table[(int) GET_MODE_INNER (m)] = 1;
+      }
+  /* First stream modes that have GET_MODE_INNER (m) == VOIDmode,
+     so that we can refer to them afterwards.  */
+  for (int pass = 0; pass < 2; pass++)
+    for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+      if (streamer_mode_table[i] && i != (int) VOIDmode && i != (int) BLKmode)
+	{
+	  machine_mode m = (machine_mode) i;
+	  if ((GET_MODE_INNER (m) == VOIDmode) ^ (pass == 0))
+	    continue;
+	  bp_pack_value (&bp, m, 8);
+	  bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
+	  bp_pack_value (&bp, GET_MODE_SIZE (m), 8);
+	  bp_pack_value (&bp, GET_MODE_PRECISION (m), 16);
+	  bp_pack_value (&bp, GET_MODE_INNER (m), 8);
+	  bp_pack_value (&bp, GET_MODE_NUNITS (m), 8);
+	  switch (GET_MODE_CLASS (m))
+	    {
+	    case MODE_FRACT:
+	    case MODE_UFRACT:
+	    case MODE_ACCUM:
+	    case MODE_UACCUM:
+	      bp_pack_value (&bp, GET_MODE_IBIT (m), 8);
+	      bp_pack_value (&bp, GET_MODE_FBIT (m), 8);
+	      break;
+	    case MODE_FLOAT:
+	    case MODE_DECIMAL_FLOAT:
+	      bp_pack_string (ob, &bp, REAL_MODE_FORMAT (m)->name, true);
+	      break;
+	    default:
+	      break;
+	    }
+	  bp_pack_string (ob, &bp, GET_MODE_NAME (m), true);
+	}
+  bp_pack_value (&bp, VOIDmode, 8);
+
+  streamer_write_bitpack (&bp);
+
+  char *section_name
+    = lto_get_section_name (LTO_section_mode_table, NULL, NULL);
+  lto_begin_section (section_name, !flag_wpa);
+  free (section_name);
+
+  /* The entire header stream is computed here.  */
+  struct lto_simple_header_with_strings header;
+  memset (&header, 0, sizeof (header));
+
+  /* Write the header.  */
+  header.major_version = LTO_major_version;
+  header.minor_version = LTO_minor_version;
+
+  header.main_size = ob->main_stream->total_size;
+  header.string_size = ob->string_stream->total_size;
+  lto_write_data (&header, sizeof header);
+
+  /* Put all of the gimple and the string table out the asm file as a
+     block of text.  */
+  lto_write_stream (ob->main_stream);
+  lto_write_stream (ob->string_stream);
+
+  lto_end_section ();
+  destroy_output_block (ob);
+}
+
+
 /* This pass is run after all of the functions are serialized and all
    of the IPA passes have written their serialized forms.  This pass
    causes the vector of all of the global decls and types used from
@@ -2749,4 +2839,6 @@ produce_asm_for_decls (void)
   lto_symtab_encoder_delete (ob->decl_state->symtab_node_encoder);
   lto_function_decl_states.release ();
   destroy_output_block (ob);
+  if (lto_stream_offload_p)
+    lto_write_mode_table ();
 }
--- gcc/lto-section-in.c.jj	2015-02-16 22:18:33.202702595 +0100
+++ gcc/lto-section-in.c	2015-02-16 22:19:20.845917758 +0100
@@ -89,7 +89,8 @@ const char *lto_section_name[LTO_N_SECTI
   "inline",
   "ipcp_trans",
   "icf",
-  "offload_table"
+  "offload_table",
+  "mode_table"
 };
 
 
@@ -262,7 +263,8 @@ lto_create_simple_input_block (struct lt
     return NULL;
 
   *datar = data;
-  return new lto_input_block (data + main_offset, header->main_size);
+  return new lto_input_block (data + main_offset, header->main_size,
+			      file_data->mode_table);
 }
 
 
--- gcc/tree-streamer-out.c.jj	2015-02-16 22:18:33.222702266 +0100
+++ gcc/tree-streamer-out.c	2015-02-16 22:19:20.845917758 +0100
@@ -190,7 +190,7 @@ static void
 pack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
 {
   struct fixed_value fv = TREE_FIXED_CST (expr);
-  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, fv.mode);
+  bp_pack_machine_mode (bp, fv.mode);
   bp_pack_var_len_int (bp, fv.data.low);
   bp_pack_var_len_int (bp, fv.data.high);
 }
@@ -201,7 +201,7 @@ pack_ts_fixed_cst_value_fields (struct b
 static void
 pack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, DECL_MODE (expr));
+  bp_pack_machine_mode (bp, DECL_MODE (expr));
   bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
   bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
   bp_pack_value (bp, DECL_IGNORED_P (expr), 1);
@@ -325,7 +325,7 @@ pack_ts_function_decl_value_fields (stru
 static void
 pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  bp_pack_enum (bp, machine_mode, MAX_MACHINE_MODE, TYPE_MODE (expr));
+  bp_pack_machine_mode (bp, TYPE_MODE (expr));
   bp_pack_value (bp, TYPE_STRING_FLAG (expr), 1);
   bp_pack_value (bp, TYPE_NO_FORCE_BLK (expr), 1);
   bp_pack_value (bp, TYPE_NEEDS_CONSTRUCTING (expr), 1);
--- gcc/real.h.jj	2015-02-16 22:18:33.220702299 +0100
+++ gcc/real.h	2015-02-16 22:19:20.846917741 +0100
@@ -155,6 +155,7 @@ struct real_format
   bool has_signed_zero;
   bool qnan_msb_set;
   bool canonical_nan_lsbs_set;
+  const char *name;
 };
 
 
--- gcc/lto-streamer.h.jj	2015-02-16 22:18:33.211702447 +0100
+++ gcc/lto-streamer.h	2015-02-16 22:19:20.846917741 +0100
@@ -248,6 +248,7 @@ enum lto_section_type
   LTO_section_ipcp_transform,
   LTO_section_ipa_icf,
   LTO_section_offload_table,
+  LTO_section_mode_table,
   LTO_N_SECTION_TYPES		/* Must be last.  */
 };
 
@@ -312,12 +313,15 @@ class lto_input_block
 public:
   /* Special constructor for the string table, it abuses this to
      do random access but use the uhwi decoder.  */
-  lto_input_block (const char *data_, unsigned int p_, unsigned int len_)
-      : data (data_), p (p_), len (len_) {}
-  lto_input_block (const char *data_, unsigned int len_)
-      : data (data_), p (0), len (len_) {}
+  lto_input_block (const char *data_, unsigned int p_, unsigned int len_,
+		   const unsigned char *mode_table_)
+      : data (data_), mode_table (mode_table_), p (p_), len (len_) {}
+  lto_input_block (const char *data_, unsigned int len_,
+		   const unsigned char *mode_table_)
+      : data (data_), mode_table (mode_table_), p (0), len (len_) {}
 
   const char *data;
+  const unsigned char *mode_table;
   unsigned int p;
   unsigned int len;
 };
@@ -527,6 +531,9 @@ struct GTY(()) lto_file_decl_data
 
   /* Map assigning declarations their resolutions.  */
   hash_map<tree, ld_plugin_symbol_resolution> * GTY((skip)) resolution_map;
+
+  /* Mode translation table.  */
+  const unsigned char *mode_table;
 };
 
 typedef struct lto_file_decl_data *lto_file_decl_data_ptr;
@@ -775,6 +782,7 @@ extern void lto_input_variable_construct
 extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
 					      const char *);
 extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
+extern void lto_input_mode_table (struct lto_file_decl_data *);
 extern struct data_in *lto_data_in_create (struct lto_file_decl_data *,
 				    const char *, unsigned,
 				    vec<ld_plugin_symbol_resolution_t> );
@@ -807,6 +815,7 @@ void lto_output_decl_state_refs (struct
 			         struct lto_output_stream *,
 			         struct lto_out_decl_state *);
 void lto_output_location (struct output_block *, struct bitpack_d *, location_t);
+void lto_output_init_mode_table (void);
 
 
 /* In lto-cgraph.c  */
--- gcc/ipa-prop.c.jj	2015-02-16 22:18:33.219702315 +0100
+++ gcc/ipa-prop.c	2015-02-16 22:19:20.848917709 +0100
@@ -4868,7 +4868,7 @@ ipa_prop_read_section (struct lto_file_d
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size);
+			   header->main_size, file_data->mode_table);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
@@ -5089,7 +5089,7 @@ read_replacements_section (struct lto_fi
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size);
+			   header->main_size, file_data->mode_table);
 
   data_in = lto_data_in_create (file_data, (const char *) data + string_offset,
 				header->string_size, vNULL);
--- gcc/data-streamer-in.c.jj	2015-02-16 22:18:33.224702233 +0100
+++ gcc/data-streamer-in.c	2015-02-16 22:19:20.848917709 +0100
@@ -70,7 +70,7 @@ string_for_index (struct data_in *data_i
     }
 
   /* Get the string stored at location LOC in DATA_IN->STRINGS.  */
-  lto_input_block str_tab (data_in->strings, loc - 1, data_in->strings_len);
+  lto_input_block str_tab (data_in->strings, loc - 1, data_in->strings_len, NULL);
   len = streamer_read_uhwi (&str_tab);
   *rlen = len;
 
--- gcc/tree-streamer-in.c.jj	2015-02-16 22:18:33.220702299 +0100
+++ gcc/tree-streamer-in.c	2015-02-16 22:19:20.849917692 +0100
@@ -224,7 +224,7 @@ static void
 unpack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
 {
   FIXED_VALUE_TYPE *fp = ggc_alloc<fixed_value> ();
-  fp->mode = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
+  fp->mode = bp_unpack_machine_mode (bp);
   fp->data.low = bp_unpack_var_len_int (bp);
   fp->data.high = bp_unpack_var_len_int (bp);
   TREE_FIXED_CST_PTR (expr) = fp;
@@ -236,7 +236,7 @@ unpack_ts_fixed_cst_value_fields (struct
 static void
 unpack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  DECL_MODE (expr) = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
+  DECL_MODE (expr) = bp_unpack_machine_mode (bp);
   DECL_NONLOCAL (expr) = (unsigned) bp_unpack_value (bp, 1);
   DECL_VIRTUAL_P (expr) = (unsigned) bp_unpack_value (bp, 1);
   DECL_IGNORED_P (expr) = (unsigned) bp_unpack_value (bp, 1);
@@ -373,7 +373,7 @@ unpack_ts_type_common_value_fields (stru
 {
   machine_mode mode;
 
-  mode = bp_unpack_enum (bp, machine_mode, MAX_MACHINE_MODE);
+  mode = bp_unpack_machine_mode (bp);
   SET_TYPE_MODE (expr, mode);
   TYPE_STRING_FLAG (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_NO_FORCE_BLK (expr) = (unsigned) bp_unpack_value (bp, 1);
--- gcc/ipa-inline-analysis.c.jj	2015-02-16 22:18:33.223702249 +0100
+++ gcc/ipa-inline-analysis.c	2015-02-16 22:19:20.850917676 +0100
@@ -4190,7 +4190,8 @@ inline_read_section (struct lto_file_dec
   unsigned int i, count2, j;
   unsigned int f_count;
 
-  lto_input_block ib ((const char *) data + main_offset, header->main_size);
+  lto_input_block ib ((const char *) data + main_offset, header->main_size,
+		      file_data->mode_table);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
--- gcc/ipa-icf.c.jj	2015-02-16 22:18:33.222702266 +0100
+++ gcc/ipa-icf.c	2015-02-16 22:19:20.851917659 +0100
@@ -1500,7 +1500,7 @@ sem_item_optimizer::read_section (lto_fi
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset, 0,
-			   header->main_size);
+			   header->main_size, file_data->mode_table);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
--- gcc/tree-streamer.c.jj	2015-02-16 22:18:33.221702282 +0100
+++ gcc/tree-streamer.c	2015-02-16 22:19:20.853917626 +0100
@@ -53,6 +53,14 @@ along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "tree-streamer.h"
 
+/* Table indexed by machine_mode, used for 2 different purposes.
+   During streaming out, we record a non-zero value there for all
+   modes that were streamed out.
+   During streaming in, we translate the on-disk mode using this
+   table.  For normal LTO it is set to identity, for ACCEL_COMPILER
+   depending on the mode_table content.  */
+unsigned char streamer_mode_table[1 << 8];
+
 /* Check that all the TS_* structures handled by the streamer_write_* and
    streamer_read_* routines are exactly ALL the structures defined in
    treestruct.def.  */
--- gcc/lto-cgraph.c.jj	2015-02-16 22:18:33.211702447 +0100
+++ gcc/lto-cgraph.c	2015-02-16 22:19:20.855917593 +0100
@@ -2113,7 +2113,7 @@ input_cgraph_opt_section (struct lto_fil
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size);
+			   header->main_size, file_data->mode_table);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
--- gcc/lto-streamer-in.c.jj	2015-02-16 22:18:33.204702562 +0100
+++ gcc/lto-streamer-in.c	2015-02-16 22:26:53.355464202 +0100
@@ -1116,10 +1116,12 @@ lto_read_body_or_constructor (struct lto
 
       /* Set up the struct function.  */
       from = data_in->reader_cache->nodes.length ();
-      lto_input_block ib_main (data + main_offset, header->main_size);
+      lto_input_block ib_main (data + main_offset, header->main_size,
+			       file_data->mode_table);
       if (TREE_CODE (node->decl) == FUNCTION_DECL)
 	{
-	  lto_input_block ib_cfg (data + cfg_offset, header->cfg_size);
+	  lto_input_block ib_cfg (data + cfg_offset, header->cfg_size,
+				  file_data->mode_table);
 	  input_function (fn_decl, data_in, &ib_main, &ib_cfg);
 	}
       else
@@ -1384,7 +1386,8 @@ lto_input_toplevel_asms (struct lto_file
 
   string_offset = sizeof (*header) + header->main_size;
 
-  lto_input_block ib (data + sizeof (*header), header->main_size);
+  lto_input_block ib (data + sizeof (*header), header->main_size,
+		      file_data->mode_table);
 
   data_in = lto_data_in_create (file_data, data + string_offset,
 			      header->string_size, vNULL);
@@ -1403,6 +1406,123 @@ lto_input_toplevel_asms (struct lto_file
 }
 
 
+/* Input mode table.  */
+
+void
+lto_input_mode_table (struct lto_file_decl_data *file_data)
+{
+  size_t len;
+  const char *data = lto_get_section_data (file_data, LTO_section_mode_table,
+					   NULL, &len);
+  if (! data)
+    {
+      internal_error ("cannot read LTO mode table from %s",
+		      file_data->file_name);
+      return;
+    }
+
+  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
+  file_data->mode_table = table;
+  const struct lto_simple_header_with_strings *header
+    = (const struct lto_simple_header_with_strings *) data;
+  int string_offset;
+  struct data_in *data_in;
+  string_offset = sizeof (*header) + header->main_size;
+
+  lto_input_block ib (data + sizeof (*header), header->main_size, NULL);
+  data_in = lto_data_in_create (file_data, data + string_offset,
+				header->string_size, vNULL);
+  bitpack_d bp = streamer_read_bitpack (&ib);
+
+  table[VOIDmode] = VOIDmode;
+  table[BLKmode] = BLKmode;
+  unsigned int m;
+  while ((m = bp_unpack_value (&bp, 8)) != VOIDmode)
+    {
+      enum mode_class mclass
+	= bp_unpack_enum (&bp, mode_class, MAX_MODE_CLASS);
+      unsigned int size = bp_unpack_value (&bp, 8);
+      unsigned int prec = bp_unpack_value (&bp, 16);
+      machine_mode inner = (machine_mode) table[bp_unpack_value (&bp, 8)];
+      unsigned int nunits = bp_unpack_value (&bp, 8);
+      unsigned int ibit = 0, fbit = 0;
+      unsigned int real_fmt_len = 0;
+      const char *real_fmt_name = NULL;
+      switch (mclass)
+	{
+	case MODE_FRACT:
+	case MODE_UFRACT:
+	case MODE_ACCUM:
+	case MODE_UACCUM:
+	  ibit = bp_unpack_value (&bp, 8);
+	  fbit = bp_unpack_value (&bp, 8);
+	  break;
+	case MODE_FLOAT:
+	case MODE_DECIMAL_FLOAT:
+	  real_fmt_name = bp_unpack_indexed_string (data_in, &bp,
+						    &real_fmt_len);
+	  break;
+	default:
+	  break;
+	}
+      /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
+	 if not found, fallback to all modes.  */
+      int pass;
+      for (pass = 0; pass < 2; pass++)
+	for (machine_mode mr = pass ? VOIDmode
+				    : GET_CLASS_NARROWEST_MODE (mclass);
+	     pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
+	     pass ? mr = (machine_mode) (mr + 1)
+		  : mr = GET_MODE_WIDER_MODE (mr))
+	  if (GET_MODE_CLASS (mr) != mclass
+	      || GET_MODE_SIZE (mr) != size
+	      || GET_MODE_PRECISION (mr) != prec
+	      || GET_MODE_INNER (mr) != inner
+	      || GET_MODE_IBIT (mr) != ibit
+	      || GET_MODE_FBIT (mr) != fbit
+	      || GET_MODE_NUNITS (mr) != nunits)
+	    continue;
+	  else if ((mclass == MODE_FLOAT || mclass == MODE_DECIMAL_FLOAT)
+		   && strcmp (REAL_MODE_FORMAT (mr)->name, real_fmt_name) != 0)
+	    continue;
+	  else
+	    {
+	      table[m] = mr;
+	      pass = 2;
+	      break;
+	    }
+      unsigned int mname_len;
+      const char *mname = bp_unpack_indexed_string (data_in, &bp, &mname_len);
+      if (pass == 2)
+	{
+	  switch (mclass)
+	    {
+	    case MODE_VECTOR_INT:
+	    case MODE_VECTOR_FLOAT:
+	    case MODE_VECTOR_FRACT:
+	    case MODE_VECTOR_UFRACT:
+	    case MODE_VECTOR_ACCUM:
+	    case MODE_VECTOR_UACCUM:
+	      /* For unsupported vector modes just use BLKmode,
+		 if the scalar mode is supported.  */
+	      if (inner != VOIDmode)
+		{
+		  table[m] = BLKmode;
+		  break;
+		}
+	      /* FALLTHRU */
+	    default:
+	      error ("unsupported mode %s\n", mname);
+	      break;
+	    }
+	}
+    }
+  lto_data_in_delete (data_in);
+
+  lto_free_section_data (file_data, LTO_section_mode_table, NULL, data, len);
+}
+
+
 /* Initialization for the LTO reader.  */
 
 void
--- gcc/real.c.jj	2015-02-16 22:18:33.220702299 +0100
+++ gcc/real.c	2015-02-16 22:19:20.853917626 +0100
@@ -3031,7 +3031,8 @@ const struct real_format ieee_single_for
     true,
     true,
     true,
-    false
+    false,
+    "ieee_single"
   };
 
 const struct real_format mips_single_format =
@@ -3052,7 +3053,8 @@ const struct real_format mips_single_for
     true,
     true,
     false,
-    true
+    true,
+    "mips_single"
   };
 
 const struct real_format motorola_single_format =
@@ -3073,7 +3075,8 @@ const struct real_format motorola_single
     true,
     true,
     true,
-    true
+    true,
+    "motorola_single"
   };
 
 /*  SPU Single Precision (Extended-Range Mode) format is the same as IEEE
@@ -3105,7 +3108,8 @@ const struct real_format spu_single_form
     true,
     true,
     false,
-    false
+    false,
+    "spu_single"
   };
 \f
 /* IEEE double-precision format.  */
@@ -3314,7 +3318,8 @@ const struct real_format ieee_double_for
     true,
     true,
     true,
-    false
+    false,
+    "ieee_double"
   };
 
 const struct real_format mips_double_format =
@@ -3335,7 +3340,8 @@ const struct real_format mips_double_for
     true,
     true,
     false,
-    true
+    true,
+    "mips_double"
   };
 
 const struct real_format motorola_double_format =
@@ -3356,7 +3362,8 @@ const struct real_format motorola_double
     true,
     true,
     true,
-    true
+    true,
+    "motorola_double"
   };
 \f
 /* IEEE extended real format.  This comes in three flavors: Intel's as
@@ -3700,7 +3707,8 @@ const struct real_format ieee_extended_m
     true,
     true,
     true,
-    true
+    true,
+    "ieee_extended_motorola"
   };
 
 const struct real_format ieee_extended_intel_96_format =
@@ -3721,7 +3729,8 @@ const struct real_format ieee_extended_i
     true,
     true,
     true,
-    false
+    false,
+    "ieee_extended_intel_96"
   };
 
 const struct real_format ieee_extended_intel_128_format =
@@ -3742,7 +3751,8 @@ const struct real_format ieee_extended_i
     true,
     true,
     true,
-    false
+    false,
+    "ieee_extended_intel_128"
   };
 
 /* The following caters to i386 systems that set the rounding precision
@@ -3765,7 +3775,8 @@ const struct real_format ieee_extended_i
     true,
     true,
     true,
-    false
+    false,
+    "ieee_extended_intel_96_round_53"
   };
 \f
 /* IBM 128-bit extended precision format: a pair of IEEE double precision
@@ -3853,7 +3864,8 @@ const struct real_format ibm_extended_fo
     true,
     true,
     true,
-    false
+    false,
+    "ibm_extended"
   };
 
 const struct real_format mips_extended_format =
@@ -3874,7 +3886,8 @@ const struct real_format mips_extended_f
     true,
     true,
     false,
-    true
+    true,
+    "mips_extended"
   };
 
 \f
@@ -4137,7 +4150,8 @@ const struct real_format ieee_quad_forma
     true,
     true,
     true,
-    false
+    false,
+    "ieee_quad"
   };
 
 const struct real_format mips_quad_format =
@@ -4158,7 +4172,8 @@ const struct real_format mips_quad_forma
     true,
     true,
     false,
-    true
+    true,
+    "mips_quad"
   };
 \f
 /* Descriptions of VAX floating point formats can be found beginning at
@@ -4458,7 +4473,8 @@ const struct real_format vax_f_format =
     false,
     false,
     false,
-    false
+    false,
+    "vax_f"
   };
 
 const struct real_format vax_d_format =
@@ -4479,7 +4495,8 @@ const struct real_format vax_d_format =
     false,
     false,
     false,
-    false
+    false,
+    "vax_d"
   };
 
 const struct real_format vax_g_format =
@@ -4500,7 +4517,8 @@ const struct real_format vax_g_format =
     false,
     false,
     false,
-    false
+    false,
+    "vax_g"
   };
 \f
 /* Encode real R into a single precision DFP value in BUF.  */
@@ -4576,7 +4594,8 @@ const struct real_format decimal_single_
     true,
     true,
     true,
-    false
+    false,
+    "decimal_single"
   };
 
 /* Double precision decimal floating point (IEEE 754). */
@@ -4598,7 +4617,8 @@ const struct real_format decimal_double_
     true,
     true,
     true,
-    false
+    false,
+    "decimal_double"
   };
 
 /* Quad precision decimal floating point (IEEE 754). */
@@ -4620,7 +4640,8 @@ const struct real_format decimal_quad_fo
     true,
     true,
     true,
-    false
+    false,
+    "decimal_quad"
   };
 \f
 /* Encode half-precision floats.  This routine is used both for the IEEE
@@ -4757,7 +4778,8 @@ const struct real_format ieee_half_forma
     true,
     true,
     true,
-    false
+    false,
+    "ieee_half"
   };
 
 /* ARM's alternative half-precision format, similar to IEEE but with
@@ -4781,7 +4803,8 @@ const struct real_format arm_half_format
     true,
     true,
     false,
-    false
+    false,
+    "arm_half"
   };
 \f
 /* A synthetic "format" for internal arithmetic.  It's the size of the
@@ -4826,7 +4849,8 @@ const struct real_format real_internal_f
     false,
     true,
     true,
-    false
+    false,
+    "real_internal"
   };
 \f
 /* Calculate X raised to the integer exponent N in mode MODE and store
--- gcc/config/pdp11/pdp11.c.jj	2015-02-16 22:18:33.209702480 +0100
+++ gcc/config/pdp11/pdp11.c	2015-02-16 22:19:20.845917758 +0100
@@ -107,7 +107,8 @@ const struct real_format pdp11_f_format
     false,
     false,
     false,
-    false
+    false,
+    "pdp11_f"
   };
 
 const struct real_format pdp11_d_format =
@@ -128,7 +129,8 @@ const struct real_format pdp11_d_format
     false,
     false,
     false,
-    false
+    false,
+    "pdp11_d"
   };
 
 static void
--- gcc/lto/lto.c.jj	2015-02-16 22:18:33.221702282 +0100
+++ gcc/lto/lto.c	2015-02-16 22:35:56.213523202 +0100
@@ -85,6 +85,8 @@ static int lto_parallelism;
 
 static GTY(()) tree first_personality_decl;
 
+static GTY(()) const unsigned char *lto_mode_identity_table;
+
 /* Returns a hash code for P.  */
 
 static hashval_t
@@ -1877,7 +1879,7 @@ lto_read_decls (struct lto_file_decl_dat
   uint32_t num_decl_states;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size);
+			   header->main_size, decl_data->mode_table);
 
   data_in = lto_data_in_create (decl_data, (const char *) data + string_offset,
 				header->string_size, resolutions);
@@ -2219,6 +2221,11 @@ lto_file_finalize (struct lto_file_decl_
 
   file_data->renaming_hash_table = lto_create_renaming_table ();
   file_data->file_name = file->filename;
+#ifdef ACCEL_COMPILER
+  lto_input_mode_table (file_data);
+#else
+  file_data->mode_table = lto_mode_identity_table;
+#endif
   data = lto_get_section_data (file_data, LTO_section_decls, NULL, &len);
   if (data == NULL)
     {
@@ -3394,6 +3401,13 @@ lto_init (void)
   memset (&lto_stats, 0, sizeof (lto_stats));
   bitmap_obstack_initialize (NULL);
   gimple_register_cfg_hooks ();
+#ifndef ACCEL_COMPILER
+  unsigned char *table
+    = ggc_vec_alloc<unsigned char> (MAX_MACHINE_MODE);
+  for (int m = 0; m < MAX_MACHINE_MODE; m++)
+    table[m] = m;
+  lto_mode_identity_table = table;
+#endif
 }
 
 

	Jakub

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: nvptx offloading patches [3/n], RFD
  2015-02-18  9:12               ` Thomas Schwinge
@ 2015-02-18 10:27                 ` Jakub Jelinek
  2015-02-18 11:34                 ` Jakub Jelinek
  1 sibling, 0 replies; 42+ messages in thread
From: Jakub Jelinek @ 2015-02-18 10:27 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Ilya Verbin, Bernd Schmidt, Richard Biener, Jan Hubicka,
	gcc-patches, Joseph Myers

On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
> On Tue, 17 Feb 2015 17:40:33 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Tue, Feb 17, 2015 at 04:21:06PM +0000, Joseph Myers wrote:
> > > On Tue, 17 Feb 2015, Jakub Jelinek wrote:
> > > > I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it
> > > > would be built in-tree, is that not the case (at least wiki/Offloading
> > > > mentions that).
> 
> > configure:4261: checking for C compiler default output file name
> > configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem /usr/local/nvptx-none/sys-include    -g -O2   conftest.c  >&5
> > error opening libc.a
> > collect2: error: ld returned 1 exit status
> > very early during in-tree newlib configure.
> 
> Do you literally have »nvptx-newlib symlinked into the gcc tree as
> newlib«?  If yes, then that should explain the problem: as I wrote in
> <http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E>,
> you need to »add a symbolic link to nvptx-newlib's newlib directory to
> the directory containing the GCC sources«, so not link [GCC]/newlib ->
> [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
> resolve the issue?

My bad.  Yes, that does resolve the issue, make & make install now worked
for nvptx-none for me with the patches (2 from Bernd, my mode_table, my
t-nvptx).
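
A minimal sketch of the link layout described above, using a scratch
directory so it can be tried without touching a real checkout.  The
directory names under $work are placeholders, not the actual source
locations; only the relative shape of the link matters.

```shell
# Hypothetical scratch layout standing in for the real source trees.
work=$(mktemp -d)
mkdir -p "$work/gcc" "$work/nvptx-newlib/newlib/libc/include"

# Wrong: [GCC]/newlib -> [nvptx-newlib]
#   ln -s ../nvptx-newlib "$work/gcc/newlib"

# Right: [GCC]/newlib -> [nvptx-newlib]/newlib
ln -s ../nvptx-newlib/newlib "$work/gcc/newlib"

# The configure log above passes -isystem [GCC]/newlib/libc/include,
# so that directory has to be reachable through the link.
test -d "$work/gcc/newlib/libc/include" && echo "newlib link ok"
```

With the wrong link, [GCC]/newlib/libc/include does not exist, which is
consistent with the early failure in the in-tree newlib configure.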

Can you or Bernd comment on the other issues I've raised, i.e. whether you
are going to apply Bernd's approved patches, and on the t-nvptx fix?

I'll try to have a look at the va_list stuff, if it blocks everything rather
than just testcases with va_list being offloaded.

	Jakub

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: nvptx offloading patches [3/n], RFD
  2015-02-18  9:12               ` Thomas Schwinge
  2015-02-18 10:27                 ` Jakub Jelinek
@ 2015-02-18 11:34                 ` Jakub Jelinek
  2015-02-18 12:10                   ` Thomas Schwinge
  1 sibling, 1 reply; 42+ messages in thread
From: Jakub Jelinek @ 2015-02-18 11:34 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Ilya Verbin, Bernd Schmidt, Richard Biener, Jan Hubicka,
	gcc-patches, Joseph Myers

On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
> Do you literally have »nvptx-newlib symlinked into the gcc tree as
> newlib«?  If yes, then that should explain the problem: as I wrote in
> <http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E>,
> you need to »add a symbolic link to nvptx-newlib's newlib directory to
> the directory containing the GCC sources«, so not link [GCC]/newlib ->
> [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
> resolve the issue?

BTW, --with-cuda-driver-{include,lib} are apparently not documented in
gcc/doc/ (neither is --with-cuda-driver, but I can't use that, as lib is
/usr/local/cuda-6.5/lib64 in my case), and they aren't documented on
wiki/Offloading either.

../configure --target=nvptx-none --enable-as-accelerator-for=x86_64-pc-linux-gnu --with-build-time-tools=/usr/src/gcc/objnvptxinst/usr/local/nvptx-none/bin --disable-sjlj-exceptions --enable-newlib-io-long-long
make; make DESTDIR=/usr/src/gcc/objnvptxinst install

and

../configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=nvptx-none=/usr/src/gcc/objnvptxinst --disable-bootstrap --with-cuda-driver-include=/usr/local/cuda-6.5/include --with-cuda-driver-lib=/usr/local/cuda-6.5/lib64
make; make DESTDIR=/usr/src/gcc/objnvptxinst install

compilers now build, but offloading fails:

/usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload @/tmp/cce9PdmR
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
mkoffload: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status

Is --enable-languages=c,c++,fortran,lto required when configuring the
offload compiler?  It isn't required for intelmic.

	Jakub

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: nvptx offloading patches [3/n], RFD
  2015-02-18 11:34                 ` Jakub Jelinek
@ 2015-02-18 12:10                   ` Thomas Schwinge
  2015-02-18 12:35                     ` Jakub Jelinek
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Schwinge @ 2015-02-18 12:10 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Ilya Verbin, Bernd Schmidt, Richard Biener, Jan Hubicka,
	gcc-patches, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 3586 bytes --]

Hi Jakub!

(Will respond to your other questions later.)


On Wed, 18 Feb 2015 12:34:38 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
> > Do you literally have »nvptx-newlib symlinked into the gcc tree as
> > newlib«?  If yes, then that should explain the problem: as I wrote in
> > <http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E>,
> > you need to »add a symbolic link to nvptx-newlib's newlib directory to
> > the directory containing the GCC sources«, so not link [GCC]/newlib ->
> > [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
> > resolve the issue?

(It did.)  Can you suggest a better wording, to make this more clear in
the documentation?


> BTW, --with-cuda-driver-{include,lib} are apparently not documented in
> gcc/doc/ (neither is --with-cuda-driver, but I can't use that, as lib is
> /usr/local/cuda-6.5/lib64 in my case), and they aren't documented on
> wiki/Offloading either.

Thanks for reporting; will fix that.


> ../configure --target=nvptx-none --enable-as-accelerator-for=x86_64-pc-linux-gnu --with-build-time-tools=/usr/src/gcc/objnvptxinst/usr/local/nvptx-none/bin --disable-sjlj-exceptions --enable-newlib-io-long-long
> make; make DESTDIR=/usr/src/gcc/objnvptxinst install
> 
> and
> 
> ../configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=nvptx-none=/usr/src/gcc/objnvptxinst --disable-bootstrap --with-cuda-driver-include=/usr/local/cuda-6.5/include --with-cuda-driver-lib=/usr/local/cuda-6.5/lib64
> make; make DESTDIR=/usr/src/gcc/objnvptxinst install
> 
> compilers now build

That looks very similar to what I'm using.  I currently install into
separate prefixes/DESTDIRS, because I have not yet verified that there
is no overlap in the installed files.


> offloading fails:
> 
> /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload @/tmp/cce9PdmR
> x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> mkoffload: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
> compilation terminated.
> lto-wrapper: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload returned 1 exit status
> compilation terminated.
> /usr/bin/ld: lto-wrapper failed
> collect2: error: ld returned 1 exit status
> 
> Is --enable-languages=c,c++,fortran,lto required when configuring the
> offload compiler?  It isn't required for intelmic.

Yes, exactly.  I assume the reason is that x86_64-intelmicemul-linux-gnu
defaults to supporting LTO, and due to this also defaults to building the
LTO front end.  I'll enhance the nvptx offloading documentation
accordingly.  Maybe we should add some "magic" to build the LTO front end
if --enable-as-accelerator-for=[...] has been specified?


Note that I recently added another prerequisite patch for nvptx
offloading to <https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading>:
<http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E>.
If that is not applied, you'll get run-time errors because in
libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_get_table, cuModuleGetFunction
can't find main$_omp_fn$0 and similar symbols.


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: nvptx offloading patches [3/n], RFD
  2015-02-18 12:10                   ` Thomas Schwinge
@ 2015-02-18 12:35                     ` Jakub Jelinek
  2015-02-19 10:50                       ` If we're building an offloading compiler, always enable the LTO front end (was: nvptx offloading patches [3/n], RFD) Thomas Schwinge
  0 siblings, 1 reply; 42+ messages in thread
From: Jakub Jelinek @ 2015-02-18 12:35 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Ilya Verbin, Bernd Schmidt, Richard Biener, Jan Hubicka,
	gcc-patches, Joseph Myers

On Wed, Feb 18, 2015 at 01:09:53PM +0100, Thomas Schwinge wrote:
> On Wed, 18 Feb 2015 12:34:38 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
> > > Do you literally have »nvptx-newlib symlinked into the gcc tree as
> > > newlib«?  If yes, then that should explain the problem: as I wrote in
> > > <http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E>,
> > > you need to »add a symbolic link to nvptx-newlib's newlib directory to
> > > the directory containing the GCC sources«, so not link [GCC]/newlib ->
> > > [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
> > > resolve the issue?
> 
> (It did.)  Can you suggest a better wording, to make this more clear in
> the documentation?

Your wording is fine, but perhaps it should also be listed on
wiki/Offloading and in doc/install.texi?

> > offloading fails:
> > 
> > /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload @/tmp/cce9PdmR
> > x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> > x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> > mkoffload: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
> > compilation terminated.
> > lto-wrapper: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload returned 1 exit status
> > compilation terminated.
> > /usr/bin/ld: lto-wrapper failed
> > collect2: error: ld returned 1 exit status
> > 
> > Is --enable-languages=c,c++,fortran,lto required when configuring the
> > offload compiler?  It isn't required for intelmic.
> 
> Yes, exactly.  I assume the reason is that x86_64-intelmicemul-linux-gnu
> defaults to supporting LTO, and due to this also defaults to building the
> LTO front end.  I'll enhance the nvptx offloading documentation
> accordingly.  Maybe we should add some "magic" to build the LTO front end
> if --enable-as-accelerator-for=[...] has been specified?

Toplevel configure.ac has:
  # If LTO is enabled, add the LTO front end.
  if test "$enable_lto" = "yes" ; then
    case ,${enable_languages}, in
      *,lto,*) ;;
      *) enable_languages="${enable_languages},lto" ;;
    esac
    if test "${build_lto_plugin}" = "yes" ; then
      configdirs="$configdirs lto-plugin"
    fi
  fi
so IMHO we want a similar snippet for the --enable-as-accelerator-for= case,
perhaps right below this one.  Not building the lto FE for the accelerator
compilers makes them completely useless, thus I think we really want to do
that automatically.
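
The comma-wrapped case idiom from that snippet can be tried in isolation.
This is only a sketch: add_lto is a hypothetical helper made up for
illustration and is not part of configure.ac.

```shell
# Hypothetical helper: append "lto" to a comma-separated language list
# unless it is already present.  Wrapping the list in commas lets the
# pattern *,lto,* match "lto" at the start, middle or end of the list.
add_lto () {
  langs=$1
  case ,${langs}, in
    *,lto,*) ;;
    *) langs="${langs},lto" ;;
  esac
  echo "$langs"
}

add_lto c,c++,fortran
add_lto c,lto
```

The first call prints the list with lto appended; the second prints it
unchanged, so applying the check twice is harmless.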

> Note that I recently added another prerequisite patch for nvptx
> offloading to <https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading>:
> <http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E>.
> If that is not applied, you'll get run-time errors because in
> libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_get_table, cuModuleGetFunction
> can't find main$_omp_fn$0 and similar symbols.

Can you adjust that to add a cgraph flag alongside the offloadable one,
and use that instead of the attribute?

	Jakub

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: nvptx offloading patches [3/n], RFD
  2015-02-17 16:40             ` Jakub Jelinek
  2015-02-18  9:12               ` Thomas Schwinge
@ 2015-02-19 10:20               ` Bernd Schmidt
  2015-02-19 12:02                 ` Offloading compilers' support libraries (was: nvptx offloading patches [3/n], RFD) Thomas Schwinge
  2015-02-20  9:33                 ` Offloading compilers' libgcc installation (was: nvptx offloading patches [3/n], RFD) Thomas Schwinge
  1 sibling, 2 replies; 42+ messages in thread
From: Bernd Schmidt @ 2015-02-19 10:20 UTC (permalink / raw)
  To: Jakub Jelinek, Joseph Myers
  Cc: Ilya Verbin, Thomas Schwinge, Richard Biener, Jan Hubicka, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2040 bytes --]

On 02/17/2015 05:40 PM, Jakub Jelinek wrote:
> On Tue, Feb 17, 2015 at 04:21:06PM +0000, Joseph Myers wrote:
>> On Tue, 17 Feb 2015, Jakub Jelinek wrote:
>>
>>> Third attempt failed with:
>>> ../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such file or directory
>>> compilation terminated.
>>> ../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
>>> make[2]: *** [realloc.o] Error 1
>>> make[2]: *** Waiting for unfinished jobs....
>>> make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
>>> I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it
>>> would be built in-tree, is that not the case (at least wiki/Offloading
>>> mentions that).  Or is it just that libgcc can't really have dependencies on
>>> newlib headers as newlib is built after libgcc?
>>
>> I've committed this patch to fix this last issue (the header dependence,
>> that is; I don't know about the in-tree build).
>
> Thanks, sure, libgcc now builds fine, the in-tree build fails:
> configure:4261: checking for C compiler default output file name
> configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem /usr/local/nvptx-none/sys-include    -g -O2   conftest.c  >&5
> error opening libc.a
> collect2: error: ld returned 1 exit status
> very early during in-tree newlib configure.

Not a fix for your problem, but there's a similar issue when trying to 
get at the libgcc for the nvptx accel compiler after it's been 
installed. The libgcc Makefile puts it in the wrong place - 
gcc/nvptx-none/accel/nvptx-none instead of gcc/host/accel/nvptx-none. 
The patch below corrects that and removes an intelmicemul special case 
which I believe has the same effect - Ilya, could you test this?
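
A minimal sketch of the path computation before and after the change,
mirroring the libgcc/configure.ac logic in the attached patch.  The
version and triplets are taken from this thread; libdir is an assumed
default, and host_noncanonical being nvptx-none is inferred from the
wrong path quoted above.

```shell
# Assumed inputs: libdir is a guess, the rest comes from the thread.
libdir=/usr/local/lib
version=5.0.0
host_noncanonical=nvptx-none     # libgcc's "host" is the offload target
target_noncanonical=nvptx-none
enable_as_accelerator_for=x86_64-pc-linux-gnu

# Same shape as the patched configure.ac fragment.
accel_dir_suffix=
real_host_noncanonical=${host_noncanonical}
if test x"$enable_as_accelerator_for" != x; then
  accel_dir_suffix=/accel/${target_noncanonical}
  real_host_noncanonical=${enable_as_accelerator_for}
fi

# libsubdir as Makefile.in computes it, without and with the patch.
old_libsubdir=${libdir}/gcc/${host_noncanonical}/${version}${accel_dir_suffix}
new_libsubdir=${libdir}/gcc/${real_host_noncanonical}/${version}${accel_dir_suffix}
echo "before: $old_libsubdir"
echo "after:  $new_libsubdir"
```

The "after" directory has the same gcc/[host]/[version]/accel/[target]
shape as the mkoffload path shown earlier in the thread.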


Bernd


[-- Attachment #2: lgcc-ptx.diff --]
[-- Type: text/x-patch, Size: 2944 bytes --]

Index: libgcc/Makefile.in
===================================================================
--- libgcc/Makefile.in	(revision 445788)
+++ libgcc/Makefile.in	(working copy)
@@ -45,6 +45,7 @@ fixed_point = @fixed_point@
 with_aix_soname = @with_aix_soname@
 
 host_noncanonical = @host_noncanonical@
+real_host_noncanonical = @real_host_noncanonical@
 target_noncanonical = @target_noncanonical@
 
 # List of extra object files that should be compiled for this target machine.
@@ -185,7 +186,7 @@ STRIP = @STRIP@
 STRIP_FOR_TARGET = $(STRIP)
 
 # Directory in which the compiler finds libraries etc.
-libsubdir = $(libdir)/gcc/$(host_noncanonical)/$(version)@accel_dir_suffix@
+libsubdir = $(libdir)/gcc/$(real_host_noncanonical)/$(version)@accel_dir_suffix@
 # Used to install the shared libgcc.
 slibdir = @slibdir@
 # Maybe used for DLLs on Windows targets.
Index: libgcc/configure.ac
===================================================================
--- libgcc/configure.ac	(revision 445788)
+++ libgcc/configure.ac	(working copy)
@@ -398,16 +398,14 @@ esac
 
 # Used for constructing correct paths for offload compilers.
 accel_dir_suffix=
+real_host_noncanonical=${host_noncanonical}
+echo "eaaf: $enable_as_accelerator_for"
 if test x"$enable_as_accelerator_for" != x; then
   accel_dir_suffix=/accel/${target_noncanonical}
-  case "${target_noncanonical}" in
-    *-intelmicemul-*)
-      # In this case we expect offload compiler to be built as native, so we
-      # need to change install directory for driver to be able to find libgcc.
-      host_noncanonical=${enable_as_accelerator_for} ;;
-  esac
+  real_host_noncanonical=${enable_as_accelerator_for}
 fi
 AC_SUBST(accel_dir_suffix)
+AC_SUBST(real_host_noncanonical)
 
 if test x"$enable_offload_targets" != x; then
   extra_parts="${extra_parts} crtoffloadbegin.o crtoffloadend.o"
Index: libgcc/configure
===================================================================
--- libgcc/configure	(revision 445788)
+++ libgcc/configure	(working copy)
@@ -566,6 +566,7 @@ sfp_machine_header
 set_use_emutls
 set_have_cc_tls
 vis_hide
+real_host_noncanonical
 accel_dir_suffix
 force_explicit_eh_registry
 fixed_point
@@ -4482,17 +4483,15 @@ esac
 
 # Used for constructing correct paths for offload compilers.
 accel_dir_suffix=
+real_host_noncanonical=${host_noncanonical}
+echo "eaaf: $enable_as_accelerator_for"
 if test x"$enable_as_accelerator_for" != x; then
   accel_dir_suffix=/accel/${target_noncanonical}
-  case "${target_noncanonical}" in
-    *-intelmicemul-*)
-      # In this case we expect offload compiler to be built as native, so we
-      # need to change install directory for driver to be able to find libgcc.
-      host_noncanonical=${enable_as_accelerator_for} ;;
-  esac
+  real_host_noncanonical=${enable_as_accelerator_for}
 fi
 
 
+
 if test x"$enable_offload_targets" != x; then
   extra_parts="${extra_parts} crtoffloadbegin.o crtoffloadend.o"
 fi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* If we're building an offloading compiler, always enable the LTO front end (was: nvptx offloading patches [3/n], RFD)
  2015-02-18 12:35                     ` Jakub Jelinek
@ 2015-02-19 10:50                       ` Thomas Schwinge
  2015-02-19 10:53                         ` Jakub Jelinek
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Schwinge @ 2015-02-19 10:50 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Ilya Verbin, Bernd Schmidt, Richard Biener, Jan Hubicka,
	gcc-patches, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 3753 bytes --]

Hi!

On Wed, 18 Feb 2015 13:35:18 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Feb 18, 2015 at 01:09:53PM +0100, Thomas Schwinge wrote:
> > On Wed, 18 Feb 2015 12:34:38 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > > offloading fails:
> > > 
> > > /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload @/tmp/cce9PdmR
> > > x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> > > x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> > > mkoffload: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
> > > compilation terminated.
> > > lto-wrapper: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload returned 1 exit status
> > > compilation terminated.
> > > /usr/bin/ld: lto-wrapper failed
> > > collect2: error: ld returned 1 exit status
> > > 
> > > Is --enable-languages=c,c++,fortran,lto required when configuring the
> > > offload compiler?  It isn't required for intelmic.
> > 
> > Yes, exactly.  I assume the reason is that x86_64-intelmicemul-linux-gnu
> > defaults to supporting LTO, and due to this also defaults to building the
> > LTO front end.  I'll enhance the nvptx offloading documentation
> > accordingly.  Maybe we should add some "magic" to build the LTO front end
> > if --enable-as-accelerator-for=[...] has been specified?
> 
> Toplevel configure.ac has:
>   # If LTO is enabled, add the LTO front end.
>   if test "$enable_lto" = "yes" ; then
>     case ,${enable_languages}, in
>       *,lto,*) ;;
>       *) enable_languages="${enable_languages},lto" ;;
>     esac
>     if test "${build_lto_plugin}" = "yes" ; then
>       configdirs="$configdirs lto-plugin"
>     fi
>   fi
> so IMHO we want a similar snippet for the --enable-as-accelerator-for= case,
> perhaps right below this one.  Not building the lto FE for the accelerator
> compilers makes them completely useless, thus I think we really want to do
> that automatically.

Like this?

commit 56c0312469f583ba3fa9fa2777981742ab6d6c75
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Thu Feb 19 11:41:23 2015 +0100

    If we're building an offloading compiler, always enable the LTO front end.
    
    	* configure.ac [--enable-as-accelerator-for] (enable_languages):
    	Make sure it contains lto.
    	* configure: Regenerate.
---
 configure    |    8 ++++++++
 configure.ac |    8 ++++++++
 2 files changed, 16 insertions(+)

diff --git configure configure
index dd794db..2afc52b 100755
--- configure
+++ configure
@@ -6217,6 +6217,14 @@ if test -d ${srcdir}/gcc; then
     fi
   fi
 
+  # If we're building an offloading compiler, add the LTO front end.
+  if test x"$enable_as_accelerator_for" != x ; then
+    case ,${enable_languages}, in
+      *,lto,*) ;;
+      *) enable_languages="${enable_languages},lto" ;;
+    esac
+  fi
+
   missing_languages=`echo ",$enable_languages," | sed -e s/,all,/,/ -e s/,c,/,/ `
   potential_languages=,c,
 
diff --git configure.ac configure.ac
index 4ea5e00..08a6fbf 100644
--- configure.ac
+++ configure.ac
@@ -1918,6 +1918,14 @@ if test -d ${srcdir}/gcc; then
     fi
   fi
 
+  # If we're building an offloading compiler, add the LTO front end.
+  if test x"$enable_as_accelerator_for" != x ; then
+    case ,${enable_languages}, in
+      *,lto,*) ;;
+      *) enable_languages="${enable_languages},lto" ;;
+    esac
+  fi
+
   missing_languages=`echo ",$enable_languages," | sed -e s/,all,/,/ -e s/,c,/,/ `
   potential_languages=,c,
 


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: If we're building an offloading compiler, always enable the LTO front end (was: nvptx offloading patches [3/n], RFD)
  2015-02-19 10:50                       ` If we're building an offloading compiler, always enable the LTO front end (was: nvptx offloading patches [3/n], RFD) Thomas Schwinge
@ 2015-02-19 10:53                         ` Jakub Jelinek
  2015-02-20  9:42                           ` Thomas Schwinge
  0 siblings, 1 reply; 42+ messages in thread
From: Jakub Jelinek @ 2015-02-19 10:53 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Ilya Verbin, Bernd Schmidt, Richard Biener, Jan Hubicka,
	gcc-patches, Joseph Myers

On Thu, Feb 19, 2015 at 11:48:17AM +0100, Thomas Schwinge wrote:
> Like this?

Yes.

> commit 56c0312469f583ba3fa9fa2777981742ab6d6c75
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Thu Feb 19 11:41:23 2015 +0100
> 
>     If we're building an offloading compiler, always enable the LTO front end.
>     
>     	* configure.ac [--enable-as-accelerator-for] (enable_languages):
>     	Make sure it contains lto.
>     	* configure: Regenerate.

Ok for trunk.

	Jakub

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Offloading compilers' support libraries (was: nvptx offloading patches [3/n], RFD)
  2015-02-19 10:20               ` nvptx offloading patches [3/n], RFD Bernd Schmidt
@ 2015-02-19 12:02                 ` Thomas Schwinge
  2015-02-19 12:11                   ` Offloading compilers' support libraries Bernd Schmidt
  2015-02-20  9:33                 ` Offloading compilers' libgcc installation (was: nvptx offloading patches [3/n], RFD) Thomas Schwinge
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Schwinge @ 2015-02-19 12:02 UTC (permalink / raw)
  To: Bernd Schmidt, Jakub Jelinek, Ilya Verbin
  Cc: Richard Biener, Jan Hubicka, gcc-patches, Joseph Myers,
	andrey.turetskiy, kirill.yukhin

[-- Attachment #1: Type: text/plain, Size: 5699 bytes --]

Hi!

On Thu, 19 Feb 2015 10:28:46 +0100, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 02/17/2015 05:40 PM, Jakub Jelinek wrote:
> > On Tue, Feb 17, 2015 at 04:21:06PM +0000, Joseph Myers wrote:
> >> On Tue, 17 Feb 2015, Jakub Jelinek wrote:
> >>
> >>> Third attempt failed with:
> >>> ../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such file or directory
> >>> compilation terminated.
> >>> ../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
> >>> make[2]: *** [realloc.o] Error 1
> >>> make[2]: *** Waiting for unfinished jobs....
> >>> make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
> >>> I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it
> >>> would be built in-tree, is that not the case (at least wiki/Offloading
> >>> mentions that).  Or is it just that libgcc can't really have dependencies on
> >>> newlib headers as newlib is built after libgcc?
> >>
> >> I've committed this patch to fix this last issue (the header dependence,
> >> that is; I don't know about the in-tree build).
> >
> > Thanks, sure, libgcc now builds fine, the in-tree build fails:
> > configure:4261: checking for C compiler default output file name
> > configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem /usr/local/nvptx-none/sys-include    -g -O2   conftest.c  >&5
> > error opening libc.a
> > collect2: error: ld returned 1 exit status
> > very early during in-tree newlib configure.
> 
> Not a fix for your problem, but there's a similar issue when trying to 
> get at the libgcc for the nvptx accel compiler after it's been 
> installed. The libgcc Makefile puts it in the wrong place - 
> gcc/nvptx-none/accel/nvptx-none instead of gcc/host/accel/nvptx-none. 

I also wondered about this; it's somewhere on my TODO list...

> The patch below corrects that and removes an intelmicemul special case 
> which I believe has the same effect - Ilya, could you test this?

This code has originally been posted in
<http://news.gmane.org/find-root.php?message_id=%3C20140926123551.GA6892%40msticlxl57.ims.intel.com%3E>.

This specific buglet aside (that the handling of intelmic and nvptx
offloading is inconsistent) -- will we have to add such handling to each
and every library that is built for the offloading compilers?  (Including
libraries that aren't part of the GCC sources, but may be built as part
of GCC's build process, such as when newlib is linked into [GCC]/newlib?)

One step back -- I understand correctly that this change is to make sure
that the regular target compiler and the offloading compilers don't clash
in their installed files' names?  (By putting them into the
accel/[offloading architecture]/ subdirectory?)  (As I've written in
<http://news.gmane.org/find-root.php?message_id=%3C87vbize7zi.fsf%40schwinge.name%3E>,
I currently install into separate prefixes/DESTDIRS, because I have not
yet verified that there is no overlap in the installed files.)

Then, why does this only apply to libsubdir?  What about header files,
documentation files, and so on?  (If they aren't expected to differ
between the target and offloading compilers, I think it's still not a
good idea to arbitrarily have them be overwritten by one respective build
tree's make install process.)  Should we have a more general solution to
this problem?

> Index: libgcc/Makefile.in
> ===================================================================
> --- libgcc/Makefile.in	(revision 445788)
> +++ libgcc/Makefile.in	(working copy)
> @@ -45,6 +45,7 @@ fixed_point = @fixed_point@
>  with_aix_soname = @with_aix_soname@
>  
>  host_noncanonical = @host_noncanonical@
> +real_host_noncanonical = @real_host_noncanonical@
>  target_noncanonical = @target_noncanonical@
>  
>  # List of extra object files that should be compiled for this target machine.
> @@ -185,7 +186,7 @@ STRIP = @STRIP@
>  STRIP_FOR_TARGET = $(STRIP)
>  
>  # Directory in which the compiler finds libraries etc.
> -libsubdir = $(libdir)/gcc/$(host_noncanonical)/$(version)@accel_dir_suffix@
> +libsubdir = $(libdir)/gcc/$(real_host_noncanonical)/$(version)@accel_dir_suffix@
>  # Used to install the shared libgcc.
>  slibdir = @slibdir@
>  # Maybe used for DLLs on Windows targets.
> Index: libgcc/configure.ac
> ===================================================================
> --- libgcc/configure.ac	(revision 445788)
> +++ libgcc/configure.ac	(working copy)
> @@ -398,16 +398,14 @@ esac
>  
>  # Used for constructing correct paths for offload compilers.
>  accel_dir_suffix=
> +real_host_noncanonical=${host_noncanonical}
> +echo "eaaf: $enable_as_accelerator_for"
>  if test x"$enable_as_accelerator_for" != x; then
>    accel_dir_suffix=/accel/${target_noncanonical}
> -  case "${target_noncanonical}" in
> -    *-intelmicemul-*)
> -      # In this case we expect offload compiler to be built as native, so we
> -      # need to change install directory for driver to be able to find libgcc.
> -      host_noncanonical=${enable_as_accelerator_for} ;;
> -  esac
> +  real_host_noncanonical=${enable_as_accelerator_for}
>  fi
>  AC_SUBST(accel_dir_suffix)
> +AC_SUBST(real_host_noncanonical)
>  
>  if test x"$enable_offload_targets" != x; then
>    extra_parts="${extra_parts} crtoffloadbegin.o crtoffloadend.o"


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Offloading compilers' support libraries
  2015-02-19 12:02                 ` Offloading compilers' support libraries (was: nvptx offloading patches [3/n], RFD) Thomas Schwinge
@ 2015-02-19 12:11                   ` Bernd Schmidt
  2015-02-19 12:19                     ` Thomas Schwinge
  0 siblings, 1 reply; 42+ messages in thread
From: Bernd Schmidt @ 2015-02-19 12:11 UTC (permalink / raw)
  To: Thomas Schwinge, Jakub Jelinek, Ilya Verbin
  Cc: Richard Biener, Jan Hubicka, gcc-patches, Joseph Myers,
	andrey.turetskiy, kirill.yukhin

On 02/19/2015 12:42 PM, Thomas Schwinge wrote:
> This specific buglet aside (that the handling of intelmic and nvptx
> offloading is inconsistent) -- will we have to add such handling to each
> and every library that is built for the offloading compilers?  (Including
> libraries that aren't part of the GCC sources, but may be built as part
> of GCC's build process, such as when newlib is linked into [GCC]/newlib?)

No, they go into different directories. Only libgcc.a (along with a very 
few other pieces) is installed under lib/gcc/...

> Then, why does this only apply to libsubdir?  What about header files,
> documentation files, and so on?  (If they aren't expected to differ
> between the target and offloading compilers, I think it's still not a
> good idea to arbitrarily have them be overwritten by one respective build
> tree's make install process.)  Should we have a more general solution to
> this problem?

That stuff goes into the normal lib and include directories. I'm 
guessing a sysroot is what you want to keep it separate.


Bernd

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Offloading compilers' support libraries
  2015-02-19 12:11                   ` Offloading compilers' support libraries Bernd Schmidt
@ 2015-02-19 12:19                     ` Thomas Schwinge
  2015-02-20 15:35                       ` Ilya Verbin
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Schwinge @ 2015-02-19 12:19 UTC (permalink / raw)
  To: Bernd Schmidt, Jakub Jelinek, Ilya Verbin
  Cc: Richard Biener, Jan Hubicka, gcc-patches, Joseph Myers,
	andrey.turetskiy, kirill.yukhin

[-- Attachment #1: Type: text/plain, Size: 1722 bytes --]

Hi!

On Thu, 19 Feb 2015 13:08:20 +0100, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 02/19/2015 12:42 PM, Thomas Schwinge wrote:
> > This specific buglet aside (that the handling of intelmic and nvptx
> > offloading is inconsistent) -- will we have to add such handling to each
> > and every library that is built for the offloading compilers?  (Including
> > libraries that aren't part of the GCC sources, but may be built as part
> > of GCC's build process, such as when newlib is linked into [GCC]/newlib?)
> 
> No, they go into different directories. Only libgcc.a (along with a very 
> few other pieces) is installed under lib/gcc/...

Thanks, I see.

> > Then, why does this only apply to libsubdir?  What about header files,
> > documentation files, and so on?  (If they aren't expected to differ
> > between the target and offloading compilers, I think it's still not a
> > good idea to arbitrarily have them be overwritten by the respective build
> > tree's make install process.)  Should we have a more general solution to
> > this problem?
> 
> That stuff goes into the normal lib and include directories. I'm 
> guessing a sysroot is what you want to keep it separate.

My assumption is that it is always safe to install non-native (that is
cross) GCC installations into the same prefix.  (Which would resolve this
problem of clashing file names for target and offloading compilers for
good.)

So, the next question is, instead of this special handling, why can't we
require the offloading compilers to always be configured as cross
compilers?  Or, why is it a requirement that the intelmic offloading
compiler is configured as a native compiler?


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]


* Offloading compilers' libgcc installation (was: nvptx offloading patches [3/n], RFD)
  2015-02-19 10:20               ` nvptx offloading patches [3/n], RFD Bernd Schmidt
  2015-02-19 12:02                 ` Offloading compilers' support libraries (was: nvptx offloading patches [3/n], RFD) Thomas Schwinge
@ 2015-02-20  9:33                 ` Thomas Schwinge
  2015-02-20 19:32                   ` Ilya Verbin
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Schwinge @ 2015-02-20  9:33 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Ilya Verbin, Richard Biener, Jan Hubicka, gcc-patches,
	Jakub Jelinek, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 959 bytes --]

Hi Bernd!

On Thu, 19 Feb 2015 10:28:46 +0100, Bernd Schmidt <bernds@codesourcery.com> wrote:
> issue when trying to 
> get at the libgcc for the nvptx accel compiler after it's been 
> installed. The libgcc Makefile puts it in the wrong place - 
> gcc/nvptx-none/accel/nvptx-none instead of gcc/host/accel/nvptx-none. 
> The patch below corrects that and removes an intelmicemul special case 
> which I believe has the same effect - Ilya, could you test this?

Works fine for me for intelmic (no changes), and nvptx (changes as
expected).

You'll want to remove the following debugging print statement before
commit:

> --- libgcc/configure.ac	(revision 445788)
> +++ libgcc/configure.ac	(working copy)
> @@ -398,16 +398,14 @@ esac
>  
>  # Used for constructing correct paths for offload compilers.
>  accel_dir_suffix=
> +real_host_noncanonical=${host_noncanonical}
> +echo "eaaf: $enable_as_accelerator_for"


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]


* Re: If we're building an offloading compiler, always enable the LTO front end (was: nvptx offloading patches [3/n], RFD)
  2015-02-19 10:53                         ` Jakub Jelinek
@ 2015-02-20  9:42                           ` Thomas Schwinge
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Schwinge @ 2015-02-20  9:42 UTC (permalink / raw)
  To: Jakub Jelinek, gcc-patches
  Cc: Ilya Verbin, Bernd Schmidt, Richard Biener, Jan Hubicka, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 652 bytes --]

Hi!

On Thu, 19 Feb 2015 11:51:02 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Feb 19, 2015 at 11:48:17AM +0100, Thomas Schwinge wrote:
> > Like this?
> 
> Yes.
> 
> > commit 56c0312469f583ba3fa9fa2777981742ab6d6c75
> > Author: Thomas Schwinge <thomas@codesourcery.com>
> > Date:   Thu Feb 19 11:41:23 2015 +0100
> > 
> >     If we're building an offloading compiler, always enable the LTO front end.
> >     
> >     	* configure.ac [--enable-as-accelerator-for] (enable_languages):
> >     	Make sure it contains lto.
> >     	* configure: Regenerate.
> 
> Ok for trunk.

Committed in r220838.


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]


* Re: Offloading compilers' support libraries
  2015-02-19 12:19                     ` Thomas Schwinge
@ 2015-02-20 15:35                       ` Ilya Verbin
  2015-02-20 19:59                         ` Ilya Verbin
  0 siblings, 1 reply; 42+ messages in thread
From: Ilya Verbin @ 2015-02-20 15:35 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Bernd Schmidt, Jakub Jelinek, gcc-patches, Kirill Yukhin

On Thu, Feb 19, 2015 at 13:17:37 +0100, Thomas Schwinge wrote:
> My assumption is that it is always safe to install non-native (that is
> cross) GCC installations into the same prefix.  (Which would resolve this
> problem of clashing file names for target and offloading compilers for
> good.)
> 
> So, the next question is, instead of this special handling, why can't we
> require the offloading compilers to always be configured as cross
> compilers?  Or, why is it a requirement that the intelmic offloading
> compiler is configured as a native compiler?

If I understand correctly, to build a cross compiler, we need to specify a path
to the target sysroot, even for x86_64-pc-linux-gnu to x86_64-intelmic-linux-gnu
cross.  Or is it possible to build a cross compiler without --with-sysroot ?

Thanks,
  -- Ilya


* Re: Offloading compilers' libgcc installation (was: nvptx offloading patches [3/n], RFD)
  2015-02-20  9:33                 ` Offloading compilers' libgcc installation (was: nvptx offloading patches [3/n], RFD) Thomas Schwinge
@ 2015-02-20 19:32                   ` Ilya Verbin
  2015-03-10 12:35                     ` Offloading compilers' libgcc installation Thomas Schwinge
  0 siblings, 1 reply; 42+ messages in thread
From: Ilya Verbin @ 2015-02-20 19:32 UTC (permalink / raw)
  To: Thomas Schwinge, Bernd Schmidt; +Cc: gcc-patches, Jakub Jelinek, Kirill Yukhin

On Fri, Feb 20, 2015 at 10:27:26 +0100, Thomas Schwinge wrote:
> On Thu, 19 Feb 2015 10:28:46 +0100, Bernd Schmidt <bernds@codesourcery.com> wrote:
> > issue when trying to 
> > get at the libgcc for the nvptx accel compiler after it's been 
> > installed. The libgcc Makefile puts it in the wrong place - 
> > gcc/nvptx-none/accel/nvptx-none instead of gcc/host/accel/nvptx-none. 
> > The patch below corrects that and removes an intelmicemul special case 
> > which I believe has the same effect - Ilya, could you test this?
> 
> Works fine for me for intelmic (no changes), and nvptx (changes as
> expected).

OK to me.

Thanks,
  -- Ilya


* Re: Offloading compilers' support libraries
  2015-02-20 15:35                       ` Ilya Verbin
@ 2015-02-20 19:59                         ` Ilya Verbin
  2015-02-26 19:35                           ` Ilya Verbin
  0 siblings, 1 reply; 42+ messages in thread
From: Ilya Verbin @ 2015-02-20 19:59 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Bernd Schmidt, Jakub Jelinek, gcc-patches, Kirill Yukhin

On Fri, Feb 20, 2015 at 18:05:01 +0300, Ilya Verbin wrote:
> On Thu, Feb 19, 2015 at 13:17:37 +0100, Thomas Schwinge wrote:
> > My assumption is that it is always safe to install non-native (that is
> > cross) GCC installations into the same prefix.  (Which would resolve this
> > problem of clashing file names for target and offloading compilers for
> > good.)
> > 
> > So, the next question is, instead of this special handling, why can't we
> > require the offloading compilers to always be configured as cross
> > compilers?  Or, why is it a requirement that the intelmic offloading
> > compiler is configured as a native compiler?
> 
> If I understand correctly, to build a cross compiler, we need to specify a path
> to the target sysroot, even for x86_64-pc-linux-gnu to x86_64-intelmic-linux-gnu
> cross.  Or is it possible to build a cross compiler without --with-sysroot ?

To be precise, for the cross compiler we need to specify a path to
--with-build-time-tools (rather than --with-sysroot).  The problem is that for
Intel MIC there are no special as/ld/etc.  So, is there an elegant way to build
a cross compiler with host's build time tools?

Thanks,
  -- Ilya


* Patch ping
  2015-02-18 10:00             ` Jakub Jelinek
@ 2015-02-25  8:51               ` Jakub Jelinek
  2015-02-25  9:30                 ` Richard Biener
  0 siblings, 1 reply; 42+ messages in thread
From: Jakub Jelinek @ 2015-02-25  8:51 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Wed, Feb 18, 2015 at 11:00:35AM +0100, Jakub Jelinek wrote:
> On Tue, Feb 17, 2015 at 11:00:14AM +0100, Richard Biener wrote:
> > I'm just looking for a way to make this less of a hack (and the LTO IL
> > less target dependent).  Not for GCC 5 for which something like your
> > patch is probably ok, but for the future.
> 
> So, given Ilya's and Thomas' testing, is this acceptable for now, and
> perhaps we can try to do something better for GCC 6?
> 
> Here is the patch with full ChangeLog:

I'd like to ping the following patch:
http://gcc.gnu.org/ml/gcc-patches/2015-02/msg01080.html

> 2015-02-18  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* passes.c (ipa_write_summaries_1): Call lto_output_init_mode_table.
> 	(ipa_write_optimization_summaries): Likewise.
> 	* tree-streamer.h: Include data-streamer.h.
> 	(streamer_mode_table): Declare extern variable.
> 	(bp_pack_machine_mode, bp_unpack_machine_mode): New inline functions.
> 	* lto-streamer-out.c (lto_output_init_mode_table,
> 	lto_write_mode_table): New functions.
> 	(produce_asm_for_decls): Call lto_write_mode_table when streaming
> 	offloading LTO.
> 	* lto-section-in.c (lto_section_name): Add "mode_table" entry.
> 	(lto_create_simple_input_block): Add mode_table argument to the
> 	lto_input_block constructors.
> 	* ipa-prop.c (ipa_prop_read_section, read_replacements_section):
> 	Likewise.
> 	* data-streamer-in.c (string_for_index): Likewise.
> 	* ipa-inline-analysis.c (inline_read_section): Likewise.
> 	* ipa-icf.c (sem_item_optimizer::read_section): Likewise.
> 	* lto-cgraph.c (input_cgraph_opt_section): Likewise.
> 	* lto-streamer-in.c (lto_read_body_or_constructor,
> 	lto_input_toplevel_asms): Likewise.
> 	(lto_input_mode_table): New function.
> 	* tree-streamer-out.c (pack_ts_fixed_cst_value_fields,
> 	pack_ts_decl_common_value_fields, pack_ts_type_common_value_fields):
> 	Use bp_pack_machine_mode.
> 	* real.h (struct real_format): Add name field.
> 	* lto-streamer.h (enum lto_section_type): Add LTO_section_mode_table.
> 	(class lto_input_block): Add mode_table member.
> 	(lto_input_block::lto_input_block): Add mode_table_ argument,
> 	initialize mode_table.
> 	(struct lto_file_decl_data): Add mode_table field.
> 	(lto_input_mode_table, lto_output_init_mode_table): New prototypes.
> 	* tree-streamer-in.c (unpack_ts_fixed_cst_value_fields,
> 	unpack_ts_decl_common_value_fields,
> 	unpack_ts_type_common_value_fields): Call bp_unpack_machine_mode.
> 	* tree-streamer.c (streamer_mode_table): New variable.
> 	* real.c (ieee_single_format, mips_single_format,
> 	motorola_single_format, spu_single_format, ieee_double_format,
> 	mips_double_format, motorola_double_format,
> 	ieee_extended_motorola_format, ieee_extended_intel_96_format,
> 	ieee_extended_intel_128_format, ieee_extended_intel_96_round_53_format,
> 	ibm_extended_format, mips_extended_format, ieee_quad_format,
> 	mips_quad_format, vax_f_format, vax_d_format, vax_g_format,
> 	decimal_single_format, decimal_double_format, decimal_quad_format,
> 	ieee_half_format, arm_half_format, real_internal_format): Add name
> 	field.
> 	* config/pdp11/pdp11.c (pdp11_f_format, pdp11_d_format): Likewise.
> lto/
> 	* lto.c (lto_mode_identity_table): New variable.
> 	(lto_read_decls): Add mode_table argument to the lto_input_block
> 	constructor.
> 	(lto_file_finalize): Initialize mode_table.
> 	(lto_init): Initialize lto_mode_identity_table.

	Jakub


* Re: Patch ping
  2015-02-25  8:51               ` Patch ping Jakub Jelinek
@ 2015-02-25  9:30                 ` Richard Biener
  2015-02-25 16:51                   ` Jakub Jelinek
  0 siblings, 1 reply; 42+ messages in thread
From: Richard Biener @ 2015-02-25  9:30 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On Wed, 25 Feb 2015, Jakub Jelinek wrote:

> On Wed, Feb 18, 2015 at 11:00:35AM +0100, Jakub Jelinek wrote:
> > On Tue, Feb 17, 2015 at 11:00:14AM +0100, Richard Biener wrote:
> > > I'm just looking for a way to make this less of a hack (and the LTO IL
> > > less target dependent).  Not for GCC 5 for which something like your
> > > patch is probably ok, but for the future.
> > 
> > So, given Ilya's and Thomas' testing, is this acceptable for now, and
> > perhaps we can try to do something better for GCC 6?
> > 
> > Here is the patch with full ChangeLog:
> 
> I'd like to ping the following patch:
> http://gcc.gnu.org/ml/gcc-patches/2015-02/msg01080.html

Oops, totally forgot about this one.

Shouldn't

+	    default:
+	      error ("unsupported mode %s\n", mname);

be a fatal_error ()?  After all if we hit this but continue we'll
stream random crap.  I also think we should be a bit more user-centric
here and maybe report "for host / offload target combination".
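
The concern can be illustrated with a small C sketch of building a
host-to-target mode table.  Everything here is hypothetical (the enum, the
name list, and the function build_mode_table), and matching modes by name
alone is a simplification made for brevity.  The point is only that the
caller gets an unmistakable failure for an unsupported mode and must stop,
because a table with an unfilled entry would translate every later use of
that mode to garbage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical target-side modes; real GCC modes come from modes.def.  */
enum target_mode { T_VOID, T_QI, T_SI, T_DI, T_SF, T_DF, T_NUM_MODES };

static const char *const target_mode_names[T_NUM_MODES] = {
    "VOID", "QI", "SI", "DI", "SF", "DF"
};

/* Fill TABLE[i] with the target mode matching host mode name
   HOST_NAMES[i].  Returns NULL if every host mode was mapped, otherwise
   the name of the first host mode the target lacks.  A non-NULL result
   must be treated as fatal by the caller: TABLE is incomplete.  */
const char *build_mode_table(const char *const *host_names, size_t n,
                             unsigned char *table)
{
    assert(host_names != NULL && table != NULL);
    for (size_t i = 0; i < n; i++) {
        int found = -1;
        for (int m = 0; m < T_NUM_MODES; m++) {
            if (strcmp(host_names[i], target_mode_names[m]) == 0) {
                found = m;
                break;
            }
        }
        if (found < 0)
            return host_names[i];   /* unsupported on this target */
        table[i] = (unsigned char) found;
    }
    return NULL;
}
```

A caller would print something like "unsupported mode XF for this host /
offload target combination" and exit as soon as a non-NULL name comes back.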

+static GTY(()) const unsigned char *lto_mode_identity_table;

why in GC memory?

Ok with changes along these lines.

Thanks,
Richard.



-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)


* Re: Patch ping
  2015-02-25  9:30                 ` Richard Biener
@ 2015-02-25 16:51                   ` Jakub Jelinek
  0 siblings, 0 replies; 42+ messages in thread
From: Jakub Jelinek @ 2015-02-25 16:51 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Wed, Feb 25, 2015 at 10:10:52AM +0100, Richard Biener wrote:
> Oops, totally forgot about this one.
> 
> Shouldn't
> 
> +	    default:
> +	      error ("unsupported mode %s\n", mname);
> 
> be a fatal_error ()?  After all if we hit this but continue we'll

Ok, I'll change it.

> stream random crap.  I also think we should be a bit more user-centric
> here and maybe report "for host / offload target combination".

Eventually, sure, we should be able (based on options) to turn all the
errors from the offloading compiler into warnings that just disable
offloading for that particular offloading target.

> +static GTY(()) const unsigned char *lto_mode_identity_table;
> 
> why in GC memory?

The reason is that it is referenced from a GC structure, and in the
offloading path the tables should be GC allocated, so that they can be
released when the corresponding GC structure holding the pointer goes away.
In non-offloading LTO, all those GC structures will contain the same
value, lto_mode_identity_table, but if that were heap allocated, GC
would be upset.
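
A rough C sketch of the sharing described above, with all names
(file_data, get_identity_table, translate_mode, NUM_MODES) hypothetical.
Memory management is deliberately left out: a static array stands in for
the identity table, whereas in the patch it is GC allocated for the reasons
just given.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_MODES 8   /* hypothetical; the real count depends on the target */

/* Stand-in for the per-file structure that holds a mode_table pointer.  */
struct file_data {
    const unsigned char *mode_table;
};

static unsigned char identity_table[NUM_MODES];
static int identity_ready;

/* One identity table (mode i maps to mode i), built on first use.  */
const unsigned char *get_identity_table(void)
{
    if (!identity_ready) {
        for (size_t i = 0; i < NUM_MODES; i++)
            identity_table[i] = (unsigned char) i;
        identity_ready = 1;
    }
    return identity_table;
}

/* Non-offloading LTO: every file shares the same identity table.  */
void init_plain_file(struct file_data *f)
{
    f->mode_table = get_identity_table();
}

/* Translate a streamed-in mode number through the file's table.  */
unsigned translate_mode(const struct file_data *f, unsigned streamed)
{
    assert(streamed < NUM_MODES);
    return f->mode_table[streamed];
}
```

In the offloading path each file would instead point at its own table read
from the stream, and translate_mode would return the target's number for
the host's mode.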

	Jakub


* Re: Offloading compilers' support libraries
  2015-02-20 19:59                         ` Ilya Verbin
@ 2015-02-26 19:35                           ` Ilya Verbin
  0 siblings, 0 replies; 42+ messages in thread
From: Ilya Verbin @ 2015-02-26 19:35 UTC (permalink / raw)
  To: Thomas Schwinge, Jakub Jelinek; +Cc: Bernd Schmidt, gcc-patches, Kirill Yukhin

On Fri, Feb 20, 2015 at 22:48:52 +0300, Ilya Verbin wrote:
> On Fri, Feb 20, 2015 at 18:05:01 +0300, Ilya Verbin wrote:
> > On Thu, Feb 19, 2015 at 13:17:37 +0100, Thomas Schwinge wrote:
> > > My assumption is that it is always safe to install non-native (that is
> > > cross) GCC installations into the same prefix.  (Which would resolve this
> > > problem of clashing file names for target and offloading compilers for
> > > good.)
> > > 
> > > So, the next question is, instead of this special handling, why can't we
> > > require the offloading compilers to always be configured as cross
> > > compilers?  Or, why is it a requirement that the intelmic offloading
> > > compiler is configured as a native compiler?
> > 
> > If I understand correctly, to build a cross compiler, we need to specify a path
> > to the target sysroot, even for x86_64-pc-linux-gnu to x86_64-intelmic-linux-gnu
> > cross.  Or is it possible to build a cross compiler without --with-sysroot ?
> 
> To be precise, for the cross compiler we need to specify a path to
> --with-build-time-tools (rather than --with-sysroot).  The problem is that for
> Intel MIC there are no special as/ld/etc.  So, is there an elegant way to build
> a cross compiler with host's build time tools?

Probably one can build an offloading cross compiler as:
configure --target=x86_64-intelmic-linux-gnu --enable-as-accelerator-for=x86_64-pc-linux-gnu --with-build-time-tools=/usr/bin/
?

But I'm getting errors for such configuration :(

In file included from ../../../gcc/libgcc/gthr.h:148:0,
                 from ../../../gcc/libgcc/libgcov-interface.c:27:
./gthr-default.h:35:21: fatal error: pthread.h: No such file or directory
compilation terminated.
make[2]: *** [_gcov_dump.o] Error 1
make[2]: Leaving directory `/build/x86_64-intelmic-linux-gnu/libgcc'
make[1]: *** [all-target-libgcc] Error 2
make[1]: Leaving directory `/build'
make: *** [all] Error 2

I really want to remove the intelmicemul target, since there is no difference in
the compiler between real and emulated compilation for Intel MIC.  The only
difference is which libcoi_host.so is used by liboffloadmic_host.so at run-time.

Thanks,
  -- Ilya


* Re: Offloading compilers' libgcc installation
  2015-02-20 19:32                   ` Ilya Verbin
@ 2015-03-10 12:35                     ` Thomas Schwinge
  2015-04-27 16:15                       ` Thomas Schwinge
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Schwinge @ 2015-03-10 12:35 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kirill Yukhin, Ilya Verbin, Bernd Schmidt, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 862 bytes --]

Hi!

All the "offloading folks" agree, but we need someone to "formally
approve" this patch?


On Fri, 20 Feb 2015 22:27:43 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> On Fri, Feb 20, 2015 at 10:27:26 +0100, Thomas Schwinge wrote:
> > On Thu, 19 Feb 2015 10:28:46 +0100, Bernd Schmidt <bernds@codesourcery.com> wrote:
> > > issue when trying to 
> > > get at the libgcc for the nvptx accel compiler after it's been 
> > > installed. The libgcc Makefile puts it in the wrong place - 
> > > gcc/nvptx-none/accel/nvptx-none instead of gcc/host/accel/nvptx-none. 
> > > The patch below corrects that and removes an intelmicemul special case 
> > > which I believe has the same effect - Ilya, could you test this?
> > 
> > Works fine for me for intelmic (no changes), and nvptx (changes as
> > expected).
> 
> OK to me.


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]


* Re: Offloading compilers' libgcc installation
  2015-03-10 12:35                     ` Offloading compilers' libgcc installation Thomas Schwinge
@ 2015-04-27 16:15                       ` Thomas Schwinge
  2015-04-27 16:16                         ` Jakub Jelinek
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Schwinge @ 2015-04-27 16:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kirill Yukhin, Ilya Verbin, Bernd Schmidt, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1256 bytes --]

Hi!

Ping.  (Or can Bernd just commit this patch,
<http://news.gmane.org/find-root.php?message_id=%3C54E5ACCE.7080502%40codesourcery.com%3E>
(with my review comment addressed,
<http://news.gmane.org/find-root.php?message_id=%3C87h9uhlypt.fsf%40schwinge.name%3E>),
given his nvptx architecture maintainership?)

On Tue, 10 Mar 2015 13:34:51 +0100, I wrote:
> All the "offloading folks" agree, but we need someone to "formally
> approve" this patch?
> 
> 
> On Fri, 20 Feb 2015 22:27:43 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> > On Fri, Feb 20, 2015 at 10:27:26 +0100, Thomas Schwinge wrote:
> > > On Thu, 19 Feb 2015 10:28:46 +0100, Bernd Schmidt <bernds@codesourcery.com> wrote:
> > > > issue when trying to 
> > > > get at the libgcc for the nvptx accel compiler after it's been 
> > > > installed. The libgcc Makefile puts it in the wrong place - 
> > > > gcc/nvptx-none/accel/nvptx-none instead of gcc/host/accel/nvptx-none. 
> > > > The patch below corrects that and removes an intelmicemul special case 
> > > > which I believe has the same effect - Ilya, could you test this?
> > > 
> > > Works fine for me for intelmic (no changes), and nvptx (changes as
> > > expected).
> > 
> > OK to me.


Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* Re: Offloading compilers' libgcc installation
  2015-04-27 16:15                       ` Thomas Schwinge
@ 2015-04-27 16:16                         ` Jakub Jelinek
  0 siblings, 0 replies; 42+ messages in thread
From: Jakub Jelinek @ 2015-04-27 16:16 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, Kirill Yukhin, Ilya Verbin, Bernd Schmidt

On Mon, Apr 27, 2015 at 06:14:36PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> Ping.  (Or can Bernd just commit this patch,
> <http://news.gmane.org/find-root.php?message_id=%3C54E5ACCE.7080502%40codesourcery.com%3E>
> (with my review comment addressed,
> <http://news.gmane.org/find-root.php?message_id=%3C87h9uhlypt.fsf%40schwinge.name%3E>),
> given his nvptx architecture maintainership?)

Ok for trunk.

	Jakub


Thread overview: 42+ messages
2014-11-01 11:58 nvptx offloading patches [3/n], RFD Bernd Schmidt
2014-11-03 22:28 ` Jeff Law
2014-11-04 12:38   ` nvptx offloading patches [3/n], i386 bits RFD Bernd Schmidt
2014-11-04 18:58     ` Uros Bizjak
2014-11-04 21:50     ` Jeff Law
2014-11-05  0:23       ` Bernd Schmidt
2014-11-14 18:42         ` Bernd Schmidt
2015-02-04 11:38 ` nvptx offloading patches [3/n], RFD Jakub Jelinek
2015-02-09 10:20   ` Richard Biener
2015-02-16 21:08     ` Jakub Jelinek
2015-02-16 21:35       ` Richard Biener
2015-02-16 21:44         ` Jakub Jelinek
2015-02-17 10:00           ` Richard Biener
2015-02-18 10:00             ` Jakub Jelinek
2015-02-25  8:51               ` Patch ping Jakub Jelinek
2015-02-25  9:30                 ` Richard Biener
2015-02-25 16:51                   ` Jakub Jelinek
2015-02-18  9:05           ` nvptx offloading patches [3/n], RFD Thomas Schwinge
2015-02-17 13:32       ` Ilya Verbin
2015-02-17 15:39         ` Jakub Jelinek
2015-02-17 16:21           ` Joseph Myers
2015-02-17 16:40             ` Jakub Jelinek
2015-02-18  9:12               ` Thomas Schwinge
2015-02-18 10:27                 ` Jakub Jelinek
2015-02-18 11:34                 ` Jakub Jelinek
2015-02-18 12:10                   ` Thomas Schwinge
2015-02-18 12:35                     ` Jakub Jelinek
2015-02-19 10:50                       ` If we're building an offloading compiler, always enable the LTO front end (was: nvptx offloading patches [3/n], RFD) Thomas Schwinge
2015-02-19 10:53                         ` Jakub Jelinek
2015-02-20  9:42                           ` Thomas Schwinge
2015-02-19 10:20               ` nvptx offloading patches [3/n], RFD Bernd Schmidt
2015-02-19 12:02                 ` Offloading compilers' support libraries (was: nvptx offloading patches [3/n], RFD) Thomas Schwinge
2015-02-19 12:11                   ` Offloading compilers' support libraries Bernd Schmidt
2015-02-19 12:19                     ` Thomas Schwinge
2015-02-20 15:35                       ` Ilya Verbin
2015-02-20 19:59                         ` Ilya Verbin
2015-02-26 19:35                           ` Ilya Verbin
2015-02-20  9:33                 ` Offloading compilers' libgcc installation (was: nvptx offloading patches [3/n], RFD) Thomas Schwinge
2015-02-20 19:32                   ` Ilya Verbin
2015-03-10 12:35                     ` Offloading compilers' libgcc installation Thomas Schwinge
2015-04-27 16:15                       ` Thomas Schwinge
2015-04-27 16:16                         ` Jakub Jelinek
