public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* LTO remapping/deduction of machine modes of types/decls
@ 2016-12-30 18:37 Alexander Monakov
  2017-01-02 10:19 ` Jakub Jelinek
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Monakov @ 2016-12-30 18:37 UTC (permalink / raw)
  To: gcc; +Cc: Richard Biener, Jakub Jelinek, Vladislav Ivanishin

[-- Attachment #1: Type: text/plain, Size: 3897 bytes --]

Hello, Richard, Jakub, community,

May I join/restart the old discussion about machine mode remapping at LTO
stream-in time.  To recap, when offloading to NVPTX was introduced, there
was a problem due to differences in the set of supported modes (e.g. there
was no 'XFmode' on NVPTX that would correspond to 'long double' tree type
node in GIMPLE LTO streams produced by x86 host compiler).

The current solution in GCC is to additionally stream a 'mode table' and use it
to remap numeric mode identifiers during LTO stream-in in all trees that have
modes.  This is the solution initially outlined by Jakub in the message
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00226.html .  In response to that,
Richard said,

> I think (also communicated that on IRC) we should instead try not streaming
> machine-modes at all but generating them at stream-in time via layout_type
> or layout_decl.

and later in the thread also:

> I'm just looking for a way to make this less of a hack (and the LTO IL
> less target dependent).  Not for GCC 5 for which something like your
> patch is probably ok, but for the future.

Now that we're in the future, I've asked Vlad Ivanishin (Cc'ed) to try and
implement Richard's approach.  The motivation is enhancing LTO for offloaded
code, in particular to expose library code for inlining.  In that scenario,
the current scheme has a problem that WPA can arbitrarily mix LTO sections
coming from libraries (where the modes don't need remapping) and LTO sections
produced by the host compiler.  Thus, mode_table would need to be only
selectively applied during stream-in, based on the origin of the section.  And,
we'd need to ensure that WPA duplicates mode tables across all ltrans units.

In light of that, I felt that trying Richard's approach would be proper.
Actually, I don't know why gimple/tree representation carries machine modes in
the first place; it seems to be redundant information deducible from type
information.

Vlad's current patch is adding mode deduction for types and decls, matches the
deduced mode against the streamed-in mode, and ICEs in case of mismatch. To be
clear, he's checking this for native LTO via lto-bootstrap, but nevertheless
it's a nice way of giving confidence that mode inference works as intended.

This seems to be fine for C, but in C++ we are seeing some hard-to-explain cases
where the deduced BLKmode for 7-byte-sized/4-byte-aligned base-class decl is
mismatching against deduced DImode.  The DImode would be obviously correct for
8-byte-sized decl of the corresponding type, but the base class decl does not
have 1 byte of padding in the tail.  What's worse, the issue is just for the
mode of the decl: the mode of the type is BLKmode, as we'd expect.

Unfortunately, just adjusting the C++ frontend to place BLKmode on the decl too
doesn't lead to immediate success, because decl modes have implications for
debug info generation, and the compiler starts ICE'ing there instead.

So we're hitting under-documented places in the compiler here, and personally
I don't have the confidence to judge how they're intended to work.

Basically for now my questions are:

1. Is there an intended invariant that decl modes should match type modes? It
appears that if there was, the above situation with C++ base objects would be a
violation.

2. Do you think we should continue digging in this direction?

I'm not sure how much it'd help a discussion, but for completeness Vlad's
current patchset is provided as attachments. Patch 1/3 adds mode inference for
types (only), patch 2 just reverts Jakub's additions of mode_table handling,
and finally patch 3 adds mode inference for decls, adds checking against
streamed-in modes, and shows where the attempted adjustments in the C++ frontend
and debug info generation were.  There are a few coding style violations; sorry;
I hope they are not too distracting.

Thanks.
Alexander

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-diff; name=0001-Infer-modes-from-types-after-LTO-streaming-in.patch, Size: 6632 bytes --]

From 58ad9d4d75cbc057c003c701ff3f0e6b8fa35e39 Mon Sep 17 00:00:00 2001
From: Vladislav Ivanishin <vlad@ispras.ru>
Date: Tue, 13 Dec 2016 14:58:26 +0300
Subject: [PATCH 1/3] Infer modes from types after LTO streaming-in

    * gcc/lto/lto.c: New function lto_infer_mode () which calls ...
    * gcc/stor-layout.c: ... the new function set_mode_for_type ().
    * gcc/stor-layout.h: Declare set_mode_for_type ().
---
 gcc/lto/lto.c     |  20 +++++++++
 gcc/stor-layout.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/stor-layout.h |   2 +
 3 files changed, 149 insertions(+)

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 6718fbbe..cec54e3 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -1656,6 +1656,25 @@ unify_scc (struct data_in *data_in, unsigned from,
   return unified_p;
 }
 
+static void
+lto_infer_mode (tree type)
+{
+  if (!TYPE_P (type))
+    return;
+
+  if (!COMPLETE_TYPE_P (type) && TYPE_MODE (type) == VOIDmode)
+    return;
+
+  /* C++ FE has complex logic for laying out classes.  We don't have
+     the information here to reproduce the decision process (nor do we
+     want to do it).  If the streamed mode is BLK (just like VOID it's BLK
+     everywhere) don't touch anything.  */
+  if (TREE_CODE (type) == RECORD_TYPE && TYPE_MODE (type) == BLKmode)
+    return;
+
+  SET_TYPE_MODE (type, VOIDmode);
+  set_mode_for_type (type);
+}
 
 /* Read all the symbols from buffer DATA, using descriptors in DECL_DATA.
    RESOLUTIONS is the set of symbols picked by the linker (read from the
@@ -1736,6 +1755,7 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
 		  seen_type = true;
 		  num_prevailing_types++;
 		  lto_fixup_prevailing_type (t);
+		  lto_infer_mode (t);
 		}
 	      /* Compute the canonical type of all types.
 		 ???  Should be able to assert that !TYPE_CANONICAL.  */
diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index d1738d2..a264ea1 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -2414,6 +2414,133 @@ layout_type (tree type)
     gcc_assert (!TYPE_ALIAS_SET_KNOWN_P (type));
 }
 
+/* Determine and set mode based on type.  This function mirrors the structure
+   of layout_type.  */
+
+void
+set_mode_for_type (tree type)
+{
+  if (type == error_mark_node)
+    return;
+
+  switch (TREE_CODE (type))
+    {
+    case LANG_TYPE:
+      /* This kind of type is the responsibility
+	 of the language-specific code.  */
+      gcc_unreachable ();
+
+    /* Fortran and C have different boolean types (32 vs. 8 bits).  */
+    case BOOLEAN_TYPE:
+    case INTEGER_TYPE:
+    case ENUMERAL_TYPE:
+      {
+	unsigned HOST_WIDE_INT bitsize = tree_to_uhwi (TYPE_SIZE (type));
+	SET_TYPE_MODE (type, smallest_mode_for_size (bitsize, MODE_INT));
+	break;
+      }
+
+    case REAL_TYPE:
+      SET_TYPE_MODE (type,
+		     mode_for_size (TYPE_PRECISION (type), MODE_FLOAT, 0));
+      break;
+
+    case FIXED_POINT_TYPE:
+      gcc_unreachable ();
+      break;
+
+    case COMPLEX_TYPE:
+      SET_TYPE_MODE (type,
+		     mode_for_size (2 * TYPE_PRECISION (TREE_TYPE (type)),
+				    (TREE_CODE (TREE_TYPE (type)) == REAL_TYPE
+				     ? MODE_COMPLEX_FLOAT : MODE_COMPLEX_INT),
+				     0));
+      break;
+
+    case VECTOR_TYPE:
+      {
+	int nunits = TYPE_VECTOR_SUBPARTS (type);
+	tree innertype = TREE_TYPE (type);
+
+	gcc_assert (!(nunits & (nunits - 1)));
+	/* Find an appropriate mode for the vector type.  */
+        SET_TYPE_MODE (type,
+                       mode_for_vector (TYPE_MODE (innertype), nunits));
+        break;
+      }
+
+    case VOID_TYPE:
+      SET_TYPE_MODE (type, VOIDmode);
+      break;
+
+    case POINTER_BOUNDS_TYPE:
+      SET_TYPE_MODE (type, targetm.chkp_bound_mode ());
+      // I suspect there's no test case where control would reach here.
+      // internal_error ("pointer-bounds-type");
+      break;
+
+    case OFFSET_TYPE:
+      /* A pointer might be MODE_PARTIAL_INT, but ptrdiff_t must be
+	 integral, which may be an __intN.  */
+      SET_TYPE_MODE (type, mode_for_size (POINTER_SIZE, MODE_INT, 0));
+      break;
+
+    case FUNCTION_TYPE:
+    case METHOD_TYPE:
+      /* It's hard to see what the mode and size of a function ought to
+	 be, but we do know the alignment is FUNCTION_BOUNDARY, so
+	 make it consistent with that.  */
+      SET_TYPE_MODE (type, mode_for_size (FUNCTION_BOUNDARY, MODE_INT, 0));
+      break;
+
+    case POINTER_TYPE:
+    case REFERENCE_TYPE:
+      gcc_assert (ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (TREE_TYPE (type))));
+      /* Fallthrough.  */
+
+    case NULLPTR_TYPE:
+      SET_TYPE_MODE (type, ptr_mode);
+      break;
+
+    case ARRAY_TYPE:
+      {
+	SET_TYPE_MODE (type, BLKmode);
+	if (TYPE_SIZE (type) != 0
+	    && ! targetm.member_type_forces_blk (type, VOIDmode)
+	    /* BLKmode elements force BLKmode aggregate;
+	       else extract/store fields may lose.  */
+	    && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
+		|| TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
+	  {
+	    SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
+						 TYPE_SIZE (type)));
+	    if (TYPE_MODE (type) != BLKmode
+		&& STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
+		&& TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
+	      {
+		TYPE_NO_FORCE_BLK (type) = 1; // FIXME: remove this side effect?
+		SET_TYPE_MODE (type, BLKmode);
+	      }
+	  }
+	break;
+      }
+
+    case RECORD_TYPE:
+    case UNION_TYPE:
+    case QUAL_UNION_TYPE:
+      compute_record_mode (type);
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  /* We should never see alias sets on incomplete aggregates.  And we
+     should not call layout_type on not incomplete aggregates.  */
+  if (AGGREGATE_TYPE_P (type))
+    gcc_assert (!TYPE_ALIAS_SET_KNOWN_P (type));
+}
+
 /* Return the least alignment required for type TYPE.  */
 
 unsigned int
diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
index b240ea1..ee44207 100644
--- a/gcc/stor-layout.h
+++ b/gcc/stor-layout.h
@@ -59,6 +59,8 @@ extern void layout_decl (tree, unsigned);
    TYPE_ALIGN and TYPE_MODE fields.  If called more than once on one
    node, does nothing except for the first time.  */
 extern void layout_type (tree);
+/* Calculate and set mode for the given type.  */
+extern void set_mode_for_type (tree);
 
 /* Return the least alignment in bytes required for type TYPE.  */
 extern unsigned int min_align_of_type (tree);
-- 
1.8.3.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: Type: text/x-diff; name=0002-Revert-passes.c-ipa_write_summaries_1-Call-lto_outpu.patch, Size: 34138 bytes --]

From 42b38fbf19e91bef3a6bf7297188bdc002d8adf7 Mon Sep 17 00:00:00 2001
From: Vladislav Ivanishin <vlad@ispras.ru>
Date: Fri, 30 Dec 2016 20:00:04 +0300
Subject: [PATCH 2/3] Revert: passes.c (ipa_write_summaries_1): Call
 lto_output_init_mode_table.

This reverts commit 2e971afda97df69eaf05cfa33b9a108cc8b97a45.
---
 gcc/config/pdp11/pdp11.c  |   6 +--
 gcc/data-streamer-in.c    |   2 +-
 gcc/gimple-streamer-in.c  |   1 +
 gcc/gimple-streamer-out.c |   1 +
 gcc/ipa-hsa.c             |   3 +-
 gcc/ipa-icf.c             |   2 +-
 gcc/ipa-inline-analysis.c |   4 +-
 gcc/ipa-prop.c            |   5 +-
 gcc/ipa-pure-const.c      |   1 +
 gcc/lto-cgraph.c          |   3 +-
 gcc/lto-section-in.c      |   4 +-
 gcc/lto-streamer-in.c     | 130 ++--------------------------------------------
 gcc/lto-streamer-out.c    |  94 +--------------------------------
 gcc/lto-streamer.h        |  17 ++----
 gcc/lto/lto.c             |  17 +-----
 gcc/passes.c              |   2 -
 gcc/real.c                |  72 +++++++++----------------
 gcc/real.h                |   1 -
 gcc/tree-streamer-in.c    |   7 +--
 gcc/tree-streamer-out.c   |  10 ++--
 gcc/tree-streamer.c       |   8 ---
 gcc/tree-streamer.h       |  17 +-----
 22 files changed, 61 insertions(+), 346 deletions(-)

diff --git a/gcc/config/pdp11/pdp11.c b/gcc/config/pdp11/pdp11.c
index ec3d61a..9a3313a 100644
--- a/gcc/config/pdp11/pdp11.c
+++ b/gcc/config/pdp11/pdp11.c
@@ -76,8 +76,7 @@ const struct real_format pdp11_f_format =
     false,
     false,
     false,
-    false,
-    "pdp11_f"
+    false
   };
 
 const struct real_format pdp11_d_format =
@@ -98,8 +97,7 @@ const struct real_format pdp11_d_format =
     false,
     false,
     false,
-    false,
-    "pdp11_d"
+    false
   };
 
 static void
diff --git a/gcc/data-streamer-in.c b/gcc/data-streamer-in.c
index 2625af6..a6cee7b 100644
--- a/gcc/data-streamer-in.c
+++ b/gcc/data-streamer-in.c
@@ -45,7 +45,7 @@ string_for_index (struct data_in *data_in, unsigned int loc, unsigned int *rlen)
     }
 
   /* Get the string stored at location LOC in DATA_IN->STRINGS.  */
-  lto_input_block str_tab (data_in->strings, loc - 1, data_in->strings_len, NULL);
+  lto_input_block str_tab (data_in->strings, loc - 1, data_in->strings_len);
   len = streamer_read_uhwi (&str_tab);
   *rlen = len;
 
diff --git a/gcc/gimple-streamer-in.c b/gcc/gimple-streamer-in.c
index f706db9..79266c7 100644
--- a/gcc/gimple-streamer-in.c
+++ b/gcc/gimple-streamer-in.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "cgraph.h"
 #include "value-prof.h"
+#include "data-streamer.h"
 
 /* Read a PHI function for basic block BB in function FN.  DATA_IN is
    the file being read.  IB is the input block to use for reading.  */
diff --git a/gcc/gimple-streamer-out.c b/gcc/gimple-streamer-out.c
index 35124bd..0bb13df 100644
--- a/gcc/gimple-streamer-out.c
+++ b/gcc/gimple-streamer-out.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "cgraph.h"
 #include "value-prof.h"
+#include "data-streamer.h"
 
 /* Output PHI function PHI to the main stream in OB.  */
 
diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
index 769657f..f407c47 100644
--- a/gcc/ipa-hsa.c
+++ b/gcc/ipa-hsa.c
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "symbol-summary.h"
 #include "hsa.h"
+#include "data-streamer.h"
 
 namespace {
 
@@ -221,7 +222,7 @@ ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size, file_data->mode_table);
+			   header->main_size);
 
   data_in
     = lto_data_in_create (file_data, (const char *) data + string_offset,
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index ef04c55..686ea24 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -2355,7 +2355,7 @@ sem_item_optimizer::read_section (lto_file_decl_data *file_data,
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset, 0,
-			   header->main_size, file_data->mode_table);
+			   header->main_size);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index f8ca825..45b8f22 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -95,6 +95,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cilk.h"
 #include "cfgexpand.h"
 #include "gimplify.h"
+#include "data-streamer.h"
 
 /* Estimate runtime of function can easilly run into huge numbers with many
    nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE * 2 in an
@@ -4198,8 +4199,7 @@ inline_read_section (struct lto_file_decl_data *file_data, const char *data,
   unsigned int i, count2, j;
   unsigned int f_count;
 
-  lto_input_block ib ((const char *) data + main_offset, header->main_size,
-		      file_data->mode_table);
+  lto_input_block ib ((const char *) data + main_offset, header->main_size);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 06a9aa2..a60aa0c 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "domwalk.h"
 #include "builtins.h"
+#include "data-streamer.h"
 
 /* Function summary where the parameter infos are actually stored. */
 ipa_node_params_t *ipa_node_params_sum = NULL;
@@ -4792,7 +4793,7 @@ ipa_prop_read_section (struct lto_file_decl_data *file_data, const char *data,
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size, file_data->mode_table);
+			   header->main_size);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
@@ -5013,7 +5014,7 @@ read_replacements_section (struct lto_file_decl_data *file_data,
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size, file_data->mode_table);
+			   header->main_size);
 
   data_in = lto_data_in_create (file_data, (const char *) data + string_offset,
 				header->string_size, vNULL);
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index 892bf46..b88d49c 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "intl.h"
 #include "opts.h"
+#include "data-streamer.h"
 
 /* Lattice values for const and pure functions.  Everything starts out
    being const, then may drop to pure and then neither depending on
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 0634779..e569151 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-utils.h"
 #include "omp-low.h"
 #include "ipa-chkp.h"
+#include "data-streamer.h"
 
 /* True when asm nodes has been output.  */
 bool asm_nodes_output = false;
@@ -2121,7 +2122,7 @@ input_cgraph_opt_section (struct lto_file_decl_data *file_data,
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size, file_data->mode_table);
+			   header->main_size);
 
   data_in =
     lto_data_in_create (file_data, (const char *) data + string_offset,
diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index 93b82be..f45d57a 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -51,7 +51,6 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "ipcp_trans",
   "icf",
   "offload_table",
-  "mode_table",
   "hsa"
 };
 
@@ -245,8 +244,7 @@ lto_create_simple_input_block (struct lto_file_decl_data *file_data,
     return NULL;
 
   *datar = data;
-  return new lto_input_block (data + main_offset, header->main_size,
-			      file_data->mode_table);
+  return new lto_input_block (data + main_offset, header->main_size);
 }
 
 
diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c
index 48a1c86..ace0f74 100644
--- a/gcc/lto-streamer-in.c
+++ b/gcc/lto-streamer-in.c
@@ -41,7 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "except.h"
 #include "cgraph.h"
 #include "cfgloop.h"
-
+#include "data-streamer.h"
 
 struct freeing_string_slot_hasher : string_slot_hasher
 {
@@ -1206,12 +1206,10 @@ lto_read_body_or_constructor (struct lto_file_decl_data *file_data, struct symta
 
       /* Set up the struct function.  */
       from = data_in->reader_cache->nodes.length ();
-      lto_input_block ib_main (data + main_offset, header->main_size,
-			       file_data->mode_table);
+      lto_input_block ib_main (data + main_offset, header->main_size);
       if (TREE_CODE (node->decl) == FUNCTION_DECL)
 	{
-	  lto_input_block ib_cfg (data + cfg_offset, header->cfg_size,
-				  file_data->mode_table);
+	  lto_input_block ib_cfg (data + cfg_offset, header->cfg_size);
 	  input_function (fn_decl, data_in, &ib_main, &ib_cfg);
 	}
       else
@@ -1479,8 +1477,7 @@ lto_input_toplevel_asms (struct lto_file_decl_data *file_data, int order_base)
 
   string_offset = sizeof (*header) + header->main_size;
 
-  lto_input_block ib (data + sizeof (*header), header->main_size,
-		      file_data->mode_table);
+  lto_input_block ib (data + sizeof (*header), header->main_size);
 
   data_in = lto_data_in_create (file_data, data + string_offset,
 			      header->string_size, vNULL);
@@ -1499,125 +1496,6 @@ lto_input_toplevel_asms (struct lto_file_decl_data *file_data, int order_base)
 }
 
 
-/* Input mode table.  */
-
-void
-lto_input_mode_table (struct lto_file_decl_data *file_data)
-{
-  size_t len;
-  const char *data = lto_get_section_data (file_data, LTO_section_mode_table,
-					   NULL, &len);
-  if (! data)
-    {
-      internal_error ("cannot read LTO mode table from %s",
-		      file_data->file_name);
-      return;
-    }
-
-  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
-  file_data->mode_table = table;
-  const struct lto_simple_header_with_strings *header
-    = (const struct lto_simple_header_with_strings *) data;
-  int string_offset;
-  struct data_in *data_in;
-  string_offset = sizeof (*header) + header->main_size;
-
-  lto_input_block ib (data + sizeof (*header), header->main_size, NULL);
-  data_in = lto_data_in_create (file_data, data + string_offset,
-				header->string_size, vNULL);
-  bitpack_d bp = streamer_read_bitpack (&ib);
-
-  table[VOIDmode] = VOIDmode;
-  table[BLKmode] = BLKmode;
-  unsigned int m;
-  while ((m = bp_unpack_value (&bp, 8)) != VOIDmode)
-    {
-      enum mode_class mclass
-	= bp_unpack_enum (&bp, mode_class, MAX_MODE_CLASS);
-      unsigned int size = bp_unpack_value (&bp, 8);
-      unsigned int prec = bp_unpack_value (&bp, 16);
-      machine_mode inner = (machine_mode) bp_unpack_value (&bp, 8);
-      unsigned int nunits = bp_unpack_value (&bp, 8);
-      unsigned int ibit = 0, fbit = 0;
-      unsigned int real_fmt_len = 0;
-      const char *real_fmt_name = NULL;
-      switch (mclass)
-	{
-	case MODE_FRACT:
-	case MODE_UFRACT:
-	case MODE_ACCUM:
-	case MODE_UACCUM:
-	  ibit = bp_unpack_value (&bp, 8);
-	  fbit = bp_unpack_value (&bp, 8);
-	  break;
-	case MODE_FLOAT:
-	case MODE_DECIMAL_FLOAT:
-	  real_fmt_name = bp_unpack_indexed_string (data_in, &bp,
-						    &real_fmt_len);
-	  break;
-	default:
-	  break;
-	}
-      /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
-	 if not found, fallback to all modes.  */
-      int pass;
-      for (pass = 0; pass < 2; pass++)
-	for (machine_mode mr = pass ? VOIDmode
-				    : GET_CLASS_NARROWEST_MODE (mclass);
-	     pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
-	     pass ? mr = (machine_mode) (mr + 1)
-		  : mr = GET_MODE_WIDER_MODE (mr))
-	  if (GET_MODE_CLASS (mr) != mclass
-	      || GET_MODE_SIZE (mr) != size
-	      || GET_MODE_PRECISION (mr) != prec
-	      || (inner == m
-		  ? GET_MODE_INNER (mr) != mr
-		  : GET_MODE_INNER (mr) != table[(int) inner])
-	      || GET_MODE_IBIT (mr) != ibit
-	      || GET_MODE_FBIT (mr) != fbit
-	      || GET_MODE_NUNITS (mr) != nunits)
-	    continue;
-	  else if ((mclass == MODE_FLOAT || mclass == MODE_DECIMAL_FLOAT)
-		   && strcmp (REAL_MODE_FORMAT (mr)->name, real_fmt_name) != 0)
-	    continue;
-	  else
-	    {
-	      table[m] = mr;
-	      pass = 2;
-	      break;
-	    }
-      unsigned int mname_len;
-      const char *mname = bp_unpack_indexed_string (data_in, &bp, &mname_len);
-      if (pass == 2)
-	{
-	  switch (mclass)
-	    {
-	    case MODE_VECTOR_INT:
-	    case MODE_VECTOR_FLOAT:
-	    case MODE_VECTOR_FRACT:
-	    case MODE_VECTOR_UFRACT:
-	    case MODE_VECTOR_ACCUM:
-	    case MODE_VECTOR_UACCUM:
-	      /* For unsupported vector modes just use BLKmode,
-		 if the scalar mode is supported.  */
-	      if (table[(int) inner] != VOIDmode)
-		{
-		  table[m] = BLKmode;
-		  break;
-		}
-	      /* FALLTHRU */
-	    default:
-	      fatal_error (UNKNOWN_LOCATION, "unsupported mode %s\n", mname);
-	      break;
-	    }
-	}
-    }
-  lto_data_in_delete (data_in);
-
-  lto_free_section_data (file_data, LTO_section_mode_table, NULL, data, len);
-}
-
-
 /* Initialization for the LTO reader.  */
 
 void
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index 6bb76cc..f47bc8b 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -40,7 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "builtins.h"
 #include "gomp-constants.h"
-
+#include "data-streamer.h"
 
 static void lto_write_tree (struct output_block*, tree, bool);
 
@@ -2694,96 +2694,6 @@ produce_symtab (struct output_block *ob)
 }
 
 
-/* Init the streamer_mode_table for output, where we collect info on what
-   machine_mode values have been streamed.  */
-void
-lto_output_init_mode_table (void)
-{
-  memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
-}
-
-
-/* Write the mode table.  */
-static void
-lto_write_mode_table (void)
-{
-  struct output_block *ob;
-  ob = create_output_block (LTO_section_mode_table);
-  bitpack_d bp = bitpack_create (ob->main_stream);
-
-  /* Ensure that for GET_MODE_INNER (m) != m we have
-     also the inner mode marked.  */
-  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
-    if (streamer_mode_table[i])
-      {
-	machine_mode m = (machine_mode) i;
-	if (GET_MODE_INNER (m) != m)
-	  streamer_mode_table[(int) GET_MODE_INNER (m)] = 1;
-      }
-  /* First stream modes that have GET_MODE_INNER (m) == m,
-     so that we can refer to them afterwards.  */
-  for (int pass = 0; pass < 2; pass++)
-    for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
-      if (streamer_mode_table[i] && i != (int) VOIDmode && i != (int) BLKmode)
-	{
-	  machine_mode m = (machine_mode) i;
-	  if ((GET_MODE_INNER (m) == m) ^ (pass == 0))
-	    continue;
-	  bp_pack_value (&bp, m, 8);
-	  bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
-	  bp_pack_value (&bp, GET_MODE_SIZE (m), 8);
-	  bp_pack_value (&bp, GET_MODE_PRECISION (m), 16);
-	  bp_pack_value (&bp, GET_MODE_INNER (m), 8);
-	  bp_pack_value (&bp, GET_MODE_NUNITS (m), 8);
-	  switch (GET_MODE_CLASS (m))
-	    {
-	    case MODE_FRACT:
-	    case MODE_UFRACT:
-	    case MODE_ACCUM:
-	    case MODE_UACCUM:
-	      bp_pack_value (&bp, GET_MODE_IBIT (m), 8);
-	      bp_pack_value (&bp, GET_MODE_FBIT (m), 8);
-	      break;
-	    case MODE_FLOAT:
-	    case MODE_DECIMAL_FLOAT:
-	      bp_pack_string (ob, &bp, REAL_MODE_FORMAT (m)->name, true);
-	      break;
-	    default:
-	      break;
-	    }
-	  bp_pack_string (ob, &bp, GET_MODE_NAME (m), true);
-	}
-  bp_pack_value (&bp, VOIDmode, 8);
-
-  streamer_write_bitpack (&bp);
-
-  char *section_name
-    = lto_get_section_name (LTO_section_mode_table, NULL, NULL);
-  lto_begin_section (section_name, !flag_wpa);
-  free (section_name);
-
-  /* The entire header stream is computed here.  */
-  struct lto_simple_header_with_strings header;
-  memset (&header, 0, sizeof (header));
-
-  /* Write the header.  */
-  header.major_version = LTO_major_version;
-  header.minor_version = LTO_minor_version;
-
-  header.main_size = ob->main_stream->total_size;
-  header.string_size = ob->string_stream->total_size;
-  lto_write_data (&header, sizeof header);
-
-  /* Put all of the gimple and the string table out the asm file as a
-     block of text.  */
-  lto_write_stream (ob->main_stream);
-  lto_write_stream (ob->string_stream);
-
-  lto_end_section ();
-  destroy_output_block (ob);
-}
-
-
 /* This pass is run after all of the functions are serialized and all
    of the IPA passes have written their serialized forms.  This pass
    causes the vector of all of the global decls and types used from
@@ -2891,6 +2801,4 @@ produce_asm_for_decls (void)
   lto_symtab_encoder_delete (ob->decl_state->symtab_node_encoder);
   lto_function_decl_states.release ();
   destroy_output_block (ob);
-  if (lto_stream_offload_p)
-    lto_write_mode_table ();
 }
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 0cb200e..d214ac3 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -243,7 +243,6 @@ enum lto_section_type
   LTO_section_ipcp_transform,
   LTO_section_ipa_icf,
   LTO_section_offload_table,
-  LTO_section_mode_table,
   LTO_section_ipa_hsa,
   LTO_N_SECTION_TYPES		/* Must be last.  */
 };
@@ -376,15 +375,12 @@ class lto_input_block
 public:
   /* Special constructor for the string table, it abuses this to
      do random access but use the uhwi decoder.  */
-  lto_input_block (const char *data_, unsigned int p_, unsigned int len_,
-		   const unsigned char *mode_table_)
-      : data (data_), mode_table (mode_table_), p (p_), len (len_) {}
-  lto_input_block (const char *data_, unsigned int len_,
-		   const unsigned char *mode_table_)
-      : data (data_), mode_table (mode_table_), p (0), len (len_) {}
+  lto_input_block (const char *data_, unsigned int p_, unsigned int len_)
+      : data (data_), p (p_), len (len_) {}
+  lto_input_block (const char *data_, unsigned int len_)
+      : data (data_), p (0), len (len_) {}
 
   const char *data;
-  const unsigned char *mode_table;
   unsigned int p;
   unsigned int len;
 };
@@ -597,9 +593,6 @@ struct GTY(()) lto_file_decl_data
 
   /* Map assigning declarations their resolutions.  */
   hash_map<tree, ld_plugin_symbol_resolution> * GTY((skip)) resolution_map;
-
-  /* Mode translation table.  */
-  const unsigned char *mode_table;
 };
 
 typedef struct lto_file_decl_data *lto_file_decl_data_ptr;
@@ -859,7 +852,6 @@ extern void lto_input_variable_constructor (struct lto_file_decl_data *,
 extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
 					      const char *);
 extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
-extern void lto_input_mode_table (struct lto_file_decl_data *);
 extern struct data_in *lto_data_in_create (struct lto_file_decl_data *,
 				    const char *, unsigned,
 				    vec<ld_plugin_symbol_resolution_t> );
@@ -894,7 +886,6 @@ void lto_output_decl_state_refs (struct output_block *,
 			         struct lto_output_stream *,
 			         struct lto_out_decl_state *);
 void lto_output_location (struct output_block *, struct bitpack_d *, location_t);
-void lto_output_init_mode_table (void);
 
 
 /* In lto-cgraph.c  */
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index cec54e3..a4f6c14 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "lto-symtab.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "data-streamer.h"
 
 
 /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver.  */
@@ -59,8 +60,6 @@ static int lto_parallelism;
 
 static GTY(()) tree first_personality_decl;
 
-static GTY(()) const unsigned char *lto_mode_identity_table;
-
 /* Returns a hash code for P.  */
 
 static hashval_t
@@ -1694,7 +1693,7 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
   uint32_t num_decl_states;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size, decl_data->mode_table);
+			   header->main_size);
 
   data_in = lto_data_in_create (decl_data, (const char *) data + string_offset,
 				header->string_size, resolutions);
@@ -2048,11 +2047,6 @@ lto_file_finalize (struct lto_file_decl_data *file_data, lto_file *file)
 
   file_data->renaming_hash_table = lto_create_renaming_table ();
   file_data->file_name = file->filename;
-#ifdef ACCEL_COMPILER
-  lto_input_mode_table (file_data);
-#else
-  file_data->mode_table = lto_mode_identity_table;
-#endif
   data = lto_get_section_data (file_data, LTO_section_decls, NULL, &len);
   if (data == NULL)
     {
@@ -3239,13 +3233,6 @@ lto_init (void)
   memset (&lto_stats, 0, sizeof (lto_stats));
   bitmap_obstack_initialize (NULL);
   gimple_register_cfg_hooks ();
-#ifndef ACCEL_COMPILER
-  unsigned char *table
-    = ggc_vec_alloc<unsigned char> (MAX_MACHINE_MODE);
-  for (int m = 0; m < MAX_MACHINE_MODE; m++)
-    table[m] = m;
-  lto_mode_identity_table = table;
-#endif
 }
 
 /* Create artificial pointers for "omp declare target link" vars.  */
diff --git a/gcc/passes.c b/gcc/passes.c
index 1bf89ed..8b77af6 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2489,7 +2489,6 @@ ipa_write_summaries_1 (lto_symtab_encoder_t encoder)
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;
 
-  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
 
   gcc_assert (!flag_wpa);
@@ -2611,7 +2610,6 @@ ipa_write_optimization_summaries (lto_symtab_encoder_t encoder)
   lto_symtab_encoder_iterator lsei;
   state->symtab_node_encoder = encoder;
 
-  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
   for (lsei = lsei_start_function_in_partition (encoder);
        !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
diff --git a/gcc/real.c b/gcc/real.c
index fbebbf0..efbe33b 100644
--- a/gcc/real.c
+++ b/gcc/real.c
@@ -3049,8 +3049,7 @@ const struct real_format ieee_single_format =
     true,
     true,
     true,
-    false,
-    "ieee_single"
+    false
   };
 
 const struct real_format mips_single_format =
@@ -3071,8 +3070,7 @@ const struct real_format mips_single_format =
     true,
     true,
     false,
-    true,
-    "mips_single"
+    true
   };
 
 const struct real_format motorola_single_format =
@@ -3093,8 +3091,7 @@ const struct real_format motorola_single_format =
     true,
     true,
     true,
-    true,
-    "motorola_single"
+    true
   };
 
 /*  SPU Single Precision (Extended-Range Mode) format is the same as IEEE
@@ -3126,8 +3123,7 @@ const struct real_format spu_single_format =
     true,
     true,
     false,
-    false,
-    "spu_single"
+    false
   };
 \f
 /* IEEE double-precision format.  */
@@ -3336,8 +3332,7 @@ const struct real_format ieee_double_format =
     true,
     true,
     true,
-    false,
-    "ieee_double"
+    false
   };
 
 const struct real_format mips_double_format =
@@ -3358,8 +3353,7 @@ const struct real_format mips_double_format =
     true,
     true,
     false,
-    true,
-    "mips_double"
+    true
   };
 
 const struct real_format motorola_double_format =
@@ -3380,8 +3374,7 @@ const struct real_format motorola_double_format =
     true,
     true,
     true,
-    true,
-    "motorola_double"
+    true
   };
 \f
 /* IEEE extended real format.  This comes in three flavors: Intel's as
@@ -3725,8 +3718,7 @@ const struct real_format ieee_extended_motorola_format =
     true,
     true,
     true,
-    true,
-    "ieee_extended_motorola"
+    true
   };
 
 const struct real_format ieee_extended_intel_96_format =
@@ -3747,8 +3739,7 @@ const struct real_format ieee_extended_intel_96_format =
     true,
     true,
     true,
-    false,
-    "ieee_extended_intel_96"
+    false
   };
 
 const struct real_format ieee_extended_intel_128_format =
@@ -3769,8 +3760,7 @@ const struct real_format ieee_extended_intel_128_format =
     true,
     true,
     true,
-    false,
-    "ieee_extended_intel_128"
+    false
   };
 
 /* The following caters to i386 systems that set the rounding precision
@@ -3793,8 +3783,7 @@ const struct real_format ieee_extended_intel_96_round_53_format =
     true,
     true,
     true,
-    false,
-    "ieee_extended_intel_96_round_53"
+    false
   };
 \f
 /* IBM 128-bit extended precision format: a pair of IEEE double precision
@@ -3882,8 +3871,7 @@ const struct real_format ibm_extended_format =
     true,
     true,
     true,
-    false,
-    "ibm_extended"
+    false
   };
 
 const struct real_format mips_extended_format =
@@ -3904,8 +3892,7 @@ const struct real_format mips_extended_format =
     true,
     true,
     false,
-    true,
-    "mips_extended"
+    true
   };
 
 \f
@@ -4168,8 +4155,7 @@ const struct real_format ieee_quad_format =
     true,
     true,
     true,
-    false,
-    "ieee_quad"
+    false
   };
 
 const struct real_format mips_quad_format =
@@ -4190,8 +4176,7 @@ const struct real_format mips_quad_format =
     true,
     true,
     false,
-    true,
-    "mips_quad"
+    true
   };
 \f
 /* Descriptions of VAX floating point formats can be found beginning at
@@ -4491,8 +4476,7 @@ const struct real_format vax_f_format =
     false,
     false,
     false,
-    false,
-    "vax_f"
+    false
   };
 
 const struct real_format vax_d_format =
@@ -4513,8 +4497,7 @@ const struct real_format vax_d_format =
     false,
     false,
     false,
-    false,
-    "vax_d"
+    false
   };
 
 const struct real_format vax_g_format =
@@ -4535,8 +4518,7 @@ const struct real_format vax_g_format =
     false,
     false,
     false,
-    false,
-    "vax_g"
+    false
   };
 \f
 /* Encode real R into a single precision DFP value in BUF.  */
@@ -4612,8 +4594,7 @@ const struct real_format decimal_single_format =
     true,
     true,
     true,
-    false,
-    "decimal_single"
+    false
   };
 
 /* Double precision decimal floating point (IEEE 754). */
@@ -4635,8 +4616,7 @@ const struct real_format decimal_double_format =
     true,
     true,
     true,
-    false,
-    "decimal_double"
+    false
   };
 
 /* Quad precision decimal floating point (IEEE 754). */
@@ -4658,8 +4638,7 @@ const struct real_format decimal_quad_format =
     true,
     true,
     true,
-    false,
-    "decimal_quad"
+    false
   };
 \f
 /* Encode half-precision floats.  This routine is used both for the IEEE
@@ -4796,8 +4775,7 @@ const struct real_format ieee_half_format =
     true,
     true,
     true,
-    false,
-    "ieee_half"
+    false
   };
 
 /* ARM's alternative half-precision format, similar to IEEE but with
@@ -4821,8 +4799,7 @@ const struct real_format arm_half_format =
     true,
     true,
     false,
-    false,
-    "arm_half"
+    false
   };
 \f
 /* A synthetic "format" for internal arithmetic.  It's the size of the
@@ -4867,8 +4844,7 @@ const struct real_format real_internal_format =
     false,
     true,
     true,
-    false,
-    "real_internal"
+    false
   };
 \f
 /* Calculate X raised to the integer exponent N in format FMT and store
diff --git a/gcc/real.h b/gcc/real.h
index be95161..988bd52 100644
--- a/gcc/real.h
+++ b/gcc/real.h
@@ -150,7 +150,6 @@ struct real_format
   bool has_signed_zero;
   bool qnan_msb_set;
   bool canonical_nan_lsbs_set;
-  const char *name;
 };
 
 
diff --git a/gcc/tree-streamer-in.c b/gcc/tree-streamer-in.c
index 1da01e2..00d8699 100644
--- a/gcc/tree-streamer-in.c
+++ b/gcc/tree-streamer-in.c
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "ipa-chkp.h"
 #include "gomp-constants.h"
+#include "data-streamer.h"
 
 
 /* Read a STRING_CST from the string table in DATA_IN using input
@@ -204,7 +205,7 @@ static void
 unpack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
 {
   FIXED_VALUE_TYPE *fp = ggc_alloc<fixed_value> ();
-  fp->mode = bp_unpack_machine_mode (bp);
+  fp->mode = bp_unpack_enum (bp, machine_mode, 1 << 8);
   fp->data.low = bp_unpack_var_len_int (bp);
   fp->data.high = bp_unpack_var_len_int (bp);
   TREE_FIXED_CST_PTR (expr) = fp;
@@ -216,7 +217,7 @@ unpack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
 static void
 unpack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  DECL_MODE (expr) = bp_unpack_machine_mode (bp);
+  DECL_MODE (expr) = bp_unpack_enum (bp, machine_mode, 1 << 8);
   DECL_NONLOCAL (expr) = (unsigned) bp_unpack_value (bp, 1);
   DECL_VIRTUAL_P (expr) = (unsigned) bp_unpack_value (bp, 1);
   DECL_IGNORED_P (expr) = (unsigned) bp_unpack_value (bp, 1);
@@ -356,7 +357,7 @@ unpack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
 {
   machine_mode mode;
 
-  mode = bp_unpack_machine_mode (bp);
+  mode = bp_unpack_enum (bp, machine_mode, 1 << 8);
   SET_TYPE_MODE (expr, mode);
   TYPE_STRING_FLAG (expr) = (unsigned) bp_unpack_value (bp, 1);
   /* TYPE_NO_FORCE_BLK is private to stor-layout and need
diff --git a/gcc/tree-streamer-out.c b/gcc/tree-streamer-out.c
index f9272d6..4ec122c 100644
--- a/gcc/tree-streamer-out.c
+++ b/gcc/tree-streamer-out.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "alias.h"
 #include "stor-layout.h"
 #include "gomp-constants.h"
+#include "data-streamer.h"
 
 
 /* Output the STRING constant to the string
@@ -173,7 +174,7 @@ static void
 pack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
 {
   struct fixed_value fv = TREE_FIXED_CST (expr);
-  bp_pack_machine_mode (bp, fv.mode);
+  bp_pack_enum (bp, machine_mode, 1 << 8, fv.mode);
   bp_pack_var_len_int (bp, fv.data.low);
   bp_pack_var_len_int (bp, fv.data.high);
 }
@@ -184,7 +185,7 @@ pack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
 static void
 pack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  bp_pack_machine_mode (bp, DECL_MODE (expr));
+  bp_pack_enum (bp, machine_mode, 1 << 8, DECL_MODE (expr));
   bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
   bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
   bp_pack_value (bp, DECL_IGNORED_P (expr), 1);
@@ -308,10 +309,7 @@ pack_ts_function_decl_value_fields (struct bitpack_d *bp, tree expr)
 static void
 pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  /* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using target_flags
-     not necessary valid in a global context.
-     Use the raw value previously set by layout_type.  */
-  bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
+  bp_pack_enum (bp, machine_mode, 1 << 8, TYPE_MODE_RAW (expr));
   bp_pack_value (bp, TYPE_STRING_FLAG (expr), 1);
   /* TYPE_NO_FORCE_BLK is private to stor-layout and need
      no streaming.  */
diff --git a/gcc/tree-streamer.c b/gcc/tree-streamer.c
index 7ea7096..71cef4d 100644
--- a/gcc/tree-streamer.c
+++ b/gcc/tree-streamer.c
@@ -29,14 +29,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-streamer.h"
 #include "cgraph.h"
 
-/* Table indexed by machine_mode, used for 2 different purposes.
-   During streaming out we record there non-zero value for all modes
-   that were streamed out.
-   During streaming in, we translate the on the disk mode using this
-   table.  For normal LTO it is set to identity, for ACCEL_COMPILER
-   depending on the mode_table content.  */
-unsigned char streamer_mode_table[1 << 8];
-
 /* Check that all the TS_* structures handled by the streamer_write_* and
    streamer_read_* routines are exactly ALL the structures defined in
    treestruct.def.  */
diff --git a/gcc/tree-streamer.h b/gcc/tree-streamer.h
index 41a76a2..9d4ea25 100644
--- a/gcc/tree-streamer.h
+++ b/gcc/tree-streamer.h
@@ -23,7 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_TREE_STREAMER_H
 
 #include "streamer-hooks.h"
-#include "data-streamer.h"
+#include "lto-streamer.h"
 
 /* Cache of pickled nodes.  Used to avoid writing the same node more
    than once.  The first time a tree node is streamed out, it is
@@ -90,7 +90,6 @@ void streamer_write_integer_cst (struct output_block *, tree, bool);
 void streamer_write_builtin (struct output_block *, tree);
 
 /* In tree-streamer.c.  */
-extern unsigned char streamer_mode_table[1 << 8];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
 				 hashval_t, unsigned *);
@@ -119,19 +118,5 @@ streamer_tree_cache_get_hash (struct streamer_tree_cache_d *cache, unsigned ix)
   return cache->hashes[ix];
 }
 
-static inline void
-bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
-{
-  streamer_mode_table[mode] = 1;
-  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
-}
-
-static inline machine_mode
-bp_unpack_machine_mode (struct bitpack_d *bp)
-{
-  return (machine_mode)
-	   ((struct lto_input_block *)
-	    bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
-}
 
 #endif  /* GCC_TREE_STREAMER_H  */
-- 
1.8.3.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: Type: text/x-diff; name=0003-Infer-modes-for-decls.patch, Size: 6812 bytes --]

From f5caef0d4540334108fffd2ad38d3cb539e5bff3 Mon Sep 17 00:00:00 2001
From: Vladislav Ivanishin <vlad@ispras.ru>
Date: Fri, 30 Dec 2016 20:04:41 +0300
Subject: [PATCH 3/3] Infer modes for decls

* gcc/cfgexpand.c
* gcc/cp/class.c
* gcc/lto/lto.c
* gcc/stor-layout.c
* gcc/stor-layout.h
---
 gcc/cfgexpand.c   |  2 +-
 gcc/cp/class.c    |  1 +
 gcc/lto/lto.c     | 62 +++++++++++++++++++++++++++++++++++++++++++------------
 gcc/stor-layout.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/stor-layout.h |  2 ++
 5 files changed, 108 insertions(+), 14 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 4ac8421..eeee07d 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -4481,7 +4481,7 @@ expand_debug_expr (tree exp)
         if (bitpos < 0)
           return NULL;
 
-	if (GET_MODE (op0) == BLKmode)
+	if (GET_MODE (op0) == BLKmode || mode == BLKmode)
 	  return NULL;
 
 	if ((bitpos % BITS_PER_UNIT) == 0
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index fc47f91..b907abb9 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -6406,6 +6406,7 @@ layout_class_type (tree t, tree *virtuals_p)
 	      TREE_TYPE (field)
 		= cp_build_qualified_type (TREE_TYPE (field),
 					   cp_type_quals (ftype));
+	      DECL_MODE (field) = TYPE_MODE (TREE_TYPE (field));
 	    }
 	}
 
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index a4f6c14..29f18e7 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -1656,23 +1656,52 @@ unify_scc (struct data_in *data_in, unsigned from,
 }
 
 static void
-lto_infer_mode (tree type)
+lto_infer_mode (tree expr)
 {
-  if (!TYPE_P (type))
-    return;
+  if (TYPE_P (expr))
+    {
+      tree type = expr;
 
-  if (!COMPLETE_TYPE_P (type) && TYPE_MODE (type) == VOIDmode)
-    return;
+      machine_mode mode = TYPE_MODE (type);
 
-  /* C++ FE has complex logic for laying out classes.  We don't have
-     the information here to reproduce the decision process (nor do we
-     want to do it).  If the streamed mode is BLK (just like VOID it's BLK
-     everywhere) don't touch anything.  */
-  if (TREE_CODE (type) == RECORD_TYPE && TYPE_MODE (type) == BLKmode)
-    return;
+      if (!COMPLETE_TYPE_P (type) && TYPE_MODE (type) == VOIDmode)
+        return;
 
-  SET_TYPE_MODE (type, VOIDmode);
-  set_mode_for_type (type);
+      /* C++ FE has complex logic for laying out classes.  We don't have
+         the information here to reproduce the decision process (nor do we
+         want to do it).  If the streamed mode is BLK (just like VOID it's BLK
+         everywhere) don't touch anything.  */
+      if (TREE_CODE (type) == RECORD_TYPE && TYPE_MODE (type) == BLKmode)
+        return;
+
+      SET_TYPE_MODE (type, VOIDmode);
+      set_mode_for_type (type);
+
+      machine_mode our = TYPE_MODE (type);
+
+      if (mode != our)
+        {
+          if (TREE_CODE (expr) == FUNCTION_TYPE ||
+              TREE_CODE (expr) == METHOD_TYPE ||
+              tree_to_uhwi (TYPE_SIZE (expr)) > MAX_FIXED_MODE_SIZE)
+            ;
+          else
+            internal_error ("my-err");
+        }
+    }
+  else if (DECL_P (expr))
+    {
+      machine_mode streamed = DECL_MODE (expr);
+      if (DECL_MODE (expr) != VOIDmode && DECL_MODE (expr) != BLKmode)
+        {
+          DECL_MODE (expr) = VOIDmode;
+          set_mode_for_decl (expr);
+          if (DECL_MODE (expr) != streamed)
+            internal_error ("DECL_MODE - err");
+        }
+    }
+  else
+    gcc_unreachable ();
 }
 
 /* Read all the symbols from buffer DATA, using descriptors in DECL_DATA.
@@ -1794,6 +1823,13 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
 		    vec_safe_push (tree_with_vars, t);
 		}
 	    }
+          for (unsigned i = 0; i < len; ++i)
+            {
+              tree t = streamer_tree_cache_get_tree (data_in->reader_cache,
+                                                     from + i);
+              if (DECL_P (t))
+                lto_infer_mode (t);
+            }
 	  if (seen_type)
 	    num_type_scc_trees += len;
 	}
diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index a264ea1..d1697b2 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -2541,6 +2541,61 @@ set_mode_for_type (tree type)
     gcc_assert (!TYPE_ALIAS_SET_KNOWN_P (type));
 }
 
+
+/* Determine and set mode for DECL.  Assume other attributes of DECL are already
+   set.  */
+
+// Well, there are actually 2 options:
+// 1) try following layout_decl, where in most cases mode is inferred from type's mode
+//    (but there are exceptions such as bit fields).
+// 2) use other attributes of the DECL (such as size) to reconstruct mode.
+void
+set_mode_for_decl (tree decl)
+{
+  tree type = TREE_TYPE (decl);
+  enum tree_code code = TREE_CODE (decl);
+
+  if (code == CONST_DECL)
+    return;
+
+  if (code == FUNCTION_DECL)
+    {
+      DECL_MODE (decl) = FUNCTION_MODE;
+      return;
+    }
+
+  gcc_assert (code == VAR_DECL || code == PARM_DECL || code == RESULT_DECL
+	      || code == TYPE_DECL ||code == FIELD_DECL);
+
+ // if (code == FIELD_DECL && DECL_BIT_FIELD (decl))
+ //   type = DECL_BIT_FIELD_TYPE (decl);
+
+  DECL_MODE (decl) = TYPE_MODE (type);
+
+  if (code == FIELD_DECL && DECL_BIT_FIELD (decl))
+    {
+
+      /* See if we can use an ordinary integer mode for a bit-field.
+         Conditions are: a fixed size that is correct for another mode,
+         occupying a complete byte or bytes on proper boundary.  */
+      if (TYPE_SIZE (type) != 0
+          && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
+          && GET_MODE_CLASS (TYPE_MODE (type)) == MODE_INT)
+        {
+          machine_mode xmode
+            = mode_for_size_tree (DECL_SIZE (decl), MODE_INT, 1);
+          unsigned int xalign = GET_MODE_ALIGNMENT (xmode);
+
+          if (xmode != BLKmode
+              && !(xalign > BITS_PER_UNIT && DECL_PACKED (decl))
+              && (DECL_ALIGN (decl) >= xalign))
+            {
+              DECL_MODE (decl) = xmode;
+            }
+        }
+    }
+}
+
 /* Return the least alignment required for type TYPE.  */
 
 unsigned int
diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
index ee44207..4de6ddf 100644
--- a/gcc/stor-layout.h
+++ b/gcc/stor-layout.h
@@ -61,6 +61,8 @@ extern void layout_decl (tree, unsigned);
 extern void layout_type (tree);
 /* Calculate and set mode for the given type.  */
 extern void set_mode_for_type (tree);
+/* Calculate and set mode for the given decl.  */
+extern void set_mode_for_decl (tree);
 
 /* Return the least alignment in bytes required for type TYPE.  */
 extern unsigned int min_align_of_type (tree);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2016-12-30 18:37 LTO remapping/deduction of machine modes of types/decls Alexander Monakov
@ 2017-01-02 10:19 ` Jakub Jelinek
  2017-01-02 15:39   ` Alexander Monakov
  2017-01-04 10:04   ` Richard Biener
  0 siblings, 2 replies; 16+ messages in thread
From: Jakub Jelinek @ 2017-01-02 10:19 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc, Richard Biener, Vladislav Ivanishin

On Fri, Dec 30, 2016 at 08:40:11PM +0300, Alexander Monakov wrote:
> Hello, Richard, Jakub, community,
> 
> May I join/restart the old discussion about machine mode remapping at LTO
> stream-in time.  To recap, when offloading to NVPTX was introduced, there
> was a problem due to differences in the set of supported modes (e.g. there
> was no 'XFmode' on NVPTX that would correspond to 'long double' tree type
> node in GIMPLE LTO streams produced by x86 host compiler).
> 
> The current solution in GCC is to additionally stream a 'mode table' and use it
> to remap numeric mode identifiers during LTO stream-in in all trees that have
> modes.  This is the solution initially outlined by Jakub in the message
> https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00226.html .  In response to that,
> Richard said,

In my view mode is essential part of the type system.  It (sadly, but still)
participates in many ABI decisions, but more importantly especially for
floating point types it is the main source of information of what the type
actually is, as just size and precision are nowhere near enough.
The precision/size isn't able to carry information like whether the type is
decimal or binary floating, what padding it has and where, what NaN etc.
conventions it uses.  So trying to throw away modes and reconstruct them
looks conceptually wrong to me.  One can also just use
float __attribute__((mode (XFmode))) or float __attribute__((mode (TFmode)))
or float __attribute__((mode (KFmode))) or IFmode etc., how do you want to
differentiate between those?  And I don't see how this can help with the
long double stuff for NVPTX offloading.  If user uses 80-bit long double
(or mode(XFmode) floats/doubles) in his source, then as PTX only has SFmode
and DFmode (perhaps also HFmode?), the only way to get it working is through
emulation (whether soft-fp, or writing some emulation using double,
whatever).  Pretending long double on the host is DFmode on the PTX side
just won't work, they have different representation.

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-02 10:19 ` Jakub Jelinek
@ 2017-01-02 15:39   ` Alexander Monakov
  2017-01-02 15:54     ` Jakub Jelinek
  2017-01-04 10:04   ` Richard Biener
  1 sibling, 1 reply; 16+ messages in thread
From: Alexander Monakov @ 2017-01-02 15:39 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc, Richard Biener, Vladislav Ivanishin

On Mon, 2 Jan 2017, Jakub Jelinek wrote:
> In my view mode is essential part of the type system.  It (sadly, but still)
> participates in many ABI decisions, but more importantly especially for
> floating point types it is the main source of information of what the type
> actually is, as just size and precision are nowhere near enough.
> The precision/size isn't able to carry information like whether the type is
> decimal or binary floating, what padding it has and where, what NaN etc.
> conventions it uses.  So trying to throw away modes and reconstruct them
> looks conceptually wrong to me.

I wonder if it's possible to have a small tag in tree types to distinguish
between binary/decimal/fixed-point types.  For prototyping, I was thinking
about just looking at the type name, because non-ieee-binary types have an
easily recognizable prefix.

For padding and NaN variability, can you point me on which targets the modes
affect that? The "Machine Modes" chapter in the documentation doesn't give a
hint (IFmode/KFmode are not documented there either).


Alternatively, is reconstructing all modes necessary in the first place? On
tree level GCC has explicit trees for all fundamental types like
float_type_node. Is it possible to remap just those trees? Modes of composite
types should be deducible, and modes of decls may be deducible from their types
(not sure; why do decls have modes separately from their types, anyway?)

> One can also just use
> float __attribute__((mode (XFmode))) or float __attribute__((mode (TFmode)))
> or float __attribute__((mode (KFmode))) or IFmode etc., how do you want to
> differentiate between those?  And I don't see how this can help with the
> long double stuff for NVPTX offloading.  If user uses 80-bit long double
> (or mode(XFmode) floats/doubles) in his source, then as PTX only has SFmode
> and DFmode (perhaps also HFmode?), the only way to get it working is through
> emulation (whether soft-fp, or writing some emulation using double,
> whatever).  Pretending long double on the host is DFmode on the PTX side
> just won't work, they have different representation.

(yes, PTX spec has half floats, but GCC doesn't implement those on PTX today,
and thus doesn't have HFmode now)

For attribute-mode, I wasn't aware of KFmode/IFmode stuff; wherever the modes
affect semantics without leaving any other trace in the type, leaving out the
mode loses information. So either one keeps the modes, or adds sufficient
tagging in the type tree.

For long double, I think offloading to PTX should have the following semantics:
size/alignment of long double matches those on host. Otherwise, storage layout
of composite types won't match, and that's really bad. But otherwise long double
is the same as double on PTX (so for offloading from x86-64 it has 64 bits of
padding). This means that long double data is not transferable between
accelerator and host, but otherwise gives the most sane semantics I can imagine.
I think this implies that XFmode/TFmode don't need to exist on NVPTX to back
long_double_type_node.

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-02 15:39   ` Alexander Monakov
@ 2017-01-02 15:54     ` Jakub Jelinek
  2017-01-02 18:50       ` Alexander Monakov
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Jelinek @ 2017-01-02 15:54 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc, Richard Biener, Vladislav Ivanishin

On Mon, Jan 02, 2017 at 06:38:28PM +0300, Alexander Monakov wrote:
> I wonder if it's possible to have a small tag in tree types to distinguish
> between binary/decimal/fixed-point types.  For prototyping, I was thinking
> about just looking at the type name, because non-ieee-binary types have an
> easily recognizable prefix.
> 
> For padding and NaN variability, can you point me on which targets the modes
> affect that? The "Machine Modes" chapter in the documentation doesn't give a
> hint (IFmode/KFmode are not documented there either).

Each target registers its own modes, IFmode/KFmode are e.g. from rs6000.
For floating point modes, one then associates those with some description in
real.[ch] that identifies them.  But even on x86_64, there is XFmode and
TFmode, both have the same size, in this case they happen to have different
precision, so precision+size can be used to reconstruct a mode if you know
the mode would have to be one of XFmode or TFmode.

> For attribute-mode, I wasn't aware of KFmode/IFmode stuff; wherever the modes
> affect semantics without leaving any other trace in the type, leaving out the
> mode loses information. So either one keeps the modes, or adds sufficient
> tagging in the type tree.
> 
> For long double, I think offloading to PTX should have the following semantics:
> size/alignment of long double matches those on host. Otherwise, storage layout
> of composite types won't match, and that's really bad. But otherwise long double
> is the same as double on PTX (so for offloading from x86-64 it has 64 bits of
> padding). This means that long double data is not transferable between
> accelerator and host, but otherwise gives the most sane semantics I can imagine.
> I think this implies that XFmode/TFmode don't need to exist on NVPTX to back
> long_double_type_node.

If the host has long double the same as double, sure, PTX can use its native
DFmode even for long double.  But otherwise, the storage must be
transferable between accelerator and host.  If you want to implement very
imprecise XFmode (Intel extended 80-bit double) and TFmode (IEEE quad) for PTX that
just honors those during memory loads and stores, basically performs what would
cast to double do for loads on the host and cast from double to long double
or __float128 for stores, and performs everything else as DFmode
computations, it might be possible.  Or as I said emulate the modes
unsupported on the hw in libgcc.a.  Or error out on long double uses as we
do now.  People who care about performance with PTX offloading won't use
long double anyway when the HW doesn't support it.

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-02 15:54     ` Jakub Jelinek
@ 2017-01-02 18:50       ` Alexander Monakov
  2017-01-02 19:03         ` Jakub Jelinek
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Monakov @ 2017-01-02 18:50 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc, Richard Biener, Vladislav Ivanishin

On Mon, 2 Jan 2017, Jakub Jelinek wrote:
> If the host has long double the same as double, sure, PTX can use its native
> DFmode even for long double.  But otherwise, the storage must be
> transferable between accelerator and host.

Hm, sorry, the 'must' is not obvious to me: is it known that the OpenMP ARB
would find only this implementation behavior acceptable?


Apart from floating-point types, are there other situations where modes carry
information not deducible from the rest of the tree node?

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-02 18:50       ` Alexander Monakov
@ 2017-01-02 19:03         ` Jakub Jelinek
  2017-01-02 19:39           ` Alexander Monakov
  2017-01-04 10:14           ` Richard Biener
  0 siblings, 2 replies; 16+ messages in thread
From: Jakub Jelinek @ 2017-01-02 19:03 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc, Richard Biener, Vladislav Ivanishin

On Mon, Jan 02, 2017 at 09:49:55PM +0300, Alexander Monakov wrote:
> On Mon, 2 Jan 2017, Jakub Jelinek wrote:
> > If the host has long double the same as double, sure, PTX can use its native
> > DFmode even for long double.  But otherwise, the storage must be
> > transferable between accelerator and host.
> 
> Hm, sorry, the 'must' is not obvious to me: is it known that the OpenMP ARB
> would find only this implementation behavior acceptable?

long double is not non-mappable type in the spec, so it is supposed to work.
The implementation may choose not to offload whenever it sees long
double/__float128/_Float128/_Float128x etc.

> Apart from floating-point types, are there other situations where modes carry
> information not deducible from the rest of the tree node?

Dunno about fixed types, partial ints etc., but it is mostly floating point
types, sure.

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-02 19:03         ` Jakub Jelinek
@ 2017-01-02 19:39           ` Alexander Monakov
  2017-01-02 20:24             ` Jakub Jelinek
  2017-01-04 10:14           ` Richard Biener
  1 sibling, 1 reply; 16+ messages in thread
From: Alexander Monakov @ 2017-01-02 19:39 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc, Richard Biener, Vladislav Ivanishin

On Mon, 2 Jan 2017, Jakub Jelinek wrote:
> On Mon, Jan 02, 2017 at 09:49:55PM +0300, Alexander Monakov wrote:
> > On Mon, 2 Jan 2017, Jakub Jelinek wrote:
> > > If the host has long double the same as double, sure, PTX can use its native
> > > DFmode even for long double.  But otherwise, the storage must be
> > > transferable between accelerator and host.
> > 
> > Hm, sorry, the 'must' is not obvious to me: is it known that the OpenMP ARB
> > would find only this implementation behavior acceptable?
> 
> long double is not non-mappable type in the spec, so it is supposed to work.
> The implementation may choose not to offload whenever it sees long
> double/__float128/_Float128/_Float128x etc.

But this is not something the implementation can properly enforce; consider

  long double v;
  char buf[sizeof v];
  #pragma omp target map(from:buf)
    sscanf ("1.0", "%Lf", buf);
  memcpy(&v, buf, sizeof v);

The offloading compiler wouldn't see a 'long double' anywhere, it gets brought
in at linking stage. So the implementation would have to tag individual
translation units and see only in the end of linking if the offloaded image
touches a long double datum anywhere. And as the example shows, it would prevent
using printf-like functions.

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-02 19:39           ` Alexander Monakov
@ 2017-01-02 20:24             ` Jakub Jelinek
  0 siblings, 0 replies; 16+ messages in thread
From: Jakub Jelinek @ 2017-01-02 20:24 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc, Richard Biener, Vladislav Ivanishin

On Mon, Jan 02, 2017 at 10:38:54PM +0300, Alexander Monakov wrote:
> On Mon, 2 Jan 2017, Jakub Jelinek wrote:
> > On Mon, Jan 02, 2017 at 09:49:55PM +0300, Alexander Monakov wrote:
> > > On Mon, 2 Jan 2017, Jakub Jelinek wrote:
> > > > If the host has long double the same as double, sure, PTX can use its native
> > > > DFmode even for long double.  But otherwise, the storage must be
> > > > transferable between accelerator and host.
> > > 
> > > Hm, sorry, the 'must' is not obvious to me: is it known that the OpenMP ARB
> > > would find only this implementation behavior acceptable?
> > 
> > long double is not non-mappable type in the spec, so it is supposed to work.
> > The implementation may choose not to offload whenever it sees long
> > double/__float128/_Float128/_Float128x etc.
> 
> But this is not something the implementation can properly enforce; consider
> 
>   long double v;
>   char buf[sizeof v];
>   #pragma omp target map(from:buf)
>     sscanf ("1.0", "%Lf", buf);
>   memcpy(&v, buf, sizeof v);
> 
> The offloading compiler wouldn't see a 'long double' anywhere, it gets brought
> in at linking stage. So the implementation would have to tag individual
> translation units and see only in the end of linking if the offloaded image
> touches a long double datum anywhere. And as the example shows, it would prevent
> using printf-like functions.

Well, it can, but of course it can be more or much more work.  The case where
long double appears in code seen by the offloading compiler is likely more common
and easier to deal with, it is just a matter of handling the various modes.
For the newlib library routines that deal with long double, it can be solved
e.g. by multilibing across the long double choices (but, we have usually
XFmode on x86_64-linux, on powerpc64le-linux it can be the IBM double double
(IFmode) or IEEE quad (KFmode), ...), or by just multilibing the affected
subset of functions or passing info on what long double is some other way
to the printf*/scanf* family of functions.

	Jakub

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-02 10:19 ` Jakub Jelinek
  2017-01-02 15:39   ` Alexander Monakov
@ 2017-01-04 10:04   ` Richard Biener
  2017-01-09 18:56     ` Alexander Monakov
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Biener @ 2017-01-04 10:04 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Alexander Monakov, gcc, Vladislav Ivanishin

On Mon, 2 Jan 2017, Jakub Jelinek wrote:

> On Fri, Dec 30, 2016 at 08:40:11PM +0300, Alexander Monakov wrote:
> > Hello, Richard, Jakub, community,
> > 
> > May I join/restart the old discussion about machine mode remapping at LTO
> > stream-in time.  To recap, when offloading to NVPTX was introduced, there
> > was a problem due to differences in the set of supported modes (e.g. there
> > was no 'XFmode' on NVPTX that would correspond to 'long double' tree type
> > node in GIMPLE LTO streams produced by x86 host compiler).
> > 
> > The current solution in GCC is to additionally stream a 'mode table' and use it
> > to remap numeric mode identifiers during LTO stream-in in all trees that have
> > modes.  This is the solution initially outlined by Jakub in the message
> > https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00226.html .  In response to that,
> > Richard said,

My suggestion at that time isn't likely working in practice due to the
limitations Jakub outlines below.  The situation is a bit unfortunate
but expect to run into more host(!) dependences in the LTO bytecode.
Yes, while it would be nice to LTO x86_64->arm and ppc64le->arm
LTO bytecode it very likely isn't going to work.

> In my view mode is essential part of the type system.  It (sadly, but still)
> participates in many ABI decisions, but more importantly especially for
> floating point types it is the main source of information of what the type
> actually is, as just size and precision are nowhere near enough.
> The precision/size isn't able to carry information like whether the type is
> decimal or binary floating, what padding it has and where, what NaN etc.
> conventions it uses.  So trying to throw away modes and reconstruct them
> looks conceptually wrong to me.  One can also just use
> float __attribute__((mode (XFmode))) or float __attribute__((mode (TFmode)))
> or float __attribute__((mode (KFmode))) or IFmode etc., how do you want to
> differentiate between those?  And I don't see how this can help with the
> long double stuff for NVPTX offloading.  If user uses 80-bit long double
> (or mode(XFmode) floats/doubles) in his source, then as PTX only has SFmode
> and DFmode (perhaps also HFmode?), the only way to get it working is through
> emulation (whether soft-fp, or writing some emulation using double,
> whatever).  Pretending long double on the host is DFmode on the PTX side
> just won't work, they have different representation.
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-02 19:03         ` Jakub Jelinek
  2017-01-02 19:39           ` Alexander Monakov
@ 2017-01-04 10:14           ` Richard Biener
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Biener @ 2017-01-04 10:14 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Alexander Monakov, gcc, Vladislav Ivanishin

On Mon, 2 Jan 2017, Jakub Jelinek wrote:

> On Mon, Jan 02, 2017 at 09:49:55PM +0300, Alexander Monakov wrote:
> > On Mon, 2 Jan 2017, Jakub Jelinek wrote:
> > > If the host has long double the same as double, sure, PTX can use its native
> > > DFmode even for long double.  But otherwise, the storage must be
> > > transferable between accelerator and host.
> > 
> > Hm, sorry, the 'must' is not obvious to me: is it known that the OpenMP ARB
> > would find only this implementation behavior acceptable?
> 
> long double is not non-mappable type in the spec, so it is supposed to work.
> The implementation may choose not to offload whenever it sees long
> double/__float128/_Float128/_Float128x etc.
> 
> > Apart from floating-point types, are there other situations where modes carry
> > information not deducible from the rest of the tree node?
> 
> Dunno about fixed types, partial ints etc., but it is mostly floating point
> types, sure.

Mostly floats I guess.  But just to say it would be very nice to have
enough information in the trees so layout_type can re-construct the
mode.  It already does for 99% of the types...  (just grep for
SET_TYPE_MODE).

Richard.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-04 10:04   ` Richard Biener
@ 2017-01-09 18:56     ` Alexander Monakov
  2017-01-10  9:23       ` Richard Biener
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Monakov @ 2017-01-09 18:56 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, gcc, Vladislav Ivanishin

On Wed, 4 Jan 2017, Richard Biener wrote:
> My suggestion at that time isn't likely working in practice due to the
> limitations Jakub outlines below.  The situation is a bit unfortunate
> but expect to run into more host(!) dependences in the LTO bytecode.
> Yes, while it would be nice to LTO x86_64->arm and ppc64le->arm
> LTO bytecode it very likely isn't going to work.

Yes, I think it's not really practical to seek wide portability of LTO bytecode.
After all, platform specifics get into constant expressions (e.g. 'int p =
sizeof (void *);') and are also observable on the preprocessor level (e.g. via
'#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__').

However the accelerator platform must be compatible with the host platform in
almost all ABI (storage layout?) features such as type sizes and alignments,
endianness, default signedness of char, bitfield layout, and possibly others
(but yet in the other subthread I was arguing that compromising and making
'long double' only partially compatible makes sense).  Thus, portability issue
is much smaller in scope here.

I think it's a bit unfortunate that the discussion really focused on the trouble
with floating-point types.  I'd really appreciate any insight on the other
questions that were raised, such as whether the decl mode should match that
decl's type mode.

For floating types, I believe in the long run it should be possible to tag tree
type nodes with the floating-point type 'kind' such as IEEE binary, IEEE
decimal, accum/fract/sat, or IBM double-double.

For our original goal, I think we'll switch to the other solution I've outlined
in the opening mail, i.e. propagating mode tables at WPA stage and keeping
enough information to know if the section comes from the host or native
compiler.

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-09 18:56     ` Alexander Monakov
@ 2017-01-10  9:23       ` Richard Biener
  2017-01-10 15:30         ` Alexander Monakov
  2017-01-10 15:53         ` Vladislav Ivanishin
  0 siblings, 2 replies; 16+ messages in thread
From: Richard Biener @ 2017-01-10  9:23 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Jakub Jelinek, gcc, Vladislav Ivanishin

On Mon, 9 Jan 2017, Alexander Monakov wrote:

> On Wed, 4 Jan 2017, Richard Biener wrote:
> > My suggestion at that time isn't likely working in practice due to the
> > limitations Jakub outlines below.  The situation is a bit unfortunate
> > but expect to run into more host(!) dependences in the LTO bytecode.
> > Yes, while it would be nice to LTO x86_64->arm and ppc64le->arm
> > LTO bytecode it very likely isn't going to work.
> 
> Yes, I think it's not really practical to seek wide portability of LTO bytecode.
> After all, platform specifics get into constant expressions (e.g. 'int p =
> sizeof (void *);') and are also observable on the preprocessor level (e.g. via
> '#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__').
> 
> However the accelerator platform must be compatible with the host platform in
> almost all ABI (storage layout?) features such as type sizes and alignments,
> endianness, default signedness of char, bitfield layout, and possibly others
> (but yet in the other subthread I was arguing that compromising and making
> 'long double' only partially compatible makes sense).  Thus, portability issue
> is much smaller in scope here.
> 
> I think it's a bit unfortunate that the discussion really focused on the trouble
> with floating-point types.  I'd really appreciate any insight on the other
> questions that were raised, such as whether the decl mode should match that
> decl's type mode.

In general I think they should match.  But without seeing concrete 
examples of where they do not I can't comment on whether such exceptions
make sense.  For example if you adjust a DECLs alignment and then
re-layout it I'd expect you might get a non-BLKmode mode for an
aggregate in some circumstances -- but then decl and type are not 1:1
compatible (due to different alignment), but this case is clearly desired
as requiring type copies for the sake of alignment would be wasteful.

Generally all "redundancies" we have in decls vs. types expose the
possibility of conflicting/non-matching information.

> For floating types, I believe in the long run it should be possible to tag tree
> type nodes with the floating-point type 'kind' such as IEEE binary, IEEE
> decimal, accum/fract/sat, or IBM double-double.

The most simplistic solution would be to have a real_format * field in the
type (currently there's only a global mode-to-real_format mapping).  All
types are currently tree_type_non_common (huh), so there's some candidates
to put that info on (TYPE_BINFO, though that's used as TYPE_LANG_SLOT_1 - 
ugh, TYPE_VALUES looks unused though).

> For our original goal, I think we'll switch to the other solution I've 
> outlined in the opening mail, i.e. propagating mode tables at WPA stage 
> and keeping enough information to know if the section comes from the 
> host or native compiler.

So add a hack ontop of the hack?  Ugh.  So why exactly doesn't it
already work?  It looks like decls and types have their modes
"fixed" with the per-file mode table at WPA time.  So what is missing
is to "fix" modes in the per-function sections that are not touched
by WPA?

Richard.

> Thanks.
> Alexander
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-10  9:23       ` Richard Biener
@ 2017-01-10 15:30         ` Alexander Monakov
  2017-01-11  8:16           ` Richard Biener
  2017-01-10 15:53         ` Vladislav Ivanishin
  1 sibling, 1 reply; 16+ messages in thread
From: Alexander Monakov @ 2017-01-10 15:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, gcc, Vladislav Ivanishin

On Tue, 10 Jan 2017, Richard Biener wrote:
> In general I think they should match.  But without seeing concrete 
> examples of where they do not I can't comment on whether such exceptions
> make sense.  For example if you adjust a DECLs alignment and then
> re-layout it I'd expect you might get a non-BLKmode mode for an
> aggregate in some circumstances -- but then decl and type are not 1:1
> compatible (due to different alignment), but this case is clearly desired
> as requiring type copies for the sake of alignment would be wasteful.

Thanks; Vlad will follow up with (I believe) a different kind of mismatches
originating in the C++ front-end.

> > For our original goal, I think we'll switch to the other solution I've 
> > outlined in the opening mail, i.e. propagating mode tables at WPA stage 
> > and keeping enough information to know if the section comes from the 
> > host or native compiler.
> 
> So add a hack ontop of the hack?  Ugh.  So why exactly doesn't it
> already work?  It looks like decls and types have their modes
> "fixed" with the per-file mode table at WPA time.  So what is missing
> is to "fix" modes in the per-function sections that are not touched
> by WPA?

WPA re-streams packed function bodies as-is, so anything referred to
from within just the body won't be subject to mode remapping; I think
only modes of toplevel declarations and functions' arguments will be
remapped.  And I believe it wouldn't be acceptable to unpack/remap/repack
function bodies at WPA stage (it's contrary to LTO scalability goal).

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-10  9:23       ` Richard Biener
  2017-01-10 15:30         ` Alexander Monakov
@ 2017-01-10 15:53         ` Vladislav Ivanishin
  1 sibling, 0 replies; 16+ messages in thread
From: Vladislav Ivanishin @ 2017-01-10 15:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: Alexander Monakov, Jakub Jelinek, gcc

Hi

> In general I think they should match.  But without seeing concrete
> examples of where they do not I can't comment on whether such 
> exceptions
> make sense.  For example if you adjust a DECLs alignment and then
> re-layout it I'd expect you might get a non-BLKmode mode for an
> aggregate in some circumstances -- but then decl and type are not 1:1
> compatible (due to different alignment), but this case is clearly 
> desired
> as requiring type copies for the sake of alignment would be wasteful.

The C++ FE lays out base classes as FIELD_DECLs. In build_base_field_1 
the
mode for such decls is set:

gcc/cp/class.c
4528  SET_DECL_MODE (decl, TYPE_MODE (basetype));

Before this the type of this decl is set to CLASSTYPE_AS_BASE (basetype) 
via a
call to build_decl. Thus in general DECL_MODE (decl) != TYPE_MODE (its 
type)
(and it does happen e.g. TYPE_MODE (basetype) == BLKmode and
TYPE_MODE (CLASSTYPE_AS_BASE (basetype)) == DImode in the example 
below).

Moreover, the types of such fields are later re-set in layout_class_type 
to
values unrelated to both basetype and CLASSTYPE_AS_BASE (basetype):

gcc/cp/class.c
6692 /* Now that we're done with layout, give the base fields the real 
types.
6693 for (field = TYPE_FIELDS (t); field; field = DECL_CHAIN (field))
6694   if (DECL_ARTIFICIAL (field) && IS_FAKE_BASE_TYPE (TREE_TYPE 
(field)))
6695     TREE_TYPE (field) = TYPE_CONTEXT (TREE_TYPE (field));

So the modes of the decl and its type don't match and we've lost the
information needed to deduce the mode of the decl.

Here is the aforementioned example:

   struct operand
   {
       virtual void gen_transform () {}
   };

   struct c_expr : public operand {};

Here is another one (demonstrates a different kind of mismatch):

   class hsa_op_base
   {
     public:
       hsa_op_base *m_next;
       unsigned m_brig_op_offset;
     protected:
       hsa_op_base (int k);
   };

   class hsa_op_reg: public hsa_op_base {};

In the latter example TYPE_MODE (CLASSTYPE_AS_BASE (basetype)) == 
BLKmode and
TYPE_MODE (basetype) == TImode.

To be clear: in the examples above the problem (mismatch between 
streamed
type's mode and decl's mode if we are doing LTO) is with the implicit 
field in
the derived class corresponding to an instance of the base class.
-- 
Vlad

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-10 15:30         ` Alexander Monakov
@ 2017-01-11  8:16           ` Richard Biener
  2017-01-11 14:06             ` Alexander Monakov
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2017-01-11  8:16 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Jakub Jelinek, gcc, Vladislav Ivanishin

On Tue, 10 Jan 2017, Alexander Monakov wrote:

> On Tue, 10 Jan 2017, Richard Biener wrote:
> > In general I think they should match.  But without seeing concrete 
> > examples of where they do not I can't comment on whether such exceptions
> > make sense.  For example if you adjust a DECLs alignment and then
> > re-layout it I'd expect you might get a non-BLKmode mode for an
> > aggregate in some circumstances -- but then decl and type are not 1:1
> > compatible (due to different alignment), but this case is clearly desired
> > as requiring type copies for the sake of alignment would be wasteful.
> 
> Thanks; Vlad will follow up with (I believe) a different kind of mismatches
> originating in the C++ front-end.
> 
> > > For our original goal, I think we'll switch to the other solution I've 
> > > outlined in the opening mail, i.e. propagating mode tables at WPA stage 
> > > and keeping enough information to know if the section comes from the 
> > > host or native compiler.
> > 
> > So add a hack ontop of the hack?  Ugh.  So why exactly doesn't it
> > already work?  It looks like decls and types have their modes
> > "fixed" with the per-file mode table at WPA time.  So what is missing
> > is to "fix" modes in the per-function sections that are not touched
> > by WPA?
> 
> WPA re-streams packed function bodies as-is, so anything referred to
> from within just the body won't be subject to mode remapping; I think
> only modes of toplevel declarations and functions' arguments will be
> remapped.  And I believe it wouldn't be acceptable to unpack/remap/repack
> function bodies at WPA stage (it's contrary to LTO scalability goal).

Yes indeed.  But this means the mode-maps have to be per function
section (with possibly a way to "share" them?).  Or we need a way
to annotate function sections with "no need to re-map" as the
native nvptx sections don't need remapping and the others all use
the same map?

Richard.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LTO remapping/deduction of machine modes of types/decls
  2017-01-11  8:16           ` Richard Biener
@ 2017-01-11 14:06             ` Alexander Monakov
  0 siblings, 0 replies; 16+ messages in thread
From: Alexander Monakov @ 2017-01-11 14:06 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, gcc, Vladislav Ivanishin

On Wed, 11 Jan 2017, Richard Biener wrote:
> > WPA re-streams packed function bodies as-is, so anything referred to
> > from within just the body won't be subject to mode remapping; I think
> > only modes of toplevel declarations and functions' arguments will be
> > remapped.  And I believe it wouldn't be acceptable to unpack/remap/repack
> > function bodies at WPA stage (it's contrary to LTO scalability goal).
> 
> Yes indeed.  But this means the mode-maps have to be per function
> section (with possibly a way to "share" them?).  Or we need a way
> to annotate function sections with "no need to re-map" as the
> native nvptx sections don't need remapping and the others all use
> the same map?

Right, the latter: we know that sections coming from the native compiler
already have the right modes and thus need no remapping, and the sections
coming from the host compiler all need remapping (and will use the same
mapping).  Prefixes of per-function section names already carry the distinction
(".gnu.lto_foo" vs. ".gnu.offload_lto_foo").

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-01-11 14:06 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-30 18:37 LTO remapping/deduction of machine modes of types/decls Alexander Monakov
2017-01-02 10:19 ` Jakub Jelinek
2017-01-02 15:39   ` Alexander Monakov
2017-01-02 15:54     ` Jakub Jelinek
2017-01-02 18:50       ` Alexander Monakov
2017-01-02 19:03         ` Jakub Jelinek
2017-01-02 19:39           ` Alexander Monakov
2017-01-02 20:24             ` Jakub Jelinek
2017-01-04 10:14           ` Richard Biener
2017-01-04 10:04   ` Richard Biener
2017-01-09 18:56     ` Alexander Monakov
2017-01-10  9:23       ` Richard Biener
2017-01-10 15:30         ` Alexander Monakov
2017-01-11  8:16           ` Richard Biener
2017-01-11 14:06             ` Alexander Monakov
2017-01-10 15:53         ` Vladislav Ivanishin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).