public inbox for gcc-cvs@sourceware.org
* [gcc r12-7312] [nvptx] Initialize ptx regs
@ 2022-02-21 15:51 Tom de Vries
From: Tom de Vries @ 2022-02-21 15:51 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:02aedc6f269b5e3c1f354edcf5b84d27b0a15946

commit r12-7312-g02aedc6f269b5e3c1f354edcf5b84d27b0a15946
Author: Tom de Vries <tdevries@suse.de>
Date:   Wed Feb 16 17:09:11 2022 +0100

    [nvptx] Initialize ptx regs
    
    With the nvptx target, driver version 510.47.03 and a GT 1030 board, we run into:
    ...
    FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test
    FAIL: gcc.c-torture/execute/pr53465.c -O2 execution test
    FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test
    ...
    while the test-cases pass with nvptx-none-run -O0.
    
    The problem is that the generated ptx contains a read from an uninitialized
    ptx register, and the driver JIT doesn't handle this well.
    
    For -O2 and -O3, we can get rid of the FAIL using --param
    logical-op-non-short-circuit=0.  But not for -O1.
    
    At -O1, the test-case minimizes to:
    ...
    void __attribute__((noinline, noclone))
    foo (int y) {
      int c;
      for (int i = 0; i < y; i++)
        {
          int d = i + 1;
          if (i && d <= c)
            __builtin_abort ();
          c = d;
        }
    }
    
    int main () {
      foo (2); return 0;
    }
    ...
    
    Note that the test-case does not contain an uninitialized use.  In the first
    iteration, i is 0 and consequently c is not read.  In the second iteration, c
    is read, but by that time it's already initialized by 'c = d' from the first
    iteration.
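
    The claim above can be checked with an instrumented copy of foo.  This is a
    minimal sketch, not part of the patch: the function name, the c_written
    flag, the dummy initial value of c and the -1 return (standing in for
    __builtin_abort) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Instrumented copy of foo: returns the number of reads of c that happen
   before c has been written, or -1 where the original would abort.  */
int
count_uninit_reads_of_c (int y)
{
  int c = 0;              /* Dummy value, only to keep this sketch well-defined.  */
  bool c_written = false; /* Hypothetical flag: has 'c = d' executed yet?  */
  int uninit_reads = 0;

  for (int i = 0; i < y; i++)
    {
      int d = i + 1;
      /* 'i && d <= c' short-circuits, so c is only read when i != 0.  */
      if (i)
        {
          if (!c_written)
            uninit_reads++;
          if (d <= c)
            return -1;
        }
      c = d;
      c_written = true;
    }

  /* c has been written exactly when at least one iteration ran.  */
  assert (c_written == (y > 0));
  return uninit_reads;
}
```

    Calling count_uninit_reads_of_c (2) mirrors the foo (2) call in main: the
    only read of c happens in the second iteration, after the first iteration's
    'c = d'.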
    
    AFAICT the problem is introduced as follows: the conditional use of c in the
    loop body is translated into an unconditional use of c in the loop header:
    ...
      # c_1 = PHI <c_4(D)(2), c_9(6)>
    ...
    into which forwprop1 propagates the 'c_9 = d_7' assignment, giving:
    ...
      # c_1 = PHI <c_4(D)(2), d_7(6)>
    ...
    which ends up being translated by expand into an unconditional:
    ...
    (insn 13 12 0 (set (reg/v:SI 22 [ c ])
            (reg/v:SI 23 [ d ])) -1
         (nil))
    ...
    at the start of the loop body, creating an uninitialized read of d on the
    path from loop entry.
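
    The effect of the hoisted copy can be modelled in plain C by giving each
    register an "initialized" flag.  This is only a sketch under assumptions:
    struct preg, read_reg and write_reg are hypothetical helpers, and the order
    of the insns after the copy is assumed rather than taken from an RTL dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a ptx register: a value plus an init flag.  */
struct preg
{
  int val;
  bool init;
};

/* Read a register, counting the read if the register was never written.  */
static int
read_reg (const struct preg *r, int *uninit_reads)
{
  assert (r != NULL && uninit_reads != NULL);
  if (!r->init)
    (*uninit_reads)++;
  return r->val;
}

static void
write_reg (struct preg *r, int v)
{
  assert (r != NULL);
  r->val = v;
  r->init = true;
}

/* Model of the loop after expand: the copy 'c = d' runs unconditionally at
   the start of every iteration.  Returns the number of uninitialized
   register reads, or -1 where the original would abort.  */
int
count_uninit_reads_after_expand (int y)
{
  struct preg c = { 0, false };
  struct preg d = { 0, false };
  int uninit_reads = 0;

  for (int i = 0; i < y; i++)
    {
      /* Corresponds to insn 13: (set c d).  */
      write_reg (&c, read_reg (&d, &uninit_reads));
      write_reg (&d, i + 1);
      if (i && read_reg (&d, &uninit_reads) <= read_reg (&c, &uninit_reads))
        return -1;
    }
  return uninit_reads;
}
```

    In this model the only flagged read is the read of d in the first
    iteration, which is the path from loop entry.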
    
    By disabling coalesce_ssa_name, we get the more usual copies on the incoming
    edges.  The copy on the loop entry path still does an uninitialized read, but
    that one's now initialized by init-regs.  The test-case passes, also when
    disabling init-regs, so it's possible that the JIT driver doesn't object to
    this type of uninitialized read.
    
    Now that we've characterized the problem to some degree, we need to fix this,
    because either:
    - we're violating an undocumented ptx invariant, and this is a compiler bug,
      or
    - this is a driver JIT bug and we need to work around it.
    
    There are essentially two strategies to address this:
    - stop the compiler from creating uninitialized reads
    - patch up uninitialized reads using additional initialization
    
    The former will probably involve:
    - making some optimizations more conservative in the presence of
      uninitialized reads, and
    - disabling some other optimizations (where making them more conservative is
      not possible, or cannot easily be achieved).
    This will probably have a cost penalty for code that does not suffer from
    the original problem.
    
    The latter has the problem that it may paper over uninitialized reads
    in the source code, or indeed over ones that were incorrectly introduced
    by the compiler.  But it has the advantage that it allows for the problem to
    be addressed at a single location.
    
    There's an existing pass, init-regs, which implements a form of the latter,
    but it doesn't work for this example because it only inserts additional
    initialization for uses that have no reaching definition at all.
    
    Fix this by adding initialization of uninitialized ptx regs in reorg.
    
    Control the new functionality using -minit-regs=<0|1|2|3>, meaning:
    - 0: disabled.
    - 1: add initialization of all regs at the entry bb
    - 2: add initialization of uninitialized regs at the entry bb
    - 3: add initialization of uninitialized regs close to the use
    and defaulting to 3.
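
    The edge selection done for mode 3 (workaround_uninit_method_3 in the diff)
    can be sketched outside GCC on a toy CFG.  Everything here is an assumption
    made for illustration: the block and register names, the hand-filled
    bitmasks standing in for DF_LIVE_IN, DF_LIVE_OUT and DF_MIR_IN, and the
    plan_edge_inits helper.  The real code also masks out non-pseudo registers
    and emits insns instead of recording edges.

```c
#include <assert.h>

enum { ENTRY = 0, HEADER = 1, LATCH = 2, NBLOCKS = 3 };
enum { REG_I = 0, REG_D = 1, NREGS = 2 };
enum { MAXPREDS = 2, MAXINITS = 8 };

/* Hand-filled stand-ins for the per-block dataflow sets, one bit per reg.  */
struct block
{
  int npreds;
  int preds[MAXPREDS];
  unsigned live_in;   /* Plays the role of DF_LIVE_IN.  */
  unsigned live_out;  /* Plays the role of DF_LIVE_OUT.  */
  unsigned mir_in;    /* Plays the role of DF_MIR_IN (must-initialized).  */
};

struct edge_init
{
  int src, dest, reg;
};

/* Toy CFG for the loop in foo: ENTRY -> HEADER -> LATCH -> HEADER.
   i is set before the loop; d is only set inside it.  */
const struct block toy_cfg[NBLOCKS] = {
  [ENTRY]  = { .npreds = 0, .live_in = 0,
               .live_out = 1u << REG_I, .mir_in = 0 },
  [HEADER] = { .npreds = 2, .preds = { ENTRY, LATCH },
               .live_in = (1u << REG_I) | (1u << REG_D),
               .live_out = (1u << REG_I) | (1u << REG_D),
               .mir_in = 1u << REG_I },
  [LATCH]  = { .npreds = 1, .preds = { HEADER },
               .live_in = (1u << REG_I) | (1u << REG_D),
               .live_out = (1u << REG_I) | (1u << REG_D),
               .mir_in = (1u << REG_I) | (1u << REG_D) },
};

/* Record, for each join block, the incoming edges on which a reg that is
   live but not must-initialized needs a zero-init.  Returns the count.  */
int
plan_edge_inits (const struct block *bbs, int nblocks,
                 struct edge_init *out, int max_out)
{
  int n = 0;
  for (int b = 0; b < nblocks; b++)
    {
      const struct block *bb = &bbs[b];
      if (bb->npreds == 1)  /* Mirrors the single_pred_p check.  */
        continue;

      unsigned uninit = bb->live_in & ~bb->mir_in;
      for (int r = 0; r < NREGS; r++)
        {
          if (!(uninit & (1u << r)))
            continue;

          int have_true = 0, have_false = 0;
          for (int p = 0; p < bb->npreds; p++)
            if (bbs[bb->preds[p]].live_out & (1u << r))
              have_true = 1;
            else
              have_false = 1;
          if (have_false ^ have_true)
            continue;

          for (int p = 0; p < bb->npreds; p++)
            {
              if (bbs[bb->preds[p]].live_out & (1u << r))
                continue;
              assert (n < max_out);
              out[n++] = (struct edge_init) { bb->preds[p], b, r };
            }
        }
    }
  return n;
}
```

    With toy_cfg, the only planned initialization is for d on the edge from
    ENTRY to HEADER, so the init sits next to the loop instead of stretching a
    live range from function entry as modes 1 and 2 would.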
    
    Tested on nvptx.
    
    gcc/ChangeLog:
    
    2022-02-17  Tom de Vries  <tdevries@suse.de>
    
            PR target/104440
            * config/nvptx/nvptx.cc (workaround_uninit_method_1)
            (workaround_uninit_method_2, workaround_uninit_method_3)
            (workaround_uninit): New function.
            (nvptx_reorg): Use workaround_uninit.
            * config/nvptx/nvptx.opt (minit-regs): New option.

Diff:
---
 gcc/config/nvptx/nvptx.cc  | 188 +++++++++++++++++++++++++++++++++++++++++++++
 gcc/config/nvptx/nvptx.opt |   4 +
 2 files changed, 192 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index ed347cab70e..a37a6c78b41 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5372,6 +5372,190 @@ workaround_barsyncs (void)
 }
 #endif
 
+/* Initialize all declared regs at function entry.
+   Advantage   : Fool-proof.
+   Disadvantage: Potentially creates a lot of long live ranges and adds a lot
+		 of insns.  */
+
+static void
+workaround_uninit_method_1 (void)
+{
+  rtx_insn *first = get_insns ();
+  rtx_insn *insert_here = NULL;
+
+  for (int ix = LAST_VIRTUAL_REGISTER + 1; ix < max_reg_num (); ix++)
+    {
+      rtx reg = regno_reg_rtx[ix];
+
+      /* Skip undeclared registers.  */
+      if (reg == const0_rtx)
+	continue;
+
+      gcc_assert (CONST0_RTX (GET_MODE (reg)));
+
+      start_sequence ();
+      emit_move_insn (reg, CONST0_RTX (GET_MODE (reg)));
+      rtx_insn *inits = get_insns ();
+      end_sequence ();
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	for (rtx_insn *init = inits; init != NULL; init = NEXT_INSN (init))
+	  fprintf (dump_file, "Default init of reg %u inserted: insn %u\n",
+		   ix, INSN_UID (init));
+
+      if (first != NULL)
+	{
+	  insert_here = emit_insn_before (inits, first);
+	  first = NULL;
+	}
+      else
+	insert_here = emit_insn_after (inits, insert_here);
+    }
+}
+
+/* Find uses of regs that are not defined on all incoming paths, and insert a
+   corresponding def at function entry.
+   Advantage   : Simple.
+   Disadvantage: Potentially creates long live ranges.
+		 May not catch all cases.  F.i. a clobber cuts a live range in
+		 the compiler and may prevent entry_lr_in from being set for a
+		 reg, but the clobber does not translate to a ptx insn, so in
+		 ptx there still may be an uninitialized ptx reg.  See f.i.
+		 gcc.c-torture/compile/20020926-1.c.  */
+
+static void
+workaround_uninit_method_2 (void)
+{
+  auto_bitmap entry_pseudo_uninit;
+  {
+    auto_bitmap not_pseudo;
+    bitmap_set_range (not_pseudo, 0, LAST_VIRTUAL_REGISTER);
+
+    bitmap entry_lr_in = DF_LR_IN (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+    bitmap_and_compl (entry_pseudo_uninit, entry_lr_in, not_pseudo);
+  }
+
+  rtx_insn *first = get_insns ();
+  rtx_insn *insert_here = NULL;
+
+  bitmap_iterator iterator;
+  unsigned ix;
+  EXECUTE_IF_SET_IN_BITMAP (entry_pseudo_uninit, 0, ix, iterator)
+    {
+      rtx reg = regno_reg_rtx[ix];
+      gcc_assert (CONST0_RTX (GET_MODE (reg)));
+
+      start_sequence ();
+      emit_move_insn (reg, CONST0_RTX (GET_MODE (reg)));
+      rtx_insn *inits = get_insns ();
+      end_sequence ();
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	for (rtx_insn *init = inits; init != NULL; init = NEXT_INSN (init))
+	  fprintf (dump_file, "Missing init of reg %u inserted: insn %u\n",
+		   ix, INSN_UID (init));
+
+      if (first != NULL)
+	{
+	  insert_here = emit_insn_before (inits, first);
+	  first = NULL;
+	}
+      else
+	insert_here = emit_insn_after (inits, insert_here);
+    }
+}
+
+/* Find uses of regs that are not defined on all incoming paths, and insert a
+   corresponding def on those.
+   Advantage   : Doesn't create long live ranges.
+   Disadvantage: More complex, and potentially also more defs.  */
+
+static void
+workaround_uninit_method_3 (void)
+{
+  auto_bitmap not_pseudo;
+  bitmap_set_range (not_pseudo, 0, LAST_VIRTUAL_REGISTER);
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      if (single_pred_p (bb))
+	continue;
+
+      auto_bitmap bb_pseudo_uninit;
+      bitmap_and_compl (bb_pseudo_uninit, DF_LIVE_IN (bb), DF_MIR_IN (bb));
+      bitmap_and_compl_into (bb_pseudo_uninit, not_pseudo);
+
+      bitmap_iterator iterator;
+      unsigned ix;
+      EXECUTE_IF_SET_IN_BITMAP (bb_pseudo_uninit, 0, ix, iterator)
+	{
+	  bool have_false = false;
+	  bool have_true = false;
+
+	  edge e;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (e, ei, bb->preds)
+	    {
+	      if (bitmap_bit_p (DF_LIVE_OUT (e->src), ix))
+		have_true = true;
+	      else
+		have_false = true;
+	    }
+	  if (have_false ^ have_true)
+	    continue;
+
+	  FOR_EACH_EDGE (e, ei, bb->preds)
+	    {
+	      if (bitmap_bit_p (DF_LIVE_OUT (e->src), ix))
+		continue;
+
+	      rtx reg = regno_reg_rtx[ix];
+	      gcc_assert (CONST0_RTX (GET_MODE (reg)));
+
+	      start_sequence ();
+	      emit_move_insn (reg, CONST0_RTX (GET_MODE (reg)));
+	      rtx_insn *inits = get_insns ();
+	      end_sequence ();
+
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		for (rtx_insn *init = inits; init != NULL;
+		     init = NEXT_INSN (init))
+		  fprintf (dump_file,
+			   "Missing init of reg %u inserted on edge: %d -> %d:"
+			   " insn %u\n", ix, e->src->index, e->dest->index,
+			   INSN_UID (init));
+
+	      insert_insn_on_edge (inits, e);
+	    }
+	}
+    }
+
+  commit_edge_insertions ();
+}
+
+static void
+workaround_uninit (void)
+{
+  switch (nvptx_init_regs)
+    {
+    case 0:
+      /* Skip.  */
+      break;
+    case 1:
+      workaround_uninit_method_1 ();
+      break;
+    case 2:
+      workaround_uninit_method_2 ();
+      break;
+    case 3:
+      workaround_uninit_method_3 ();
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* PTX-specific reorganization
    - Split blocks at fork and join instructions
    - Compute live registers
@@ -5401,6 +5585,8 @@ nvptx_reorg (void)
   df_set_flags (DF_NO_INSN_RESCAN | DF_NO_HARD_REGS);
   df_live_add_problem ();
   df_live_set_all_dirty ();
+  if (nvptx_init_regs == 3)
+    df_mir_add_problem ();
   df_analyze ();
   regstat_init_n_sets_and_refs ();
 
@@ -5413,6 +5599,8 @@ nvptx_reorg (void)
     if (REG_N_SETS (i) == 0 && REG_N_REFS (i) == 0)
       regno_reg_rtx[i] = const0_rtx;
 
+  workaround_uninit ();
+
   /* Determine launch dimensions of the function.  If it is not an
      offloaded function  (i.e. this is a regular compiler), the
      function has no neutering.  */
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index e3f65b2d0b1..08580071731 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -91,3 +91,7 @@ Enum(ptx_version) String(7.0) Value(PTX_VERSION_7_0)
 mptx=
 Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option)
 Specify the version of the ptx version to use.
+
+minit-regs=
+Target Var(nvptx_init_regs) IntegerRange(0, 3) Joined UInteger Init(3)
+Initialize ptx registers.

