public inbox for gcc-cvs@sourceware.org
* [gcc(refs/vendors/ARM/heads/morello)] expand: Avoid invalidating PARALLEL caps in store_field
@ 2022-12-07 15:06 Alex Coplan
  0 siblings, 0 replies; only message in thread
From: Alex Coplan @ 2022-12-07 15:06 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:8ef68db3c929510bd5fa10938a2f208bb38c99cd

commit 8ef68db3c929510bd5fa10938a2f208bb38c99cd
Author: Alex Coplan <alex.coplan@arm.com>
Date:   Fri Nov 25 17:42:06 2022 +0000

    expand: Avoid invalidating PARALLEL caps in store_field
    
    Prior to this change, this C++ testcase (reduced from GCC's emit-rtl.c)
    was getting miscompiled at all optimization levels:
    
    class storage_ref {
      long *val;
      int precision;
    };
    
    struct wide_int_ref_storage : storage_ref {
      wide_int_ref_storage();
    };
    
    storage_ref decompose();
    
    wide_int_ref_storage::wide_int_ref_storage()
        : storage_ref(decompose()) {}
    
    For pure-capability Morello, the PCS requires that storage_ref is
    returned in a pair of capability registers (c0 and c1). We represent
    this in aarch64_function_value with a PARALLEL (as is standard for that
    target hook).
    
    When expanding the constructor for wide_int_ref_storage above, we are
    assigning 24 bytes from the result of the call to decompose() into
    the storage_ref for the given instance. This is done by
    expand_assignment calling store_field. When store_field expands the
    CALL_EXPR, it sees the PARALLEL indicating that the value is returned in
    a pair of capability registers:
    
    (parallel:BLK [
      (expr_list:REG_DEP_TRUE (reg:CADI 97)
          (const_int 0 [0]))
      (expr_list:REG_DEP_TRUE (reg:CADI 98)
          (const_int 16 [0x10]))
    ])
    
    The problem is that the existing code in store_field for handling the
    PARALLEL case decides to find a suitably-large integer mode (in this
    case we end up choosing OImode) and uses emit_group_store to store from
    the PARALLEL into the large integer pseudo. Once we have the large
    integer pseudo, we then proceed to use standard bitfield techniques to
    store the relevant part of the pseudo into memory.
    
    Unfortunately, this is hopeless for capabilities, since as soon as we
    store into the OImode pseudo, the validity of any capabilities in the
    PARALLEL will be lost.
    
    To fix this, this patch adjusts the PARALLEL handling in store_field.
    The basic approach to preserving capabilities here is to allocate
    a stack temporary and store the aggregate there before attempting the
    bitfield store.
    
    However, this is rather inefficient in that we use additional stack
    space, and may end up having to go through memory twice. As an
    additional optimization, in the case where bitpos is zero, we try
    storing as many entire bytes as possible directly to the target using
    emit_group_store. The residual bits are then stored using the
    existing techniques.
    
    With the testcase above, when emit_group_store is handling storing the
    partial tail of the PARALLEL, it uses store_bit_field to store 32 bits
    of a CADImode register to memory. This reveals a latent issue in
    store_integral_bit_field, where we ICE because we try to form a CADImode
    subreg of a TImode register.
    
    This ICE occurs because we forbid subreg punning from non-capability
    modes to capability modes. To work around this, when storing from a
    capability-mode rvalue, we pun on the rvalue instead.
    
    Prior to this patch, we get the following codegen for the above at -O2
    on pure-capability Morello:
    
    _ZN20wide_int_ref_storageC2Ev:
            stp     c29, c30, [csp, -64]!
            mov     c29, csp
            str     c20, [csp, 32]
            mov     c20, c0
            bl      _Z9decomposev
            str     c0, [csp, 48]
            ldr     x0, [csp, 48]
            str     x0, [c20]
            ldr     x0, [csp, 56]
            str     x0, [c20, 8]
            str     w1, [c20, 16]
            ldr     c20, [csp, 32]
            ldp     c29, c30, [csp], 64
            ret
    
    which invalidates the capability returned in c0. After the patch, we get
    this:
    
    _ZN20wide_int_ref_storageC2Ev:
            stp     c29, c30, [csp, -48]!
            mov     c29, csp
            str     c20, [csp, 32]
            mov     c20, c0
            bl      _Z9decomposev
            str     c0, [c20]
            str     w1, [c20, 16]
            ldr     c20, [csp, 32]
            ldp     c29, c30, [csp], 48
            ret
    
    The new execution test introduced with the patch fails before and passes
    after the code changes.

Diff:
---
 gcc/expmed.c                                       | 11 +++-
 gcc/expr.c                                         | 69 ++++++++++++++++++++--
 .../g++.target/aarch64/morello/cap-store-field.C   | 41 +++++++++++++
 3 files changed, 116 insertions(+), 5 deletions(-)

diff --git a/gcc/expmed.c b/gcc/expmed.c
index 4e87841ad95..3e1478d2501 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1046,7 +1046,16 @@ store_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
     {
       value_mode = int_mode_for_mode (GET_MODE (value)).require ();
       value = gen_reg_rtx (value_mode);
-      emit_move_insn (gen_lowpart (GET_MODE (orig_value), value), orig_value);
+
+      /* For capability modes, we forbid punning from a non-capability
+	 mode to a capability mode, but allow punning in the other
+	 direction, so pun on the RHS instead.  */
+      if (is_a <scalar_addr_mode> (GET_MODE (orig_value)))
+	emit_move_insn (value, lowpart_subreg (value_mode,
+					       orig_value,
+					       GET_MODE (orig_value)));
+      else
+	emit_move_insn (gen_lowpart (GET_MODE (orig_value), value), orig_value);
     }
 
   /* If OP0 is a multi-word register, narrow it to the affected word.
diff --git a/gcc/expr.c b/gcc/expr.c
index 945aa002774..d778547c8de 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7239,6 +7239,34 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
     }
 }
 
+/* Determine if a region of bits of size BITSIZE starting at position
+   BITPOS covers any capability in a PARALLEL.  */
+static bool
+parallel_bitregion_covers_cap_p (rtx par,
+				 poly_int64 bitpos,
+				 poly_int64 bitsize)
+{
+  poly_int64 br_end = bitpos + bitsize;
+
+  for (int i = 0; i < XVECLEN (par, 0); i++)
+    {
+      rtx x = XVECEXP (par, 0, i);
+      gcc_assert (GET_CODE (x) == EXPR_LIST);
+
+      machine_mode mode = GET_MODE (XEXP (x, 0));
+      if (!CAPABILITY_MODE_P (mode))
+	continue;
+
+      auto cap_off = INTVAL (XEXP (x, 1)) * BITS_PER_UNIT;
+      auto cap_end = cap_off + GET_MODE_PRECISION (mode);
+
+      if (maybe_le (bitpos, cap_off) && maybe_ge (br_end, cap_end))
+	return true;
+    }
+
+  return false;
+}
+
 /* Store the value of EXP (an expression tree)
    into a subfield of TARGET which has mode MODE and occupies
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
@@ -7378,10 +7406,43 @@ store_field (rtx target, poly_int64 bitsize, poly_int64 bitpos,
       if (GET_CODE (temp) == PARALLEL)
 	{
 	  HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (exp));
-	  machine_mode temp_mode = GET_MODE (temp);
-	  if (temp_mode == BLKmode || temp_mode == VOIDmode)
-	    temp_mode = smallest_int_mode_for_size (size * BITS_PER_UNIT);
-	  rtx temp_target = gen_reg_rtx (temp_mode);
+
+	  /* As an optimization, try doing some of the store directly, where
+	     possible.  */
+	  if (known_eq (bitpos, 0)
+	      && known_gt (bitsize, BITS_PER_UNIT)
+	      && MEM_P (target))
+	    {
+	      poly_int64 to_store = bits_to_bytes_round_down (bitsize);
+	      emit_group_store (target, temp, TREE_TYPE (exp), to_store);
+
+	      poly_int64 bits_stored = to_store * BITS_PER_UNIT;
+
+	      /* If we stored everything, then we're done.  */
+	      if (known_eq (bitsize, bits_stored))
+		return const0_rtx;
+
+	      bitregion_start += bits_stored;
+	      bitpos += bits_stored;
+	      bitsize -= bits_stored;
+	      target = adjust_address (target, GET_MODE (target), to_store);
+	    }
+
+	  rtx temp_target;
+
+	  /* If the bits we want to store could potentially cover a valid
+	     capability, go via a stack temporary to avoid invalidating
+	     the capabilities if at all possible.  */
+	  if (parallel_bitregion_covers_cap_p (temp, bitpos, bitsize))
+	    temp_target = assign_stack_temp (GET_MODE (temp), size);
+	  else
+	    {
+	      machine_mode temp_mode = GET_MODE (temp);
+	      if (temp_mode == BLKmode || temp_mode == VOIDmode)
+		temp_mode = smallest_int_mode_for_size (size * BITS_PER_UNIT);
+	      temp_target = gen_reg_rtx (temp_mode);
+	    }
+
 	  emit_group_store (temp_target, temp, TREE_TYPE (exp), size);
 	  temp = temp_target;
 	}
diff --git a/gcc/testsuite/g++.target/aarch64/morello/cap-store-field.C b/gcc/testsuite/g++.target/aarch64/morello/cap-store-field.C
new file mode 100644
index 00000000000..8c6144b7a8e
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/morello/cap-store-field.C
@@ -0,0 +1,41 @@
+// { dg-do run }
+
+long g;
+
+class storage_ref {
+  long *val;
+  int precision;
+public:
+  storage_ref() : val(&g), precision(42) {}
+  void check();
+};
+
+struct wide_int_ref_storage : public storage_ref {
+  wide_int_ref_storage();
+};
+
+__attribute__((noipa))
+storage_ref decompose()
+{
+  storage_ref t;
+  return t;
+}
+
+wide_int_ref_storage::wide_int_ref_storage()
+    : storage_ref(decompose()) {}
+
+__attribute__((noipa))
+void storage_ref::check()
+{
+  if (*val != 123)
+    __builtin_abort ();
+  if (precision != 42)
+    __builtin_abort ();
+}
+
+int main()
+{
+  g = 123;
+  wide_int_ref_storage wrs;
+  wrs.check ();
+}
