public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed
* [gcc(refs/vendors/ARM/heads/morello)] expand: Avoid invalidating PARALLEL caps in store_field
@ 2022-12-07 15:06 Alex Coplan
0 siblings, 0 replies; only message in thread
From: Alex Coplan @ 2022-12-07 15:06 UTC (permalink / raw)
To: gcc-cvs
https://gcc.gnu.org/g:8ef68db3c929510bd5fa10938a2f208bb38c99cd
commit 8ef68db3c929510bd5fa10938a2f208bb38c99cd
Author: Alex Coplan <alex.coplan@arm.com>
Date: Fri Nov 25 17:42:06 2022 +0000
expand: Avoid invalidating PARALLEL caps in store_field
Prior to this change, this C++ testcase (reduced from GCC's emit-rtl.c)
was getting miscompiled at all optimization levels:
class storage_ref {
long *val;
int precision;
};
struct wide_int_ref_storage : storage_ref {
wide_int_ref_storage();
};
storage_ref decompose();
wide_int_ref_storage::wide_int_ref_storage()
: storage_ref(decompose()) {}
For pure-capability Morello, the PCS requires that storage_ref is
returned in a pair of capability registers (c0 and c1). We represent
this in aarch64_function_value with a PARALLEL (as is standard for that
target hook).
When expanding the constructor for wide_int_ref_storage above, we are
assigning 24 bytes from the result of the call to decompose() into
the storage_ref for the given instance. This is done by
expand_assignment calling store_field. When store_field expands the
CALL_EXPR, it sees the PARALLEL indicating that the value is returned in
a pair of capability registers:
(parallel:BLK [
(expr_list:REG_DEP_TRUE (reg:CADI 97)
(const_int 0 [0]))
(expr_list:REG_DEP_TRUE (reg:CADI 98)
(const_int 16 [0x10]))
])
The problem is that the existing code in store_field for handling the
PARALLEL case decides to find a suitably-large integer mode (in this
case we end up choosing OImode) and uses emit_group_store to store from
the PARALLEL into the large integer pseudo. Once we have the large
integer pseudo, we then proceed to use standard bitfield techniques to
store the relevant part of the pseudo into memory.
Unfortunately, this is hopeless for capabilities, since as soon as we
store into the OImode pseudo, the validity of any capabilities in the
PARALLEL will be lost.
To fix this, this patch adjusts the PARALLEL handling in store_field.
The basic approach to try and preserve capabilities here is to allocate
a stack temporary and store the aggregate there before attempting the
bitfield store.
However, this is rather inefficient in that we use additional stack
space, and may end up having to go through memory twice. As an
additional optimization, in the case where bitpos is zero, we try
storing as many entire bytes as possible directly to the target using
emit_group_store. The residual bits are then stored using the
existing techniques.
With the testcase above, when emit_group_store is handling storing the
partial tail of the PARALLEL, it uses store_bit_field to store 32 bits
of a CADImode register to memory. This reveals a latent issue in
store_integral_bit_field, where we ICE because we try to form a CADImode
subreg of a TImode register.
This ICE occurs because we forbid subregs punning from non-capability
modes to capability modes. To work around this, when storing from a
capability-mode rvalue, we pun on the rvalue instead.
Prior to this patch, we get the following codegen for the above at -O2
on pure-capability Morello:
_ZN20wide_int_ref_storageC2Ev:
stp c29, c30, [csp, -64]!
mov c29, csp
str c20, [csp, 32]
mov c20, c0
bl _Z9decomposev
str c0, [csp, 48]
ldr x0, [csp, 48]
str x0, [c20]
ldr x0, [csp, 56]
str x0, [c20, 8]
str w1, [c20, 16]
ldr c20, [csp, 32]
ldp c29, c30, [csp], 64
ret
which invalidates the capability returned in c0. After the patch, we get
this:
_ZN20wide_int_ref_storageC2Ev:
stp c29, c30, [csp, -48]!
mov c29, csp
str c20, [csp, 32]
mov c20, c0
bl _Z9decomposev
str c0, [c20]
str w1, [c20, 16]
ldr c20, [csp, 32]
ldp c29, c30, [csp], 48
ret
The new execution test introduced with the patch fails before and passes
after the code changes.
Diff:
---
gcc/expmed.c | 11 +++-
gcc/expr.c | 69 ++++++++++++++++++++--
.../g++.target/aarch64/morello/cap-store-field.C | 41 +++++++++++++
3 files changed, 116 insertions(+), 5 deletions(-)
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 4e87841ad95..3e1478d2501 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1046,7 +1046,16 @@ store_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
{
value_mode = int_mode_for_mode (GET_MODE (value)).require ();
value = gen_reg_rtx (value_mode);
- emit_move_insn (gen_lowpart (GET_MODE (orig_value), value), orig_value);
+
+ /* For capability modes, we forbid punning from a non-capability
+ mode to a capability mode, but allow punning in the other
+ direction, so pun on the RHS instead. */
+ if (is_a <scalar_addr_mode> (GET_MODE (orig_value)))
+ emit_move_insn (value, lowpart_subreg (value_mode,
+ orig_value,
+ GET_MODE (orig_value)));
+ else
+ emit_move_insn (gen_lowpart (GET_MODE (orig_value), value), orig_value);
}
/* If OP0 is a multi-word register, narrow it to the affected word.
diff --git a/gcc/expr.c b/gcc/expr.c
index 945aa002774..d778547c8de 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7239,6 +7239,34 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
}
}
+/* Determine if a region of bits of size BITSIZE starting at position
+ BITPOS covers any capability in a PARALLEL. */
+static bool
+parallel_bitregion_covers_cap_p (rtx par,
+ poly_int64 bitpos,
+ poly_int64 bitsize)
+{
+ poly_int64 br_end = bitpos + bitsize;
+
+ for (int i = 0; i < XVECLEN (par, 0); i++)
+ {
+ rtx x = XVECEXP (par, 0, i);
+ gcc_assert (GET_CODE (x) == EXPR_LIST);
+
+ machine_mode mode = GET_MODE (XEXP (x, 0));
+ if (!CAPABILITY_MODE_P (mode))
+ continue;
+
+ auto cap_off = INTVAL (XEXP (x, 1)) * BITS_PER_UNIT;
+ auto cap_end = cap_off + GET_MODE_PRECISION (mode);
+
+ if (maybe_le (bitpos, cap_off) && maybe_ge (br_end, cap_end))
+ return true;
+ }
+
+ return false;
+}
+
/* Store the value of EXP (an expression tree)
into a subfield of TARGET which has mode MODE and occupies
BITSIZE bits, starting BITPOS bits from the start of TARGET.
@@ -7378,10 +7406,43 @@ store_field (rtx target, poly_int64 bitsize, poly_int64 bitpos,
if (GET_CODE (temp) == PARALLEL)
{
HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (exp));
- machine_mode temp_mode = GET_MODE (temp);
- if (temp_mode == BLKmode || temp_mode == VOIDmode)
- temp_mode = smallest_int_mode_for_size (size * BITS_PER_UNIT);
- rtx temp_target = gen_reg_rtx (temp_mode);
+
+ /* As an optimization, try doing some of the store directly, where
+ possible. */
+ if (known_eq (bitpos, 0)
+ && known_gt (bitsize, BITS_PER_UNIT)
+ && MEM_P (target))
+ {
+ poly_int64 to_store = bits_to_bytes_round_down (bitsize);
+ emit_group_store (target, temp, TREE_TYPE (exp), to_store);
+
+ poly_int64 bits_stored = to_store * BITS_PER_UNIT;
+
+ /* If we stored everything, then we're done. */
+ if (known_eq (bitsize, bits_stored))
+ return const0_rtx;
+
+ bitregion_start += bits_stored;
+ bitpos += bits_stored;
+ bitsize -= bits_stored;
+ target = adjust_address (target, GET_MODE (target), to_store);
+ }
+
+ rtx temp_target;
+
+ /* If the bits we want to store could potentially cover a valid
+ capability, go via a stack temporary to avoid invalidating
+ the capabilities if at all possible. */
+ if (parallel_bitregion_covers_cap_p (temp, bitpos, bitsize))
+ temp_target = assign_stack_temp (GET_MODE (temp), size);
+ else
+ {
+ machine_mode temp_mode = GET_MODE (temp);
+ if (temp_mode == BLKmode || temp_mode == VOIDmode)
+ temp_mode = smallest_int_mode_for_size (size * BITS_PER_UNIT);
+ temp_target = gen_reg_rtx (temp_mode);
+ }
+
emit_group_store (temp_target, temp, TREE_TYPE (exp), size);
temp = temp_target;
}
diff --git a/gcc/testsuite/g++.target/aarch64/morello/cap-store-field.C b/gcc/testsuite/g++.target/aarch64/morello/cap-store-field.C
new file mode 100644
index 00000000000..8c6144b7a8e
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/morello/cap-store-field.C
@@ -0,0 +1,41 @@
+// { dg-do run }
+
+long g;
+
+class storage_ref {
+ long *val;
+ int precision;
+public:
+ storage_ref() : val(&g), precision(42) {}
+ void check();
+};
+
+struct wide_int_ref_storage : public storage_ref {
+ wide_int_ref_storage();
+};
+
+__attribute__((noipa))
+storage_ref decompose()
+{
+ storage_ref t;
+ return t;
+}
+
+wide_int_ref_storage::wide_int_ref_storage()
+ : storage_ref(decompose()) {}
+
+__attribute__((noipa))
+void storage_ref::check()
+{
+ if (*val != 123)
+ __builtin_abort ();
+ if (precision != 42)
+ __builtin_abort ();
+}
+
+int main()
+{
+ g = 123;
+ wide_int_ref_storage wrs;
+ wrs.check ();
+}
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2022-12-07 15:06 UTC | newest]
Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-07 15:06 [gcc(refs/vendors/ARM/heads/morello)] expand: Avoid invalidating PARALLEL caps in store_field Alex Coplan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).