From: Matthew Malcomson
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/vendors/ARM/heads/morello)] Inlining memcpy and tag-preserving behaviour
Date: Fri, 21 Oct 2022 13:21:24 +0000 (GMT)

https://gcc.gnu.org/g:476e4ecb6f70ea3c7ad7081670daeebb0d8605d5

commit 476e4ecb6f70ea3c7ad7081670daeebb0d8605d5
Author: Matthew Malcomson
Date:   Fri Oct 21 14:18:47 2022 +0100

    Inlining memcpy and tag-preserving behaviour

    This commit combines two related changes.  First, we bail out of
    inlining memcpy in one more situation.  Second, we refactor the way
    that we avoid inlining memcpy, using `bucket_type_for_size`.

    ---

    Bail out of performing any inlining transformation of memcpy/memmove
    when the region being copied is large enough to hold a capability.

    Inlining a memcpy is slightly more complicated in the presence of
    capabilities.  There are two cases where we know which operations we
    can safely emit:

    1) When the source and destination are both known to be suitably
       aligned, we can use `ldr`/`str` on capability registers and be
       certain that the operation maintains the validity of any
       capability that may exist in the source memory region.
    2) When the source is known not to contain any capabilities, we may
       use integer loads to perform the operation.

    A naive implementation (and the one that we had until now) may
    conclude that a memcpy source cannot contain capabilities when the
    types of the pointers given to the memcpy call do not contain
    capabilities.  However, this is not necessarily the case.  One case
    where this is evident is passing pointers to a union member, and
    that is indeed the testcase we add for this patch.

    This patch implements a conservative approach, with the idea that
    later improvements could identify more complex situations where a
    memory operation can be determined to act on known non-capability
    data.

    N.b. for future reference, it may be useful to look at the
    discussion below for information on which cases may be identified as
    operations on known non-capability data.
    https://github.com/CTSRD-CHERI/llvm-project/pull/506
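    For a concrete sketch of the pattern that defeats type-based
    reasoning (the memcpy-avoid-inline.c test below is a run-time
    variant of the same idea; the function name here is illustrative
    only):

        /* Both memcpy arguments have non-capability type (char[16]),
           yet the copied bytes hold a capability whose validity tag
           must survive the copy.  */
        union u { void *ptr; char bytes[16]; };

        void *copy_bytes (union u *dst, const union u *src)
        {
          __builtin_memcpy (dst->bytes, src->bytes, sizeof dst->bytes);
          return dst->ptr;   /* Must still be a valid capability.  */
        }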
    ---

    A previous commit fixed problems with the C++ frontend inlining a
    16-byte memcpy into integer loads and stores.  We fixed this by
    avoiding inlining such calls when the copied region may contain
    capabilities.  The C frontend had already accounted for such cases
    while still inlining when the alignment was known to be sufficient
    for capabilities.

    The reason that the C++ and C frontends were not behaving similarly
    is that the `bucket_type_for_size` function uses the
    `lang_hooks.types.cap_type_for_size` hook, which had not been set up
    for the C++ frontend.  Here we use the same hook implementation for
    the C++ frontend as for the C frontend, and hence can use the same
    method to avoid problematic memcpy inlining in both.

    N.b. while taking this approach we noticed that the existing code
    can already handle capability loads and stores in the branch which
    uses `bucket_type_for_size`.  The existing checks for alignment are
    sufficient to catch the extra requirements on capabilities in a
    general way.  The only thing that appeared troublesome was the use
    of `fold_const_aggregate_ref`, but this is logically checking
    whether our value can be determined at compile time by recognising
    that it is a reference into a constant aggregate.  Hence the
    existing code should correctly handle capability types.

Diff:
---
 gcc/cp/cp-objcp-common.h                           |   2 +
 gcc/gimple-fold.c                                  | 143 +++++++++++++------
 .../aarch64/morello/memcpy-avoid-inline-2.c        |  13 ++
 .../aarch64/morello/memcpy-avoid-inline.c          |  27 ++++
 .../gcc.target/aarch64/morello/memcpy-inline-3.c   |  15 +++
 .../gcc.target/aarch64/morello/memcpy-inline-4.c   |  11 ++
 .../gcc.target/aarch64/morello/memcpy-inline-5.c   |  11 ++
 7 files changed, 185 insertions(+), 37 deletions(-)

diff --git a/gcc/cp/cp-objcp-common.h b/gcc/cp/cp-objcp-common.h
index 0936f166d5b..9e6c931f8af 100644
--- a/gcc/cp/cp-objcp-common.h
+++ b/gcc/cp/cp-objcp-common.h
@@ -141,6 +141,8 @@ extern tree cxx_simulate_enum_decl (location_t, const char *,
 #define LANG_HOOKS_TYPE_FOR_MODE c_common_type_for_mode
 #undef LANG_HOOKS_TYPE_FOR_SIZE
 #define LANG_HOOKS_TYPE_FOR_SIZE c_common_type_for_size
+#undef LANG_HOOKS_CAP_TYPE_FOR_SIZE
+#define LANG_HOOKS_CAP_TYPE_FOR_SIZE c_common_cap_type_for_size
 #undef LANG_HOOKS_INCOMPLETE_TYPE_ERROR
 #define LANG_HOOKS_INCOMPLETE_TYPE_ERROR cxx_incomplete_type_error
 #undef LANG_HOOKS_TYPE_PROMOTES_TO
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 73fbf8a97f9..d207d345372 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -688,40 +688,82 @@ size_must_be_zero_p (tree size)
   return vr.zero_p ();
 }
 
-/* Return a type that can be used to copy a BITS sized area around in memory.
-   This is an integer unless there are capabilities of that size.
-   When there are capability types of that size we use them to ensure that
-   hidden validity bits are not lost.  */
-tree
-bucket_type_for_size (unsigned int bits, int unsignedp)
+/* Returns true if an access of MODE at an address aligned to ALIGN is not
+   known to be slow due to misalignment.  */
+static bool
+alignment_permits_fast_mov (unsigned int align, machine_mode mode)
 {
-  tree ret = lang_hooks.types.cap_type_for_size (bits, unsignedp);
-  return ret && (TYPE_CAP_PRECISION (ret) == bits)
-    ? ret
-    : lang_hooks.types.type_for_size (bits, unsignedp);
+  return !(align < GET_MODE_ALIGNMENT (mode)
+           && targetm.slow_unaligned_access (mode, align)
+           && optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing);
 }
 
-/* Returns true if (conservatively) a memcpy of length LEN could be used
-   to copy capabilities.
+/* This function returns true if a memcpy of length LEN with known source and
+   destination alignments of SRC_ALIGN and DEST_ALIGN can be implemented in
+   terms of capability loads and stores.
 
-   Note if we also take account of source and destination types here we may be
-   able to safely reject more cases (and thus optimize more), but for now we
-   keep the check simple and assume that any memcpy that copies enough bytes
-   to contain at least one capability on a capability target may indeed be a
-   capability copy.  */
+   If targetm.capability_mode ().exists (), then this also leaves the rounded
+   down number of capability loads and stores required to move LEN bytes in
+   NUM_CAPS.  */
 static bool
-maybe_capability_copy_p (tree len)
+maybe_capability_copy_p (tree len, unsigned src_align, unsigned dest_align,
+                         unsigned *num_caps)
 {
+  *num_caps = 0;
+  if (!tree_fits_uhwi_p (len))
+    return false;
+
+  const unsigned HOST_WIDE_INT bytes = tree_to_uhwi (len);
   const auto maybe_cap_mode = targetm.capability_mode ();
   if (!maybe_cap_mode.exists ())
     return false;
 
   const auto cap_mode = maybe_cap_mode.require ();
-  const unsigned HOST_WIDE_INT ilen = tree_to_uhwi (len);
-  const unsigned HOST_WIDE_INT cap_bytes
-    = GET_MODE_PRECISION (cap_mode) / BITS_PER_UNIT;
+  const unsigned HOST_WIDE_INT cap_bytes = GET_MODE_SIZE (cap_mode);
+  *num_caps = bytes / cap_bytes;
+  /* The bytes < cap_bytes check only matters when the size is zero; for the
+     memcpy use we'll never see that.  However it's nice to include the check
+     for completeness of the abstraction.  */
+  if (bytes % cap_bytes || bytes < cap_bytes)
+    return false;
 
-  return ilen >= cap_bytes;
+  gcc_assert (TYPE_MODE (uintcap_type_node) == cap_mode);
+  if (!alignment_permits_fast_mov (src_align, cap_mode)
+      || !alignment_permits_fast_mov (dest_align, cap_mode))
+    return false;
+  return true;
+}
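[Example, not part of the patch: a worked run of the helper above,
assuming a Morello-like target whose capability mode is 16 bytes and
source/destination alignments that permit fast capability moves:

    len = 16          ->  true,  *num_caps = 1
    len = 48          ->  true,  *num_caps = 3
    len = 24          ->  false, *num_caps = 1   (24 % 16 != 0)
    len = 8           ->  false, *num_caps = 0   (8 % 16 != 0, and 8 < 16)
    non-constant len  ->  false, *num_caps = 0

If either alignment is insufficient on a target with slow unaligned
accesses and no movmisalign pattern, the helper likewise returns false.]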
+
+/* Return a type that can be used to copy a BITS sized area around in memory.
+   This is an integer unless there are capabilities of that size.
+   When there are capability types of that size we use them to ensure that
+   hidden validity bits are not lost.
+   When there is a capability which is smaller than the given size but no
+   capability matching this size, there is no data type which can carry
+   around `bits` worth of data without losing the validity bit, hence we
+   return NULL_TREE.  */
+tree
+bucket_type_for_size (unsigned int bits, int unsignedp)
+{
+  tree ret = lang_hooks.types.cap_type_for_size (bits, unsignedp);
+  if (ret)
+    return ret;
+
+  /* Check whether the size requested could include a capability.
+     In that case we require a capability type in order to avoid invalidating
+     any capability in this region, but since cap_type_for_size gave us
+     nothing, there is no available type.  */
+  const unsigned HOST_WIDE_INT bytes = bits / BITS_PER_UNIT;
+  const auto maybe_cap_mode = targetm.capability_mode ();
+  if (maybe_cap_mode.exists ())
+    {
+      const auto cap_mode = maybe_cap_mode.require ();
+      const unsigned HOST_WIDE_INT cap_bytes = GET_MODE_SIZE (cap_mode);
+      if (bytes >= cap_bytes)
+        return NULL_TREE;
+    }
+
+  return lang_hooks.types.type_for_size (bits, unsignedp);
 }
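[Likewise not part of the patch: a worked run of the new
bucket_type_for_size, assuming a Morello-like target with a 16-byte
capability mode whose cap_type_for_size hook only knows a 128-bit
capability type:

    bits = 64   ->  a 64-bit integer type (8 bytes cannot hold a capability)
    bits = 128  ->  the capability type from cap_type_for_size
    bits = 256  ->  NULL_TREE (32 bytes could contain a capability, but no
                    256-bit capability type exists to preserve its tag)]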
 
 /* Fold function call to builtin mem{{,p}cpy,move}.  Try to detect and
@@ -805,11 +847,6 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
       dest_align = get_pointer_alignment (dest);
       if (tree_fits_uhwi_p (len)
           && compare_tree_int (len, MOVE_MAX) <= 0
-          /* Avoid doing this for possible copies of capabilities: we
-             would need to use capability loads/stores but this is
-             only safe if the source and destination are suitably
-             aligned, and we don't necessarily know this.  */
-          && !maybe_capability_copy_p (len)
          /* FIXME: Don't transform copies from strings with known length.
             Until GCC 9 this prevented a case in gcc.dg/strlenopt-8.c
             from being handled, and the case was XFAILed for that reason.
@@ -840,17 +877,14 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
           if (warning != OPT_Wrestrict)
             return false;
 
-          scalar_int_mode mode;
+          scalar_addr_mode mode;
           tree type = bucket_type_for_size (ilen * 8, 1);
           if (type
-              && is_a <scalar_int_mode> (TYPE_MODE (type), &mode)
+              && is_a <scalar_addr_mode> (TYPE_MODE (type), &mode)
               && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
               /* If the destination pointer is not aligned we must be able
                  to emit an unaligned store.  */
-              && (dest_align >= GET_MODE_ALIGNMENT (mode)
-                  || !targetm.slow_unaligned_access (mode, dest_align)
-                  || (optab_handler (movmisalign_optab, mode)
-                      != CODE_FOR_nothing)))
+              && alignment_permits_fast_mov (dest_align, mode))
             {
               tree srctype = type;
               tree desttype = type;
@@ -860,10 +894,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
               tree tem = fold_const_aggregate_ref (srcmem);
               if (tem)
                 srcmem = tem;
-              else if (src_align < GET_MODE_ALIGNMENT (mode)
-                       && targetm.slow_unaligned_access (mode, src_align)
-                       && (optab_handler (movmisalign_optab, mode)
-                           == CODE_FOR_nothing))
+              else if (!alignment_permits_fast_mov (src_align, mode))
                 srcmem = NULL_TREE;
               if (srcmem)
                 {
@@ -1124,6 +1155,44 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
         return false;
 
       gimple *new_stmt;
+
+      /* Avoid any conversion from a function call to a MEM_REF if the length
+         that we are copying could contain a capability and we cannot
+         implement the move in terms of capability loads and stores.  */
+      unsigned num_cap_moves = 0;
+      if (capability_type_p (srctype) && capability_type_p (desttype))
+        /* This must be an is_gimple_reg_type and hence we will just
+           implement this as an assignment in the next if statement.
+
+           Note that we may be able to look closer at the source and
+           destination types here and identify cases where, based on the
+           types, we know that there can be no capability in the memory we
+           are copying across (and thus optimize more).  */
+        gcc_assert (is_gimple_reg_type (TREE_TYPE (srcvar)));
+      else if (maybe_capability_copy_p (len, src_align, dest_align,
+                                        &num_cap_moves))
+        /* We don't know if this memory region contains capabilities or not.
+           However, we know that we can copy the whole thing as a set of
+           capability loads and stores.  May as well perform the operation
+           like that to preserve any capabilities which may be there.  */
+        {
+          desttype = build_array_type_nelts (uintcap_type_node,
+                                             num_cap_moves);
+          srctype = desttype;
+          if (src_align > TYPE_ALIGN (srctype))
+            srctype = build_aligned_type (srctype, src_align);
+          if (dest_align > TYPE_ALIGN (desttype))
+            desttype = build_aligned_type (desttype, dest_align);
+          srcvar = fold_build2 (MEM_REF, srctype, src, off0);
+          destvar = fold_build2 (MEM_REF, desttype, dest, off0);
+          new_stmt = gimple_build_assign (destvar, srcvar);
+          goto set_vop_and_replace;
+        }
+      else if (num_cap_moves != 0)
+        /* We don't know whether this memory region will contain
+           capabilities, so we cannot tell for certain that capability
+           copies are usable.  Hence do not inline.  */
+        return false;
+
       if (is_gimple_reg_type (TREE_TYPE (srcvar)))
         {
           tree tem = fold_const_aggregate_ref (srcvar);
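[For illustration only: with len = 32 and both operands 16-byte aligned on
a Morello-like target, the new capability-copy branch above replaces the
call with a single aggregate assignment, roughly the following GIMPLE
(type and variable names approximate):

    MEM <uintcap[2]> [(uintcap[2] *)dest] = MEM <uintcap[2]> [(uintcap[2] *)src];

which expands to capability loads and stores, preserving any validity
tags in the copied region.]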
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/memcpy-avoid-inline-2.c b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-avoid-inline-2.c
new file mode 100644
index 00000000000..eacbc40e0df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-avoid-inline-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target cheri_capability_pure } } */
+
+/* The below memcpy operation needs to preserve capabilities.
+   We cannot assume that `x` is aligned, and hence we must emit a memcpy (or
+   a fully inlined version with runtime checks for alignment).  */
+void *f(int **x)
+{
+  int *y;
+  __builtin_memcpy(&y, x, 16);
+  return y;
+}
+
+/* { dg-final { scan-assembler-times "bl\tmemcpy" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/memcpy-avoid-inline.c b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-avoid-inline.c
new file mode 100644
index 00000000000..088a6b98aed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-avoid-inline.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target cheri_capability_pure } } */
+
+/* The below memcpy operation needs to preserve capabilities.
+   Even though the memcpy is called on a 16-character array type (which is
+   not a capability type), in C type-punning via unions is allowed, and
+   hence accessing via the pointer member should be capability preserving.
+   This test ensures that our behaviour is indeed capability preserving.  */
+__attribute__((noinline))
+void *f(void *s)
+{
+  union {
+    void *ptr;
+    char bytes[16];
+  } u;
+  __builtin_memcpy(u.bytes, s, 16);
+  return u.ptr;
+}
+
+int main()
+{
+  int x = 100;
+  int *myptr = &x;
+  int *ptr = f(&myptr);
+  if (*ptr != 100)
+    return 1;
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-3.c b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-3.c
new file mode 100644
index 00000000000..3bfdadc5198
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target cheri_capability_pure } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+
+/* In this memcpy we know for certain that the source and destination are
+   both aligned, hence we can optimise this to a load.  */
+extern int *extfunc(int);
+void *f()
+{
+  int *y;
+  int *x = extfunc(10);
+  __builtin_memcpy(&y, &x, 16);
+  return y;
+}
+
+/* { dg-final { scan-assembler-not "bl\tmemcpy" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-4.c b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-4.c
new file mode 100644
index 00000000000..5642ab573d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-lower" } */
+extern __int128 __attribute__((aligned(16))) outside[2];
+extern __int128 __attribute__((aligned(16))) otheroutside[2];
+void* other()
+{
+  return __builtin_memcpy(&outside, &otheroutside, 32);
+}
+/* Memcpy call is inlined in both the tree and the resulting assembly.  */
+/* { dg-final { scan-tree-dump-not "memcpy" "lower" } } */
+/* { dg-final { scan-assembler-not "bl\tmemcpy" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-5.c b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-5.c
new file mode 100644
index 00000000000..910c7da04e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-5.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-lower" } */
+extern __int128 __attribute__((aligned(16))) outside[20];
+extern __int128 __attribute__((aligned(16))) otheroutside[20];
+void* other()
+{
+  return __builtin_memcpy(&outside, &otheroutside, 320);
+}
+/* Memcpy call is inlined in the tree but not in the resulting assembly.  */
+/* { dg-final { scan-tree-dump-not "memcpy" "lower" } } */
+/* { dg-final { scan-assembler "bl\tmemcpy" } } */