From: Matthew Malcomson
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/vendors/ARM/heads/morello)] Inlining memcpy and tag-preserving behaviour
Date: Fri, 21 Oct 2022 13:21:24 +0000 (GMT)

https://gcc.gnu.org/g:476e4ecb6f70ea3c7ad7081670daeebb0d8605d5

commit 476e4ecb6f70ea3c7ad7081670daeebb0d8605d5
Author: Matthew Malcomson
Date:   Fri Oct 21 14:18:47 2022 +0100

    Inlining memcpy and tag-preserving behaviour

    This commit combines two related changes.  First, we bail out of
    inlining memcpy in one more situation.  Second, we refactor the way
    that we avoid inlining memcpy, using `bucket_type_for_size`.

    ---

    Bail out of performing any inlining transformation of memcpy/memmove
    when the region being copied is large enough to hold a capability.

    Inlining a memcpy is slightly more complicated in the presence of
    capabilities.  There are two cases where we know which operations we
    can safely emit:

    1) When the source and destination are both known to be suitably
       aligned, we can use `ldr`/`str` on capability registers and be
       certain that the operation maintains the validity of any
       capability that may exist in the source memory region.
    2) When the source is known not to contain any capabilities, we may
       use integer loads to perform the operation.

    A naive implementation (and the one that we had until now) may
    conclude that a memcpy source cannot contain capabilities when the
    types of the pointers given to the memcpy call do not contain
    capabilities.  However, this is not necessarily the case.  One case
    where this is evident is passing pointers to a union member, and
    that is indeed the testcase we add for this patch.

    This patch implements a conservative approach, with the idea that
    later improvements could identify more complex situations where a
    memory operation can be determined to act on known non-capability
    data.

    N.b. for future reference, it may be useful to look at the
    discussion below for information on which cases may be identified as
    operations on known non-capability data.
    https://github.com/CTSRD-CHERI/llvm-project/pull/506
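    For a concrete sketch of the pattern that defeats type-based
    reasoning (the memcpy-avoid-inline.c test below is a run-time
    variant of the same idea; the function name here is illustrative
    only):

        /* Both memcpy arguments have non-capability type (char[16]),
           yet the copied bytes hold a capability whose validity tag
           must survive the copy.  */
        union u { void *ptr; char bytes[16]; };

        void *copy_bytes (union u *dst, const union u *src)
        {
          __builtin_memcpy (dst->bytes, src->bytes, sizeof dst->bytes);
          return dst->ptr;   /* Must still be a valid capability.  */
        }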
    ---

    A previous commit fixed problems with the C++ frontend inlining a
    16-byte memcpy into integer loads and stores.  We fixed this by
    avoiding inlining such calls when the copied region may contain
    capabilities.  The C frontend had already accounted for such cases
    while still inlining when the alignment was known to be sufficient
    for capabilities.

    The reason that the C++ and C frontends were not behaving similarly
    is that the `bucket_type_for_size` function uses the
    `lang_hooks.types.cap_type_for_size` hook, which had not been set up
    for the C++ frontend.  Here we use the same hook implementation for
    the C++ frontend as for the C frontend, and hence can use the same
    method to avoid problematic memcpy inlining in both.

    N.b. while taking this approach we noticed that the existing code
    can already handle capability loads and stores in the branch which
    uses `bucket_type_for_size`.  The existing checks for alignment are
    sufficient to catch the extra requirements on capabilities in a
    general way.  The only thing that appeared troublesome was the use
    of `fold_const_aggregate_ref`, but this is logically checking
    whether our value can be determined at compile time by recognising
    that it is a reference into a constant aggregate.  Hence the
    existing code should correctly handle capability types.

Diff:
---
 gcc/cp/cp-objcp-common.h                           |   2 +
 gcc/gimple-fold.c                                  | 143 +++++++++++++------
 .../aarch64/morello/memcpy-avoid-inline-2.c        |  13 ++
 .../aarch64/morello/memcpy-avoid-inline.c          |  27 ++++
 .../gcc.target/aarch64/morello/memcpy-inline-3.c   |  15 +++
 .../gcc.target/aarch64/morello/memcpy-inline-4.c   |  11 ++
 .../gcc.target/aarch64/morello/memcpy-inline-5.c   |  11 ++
 7 files changed, 185 insertions(+), 37 deletions(-)

diff --git a/gcc/cp/cp-objcp-common.h b/gcc/cp/cp-objcp-common.h
index 0936f166d5b..9e6c931f8af 100644
--- a/gcc/cp/cp-objcp-common.h
+++ b/gcc/cp/cp-objcp-common.h
@@ -141,6 +141,8 @@ extern tree cxx_simulate_enum_decl (location_t, const char *,
 #define LANG_HOOKS_TYPE_FOR_MODE c_common_type_for_mode
 #undef LANG_HOOKS_TYPE_FOR_SIZE
 #define LANG_HOOKS_TYPE_FOR_SIZE c_common_type_for_size
+#undef LANG_HOOKS_CAP_TYPE_FOR_SIZE
+#define LANG_HOOKS_CAP_TYPE_FOR_SIZE c_common_cap_type_for_size
 #undef LANG_HOOKS_INCOMPLETE_TYPE_ERROR
 #define LANG_HOOKS_INCOMPLETE_TYPE_ERROR cxx_incomplete_type_error
 #undef LANG_HOOKS_TYPE_PROMOTES_TO
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 73fbf8a97f9..d207d345372 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -688,40 +688,82 @@ size_must_be_zero_p (tree size)
   return vr.zero_p ();
 }
 
-/* Return a type that can be used to copy a BITS sized area around in memory.
-   This is an integer unless there are capabilities of that size.
-   When there are capability types of that size we use them to ensure that
-   hidden validity bits are not lost.  */
-tree
-bucket_type_for_size (unsigned int bits, int unsignedp)
+/* Returns true if an access of MODE at an address aligned to ALIGN is not
+   known to be slow due to misalignment.  */
+static bool
+alignment_permits_fast_mov (unsigned int align, machine_mode mode)
 {
-  tree ret = lang_hooks.types.cap_type_for_size (bits, unsignedp);
-  return ret && (TYPE_CAP_PRECISION (ret) == bits)
-    ? ret
-    : lang_hooks.types.type_for_size (bits, unsignedp);
+  return !(align < GET_MODE_ALIGNMENT (mode)
+           && targetm.slow_unaligned_access (mode, align)
+           && optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing);
 }
 
-/* Returns true if (conservatively) a memcpy of length LEN could be used
-   to copy capabilities.
+/* This function returns true if a memcpy of length LEN with known source and
+   destination alignments of SRC_ALIGN and DEST_ALIGN can be implemented in
+   terms of capability loads and stores.
 
-   Note if we also take account of source and destination types here we may be
-   able to safely reject more cases (and thus optimize more), but for now we
-   keep the check simple and assume that any memcpy that copies enough bytes
-   to contain at least one capability on a capability target may indeed be a
-   capability copy.  */
+   If targetm.capability_mode ().exists (), then this also leaves the rounded
+   down number of capability loads and stores required to move LEN bytes in
+   NUM_CAPS.  */
 static bool
-maybe_capability_copy_p (tree len)
+maybe_capability_copy_p (tree len, unsigned src_align, unsigned dest_align,
+                         unsigned *num_caps)
 {
+  *num_caps = 0;
+  if (!tree_fits_uhwi_p (len))
+    return false;
+
+  const unsigned HOST_WIDE_INT bytes = tree_to_uhwi (len);
   const auto maybe_cap_mode = targetm.capability_mode ();
   if (!maybe_cap_mode.exists ())
     return false;
 
   const auto cap_mode = maybe_cap_mode.require ();
-  const unsigned HOST_WIDE_INT ilen = tree_to_uhwi (len);
-  const unsigned HOST_WIDE_INT cap_bytes
-    = GET_MODE_PRECISION (cap_mode) / BITS_PER_UNIT;
+  const unsigned HOST_WIDE_INT cap_bytes = GET_MODE_SIZE (cap_mode);
+  *num_caps = bytes / cap_bytes;
+  /* The bytes < cap_bytes check only matters when the size is zero; for the
+     memcpy use we'll never see that.  However it's nice to include the check
+     for completeness of the abstraction.  */
+  if (bytes % cap_bytes || bytes < cap_bytes)
+    return false;
 
-  return ilen >= cap_bytes;
+  gcc_assert (TYPE_MODE (uintcap_type_node) == cap_mode);
+  if (!alignment_permits_fast_mov (src_align, cap_mode)
+      || !alignment_permits_fast_mov (dest_align, cap_mode))
+    return false;
+  return true;
+}
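[Example, not part of the patch: a worked run of the helper above,
assuming a Morello-like target whose capability mode is 16 bytes and
source/destination alignments that permit fast capability moves:

    len = 16          ->  true,  *num_caps = 1
    len = 48          ->  true,  *num_caps = 3
    len = 24          ->  false, *num_caps = 1   (24 % 16 != 0)
    len = 8           ->  false, *num_caps = 0   (8 % 16 != 0, and 8 < 16)
    non-constant len  ->  false, *num_caps = 0

If either alignment is insufficient on a target with slow unaligned
accesses and no movmisalign pattern, the helper likewise returns false.]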
+
+/* Return a type that can be used to copy a BITS sized area around in memory.
+   This is an integer unless there are capabilities of that size.
+   When there are capability types of that size we use them to ensure that
+   hidden validity bits are not lost.
+   When there is a capability which is smaller than the given size but no
+   capability matching this size, there is no data type which can carry
+   around `bits` worth of data without losing the validity bit, hence we
+   return NULL_TREE.  */
+tree
+bucket_type_for_size (unsigned int bits, int unsignedp)
+{
+  tree ret = lang_hooks.types.cap_type_for_size (bits, unsignedp);
+  if (ret)
+    return ret;
+
+  /* Check whether the size requested could include a capability.
+     In that case we require a capability type in order to avoid invalidating
+     any capability in this region, but since cap_type_for_size gave us
+     nothing, there is no available type.  */
+  const unsigned HOST_WIDE_INT bytes = bits / BITS_PER_UNIT;
+  const auto maybe_cap_mode = targetm.capability_mode ();
+  if (maybe_cap_mode.exists ())
+    {
+      const auto cap_mode = maybe_cap_mode.require ();
+      const unsigned HOST_WIDE_INT cap_bytes = GET_MODE_SIZE (cap_mode);
+      if (bytes >= cap_bytes)
+        return NULL_TREE;
+    }
+
+  return lang_hooks.types.type_for_size (bits, unsignedp);
 }
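[Likewise not part of the patch: a worked run of the new
bucket_type_for_size, assuming a Morello-like target with a 16-byte
capability mode whose cap_type_for_size hook only knows a 128-bit
capability type:

    bits = 64   ->  a 64-bit integer type (8 bytes cannot hold a capability)
    bits = 128  ->  the capability type from cap_type_for_size
    bits = 256  ->  NULL_TREE (32 bytes could contain a capability, but no
                    256-bit capability type exists to preserve its tag)]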
 
 /* Fold function call to builtin mem{{,p}cpy,move}.  Try to detect and
@@ -805,11 +847,6 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
       dest_align = get_pointer_alignment (dest);
       if (tree_fits_uhwi_p (len)
           && compare_tree_int (len, MOVE_MAX) <= 0
-          /* Avoid doing this for possible copies of capabilities: we
-             would need to use capability loads/stores but this is
-             only safe if the source and destination are suitably
-             aligned, and we don't necessarily know this.  */
-          && !maybe_capability_copy_p (len)
          /* FIXME: Don't transform copies from strings with known length.
             Until GCC 9 this prevented a case in gcc.dg/strlenopt-8.c
             from being handled, and the case was XFAILed for that reason.
@@ -840,17 +877,14 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
           if (warning != OPT_Wrestrict)
             return false;
 
-          scalar_int_mode mode;
+          scalar_addr_mode mode;
           tree type = bucket_type_for_size (ilen * 8, 1);
           if (type
-              && is_a <scalar_int_mode> (TYPE_MODE (type), &mode)
+              && is_a <scalar_addr_mode> (TYPE_MODE (type), &mode)
               && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
               /* If the destination pointer is not aligned we must be able
                  to emit an unaligned store.  */
-              && (dest_align >= GET_MODE_ALIGNMENT (mode)
-                  || !targetm.slow_unaligned_access (mode, dest_align)
-                  || (optab_handler (movmisalign_optab, mode)
-                      != CODE_FOR_nothing)))
+              && alignment_permits_fast_mov (dest_align, mode))
             {
               tree srctype = type;
               tree desttype = type;
@@ -860,10 +894,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
               tree tem = fold_const_aggregate_ref (srcmem);
               if (tem)
                 srcmem = tem;
-              else if (src_align < GET_MODE_ALIGNMENT (mode)
-                       && targetm.slow_unaligned_access (mode, src_align)
-                       && (optab_handler (movmisalign_optab, mode)
-                           == CODE_FOR_nothing))
+              else if (!alignment_permits_fast_mov (src_align, mode))
                 srcmem = NULL_TREE;
               if (srcmem)
                 {
@@ -1124,6 +1155,44 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
         return false;
 
       gimple *new_stmt;
+
+      /* Avoid any conversion from a function call to a MEM_REF if the length
+         that we are copying could contain a capability and we cannot
+         implement the move in terms of capability loads and stores.  */
+      unsigned num_cap_moves = 0;
+      if (capability_type_p (srctype) && capability_type_p (desttype))
+        /* This must be an is_gimple_reg_type and hence we will just
+           implement this as an assignment in the next if statement.
+
+           Note that we may be able to look closer at the source and
+           destination types here and identify cases where, based on the
+           types, we know that there can be no capability in the memory we
+           are copying across (and thus optimize more).  */
+        gcc_assert (is_gimple_reg_type (TREE_TYPE (srcvar)));
+      else if (maybe_capability_copy_p (len, src_align, dest_align,
+                                        &num_cap_moves))
+        /* We don't know if this memory region contains capabilities or not.
+           However, we know that we can copy the whole thing as a set of
+           capability loads and stores.  May as well perform the operation
+           like that to preserve any capabilities which may be there.  */
+        {
+          desttype = build_array_type_nelts (uintcap_type_node,
+                                             num_cap_moves);
+          srctype = desttype;
+          if (src_align > TYPE_ALIGN (srctype))
+            srctype = build_aligned_type (srctype, src_align);
+          if (dest_align > TYPE_ALIGN (desttype))
+            desttype = build_aligned_type (desttype, dest_align);
+          srcvar = fold_build2 (MEM_REF, srctype, src, off0);
+          destvar = fold_build2 (MEM_REF, desttype, dest, off0);
+          new_stmt = gimple_build_assign (destvar, srcvar);
+          goto set_vop_and_replace;
+        }
+      else if (num_cap_moves != 0)
+        /* We don't know whether this memory region will contain
+           capabilities, so we cannot tell for certain that capability
+           copies are usable.  Hence do not inline.  */
+        return false;
+
       if (is_gimple_reg_type (TREE_TYPE (srcvar)))
         {
           tree tem = fold_const_aggregate_ref (srcvar);
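[For illustration only: with len = 32 and both operands 16-byte aligned on
a Morello-like target, the new capability-copy branch above replaces the
call with a single aggregate assignment, roughly the following GIMPLE
(type and variable names approximate):

    MEM <uintcap[2]> [(uintcap[2] *)dest] = MEM <uintcap[2]> [(uintcap[2] *)src];

which expands to capability loads and stores, preserving any validity
tags in the copied region.]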
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/memcpy-avoid-inline-2.c b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-avoid-inline-2.c
new file mode 100644
index 00000000000..eacbc40e0df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-avoid-inline-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target cheri_capability_pure } } */
+
+/* The below memcpy operation needs to preserve capabilities.
+   We cannot assume that `x` is aligned, and hence we must emit a memcpy (or
+   a fully inlined version with runtime checks for alignment).  */
+void *f(int **x)
+{
+  int *y;
+  __builtin_memcpy(&y, x, 16);
+  return y;
+}
+
+/* { dg-final { scan-assembler-times "bl\tmemcpy" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/memcpy-avoid-inline.c b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-avoid-inline.c
new file mode 100644
index 00000000000..088a6b98aed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-avoid-inline.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target cheri_capability_pure } } */
+
+/* The below memcpy operation needs to preserve capabilities.
+   Even though the memcpy is called on a 16-character array type (which is
+   not a capability type), in C type-punning via unions is allowed, and
+   hence accessing via the pointer member should be capability preserving.
+   This test ensures that our behaviour is indeed capability preserving.  */
+__attribute__((noinline))
+void *f(void *s)
+{
+  union {
+    void *ptr;
+    char bytes[16];
+  } u;
+  __builtin_memcpy(u.bytes, s, 16);
+  return u.ptr;
+}
+
+int main()
+{
+  int x = 100;
+  int *myptr = &x;
+  int *ptr = f(&myptr);
+  if (*ptr != 100)
+    return 1;
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-3.c b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-3.c
new file mode 100644
index 00000000000..3bfdadc5198
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target cheri_capability_pure } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+
+/* In this memcpy we know for certain that the source and destination are
+   both aligned, hence we can optimise this to a load.  */
+extern int *extfunc(int);
+void *f()
+{
+  int *y;
+  int *x = extfunc(10);
+  __builtin_memcpy(&y, &x, 16);
+  return y;
+}
+
+/* { dg-final { scan-assembler-not "bl\tmemcpy" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-4.c b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-4.c
new file mode 100644
index 00000000000..5642ab573d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-lower" } */
+extern __int128 __attribute__((aligned(16))) outside[2];
+extern __int128 __attribute__((aligned(16))) otheroutside[2];
+void* other()
+{
+  return __builtin_memcpy(&outside, &otheroutside, 32);
+}
+/* Memcpy call is inlined in both the tree and the resulting assembly.  */
+/* { dg-final { scan-tree-dump-not "memcpy" "lower" } } */
+/* { dg-final { scan-assembler-not "bl\tmemcpy" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-5.c b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-5.c
new file mode 100644
index 00000000000..910c7da04e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/morello/memcpy-inline-5.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-lower" } */
+extern __int128 __attribute__((aligned(16))) outside[20];
+extern __int128 __attribute__((aligned(16))) otheroutside[20];
+void* other()
+{
+  return __builtin_memcpy(&outside, &otheroutside, 320);
+}
+/* Memcpy call is inlined in the tree but not in the resulting assembly.  */
+/* { dg-final { scan-tree-dump-not "memcpy" "lower" } } */
+/* { dg-final { scan-assembler "bl\tmemcpy" } } */