* [gcc(refs/vendors/ARM/heads/morello)] Update PCS for packed structures containing capabilities
@ 2021-09-21  9:15 Matthew Malcomson
From: Matthew Malcomson @ 2021-09-21  9:15 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:a95417c683681fea6876cc70fcc1c34beb751e80

commit a95417c683681fea6876cc70fcc1c34beb751e80
Author: Matthew Malcomson <matthew.malcomson@arm.com>
Date:   Thu Sep 16 11:34:02 2021 +0100

    Update PCS for packed structures containing capabilities
    
    This is part of a change required to handle structures similar to the
    one below, where no non-capability member overlaps a capability
    metadata area and the total size is smaller than 32 bytes.
    struct __attribute__((packed)) { void *a; unsigned short b; };
    
    Such a structure needs to be passed through the PCS in capability
    registers, but we do not want to store/load the "overlap" portion of
    the last capability register into/from memory.
    
    Hence we need to (a) tell GCC that we are passing such a structure in
    two capability registers, and (b) ensure GCC can handle extracting the
    low part of a capability register in order to store it to a structure.
    
    There is a choice around point (b) that affects point (a).  If we were
    to describe the registers we pass arguments in as CADImode registers,
    then we would need the ability both to store part of a capability
    register into memory and to extract some bytes from memory into part
    of a CADImode register (the latter for loading registers in order to
    pass an argument, the former for storing to the stack something that
    came in as an argument).
    
    Storing part of a capability register into memory is not problematic;
    it just requires implementing `store_bit_field` to handle storing
    parts of CADImode registers.  While it could be tricky to extract part
    of the metadata of a capability and store that somewhere, we would not
    need this at the PCS boundary since all such cases would get caught by
    the "overlapping metadata" clause of the PCS, and hence go through
    memory.
    
    Loading part of a capability register from memory is a little more
    questionable.  The implementation would be very similar, living in
    `extract_bit_field` rather than `store_bit_field`, but it would
    introduce a new vector by which GCC can create a known-invalid
    CADImode value from another value.
    
    The mechanism by which we model the validity bit of capabilities is to
    treat it implicitly, by disallowing operations that create an invalid
    capability.  The few such operations that we do allow exist out of
    necessity (like allowing the user to create invalid capabilities from
    constants).
    It would not be a large break from this approach to allow creating a
    capability register via a bitfield insertion of a smaller value -- not
    least because the only time it would happen is at this PCS boundary,
    where we are guaranteed to get an invalid capability -- but since we
    can avoid it we choose to do so.
    
    We avoid doing so by saying that we actually pass such arguments in
    one CADImode register and one DImode register.  For structures whose
    size is less than or equal to 24 bytes this is behaviorally the same
    as passing in two CADImode registers, since writing the DImode
    register clears the upper bits of the corresponding capability
    register.  Structures whose size is greater than 24 and less than 32
    bytes will be passed in memory, since they must have some
    non-capability member overlapping capability metadata.
    
    This way we avoid having to implement any way of extracting a small
    number of bytes from memory into a CADImode register, which would
    create an invalid capability.
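
    As an illustrative sketch only (not the GCC implementation; the enum
    and function names below are made up for this example), the resulting
    classification of a structure that contains a capability and is larger
    than one capability register looks like this:

    enum pass_kind { PASS_TWO_CAP_REGS, PASS_CAP_PLUS_X_REG, PASS_MEMORY };

    /* SIZE is the structure size in bytes (only sizes > 16 and <= 32 are
       considered here); OVERLAPS_METADATA says whether any non-capability
       member overlaps a capability metadata area (bytes 8-15 or 24-31).  */
    static enum pass_kind
    classify_cap_struct (unsigned size, int overlaps_metadata)
    {
      if (overlaps_metadata)
        return PASS_MEMORY;
      if (size == 32)
        /* Two whole capabilities: pass in two CADImode registers.  */
        return PASS_TWO_CAP_REGS;
      if (size <= 24)
        /* Pass in one CADImode register plus one DImode register.  */
        return PASS_CAP_PLUS_X_REG;
      /* Sizes greater than 24 and less than 32 imply a non-capability
         member overlapping metadata, so they were caught above.  */
      return PASS_MEMORY;
    }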

Diff:
---
 gcc/config/aarch64/aarch64.c | 79 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 70 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2443629f0c9..58ca8ec2fce 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1915,11 +1915,6 @@ enum capability_composite_type {
 static bool
 aarch64_field_overlaps_capability_metadata (const_tree field)
 {
-  tree t = DECL_FIELD_OFFSET (field);
-  /* Must know size of the field, since only ever called when the size of the
-     entire containing type is known.  */
-  gcc_assert (t && tree_fits_uhwi_p (t));
-
   tree ty = TREE_TYPE (field);
 
   /* Flexible array members are not passed, so they don't overlap for
@@ -1927,11 +1922,28 @@ aarch64_field_overlaps_capability_metadata (const_tree field)
   if (TREE_CODE (ty) == ARRAY_TYPE && !TYPE_SIZE (ty))
     return false;
 
+  tree t = DECL_FIELD_OFFSET (field);
+  tree tbo = DECL_FIELD_BIT_OFFSET (field);
+
+  /* Must know starting position and size of the field, since only ever called
+     when the size of the entire containing type is known.  */
+  gcc_assert (t && tree_fits_uhwi_p (t));
   gcc_assert (int_size_in_bytes (TREE_TYPE (field)) != -1);
+  gcc_assert (tbo && tree_fits_uhwi_p (tbo));
 
-  unsigned HOST_WIDE_INT first = tree_to_uhwi (t);
+  unsigned HOST_WIDE_INT bit_offset = tree_to_uhwi (tbo);
+  /* If the modulo is not zero then we overlap part of the first byte and part
+     of the second.  */
+  bool extra_byte_range = (bit_offset % BITS_PER_UNIT != 0);
+  /* The change in the first byte that we overlap is just the number of total
+     bytes that the extra bits move us.  */
+  unsigned HOST_WIDE_INT offset = (bit_offset / BITS_PER_UNIT);
+
+  unsigned HOST_WIDE_INT first = tree_to_uhwi (t) + offset;
   unsigned HOST_WIDE_INT last
     = first + int_size_in_bytes (TREE_TYPE (field)) - 1;
+  if (extra_byte_range)
+    last += 1;
   if ((first >= 8 && first <= 15) || (first >= 24 && first <= 31))
     return true;
   if ((last >= 8 && last <= 15) || (last >= 24 && last <= 31))
@@ -5902,10 +5914,41 @@ aarch64_function_value (const_tree type, const_tree func,
 	 registers.  */
       if (mode == BLKmode
 	  && aarch64_classify_capability_contents (type) == CAPCOM_SOME
-	  && int_size_in_bytes (type) == GET_MODE_SIZE (CADImode) * 2)
+	  && int_size_in_bytes (type) > GET_MODE_SIZE (CADImode))
 	{
+	  gcc_assert ((int_size_in_bytes (type) == GET_MODE_SIZE (CADImode) * 2)
+		      || (int_size_in_bytes (type)
+			  <= GET_MODE_SIZE (CADImode)
+			    + GET_MODE_SIZE (DImode)));
 	  rtx c0 = gen_rtx_REG (CADImode, R0_REGNUM);
-	  rtx c1 = gen_rtx_REG (CADImode, R1_REGNUM);
+	  rtx c1;
+	  if (int_size_in_bytes (type) == GET_MODE_SIZE (CADImode) * 2)
+	      c1 = gen_rtx_REG (CADImode, R1_REGNUM);
+	  else
+	    /* While the PCS calls for us to pass these structures in two
+	       CADImode registers, passing these structures in a CADImode
+	       register and a DImode register is functionally the same.
+
+	       This is because we can never need to use the metadata of the
+	       second capability register, as otherwise
+	       aarch64_classify_capability_contents would have returned
+	       CAPCOM_OVERLAP.
+
+	       On top of that, storing to a DImode register clears the metadata
+	       of a CADImode register, which means the CADImode register ends
+	       up properly specified.
+
+	       We choose to specify things differently here rather than passing
+	       in two CADImode registers because that means we avoid the need
+	       to load smaller sized values into a CADImode register.  If we
+	       passed arguments in CADImode registers then we would need to be
+	       able to load the remainder bits of a structure into a CADImode
+	       register.  This would mean allowing another vector by which we
+	       could generate invalid CADImode variables.  Removing such
+	       vectors is the way we model the validity bit (implicitly, by
+	       disallowing any method to make an invalid capability).  */
+	    c1 = gen_rtx_REG (DImode, R1_REGNUM);
+
 	  rtx e1 = gen_rtx_EXPR_LIST (VOIDmode, c0, gen_int_mode (0, POmode));
 	  rtx e2 = gen_rtx_EXPR_LIST (VOIDmode, c1,
 				      gen_int_mode (GET_MODE_SIZE (CADImode),
@@ -6176,6 +6219,7 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
     /* No frontends can create types with variable-sized modes, so we
        shouldn't be asked to pass or return them.  */
     size = GET_MODE_SIZE (mode).to_constant ();
+  HOST_WIDE_INT orig_size = size;
   size = ROUND_UP (size, UNITS_PER_WORD);
 
   allocate_ncrn = (type) ? !(FLOAT_TYPE_P (type)) : !FLOAT_MODE_P (mode);
@@ -6256,8 +6300,9 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
       in_cap_regs = true;
       units_per_reg = GET_MODE_SIZE (CADImode);
     }
+
   ncrn = pcum->aapcs_ncrn;
-  nregs = size / units_per_reg;
+  nregs = ROUND_UP (size, units_per_reg) / units_per_reg;
 
   /* C6 - C9.  though the sign and zero extension semantics are
      handled elsewhere.  This is the case where the argument fits
@@ -6306,6 +6351,22 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
 	  || (nregs == 1 && !sve_p)
 	  || GET_MODE_CLASS (mode) == MODE_INT)
 	pcum->aapcs_reg = gen_rtx_REG (mode, R0_REGNUM + ncrn);
+      else if (nregs == 2 && in_cap_regs && orig_size < 32)
+	{
+	  /* While the PCS calls for us to pass these structures in two
+	     CADImode registers, we represent passing these structures in a
+	     CADImode register and a DImode register.
+
+	     See the comment in aarch64_function_value for why.
+	     Also see that comment for why we know that `size` (having been
+	     rounded to an 8 byte boundary) must be 24.  */
+	  gcc_assert (size == 24);
+	  rtx c0 = gen_rtx_REG (CADImode, R0_REGNUM + ncrn);
+	  rtx c1 = gen_rtx_REG (DImode, R0_REGNUM + ncrn + 1);
+	  rtx e1 = gen_rtx_EXPR_LIST (VOIDmode, c0, GEN_INT (0));
+	  rtx e2 = gen_rtx_EXPR_LIST (VOIDmode, c1, GEN_INT (units_per_reg));
+	  pcum->aapcs_reg = gen_rtx_PARALLEL (mode, gen_rtvec (2, e1, e2));
+	}
       else
 	{
 	  rtx par;

