From: Matthew Malcomson <matmal01@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org, libstdc++-cvs@gcc.gnu.org
Subject: [gcc(refs/vendors/ARM/heads/morello)] Bound pointers to variables allocated on the stack
Date: Thu, 13 Oct 2022 10:46:47 +0000 (GMT)
Message-ID: <20221013104648.A87183857839@sourceware.org>

https://gcc.gnu.org/g:7bc86e22db7ad8d6cff34f754d1d8648647e6e1a

commit 7bc86e22db7ad8d6cff34f754d1d8648647e6e1a
Author: Matthew Malcomson <matthew.malcomson@arm.com>
Date:   Thu Oct 13 11:29:53 2022 +0100

Bound pointers to variables allocated on the stack

This commit applies bounds to accesses of variables allocated on the stack.
As a "first draft" to enable this feature for users quickly, this approach
is designed to bound everything rather than apply optimisations such as only
bounding accesses which could possibly be problematic.  We do intend to
revisit the design later.

In this patch we hook into stack allocation during the cfgexpand phase,
where the IR is transformed from TREE to RTL.  There are three main places
where a stack allocation is handled in this transformation:
- A local frame variable.
- Temporary stack slots, for things like passing objects to other functions.
- Alloca calls.

This patch also allows more kinds of operand for the SCBNDS instruction,
which improves codegen and makes the bounds-setting code easier to generate.

---

Local frame variables are the simplest case to handle.  Every stack variable
is given an RTL expression by which the rest of the code refers to it.  This
is done in `expand_one_stack_var_at`.  Large objects which need padding and
alignment to ensure precise bounds are handled by adding those requirements
in `add_stack_var`.  To ensure that the rest of the pass knows the new
alignment we also update `align_local_variable`.

The RTX returned from `expand_one_stack_var_at` represents the address of an
object on the stack.  This RTX will generally need to be forced into an
operand so that expressions can reference the object.  Since we now use an
UNSPEC to represent this address (as it is a bounded address we want to
reference things through), we introduce a hook, used by `force_operand`,
through which the backend can force a target-specific pattern into an
operand that other patterns can use.

Because the new alignment requirements can now overflow a standard unsigned
int, we update the types used to store alignment throughout cfgexpand.c.

`expand_one_stack_var_at` can be given either a local DECL or an SSA_NAME.
SSA_NAMEs are only put on the stack accidentally.  We believe that their
address never needs to be bounded since they are never accessed in an unsafe
way (i.e. never accessed in any way that is user-controlled, and their
address never escapes the function).

---

Temporary stack slots.

In various places throughout the compiler we assign a variable to a
temporary stack slot in order to work on it.  In most cases this is only for
memory accesses that the compiler generates at a known offset and that are
known to be safe (e.g. when copying an object into a stack slot with better
alignment).  However, in some places the addresses of these temporaries are
passed across function boundaries.  We ensure that bounds are applied to
these pointers whenever this is the case.  The cases where a temporary is
allocated on the stack generally involve arguments to functions.
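As a hypothetical illustration (not taken from this patch or its new tests)
of a temporary whose address crosses a function boundary: on AArch64, a
composite larger than 16 bytes is passed by reference to a caller-allocated
copy, so the callee receives a pointer into the caller's frame, and under
this scheme that pointer is bounded to cover only the copy.

  struct big { char bytes[64]; };

  /* The callee only ever sees a pointer to a stack temporary holding the
     caller's copy of the argument.  */
  void consume (struct big b);

  void
  caller (const struct big *p)
  {
    /* The compiler copies *p into a temporary stack slot and passes the
       address of that slot; that address is now bounded to the copy.  */
    consume (*p);
  }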
Apart from expansions for specific builtins (cexpi and
atomic_compare_exchange) and the handling of arguments through
`emit_library_call_value_1` (both of which are fairly clear), the uses of
stack temporaries that require bounding are described below.

initialize_argument_information can allocate space on the stack through
which to pass objects that are usually passed by reference in the ABI.
These references to variables are bounded since (outside of whole-program
optimisation) we do not know what is done with them in other functions.

expand_call's use of `assign_temp` creates a location that some other
function can populate when it wants to return a structure, hence this needs
to be bounded.  N.b. as yet I have not managed to trigger this clause
without artificially changing the control flow in a debugger, hence there is
no test for it.

In order to apply bounds in only a few places without changing the code too
much, we add a default argument to some assign_*temp* functions to choose
whether bounds should be applied or not, and create a new function
`assign_stack_local_narrowed` which always returns a bounded pointer.

These temporary slots can be combined with each other for optimisation
purposes.  That leads to an access to one slot being made through a larger
offset from the RTX pointing to another slot, which is broken with tight
bounds, so we disable temporary slot combination for bounded slots.

---

Alloca slots

For variable-sized arrays and direct alloca calls, the RTX for a stack slot
is created in `expand_builtin_alloca`.  The actual RTX is generated using
`allocate_dynamic_stack_space`, and this function is used in three other
places throughout the code.  From auditing those call sites it seems best
for `allocate_dynamic_stack_space` to return a bounded pointer (only one
other place uses the return value of this function, and that is
`initialize_argument_information`, which needs a bounded pointer).  Hence
the bounds are actually applied in `allocate_dynamic_stack_space`.

In order to apply bounds to a size that is not known at compile time we need
to use the RRLEN and RRMASK instructions.  It is a little awkward to fit
these into the existing mechanism by which `allocate_dynamic_stack_space`
ensures its allocation can be aligned.  The usual mechanism allocates enough
space to align the pointer upwards after the allocation has been made.
RRMASK provides a mask with which we can more easily align downwards, by
using it in an AND operation.  To accommodate this we *first* align the
stack pointer downwards to the alignment required for precise bounds, *then*
allocate the relevant size (including extra padding for the alignment
requested by the user), and finally align upwards to the alignment requested
by the user.

On top of that, we need to accommodate the split-stack functionality, which
can allocate this object off the stack using `malloc`.  Since `malloc`
should already provide the necessary padding and bounding of pointers, we
need to pass it just the size of the object.  This requires splitting the
single concept of a size in this function into three different sizes:
1) The size we need to allocate on the stack (including all padding).
2) The size for the bounds we want to apply to our pointer.
3) The size of the object.
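A rough sketch of the three sizes and the "align down, then allocate, then
align up" approach described above, assuming a downward-growing stack.
`rrlen` and `rrmask` are hypothetical stand-ins for the RRLEN and RRMASK
operations, and the arithmetic simplifies what the real code emits through
`get_dynamic_stack_size` and friends.

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical stand-ins for the RRLEN and RRMASK instructions.  */
  extern size_t    rrlen (size_t length);   /* round length up to a representable one */
  extern uintptr_t rrmask (size_t length);  /* alignment mask for representable bounds */

  struct alloca_sizes
  {
    size_t object_size;  /* what the user asked for; passed to malloc on split-stack */
    size_t bounds_size;  /* length used when setting bounds on the returned pointer */
    size_t stack_size;   /* bytes actually taken from the stack, including padding */
  };

  static struct alloca_sizes
  compute_alloca_sizes (size_t requested, size_t user_align, uintptr_t sp)
  {
    struct alloca_sizes s;
    s.object_size = requested;
    /* Bounds must cover a representable length, so round the request up.  */
    s.bounds_size = rrlen (requested);
    /* First align the stack pointer downwards to the alignment needed for
       precise bounds...  */
    uintptr_t aligned_sp = sp & rrmask (requested);
    /* ...then allocate the rounded length plus enough slack to later align
       the result upwards to the alignment the user asked for.  */
    s.stack_size = (sp - aligned_sp) + s.bounds_size + (user_align - 1);
    return s;
  }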
---

N.b. in order to be able to use the hooks we already have for finding the
padding and alignment requirements of objects with a compile-time-known
size, we redefine the semantics of these hooks (though their behaviour does
not change).  Originally they were defined as general "does the backend want
to add padding and alignment to a global variable" hooks.  Since the only
use we have had for them so far is adding padding and alignment for
capabilities, and this use is needed for stack variables too, we update the
purpose of these hooks to be "does the backend need to handle padding and
alignment for the precise bounds of capabilities".  That allows them to be
used for stack allocation without breaking the purpose of the hooks.

With the changed codegen we hit two clauses in
`instantiate_virtual_regs_in_insn` that we had not hit before.  Both of
these already had comments wondering whether we needed to update them for
capabilities.  We do that here.

---

Some extra testing beyond running the testsuite has been done.  One large
piece of extra testing is that the testsuite was run with *all* allocations
done via `scbndse` rather than `scbnds`.  This catches any case where a
large allocation did not have its bounds and alignment correctly adjusted
(since `scbndse` clears the validity bit of a capability if the bounds and
alignment are not precisely representable).  We do not use this change in
the final patch since the SCBNDSE instruction requires a register for the
size argument, while the SCBNDS instruction can take some constants
(restated in the sketch below).  Hence using SCBNDS needs fewer instructions
and causes less register pressure in many codegen situations.

---

N.b. there is a decision around whether stack slots for local frame
variables should be allowed to be shared between variables with disjoint
lifetimes.  Since the same basic problem can be seen around stack slots for
different frames at different times, we chose not to enforce separation of
stack slots for different variables in one function.  However this has not
been decided for good -- we intend to revisit it and think on the matter
properly when revisiting this feature as a whole.
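For reference, the constants SCBNDS can take are limited.  The sketch below
restates the `aarch64_scbnds_immediate` check added to aarch64.c in the diff
that follows: a size is encodable as an immediate if it fits in 6 bits,
optionally scaled by 16.

  #include <stdbool.h>
  #include <stdint.h>

  /* Mirrors the aarch64_scbnds_immediate logic from the patch: the SCBNDS
     immediate is a 6-bit value, optionally shifted left by 4.  */
  static bool
  scbnds_immediate_p (uint64_t size)
  {
    /* Unshifted form: any value that fits in 6 bits.  */
    if ((size & UINT64_C (0x3f)) == size)
      return true;
    /* Shifted form: must be a multiple of 16 whose quotient fits in 6 bits.  */
    if (size & UINT64_C (0xf))
      return false;
    size >>= 4;
    return (size & UINT64_C (0x3f)) == size;
  }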
Diff: --- gcc/builtins.c | 6 +- gcc/calls.c | 18 +- gcc/cfgexpand.c | 75 ++++--- gcc/common.opt | 5 + gcc/config/aarch64/aarch64-morello.md | 22 +- gcc/config/aarch64/aarch64-protos.h | 1 + gcc/config/aarch64/aarch64.c | 91 +++++++- gcc/config/aarch64/aarch64.md | 1 + gcc/config/aarch64/constraints.md | 6 + gcc/config/aarch64/predicates.md | 4 + gcc/doc/tm.texi | 39 +++- gcc/doc/tm.texi.in | 4 + gcc/explow.c | 191 ++++++++++++++++- gcc/explow.h | 7 +- gcc/expr.c | 9 +- gcc/function.c | 228 +++++++++++++++------ gcc/function.h | 7 +- gcc/hooks.c | 14 ++ gcc/target.def | 45 +++- gcc/target.h | 32 +++ gcc/testsuite/gcc.dg/torture/matrix-6.c | 4 +- gcc/testsuite/gcc.dg/torture/pr36227.c | 8 +- .../aarch64/morello/builtin_cheri_bounds_set.c | 2 +- .../morello/builtin_cheri_bounds_set_exact.c | 2 +- .../morello/restrictions/alloca_big_alignment.c | 18 ++ .../alloca_big_allocation_outgoing_args.c | 38 ++++ .../restrictions/alloca_detect_custom_size.c | 23 +++ .../morello/restrictions/alloca_overflow_partial.c | 18 ++ .../morello/restrictions/alloca_overflow_right.c | 18 ++ .../morello/restrictions/alloca_underflow_left.c | 18 ++ .../morello/restrictions/asan-stack-small.c | 30 +++ .../aarch64/morello/restrictions/bitfield-1.c | 23 +++ .../aarch64/morello/restrictions/bitfield-2.c | 22 ++ .../aarch64/morello/restrictions/bitfield-3.c | 23 +++ .../aarch64/morello/restrictions/bitfield-4.c | 23 +++ .../aarch64/morello/restrictions/bitfield-5.c | 24 +++ .../morello/restrictions/function-argument-1.c | 27 +++ .../morello/restrictions/function-argument-2.c | 24 +++ .../morello/restrictions/function-argument-3.c | 42 ++++ .../morello/restrictions/function-argument-4.c | 23 +++ .../morello/restrictions/function-argument-5.c | 35 ++++ .../morello/restrictions/function-argument-6.c | 28 +++ .../restrictions/function-argument-shouldpass.c | 27 +++ .../morello/restrictions/global-overflow-1.c | 24 +++ .../aarch64/morello/restrictions/heap-overflow-1.c | 26 +++ .../aarch64/morello/restrictions/memcmp-1.c | 17 ++ .../aarch64/morello/restrictions/misalign-1.c | 36 ++++ .../aarch64/morello/restrictions/misalign-2.c | 36 ++++ .../morello/restrictions/morello-restrictions.exp | 58 ++++++ .../restrictions/parameter-temp-on-stack-2.c | 28 +++ .../morello/restrictions/parameter-temp-on-stack.c | 30 +++ .../morello/restrictions/stack-overflow-1.c | 19 ++ .../morello/restrictions/strncpy-overflow-1.c | 14 ++ .../gcc.target/aarch64/stack-check-cfa-1.c | 10 +- .../gcc.target/aarch64/stack-check-cfa-2.c | 10 +- gcc/varasm.c | 51 +++-- .../uninitialized_default_n/sizes.cc | 4 +- .../uninitialized_value_construct_n/sizes.cc | 6 +- .../lexicographical_compare/constrained.cc | 2 +- 59 files changed, 1498 insertions(+), 178 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index ac8190678f4..aba25bc861a 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -2727,8 +2727,8 @@ expand_builtin_cexpi (tree exp, rtx target) else gcc_unreachable (); - op1 = assign_temp (TREE_TYPE (arg), 1, 1); - op2 = assign_temp (TREE_TYPE (arg), 1, 1); + op1 = assign_temp (TREE_TYPE (arg), 1, 1, true); + op2 = assign_temp (TREE_TYPE (arg), 1, 1, true); op1a = copy_addr_to_reg (XEXP (op1, 0)); op2a = copy_addr_to_reg (XEXP (op2, 0)); top1 = make_tree (build_pointer_type (TREE_TYPE (arg)), op1a); @@ -7164,7 +7164,7 @@ expand_ifn_atomic_compare_exchange_into_call (gcall *call, machine_mode mode) vec->quick_push (gimple_call_arg (call, 0)); tree expected = gimple_call_arg (call, 1); rtx x = assign_stack_temp_for_type (mode, GET_MODE_SIZE (mode), 
- TREE_TYPE (expected)); + TREE_TYPE (expected), true); rtx expd = expand_expr (expected, x, mode, EXPAND_NORMAL); if (expd != x) emit_move_insn (x, expd); diff --git a/gcc/calls.c b/gcc/calls.c index 99835da958b..2cd7f8bf282 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -2339,7 +2339,7 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, set_mem_attributes (copy, type, 1); } else - copy = assign_temp (type, 1, 0); + copy = assign_temp (type, 1, 0, true); store_expr (args[i].tree_value, copy, 0, false, false); @@ -3762,7 +3762,7 @@ expand_call (tree exp, rtx target, int ignore) /* For variable-sized objects, we must be called with a target specified. If we were to allocate space on the stack here, we would have no way of knowing when to free it. */ - rtx d = assign_temp (rettype, 1, 1); + rtx d = assign_temp (rettype, 1, 1, true); structure_value_addr = XEXP (d, 0); target = 0; } @@ -4309,7 +4309,13 @@ expand_call (tree exp, rtx target, int ignore) } /* We can pass TRUE as the 4th argument because we just saved the stack pointer and will restore it right after - the call. */ + the call. + MORELLO TODO We should understand when and how this space is + used. Usually allocate_dynamic_stack_space returns a pointer + and that pointer is used. If this function is doing something + different then it's likely whatever space is allocated is not + used through a bounded pointer. This may or may not be OK. + */ allocate_dynamic_stack_space (push_size, 0, BIGGEST_ALIGNMENT, -1, true); } @@ -5119,7 +5125,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, if (value != 0 && MEM_P (value)) mem_value = value; else - mem_value = assign_temp (tfom, 1, 1); + mem_value = assign_temp (tfom, 1, 1, true); #endif /* This call returns a big structure. */ flags &= ~(ECF_CONST | ECF_PURE | ECF_LOOPING_CONST_OR_PURE); @@ -5473,8 +5479,8 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, { argvec[argnum].save_area = assign_stack_temp (BLKmode, - argvec[argnum].locate.size.constant - ); + argvec[argnum].locate.size.constant, + true); emit_block_move (validize_mem (copy_rtx (argvec[argnum].save_area)), diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index 2ea0c2f1505..4aafeb22f36 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -317,7 +317,7 @@ public: /* The *byte* alignment required for this variable. Or as, with the size, the alignment for this partition. */ - unsigned int alignb; + unsigned HOST_WIDE_INT alignb; /* The partition representative. */ size_t representative; @@ -361,23 +361,30 @@ static bool has_short_buffer; /* Compute the byte alignment to use for DECL. Ignore alignment we can't do with expected alignment of the stack boundary. */ -static unsigned int -align_local_variable (tree decl, bool really_expand) +static unsigned HOST_WIDE_INT +align_local_variable (tree decl, poly_uint64 size, bool really_expand) { - unsigned int align; + /* `alignment_pad_from_bits` can increase the alignment, and it can + increase the alignment to very large values. + That is why the `alignb` member of `class stack_var` is an unsigned + HOST_WIDE_INT and we use an unsigned HOST_WIDE_INT here. */ + unsigned HOST_WIDE_INT align; if (TREE_CODE (decl) == SSA_NAME) align = TYPE_ALIGN (TREE_TYPE (decl)); else - { - align = LOCAL_DECL_ALIGNMENT (decl); - /* Don't change DECL_ALIGN when called from estimated_stack_frame_size. - That is done before IPA and could bump alignment based on host - backend even for offloaded code which wants different - LOCAL_DECL_ALIGNMENT. 
*/ - if (really_expand) - SET_DECL_ALIGN (decl, align); - } + align = LOCAL_DECL_ALIGNMENT (decl); + + if (size.is_constant () && applying_cheri_stack_bounds ()) + align = alignment_pad_from_bits (size.to_constant (), align, + TREE_CODE (decl) == SSA_NAME ? NULL_TREE : decl); + + /* Don't change DECL_ALIGN when called from estimated_stack_frame_size. + That is done before IPA and could bump alignment based on host + backend even for offloaded code which wants different + LOCAL_DECL_ALIGNMENT. */ + if (TREE_CODE (decl) != SSA_NAME && really_expand) + SET_DECL_ALIGN (decl, align); return align / BITS_PER_UNIT; } @@ -448,11 +455,15 @@ add_stack_var (tree decl, bool really_expand) ? TYPE_SIZE_UNIT (TREE_TYPE (decl)) : DECL_SIZE_UNIT (decl); v->size = tree_to_poly_uint64 (size); + if (v->size.is_constant () && applying_cheri_stack_bounds ()) + v->size += targetm.data_padding_size + (v->size.to_constant (), 0, + TREE_CODE (decl) == SSA_NAME ? NULL_TREE : decl); /* Ensure that all variables have size, so that &a != &b for any two variables that are simultaneously live. */ if (known_eq (v->size, 0U)) v->size = 1; - v->alignb = align_local_variable (decl, really_expand); + v->alignb = align_local_variable (decl, v->size, really_expand); /* An alignment of zero can mightily confuse us later. */ gcc_assert (v->alignb != 0); @@ -680,8 +691,8 @@ stack_var_cmp (const void *a, const void *b) { size_t ia = *(const size_t *)a; size_t ib = *(const size_t *)b; - unsigned int aligna = stack_vars[ia].alignb; - unsigned int alignb = stack_vars[ib].alignb; + unsigned HOST_WIDE_INT aligna = stack_vars[ia].alignb; + unsigned HOST_WIDE_INT alignb = stack_vars[ib].alignb; poly_int64 sizea = stack_vars[ia].size; poly_int64 sizeb = stack_vars[ib].size; tree decla = stack_vars[ia].decl; @@ -911,7 +922,7 @@ partition_stack_vars (void) for (si = 0; si < n; ++si) { size_t i = stack_vars_sorted[si]; - unsigned int ialign = stack_vars[i].alignb; + unsigned HOST_WIDE_INT ialign = stack_vars[i].alignb; poly_int64 isize = stack_vars[i].size; /* Ignore objects that aren't partition representatives. If we @@ -923,7 +934,7 @@ partition_stack_vars (void) for (sj = si + 1; sj < n; ++sj) { size_t j = stack_vars_sorted[sj]; - unsigned int jalign = stack_vars[j].alignb; + unsigned HOST_WIDE_INT jalign = stack_vars[j].alignb; poly_int64 jsize = stack_vars[j].size; /* Ignore objects that aren't partition representatives. */ @@ -974,7 +985,8 @@ dump_stack_var_partition (void) fprintf (dump_file, "Partition %lu: size ", (unsigned long) i); print_dec (stack_vars[i].size, dump_file); - fprintf (dump_file, " align %u\n", stack_vars[i].alignb); + fprintf (dump_file, " align " HOST_WIDE_INT_PRINT_UNSIGNED "\n", + stack_vars[i].alignb); for (j = i; j != EOC; j = stack_vars[j].next) { @@ -998,6 +1010,18 @@ expand_one_stack_var_at (tree decl, rtx base, unsigned base_align, gcc_assert (known_eq (offset, trunc_int_for_mode (offset, POmode))); x = plus_constant (Pmode, base, offset); + if (applying_cheri_stack_bounds () + && TREE_CODE (decl) != SSA_NAME) + { + tree size = DECL_SIZE_UNIT (decl); + + unsigned HOST_WIDE_INT size_int + = tree_to_poly_uint64 (size).to_constant (); + size_int += targetm.data_padding_size (size_int, 0, decl); + rtx size_rtx = gen_int_mode (size_int, POmode); + + x = targetm.cap_narrowed_pointer (x, size_rtx); + } x = gen_rtx_MEM (TREE_CODE (decl) == SSA_NAME ? 
TYPE_MODE (TREE_TYPE (decl)) : DECL_MODE (SSAVAR (decl)), x); @@ -1053,7 +1077,7 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data) size_t si, i, j, n = stack_vars_num; poly_uint64 large_size = 0, large_alloc = 0; rtx large_base = NULL; - unsigned large_align = 0; + unsigned HOST_WIDE_INT large_align = 0; bool large_allocation_done = false; tree decl; @@ -1066,7 +1090,7 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data) /* Find the total size of these variables. */ for (si = 0; si < n; ++si) { - unsigned alignb; + unsigned HOST_WIDE_INT alignb; i = stack_vars_sorted[si]; alignb = stack_vars[i].alignb; @@ -1102,7 +1126,7 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data) for (si = 0; si < n; ++si) { rtx base; - unsigned base_align, alignb; + unsigned HOST_WIDE_INT base_align, alignb; poly_int64 offset; i = stack_vars_sorted[si]; @@ -1338,7 +1362,7 @@ expand_one_stack_var_1 (tree var) else { size = tree_to_poly_uint64 (DECL_SIZE_UNIT (var)); - byte_align = align_local_variable (var, true); + byte_align = align_local_variable (var, size, true); } /* We handle highly aligned variables in expand_stack_vars. */ @@ -1564,9 +1588,8 @@ defer_stack_allocation (tree var, bool toplevel) if (flag_stack_protect || asan_sanitize_stack_p ()) return true; - unsigned int align = TREE_CODE (var) == SSA_NAME - ? TYPE_ALIGN (TREE_TYPE (var)) - : DECL_ALIGN (var); + unsigned HOST_WIDE_INT align + = align_local_variable (var, size, false) * BITS_PER_UNIT; /* We handle "large" alignment via dynamic allocation. We want to handle this extra complication in only one place, so defer them. */ diff --git a/gcc/common.opt b/gcc/common.opt index 9293aa1173a..d9ab8c4c18a 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1251,6 +1251,11 @@ ffake-hybrid-init= Common Joined RejectNegative UInteger Var(flag_fake_hybrid_init) Init(INT_MAX) Use a round-robin counter for -ffake-hybrid and specify its initial value. +fcheri-stack-bounds +Common Var(flag_cheri_stack_bounds, 1) Init(1) +Developer option to enable or disable applying CHERI bounds to stack objects. +This is only relevant on CHERI targets. + fdebug-prefix-map= Common Joined RejectNegative Var(common_deferred_options) Defer -fdebug-prefix-map=<old>=<new> Map one directory name to another in debug information. 
diff --git a/gcc/config/aarch64/aarch64-morello.md b/gcc/config/aarch64/aarch64-morello.md index cc93fc9397c..d3e4c4ac399 100644 --- a/gcc/config/aarch64/aarch64-morello.md +++ b/gcc/config/aarch64/aarch64-morello.md @@ -216,13 +216,15 @@ ) (define_insn "cap_bounds_set_cadi" - [(set (match_operand:CADI 0 "register_operand" "=rk") - (unspec:CADI [(match_operand:CADI 1 "register_operand" "rk") - (match_operand:DI 2 "register_operand" "r")] + [(set (match_operand:CADI 0 "register_operand" "=rk,rk") + (unspec:CADI [(match_operand:CADI 1 "register_operand" "rk,rk") + (match_operand:DI 2 "aarch64_scbnds_operand" "r,Ucc")] UNSPEC_CHERI_BOUNDS_SET)) ] "TARGET_MORELLO" - "scbnds\\t%0, %1, %2" + "@ + scbnds\\t%0, %1, %2 + scbnds\\t%0, %1, %2" ) (define_insn "cap_bounds_set_exact_cadi" @@ -235,6 +237,18 @@ "scbndse\\t%0, %1, %2" ) +(define_insn "cap_bounds_set_maybe_exact" + [(set (match_operand:CADI 0 "register_operand" "=rk,rk") + (unspec:CADI [(match_operand:CADI 1 "register_operand" "rk,rk") + (match_operand:DI 2 "aarch64_scbnds_operand" "r,Ucc")] + UNSPEC_CHERI_BOUNDS_SET_MAYBE_EXACT)) + ] + "TARGET_MORELLO" + "@ + scbndse\\t%0, %1, %2 + scbnds\\t%0, %1, %2" +) + (define_insn "cap_seal_cadi" [(set (match_operand:CADI 0 "register_operand" "=rk") (unspec:CADI [(match_operand:CADI 1 "register_operand" "rk") diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 4b56cfa5155..142ba4dfb7a 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -564,6 +564,7 @@ bool aarch64_uimm12_shift (HOST_WIDE_INT); int aarch64_movk_shift (const wide_int_ref &, const wide_int_ref &); bool aarch64_use_return_insn_p (void); const char *aarch64_output_casesi (rtx *); +bool aarch64_scbnds_immediate (unsigned HOST_WIDE_INT); unsigned int aarch64_tlsdesc_abi_id (); enum aarch64_symbol_type aarch64_classify_symbol (rtx, HOST_WIDE_INT); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 1bf6113f62c..2011a062610 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3010,8 +3010,8 @@ aarch64_morello_precise_bounds_align (uint64_t size, uint64_t align, section. Without a named section the user can not know which section the compiler will pick and hence can't be sure what padding will be between objects. */ - if (decl && DECL_USER_ALIGN (decl) && DECL_SECTION_NAME (decl) - && required.align > align) + if (decl && DECL_USER_ALIGN (decl) && is_global_var (decl) + && DECL_SECTION_NAME (decl) && required.align > align) { warning (OPT_Wcheri_bounds, "object %qD has cheri alignment overridden by a user-specified one", @@ -3048,7 +3048,8 @@ aarch64_data_padding_size (uint64_t size, uint64_t align ATTRIBUTE_UNUSED, const section. Without a named section the user can not know which section the compiler will pick and hence can't be sure what padding will be between objects. */ - if (decl && DECL_USER_ALIGN (decl) && DECL_SECTION_NAME (decl)) + if (decl && DECL_USER_ALIGN (decl) && is_global_var (decl) + && DECL_SECTION_NAME (decl)) return 0; /* As in align_variable, TLS space is too precious to waste. */ if (decl && DECL_THREAD_LOCAL_P (decl)) @@ -25123,6 +25124,84 @@ aarch64_target_capability_mode () return opt_scalar_addr_mode (); } +bool +aarch64_scbnds_immediate (unsigned HOST_WIDE_INT size) +{ + /* First attempt the unshifted version. */ + if ((size & ((unsigned HOST_WIDE_INT)0x3f)) == size) + return true; + + /* Then shift and re-attempt. 
*/ + if (size & (unsigned HOST_WIDE_INT)0xf) + return false; + size >>= 4; + + if ((size & ((unsigned HOST_WIDE_INT)0x3f)) == size) + return true; + return false; +} + +/* Implement TARGET_CAP_NARROWED_POINTER hook. */ +rtx +aarch64_target_cap_narrowed_pointer (rtx base, rtx size) +{ + rtvec temp_vec = gen_rtvec (2, base, size); + /* We generate a pattern that can get matched by cap_bounds_set_maybe_exact + rather than either one of the cap_bounds_set_exact_cadi or + cap_bounds_set_cadi patterns. Using either UNSPEC_CHERI_BOUNDS_SET or + UNSPEC_CHERI_BOUNDS_SET_EXACT commits to using a specific instruction at + the point where we generate this expression. In general we want to emit + an SCBNDSE instruction whenever we are already using a register to hold + the size, but want to emit an SCBNDS instruction when we have a suitable + constant size since we do not want to unnecessarily emit extra + instructions. + + To achieve this we return an expression which can be loaded into a + register with the `cap_bounds_set_maybe_exact` insn. That pattern accepts + either a constant or a register, and emits the SCBNDSE instruction when it + is given a register. This allows the RTL passes the opportunity to + recognise that something which was a pseudo register when this expression + was formed is now a constant. */ + return gen_rtx_UNSPEC (CADImode, temp_vec, + UNSPEC_CHERI_BOUNDS_SET_MAYBE_EXACT); +} + +/* Implement TARGET_FORCE_OPERAND. */ +rtx +aarch64_target_force_operand (rtx value, rtx target) +{ + if (GET_CODE (value) != UNSPEC) + return NULL_RTX; + if (XINT (value, 1) != UNSPEC_CHERI_BOUNDS_SET + && XINT (value, 1) != UNSPEC_CHERI_BOUNDS_SET_EXACT + && XINT (value, 1) != UNSPEC_CHERI_BOUNDS_SET_MAYBE_EXACT) + return NULL_RTX; + gcc_assert (GET_MODE (value) == CADImode); + + /* Ensure each argument is valid for an insn. */ + rtx first_arg = force_reg (CADImode, XVECEXP (value, 0, 0)); + rtx second_arg = XVECEXP (value, 0, 1); + if (!CONST_INT_P (second_arg)) + second_arg = force_reg (DImode, second_arg); + rtvec new_arg_vec = gen_rtvec (2, first_arg, second_arg); + value = gen_rtx_UNSPEC (CADImode, new_arg_vec, XINT (value, 1)); + + /* Ensure the total expression is put into a register. */ + rtx temp; + if (target && REG_P (target) && GET_MODE (target) == CADImode) + temp = target; + else + temp = gen_reg_rtx (CADImode); + emit_move_insn (temp, value); + /* N.b. Look at the `force_reg` code for optimisation markings on the + previous instruction. */ + if (!target) + return temp; + if (temp != target) + emit_move_insn (target, temp); + return target; +} + /* Implement TARGET_CODE_ADDRESS_FROM_POINTER hook. 
*/ rtx aarch64_code_address_from_pointer (rtx x) @@ -25801,6 +25880,12 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_CAPABILITY_MODE #define TARGET_CAPABILITY_MODE aarch64_target_capability_mode +#undef TARGET_CAP_NARROWED_POINTER +#define TARGET_CAP_NARROWED_POINTER aarch64_target_cap_narrowed_pointer + +#undef TARGET_FORCE_OPERAND +#define TARGET_FORCE_OPERAND aarch64_target_force_operand + #undef TARGET_DWARF_FRAME_REG_MODE #define TARGET_DWARF_FRAME_REG_MODE aarch64_dwarf_frame_reg_mode diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index f4214f25757..854f2f96c43 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -140,6 +140,7 @@ UNSPEC_CHERI_ADDR_SET UNSPEC_CHERI_BOUNDS_SET UNSPEC_CHERI_BOUNDS_SET_EXACT + UNSPEC_CHERI_BOUNDS_SET_MAYBE_EXACT UNSPEC_CHERI_BASE_GET UNSPEC_CHERI_BIT_EQ UNSPEC_CHERI_CAP_BUILD diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index bfa379b2fe9..386ce5b64d6 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -85,6 +85,12 @@ using multiple instructions, with one temporary register." (match_operand 0 "aarch64_split_add_offset_immediate")) +(define_constraint "Ucc" + "A constraint matching a const_int immediate which can be used in scbnds." + (and (match_code "const_int") + (match_test "aarch64_scbnds_immediate (ival)"))) + + (define_constraint "J" "A constant that can be used with a SUB operation (once negated)." (and (match_code "const_int") diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 3e79fbcc572..50448bd19fc 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -60,6 +60,10 @@ (ior (match_operand 0 "register_operand") (match_operand 0 "aarch64_ccmp_immediate"))) +(define_predicate "aarch64_scbnds_operand" + (ior (match_operand 0 "register_operand") + (match_code "const_int"))) + (define_predicate "aarch64_simd_register" (and (match_code "reg") (match_test "FP_REGNUM_P (REGNO (op))"))) diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 27a5a226605..30705cb576d 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -1158,13 +1158,16 @@ padding of the given amount when emitting this variable. This hook takes both its arguments in bytes. The default definition returns @code{0}. -The typical use of this hook is to add padding to the end of objects on -a capability architecture to ensure that the bounds of a capability pointing -to objects do not allow accesses to any neighbouring objects. +This hook is used to add padding to the end of objects on a capability +architecture to ensure that the bounds of a capability pointing to objects do +not allow accesses to any neighbouring objects. A requirement on the implementation of this function is that if @var{decl} has a user-specified alignment on a decl which has an associated section then this hook must return @code{0}. + +This hook must be defined so that it can be used on global variables and stack +variables. @end deftypefn @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_DATA_ALIGNMENT (unsigned HOST_WIDE_INT @var{size}, unsigned HOST_WIDE_INT @var{align}, const_tree @var{decl}) @@ -1173,11 +1176,20 @@ size @var{size} when writing it out to memory. This hook takes its argument in bytes. The default definition returns the alignment given as an argument. 
+The hook is used to ensure the alignment required so that the size @var{size} +can be precisely bounded using the CHERI bounds compression format. -The typical use of this hook is to ensure alignment in order to give precise -bounds for a capability pointing to the given object on capability systems. A requirement on the implementation of this function is that if @var{decl} -has a user-specified alignment then this hook must not decrease the alignment. +is a global variable with a named section and a user-specified alignment then +this hook must not return a greater alignment. +This requirement is so that users can ensure zero padding between objects +(something that is used in crtstuff.c). + +Similar is required for thread local variables due to this space being too +precious to waste (matching the @code{align_variable} behaviour). + +This hook must be defined so that it can be used on global variables and stack +variables. @end deftypefn @deftypefn {Target Hook} HOST_WIDE_INT TARGET_VECTOR_ALIGNMENT (const_tree @var{type}) @@ -4377,6 +4389,21 @@ Defines the mode for @code{__intcap_t} and @code{__uintcap_t}. opt_scalar_addr_mode. This is the default. @end deftypefn +@deftypefn {Target Hook} rtx TARGET_CAP_NARROWED_POINTER (rtx @var{base}, rtx @var{size}) +Return an RTL expression which represents @var{base} narrowed such that it + can only access a length of @var{size} bytes. + This is designed for capability targets. + Note that this hook must not emit any insns. +@end deftypefn + +@deftypefn {Target Hook} rtx TARGET_FORCE_OPERAND (rtx @var{value}, rtx @var{target}) +Target hook to enable @code{force_operand} on target-specific RTL + expressions. + When @code{cap_narrowed_pointer} returns an RTX which can not be made an + operand by the general @code{force_operand}, this hook may need to be + implemented in order to handle said RTX. +@end deftypefn + @deftypefn {Target Hook} machine_mode TARGET_TRANSLATE_MODE_ATTRIBUTE (machine_mode @var{mode}) Define this hook if during mode attribute processing, the port should translate machine_mode @var{mode} to another mode. For example, rs6000's diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 8e0055a0af7..52bd1afc9a0 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3386,6 +3386,10 @@ stack. @hook TARGET_CAPABILITY_MODE +@hook TARGET_CAP_NARROWED_POINTER + +@hook TARGET_FORCE_OPERAND + @hook TARGET_TRANSLATE_MODE_ATTRIBUTE @hook TARGET_SCALAR_MODE_SUPPORTED_P diff --git a/gcc/explow.c b/gcc/explow.c index d7b04803e3f..e72f7221227 100644 --- a/gcc/explow.c +++ b/gcc/explow.c @@ -1263,7 +1263,7 @@ record_new_stack_level (void) /* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET. */ rtx -align_dynamic_address (rtx target, unsigned required_align) +align_dynamic_address (rtx target, unsigned HOST_WIDE_INT required_align) { return expand_align_up (Pmode, target, gen_int_mode (required_align / BITS_PER_UNIT, POmode), NULL_RTX, @@ -1286,7 +1286,7 @@ align_dynamic_address (rtx target, unsigned required_align) the additional size returned. 
*/ void get_dynamic_stack_size (rtx *psize, unsigned size_align, - unsigned required_align, + unsigned HOST_WIDE_INT required_align, HOST_WIDE_INT *pstack_usage_size) { rtx size = *psize; @@ -1329,7 +1329,8 @@ get_dynamic_stack_size (rtx *psize, unsigned size_align, known_align = BITS_PER_UNIT; if (required_align > known_align) { - unsigned extra = (required_align - known_align) / BITS_PER_UNIT; + unsigned HOST_WIDE_INT extra + = (required_align - known_align) / BITS_PER_UNIT; size = plus_constant (POmode, size, extra); size = force_operand (size, NULL_RTX); if (size_align > known_align) @@ -1414,17 +1415,34 @@ get_stack_check_protect (void) in the course of the execution of the function. It is always safe to pass FALSE here and the following criterion is sufficient in order to pass TRUE: every path in the CFG that starts at the allocation point and - loops to it executes the associated deallocation code. */ + loops to it executes the associated deallocation code. + + For capability targets this returns a bounded pointer to the allocated + space. */ rtx allocate_dynamic_stack_space (rtx size, unsigned size_align, - unsigned required_align, + unsigned HOST_WIDE_INT required_align, HOST_WIDE_INT max_size, bool cannot_accumulate) { HOST_WIDE_INT stack_usage_size = -1; rtx_code_label *final_label; rtx final_target, target; + rtx cheri_alignment_mask, cheri_alignment_hunk; + /* We introduce a distinction between different sizes in this function. + There is "the size of the object bounds" for capability precise bounds, + "the size required on the stack" to represent how much space is needed on + the stack in order to allocate this object (including space required to + align the object), and "the size of the object" that we give as our + request to __morestack_allocate_stack_space. + + This allows ensuring that we give as precise bounds as possible. + + After all sizes have been calculated, `size` represents the total space on + the stack. */ + rtx orig_size = size; + rtx bounds_size = size; /* If we're asking for zero bytes, it doesn't matter what we point to since we can't dereference it. But return a reasonable @@ -1469,10 +1487,133 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align, current_function_has_unbounded_dynamic_stack_size = 1; stack_usage_size = 0; } + /* If the size is constant, then account for any bounding that may be + required. */ + else if (applying_cheri_stack_bounds ()) + stack_usage_size += targetm.data_padding_size (stack_usage_size, + required_align, + NULL_TREE); + } + + if (!applying_cheri_stack_bounds ()) + { /* Do nothing. */ } + else if (CONST_INT_P (size)) + { + unsigned HOST_WIDE_INT s = UINTVAL (size); + s += targetm.data_padding_size (s, required_align, NULL_TREE); + size = gen_int_mode (s, POmode); + required_align = alignment_pad_from_bits (s, required_align, NULL_TREE); + } + else + { + rtx tmp = force_reg (POmode, size); + rtx target = gen_reg_rtx (POmode); + emit_insn (gen_cap_round_representable_length (target, tmp)); + size = target; } + bounds_size = size; get_dynamic_stack_size (&size, size_align, required_align, &stack_usage_size); + if (!CONST_INT_P (size) && applying_cheri_stack_bounds ()) + { + /* Ensure alignment to that required for capability precise bounds. + + This is slightly awkward since the operation CHERI provides is + "generate a mask defining the required alignment". This operation is + not easy to fit in with how this function currently behaves. 
+ + This function acts by calculating enough space to ensure that we can + ensure alignment *after* allocation. We then use that size to do + probing of the stack and query whether we need to split the stack + (when that feature is enabled). It is only after this that the + allocation is made. + + Hence naively it seems we need to use our mask first to calculate the + size needed, and later to perform our alignment. + + For STACK_GROWS_DOWNWARD this is not actually needed. + + This is because we know that the `cap_round_representable_length` + operation will round up the length to a multiple of the same alignment + requirement that `cap_representable_alignment_mask` describes. Hence + after having calculated the extra size needed for alignment, we know + that "before + alignment size + rounded CHERI size" is still at a + CHERI alignment boundary. + + Calling the above position X, we can then show that the extra + operations to ensure alignment to `required_align` will still always + leave us at a properly CHERI aligned value. + If `required_align` is greater than the CHERI alignment requirement, + then the extra space from `get_dynamic_stack_size` and the alignment + to `required_align` both work as usual to ensure we are aligned to + `required_align`, since `required_align` is greater than the CHERI + alignment requirement it will satisfy both requirements. + + - If `required_align` is less than the CHERI alignment requirement + then we can model the alignment for that `required_align` as "extra + work" done on top of the CHERI alignment operations. + - This "extra work" adds some space on the bottom of the stack (this + is the STACK_GROWS_DOWNWARD case), and then aligns upwards to a + `required_align` boundary. + - The amount extra is never going to be enough to span two + `required_align` boundaries (due to the calculations in + `get_dynamic_stack_size`). + - Since the CHERI alignment requirement is greater than that of + `required_align` we know that the position X is both a CHERI + alignment boundary *and* a `required_align` boundary. + - Given this is a `required_align` boundary the act of adding space + and aligning upwards to `required_align` must bring us back to + that same position. + + For *NON* STACK_GROWS_DOWNWARD cases we must align for CHERI after + allocation, since in that case the allocation size does not affect the + pointer returned from this function. */ + + /* Calculate the extra size required to align for CHERI precise bounds. + */ + rtx cap_aligned = gen_reg_rtx (POmode); + emit_move_insn (cap_aligned, + drop_capability (virtual_stack_dynamic_rtx)); + rtx saved_top = gen_reg_rtx (POmode); + emit_move_insn (saved_top, cap_aligned); + + /* Get the mask which describes the alignment we need. */ + rtx size_rtx = force_reg (POmode, orig_size); + cheri_alignment_mask = gen_reg_rtx (POmode); + emit_insn (gen_cap_representable_alignment_mask + (cheri_alignment_mask, size_rtx)); + + /* Use that mask to apply the relevant alignment to our pointer onto the + stack. 
*/ + cap_aligned = expand_binop (POmode, and_optab, + cap_aligned, cheri_alignment_mask, + NULL_RTX, 1, OPTAB_LIB_WIDEN); + + if (!STACK_GROWS_DOWNWARD) + { + rtx tmp = expand_unop (POmode, one_cmpl_optab, + cheri_alignment_mask, NULL_RTX, 0); + cheri_alignment_hunk = plus_constant (POmode, tmp, 1); + cap_aligned = expand_pointer_plus (POmode, cap_aligned, + cheri_alignment_hunk, + NULL_RTX, 1, OPTAB_DIRECT); + } + + rtx extra_size; + if (STACK_GROWS_DOWNWARD) + extra_size = expand_binop (POmode, sub_optab, + saved_top, cap_aligned, + NULL_RTX, 1, OPTAB_LIB_WIDEN); + else + extra_size = expand_binop (POmode, sub_optab, + cap_aligned, saved_top, + NULL_RTX, 1, OPTAB_LIB_WIDEN); + size = expand_binop (POmode, add_optab, size, extra_size, + extra_size, 1, OPTAB_LIB_WIDEN); + } + + target = gen_reg_rtx (Pmode); /* The size is supposed to be fully adjusted at this point so record it @@ -1521,13 +1662,18 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align, by malloc does not meet REQUIRED_ALIGN, we increase SIZE to make sure we allocate enough space. */ if (MALLOC_ABI_ALIGNMENT >= required_align) - ask = size; + ask = orig_size; else - ask = expand_binop (POmode, add_optab, size, + ask = expand_binop (POmode, add_optab, orig_size, gen_int_mode (required_align / BITS_PER_UNIT - 1, POmode), NULL_RTX, 1, OPTAB_LIB_WIDEN); + /* As mentioned above, __morestack_allocate_stack_space uses `malloc`. + For purecap, malloc should provide a suitably aligned and padded + allocation such that the returned capability does not give access to + anything else. Hence we should not have to adjust anything for + precise bounds of the alloca object. */ func = init_one_libfunc ("__morestack_allocate_stack_space"); space = emit_library_call_value (func, target, LCT_NORMAL, Pmode, @@ -1641,14 +1787,43 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align, target = final_target; } + /* Ensure alignment to that requested by the caller. */ target = align_dynamic_address (target, required_align); + /* For STACK_GROWS_DOWNWARD we do not need to align for CHERI precise bounds + since the length of allocation has been calculated so that the "end" of + the allocation is on a correct alignment boundary after the above call to + `align_dynamic_address`. In the other case we do need to align for CHERI + precise bounds since the length of allocation has no bearing on the value + we return. */ + if (!CONST_INT_P (size) && !STACK_GROWS_DOWNWARD + && applying_cheri_stack_bounds ()) + { + /* N.b. untested since we have no !STACK_GROWS_DOWNWARD capability + targets. However this should at least give an idea of how to proceed + when that does happen. */ + rtx tmp = expand_binop (POmode, add_optab, + drop_capability (target), + cheri_alignment_hunk, + NULL_RTX, 1, OPTAB_LIB_WIDEN); + tmp = expand_binop (POmode, and_optab, + tmp, cheri_alignment_mask, + NULL_RTX, 1, OPTAB_LIB_WIDEN); + target = expand_replace_address_value (Pmode, target, tmp, target); + } - /* Now that we've committed to a return value, mark its alignment. */ + /* Now that we've committed to a return value, mark its alignment. + This is always a *minimum* alignment, so is not affected by possible extra + alignment added by capability precise bounds requirements. */ mark_reg_pointer (target, required_align); /* Record the new stack level. */ record_new_stack_level (); + /* If we have split_stack_space_check, then we're returning a pointer from + malloc and hence it is already bounded. 
*/ + if (!targetm.have_split_stack_space_check () + && applying_cheri_stack_bounds ()) + target = targetm.cap_narrowed_pointer (target, bounds_size); return target; } diff --git a/gcc/explow.h b/gcc/explow.h index 0df8c62b82a..fddb578d12b 100644 --- a/gcc/explow.h +++ b/gcc/explow.h @@ -98,18 +98,19 @@ extern void update_nonlocal_goto_save_area (void); extern void record_new_stack_level (void); /* Allocate some space on the stack dynamically and return its address. */ -extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned, +extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned HOST_WIDE_INT, HOST_WIDE_INT, bool); /* Calculate the necessary size of a constant dynamic stack allocation from the size of the variable area. */ -extern void get_dynamic_stack_size (rtx *, unsigned, unsigned, HOST_WIDE_INT *); +extern void get_dynamic_stack_size (rtx *, unsigned, unsigned HOST_WIDE_INT, + HOST_WIDE_INT *); /* Returns the address of the dynamic stack space without allocating it. */ extern rtx get_dynamic_stack_base (poly_int64, unsigned); /* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET. */ -extern rtx align_dynamic_address (rtx, unsigned); +extern rtx align_dynamic_address (rtx, unsigned HOST_WIDE_INT); /* Emit one stack probe at ADDRESS, an address within the stack. */ extern void emit_stack_probe (rtx); diff --git a/gcc/expr.c b/gcc/expr.c index ae41993aec8..945aa002774 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -7933,6 +7933,12 @@ force_operand (rtx value, rtx target) SUBREG_BYTE (value)); #endif + /* We allow the target to handle forcing RTX's that it recognises and we + don't into operands. */ + rtx machine_specific = targetm.force_operand (value, target); + if (machine_specific) + return machine_specific; + return value; } \f @@ -11757,7 +11763,8 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, target = assign_stack_temp_for_type (TYPE_MODE (inner_type), - GET_MODE_SIZE (TYPE_MODE (inner_type)), inner_type); + GET_MODE_SIZE (TYPE_MODE (inner_type)), inner_type, + true); emit_move_insn (target, op0); op0 = target; diff --git a/gcc/function.c b/gcc/function.c index 2fc80ec6642..4a9eaab2ae0 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -354,28 +354,13 @@ add_frame_space (poly_int64 start, poly_int64 end) space->length = end - start; } -/* Allocate a stack slot of SIZE bytes and return a MEM rtx for it - with machine mode MODE. - - ALIGN controls the amount of alignment for the address of the slot: - 0 means according to MODE, - -1 means use BIGGEST_ALIGNMENT and round size to multiple of that, - -2 means use BITS_PER_UNIT, - positive specifies alignment boundary in bits. - - KIND has ASLK_REDUCE_ALIGN bit set if it is OK to reduce - alignment and ASLK_RECORD_PAD bit set if we should remember - extra space we allocated for alignment purposes. When we are - called from assign_stack_temp_for_type, it is not set so we don't - track the same stack slot in two independent lists. - - We do not round to stack_boundary here. 
*/ - -rtx -assign_stack_local_1 (machine_mode mode, poly_int64 size, - int align, int kind) +static rtx +assign_stack_local_1_base (machine_mode mode, poly_int64 size, + int align, int kind, + unsigned int *pbit_align) { - rtx x, addr; + + rtx addr; poly_int64 bigend_correction = 0; poly_int64 slot_offset = 0, old_frame_offset; unsigned int alignment, alignment_in_bits; @@ -524,18 +509,71 @@ assign_stack_local_1 (machine_mode mode, poly_int64 size, (slot_offset + bigend_correction, POmode)); - x = gen_rtx_MEM (mode, addr); + if (frame_offset_overflow (frame_offset, current_function_decl)) + frame_offset = 0; + + if (pbit_align) + *pbit_align = alignment_in_bits; + + return addr; +} + +static rtx +assign_stack_local_1_as_mem (machine_mode mode, + rtx addr, + unsigned int alignment_in_bits) +{ + rtx x = gen_rtx_MEM (mode, addr); set_mem_align (x, alignment_in_bits); MEM_NOTRAP_P (x) = 1; vec_safe_push (stack_slot_list, x); - if (frame_offset_overflow (frame_offset, current_function_decl)) - frame_offset = 0; - return x; } +static rtx +assign_stack_local_narrowed (machine_mode mode, poly_int64 size, int align, + int kind) +{ + unsigned int alignment_in_bits; + rtx addr = assign_stack_local_1_base (mode, size, align, kind, + &alignment_in_bits); + if (size.is_constant () && applying_cheri_stack_bounds ()) + { + rtx size_rtx = gen_int_mode (size.to_constant (), POmode); + addr = targetm.cap_narrowed_pointer (addr, size_rtx); + } + return assign_stack_local_1_as_mem (mode, addr, alignment_in_bits); +} +/* Allocate a stack slot of SIZE bytes and return a MEM rtx for it + with machine mode MODE. + + ALIGN controls the amount of alignment for the address of the slot: + 0 means according to MODE, + -1 means use BIGGEST_ALIGNMENT and round size to multiple of that, + -2 means use BITS_PER_UNIT, + positive specifies alignment boundary in bits. + + KIND has ASLK_REDUCE_ALIGN bit set if it is OK to reduce + alignment and ASLK_RECORD_PAD bit set if we should remember + extra space we allocated for alignment purposes. When we are + called from assign_stack_temp_for_type, it is not set so we don't + track the same stack slot in two independent lists. + + We do not round to stack_boundary here. */ + +rtx +assign_stack_local_1 (machine_mode mode, poly_int64 size, + int align, int kind) +{ + unsigned int alignment_in_bits; + rtx addr = assign_stack_local_1_base (mode, size, align, kind, + &alignment_in_bits); + + return assign_stack_local_1_as_mem (mode, addr, alignment_in_bits); +} + /* Wrap up assign_stack_local_1 with last parameter as false. */ rtx @@ -574,7 +612,7 @@ public: conflict with objects of the type of the old slot. */ tree type; /* The alignment (in bits) of the slot. */ - unsigned int align; + unsigned HOST_WIDE_INT align; /* Nonzero if this temporary is currently in use. */ char in_use; /* Nesting level at which this slot is being used. */ @@ -585,6 +623,9 @@ public: /* The size of the slot, including extra space for alignment. This info is for combine_temp_slots. */ poly_int64 full_size; + /* Whether or not access to this slot is bounded. If it is then it can not + be combined with other slots. */ + bool bounded; }; /* Entry for the below hash table. */ @@ -786,15 +827,24 @@ find_temp_slot_from_address (rtx x) TYPE is the type that will be used for the stack slot. 
*/ rtx -assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type) +assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type, + bool bounded) { - unsigned int align; + unsigned HOST_WIDE_INT align; + unsigned orig_align; class temp_slot *p, *best_p = 0, *selected = NULL, **pp; rtx slot; + poly_int64 size_stored = size; gcc_assert (known_size_p (size)); - align = get_stack_local_alignment (type, mode); + orig_align = align = get_stack_local_alignment (type, mode); + bounded &= applying_cheri_stack_bounds (); + if (size.is_constant () && bounded) + { + size_stored += targetm.data_padding_size (size.to_constant (), 0, NULL_TREE); + align = alignment_pad_from_bits (size.to_constant (), align, NULL_TREE); + } /* Try to find an available, already-allocated temporary of the proper mode which meets the size and alignment requirements. Choose the @@ -839,7 +889,7 @@ assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type) for BLKmode slots, so that we can be sure of the alignment. */ if (GET_MODE (best_p->slot) == BLKmode) { - int alignment = best_p->align / BITS_PER_UNIT; + HOST_WIDE_INT alignment = best_p->align / BITS_PER_UNIT; poly_int64 rounded_size = aligned_upper_bound (size, alignment); if (known_ge (best_p->size - rounded_size, alignment)) @@ -876,14 +926,33 @@ assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type) So for requests which depended on the rounding of SIZE, we go ahead and round it now. We also make sure ALIGNMENT is at least BIGGEST_ALIGNMENT. */ - gcc_assert (mode != BLKmode || align == BIGGEST_ALIGNMENT); - p->slot = assign_stack_local_1 (mode, - (mode == BLKmode - ? aligned_upper_bound (size, - (int) align - / BITS_PER_UNIT) - : size), - align, 0); + gcc_assert (mode != BLKmode || align >= BIGGEST_ALIGNMENT); + if (!bounded) + p->slot = assign_stack_local_1 + (mode, + (mode == BLKmode + ? aligned_upper_bound (size, align / BITS_PER_UNIT) + : size), + align, 0); + else if (align > MAX_SUPPORTED_STACK_ALIGNMENT) + { + rtx allocsize = gen_int_mode (size_stored, POmode); + get_dynamic_stack_size (&allocsize, 0, align, NULL); + rtx addr = assign_stack_local_1_base (BLKmode, UINTVAL (allocsize), + MAX_SUPPORTED_STACK_ALIGNMENT, + ASLK_RECORD_PAD, NULL); + addr = align_dynamic_address (addr, align); + addr = targetm.cap_narrowed_pointer + (addr, gen_int_mode (size_stored, POmode)); + p->slot = assign_stack_local_1_as_mem (mode, addr, align); + } + else + p->slot = assign_stack_local_narrowed + (mode, + (mode == BLKmode + ? aligned_upper_bound (size, align / BITS_PER_UNIT) + : size), + align, 0); p->align = align; @@ -918,6 +987,7 @@ assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type) p->in_use = 1; p->type = type; p->level = temp_slot_level; + p->bounded = bounded; n_temp_slots_in_use++; pp = temp_slots_at_level (p->level); @@ -932,7 +1002,10 @@ assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type) it. If there's no TYPE, then we don't know anything about the alias set for the memory. */ set_mem_alias_set (slot, type ? get_alias_set (type) : 0); - set_mem_align (slot, align); + if (align < UINT_MAX) + set_mem_align (slot, align); + else + set_mem_align (slot, orig_align); /* If a type is specified, set the relevant flags. */ if (type != 0) @@ -946,9 +1019,9 @@ assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type) reuse. First two arguments are same as in preceding function. 
*/ rtx -assign_stack_temp (machine_mode mode, poly_int64 size) +assign_stack_temp (machine_mode mode, poly_int64 size, bool bounded) { - return assign_stack_temp_for_type (mode, size, NULL_TREE); + return assign_stack_temp_for_type (mode, size, NULL_TREE, bounded); } \f /* Assign a temporary. @@ -962,7 +1035,8 @@ assign_stack_temp (machine_mode mode, poly_int64 size) rtx assign_temp (tree type_or_decl, int memory_required, - int dont_promote ATTRIBUTE_UNUSED) + int dont_promote ATTRIBUTE_UNUSED, + bool bounded) { tree type, decl; machine_mode mode; @@ -1012,7 +1086,7 @@ assign_temp (tree type_or_decl, int memory_required, size = 1; } - tmp = assign_stack_temp_for_type (mode, size, type); + tmp = assign_stack_temp_for_type (mode, size, type, bounded); return tmp; } @@ -1053,6 +1127,8 @@ combine_temp_slots (void) int delete_p = 0; next = p->next; + if (p->bounded) + continue; if (GET_MODE (p->slot) != BLKmode) continue; @@ -1062,6 +1138,8 @@ combine_temp_slots (void) int delete_q = 0; next_q = q->next; + if (q->bounded) + continue; if (GET_MODE (q->slot) != BLKmode) continue; @@ -1714,9 +1792,11 @@ instantiate_virtual_regs_in_insn (rtx_insn *insn) /* ??? Recognize address_operand and/or "p" constraints to see if (plus new offset) is a valid before we put this through expand_simple_binop. */ - /* MORELLO TODO: update for capabilities? */ - x = expand_simple_binop (GET_MODE (x), PLUS, new_rtx, - gen_int_mode (offset, GET_MODE (x)), + scalar_addr_mode sa_mode + = as_a <scalar_addr_mode> (GET_MODE (x)); + scalar_int_mode off_mode = offset_mode (sa_mode); + x = expand_pointer_plus (sa_mode, new_rtx, + gen_int_mode (offset, off_mode), NULL_RTX, 1, OPTAB_LIB_WIDEN); seq = get_insns (); end_sequence (); @@ -1731,10 +1811,11 @@ instantiate_virtual_regs_in_insn (rtx_insn *insn) if (maybe_ne (offset, 0)) { start_sequence (); - /* MORELLO TODO: expand_pointer_plus? */ - new_rtx = expand_simple_binop - (GET_MODE (new_rtx), PLUS, new_rtx, - gen_int_mode (offset, GET_MODE (new_rtx)), + scalar_addr_mode sa_mode + = as_a <scalar_addr_mode> (GET_MODE (new_rtx)); + scalar_int_mode off_mode = offset_mode (sa_mode); + new_rtx = expand_pointer_plus + (sa_mode, new_rtx, gen_int_mode (offset, off_mode), NULL_RTX, 1, OPTAB_LIB_WIDEN); seq = get_insns (); end_sequence (); @@ -2929,31 +3010,47 @@ assign_parm_setup_block (struct assign_parm_data_all *all, size = int_size_in_bytes (data->arg.type); size_stored = CEIL_ROUND (size, UNITS_PER_WORD); + if (applying_cheri_stack_bounds ()) + size_stored += targetm.data_padding_size (size, DECL_ALIGN (parm), parm); if (stack_parm == 0) { /* MORELLO TODO (OPTIMISED). Based on the surrounding code it seems we may be able to branch based on mode_strict_alignment (word_mode). Look into that ... */ - HOST_WIDE_INT parm_align + unsigned HOST_WIDE_INT parm_align = (any_modes_strict_align () ? MAX (DECL_ALIGN (parm), BITS_PER_WORD) : DECL_ALIGN (parm)); - - SET_DECL_ALIGN (parm, parm_align); - if (DECL_ALIGN (parm) > MAX_SUPPORTED_STACK_ALIGNMENT) + /* While it is possible to store any 64 bit alignment in the 6bit + log-power field of DECL_ALIGN, the majority of the rest of the code + uses a 32bit int (as per limits on ELF alignment commented in + tree_type_common.align). However, marking the DECL as having a known + very large alignment is not necessary, the alignment it is marked as + having is also correct (just smaller than we could have guaranteed). + We ensure the object is aligned on the stack using + `get_dynamic_stack_size` and `align_dynamic_address` anyway. 
*/ + if (applying_cheri_stack_bounds ()) + parm_align = alignment_pad_from_bits (size, DECL_ALIGN (parm), parm); + + if (parm_align < UINT_MAX) + SET_DECL_ALIGN (parm, parm_align); + if (parm_align > MAX_SUPPORTED_STACK_ALIGNMENT) { rtx allocsize = gen_int_mode (size_stored, POmode); - get_dynamic_stack_size (&allocsize, 0, DECL_ALIGN (parm), NULL); - stack_parm = assign_stack_local (BLKmode, UINTVAL (allocsize), - MAX_SUPPORTED_STACK_ALIGNMENT); - rtx addr = align_dynamic_address (XEXP (stack_parm, 0), - DECL_ALIGN (parm)); + get_dynamic_stack_size (&allocsize, 0, parm_align, NULL); + rtx addr = assign_stack_local_1_base (BLKmode, UINTVAL (allocsize), + MAX_SUPPORTED_STACK_ALIGNMENT, + ASLK_RECORD_PAD, NULL); + addr = align_dynamic_address (addr, parm_align); mark_reg_pointer (addr, DECL_ALIGN (parm)); - stack_parm = gen_rtx_MEM (GET_MODE (stack_parm), addr); + if (applying_cheri_stack_bounds ()) + addr = targetm.cap_narrowed_pointer (addr, allocsize); + stack_parm = gen_rtx_MEM (BLKmode, addr); MEM_NOTRAP_P (stack_parm) = 1; } else - stack_parm = assign_stack_local (BLKmode, size_stored, - DECL_ALIGN (parm)); + stack_parm = assign_stack_local_narrowed (BLKmode, size_stored, + DECL_ALIGN (parm), + ASLK_RECORD_PAD); if (known_eq (GET_MODE_SIZE (GET_MODE (entry_parm)), size)) PUT_MODE (stack_parm, GET_MODE (entry_parm)); set_mem_attributes (stack_parm, parm, 1); @@ -3503,9 +3600,10 @@ assign_parm_setup_stack (struct assign_parm_data_all *all, tree parm, align))) align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm)); data->stack_parm - = assign_stack_local (GET_MODE (data->entry_parm), - GET_MODE_SIZE (GET_MODE (data->entry_parm)), - align); + = assign_stack_local_narrowed + (GET_MODE (data->entry_parm), + GET_MODE_SIZE (GET_MODE (data->entry_parm)), align, + ASLK_RECORD_PAD); align = MEM_ALIGN (data->stack_parm); set_mem_attributes (data->stack_parm, parm, 1); set_mem_align (data->stack_parm, align); diff --git a/gcc/function.h b/gcc/function.h index ca7fee7e83b..9321bda9a03 100644 --- a/gcc/function.h +++ b/gcc/function.h @@ -625,9 +625,10 @@ extern unsigned int spill_slot_alignment (machine_mode); extern rtx assign_stack_local_1 (machine_mode, poly_int64, int, int); extern rtx assign_stack_local (machine_mode, poly_int64, int); -extern rtx assign_stack_temp_for_type (machine_mode, poly_int64, tree); -extern rtx assign_stack_temp (machine_mode, poly_int64); -extern rtx assign_temp (tree, int, int); +extern rtx assign_stack_temp_for_type (machine_mode, poly_int64, tree, + bool bounded = false); +extern rtx assign_stack_temp (machine_mode, poly_int64, bool bounded = false); +extern rtx assign_temp (tree, int, int, bool bounded = false); extern void update_temp_slot_address (rtx, rtx); extern void preserve_temp_slots (rtx); extern void free_temp_slots (void); diff --git a/gcc/hooks.c b/gcc/hooks.c index 9c493790416..c0d9a1fc989 100644 --- a/gcc/hooks.c +++ b/gcc/hooks.c @@ -394,6 +394,20 @@ hook_rtx_rtx_null (rtx) return NULL; } +/* Generic hook that takes two rtx's and returns NULL_RTX. */ +rtx +hook_rtx_rtx_rtx_null (rtx, rtx) +{ + return NULL; +} + +/* Generic hook that takes two rtx's and returns the first. */ +rtx +hook_rtx_rtx_rtx_idfirst (rtx a, rtx) +{ + return a; +} + /* Generic hook that takes a tree and an int and returns NULL_RTX. */ rtx hook_rtx_tree_int_null (tree, int) diff --git a/gcc/target.def b/gcc/target.def index fccbfa2ce9c..8ae729d9a63 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -3274,6 +3274,25 @@ DEFHOOK opt_scalar_addr_mode. 
This is the default.", opt_scalar_addr_mode, (), default_capability_mode) +DEFHOOK +(cap_narrowed_pointer, + "Return an RTL expression which represents @var{base} narrowed such that it\n\ + can only access a length of @var{size} bytes.\n\ + This is designed for capability targets.\n\ + Note that this hook must not emit any insns.", + rtx, (rtx base, rtx size), + hook_rtx_rtx_rtx_idfirst) + +DEFHOOK +(force_operand, + "Target hook to enable @code{force_operand} on target-specific RTL\n\ + expressions.\n\ + When @code{cap_narrowed_pointer} returns an RTX which can not be made an\n\ + operand by the general @code{force_operand}, this hook may need to be\n\ + implemented in order to handle said RTX.", + rtx, (rtx value, rtx target), + hook_rtx_rtx_rtx_null) + /* Disambiguate with errno. */ DEFHOOK (capabilities_in_hardware, @@ -3470,13 +3489,16 @@ padding of the given amount when emitting this variable.\n\ This hook takes both its arguments in bytes. The default definition returns\n\ @code{0}.\n\ \n\ -The typical use of this hook is to add padding to the end of objects on\n\ -a capability architecture to ensure that the bounds of a capability pointing\n\ -to objects do not allow accesses to any neighbouring objects.\n\ +This hook is used to add padding to the end of objects on a capability\n\ +architecture to ensure that the bounds of a capability pointing to objects do\n\ +not allow accesses to any neighbouring objects.\n\ \n\ A requirement on the implementation of this function is that if @var{decl}\n\ has a user-specified alignment on a decl which has an associated section\n\ -then this hook must return @code{0}.", +then this hook must return @code{0}.\n\ +\n\ +This hook must be defined so that it can be used on global variables and stack\n\ +variables.", unsigned HOST_WIDE_INT, (unsigned HOST_WIDE_INT size, unsigned HOST_WIDE_INT align, const_tree decl), default_data_padding_size) @@ -3488,11 +3510,20 @@ size @var{size} when writing it out to memory.\n\ \n\ This hook takes its argument in bytes. 
The default definition returns the\n\ alignment given as an argument.\n\ +The hook is used to ensure the alignment required so that the size @var{size}\n\ +can be precisely bounded using the CHERI bounds compression format.\n\ \n\ -The typical use of this hook is to ensure alignment in order to give precise\n\ -bounds for a capability pointing to the given object on capability systems.\n\ A requirement on the implementation of this function is that if @var{decl}\n\ -has a user-specified alignment then this hook must not decrease the alignment.", +is a global variable with a named section and a user-specified alignment then\n\ +this hook must not return a greater alignment.\n\ +This requirement is so that users can ensure zero padding between objects\n\ +(something that is used in crtstuff.c).\n\ +\n\ +Similar is required for thread local variables due to this space being too\n\ +precious to waste (matching the @code{align_variable} behaviour).\n\ +\n\ +This hook must be defined so that it can be used on global variables and stack\n\ +variables.", unsigned HOST_WIDE_INT, (unsigned HOST_WIDE_INT size, unsigned HOST_WIDE_INT align, const_tree decl), default_data_padding_size) diff --git a/gcc/target.h b/gcc/target.h index 344a6b7934b..a5523545a94 100644 --- a/gcc/target.h +++ b/gcc/target.h @@ -320,6 +320,38 @@ address_mode_to_pointer_mode (scalar_addr_mode address_mode, addr_space_t as) return targetm.addr_space.pointer_mode (as, is_capability); } +/* Handle using targetm.data_alignment hook on an alignment provided in bits. + Since the hook takes an alignment provided in bytes we could lose some + bit-wise alignment requirement. This ensures that we maintain the bit-wise + alignment if the hook does not increase the alignment requirement. */ +inline unsigned HOST_WIDE_INT +alignment_pad_from_bits (unsigned HOST_WIDE_INT size, + unsigned HOST_WIDE_INT align_orig, + const_tree decl) +{ + /* Alignment must be a power of two throughout the compiler (e.g. see + how TYPE_ALIGN and DECL_ALIGN are recorded). */ + gcc_assert (align_orig < BITS_PER_UNIT + || (align_orig % BITS_PER_UNIT == 0)); + unsigned HOST_WIDE_INT align + = targetm.data_alignment (size, align_orig/BITS_PER_UNIT, decl); + /* Only need to worry about the time that align_orig is less than + BITS_PER_UNIT, which is very rare (on bootstrapping and running the + testsuite at the time this line was added the only time that was hit was + via output_constant_pool_1 from output_object_block using an artificial + alignment of 1 bit because the alignment was already handled). */ + return MAX (align * BITS_PER_UNIT, align_orig); +} + +/* Wrapper around choosing whether we need to apply CHERI bounds or not. */ +inline bool +applying_cheri_stack_bounds () +{ + return targetm.capabilities_in_hardware () + && CAPABILITY_MODE_P (Pmode) + && flag_cheri_stack_bounds; +} + #ifdef GCC_TM_H #ifndef CUMULATIVE_ARGS_MAGIC diff --git a/gcc/testsuite/gcc.dg/torture/matrix-6.c b/gcc/testsuite/gcc.dg/torture/matrix-6.c index e01e5311cd8..90fb2b029d4 100644 --- a/gcc/testsuite/gcc.dg/torture/matrix-6.c +++ b/gcc/testsuite/gcc.dg/torture/matrix-6.c @@ -1,4 +1,6 @@ -/* { dg-do run } */ +/* Can not run on Pure capability since this accesses a stack-local variable + out of the bounds of that stack local variable. */ +/* { dg-do run { target { ! 
cheri_capability_pure } } } */ /* { dg-options "-fwhole-program" } */ diff --git a/gcc/testsuite/gcc.dg/torture/pr36227.c b/gcc/testsuite/gcc.dg/torture/pr36227.c index ee5df5f5851..8ac71e5a674 100644 --- a/gcc/testsuite/gcc.dg/torture/pr36227.c +++ b/gcc/testsuite/gcc.dg/torture/pr36227.c @@ -1,10 +1,8 @@ -/* { dg-do run { target { stdint_types } } } */ +/* Avoid cheri_capability_pure since this accesses one stack variable via + a pointer to another, and that breaks capability bounds. */ +/* { dg-do run { target { { stdint_types } && { ! cheri_capability_pure } } } } */ -#ifdef __GCC_ARM_CAPABILITY_ANY -typedef __UINTPTR_TYPE__ uintptr_t; -#else #include <stdint.h> -#endif extern void abort (void); int main() diff --git a/gcc/testsuite/gcc.target/aarch64/morello/builtin_cheri_bounds_set.c b/gcc/testsuite/gcc.target/aarch64/morello/builtin_cheri_bounds_set.c index 45bb6c3b060..5018a84707e 100644 --- a/gcc/testsuite/gcc.target/aarch64/morello/builtin_cheri_bounds_set.c +++ b/gcc/testsuite/gcc.target/aarch64/morello/builtin_cheri_bounds_set.c @@ -1,5 +1,5 @@ /* { dg-do compile { target aarch64*-*-* } } */ -/* { dg-additional-options "-march=morello+c64 -mabi=purecap" } */ +/* { dg-additional-options "-march=morello+c64 -mabi=purecap -fno-cheri-stack-bounds" } */ #include <stddef.h> diff --git a/gcc/testsuite/gcc.target/aarch64/morello/builtin_cheri_bounds_set_exact.c b/gcc/testsuite/gcc.target/aarch64/morello/builtin_cheri_bounds_set_exact.c index ade31136e3e..c3efed91379 100644 --- a/gcc/testsuite/gcc.target/aarch64/morello/builtin_cheri_bounds_set_exact.c +++ b/gcc/testsuite/gcc.target/aarch64/morello/builtin_cheri_bounds_set_exact.c @@ -1,5 +1,5 @@ /* { dg-do compile { target aarch64*-*-* } } */ -/* { dg-additional-options "-march=morello+c64 -mabi=purecap" } */ +/* { dg-additional-options "-march=morello+c64 -mabi=purecap -fno-cheri-stack-bounds" } */ #include <stddef.h> diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_big_alignment.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_big_alignment.c new file mode 100644 index 00000000000..953fa08d2d8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_big_alignment.c @@ -0,0 +1,18 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +#include <assert.h> + +volatile int ten = 10; + +__attribute__((noinline)) void foo(int index, int len) { + volatile char str[len] __attribute__((aligned(128))); + assert(!((long) str & 127L)); + str[index] = '1'; // BOOM +} + +int main() { + foo(ten, ten); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_big_allocation_outgoing_args.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_big_allocation_outgoing_args.c new file mode 100644 index 00000000000..3d2c29483b0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_big_allocation_outgoing_args.c @@ -0,0 +1,38 @@ +/* { dg-do run } */ +/* Should pass. + This testcase is checking the case where there are outgoing arguments that + spill onto the stack in the same function as an alloca call. In this case + we reserve those outgoing arguments always under the space in which we + dynamically allocate with alloca and this can get messed up. */ + +#include <assert.h> + +/* Choose a size to allocate which requires extra alignment for precise Morello + bounds. 
*/ +volatile int bigsize = 0x10000; + +/* A function with enough arguments that we spill something onto the stack when + calling it. */ +__attribute__((noinline, noipa)) +int otherfunc(int x, int a0, int a1, int a2, int a3, + int a4, int a5, int a6, int a7, int a8) +{ + return x % a8 % a0; +} + +/* Allocate enough space that the allocation needs 32 bytes of alignment for + Morello bounds, then access the variable. */ +__attribute__((noinline, noipa)) +int foo(int size, int index) +{ + int *myvariable = __builtin_alloca(size * sizeof(int)); + return otherfunc(myvariable[index], index, + index, index, index, index, + index, index, index, index); +} + +int main() { + foo(bigsize, bigsize - 1); + return 0; +} + diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_detect_custom_size.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_detect_custom_size.c new file mode 100644 index 00000000000..943765486f3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_detect_custom_size.c @@ -0,0 +1,23 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +#include <assert.h> + +struct A { + char a[3]; + int b[3]; +}; + +volatile int ten = 10; + +__attribute__((noinline)) void foo(int index, int len) { + volatile struct A str[len] __attribute__((aligned(32))); + assert(!((long) str & 31L)); + str[index].a[0] = '1'; // BOOM +} + +int main(int argc, char **argv) { + foo(ten, ten); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_overflow_partial.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_overflow_partial.c new file mode 100644 index 00000000000..8393da5ce19 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_overflow_partial.c @@ -0,0 +1,18 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +#include <assert.h> + +volatile const int ten = 10; + +__attribute__((noinline)) void foo(int index, int len) { + volatile char str[len] __attribute__((aligned(32))); + assert(!((long) str & 31L)); + str[index] = '1'; // BOOM +} + +int main(int argc, char **argv) { + foo(ten, ten); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_overflow_right.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_overflow_right.c new file mode 100644 index 00000000000..0f1c34d6665 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_overflow_right.c @@ -0,0 +1,18 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +#include <assert.h> + +volatile const int ten = 10; + +__attribute__((noinline)) void foo(int index, int len) { + volatile char str[len] __attribute__((aligned(32))); + assert(!((long) str & 31L)); + str[index] = '1'; // BOOM +} + +int main(int argc, char **argv) { + foo(33, ten); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_underflow_left.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_underflow_left.c new file mode 100644 index 00000000000..f080bbc110a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/alloca_underflow_left.c @@ -0,0 +1,18 @@ +/* Taken from ASAN testsuite. 
*/ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +#include <assert.h> + +volatile const int ten = 10; + +__attribute__((noinline)) void foo(int index, int len) { + volatile char str[len] __attribute__((aligned(32))); + assert(!((long) str & 31L)); + str[index] = '1'; // BOOM +} + +int main(int argc, char **argv) { + foo(-1, ten); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/asan-stack-small.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/asan-stack-small.c new file mode 100644 index 00000000000..50a6402f79e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/asan-stack-small.c @@ -0,0 +1,30 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +char *pa; +char *pb; +char *pc; + +void access (volatile char *ptr) +{ + *ptr = 'x'; +} + +int main (int argc, char **argv) +{ + char a; + char b; + char c; + + pa = &a; + pb = &b; + pc = &c; + + access (pb); + access (pc); + // access 'b' here + access (pa + 32); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-1.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-1.c new file mode 100644 index 00000000000..79e30dbc58f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-1.c @@ -0,0 +1,23 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +struct A +{ + char base; + int : 4; + long x : 7; +}; + +int __attribute__ ((noinline, noclone)) +f (void *p, char *y __attribute((unused))) { + return ((struct A *)p)->x; +} + +int +main () +{ + char x[100] = {0}; + char a = 0; + return f (&a, x); +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-2.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-2.c new file mode 100644 index 00000000000..b15a32f57c0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-2.c @@ -0,0 +1,22 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +struct A +{ + char base; + int : 7; + int x : 8; +}; + +int __attribute__ ((noinline, noclone)) +f (void *p) { + return ((struct A *)p)->x; +} + +int +main () +{ + char a = 0; + return f (&a); +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-3.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-3.c new file mode 100644 index 00000000000..ee6542303b2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-3.c @@ -0,0 +1,23 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +struct A +{ + char base; + int : 8; + int x : 8; +}; + +int __attribute__ ((noinline, noclone)) +f (void *p, char *y __attribute__((unused))) { + return ((struct A *)p)->x; +} + +int +main () +{ + char x[100] = {0}; + char a = 0; + return f (&a, x); +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-4.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-4.c new file mode 100644 index 00000000000..971ac761fd8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-4.c @@ -0,0 +1,23 @@ +/* Taken from ASAN testsuite. 
*/ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +struct A +{ + char base; + int : 0; + int x : 8; +}; + +int __attribute__ ((noinline, noclone)) +f (void *p, char *y __attribute((unused))) { + return ((struct A *)p)->x; +} + +int +main () +{ + char x[100] = {0}; + char a = 0; + return f (&a, x); +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-5.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-5.c new file mode 100644 index 00000000000..45404b3cea2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/bitfield-5.c @@ -0,0 +1,24 @@ +/* Taken from ASAN testsuite. */ +/* Check BIT_FIELD_REF. */ + +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +struct A +{ + int y : 20; + int x : 13; +}; + +int __attribute__ ((noinline, noclone)) +f (void *p, char *y __attribute((unused))) { + return ((struct A *)p)->x != 0; +} + +int +main () +{ + char x[100] = {0}; + int a = 0; + return f (&a, x); +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-1.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-1.c new file mode 100644 index 00000000000..3cb09ede21d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-1.c @@ -0,0 +1,27 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +struct A +{ + int a[5]; +}; + +static __attribute__ ((noinline)) int +goo (struct A *a) +{ + int *ptr = &a->a[0]; + return *(volatile int *) (ptr - 1); +} + +__attribute__ ((noinline)) int +foo (struct A arg) +{ + return goo (&arg); +} + +int +main () +{ + return foo ((struct A){0}); +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-2.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-2.c new file mode 100644 index 00000000000..a3f87dd18ca --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-2.c @@ -0,0 +1,24 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +static __attribute__ ((noinline)) int +goo (int *a) +{ + return *(volatile int *)a; +} + +__attribute__ ((noinline)) int +foo (char arg, char modval) +{ + int ret = goo ((int *)&arg); + if (ret % modval) + return modval; + return ret; +} + +int +main () +{ + return foo (12, 0); +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-3.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-3.c new file mode 100644 index 00000000000..02c900886a6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-3.c @@ -0,0 +1,42 @@ +/* Taken from ASAN testsuite. 
*/ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ +/* { dg-additional-options "-Wno-psabi" } */ + +/* On SPARC 32-bit, only vectors up to 8 bytes are passed in registers */ +#if defined(__sparc__) && !defined(__sparcv9) && !defined(__arch64__) +#define SMALL_VECTOR +#endif + +#ifdef SMALL_VECTOR +typedef int v4si __attribute__ ((vector_size (8))); +#else +typedef int v4si __attribute__ ((vector_size (16))); +#endif + +static __attribute__ ((noinline)) int +goo (v4si *a) +{ + return (*(volatile v4si *) (a + 1))[2]; +} + +__attribute__ ((noinline)) int +foo (v4si arg, char *y) +{ + int ret = goo (&arg); + if (ret % y[0]) + return y[0]; + return ret; +} + +int +main () +{ + char x[100] = {0}; +#ifdef SMALL_VECTOR + v4si v = {1,2}; +#else + v4si v = {1,2,3,4}; +#endif + return foo (v, x); +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-4.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-4.c new file mode 100644 index 00000000000..68008ea1a35 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-4.c @@ -0,0 +1,23 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +#include <complex.h> + +static __attribute__ ((noinline)) long double +goo (long double _Complex *a) +{ + return crealf(*(volatile _Complex long double *)a); +} + +__attribute__ ((noinline)) float +foo (float _Complex arg) +{ + return goo ((long double _Complex *)&arg); +} + +int +main () +{ + return foo (3 + 2 * I); +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-5.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-5.c new file mode 100644 index 00000000000..80e731f5f45 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-5.c @@ -0,0 +1,35 @@ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +/* When this is passed on the stack, we round the size of the stack slot that + we generate up to a stack alignment of 16. This is done in + assign_stack_temp_for_type. That rounding up includes the bounds of the + pointer, and seems to require applying to the bounds of the pointer used + since later uses of the same stack slot may require the entire space + allocated. + It is for this reason that we allocate a structure of 112 bytes (since that + is divisible by 16 which is the size to which we round stack slots up to -- + hence accessing one past the structure will cause a problem). + MORELLO TODO It would be nice to look into this behaviour and see how + feasible it is to ensure that these stack slots can work without such a + rounding up. 
*/ +struct large_struct { char x[112]; }; + +static __attribute__ ((noinline)) char +goo (struct large_struct *a) +{ + return (a+1)->x[0]; +} + +__attribute__ ((noinline)) char +foo (struct large_struct arg) +{ + return goo (&arg); +} + +int +main () +{ + return foo ((struct large_struct){0}); +} + diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-6.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-6.c new file mode 100644 index 00000000000..5dcd588444c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-6.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +struct large_struct { char x[99999]; }; + +static __attribute__ ((noinline)) char +goo (struct large_struct *a) +{ + /* The size of `large_struct` requires padding and alignment to ensure + precise bounds (i.e. that it can't overlap with other variables). + Hence we have to access x[1] rather than x[0] to ensure triggering the + problem. */ + return (a+1)->x[1]; +} + +__attribute__ ((noinline)) char +foo (struct large_struct arg) +{ + return goo (&arg); +} + +int +main () +{ + return foo ((struct large_struct){0}); +} + + diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-shouldpass.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-shouldpass.c new file mode 100644 index 00000000000..cb8555b19be --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/function-argument-shouldpass.c @@ -0,0 +1,27 @@ +/* { dg-do run } */ + +/* This testcase ensures that our SCBNDSE on the large structure did not clear + the tag. This is not checked elsewhere in the testsuite. */ + +struct large_struct { char x[99999]; }; + +static __attribute__ ((noinline)) char +goo (struct large_struct *a) +{ + return a->x[99998]; +} + +__attribute__ ((noinline)) char +foo (struct large_struct arg) +{ + return goo (&arg); +} + +int +main () +{ + return foo ((struct large_struct){0}); +} + + + diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/global-overflow-1.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/global-overflow-1.c new file mode 100644 index 00000000000..92370aab4a0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/global-overflow-1.c @@ -0,0 +1,24 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-options "-fno-builtin-memset" } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +extern +#ifdef __cplusplus +"C" +#endif +void *memset (void *, int, __SIZE_TYPE__); + +volatile int ten = 10; + +int main() { + static char XXX[10]; + static char YYY[10]; + static char ZZZ[10]; + memset(XXX, 0, 10); + memset(YYY, 0, 10); + memset(ZZZ, 0, 10); + int res = YYY[ten]; /* BOOOM */ + res += XXX[ten/10] + ZZZ[ten/10]; + return res; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/heap-overflow-1.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/heap-overflow-1.c new file mode 100644 index 00000000000..5106c2cc947 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/heap-overflow-1.c @@ -0,0 +1,26 @@ +/* Taken from ASAN testsuite. 
*/ +/* { dg-do run } */ +/* { dg-options "-fno-builtin-malloc -fno-builtin-free -fno-builtin-memset" } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +#ifdef __cplusplus +extern "C" { +#endif + +void *memset (void *, int, __SIZE_TYPE__); +void *malloc (__SIZE_TYPE__); +void free (void *); + +#ifdef __cplusplus +} +#endif + +volatile int ten = 10; +int main(int argc, char **argv) { + char *x = (char*)malloc(10); + memset(x, 0, 10); + int res = x[ten]; /* BOOOM */ + x[ten] = res%3; + free(x); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/memcmp-1.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/memcmp-1.c new file mode 100644 index 00000000000..1478275e81b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/memcmp-1.c @@ -0,0 +1,17 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-options "-fno-builtin-memcmp" } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +#include <string.h> + +volatile int one = 1; + +int +main () +{ + char a1[] = {(char)one, 2, 3, 4}; + char a2[] = {1, (char)(2*one), 3, 4}; + int res = memcmp (a1, a2, 5 + one); + return res; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/misalign-1.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/misalign-1.c new file mode 100644 index 00000000000..ea6dbc332f3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/misalign-1.c @@ -0,0 +1,36 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +struct S { int i; } __attribute__ ((packed)); + +__attribute__((noinline, noclone)) int +foo (struct S *s) +{ + return s->i; +} + +__attribute__((noinline, noclone)) int +bar (int *s) +{ + return *s; +} + +__attribute__((noinline, noclone)) struct S +baz (struct S *s) +{ + return *s; +} + +int +main () +{ + struct T { char a[3]; struct S b[3]; char c; } t; + int v = 5; + struct S *p = t.b; + asm volatile ("" : "+rm" (p)); + p += 3; + if (bar (&v) != 5) __builtin_abort (); + volatile int w = foo (p); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/misalign-2.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/misalign-2.c new file mode 100644 index 00000000000..c5b4aaa2160 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/misalign-2.c @@ -0,0 +1,36 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +struct S { int i; } __attribute__ ((packed)); + +__attribute__((noinline, noclone)) int +foo (struct S *s) +{ + return s->i; +} + +__attribute__((noinline, noclone)) int +bar (int *s) +{ + return *s; +} + +__attribute__((noinline, noclone)) struct S +baz (struct S *s) +{ + return *s; +} + +int +main () +{ + struct T { char a[3]; struct S b[3]; char c; } t; + int v = 5; + struct S *p = t.b; + asm volatile ("" : "+rm" (p)); + p += 3; + if (bar (&v) != 5) __builtin_abort (); + volatile struct S w = baz (p); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/morello-restrictions.exp b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/morello-restrictions.exp new file mode 100644 index 00000000000..bb5ebb2cc3f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/morello-restrictions.exp @@ -0,0 +1,58 @@ +# Specific regression driver for AArch64 Morello. +# Copyright (C) 2021 Free Software Foundation, Inc. +# +# This file is part of GCC. 
+# +# GCC is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GCC is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# <http://www.gnu.org/licenses/>. */ + +# Exit immediately if this isn't an AArch64 target. +if {![istarget aarch64*-*-*] } then { + return +} + +# Load support procs. +load_lib gcc-dg.exp +load_lib c-torture.exp + +# Initialize `dg'. +dg-init + +# We define a different proc to be used by the testcases so we can use +# `dg-shouldfail` without relying on looking for specific flags. Sometimes we +# compile purecap by default, sometimes we don't, hence can't use flags. +if { [check_effective_target_cheri_capability_pure] } { + set capability_flags "" + proc dg-shouldfail-purecap { args } { + upvar dg-do-what dg-do-what + dg-shouldfail "morello bounds" + } +} else { + set capability_flags "-mfake-capability" + proc dg-shouldfail-purecap { args } { } +} + +torture-init + +set-torture-options "$C_TORTURE_OPTIONS" + +# Main loop. +gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \ + "" "$capability_flags" + +# Delete the proc now we don't need it. +rename dg-shouldfail-purecap "" +torture-finish +# All done. +dg-finish diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/parameter-temp-on-stack-2.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/parameter-temp-on-stack-2.c new file mode 100644 index 00000000000..b4b3cda25a2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/parameter-temp-on-stack-2.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +void abort (); +extern void takes_int_addr (int *y); + +void __attribute__ ((noinline, noclone)) +takes_int_addr (int *x) +{ + /* N.b. we test with a read since the write can overwrite the return pointer + on the stack in an optimised compilation and crash even though we don't + have stack bounds turned on. */ + if (x[1] == 100) // BOOM + abort(); +} + + +void __attribute__ ((noinline)) +f1 (int x) +{ + takes_int_addr (&x); +} + +int main() +{ + f1(20); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/parameter-temp-on-stack.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/parameter-temp-on-stack.c new file mode 100644 index 00000000000..8da75495de8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/parameter-temp-on-stack.c @@ -0,0 +1,30 @@ +/* { dg-do run } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +struct D { __uintcap_t x; unsigned long y; }; +void abort (); +extern void takes_struct_addr (struct D *y); +int ret; + +void __attribute__ ((noinline, noclone)) +takes_struct_addr (struct D *val) +{ + /* N.b. we test with a read since the write can overwrite the return pointer + on the stack in an optimised compilation and crash even though we don't + have stack bounds turned on. 
*/ + ret = (val+1)->x; // BOOM +} + + +void __attribute__ ((noinline)) +f1 (struct D x) +{ + takes_struct_addr (&x); +} + +int main() +{ + struct D basic_arg = { 100, 10 }; + f1(basic_arg); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/stack-overflow-1.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/stack-overflow-1.c new file mode 100644 index 00000000000..c1dd7d594a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/stack-overflow-1.c @@ -0,0 +1,19 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-options "-fno-builtin-memset" } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +extern +#ifdef __cplusplus +"C" +#endif +void *memset (void *, int, __SIZE_TYPE__); + +volatile int ten = 10; + +int main() { + char x[10]; + memset(x, 0, 10); + int res = x[ten]; /* BOOOM */ + return res; +} diff --git a/gcc/testsuite/gcc.target/aarch64/morello/restrictions/strncpy-overflow-1.c b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/strncpy-overflow-1.c new file mode 100644 index 00000000000..94609abce6f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/morello/restrictions/strncpy-overflow-1.c @@ -0,0 +1,14 @@ +/* Taken from ASAN testsuite. */ +/* { dg-do run } */ +/* { dg-options "-fno-builtin-malloc -fno-builtin-strncpy" } */ +/* { dg-shouldfail-purecap "morello bounds" } */ + +#include <string.h> +#include <stdlib.h> +int main(int argc, char **argv) { + char *hello = (char*)malloc(6); + strcpy(hello, "hello"); + char *short_buffer = (char*)malloc(9); + strncpy(short_buffer, hello, 10); /* BOOM */ + return short_buffer[8]; +} diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-1.c b/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-1.c index 6885894a97e..24a52dc86ed 100644 --- a/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-1.c +++ b/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-1.c @@ -6,7 +6,15 @@ #include "stack-check-prologue.h" /* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 65536} 1 } } */ -/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 131072} 1 } } */ +/* For capability compression we need to align the array of size SIZE to 64 + bytes. In order to do this when we only know the alignment is what a stack + boundary is aligned to when you enter a function, we need to add extra space + to the stack before aligning upwards to a 64 byte boundary. + Since this is AArch64, we know that the stack alignment on entering the + function is 16 bytes, hence we only need 48 extra bytes of space to be able + to find a 64 byte alignment boundary. */ +/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 131072} 1 { target { ! cheri_capability_pure } } } } */ +/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 131120} 1 { target cheri_capability_pure } } } */ /* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 0} 1 } } */ /* Checks that the CFA notes are correct for every sp adjustment. */ diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-2.c b/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-2.c index 5796a53be06..24623d05db4 100644 --- a/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-2.c +++ b/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-2.c @@ -6,7 +6,15 @@ #include "stack-check-prologue.h" /* { dg-final { scan-assembler-times {\.cfi_def_cfa [0-9]+, 1310720} 1 } } */ -/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 1311232} 1 } } */ +/* For capability compression we need to align the array of size SIZE to 512 + bytes. 
In order to do this when we only know the alignment is what a stack + boundary is aligned to when you enter a function, we need to add extra space + to the stack before aligning upwards to a 512 byte boundary. + Since this is AArch64, we know that the stack alignment on entering the + function is 16 bytes, hence we only need 496 extra bytes of space to be able + to find a 512 byte alignment boundary. */ +/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 1311232} 1 { target { ! cheri_capability_pure } } } } */ +/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 1311728} 1 { target cheri_capability_pure } } } */ /* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 1310720} 1 } } */ /* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 0} 1 } } */ diff --git a/gcc/varasm.c b/gcc/varasm.c index 79ecf9d9c60..9c664751618 100644 --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -1030,7 +1030,7 @@ bss_initializer_p (const_tree decl, bool named) void align_variable (tree decl, bool dont_output_data) { - unsigned int align = DECL_ALIGN (decl); + unsigned HOST_WIDE_INT align = DECL_ALIGN (decl); /* In the case for initialing an array whose length isn't specified, where we have not yet been able to do the layout, @@ -1069,7 +1069,8 @@ align_variable (tree decl, bool dont_output_data) && !DECL_VIRTUAL_P (decl)) { #ifdef DATA_ALIGNMENT - unsigned int data_align = DATA_ALIGNMENT (TREE_TYPE (decl), align); + unsigned HOST_WIDE_INT data_align + = DATA_ALIGNMENT (TREE_TYPE (decl), align); /* Don't increase alignment too much for TLS variables - TLS space is too precious. */ if (! DECL_THREAD_LOCAL_P (decl) || data_align <= BITS_PER_WORD) @@ -1080,11 +1081,19 @@ align_variable (tree decl, bool dont_output_data) to mark offlined constructors. */ && (in_lto_p || DECL_INITIAL (decl) != error_mark_node)) { - unsigned int const_align + unsigned HOST_WIDE_INT const_align = targetm.constant_alignment (DECL_INITIAL (decl), align); /* Don't increase alignment too much for TLS variables - TLS - space is too precious. */ - if (! DECL_THREAD_LOCAL_P (decl) || const_align <= BITS_PER_WORD) + space is too precious. + MORELLO TODO + After applying CHERI alignment requirements we may have a very + large alignment. Avoid recording new alignment if it's + greater than UINT_MAX. Most users of DECL_ALIGN use an + unsigned int so this could overflow. Marking a smaller + alignment is also correct, so this is not a correctness issue. + */ + if ((! DECL_THREAD_LOCAL_P (decl) || const_align <= BITS_PER_WORD) + && const_align < UINT_MAX) align = const_align; } } @@ -1101,7 +1110,7 @@ align_variable (tree decl, bool dont_output_data) static unsigned int get_variable_align (tree decl) { - unsigned int align = DECL_ALIGN (decl); + unsigned HOST_WIDE_INT align = DECL_ALIGN (decl); /* For user aligned vars or static vars align_variable already did everything. */ @@ -1121,7 +1130,8 @@ get_variable_align (tree decl) { /* On some machines, it is good to increase alignment sometimes. */ #ifdef DATA_ALIGNMENT - unsigned int data_align = DATA_ALIGNMENT (TREE_TYPE (decl), align); + unsigned HOST_WIDE_INT data_align + = DATA_ALIGNMENT (TREE_TYPE (decl), align); /* Don't increase alignment too much for TLS variables - TLS space is too precious. */ if (! DECL_THREAD_LOCAL_P (decl) || data_align <= BITS_PER_WORD) @@ -1132,11 +1142,15 @@ get_variable_align (tree decl) to mark offlined constructors. 
*/ && (in_lto_p || DECL_INITIAL (decl) != error_mark_node)) { - unsigned int const_align + unsigned HOST_WIDE_INT const_align = targetm.constant_alignment (DECL_INITIAL (decl), align); /* Don't increase alignment too much for TLS variables - TLS space - is too precious. */ - if (! DECL_THREAD_LOCAL_P (decl) || const_align <= BITS_PER_WORD) + is too precious. + MORELLO TODO After applying CHERI alignment requirements we may + have a very large alignment. Avoid returning this alignment for + now. */ + if ((! DECL_THREAD_LOCAL_P (decl) || const_align <= BITS_PER_WORD) + && const_align < UINT_MAX) align = const_align; } } @@ -1982,23 +1996,6 @@ assemble_string (const char *p, int size) \f -/* Handle using targetm.data_alignment hook on an alignment provided in bits. - Since the hook takes an alignment provided in bytes we could lose some - bit-wise alignment requirement. This ensures that we maintain the bit-wise - alignment if the hook does not increase the alignment requirement. */ -static unsigned HOST_WIDE_INT -alignment_pad_from_bits (unsigned HOST_WIDE_INT size, - unsigned HOST_WIDE_INT align_orig, - const_tree decl) -{ - unsigned HOST_WIDE_INT align - = targetm.data_alignment (size, align_orig/BITS_PER_UNIT, decl); - if (align == 1 && align_orig < BITS_PER_UNIT) - return align_orig; - else - return align * BITS_PER_UNIT; -} - /* A noswitch_section_callback for lcomm_section. */ static bool diff --git a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_default_n/sizes.cc b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_default_n/sizes.cc index 6842c76df07..8bd4a27f1cb 100644 --- a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_default_n/sizes.cc +++ b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_default_n/sizes.cc @@ -42,9 +42,9 @@ test02() }; int i[3]; - Size n = {4}; + Size n = {3}; auto j = std::__uninitialized_default_n(i, n); - VERIFY( j == (i + 4) ); + VERIFY( j == (i + 3) ); } int diff --git a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_value_construct_n/sizes.cc b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_value_construct_n/sizes.cc index 7ad55e01157..75ad352ae88 100644 --- a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_value_construct_n/sizes.cc +++ b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_value_construct_n/sizes.cc @@ -43,9 +43,9 @@ test02() }; int i[3]; - Size n = {4}; - auto j = std::__uninitialized_default_n(i, n); - VERIFY( j == (i + 4) ); + Size n = {3}; + auto j = std::uninitialized_value_construct_n(i, n); + VERIFY( j == (i + 3) ); } int diff --git a/libstdc++-v3/testsuite/25_algorithms/lexicographical_compare/constrained.cc b/libstdc++-v3/testsuite/25_algorithms/lexicographical_compare/constrained.cc index b82c872bbbb..2019bbc75e4 100644 --- a/libstdc++-v3/testsuite/25_algorithms/lexicographical_compare/constrained.cc +++ b/libstdc++-v3/testsuite/25_algorithms/lexicographical_compare/constrained.cc @@ -136,7 +136,7 @@ test03() VERIFY( !ranges::lexicographical_compare(cy.begin(), cy.end(), cz.begin(), cz.end()) ); - std::vector<int> vx(x, x+5), vy(y, y+5); + std::vector<int> vx(x, x+5), vy(y, y+4); VERIFY( ranges::lexicographical_compare(vx, vy) ); VERIFY( !ranges::lexicographical_compare(vx, vy, ranges::greater{}) ); VERIFY( !ranges::lexicographical_compare(vy, vx) );
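The bit/byte handling in the alignment_pad_from_bits wrapper added to gcc/target.h above can be illustrated with a small standalone sketch. This is not GCC code: fake_data_alignment_bytes below is an invented stand-in for the targetm.data_alignment hook (its "bump large objects to 16-byte alignment" rule exists only for this example), but the bits-to-bytes conversion and the MAX against the original bit alignment mirror the wrapper in the patch.

    #include <assert.h>
    #include <stdio.h>

    #define BITS_PER_UNIT 8
    #define MAX(a, b) ((a) > (b) ? (a) : (b))

    /* Invented stand-in for the targetm.data_alignment hook: takes and
       returns a byte alignment.  The rule "objects of 64 bytes or more get
       at least 16-byte alignment" is made up purely for illustration.  */
    static unsigned long long
    fake_data_alignment_bytes (unsigned long long size,
                               unsigned long long align_bytes)
    {
      return (size >= 64 && align_bytes < 16) ? 16 : align_bytes;
    }

    /* Mirror of the wrapper logic: hand the hook a byte alignment, convert
       the answer back to bits, and never return less than the original
       request, which may be smaller than one byte.  */
    static unsigned long long
    alignment_pad_from_bits_sketch (unsigned long long size,
                                    unsigned long long align_orig_bits)
    {
      assert (align_orig_bits < BITS_PER_UNIT
              || align_orig_bits % BITS_PER_UNIT == 0);
      unsigned long long align_bytes
        = fake_data_alignment_bytes (size, align_orig_bits / BITS_PER_UNIT);
      return MAX (align_bytes * BITS_PER_UNIT, align_orig_bits);
    }

    int
    main (void)
    {
      printf ("%llu\n", alignment_pad_from_bits_sketch (8, 32));   /* 32: hook leaves it alone.  */
      printf ("%llu\n", alignment_pad_from_bits_sketch (128, 32)); /* 128: bumped to 16 bytes.  */
      printf ("%llu\n", alignment_pad_from_bits_sketch (8, 1));    /* 1: sub-byte request preserved.  */
      return 0;
    }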
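The adjusted .cfi_def_cfa_offset values in the stack-check-cfa tests above follow from the arithmetic spelled out in their comments: with the incoming stack pointer only known to be 16-byte aligned, guaranteeing an N-byte boundary inside the frame costs at most N - 16 extra bytes. A quick standalone check of those numbers, assuming nothing beyond what the test comments state:

    #include <assert.h>
    #include <stdio.h>

    /* Extra bytes needed so that a region inside the frame can be aligned
       upwards to WANT bytes when the incoming stack pointer is only known
       to be HAVE-byte aligned.  */
    static unsigned long
    extra_padding (unsigned long want, unsigned long have)
    {
      return want > have ? want - have : 0;
    }

    int
    main (void)
    {
      /* stack-check-cfa-1.c: 64-byte alignment for precise bounds, so the
         frame grows by 48 bytes (131072 -> 131120).  */
      assert (extra_padding (64, 16) == 48);
      assert (131072 + extra_padding (64, 16) == 131120);

      /* stack-check-cfa-2.c: 512-byte alignment, so 496 extra bytes
         (1311232 -> 1311728).  */
      assert (extra_padding (512, 16) == 496);
      assert (1311232 + extra_padding (512, 16) == 1311728);

      printf ("ok\n");
      return 0;
    }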