Date: Tue, 31 Jul 2018 13:36:00 -0000
From: Kyrill Tkachov
To: Thomas Preudhomme
CC: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM

Hi Thomas,

On 25/07/18 14:28, Thomas Preudhomme wrote:
> Hi Kyrill,
>
> Using memory_operand worked; the issues I encountered when using it in
> earlier versions of the patch must have been due to the missing test
> on address_operand in the preparation statements, which I added later.
> Please find an updated patch in attachment. ChangeLog entry is as
> follows:
>
> *** gcc/ChangeLog ***
>
> 2018-07-05  Thomas Preud'homme
>
>     * target-insns.def (stack_protect_combined_set): Define new standard
>     pattern name.
>     (stack_protect_combined_test): Likewise.
>     * cfgexpand.c (stack_protect_prologue): Try new
>     stack_protect_combined_set pattern first.
>     * function.c (stack_protect_epilogue): Try new
>     stack_protect_combined_test pattern first.
>     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>     parameters to control which register to use as PIC register and force
>     reloading PIC register respectively.  Insert in the stream of insns if
>     possible.
>     (legitimize_pic_address): Expose above new parameters in prototype and
>     adapt recursive calls accordingly.
>     (arm_legitimize_address): Adapt to new legitimize_pic_address
>     prototype.
>     (thumb_legitimize_address): Likewise.
>     (arm_emit_call_insn): Adapt to new require_pic_register prototype.
>     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>     change.
>     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>     prototype change.
>     (stack_protect_combined_set): New insn_and_split pattern.
>     (stack_protect_set): New insn pattern.
>     (stack_protect_combined_test): New insn_and_split pattern.
>     (stack_protect_test): New insn pattern.
>     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>     (UNSPEC_SP_TEST): Likewise.
>     * doc/md.texi (stack_protect_combined_set): Document new standard
>     pattern name.
>     (stack_protect_set): Clarify that the operand for guard's address is
>     legal.
>     (stack_protect_combined_test): Document new standard pattern name.
>     (stack_protect_test): Clarify that the operand for guard's address is
>     legal.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2018-07-05  Thomas Preud'homme
>
>     * gcc.target/arm/pr85434.c: New test.
>
> Bootstrapped again for Arm and Thumb-2 and regtested with and without
> -fstack-protector-all without any regression.

This looks ok to me now.
Thank you for your patience and addressing my comments from before.

Kyrill

> Best regards,
>
> Thomas
>
> On Thu, 19 Jul 2018 at 17:34, Thomas Preudhomme wrote:
>> [Dropping Jeff Law from the list since he already commented on the
>> middle end parts]
>>
>> Hi Kyrill,
>>
>> On Thu, 19 Jul 2018 at 12:02, Kyrill Tkachov wrote:
>>> Hi Thomas,
>>>
>>> On 17/07/18 12:02, Thomas Preudhomme wrote:
>>>> Fixed in attached patch. ChangeLog entries are unchanged:
>>>>
>>>> *** gcc/ChangeLog ***
>>>>
>>>> 2018-07-05  Thomas Preud'homme
>>>>
>>>>     PR target/85434
>>>>     * target-insns.def (stack_protect_combined_set): Define new standard
>>>>     pattern name.
>>>>     (stack_protect_combined_test): Likewise.
>>>>     * cfgexpand.c (stack_protect_prologue): Try new
>>>>     stack_protect_combined_set pattern first.
>>>>     * function.c (stack_protect_epilogue): Try new
>>>>     stack_protect_combined_test pattern first.
>>>>     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>>>>     parameters to control which register to use as PIC register and force
>>>>     reloading PIC register respectively.
>>>>     (legitimize_pic_address): Expose above new parameters in prototype and
>>>>     adapt recursive calls accordingly.
>>>>     (arm_legitimize_address): Adapt to new legitimize_pic_address
>>>>     prototype.
>>>>     (thumb_legitimize_address): Likewise.
>>>>     (arm_emit_call_insn): Adapt to new require_pic_register prototype.
>>>>     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>>>>     change.
>>>>     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>>>>     prototype change.
>>>>     (stack_protect_combined_set): New insn_and_split pattern.
>>>>     (stack_protect_set): New insn pattern.
>>>>     (stack_protect_combined_test): New insn_and_split pattern.
>>>>     (stack_protect_test): New insn pattern.
>>>>     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>>>>     (UNSPEC_SP_TEST): Likewise.
>>>>     * doc/md.texi (stack_protect_combined_set): Document new standard
>>>>     pattern name.
>>>>     (stack_protect_set): Clarify that the operand for guard's address is
>>>>     legal.
>>>>     (stack_protect_combined_test): Document new standard pattern name.
>>>>     (stack_protect_test): Clarify that the operand for guard's address is
>>>>     legal.
>>>>
>>>> *** gcc/testsuite/ChangeLog ***
>>>>
>>>> 2018-07-05  Thomas Preud'homme
>>>>
>>>>     PR target/85434
>>>>     * gcc.target/arm/pr85434.c: New test.
>>>>
>>> Sorry for the delay. Some comments inline.
>>>
>>> Kyrill
>>>
>>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>>> index d6e3c382085..d1a893ac56e 100644
>>> --- a/gcc/cfgexpand.c
>>> +++ b/gcc/cfgexpand.c
>>> @@ -6105,8 +6105,18 @@ stack_protect_prologue (void)
>>>  {
>>>    tree guard_decl = targetm.stack_protect_guard ();
>>>    rtx x, y;
>>> +  struct expand_operand ops[2];
>>>
>>>    x = expand_normal (crtl->stack_protect_guard);
>>> +  create_fixed_operand (&ops[0], x);
>>> +  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
>>> +  /* Allow the target to compute the address of Y and copy it to X without
>>> +     leaking Y into a register.  This combined address + copy pattern
>>> +     allows the target to prevent spilling of any intermediate results by
>>> +     splitting it after register allocation.  */
>>> +  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))
>>> +    return;
>>> +
>>>    if (guard_decl)
>>>      y = expand_normal (guard_decl);
>>>    else
>>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>>> index 8537262ce64..100844e659c 100644
>>> --- a/gcc/config/arm/arm-protos.h
>>> +++ b/gcc/config/arm/arm-protos.h
>>> @@ -67,7 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
>>>  extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
>>>                                 HOST_WIDE_INT, rtx, rtx, int);
>>>  extern int legitimate_pic_operand_p (rtx);
>>> -extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
>>> +extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
>>>  extern rtx legitimize_tls_address (rtx, rtx);
>>>  extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
>>>  extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>> index ec3abbcba9f..f4a970580c2 100644
>>> --- a/gcc/config/arm/arm.c
>>> +++ b/gcc/config/arm/arm.c
>>> @@ -7369,20 +7369,26 @@ legitimate_pic_operand_p (rtx x)
>>>  }
>>>
>>>  /* Record that the current function needs a PIC register.  Initialize
>>> -   cfun->machine->pic_reg if we have not already done so.  */
>>> +   cfun->machine->pic_reg if we have not already done so.
>>> +
>>> +   If not NULL, PIC_REG indicates which register to use as PIC register,
>>> +   otherwise it is decided by the register allocator.  COMPUTE_NOW forces
>>> +   the PIC register to be loaded, regardless of whether it was loaded
>>> +   previously.  */
>>>
>>>  static void
>>> -require_pic_register (void)
>>> +require_pic_register (rtx pic_reg, bool compute_now)
>>>  {
>>>    /* A lot of the logic here is made obscure by the fact that this
>>>       routine gets called as part of the rtx cost estimation process.
>>>       We don't want those calls to affect any assumptions about the real
>>>       function; and further, we can't call entry_of_function() until we
>>>       start the real expansion process.  */
>>> -  if (!crtl->uses_pic_offset_table)
>>> +  if (!crtl->uses_pic_offset_table || compute_now)
>>>      {
>>> -      gcc_assert (can_create_pseudo_p ());
>>> +      gcc_assert (can_create_pseudo_p ()
>>> +                  || (pic_reg != NULL_RTX && GET_MODE (pic_reg) == Pmode));
>>>        if (arm_pic_register != INVALID_REGNUM
>>> +          && can_create_pseudo_p ()
>>>            && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
>>>          {
>>>            if (!cfun->machine->pic_reg)
>>> @@ -7399,7 +7405,8 @@ require_pic_register (void)
>>>        rtx_insn *seq, *insn;
>>>
>>>        if (!cfun->machine->pic_reg)
>>> -        cfun->machine->pic_reg = gen_reg_rtx (Pmode);
>>> +        cfun->machine->pic_reg =
>>> +          can_create_pseudo_p () ? gen_reg_rtx (Pmode) : pic_reg;
>>>
>>> I'll admit I'm not too familiar with this code, but don't you want to be
>>> setting cfun->machine->pic_reg to the pic_reg that was passed into this
>>> function (if pic_reg != NULL, as per the function comment)?
>> Indeed, the comment needs to be fixed. I think it only makes sense to
>> specify a register after register allocation, when the number of
>> registers is finite. Before that I don't see why one would want to do
>> that. I've changed the sentence about PIC_REG in the heading comment
>> to:
>>
>> "A new pseudo register is used for the PIC register if possible,
>> otherwise PIC_REG must be non NULL and is used instead."
>>
>>>        /* Play games to avoid marking the function as needing pic
>>>           if we are being called as part of the cost-estimation
>>> @@ -7410,7 +7417,8 @@ require_pic_register (void)
>>>            start_sequence ();
>>>
>>>            if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
>>> -              && arm_pic_register > LAST_LO_REGNUM)
>>> +              && arm_pic_register > LAST_LO_REGNUM
>>> +              && can_create_pseudo_p ())
>>>              emit_move_insn (cfun->machine->pic_reg,
>>>                              gen_rtx_REG (Pmode, arm_pic_register));
>>>            else
>>> @@ -7427,15 +7435,29 @@ require_pic_register (void)
>>>               we can't yet emit instructions directly in the final
>>>               insn stream.  Queue the insns on the entry edge, they will
>>>               be committed after everything else is expanded.  */
>>> -          insert_insn_on_edge (seq,
>>> -                               single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>>> +          if (can_create_pseudo_p ())
>>> +            insert_insn_on_edge (seq,
>>> +                                 single_succ_edge
>>> +                                 (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>>> +          else
>>> +            emit_insn (seq);
>>>          }
>>>
>>> It's not the capability for creating pseudos that is relevant here, but
>>> whether we have an instruction stream to insert insns on. Do you think
>>> that using currently_expanding_to_rtl would work as a condition?
>> Indeed, it's not the right test. Do you mean testing whether
>> currently_expanding_to_rtl is false? I used can_create_pseudo_p
>> because I knew the function could never be called when it was false
>> before, and thus my changes would only impact the code I was changing.
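>> If I understand the suggestion right, the branch would look something
>> like the sketch below (just a sketch, assuming currently_expanding_to_rtl
>> is true exactly while cfgexpand still owns the insn stream):
>>
>>   /* While expanding to RTL there is no final insn stream to emit into
>>      yet, so queue the sequence on the entry edge; when called from the
>>      post-reload splitter, emit it at the current position instead.  */
>>   if (currently_expanding_to_rtl)
>>     insert_insn_on_edge (seq,
>>                          single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>>   else
>>     emit_insn (seq);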
>>
>>> +/* Legimitize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
>>>
>>> "Legitimize"
>>>
>>> +   created to hold the result of the load.  If not NULL, PIC_REG indicates
>>> +   which register to use as PIC register, otherwise it is decided by the
>>> +   register allocator.  COMPUTE_NOW forces the PIC register to be loaded at
>>> +   the current location in the instruction stream, regardless of whether
>>> +   it was loaded previously.
>>> +
>>> +   Returns the register REG into which the PIC load is performed.  */
>>> +
>>>  rtx
>>> -legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
>>> +legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
>>> +                        bool compute_now)
>>>  {
>>>    if (GET_CODE (orig) == SYMBOL_REF
>>>        || GET_CODE (orig) == LABEL_REF)
>>> @@ -7469,7 +7491,7 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
>>>        rtx mem;
>>>
>>>        /* If this function doesn't have a pic register, create one now.  */
>>> -      require_pic_register ();
>>> +      require_pic_register (pic_reg, compute_now);
>>>
>>>        pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
>>>
>>> @@ -7520,9 +7542,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
>>>
>>>        gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
>>>
>>> -      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
>>> +      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
>>> +                                     pic_reg, compute_now);
>>>        offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
>>> -                                       base == reg ? 0 : reg);
>>> +                                       base == reg ? 0 : reg, pic_reg,
>>> +                                       compute_now);
>>>
>>>        if (CONST_INT_P (offset))
>>>          {
>>> @@ -8707,7 +8731,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
>>>      {
>>>        /* We need to find and carefully transform any SYMBOL and LABEL
>>>           references; so go back to the original address expression.  */
>>> -      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
>>> +      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
>>> +                                          false /*compute_now*/);
>>>
>>>        if (new_x != orig_x)
>>>          x = new_x;
>>> @@ -8775,7 +8800,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
>>>      {
>>>        /* We need to find and carefully transform any SYMBOL and LABEL
>>>           references; so go back to the original address expression.  */
>>> -      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
>>> +      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
>>> +                                          false /*compute_now*/);
>>>
>>>        if (new_x != orig_x)
>>>          x = new_x;
>>> @@ -18059,7 +18085,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
>>>             ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
>>>             : !SYMBOL_REF_LOCAL_P (addr)))
>>>      {
>>> -      require_pic_register ();
>>> +      require_pic_register (NULL_RTX, false /*compute_now*/);
>>>        use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
>>>      }
>>>
>>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>>> index 361a02668b0..9582c3fbb11 100644
>>> --- a/gcc/config/arm/arm.md
>>> +++ b/gcc/config/arm/arm.md
>>> @@ -6021,7 +6021,8 @@
>>>        operands[1] = legitimize_pic_address (operands[1], SImode,
>>>                                              (!can_create_pseudo_p ()
>>>                                               ? operands[0]
>>> -                                             : 0));
>>> +                                             : NULL_RTX), NULL_RTX,
>>> +                                            false /*compute_now*/);
>>>      }
>>>    "
>>> )
>>> @@ -8634,6 +8635,95 @@
>>>     (set_attr "conds" "clob")]
>>> )
>>>
>>> +;; Named patterns for stack smashing protection.
>>> +(define_insn_and_split "stack_protect_combined_set"
>>> +  [(set (match_operand:SI 0 "memory_operand" "=m")
>>> +        (unspec:SI [(match_operand:SI 1 "general_operand" "X")]
>>> +                   UNSPEC_SP_SET))
>>> +   (match_scratch:SI 2 "=r")
>>> +   (match_scratch:SI 3 "=r")]
>>> +  ""
>>> +  "#"
>>> +  "reload_completed"
>>> +  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
>>> +                                            UNSPEC_SP_SET))
>>> +              (clobber (match_dup 2))])]
>>> +  "
>>> +{
>>> +  rtx addr = XEXP (operands[1], 0);
>>>
>>> I think you want to tighten the predicate of operands[1].
>>> general_operand is very, well, general, but you're taking its 0th subtree,
>>> which it may not necessarily have.
>> I was afraid you'd ask that. Yes, indeed, a better predicate should be
>> used, but I'm not sure of all the possible cases here. However, it has to
>> be some sort of memory since this is the address of the guard. I think the
>> problem with using memory_operand was that it could be a MEM of a
>> SYMBOL_REF, which is not legitimate at this point.
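>> Roughly, the check such a predicate would have to make might look like
>> this (a hypothetical guard_addr_operand, just to illustrate the two
>> cases; the real one would be a define_predicate in predicates.md):
>>
>>   /* Accept a MEM wrapping either an address that is already legitimate
>>      or a bare SYMBOL_REF that the splitter will legitimize itself,
>>      e.g. through the GOT when flag_pic.  */
>>   static bool
>>   guard_addr_operand (rtx x, machine_mode mode)
>>   {
>>     return MEM_P (x)
>>            && (GET_CODE (XEXP (x, 0)) == SYMBOL_REF
>>                || arm_legitimate_address_p (mode, XEXP (x, 0), false));
>>   }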
>>
>>> +  if (flag_pic)
>>> +    {
>>> +      /* Force recomputing of the GOT base now.  */
>>> +      operands[1] = legitimize_pic_address (addr, SImode, operands[2],
>>> +                                            operands[3],
>>> +                                            true /*compute_now*/);
>>> +    }
>>> +  else
>>> +    {
>>> +      if (!address_operand (addr, SImode))
>>> +        operands[1] = force_const_mem (SImode, addr);
>>> +      emit_move_insn (operands[2], operands[1]);
>>> +    }
>>> +}"
>>> +)
>>> +
>>> +(define_insn "stack_protect_set"
>>> +  [(set (match_operand:SI 0 "memory_operand" "=m")
>>> +        (unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "r"))]
>>> +                   UNSPEC_SP_SET))
>>> +   (clobber (match_dup 1))]
>>>
>>> I'm not familiar with this idiom, don't you instead want to mark the
>>> constraint on operand 1 as earlyclobber (or read-write)?
>> According to the documentation, earlyclobber applies to operands being
>> written to, and this is an input-only operand. As the docs say, an
>> earlyclobber indicates to the compiler that it cannot reuse a register
>> used for an input operand to put the output operand into. Here I'm
>> trying to express instead that the operand needs to be saved/restored
>> if needed after this insn because it gets written to. This is only
>> needed in Thumb-1 because stack_protect_combined_set and
>> stack_protect_combined_test already use all available registers, so I
>> cannot ask for an extra clobber register.
>>
>>> +  ""
>>> +  "ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,0"
>>> +  [(set_attr "length" "12")
>>> +   (set_attr "type" "multiple")])
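>> To make the clobber semantics concrete, the same situation in user-level
>> inline asm would need the address register tied to an output, since the
>> sequence overwrites it. A minimal sketch (a hypothetical set_canary
>> helper, Arm-only, mirroring the template above):
>>
>>   static inline void
>>   set_canary (unsigned long *slot, const unsigned long *guard)
>>   {
>>     unsigned long tmp = (unsigned long) guard;
>>     /* Load the guard value via the address in TMP, store it to the
>>        stack slot, then scrub TMP so the value does not linger in a
>>        register.  */
>>     __asm__ volatile ("ldr %1, [%1]\n\t"
>>                       "str %1, %0\n\t"
>>                       "mov %1, #0"
>>                       : "=m" (*slot), "+r" (tmp));
>>   }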
>>
>>> +(define_insn_and_split "stack_protect_combined_test"
>>> +  [(set (pc)
>>> +        (if_then_else
>>> +         (eq (match_operand:SI 0 "memory_operand" "m")
>>> +             (unspec:SI [(match_operand:SI 1 "general_operand" "X")]
>>> +                        UNSPEC_SP_TEST))
>>> +         (label_ref (match_operand 2))
>>> +         (pc)))
>>> +   (match_scratch:SI 3 "=r")
>>> +   (match_scratch:SI 4 "=r")]
>>> +  ""
>>> +  "#"
>>> +  "reload_completed"
>>> +  [(const_int 0)]
>>> +{
>>> +  rtx eq, addr;
>>> +
>>> +  addr = XEXP (operands[1], 0);
>>>
>>> Same concern as with stack_protect_combined_set about operand 1's
>>> predicate being too general.
>> Will try to create a new predicate, thanks for raising this.
>>
>>> +  if (flag_pic)
>>> +    {
>>> +      /* Force recomputing of the GOT base now.  */
>>> +      operands[1] = legitimize_pic_address (addr, SImode, operands[3],
>>> +                                            operands[4],
>>> +                                            true /*compute_now*/);
>>> +    }
>>> +  else
>>> +    {
>>> +      if (!address_operand (addr, SImode))
>>> +        operands[1] = force_const_mem (SImode, addr);
>>> +      emit_move_insn (operands[3], operands[1]);
>>> +    }
>>> +  emit_insn (gen_stack_protect_test (operands[4], operands[0], operands[3]));
>>> +  eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
>>> +  emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, operands[2]));
>>> +  DONE;
>>> +})
>>> +
>>> +(define_insn "stack_protect_test"
>>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>>> +        (unspec:SI [(match_operand:SI 1 "memory_operand" "m")
>>> +                    (mem:SI (match_operand:SI 2 "register_operand" "r"))]
>>> +                   UNSPEC_SP_TEST))
>>> +   (clobber (match_dup 2))]
>>>
>>> Same concern on clobbering and constraints.
>> Same answer :-)
>>
>> Best regards,
>>
>> Thomas
>>
>>> +  ""
>>> +  "ldr\t%0, [%2]\;ldr\t%2, %1\;eor\t%0, %2, %0"
>>> +  [(set_attr "length" "12")
>>> +   (set_attr "type" "multiple")])
>>> +
>>>  (define_expand "casesi"
>>>    [(match_operand:SI 0 "s_register_operand" "")  ; index to jump on
>>>     (match_operand:SI 1 "const_int_operand" "")   ; lower bound
>>> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
>>> index b05f85e10e4..429c937df60 100644
>>> --- a/gcc/config/arm/unspecs.md
>>> +++ b/gcc/config/arm/unspecs.md
>>> @@ -86,6 +86,9 @@
>>>    UNSPEC_PROBE_STACK    ; Probe stack memory reference
>>>    UNSPEC_NONSECURE_MEM  ; Represent non-secure memory in ARMv8-M with
>>>                          ; security extension
>>> +  UNSPEC_SP_SET         ; Represent the setting of stack protector's canary
>>> +  UNSPEC_SP_TEST        ; Represent the testing of stack protector's canary
>>> +                        ; against the guard.
>>>  ])
>>>
>>>  (define_c_enum "unspec" [
>>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>>> index 734bc765387..f760da495bf 100644
>>> --- a/gcc/doc/md.texi
>>> +++ b/gcc/doc/md.texi
>>> @@ -7362,22 +7362,61 @@ builtins.
>>>  The get/set patterns have a single output/input operand respectively,
>>>  with @var{mode} intended to be @code{Pmode}.
>>>
>>> +@cindex @code{stack_protect_combined_set} instruction pattern
>>> +@item @samp{stack_protect_combined_set}
>>> +This pattern, if defined, moves a @code{ptr_mode} value from an address
>>> +whose declaration RTX is given in operand 1 to the memory in operand 0
>>> +without leaving the value in a register afterward.  If several
>>> +instructions are needed by the target to perform the operation (e.g. to
>>> +load the address from a GOT entry then load the @code{ptr_mode} value
>>> +and finally store it), it is the backend's responsibility to ensure no
>>> +intermediate result gets spilled.  This is to avoid leaking the value
>>> +some place that an attacker might use to rewrite the stack guard slot
>>> +after having clobbered it.
>>> +
>>> +If this pattern is not defined, then the address declaration is
>>> +expanded first in the standard way and a @code{stack_protect_set}
>>> +pattern is then generated to move the value from that address to the
>>> +address in operand 0.
>>> +
>>>  @cindex @code{stack_protect_set} instruction pattern
>>>  @item @samp{stack_protect_set}
>>> -This pattern, if defined, moves a @code{ptr_mode} value from the memory
>>> -in operand 1 to the memory in operand 0 without leaving the value in
>>> -a register afterward.  This is to avoid leaking the value some place
>>> -that an attacker might use to rewrite the stack guard slot after
>>> -having clobbered it.
>>> +This pattern, if defined, moves a @code{ptr_mode} value from the valid
>>> +memory location in operand 1 to the memory in operand 0 without leaving
>>> +the value in a register afterward.  This is to avoid leaking the value
>>> +some place that an attacker might use to rewrite the stack guard slot
>>> +after having clobbered it.
>>> +
>>> +Note: on targets whose addressing modes do not allow loading directly
>>> +from the stack guard's address, the address is expanded in a standard
>>> +way first, which could cause some spills.
>>>
>>>  If this pattern is not defined, then a plain move pattern is generated.
>>>
>>> +@cindex @code{stack_protect_combined_test} instruction pattern
>>> +@item @samp{stack_protect_combined_test}
>>> +This pattern, if defined, compares a @code{ptr_mode} value from an
>>> +address whose declaration RTX is given in operand 1 with the memory in
>>> +operand 0 without leaving the value in a register afterward and
>>> +branches to operand 2 if the values were equal.  If several
>>> +instructions are needed by the target to perform the operation (e.g. to
>>> +load the address from a GOT entry then load the @code{ptr_mode} value
>>> +and finally compare it), it is the backend's responsibility to ensure no
>>> +intermediate result gets spilled.  This is to avoid leaking the value
>>> +some place that an attacker might use to rewrite the stack guard slot
>>> +after having clobbered it.
>>> +
>>> +If this pattern is not defined, then the address declaration is
>>> +expanded first in the standard way and a @code{stack_protect_test}
>>> +pattern is then generated to compare the value from that address to the
>>> +value at the memory in operand 0.
>>> +
>>>  @cindex @code{stack_protect_test} instruction pattern
>>>  @item @samp{stack_protect_test}
>>>  This pattern, if defined, compares a @code{ptr_mode} value from the
>>> -memory in operand 1 with the memory in operand 0 without leaving the
>>> -value in a register afterward and branches to operand 2 if the values
>>> -were equal.
>>> +valid memory location in operand 1 with the memory in operand 0 without
>>> +leaving the value in a register afterward and branches to operand 2 if
>>> +the values were equal.
>>>
>>>  If this pattern is not defined, then a plain compare pattern and
>>>  conditional branch pattern is used.
>>> diff --git a/gcc/function.c b/gcc/function.c
>>> index 142cdaec2ce..bf96dd6feae 100644
>>> --- a/gcc/function.c
>>> +++ b/gcc/function.c
>>> @@ -4886,20 +4886,33 @@ stack_protect_epilogue (void)
>>>    rtx_code_label *label = gen_label_rtx ();
>>>    rtx x, y;
>>>    rtx_insn *seq;
>>> +  struct expand_operand ops[3];
>>>
>>>    x = expand_normal (crtl->stack_protect_guard);
>>> -  if (guard_decl)
>>> -    y = expand_normal (guard_decl);
>>> -  else
>>> -    y = const0_rtx;
>>> -
>>> -  /* Allow the target to compare Y with X without leaking either into
>>> -     a register.  */
>>> -  if (targetm.have_stack_protect_test ()
>>> -      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
>>> -    emit_insn (seq);
>>> -  else
>>> -    emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
>>> +  create_fixed_operand (&ops[0], x);
>>> +  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
>>> +  create_fixed_operand (&ops[2], label);
>>> +  /* Allow the target to compute the address of Y and compare it with X
>>> +     without leaking Y into a register.  This combined address + compare
>>> +     pattern allows the target to prevent spilling of any intermediate
>>> +     results by splitting it after register allocation.  */
>>> +  if (!maybe_expand_jump_insn (targetm.code_for_stack_protect_combined_test,
>>> +                               3, ops))
>>> +    {
>>> +      if (guard_decl)
>>> +        y = expand_normal (guard_decl);
>>> +      else
>>> +        y = const0_rtx;
>>> +
>>> +      /* Allow the target to compare Y with X without leaking either into
>>> +         a register.  */
>>> +      if (targetm.have_stack_protect_test ()
>>> +          && ((seq = targetm.gen_stack_protect_test (x, y, label))
>>> +              != NULL_RTX))
>>> +        emit_insn (seq);
>>> +      else
>>> +        emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
>>> +    }
>>>
>>>    /* The noreturn predictor has been moved to the tree level.  The rtl-level
>>>       predictors estimate this branch about 20%, which isn't enough to get
>>> diff --git a/gcc/target-insns.def b/gcc/target-insns.def
>>> index 9a552c3d11c..d39889b3522 100644
>>> --- a/gcc/target-insns.def
>>> +++ b/gcc/target-insns.def
>>> @@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
>>>  DEF_TARGET_INSN (simple_return, (void))
>>>  DEF_TARGET_INSN (split_stack_prologue, (void))
>>>  DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
>>> +DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
>>>  DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
>>> +DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
>>>  DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
>>>  DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
>>>  DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
>>> diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
>>> new file mode 100644
>>> index 00000000000..4143a861f7c
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/arm/pr85434.c
>>> @@ -0,0 +1,200 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target fstack_protector }*/
>>> +/* { dg-require-effective-target fpic }*/
>>> +/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
>>> +
>>> +#include <stddef.h>
>>> +#include <stdint.h>
>>> +
>>> +
>>> +static const unsigned char base64_enc_map[64] =
>>> +{
>>> +    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
>>> +    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
>>> +    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
>>> +    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
>>> +    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
>>> +    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
>>> +    '8', '9', '+', '/'
>>> +};
>>> +
>>> +#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
>>> +
>>> +
>>> +void doSmth(void *x);
>>> +
>>> +#include <string.h>
>>> +
>>> +
>>> +void check(int n) {
>>> +
>>> +  if (!(n % 2 && n % 3 && n % 5)) {
>>> +    __asm__ ( "add r8, r8, #1;" );
>>> +  }
>>> +}
>>> +
>>> +uint32_t test(
>>> +  uint32_t a1,
>>> +  uint32_t a2,
>>> +  size_t a3,
>>> +  size_t a4,
>>> +  size_t a5,
>>> +  size_t a6)
>>> +{
>>> +  uint32_t nResult = 0;
>>> +  uint8_t* h = 0L;
>>> +  uint8_t X[128];
>>> +  uint8_t mac[64];
>>> +  size_t len;
>>> +
>>> +  doSmth(&a1);
>>> +  doSmth(&a2);
>>> +  doSmth(&a3);
>>> +  doSmth(&a4);
>>> +  doSmth(&a5);
>>> +  doSmth(&a6);
>>> +
>>> +  if (a1 && a2 && a3 && a4 && a5 && a6) {
>>> +    nResult = 1;
>>> +    h = (void*)X;
>>> +    len = sizeof(X);
>>> +    memset(X, a2, len);
>>> +    len -= 64;
>>> +    memcpy(mac, X, len);
>>> +    *(h + len) = a6;
>>> +
>>> +    {
>>> +
>>> +
>>> +      unsigned char *dst = X;
>>> +      size_t dlen = a3;
>>> +      size_t *olen = &a6;
>>> +      const unsigned char *src = mac;
>>> +      size_t slen = a4;
>>> +      size_t i, n;
>>> +      int C1, C2, C3;
>>> +      unsigned char *p;
>>> +
>>> +      if( slen == 0 )
>>> +      {
>>> +          *olen = 0;
>>> +          return( 0 );
>>> +      }
>>> +
>>> +      n = slen / 3 + ( slen % 3 != 0 );
>>> +
>>> +      if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
>>> +      {
>>> +          *olen = BASE64_SIZE_T_MAX;
>>> +          return( 0 );
>>> +      }
>>> +
>>> +      n *= 4;
>>> +
>>> +      if( ( dlen < n + 1 ) || ( NULL == dst ) )
>>> +      {
>>> +          *olen = n + 1;
>>> +          return( 0 );
>>> +      }
>>> +
>>> +      n = ( slen / 3 ) * 3;
>>> +
>>> +      for( i = 0, p = dst; i < n; i += 3 )
>>> +      {
>>> +          C1 = *src++;
>>> +          C2 = *src++;
>>> +          C3 = *src++;
>>> +
>>> +          check(i);
>>> +
>>> +          *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
>>> +          *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
>>> +          *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
>>> +          *p++ = base64_enc_map[C3 & 0x3F];
>>> +      }
>>> +
>>> +      if( i < slen )
>>> +      {
>>> +          C1 = *src++;
>>> +          C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
>>> +
>>> +          *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
>>> +          *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
>>> +
>>> +          if( ( i + 1 ) < slen )
>>> +              *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
>>> +          else *p++ = '=';
>>> +
>>> +          *p++ = '=';
>>> +      }
>>> +
>>> +      *olen = p - dst;
>>> +      *p = 0;
>>> +
>>> +    }
>>> +
>>> +    __asm__ ("mov r8, %0;" : "=r" ( nResult ));
>>> +  }
>>> +  else
>>> +  {
>>> +    nResult = 2;
>>> +  }
>>> +
>>> +  doSmth(X);
>>> +  doSmth(mac);
>>> +
>>> +
>>> +  return nResult;
>>> +}
>>> +
>>> +/* The pattern below catches sequences of instructions that were generated
>>> +   for ARM and Thumb-2 before the fix for this PR.  They are of the form:
>>> +
>>> +   ldr rX, <guard's address on the stack>
>>> +   <optional non ldr/cmp instructions>
>>> +   ldr rY, <canary value on the stack>
>>> +   ldr rZ, [rX]
>>> +   <optional non ldr/cmp instructions>
>>> +   cmp rY, rZ
>>> +   <optional non cmp instructions>
>>> +   bl __stack_chk_fail
>>> +
>>> +   Ideally the optional blocks would check that the various rX, rY and rZ
>>> +   registers are not set, but this is not possible because backreferences
>>> +   are illegal in lookahead expressions in Tcl, which prevents the only
>>> +   construct that allows negating a regexp from using the backreferences
>>> +   to those registers.  Instead we go for the heuristic of allowing
>>> +   non-ldr/cmp instructions, with the assumptions that (i) those are not
>>> +   part of the stack protector sequences and (ii) they would only be
>>> +   scheduled here if they don't conflict with registers used by the stack
>>> +   protector.
>>> +
>>> +   Note on the regexp logic:
>>> +   Allowing non-X instructions (where X is ldr or cmp) is done by looking
>>> +   for some non-newline spaces, followed by something which is not X,
>>> +   followed by an alphanumeric character, followed by anything but a
>>> +   newline, and ended by a newline, the whole thing repeated an
>>> +   undetermined number of times.  The alphanumeric character is there to
>>> +   force the match of the negative lookahead for X to only happen after
>>> +   all the initial spaces and thus to check the mnemonic.  This prevents
>>> +   it from matching one of the initial spaces.  */
>>> +/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
>>> +
>>> +/* Likewise for Thumb-1 sequences of instructions prior to the fix for
>>> +   this PR, which had the form:
>>> +
>>> +   ldr rS, <guard's address on the stack>
>>> +   <optional non ldr/cmp instructions>
>>> +   ldr rT, <PC-relative offset>
>>> +   <optional non ldr/cmp instructions>
>>> +   ldr rX, [rS, rT]
>>> +   <optional non ldr/cmp instructions>
>>> +   ldr rY, <canary value on the stack>
>>> +   ldr rZ, [rX]
>>> +   <optional non ldr/cmp instructions>
>>> +   cmp rY, rZ
>>> +   <optional non cmp instructions>
>>> +   bl __stack_chk_fail
>>> +
>>> +   Note on the regexp logic:
>>> +   The PC-relative offset is checked by looking for a source operand that
>>> +   does not contain [ or ].  */
>>> +/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */