Thanks Kyrill. Committed the attached patch. Best regards, Thomas On Wed, 21 Nov 2018 at 16:06, Kyrill Tkachov wrote: > > Hi Thomas, > > Sorry for the delay. > > On 16/11/18 14:56, Thomas Preudhomme wrote: > > Ping? > > > > Best regards, > > > > Thomas > > > > On Sat, 10 Nov 2018 at 15:07, Thomas Preudhomme > > wrote: > >> Thanks Kyrill. > >> > >> Updated patch in attachment. Best regards, > >> > >> Thomas > >> On Thu, 8 Nov 2018 at 15:53, Kyrill Tkachov wrote: > >>> Hi Thomas, > >>> > >>> On 08/11/18 09:52, Thomas Preudhomme wrote: > >>>> Ping? > >>>> > >>>> Best regards, > >>>> > >>>> Thomas > >>>> > >>>> On Thu, 1 Nov 2018 at 16:03, Thomas Preudhomme > >>>> wrote: > >>>>> Ping? > >>>>> > >>>>> Best regards, > >>>>> > >>>>> Thomas > >>>>> On Fri, 26 Oct 2018 at 22:41, Thomas Preudhomme > >>>>> wrote: > >>>>>> Hi, > >>>>>> > >>>>>> Please find updated patch to fix PR85434: spilling of stack protector > >>>>>> guard's address on ARM. Quite a few changes have been made to the ARM > >>>>>> part since last round of review so I think it makes more sense to > >>>>>> review it anew. Ran bootstrap + regression testsuite + glibc build + > >>>>>> glibc regression testsuite for Arm and Thumb-2 and bootstrap + > >>>>>> regression testsuite for Thumb-1. GCC's regression testsuite was run > >>>>>> in 3 configurations in all those cases: > >>>>>> > >>>>>> - default configuration (no RUNTESTFLAGS) > >>>>>> - with -fstack-protector-all > >>>>>> - with -fPIC -fstack-protector-all (to exercise both codepath in stack > >>>>>> protector's split code) > >>>>>> > >>>>>> None of this show any regression beyond some new scan fail with > >>>>>> -fstack-protector-all or -fPIC due to unexpected code sequence for the > >>>>>> testcases concerned and some guality swing due to less optimization > >>>>>> with new stack protector on. > >>>>>> > >>>>>> Patch description and ChangeLog below. > >>>>>> > >>>>>> In case of high register pressure in PIC mode, address of the stack > >>>>>> protector's guard can be spilled on ARM targets as shown in PR85434, > >>>>>> thus allowing an attacker to control what the canary would be compared > >>>>>> against. ARM does lack stack_protect_set and stack_protect_test insn > >>>>>> patterns, defining them does not help as the address is expanded > >>>>>> regularly and the patterns only deal with the copy and test of the > >>>>>> guard with the canary. > >>>>>> > >>>>>> This problem does not occur for x86 targets because the PIC access and > >>>>>> the test can be done in the same instruction. Aarch64 is exempt too > >>>>>> because PIC access insn pattern are mov of UNSPEC which prevents it from > >>>>>> the second access in the epilogue being CSEd in cse_local pass with the > >>>>>> first access in the prologue. > >>>>>> > >>>>>> The approach followed here is to create new "combined" set and test > >>>>>> standard pattern names that take the unexpanded guard and do the set or > >>>>>> test. This allows the target to use an opaque pattern (eg. using UNSPEC) > >>>>>> to hide the individual instructions being generated to the compiler and > >>>>>> split the pattern into generic load, compare and branch instruction > >>>>>> after register allocator, therefore avoiding any spilling. This is here > >>>>>> implemented for the ARM targets. For targets not implementing these new > >>>>>> standard pattern names, the existing stack_protect_set and > >>>>>> stack_protect_test pattern names are used. > >>>>>> > >>>>>> To be able to split PIC access after register allocation, the functions > >>>>>> had to be augmented to force a new PIC register load and to control > >>>>>> which register it loads into. This is because sharing the PIC register > >>>>>> between prologue and epilogue could lead to spilling due to CSE again > >>>>>> which an attacker could use to control what the canary gets compared > >>>>>> against. > >>>>>> > >>>>>> ChangeLog entries are as follows: > >>>>>> > >>>>>> *** gcc/ChangeLog *** > >>>>>> > >>>>>> 2018-10-26 Thomas Preud'homme > >>>>>> > >>>>>> * target-insns.def (stack_protect_combined_set): Define new standard > >>>>>> pattern name. > >>>>>> (stack_protect_combined_test): Likewise. > >>>>>> * cfgexpand.c (stack_protect_prologue): Try new > >>>>>> stack_protect_combined_set pattern first. > >>>>>> * function.c (stack_protect_epilogue): Try new > >>>>>> stack_protect_combined_test pattern first. > >>>>>> * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now > >>>>>> parameters to control which register to use as PIC register and force > >>>>>> reloading PIC register respectively. Insert in the stream of insns if > >>>>>> possible. > >>>>>> (legitimize_pic_address): Expose above new parameters in prototype and > >>>>>> adapt recursive calls accordingly. Use pic_reg if non null instead of > >>>>>> cached one. > >>>>>> (arm_load_pic_register): Add pic_reg parameter and use it if non null. > >>>>>> (arm_legitimize_address): Adapt to new legitimize_pic_address > >>>>>> prototype. > >>>>>> (thumb_legitimize_address): Likewise. > >>>>>> (arm_emit_call_insn): Adapt to require_pic_register prototype change. > >>>>>> (arm_expand_prologue): Adapt to arm_load_pic_register prototype change. > >>>>>> (thumb1_expand_prologue): Likewise. > >>>>>> * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype > >>>>>> change. > >>>>>> (arm_load_pic_register): Likewise. > >>>>>> * config/arm/predicated.md (guard_addr_operand): New predicate. > >>>>>> (guard_operand): New predicate. > >>>>>> * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address > >>>>>> prototype change. > >>>>>> (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue > >>>>>> prototype change. > >>>>>> (stack_protect_combined_set): New expander.. > >>>>>> (stack_protect_combined_set_insn): New insn_and_split pattern. > >>>>>> (stack_protect_set_insn): New insn pattern. > >>>>>> (stack_protect_combined_test): New expander. > >>>>>> (stack_protect_combined_test_insn): New insn_and_split pattern. > >>>>>> (arm_stack_protect_test_insn): New insn pattern. > >>>>>> * config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern. > >>>>>> * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec. > >>>>>> (UNSPEC_SP_TEST): Likewise. > >>>>>> * doc/md.texi (stack_protect_combined_set): Document new standard > >>>>>> pattern name. > >>>>>> (stack_protect_set): Clarify that the operand for guard's address is > >>>>>> legal. > >>>>>> (stack_protect_combined_test): Document new standard pattern name. > >>>>>> (stack_protect_test): Clarify that the operand for guard's address is > >>>>>> legal. > >>>>>> > >>>>>> *** gcc/testsuite/ChangeLog *** > >>>>>> > >>>>>> 2018-07-05 Thomas Preud'homme > >>>>>> > >>>>>> * gcc.target/arm/pr85434.c: New test. > >>>>>> > >>>>>> Is this ok for trunk? > >>>>>> > >>>>>> Best regards, > >>>>>> > >>>>>> Thomas > >>>>>> On Thu, 25 Oct 2018 at 15:54, Thomas Preudhomme > >>>>>> wrote: > >>>>>>> Good thing I did, found a missing earlyclobber in the process. > >>>>>>> Rerunning all tests again. > >>>>>>> > >>>>>>> Best regards, > >>>>>>> > >>>>>>> Thomas > >>>>>>> On Wed, 24 Oct 2018 at 10:13, Thomas Preudhomme > >>>>>>> wrote: > >>>>>>>> Please hold on for the reviews, found a small improvement that could > >>>>>>>> be done. Am testing it right now, should have something by tonight or > >>>>>>>> tomorrow. > >>>>>>>> > >>>>>>>> Best regards, > >>>>>>>> > >>>>>>>> Thomas > >>>>>>>> On Tue, 23 Oct 2018 at 13:35, Thomas Preudhomme > >>>>>>>> wrote: > >>>>>>>>> [Removing Jeff Law since middle end code hasn't changed] > >>>>>>>>> > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> Given how memory operand are reloaded even with an X constraint, I've > >>>>>>>>> reworked the patch for the combined set and combined test instruction > >>>>>>>>> ot keep the mem out of the match_operand and used an expander to > >>>>>>>>> generate the right instruction pattern. I've also fixed some > >>>>>>>>> longstanding issues with the patch when flag_pic is true and with > >>>>>>>>> constraints for Thumb-1 that I hadn't noticed before due to using > >>>>>>>>> dg-cmp-results in conjunction with test_summary which does not show > >>>>>>>>> NA->FAIL (see [1]). > >>>>>>>>> > >>>>>>>>> All in all, I think the Arm code would do with a fresh review rather > >>>>>>>>> than looking at the changes since last posted version. (unchanged) > >>>>>>>>> ChangeLog entries are as follows: > >>>>>>>>> > >>>>>>>>> *** gcc/ChangeLog *** > >>>>>>>>> > >>>>>>>>> 2018-08-09 Thomas Preud'homme > >>>>>>>>> > >>>>>>>>> * target-insns.def (stack_protect_combined_set): Define new standard > >>>>>>>>> pattern name. > >>>>>>>>> (stack_protect_combined_test): Likewise. > >>>>>>>>> * cfgexpand.c (stack_protect_prologue): Try new > >>>>>>>>> stack_protect_combined_set pattern first. > >>>>>>>>> * function.c (stack_protect_epilogue): Try new > >>>>>>>>> stack_protect_combined_test pattern first. > >>>>>>>>> * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now > >>>>>>>>> parameters to control which register to use as PIC register and force > >>>>>>>>> reloading PIC register respectively. Insert in the stream of insns if > >>>>>>>>> possible. > >>>>>>>>> (legitimize_pic_address): Expose above new parameters in prototype and > >>>>>>>>> adapt recursive calls accordingly. Use pic_reg if non null instead of > >>>>>>>>> cached one. > >>>>>>>>> (arm_load_pic_register): Add pic_reg parameter and use it if non null. > >>>>>>>>> (arm_legitimize_address): Adapt to new legitimize_pic_address > >>>>>>>>> prototype. > >>>>>>>>> (thumb_legitimize_address): Likewise. > >>>>>>>>> (arm_emit_call_insn): Adapt to require_pic_register prototype change. > >>>>>>>>> (arm_expand_prologue): Adapt to arm_load_pic_register prototype change. > >>>>>>>>> (thumb1_expand_prologue): Likewise. > >>>>>>>>> * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype > >>>>>>>>> change. > >>>>>>>>> (arm_load_pic_register): Likewise. > >>>>>>>>> * config/arm/predicated.md (guard_addr_operand): New predicate. > >>>>>>>>> (guard_operand): New predicate. > >>>>>>>>> * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address > >>>>>>>>> prototype change. > >>>>>>>>> (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue > >>>>>>>>> prototype change. > >>>>>>>>> (stack_protect_combined_set): New expander.. > >>>>>>>>> (stack_protect_combined_set_insn): New insn_and_split pattern. > >>>>>>>>> (stack_protect_set_insn): New insn pattern. > >>>>>>>>> (stack_protect_combined_test): New expander. > >>>>>>>>> (stack_protect_combined_test_insn): New insn_and_split pattern. > >>>>>>>>> (stack_protect_test_insn): New insn pattern. > >>>>>>>>> * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec. > >>>>>>>>> (UNSPEC_SP_TEST): Likewise. > >>>>>>>>> * doc/md.texi (stack_protect_combined_set): Document new standard > >>>>>>>>> pattern name. > >>>>>>>>> (stack_protect_set): Clarify that the operand for guard's address is > >>>>>>>>> legal. > >>>>>>>>> (stack_protect_combined_test): Document new standard pattern name. > >>>>>>>>> (stack_protect_test): Clarify that the operand for guard's address is > >>>>>>>>> legal. > >>>>>>>>> > >>>>>>>>> *** gcc/testsuite/ChangeLog *** > >>>>>>>>> > >>>>>>>>> 2018-07-05 Thomas Preud'homme > >>>>>>>>> > >>>>>>>>> * gcc.target/arm/pr85434.c: New test. > >>>>>>>>> > >>>>>>>>> Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2 > >>>>>>>>> with (i) default flags, (ii) an extra -fstack-protect-all and (iii) > >>>>>>>>> -fPIC -fstack-protect-all. A glibc build and testsuite run was also > >>>>>>>>> performed for Arm and Thumb-2. Default flags show no regression and > >>>>>>>>> the other runs have some expected scan-assembler failing (due to stack > >>>>>>>>> protector or fPIC code sequence), as well as guality fail (due to less > >>>>>>>>> optimized code with the new stack protector code) and some execution > >>>>>>>>> failures in sibcall-9 and sibcall-10 under -fPIC -fstack-protector-all > >>>>>>>>> due to the PIC sequence for the global variable making the frame > >>>>>>>>> layout different for the 2 functions (these become PASS if making the > >>>>>>>>> global variable static). > >>>>>>>>> > >>>>>>>>> Is this ok for trunk? > >>>>>>>>> > >>>>>>>>> Best regards, > >>>>>>>>> > >>>>>>>>> Thomas > >>>>>>>>> > >>>>>>>>> [1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Tue, 25 Sep 2018 at 17:10, Kyrill Tkachov > >>>>>>>>> wrote: > >>>>>>>>>> Hi Thomas, > >>>>>>>>>> > >>>>>>>>>> On 29/08/18 10:51, Thomas Preudhomme wrote: > >>>>>>>>>>> Resend hopefully without HTML this time. > >>>>>>>>>>> > >>>>>>>>>>> On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme > >>>>>>>>>>> wrote: > >>>>>>>>>>>> Hi, > >>>>>>>>>>>> > >>>>>>>>>>>> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> In case of high register pressure in PIC mode, address of the stack > >>>>>>>>>>>> protector's guard can be spilled on ARM targets as shown in PR85434, > >>>>>>>>>>>> thus allowing an attacker to control what the canary would be compared > >>>>>>>>>>>> against. ARM does lack stack_protect_set and stack_protect_test insn > >>>>>>>>>>>> patterns, defining them does not help as the address is expanded > >>>>>>>>>>>> regularly and the patterns only deal with the copy and test of the > >>>>>>>>>>>> guard with the canary. > >>>>>>>>>>>> > >>>>>>>>>>>> This problem does not occur for x86 targets because the PIC access and > >>>>>>>>>>>> the test can be done in the same instruction. Aarch64 is exempt too > >>>>>>>>>>>> because PIC access insn pattern are mov of UNSPEC which prevents it from > >>>>>>>>>>>> the second access in the epilogue being CSEd in cse_local pass with the > >>>>>>>>>>>> first access in the prologue. > >>>>>>>>>>>> > >>>>>>>>>>>> The approach followed here is to create new "combined" set and test > >>>>>>>>>>>> standard pattern names that take the unexpanded guard and do the set or > >>>>>>>>>>>> test. This allows the target to use an opaque pattern (eg. using UNSPEC) > >>>>>>>>>>>> to hide the individual instructions being generated to the compiler and > >>>>>>>>>>>> split the pattern into generic load, compare and branch instruction > >>>>>>>>>>>> after register allocator, therefore avoiding any spilling. This is here > >>>>>>>>>>>> implemented for the ARM targets. For targets not implementing these new > >>>>>>>>>>>> standard pattern names, the existing stack_protect_set and > >>>>>>>>>>>> stack_protect_test pattern names are used. > >>>>>>>>>>>> > >>>>>>>>>>>> To be able to split PIC access after register allocation, the functions > >>>>>>>>>>>> had to be augmented to force a new PIC register load and to control > >>>>>>>>>>>> which register it loads into. This is because sharing the PIC register > >>>>>>>>>>>> between prologue and epilogue could lead to spilling due to CSE again > >>>>>>>>>>>> which an attacker could use to control what the canary gets compared > >>>>>>>>>>>> against. > >>>>>>>>>>>> > >>>>>>>>>>>> ChangeLog entries are as follows: > >>>>>>>>>>>> > >>>>>>>>>>>> *** gcc/ChangeLog *** > >>>>>>>>>>>> > >>>>>>>>>>>> 2018-08-09 Thomas Preud'homme > >>>>>>>>>>>> > >>>>>>>>>>>> * target-insns.def (stack_protect_combined_set): Define new standard > >>>>>>>>>>>> pattern name. > >>>>>>>>>>>> (stack_protect_combined_test): Likewise. > >>>>>>>>>>>> * cfgexpand.c (stack_protect_prologue): Try new > >>>>>>>>>>>> stack_protect_combined_set pattern first. > >>>>>>>>>>>> * function.c (stack_protect_epilogue): Try new > >>>>>>>>>>>> stack_protect_combined_test pattern first. > >>>>>>>>>>>> * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now > >>>>>>>>>>>> parameters to control which register to use as PIC register and force > >>>>>>>>>>>> reloading PIC register respectively. Insert in the stream of insns if > >>>>>>>>>>>> possible. > >>>>>>>>>>>> (legitimize_pic_address): Expose above new parameters in prototype and > >>>>>>>>>>>> adapt recursive calls accordingly. > >>>>>>>>>>>> (arm_legitimize_address): Adapt to new legitimize_pic_address > >>>>>>>>>>>> prototype. > >>>>>>>>>>>> (thumb_legitimize_address): Likewise. > >>>>>>>>>>>> (arm_emit_call_insn): Adapt to new require_pic_register prototype. > >>>>>>>>>>>> * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype > >>>>>>>>>>>> change. > >>>>>>>>>>>> * config/arm/predicated.md (guard_operand): New predicate. > >>>>>>>>>> Typo, predicates.md is the filename. > >>>>>>>>>> > >>>>>>>>>> Looks ok to me otherwise. > >>>>>>>>>> Thank you for your patience. > >>>>>>>>>> > >>>>>>>>>> Kyrill > >>>>>>>>>> > >>>>>>>>>>>> * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address > >>>>>>>>>>>> prototype change. > >>>>>>>>>>>> (stack_protect_combined_set): New insn_and_split pattern. > >>>>>>>>>>>> (stack_protect_set): New insn pattern. > >>>>>>>>>>>> (stack_protect_combined_test): New insn_and_split pattern. > >>>>>>>>>>>> (stack_protect_test): New insn pattern. > >>>>>>>>>>>> * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec. > >>>>>>>>>>>> (UNSPEC_SP_TEST): Likewise. > >>>>>>>>>>>> * doc/md.texi (stack_protect_combined_set): Document new standard > >>>>>>>>>>>> pattern name. > >>>>>>>>>>>> (stack_protect_set): Clarify that the operand for guard's address is > >>>>>>>>>>>> legal. > >>>>>>>>>>>> (stack_protect_combined_test): Document new standard pattern name. > >>>>>>>>>>>> (stack_protect_test): Clarify that the operand for guard's address is > >>>>>>>>>>>> legal. > >>>>>>>>>>>> > >>>>>>>>>>>> *** gcc/testsuite/ChangeLog *** > >>>>>>>>>>>> > >>>>>>>>>>>> 2018-07-05 Thomas Preud'homme > >>>>>>>>>>>> > >>>>>>>>>>>> * gcc.target/arm/pr85434.c: New test. > >>>>>>>>>>>> Testing: > >>>>>>>>>>>> > >>>>>>>>>>>> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version > >>>>>>>>>>>> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version > >>>>>>>>>>>> cross ARM Linux: build + testsuite -> no regression > >>>>>>>>>>>> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression > >>>>>>>>>>>> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression > >>>>>>>>>>>> Aarch64: bootstrap + testsuite + glibc build + glibc test-> no regression > >>>>>>>>>>>> > >>>>>>>>>>>> Is this ok for trunk? > >>>>>>>>>>>> > >>>>>>>>>>>> Best regards, > >>>>>>>>>>>> > >>>>>>>>>>>> Thomas > >>> > >>> @@ -21998,7 +22039,7 @@ arm_expand_prologue (void) > >>> mask &= THUMB2_WORK_REGS; > >>> if (!IS_NESTED (func_type)) > >>> mask |= (1 << IP_REGNUM); > >>> - arm_load_pic_register (mask); > >>> + arm_load_pic_register (mask, 0); > >>> > >>> > >>> > >>> Please use NULL_RTX rather than 0 here and in the other occurrences in the patch. > >>> At a glance the changes look ok, but I'll have a deeper look later. > >>> > >>> Thanks, > >>> Kyrill > > diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c > index 35ca276e4ad..c8d0374f8ae 100644 > --- a/gcc/cfgexpand.c > +++ b/gcc/cfgexpand.c > @@ -6131,6 +6131,23 @@ stack_protect_prologue (void) > rtx x, y; > > x = expand_normal (crtl->stack_protect_guard); > + > + if (targetm.have_stack_protect_combined_set () && guard_decl) > + { > + gcc_assert (DECL_P (guard_decl)); > + y = DECL_RTL (guard_decl); > + > + /* Allow the target to compute address of Y and copy it to X without > + leaking Y into a register. This combined address + copy pattern > + allows the target to prevent spilling of any intermediate results by > + splitting it after register allocator. */ > + if (rtx_insn *insn = targetm.gen_stack_protect_combined_set (x, y)) > + { > + emit_insn (insn); > + return; > + } > + } > + > if (guard_decl) > y = expand_normal (guard_decl); > else > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index 0dfb3ac59a6..f508bc5a455 100644 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -28,7 +28,7 @@ extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *); > extern int use_return_insn (int, rtx); > extern bool use_simple_return_p (void); > extern enum reg_class arm_regno_class (int); > -extern void arm_load_pic_register (unsigned long); > +extern void arm_load_pic_register (unsigned long, rtx); > extern int arm_volatile_func (void); > extern void arm_expand_prologue (void); > extern void arm_expand_epilogue (bool); > @@ -69,7 +69,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code); > extern int arm_split_constant (RTX_CODE, machine_mode, rtx, > HOST_WIDE_INT, rtx, rtx, int); > extern int legitimate_pic_operand_p (rtx); > -extern rtx legitimize_pic_address (rtx, machine_mode, rtx); > +extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool); > extern rtx legitimize_tls_address (rtx, rtx); > extern bool arm_legitimate_address_p (machine_mode, rtx, bool); > extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int); > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 8810df53aa3..96b8150d34c 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -7371,21 +7371,34 @@ legitimate_pic_operand_p (rtx x) > return 1; > } > > -/* Record that the current function needs a PIC register. Initialize > - cfun->machine->pic_reg if we have not already done so. */ > +/* Record that the current function needs a PIC register. If PIC_REG is null, > + a new pseudo is allocated as PIC register, otherwise PIC_REG is used. In > + both case cfun->machine->pic_reg is initialized if we have not already done > + so. COMPUTE_NOW decide whether and where to set the PIC register. If true, > + PIC register is reloaded in the current position of the instruction stream > + irregardless of whether it was loaded before. Otherwise, it is only loaded > + if not already done so (crtl->uses_pic_offset_table is null). Note that > + nonnull PIC_REG is only supported iff COMPUTE_NOW is true and null PIC_REG > + is only supported iff COMPUTE_NOW is false. */ > > static void > -require_pic_register (void) > +require_pic_register (rtx pic_reg, bool compute_now) > { > + gcc_assert (compute_now == (pic_reg != NULL_RTX)); > + > /* A lot of the logic here is made obscure by the fact that this > routine gets called as part of the rtx cost estimation process. > We don't want those calls to affect any assumptions about the real > function; and further, we can't call entry_of_function() until we > start the real expansion process. */ > - if (!crtl->uses_pic_offset_table) > + if (!crtl->uses_pic_offset_table || compute_now) > { > - gcc_assert (can_create_pseudo_p ()); > + gcc_assert (can_create_pseudo_p () > + || (pic_reg != NULL_RTX > + && REG_P (pic_reg) > + && GET_MODE (pic_reg) == Pmode)); > if (arm_pic_register != INVALID_REGNUM > + && !compute_now > && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM)) > { > if (!cfun->machine->pic_reg) > @@ -7401,8 +7414,10 @@ require_pic_register (void) > { > rtx_insn *seq, *insn; > > + if (pic_reg == NULL_RTX) > + pic_reg = gen_reg_rtx (Pmode); > if (!cfun->machine->pic_reg) > - cfun->machine->pic_reg = gen_reg_rtx (Pmode); > + cfun->machine->pic_reg = pic_reg; > > /* Play games to avoid marking the function as needing pic > if we are being called as part of the cost-estimation > @@ -7413,11 +7428,12 @@ require_pic_register (void) > start_sequence (); > > if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM > - && arm_pic_register > LAST_LO_REGNUM) > + && arm_pic_register > LAST_LO_REGNUM > + && !compute_now) > emit_move_insn (cfun->machine->pic_reg, > gen_rtx_REG (Pmode, arm_pic_register)); > else > - arm_load_pic_register (0UL); > + arm_load_pic_register (0UL, pic_reg); > > seq = get_insns (); > end_sequence (); > @@ -7430,16 +7446,33 @@ require_pic_register (void) > we can't yet emit instructions directly in the final > insn stream. Queue the insns on the entry edge, they will > be committed after everything else is expanded. */ > - insert_insn_on_edge (seq, > - single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun))); > + if (currently_expanding_to_rtl) > + insert_insn_on_edge (seq, > + single_succ_edge > + (ENTRY_BLOCK_PTR_FOR_FN (cfun))); > + else > + emit_insn (seq); > } > } > } > } > > +/* Legitimize PIC load to ORIG into REG. If REG is NULL, a new pseudo is > + created to hold the result of the load. If not NULL, PIC_REG indicates > + which register to use as PIC register, otherwise it is decided by register > + allocator. COMPUTE_NOW forces the PIC register to be loaded at the current > + location in the instruction stream, irregardless of whether it was loaded > + previously. Note that nonnull PIC_REG is only supported iff COMPUTE_NOW is > + true and null PIC_REG is only supported iff COMPUTE_NOW is false. > + > + Returns the register REG into which the PIC load is performed. */ > + > rtx > -legitimize_pic_address (rtx orig, machine_mode mode, rtx reg) > +legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg, > + bool compute_now) > { > + gcc_assert (compute_now == (pic_reg != NULL_RTX)); > + > if (GET_CODE (orig) == SYMBOL_REF > || GET_CODE (orig) == LABEL_REF) > { > @@ -7472,9 +7505,12 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg) > rtx mem; > > /* If this function doesn't have a pic register, create one now. */ > - require_pic_register (); > + require_pic_register (pic_reg, compute_now); > + > + if (pic_reg == NULL_RTX) > + pic_reg = cfun->machine->pic_reg; > > - pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig); > + pat = gen_calculate_pic_address (reg, pic_reg, orig); > > /* Make the MEM as close to a constant as possible. */ > mem = SET_SRC (pat); > @@ -7523,9 +7559,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg) > > gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS); > > - base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg); > + base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg, > + pic_reg, compute_now); > offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode, > - base == reg ? 0 : reg); > + base == reg ? 0 : reg, pic_reg, > + compute_now); > > if (CONST_INT_P (offset)) > { > @@ -7625,16 +7663,17 @@ static GTY(()) int pic_labelno; > low register. */ > > void > -arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED) > +arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg) > { > - rtx l1, labelno, pic_tmp, pic_rtx, pic_reg; > + rtx l1, labelno, pic_tmp, pic_rtx; > > if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE) > return; > > gcc_assert (flag_pic); > > - pic_reg = cfun->machine->pic_reg; > + if (pic_reg == NULL_RTX) > + pic_reg = cfun->machine->pic_reg; > if (TARGET_VXWORKS_RTP) > { > pic_rtx = gen_rtx_SYMBOL_REF (Pmode, VXWORKS_GOTT_BASE); > @@ -8710,7 +8749,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode) > { > /* We need to find and carefully transform any SYMBOL and LABEL > references; so go back to the original address expression. */ > - rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX); > + rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX, > + false /*compute_now*/); > > if (new_x != orig_x) > x = new_x; > @@ -8778,7 +8818,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode) > { > /* We need to find and carefully transform any SYMBOL and LABEL > references; so go back to the original address expression. */ > - rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX); > + rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX, > + false /*compute_now*/); > > if (new_x != orig_x) > x = new_x; > @@ -18066,7 +18107,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall) > ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr)) > : !SYMBOL_REF_LOCAL_P (addr))) > { > - require_pic_register (); > + require_pic_register (NULL_RTX, false /*compute_now*/); > use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg); > } > > @@ -21998,7 +22039,7 @@ arm_expand_prologue (void) > mask &= THUMB2_WORK_REGS; > if (!IS_NESTED (func_type)) > mask |= (1 << IP_REGNUM); > - arm_load_pic_register (mask); > + arm_load_pic_register (mask, NULL_RTX); > } > > /* If we are profiling, make sure no instructions are scheduled before > @@ -25229,7 +25270,7 @@ thumb1_expand_prologue (void) > /* Load the pic register before setting the frame pointer, > so we can use r7 as a temporary work register. */ > if (flag_pic && arm_pic_register != INVALID_REGNUM) > - arm_load_pic_register (live_regs_mask); > + arm_load_pic_register (live_regs_mask, NULL_RTX); > > if (!frame_pointer_needed && CALLER_INTERWORKING_SLOT_SIZE > 0) > emit_move_insn (gen_rtx_REG (Pmode, ARM_HARD_FRAME_POINTER_REGNUM), > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > index 270b8e454b3..1f702f81fd1 100644 > --- a/gcc/config/arm/arm.md > +++ b/gcc/config/arm/arm.md > @@ -6021,7 +6021,8 @@ > operands[1] = legitimize_pic_address (operands[1], SImode, > (!can_create_pseudo_p () > ? operands[0] > - : 0)); > + : NULL_RTX), NULL_RTX, > + false /*compute_now*/); > } > " > ) > @@ -6309,7 +6310,7 @@ > /* r3 is clobbered by set/longjmp, so we can use it as a scratch > register. */ > if (arm_pic_register != INVALID_REGNUM) > - arm_load_pic_register (1UL << 3); > + arm_load_pic_register (1UL << 3, NULL_RTX); > DONE; > }") > > @@ -8634,6 +8635,163 @@ > (set_attr "conds" "clob")] > ) > > +;; Named patterns for stack smashing protection. > +(define_expand "stack_protect_combined_set" > + [(parallel > + [(set (match_operand:SI 0 "memory_operand" "") > + (unspec:SI [(match_operand:SI 1 "guard_operand" "")] > + UNSPEC_SP_SET)) > + (clobber (match_scratch:SI 2 "")) > + (clobber (match_scratch:SI 3 ""))])] > + "" > + "" > +) > + > +;; Use a separate insn from the above expand to be able to have the mem outside > +;; the operand #1 when register allocation comes. This is needed to avoid LRA > +;; try to reload the guard since we need to control how PIC access is done in > +;; the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling > +;; legitimize_pic_address ()). > +(define_insn_and_split "*stack_protect_combined_set_insn" > + [(set (match_operand:SI 0 "memory_operand" "=m,m") > + (unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))] > + UNSPEC_SP_SET)) > + (clobber (match_scratch:SI 2 "=&l,&r")) > + (clobber (match_scratch:SI 3 "=&l,&r"))] > + "" > + "#" > + "reload_completed" > + [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))] > + UNSPEC_SP_SET)) > + (clobber (match_dup 2))])] > + " > +{ > + if (flag_pic) > + { > + /* Forces recomputing of GOT base now. */ > + legitimize_pic_address (operands[1], SImode, operands[2], operands[3], > + true /*compute_now*/); > + } > + else > + { > + if (address_operand (operands[1], SImode)) > + operands[2] = operands[1]; > + else > + { > + rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0); > + emit_move_insn (operands[2], mem); > + } > + } > +}" > + [(set_attr "arch" "t1,32")] > +) > + > +(define_insn "*stack_protect_set_insn" > + [(set (match_operand:SI 0 "memory_operand" "=m,m") > + (unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "+&l,&r"))] > + UNSPEC_SP_SET)) > + (clobber (match_dup 1))] > + "" > + "@ > + ldr\\t%1, [%1]\;str\\t%1, %0\;movs\t%1,#0 > + ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,#0" > + [(set_attr "length" "8,12") > + (set_attr "conds" "clob,nocond") > + (set_attr "type" "multiple") > + (set_attr "arch" "t1,32")] > +) > + > +(define_expand "stack_protect_combined_test" > + [(parallel > + [(set (pc) > + (if_then_else > + (eq (match_operand:SI 0 "memory_operand" "") > + (unspec:SI [(match_operand:SI 1 "guard_operand" "")] > + UNSPEC_SP_TEST)) > + (label_ref (match_operand 2)) > + (pc))) > + (clobber (match_scratch:SI 3 "")) > + (clobber (match_scratch:SI 4 "")) > + (clobber (reg:CC CC_REGNUM))])] > + "" > + "" > +) > + > +;; Use a separate insn from the above expand to be able to have the mem outside > +;; the operand #1 when register allocation comes. This is needed to avoid LRA > +;; try to reload the guard since we need to control how PIC access is done in > +;; the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling > +;; legitimize_pic_address ()). > +(define_insn_and_split "*stack_protect_combined_test_insn" > + [(set (pc) > + (if_then_else > + (eq (match_operand:SI 0 "memory_operand" "m,m") > + (unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))] > + UNSPEC_SP_TEST)) > + (label_ref (match_operand 2)) > + (pc))) > + (clobber (match_scratch:SI 3 "=&l,&r")) > + (clobber (match_scratch:SI 4 "=&l,&r")) > + (clobber (reg:CC CC_REGNUM))] > + "" > + "#" > + "reload_completed" > + [(const_int 0)] > +{ > + rtx eq; > + > + if (flag_pic) > + { > + /* Forces recomputing of GOT base now. */ > + legitimize_pic_address (operands[1], SImode, operands[3], operands[4], > + true /*compute_now*/); > + } > + else > + { > + if (address_operand (operands[1], SImode)) > + operands[3] = operands[1]; > + else > + { > + rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0); > + emit_move_insn (operands[3], mem); > + } > + } > + if (TARGET_32BIT) > + { > + emit_insn (gen_arm_stack_protect_test_insn (operands[4], operands[0], > + operands[3])); > + rtx cc_reg = gen_rtx_REG (CC_Zmode, CC_REGNUM); > + eq = gen_rtx_EQ (CC_Zmode, cc_reg, const0_rtx); > + emit_jump_insn (gen_arm_cond_branch (operands[2], eq, cc_reg)); > + } > + else > + { > + emit_insn (gen_thumb1_stack_protect_test_insn (operands[4], operands[0], > + operands[3])); > + eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx); > + emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, > + operands[2])); > + } > + DONE; > +} > + [(set_attr "arch" "t1,32")] > +) > + > +(define_insn "arm_stack_protect_test_insn" > + [(set (reg:CC_Z CC_REGNUM) > + (compare:CC_Z (unspec:SI [(match_operand:SI 1 "memory_operand" "m,m") > + (mem:SI (match_operand:SI 2 "register_operand" "+l,r"))] > + UNSPEC_SP_TEST) > + (const_int 0))) > + (clobber (match_operand:SI 0 "register_operand" "=&l,&r")) > + (clobber (match_dup 2))] > + "TARGET_32BIT" > + "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0" > + [(set_attr "length" "8,12") > + (set_attr "type" "multiple") > + (set_attr "arch" "t,32")] > > > I believe this needs to set the "conds" attribute to "set" so that the final_prescan stuff handles it correctly. > > Ok with that change. > > Thanks, > Kyrill >