Date: Tue, 31 Jul 2018 13:36:00 -0000
From: Kyrill Tkachov
To: Thomas Preudhomme
CC: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM

Hi Thomas,

On 25/07/18 14:28, Thomas Preudhomme wrote:
> Hi Kyrill,
>
> Using memory_operand worked; the issues I encountered when using it in
> earlier versions of the patch must have been due to the missing test
> on address_operand in the preparation statements, which I added later.
> Please find an updated patch in attachment. ChangeLog entry is as
> follows:
>
> *** gcc/ChangeLog ***
>
> 2018-07-05  Thomas Preud'homme
>
>     * target-insns.def (stack_protect_combined_set): Define new standard
>     pattern name.
>     (stack_protect_combined_test): Likewise.
>     * cfgexpand.c (stack_protect_prologue): Try new
>     stack_protect_combined_set pattern first.
>     * function.c (stack_protect_epilogue): Try new
>     stack_protect_combined_test pattern first.
>     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>     parameters to control which register to use as PIC register and force
>     reloading PIC register respectively.  Insert in the stream of insns if
>     possible.
>     (legitimize_pic_address): Expose above new parameters in prototype and
>     adapt recursive calls accordingly.
>     (arm_legitimize_address): Adapt to new legitimize_pic_address
>     prototype.
>     (thumb_legitimize_address): Likewise.
>     (arm_emit_call_insn): Adapt to new require_pic_register prototype.
>     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>     change.
>     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>     prototype change.
>     (stack_protect_combined_set): New insn_and_split pattern.
>     (stack_protect_set): New insn pattern.
>     (stack_protect_combined_test): New insn_and_split pattern.
>     (stack_protect_test): New insn pattern.
>     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>     (UNSPEC_SP_TEST): Likewise.
>     * doc/md.texi (stack_protect_combined_set): Document new standard
>     pattern name.
>     (stack_protect_set): Clarify that the operand for guard's address is
>     legal.
>     (stack_protect_combined_test): Document new standard pattern name.
>     (stack_protect_test): Clarify that the operand for guard's address is
>     legal.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2018-07-05  Thomas Preud'homme
>
>     * gcc.target/arm/pr85434.c: New test.
>
> Bootstrapped again for Arm and Thumb-2 and regtested with and without
> -fstack-protector-all without any regression.

This looks ok to me now.
Thank you for your patience and addressing my comments from before.

Kyrill

> Best regards,
>
> Thomas
>
> On Thu, 19 Jul 2018 at 17:34, Thomas Preudhomme wrote:
>> [Dropping Jeff Law from the list since he already commented on the
>> middle end parts]
>>
>> Hi Kyrill,
>>
>> On Thu, 19 Jul 2018 at 12:02, Kyrill Tkachov wrote:
>>> Hi Thomas,
>>>
>>> On 17/07/18 12:02, Thomas Preudhomme wrote:
>>>> Fixed in attached patch. ChangeLog entries are unchanged:
>>>>
>>>> *** gcc/ChangeLog ***
>>>>
>>>> 2018-07-05  Thomas Preud'homme
>>>>
>>>>     PR target/85434
>>>>     * target-insns.def (stack_protect_combined_set): Define new standard
>>>>     pattern name.
>>>>     (stack_protect_combined_test): Likewise.
>>>>     * cfgexpand.c (stack_protect_prologue): Try new
>>>>     stack_protect_combined_set pattern first.
>>>>     * function.c (stack_protect_epilogue): Try new
>>>>     stack_protect_combined_test pattern first.
>>>>     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>>>>     parameters to control which register to use as PIC register and force
>>>>     reloading PIC register respectively.
>>>>     (legitimize_pic_address): Expose above new parameters in prototype and
>>>>     adapt recursive calls accordingly.
>>>>     (arm_legitimize_address): Adapt to new legitimize_pic_address
>>>>     prototype.
>>>>     (thumb_legitimize_address): Likewise.
>>>>     (arm_emit_call_insn): Adapt to new require_pic_register prototype.
>>>>     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>>>>     change.
>>>>     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>>>>     prototype change.
>>>>     (stack_protect_combined_set): New insn_and_split pattern.
>>>>     (stack_protect_set): New insn pattern.
>>>>     (stack_protect_combined_test): New insn_and_split pattern.
>>>>     (stack_protect_test): New insn pattern.
>>>>     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>>>>     (UNSPEC_SP_TEST): Likewise.
>>>>     * doc/md.texi (stack_protect_combined_set): Document new standard
>>>>     pattern name.
>>>>     (stack_protect_set): Clarify that the operand for guard's address is
>>>>     legal.
>>>>     (stack_protect_combined_test): Document new standard pattern name.
>>>>     (stack_protect_test): Clarify that the operand for guard's address is
>>>>     legal.
>>>>
>>>> *** gcc/testsuite/ChangeLog ***
>>>>
>>>> 2018-07-05  Thomas Preud'homme
>>>>
>>>>     PR target/85434
>>>>     * gcc.target/arm/pr85434.c: New test.
>>>>
>>> Sorry for the delay. Some comments inline.
>>>
>>> Kyrill
>>>
>>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>>> index d6e3c382085..d1a893ac56e 100644
>>> --- a/gcc/cfgexpand.c
>>> +++ b/gcc/cfgexpand.c
>>> @@ -6105,8 +6105,18 @@ stack_protect_prologue (void)
>>>  {
>>>    tree guard_decl = targetm.stack_protect_guard ();
>>>    rtx x, y;
>>> +  struct expand_operand ops[2];
>>>
>>>    x = expand_normal (crtl->stack_protect_guard);
>>> +  create_fixed_operand (&ops[0], x);
>>> +  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
>>> +  /* Allow the target to compute the address of Y and copy it to X without
>>> +     leaking Y into a register.  This combined address + copy pattern
>>> +     allows the target to prevent spilling of any intermediate results by
>>> +     splitting it after register allocation.  */
>>> +  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))
>>> +    return;
>>> +
>>>    if (guard_decl)
>>>      y = expand_normal (guard_decl);
>>>    else
>>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>>> index 8537262ce64..100844e659c 100644
>>> --- a/gcc/config/arm/arm-protos.h
>>> +++ b/gcc/config/arm/arm-protos.h
>>> @@ -67,7 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
>>>  extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
>>>                                 HOST_WIDE_INT, rtx, rtx, int);
>>>  extern int legitimate_pic_operand_p (rtx);
>>> -extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
>>> +extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
>>>  extern rtx legitimize_tls_address (rtx, rtx);
>>>  extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
>>>  extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>> index ec3abbcba9f..f4a970580c2 100644
>>> --- a/gcc/config/arm/arm.c
>>> +++ b/gcc/config/arm/arm.c
>>> @@ -7369,20 +7369,26 @@ legitimate_pic_operand_p (rtx x)
>>>  }
>>>
>>>  /* Record that the current function needs a PIC register.  Initialize
>>> -   cfun->machine->pic_reg if we have not already done so.  */
>>> +   cfun->machine->pic_reg if we have not already done so.
>>> +
>>> +   If not NULL, PIC_REG indicates which register to use as PIC register,
>>> +   otherwise it is decided by the register allocator.  COMPUTE_NOW forces
>>> +   the PIC register to be loaded, regardless of whether it was loaded
>>> +   previously.  */
>>>
>>>  static void
>>> -require_pic_register (void)
>>> +require_pic_register (rtx pic_reg, bool compute_now)
>>>  {
>>>    /* A lot of the logic here is made obscure by the fact that this
>>>       routine gets called as part of the rtx cost estimation process.
>>>       We don't want those calls to affect any assumptions about the real
>>>       function; and further, we can't call entry_of_function() until we
>>>       start the real expansion process.  */
>>> -  if (!crtl->uses_pic_offset_table)
>>> +  if (!crtl->uses_pic_offset_table || compute_now)
>>>      {
>>> -      gcc_assert (can_create_pseudo_p ());
>>> +      gcc_assert (can_create_pseudo_p ()
>>> +                  || (pic_reg != NULL_RTX && GET_MODE (pic_reg) == Pmode));
>>>        if (arm_pic_register != INVALID_REGNUM
>>> +          && can_create_pseudo_p ()
>>>            && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
>>>          {
>>>            if (!cfun->machine->pic_reg)
>>> @@ -7399,7 +7405,8 @@ require_pic_register (void)
>>>        rtx_insn *seq, *insn;
>>>
>>>        if (!cfun->machine->pic_reg)
>>> -        cfun->machine->pic_reg = gen_reg_rtx (Pmode);
>>> +        cfun->machine->pic_reg =
>>> +          can_create_pseudo_p () ? gen_reg_rtx (Pmode) : pic_reg;
>>>
>>> I'll admit I'm not too familiar with this code, but don't you want to be
>>> setting cfun->machine->pic_reg to the pic_reg that was passed into this
>>> function (if pic_reg != NULL, as per the function comment)?
>> Indeed, the comment needs to be fixed. I think it only makes sense to
>> specify a register after register allocation, when the number of
>> registers is finite. Before that I don't see why one would want to do
>> that. I've changed the sentence about PIC_REG in the heading comment
>> to:
>>
>> "A new pseudo register is used for the PIC register if possible,
>> otherwise PIC_REG must be non NULL and is used instead."
>>
>>>        /* Play games to avoid marking the function as needing pic
>>>           if we are being called as part of the cost-estimation
>>> @@ -7410,7 +7417,8 @@ require_pic_register (void)
>>>            start_sequence ();
>>>
>>>            if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
>>> -              && arm_pic_register > LAST_LO_REGNUM)
>>> +              && arm_pic_register > LAST_LO_REGNUM
>>> +              && can_create_pseudo_p ())
>>>              emit_move_insn (cfun->machine->pic_reg,
>>>                              gen_rtx_REG (Pmode, arm_pic_register));
>>>            else
>>> @@ -7427,15 +7435,29 @@ require_pic_register (void)
>>>               we can't yet emit instructions directly in the final
>>>               insn stream.  Queue the insns on the entry edge, they will
>>>               be committed after everything else is expanded.  */
>>> -          insert_insn_on_edge (seq,
>>> -                               single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>>> +          if (can_create_pseudo_p ())
>>> +            insert_insn_on_edge (seq,
>>> +                                 single_succ_edge
>>> +                                 (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>>> +          else
>>> +            emit_insn (seq);
>>>          }
>>>
>>> It's not the capability for creating pseudos that is relevant here, but
>>> whether we have an instruction stream to insert insns on. Do you think
>>> that using currently_expanding_to_rtl would work as a condition?
>> Indeed, it's not the right test. Do you mean testing whether
>> currently_expanding_to_rtl is false? I used can_create_pseudo_p
>> because I knew the function could never be called when it was false
>> before, and thus my changes would only impact the code I was changing.
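>> If I understand the suggestion right, the branch would look something
>> like the sketch below (just a sketch, assuming currently_expanding_to_rtl
>> is true exactly while cfgexpand still owns the insn stream):
>>
>>   /* While expanding to RTL there is no final insn stream to emit into
>>      yet, so queue the sequence on the entry edge; when called from the
>>      post-reload splitter, emit it at the current position instead.  */
>>   if (currently_expanding_to_rtl)
>>     insert_insn_on_edge (seq,
>>                          single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>>   else
>>     emit_insn (seq);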
>>
>>> +/* Legimitize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
>>>
>>> "Legitimize"
>>>
>>> +   created to hold the result of the load.  If not NULL, PIC_REG indicates
>>> +   which register to use as PIC register, otherwise it is decided by the
>>> +   register allocator.  COMPUTE_NOW forces the PIC register to be loaded at
>>> +   the current location in the instruction stream, regardless of whether
>>> +   it was loaded previously.
>>> +
>>> +   Returns the register REG into which the PIC load is performed.  */
>>> +
>>>  rtx
>>> -legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
>>> +legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
>>> +                        bool compute_now)
>>>  {
>>>    if (GET_CODE (orig) == SYMBOL_REF
>>>        || GET_CODE (orig) == LABEL_REF)
>>> @@ -7469,7 +7491,7 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
>>>        rtx mem;
>>>
>>>        /* If this function doesn't have a pic register, create one now.  */
>>> -      require_pic_register ();
>>> +      require_pic_register (pic_reg, compute_now);
>>>
>>>        pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
>>>
>>> @@ -7520,9 +7542,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
>>>
>>>        gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
>>>
>>> -      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
>>> +      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
>>> +                                     pic_reg, compute_now);
>>>        offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
>>> -                                       base == reg ? 0 : reg);
>>> +                                       base == reg ? 0 : reg, pic_reg,
>>> +                                       compute_now);
>>>
>>>        if (CONST_INT_P (offset))
>>>          {
>>> @@ -8707,7 +8731,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
>>>      {
>>>        /* We need to find and carefully transform any SYMBOL and LABEL
>>>           references; so go back to the original address expression.  */
>>> -      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
>>> +      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
>>> +                                          false /*compute_now*/);
>>>
>>>        if (new_x != orig_x)
>>>          x = new_x;
>>> @@ -8775,7 +8800,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
>>>      {
>>>        /* We need to find and carefully transform any SYMBOL and LABEL
>>>           references; so go back to the original address expression.  */
>>> -      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
>>> +      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
>>> +                                          false /*compute_now*/);
>>>
>>>        if (new_x != orig_x)
>>>          x = new_x;
>>> @@ -18059,7 +18085,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
>>>             ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
>>>             : !SYMBOL_REF_LOCAL_P (addr)))
>>>      {
>>> -      require_pic_register ();
>>> +      require_pic_register (NULL_RTX, false /*compute_now*/);
>>>        use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
>>>      }
>>>
>>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>>> index 361a02668b0..9582c3fbb11 100644
>>> --- a/gcc/config/arm/arm.md
>>> +++ b/gcc/config/arm/arm.md
>>> @@ -6021,7 +6021,8 @@
>>>        operands[1] = legitimize_pic_address (operands[1], SImode,
>>>                                              (!can_create_pseudo_p ()
>>>                                               ? operands[0]
>>> -                                             : 0));
>>> +                                             : NULL_RTX), NULL_RTX,
>>> +                                            false /*compute_now*/);
>>>      }
>>>    "
>>> )
>>> @@ -8634,6 +8635,95 @@
>>>     (set_attr "conds" "clob")]
>>> )
>>>
>>> +;; Named patterns for stack smashing protection.
>>> +(define_insn_and_split "stack_protect_combined_set"
>>> +  [(set (match_operand:SI 0 "memory_operand" "=m")
>>> +        (unspec:SI [(match_operand:SI 1 "general_operand" "X")]
>>> +                   UNSPEC_SP_SET))
>>> +   (match_scratch:SI 2 "=r")
>>> +   (match_scratch:SI 3 "=r")]
>>> +  ""
>>> +  "#"
>>> +  "reload_completed"
>>> +  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
>>> +                                            UNSPEC_SP_SET))
>>> +              (clobber (match_dup 2))])]
>>> +  "
>>> +{
>>> +  rtx addr = XEXP (operands[1], 0);
>>>
>>> I think you want to tighten the predicate of operands[1].
>>> general_operand is very, well, general, but you're taking its 0th subtree,
>>> which it may not necessarily have.
>> I was afraid you'd ask that. Yes, indeed, a better predicate should be
>> used, but I'm not sure of all the possible cases here. However, it has to
>> be some sort of memory since this is the address of the guard. I think the
>> problem with using memory_operand was that it could be a MEM of a
>> SYMBOL_REF, which is not legitimate at this point.
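>> Roughly, the check such a predicate would have to make might look like
>> this (a hypothetical guard_addr_operand, just to illustrate the two
>> cases; the real one would be a define_predicate in predicates.md):
>>
>>   /* Accept a MEM wrapping either an address that is already legitimate
>>      or a bare SYMBOL_REF that the splitter will legitimize itself,
>>      e.g. through the GOT when flag_pic.  */
>>   static bool
>>   guard_addr_operand (rtx x, machine_mode mode)
>>   {
>>     return MEM_P (x)
>>            && (GET_CODE (XEXP (x, 0)) == SYMBOL_REF
>>                || arm_legitimate_address_p (mode, XEXP (x, 0), false));
>>   }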
>>
>>> +  if (flag_pic)
>>> +    {
>>> +      /* Force recomputing of the GOT base now.  */
>>> +      operands[1] = legitimize_pic_address (addr, SImode, operands[2],
>>> +                                            operands[3],
>>> +                                            true /*compute_now*/);
>>> +    }
>>> +  else
>>> +    {
>>> +      if (!address_operand (addr, SImode))
>>> +        operands[1] = force_const_mem (SImode, addr);
>>> +      emit_move_insn (operands[2], operands[1]);
>>> +    }
>>> +}"
>>> +)
>>> +
>>> +(define_insn "stack_protect_set"
>>> +  [(set (match_operand:SI 0 "memory_operand" "=m")
>>> +        (unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "r"))]
>>> +                   UNSPEC_SP_SET))
>>> +   (clobber (match_dup 1))]
>>>
>>> I'm not familiar with this idiom, don't you instead want to mark the
>>> constraint on operand 1 as earlyclobber (or read-write)?
>> According to the documentation, earlyclobber applies to operands being
>> written to, and this is an input-only operand. As the docs say, an
>> earlyclobber indicates to the compiler that it cannot reuse a register
>> used for an input operand to put the output operand into. Here I'm
>> trying to express instead that the operand needs to be saved/restored
>> if needed after this insn because it gets written to. This is only
>> needed in Thumb-1 because stack_protect_combined_set and
>> stack_protect_combined_test already use all available registers, so I
>> cannot ask for an extra clobber register.
>>
>>> +  ""
>>> +  "ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,0"
>>> +  [(set_attr "length" "12")
>>> +   (set_attr "type" "multiple")])
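>> To make the clobber semantics concrete, the same situation in user-level
>> inline asm would need the address register tied to an output, since the
>> sequence overwrites it. A minimal sketch (a hypothetical set_canary
>> helper, Arm-only, mirroring the template above):
>>
>>   static inline void
>>   set_canary (unsigned long *slot, const unsigned long *guard)
>>   {
>>     unsigned long tmp = (unsigned long) guard;
>>     /* Load the guard value via the address in TMP, store it to the
>>        stack slot, then scrub TMP so the value does not linger in a
>>        register.  */
>>     __asm__ volatile ("ldr %1, [%1]\n\t"
>>                       "str %1, %0\n\t"
>>                       "mov %1, #0"
>>                       : "=m" (*slot), "+r" (tmp));
>>   }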
>>
>>> +(define_insn_and_split "stack_protect_combined_test"
>>> +  [(set (pc)
>>> +        (if_then_else
>>> +         (eq (match_operand:SI 0 "memory_operand" "m")
>>> +             (unspec:SI [(match_operand:SI 1 "general_operand" "X")]
>>> +                        UNSPEC_SP_TEST))
>>> +         (label_ref (match_operand 2))
>>> +         (pc)))
>>> +   (match_scratch:SI 3 "=r")
>>> +   (match_scratch:SI 4 "=r")]
>>> +  ""
>>> +  "#"
>>> +  "reload_completed"
>>> +  [(const_int 0)]
>>> +{
>>> +  rtx eq, addr;
>>> +
>>> +  addr = XEXP (operands[1], 0);
>>>
>>> Same concern as with stack_protect_combined_set about operand 1's
>>> predicate being too general.
>> Will try to create a new predicate, thanks for raising this.
>>
>>> +  if (flag_pic)
>>> +    {
>>> +      /* Force recomputing of the GOT base now.  */
>>> +      operands[1] = legitimize_pic_address (addr, SImode, operands[3],
>>> +                                            operands[4],
>>> +                                            true /*compute_now*/);
>>> +    }
>>> +  else
>>> +    {
>>> +      if (!address_operand (addr, SImode))
>>> +        operands[1] = force_const_mem (SImode, addr);
>>> +      emit_move_insn (operands[3], operands[1]);
>>> +    }
>>> +  emit_insn (gen_stack_protect_test (operands[4], operands[0], operands[3]));
>>> +  eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
>>> +  emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, operands[2]));
>>> +  DONE;
>>> +})
>>> +
>>> +(define_insn "stack_protect_test"
>>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>>> +        (unspec:SI [(match_operand:SI 1 "memory_operand" "m")
>>> +                    (mem:SI (match_operand:SI 2 "register_operand" "r"))]
>>> +                   UNSPEC_SP_TEST))
>>> +   (clobber (match_dup 2))]
>>>
>>> Same concern on clobbering and constraints.
>> Same answer :-)
>>
>> Best regards,
>>
>> Thomas
>>
>>> +  ""
>>> +  "ldr\t%0, [%2]\;ldr\t%2, %1\;eor\t%0, %2, %0"
>>> +  [(set_attr "length" "12")
>>> +   (set_attr "type" "multiple")])
>>> +
>>>  (define_expand "casesi"
>>>    [(match_operand:SI 0 "s_register_operand" "")  ; index to jump on
>>>     (match_operand:SI 1 "const_int_operand" "")   ; lower bound
>>> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
>>> index b05f85e10e4..429c937df60 100644
>>> --- a/gcc/config/arm/unspecs.md
>>> +++ b/gcc/config/arm/unspecs.md
>>> @@ -86,6 +86,9 @@
>>>    UNSPEC_PROBE_STACK    ; Probe stack memory reference
>>>    UNSPEC_NONSECURE_MEM  ; Represent non-secure memory in ARMv8-M with
>>>                          ; security extension
>>> +  UNSPEC_SP_SET         ; Represent the setting of stack protector's canary
>>> +  UNSPEC_SP_TEST        ; Represent the testing of stack protector's canary
>>> +                        ; against the guard.
>>>  ])
>>>
>>>  (define_c_enum "unspec" [
>>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>>> index 734bc765387..f760da495bf 100644
>>> --- a/gcc/doc/md.texi
>>> +++ b/gcc/doc/md.texi
>>> @@ -7362,22 +7362,61 @@ builtins.
>>>  The get/set patterns have a single output/input operand respectively,
>>>  with @var{mode} intended to be @code{Pmode}.
>>>
>>> +@cindex @code{stack_protect_combined_set} instruction pattern
>>> +@item @samp{stack_protect_combined_set}
>>> +This pattern, if defined, moves a @code{ptr_mode} value from an address
>>> +whose declaration RTX is given in operand 1 to the memory in operand 0
>>> +without leaving the value in a register afterward.  If several
>>> +instructions are needed by the target to perform the operation (e.g. to
>>> +load the address from a GOT entry then load the @code{ptr_mode} value
>>> +and finally store it), it is the backend's responsibility to ensure no
>>> +intermediate result gets spilled.  This is to avoid leaking the value
>>> +some place that an attacker might use to rewrite the stack guard slot
>>> +after having clobbered it.
>>> +
>>> +If this pattern is not defined, then the address declaration is
>>> +expanded first in the standard way and a @code{stack_protect_set}
>>> +pattern is then generated to move the value from that address to the
>>> +address in operand 0.
>>> +
>>>  @cindex @code{stack_protect_set} instruction pattern
>>>  @item @samp{stack_protect_set}
>>> -This pattern, if defined, moves a @code{ptr_mode} value from the memory
>>> -in operand 1 to the memory in operand 0 without leaving the value in
>>> -a register afterward.  This is to avoid leaking the value some place
>>> -that an attacker might use to rewrite the stack guard slot after
>>> -having clobbered it.
>>> +This pattern, if defined, moves a @code{ptr_mode} value from the valid
>>> +memory location in operand 1 to the memory in operand 0 without leaving
>>> +the value in a register afterward.  This is to avoid leaking the value
>>> +some place that an attacker might use to rewrite the stack guard slot
>>> +after having clobbered it.
>>> +
>>> +Note: on targets whose addressing modes do not allow loading directly
>>> +from the stack guard's address, the address is expanded in a standard
>>> +way first, which could cause some spills.
>>>
>>>  If this pattern is not defined, then a plain move pattern is generated.
>>>
>>> +@cindex @code{stack_protect_combined_test} instruction pattern
>>> +@item @samp{stack_protect_combined_test}
>>> +This pattern, if defined, compares a @code{ptr_mode} value from an
>>> +address whose declaration RTX is given in operand 1 with the memory in
>>> +operand 0 without leaving the value in a register afterward and
>>> +branches to operand 2 if the values were equal.  If several
>>> +instructions are needed by the target to perform the operation (e.g. to
>>> +load the address from a GOT entry then load the @code{ptr_mode} value
>>> +and finally compare it), it is the backend's responsibility to ensure no
>>> +intermediate result gets spilled.  This is to avoid leaking the value
>>> +some place that an attacker might use to rewrite the stack guard slot
>>> +after having clobbered it.
>>> +
>>> +If this pattern is not defined, then the address declaration is
>>> +expanded first in the standard way and a @code{stack_protect_test}
>>> +pattern is then generated to compare the value from that address to the
>>> +value at the memory in operand 0.
>>> +
>>>  @cindex @code{stack_protect_test} instruction pattern
>>>  @item @samp{stack_protect_test}
>>>  This pattern, if defined, compares a @code{ptr_mode} value from the
>>> -memory in operand 1 with the memory in operand 0 without leaving the
>>> -value in a register afterward and branches to operand 2 if the values
>>> -were equal.
>>> +valid memory location in operand 1 with the memory in operand 0 without
>>> +leaving the value in a register afterward and branches to operand 2 if
>>> +the values were equal.
>>>
>>>  If this pattern is not defined, then a plain compare pattern and
>>>  conditional branch pattern is used.
>>> diff --git a/gcc/function.c b/gcc/function.c
>>> index 142cdaec2ce..bf96dd6feae 100644
>>> --- a/gcc/function.c
>>> +++ b/gcc/function.c
>>> @@ -4886,20 +4886,33 @@ stack_protect_epilogue (void)
>>>    rtx_code_label *label = gen_label_rtx ();
>>>    rtx x, y;
>>>    rtx_insn *seq;
>>> +  struct expand_operand ops[3];
>>>
>>>    x = expand_normal (crtl->stack_protect_guard);
>>> -  if (guard_decl)
>>> -    y = expand_normal (guard_decl);
>>> -  else
>>> -    y = const0_rtx;
>>> -
>>> -  /* Allow the target to compare Y with X without leaking either into
>>> -     a register.  */
>>> -  if (targetm.have_stack_protect_test ()
>>> -      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
>>> -    emit_insn (seq);
>>> -  else
>>> -    emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
>>> +  create_fixed_operand (&ops[0], x);
>>> +  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
>>> +  create_fixed_operand (&ops[2], label);
>>> +  /* Allow the target to compute the address of Y and compare it with X
>>> +     without leaking Y into a register.  This combined address + compare
>>> +     pattern allows the target to prevent spilling of any intermediate
>>> +     results by splitting it after register allocation.  */
>>> +  if (!maybe_expand_jump_insn (targetm.code_for_stack_protect_combined_test,
>>> +                               3, ops))
>>> +    {
>>> +      if (guard_decl)
>>> +        y = expand_normal (guard_decl);
>>> +      else
>>> +        y = const0_rtx;
>>> +
>>> +      /* Allow the target to compare Y with X without leaking either into
>>> +         a register.  */
>>> +      if (targetm.have_stack_protect_test ()
>>> +          && ((seq = targetm.gen_stack_protect_test (x, y, label))
>>> +              != NULL_RTX))
>>> +        emit_insn (seq);
>>> +      else
>>> +        emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
>>> +    }
>>>
>>>    /* The noreturn predictor has been moved to the tree level.  The rtl-level
>>>       predictors estimate this branch about 20%, which isn't enough to get
>>> diff --git a/gcc/target-insns.def b/gcc/target-insns.def
>>> index 9a552c3d11c..d39889b3522 100644
>>> --- a/gcc/target-insns.def
>>> +++ b/gcc/target-insns.def
>>> @@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
>>>  DEF_TARGET_INSN (simple_return, (void))
>>>  DEF_TARGET_INSN (split_stack_prologue, (void))
>>>  DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
>>> +DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
>>>  DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
>>> +DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
>>>  DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
>>>  DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
>>>  DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
>>> diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
>>> new file mode 100644
>>> index 00000000000..4143a861f7c
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/arm/pr85434.c
>>> @@ -0,0 +1,200 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target fstack_protector }*/
>>> +/* { dg-require-effective-target fpic }*/
>>> +/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
>>> +
>>> +#include <stddef.h>
>>> +#include <stdint.h>
>>> +
>>> +
>>> +static const unsigned char base64_enc_map[64] =
>>> +{
>>> +    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
>>> +    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
>>> +    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
>>> +    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
>>> +    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
>>> +    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
>>> +    '8', '9', '+', '/'
>>> +};
>>> +
>>> +#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
>>> +
>>> +
>>> +void doSmth(void *x);
>>> +
>>> +#include <string.h>
>>> +
>>> +
>>> +void check(int n) {
>>> +
>>> +  if (!(n % 2 && n % 3 && n % 5)) {
>>> +    __asm__ ( "add r8, r8, #1;" );
>>> +  }
>>> +}
>>> +
>>> +uint32_t test(
>>> +  uint32_t a1,
>>> +  uint32_t a2,
>>> +  size_t a3,
>>> +  size_t a4,
>>> +  size_t a5,
>>> +  size_t a6)
>>> +{
>>> +  uint32_t nResult = 0;
>>> +  uint8_t* h = 0L;
>>> +  uint8_t X[128];
>>> +  uint8_t mac[64];
>>> +  size_t len;
>>> +
>>> +  doSmth(&a1);
>>> +  doSmth(&a2);
>>> +  doSmth(&a3);
>>> +  doSmth(&a4);
>>> +  doSmth(&a5);
>>> +  doSmth(&a6);
>>> +
>>> +  if (a1 && a2 && a3 && a4 && a5 && a6) {
>>> +    nResult = 1;
>>> +    h = (void*)X;
>>> +    len = sizeof(X);
>>> +    memset(X, a2, len);
>>> +    len -= 64;
>>> +    memcpy(mac, X, len);
>>> +    *(h + len) = a6;
>>> +
>>> +    {
>>> +
>>> +
>>> +      unsigned char *dst = X;
>>> +      size_t dlen = a3;
>>> +      size_t *olen = &a6;
>>> +      const unsigned char *src = mac;
>>> +      size_t slen = a4;
>>> +      size_t i, n;
>>> +      int C1, C2, C3;
>>> +      unsigned char *p;
>>> +
>>> +      if( slen == 0 )
>>> +      {
>>> +          *olen = 0;
>>> +          return( 0 );
>>> +      }
>>> +
>>> +      n = slen / 3 + ( slen % 3 != 0 );
>>> +
>>> +      if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
>>> +      {
>>> +          *olen = BASE64_SIZE_T_MAX;
>>> +          return( 0 );
>>> +      }
>>> +
>>> +      n *= 4;
>>> +
>>> +      if( ( dlen < n + 1 ) || ( NULL == dst ) )
>>> +      {
>>> +          *olen = n + 1;
>>> +          return( 0 );
>>> +      }
>>> +
>>> +      n = ( slen / 3 ) * 3;
>>> +
>>> +      for( i = 0, p = dst; i < n; i += 3 )
>>> +      {
>>> +          C1 = *src++;
>>> +          C2 = *src++;
>>> +          C3 = *src++;
>>> +
>>> +          check(i);
>>> +
>>> +          *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
>>> +          *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
>>> +          *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
>>> +          *p++ = base64_enc_map[C3 & 0x3F];
>>> +      }
>>> +
>>> +      if( i < slen )
>>> +      {
>>> +          C1 = *src++;
>>> +          C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
>>> +
>>> +          *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
>>> +          *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
>>> +
>>> +          if( ( i + 1 ) < slen )
>>> +              *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
>>> +          else *p++ = '=';
>>> +
>>> +          *p++ = '=';
>>> +      }
>>> +
>>> +      *olen = p - dst;
>>> +      *p = 0;
>>> +
>>> +    }
>>> +
>>> +    __asm__ ("mov r8, %0;" : "=r" ( nResult ));
>>> +  }
>>> +  else
>>> +  {
>>> +    nResult = 2;
>>> +  }
>>> +
>>> +  doSmth(X);
>>> +  doSmth(mac);
>>> +
>>> +
>>> +  return nResult;
>>> +}
>>> +
>>> +/* The pattern below catches sequences of instructions that were generated
>>> +   for ARM and Thumb-2 before the fix for this PR.  They are of the form:
>>> +
>>> +   ldr rX, <guard's address on the stack>
>>> +   <optional non ldr/cmp instructions>
>>> +   ldr rY, <canary value on the stack>
>>> +   ldr rZ, [rX]
>>> +   <optional non ldr/cmp instructions>
>>> +   cmp rY, rZ
>>> +   <optional non cmp instructions>
>>> +   bl __stack_chk_fail
>>> +
>>> +   Ideally the optional blocks would check that the various rX, rY and rZ
>>> +   registers are not set, but this is not possible because backreferences
>>> +   are illegal in lookahead expressions in Tcl, which prevents the only
>>> +   construct that allows negating a regexp from using the backreferences
>>> +   to those registers.  Instead we go for the heuristic of allowing
>>> +   non-ldr/cmp instructions, with the assumptions that (i) those are not
>>> +   part of the stack protector sequences and (ii) they would only be
>>> +   scheduled here if they don't conflict with registers used by the stack
>>> +   protector.
>>> +
>>> +   Note on the regexp logic:
>>> +   Allowing non-X instructions (where X is ldr or cmp) is done by looking
>>> +   for some non-newline spaces, followed by something which is not X,
>>> +   followed by an alphanumeric character, followed by anything but a
>>> +   newline, and ended by a newline, the whole thing repeated an
>>> +   undetermined number of times.  The alphanumeric character is there to
>>> +   force the match of the negative lookahead for X to only happen after
>>> +   all the initial spaces and thus to check the mnemonic.  This prevents
>>> +   it from matching one of the initial spaces.  */
>>> +/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
>>> +
>>> +/* Likewise for Thumb-1 sequences of instructions prior to the fix for
>>> +   this PR, which had the form:
>>> +
>>> +   ldr rS, <guard's address on the stack>
>>> +   <optional non ldr/cmp instructions>
>>> +   ldr rT, <PC-relative offset>
>>> +   <optional non ldr/cmp instructions>
>>> +   ldr rX, [rS, rT]
>>> +   <optional non ldr/cmp instructions>
>>> +   ldr rY, <canary value on the stack>
>>> +   ldr rZ, [rX]
>>> +   <optional non ldr/cmp instructions>
>>> +   cmp rY, rZ
>>> +   <optional non cmp instructions>
>>> +   bl __stack_chk_fail
>>> +
>>> +   Note on the regexp logic:
>>> +   The PC-relative offset is checked by looking for a source operand that
>>> +   does not contain [ or ].  */
>>> +/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */