From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5810A6D0.7000808@foss.arm.com>
Date: Wed, 26 Oct 2016 12:51:00 -0000
From: Kyrill Tkachov
To: "Andre Vieira (lists)", gcc-patches@gcc.gnu.org
Subject: Re: [PATCHv2 4/7, GCC, ARM, V8M] ARMv8-M Security Extension's cmse_nonsecure_entry: clear registers
References: <5796116C.6010100@arm.com> <579612EE.3050606@arm.com> <57BD7E8B.4000108@arm.com> <580F885D.6030107@arm.com>
In-Reply-To: <580F885D.6030107@arm.com>
X-SW-Source: 2016-10/txt/msg02108.txt.bz2

Hi Andre,

On 25/10/16 17:29, Andre Vieira (lists) wrote:
> On 24/08/16 12:01, Andre Vieira (lists) wrote:
>> On 25/07/16 14:23, Andre Vieira (lists) wrote:
>>> This patch extends
>>> support for the ARMv8-M Security Extensions
>>> 'cmse_nonsecure_entry' attribute to safeguard against leak of
>>> information through unbanked registers.
>>>
>>> When returning from a nonsecure entry function we clear all caller-saved
>>> registers that are not used to pass return values, by writing either the
>>> LR, in case of general purpose registers, or the value 0, in case of FP
>>> registers. We use the LR to write to APSR and FPSCR too. We currently do
>>> not support entry functions that pass arguments or return variables on
>>> the stack and we diagnose this. This patch relies on the existing code
>>> to make sure callee-saved registers used in cmse_nonsecure_entry
>>> functions are saved and restored thus retaining their nonsecure mode
>>> value, this should be happening already as it is required by AAPCS.
>>>
>>> This patch also clears padding bits for cmse_nonsecure_entry functions
>>> with struct and union return types. For unions a bit is only considered
>>> a padding bit if it is an unused bit in every field of that union. The
>>> function that calculates these is used in a later patch to do the same
>>> for arguments of cmse_nonsecure_call's.
>>>
>>> *** gcc/ChangeLog ***
>>> 2016-07-25  Andre Vieira
>>>             Thomas Preud'homme
>>>
>>>     * config/arm/arm.c (output_return_instruction): Clear
>>>     registers.
>>>     (thumb2_expand_return): Likewise.
>>>     (thumb1_expand_epilogue): Likewise.
>>>     (thumb_exit): Likewise.
>>>     (arm_expand_epilogue): Likewise.
>>>     (cmse_nonsecure_entry_clear_before_return): New.
>>>     (comp_not_to_clear_mask_str_un): New.
>>>     (compute_not_to_clear_mask): New.
>>>     * config/arm/thumb1.md (*epilogue_insns): Change length attribute.
>>>     * config/arm/thumb2.md (*thumb2_return): Likewise.
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>> 2016-07-25  Andre Vieira
>>>             Thomas Preud'homme
>>>
>>>     * gcc.target/arm/cmse/cmse.exp: Test different multilibs separate.
>>>     * gcc.target/arm/cmse/struct-1.c: New.
>>>     * gcc.target/arm/cmse/bitfield-1.c: New.
>>>     * gcc.target/arm/cmse/bitfield-2.c: New.
>>>     * gcc.target/arm/cmse/bitfield-3.c: New.
>>>     * gcc.target/arm/cmse/baseline/cmse-2.c: Test that registers are
>>>     cleared.
>>>     * gcc.target/arm/cmse/mainline/soft/cmse-5.c: New.
>>>     * gcc.target/arm/cmse/mainline/hard/cmse-5.c: New.
>>>     * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: New.
>>>     * gcc.target/arm/cmse/mainline/softfp/cmse-5.c: New.
>>>     * gcc.target/arm/cmse/mainline/softfp-sp/cmse-5.c: New.
>>>
>> Updated this patch to correctly clear only the cumulative
>> exception-status (0-4, 7) and the condition code bits (28-31) of the
>> FPSCR. I also adapted the code to handle the bigger floating point
>> register files.
>>
>> ----
>>
>> This patch extends support for the ARMv8-M Security Extensions
>> 'cmse_nonsecure_entry' attribute to safeguard against leak of
>> information through unbanked registers.
>>
>> When returning from a nonsecure entry function we clear all caller-saved
>> registers that are not used to pass return values, by writing either the
>> LR, in case of general purpose registers, or the value 0, in case of FP
>> registers. We use the LR to write to APSR. For FPSCR we clear only the
>> cumulative exception-status (0-4, 7) and the condition code bits
>> (28-31). We currently do not support entry functions that pass arguments
>> or return variables on the stack and we diagnose this. This patch relies
>> on the existing code to make sure callee-saved registers used in
>> cmse_nonsecure_entry functions are saved and restored, thus retaining
>> their nonsecure mode value; this should be happening already as it is
>> required by the AAPCS.
>>
>> This patch also clears padding bits for cmse_nonsecure_entry functions
>> with struct and union return types. For unions a bit is only considered
>> a padding bit if it is an unused bit in every field of that union. The
>> function that calculates these is used in a later patch to do the same
>> for arguments of cmse_nonsecure_call's.
>>
>> *** gcc/ChangeLog ***
>> 2016-07-xx  Andre Vieira
>>             Thomas Preud'homme
>>
>>     * config/arm/arm.c (output_return_instruction): Clear
>>     registers.
>>     (thumb2_expand_return): Likewise.
>>     (thumb1_expand_epilogue): Likewise.
>>     (thumb_exit): Likewise.
>>     (arm_expand_epilogue): Likewise.
>>     (cmse_nonsecure_entry_clear_before_return): New.
>>     (comp_not_to_clear_mask_str_un): New.
>>     (compute_not_to_clear_mask): New.
>>     * config/arm/thumb1.md (*epilogue_insns): Change length attribute.
>>     * config/arm/thumb2.md (*thumb2_return): Duplicate pattern for
>>     cmse_nonsecure_entry functions.
>>
>> *** gcc/testsuite/ChangeLog ***
>> 2016-07-xx  Andre Vieira
>>             Thomas Preud'homme
>>
>>     * gcc.target/arm/cmse/cmse.exp: Test different multilibs separate.
>>     * gcc.target/arm/cmse/struct-1.c: New.
>>     * gcc.target/arm/cmse/bitfield-1.c: New.
>>     * gcc.target/arm/cmse/bitfield-2.c: New.
>>     * gcc.target/arm/cmse/bitfield-3.c: New.
>>     * gcc.target/arm/cmse/baseline/cmse-2.c: Test that registers are
>>     cleared.
>>     * gcc.target/arm/cmse/mainline/soft/cmse-5.c: New.
>>     * gcc.target/arm/cmse/mainline/hard/cmse-5.c: New.
>>     * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: New.
>>     * gcc.target/arm/cmse/mainline/softfp/cmse-5.c: New.
>>     * gcc.target/arm/cmse/mainline/softfp-sp/cmse-5.c: New.
>>
> Hi,
>
> Rebased previous patch on top of trunk as requested. No changes to
> ChangeLog.
>
> Cheers,
> Andre

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bb81e5662e81a26c7d3ccf9f749e8e356e6de35e..c6260323ecfd2f2842e6a5aab06b67da16619c73 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17496,6 +17496,279 @@ note_invalid_constants (rtx_insn *insn, HOST_WIDE_INT address, int do_pushes)
   return;
 }
 
+/* This function computes the clear mask and PADDING_BITS_TO_CLEAR for structs
+   and unions in the context of ARMv8-M Security Extensions.  It is used as a
+   helper function for both 'cmse_nonsecure_call' and 'cmse_nonsecure_entry'
+   functions.
+   The PADDING_BITS_TO_CLEAR pointer can be the base to either one
+   or four masks, depending on whether it is being computed for a
+   'cmse_nonsecure_entry' return value or a 'cmse_nonsecure_call' argument
+   respectively.  The tree for the type of the argument or a field within an
+   argument is passed in ARG_TYPE, the current register this argument or field
+   starts in is kept in the pointer REGNO and updated accordingly, the bit this
+   argument or field starts at is passed in STARTING_BIT and the last used bit
+   is kept in LAST_USED_BIT which is also updated accordingly.  */
+
+static unsigned HOST_WIDE_INT
+comp_not_to_clear_mask_str_un (tree arg_type, int * regno,
+			       uint32_t * padding_bits_to_clear,
+			       unsigned starting_bit, int * last_used_bit)
+
+{
+  unsigned HOST_WIDE_INT not_to_clear_reg_mask = 0;
+
+  if (TREE_CODE (arg_type) == RECORD_TYPE)
+    {
+      unsigned current_bit = starting_bit;
+      tree field;
+      long int offset, size;
+
+      field = TYPE_FIELDS (arg_type);
+      while (field)
+	{
+	  /* The offset within a structure is always an offset from
+	     the start of that structure.  Make sure we take that into the
+	     calculation of the register based offset that we use here.  */
+	  offset = starting_bit;
+	  offset += TREE_INT_CST_ELT (DECL_FIELD_BIT_OFFSET (field), 0);
+	  offset %= 32;
+
+	  /* This is the actual size of the field, for bitfields this is the
+	     bitfield width and not the container size.  */
+	  size = TREE_INT_CST_ELT (DECL_SIZE (field), 0);
+
+	  if (*last_used_bit != offset)
+	    {
+	      if (offset < *last_used_bit)
+		{
+		  /* This field's offset is before the 'last_used_bit', that
+		     means this field goes on the next register.  So we need to
+		     pad the rest of the current register and increase the
+		     register number.
+		     */
+		  uint32_t mask;
+		  mask = UINT32_MAX - ((uint32_t) 1 << *last_used_bit);
+		  mask++;
+
+		  *(padding_bits_to_clear + *regno) |= mask;

padding_bits_to_clear[*regno] |= mask;

+		  not_to_clear_reg_mask |= HOST_WIDE_INT_1U << *regno;
+		  (*regno)++;
+		}
+	      else
+		{
+		  /* Otherwise we pad the bits between the last field's end and
+		     the start of the new field.  */
+		  uint32_t mask;
+
+		  mask = UINT32_MAX >> (32 - offset);
+		  mask -= ((uint32_t) 1 << *last_used_bit) - 1;
+		  *(padding_bits_to_clear + *regno) |= mask;

Likewise.

+		}
+	      current_bit = offset;
+	    }
+
+	  /* Calculate further padding bits for inner structs/unions too.  */
+	  if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)))
+	    {
+	      *last_used_bit = current_bit;
+	      not_to_clear_reg_mask
+		|= comp_not_to_clear_mask_str_un (TREE_TYPE (field), regno,
+						  padding_bits_to_clear, offset,
+						  last_used_bit);
+	    }
+	  else
+	    {
+	      /* Update 'current_bit' with this field's size.  If the
+		 'current_bit' lies in a subsequent register, update 'regno' and
+		 reset 'current_bit' to point to the current bit in that new
+		 register.  */
+	      current_bit += size;
+	      while (current_bit >= 32)
+		{
+		  current_bit -= 32;
+		  not_to_clear_reg_mask |= HOST_WIDE_INT_1U << *regno;
+		  (*regno)++;
+		}
+	      *last_used_bit = current_bit;
+	    }
+
+	  field = TREE_CHAIN (field);
+	}
+      not_to_clear_reg_mask |= HOST_WIDE_INT_1U << *regno;
+    }
+  else if (TREE_CODE (arg_type) == UNION_TYPE)
+    {
+      tree field, field_t;
+      int i, regno_t, field_size;
+      int max_reg = -1;
+      int max_bit = -1;
+      uint32_t mask;
+      uint32_t padding_bits_to_clear_res[NUM_ARG_REGS]
+	= {UINT32_MAX, UINT32_MAX, UINT32_MAX, UINT32_MAX};
+
+      /* To compute the padding bits in a union we only consider bits as
+	 padding bits if they are always either a padding bit or fall outside a
+	 fields size for all fields in the union.
+	 */
+      field = TYPE_FIELDS (arg_type);
+      while (field)
+	{
+	  uint32_t padding_bits_to_clear_t[NUM_ARG_REGS]
+	    = {0U, 0U, 0U, 0U};
+	  int last_used_bit_t = *last_used_bit;
+	  regno_t = *regno;
+	  field_t = TREE_TYPE (field);
+
+	  /* If the field's type is either a record or a union make sure to
+	     compute their padding bits too.  */
+	  if (RECORD_OR_UNION_TYPE_P (field_t))
+	    not_to_clear_reg_mask
+	      |= comp_not_to_clear_mask_str_un (field_t, &regno_t,
+						&padding_bits_to_clear_t[0],
+						starting_bit, &last_used_bit_t);
+	  else
+	    {
+	      field_size = TREE_INT_CST_ELT (DECL_SIZE (field), 0);
+	      regno_t = (field_size / 32) + *regno;
+	      last_used_bit_t = (starting_bit + field_size) % 32;
+	    }
+
+	  for (i = *regno; i < regno_t; i++)
+	    {
+	      /* For all but the last register used by this field only keep the
+		 padding bits that were padding bits in this field.  */
+	      padding_bits_to_clear_res[i] &= padding_bits_to_clear_t[i];
+	    }
+
+	  /* For the last register, keep all padding bits that were padding
+	     bits in this field and any padding bits that are still valid
+	     as padding bits but fall outside of this field's size.  */
+	  mask = (UINT32_MAX - ((uint32_t) 1 << last_used_bit_t)) + 1;
+	  padding_bits_to_clear_res[regno_t]
+	    &= padding_bits_to_clear_t[regno_t] | mask;
+
+	  /* Update the maximum size of the fields in terms of registers used
+	     ('max_reg') and the 'last_used_bit' in said register.  */
+	  if (max_reg < regno_t)
+	    {
+	      max_reg = regno_t;
+	      max_bit = last_used_bit_t;
+	    }
+	  else if (max_reg == regno_t && max_bit < last_used_bit_t)
+	    max_bit = last_used_bit_t;
+
+	  field = TREE_CHAIN (field);
+	}
+
+      /* Update the current padding_bits_to_clear using the intersection of the
+	 padding bits of all the fields.  */
+      for (i=*regno; i < max_reg; i++)
+	padding_bits_to_clear[i] |= padding_bits_to_clear_res[i];

watch the spacing in the 'for' definition.

+      /* Do not keep trailing padding bits, we do not know yet whether this
+	 is the end of the argument.
+	 */
+      mask = ((uint32_t) 1 << max_bit) - 1;
+      padding_bits_to_clear[max_reg]
+	|= padding_bits_to_clear_res[max_reg] & mask;
+
+      *regno = max_reg;
+      *last_used_bit = max_bit;
+    }
+  else
+    /* This function should only be used for structs and unions.  */
+    gcc_unreachable ();
+
+  return not_to_clear_reg_mask;
+}
+
+/* In the context of ARMv8-M Security Extensions, this function is used for
+   both 'cmse_nonsecure_call' and 'cmse_nonsecure_entry' functions to compute
+   what registers are used when returning or passing arguments, which is then
+   returned as a mask.  It will also compute a mask to indicate padding/unused
+   bits for each of these registers, and passes this through the
+   PADDING_BITS_TO_CLEAR pointer.  The tree of the argument type is passed in
+   ARG_TYPE, the rtl representation of the argument is passed in ARG_RTX and
+   the starting register used to pass this argument or return value is passed
+   in REGNO.  It makes use of 'comp_not_to_clear_mask_str_un' to compute these
+   for struct and union types.  */
+
+static unsigned HOST_WIDE_INT
+compute_not_to_clear_mask (tree arg_type, rtx arg_rtx, int regno,
+			   uint32_t * padding_bits_to_clear)
+
+{
+  int last_used_bit = 0;
+  unsigned HOST_WIDE_INT not_to_clear_mask;
+
+  if (RECORD_OR_UNION_TYPE_P (arg_type))
+    {
+      not_to_clear_mask
+	= comp_not_to_clear_mask_str_un (arg_type, &regno,
+					 padding_bits_to_clear, 0,
+					 &last_used_bit);
+
+      /* If the 'last_used_bit' is not zero, that means we are still using a
+	 part of the last 'regno'.  In such cases we must clear the trailing
+	 bits.  Otherwise we are not using regno and we should mark it as to
+	 clear.  */
+      if (last_used_bit != 0)
+	*(padding_bits_to_clear + regno)
+	  |= UINT32_MAX - ((uint32_t) 1 << last_used_bit) + 1;

padding_bits_to_clear[regno] |= ...

+      else
+	not_to_clear_mask &= ~(HOST_WIDE_INT_1U << regno);
+    }
+  else
+    {
+      not_to_clear_mask = 0;
+      /* We are not dealing with structs nor unions.  So these arguments may
+	 be passed in floating point registers too.
+	 In some cases a BLKmode is used when returning or passing arguments
+	 in multiple VFP registers.  */
+      if (GET_MODE (arg_rtx) == BLKmode)
+	{
+	  int i, arg_regs;
+	  rtx reg;
+
+	  /* This should really only occur when dealing with the hard-float
+	     ABI.  */
+	  gcc_assert (TARGET_HARD_FLOAT_ABI);
+
+	  for (i = 0; i < XVECLEN (arg_rtx, 0); i++)
+	    {
+	      reg = XEXP (XVECEXP (arg_rtx, 0, i), 0);
+	      gcc_assert (REG_P (reg));
+
+	      not_to_clear_mask |= HOST_WIDE_INT_1U << REGNO (reg);
+
+	      /* If we are dealing with DF mode, make sure we don't
+		 clear either of the registers it addresses.  */
+	      arg_regs = ARM_NUM_REGS (GET_MODE (reg));

Better assert here that you're indeed dealing with DFmode and/or you have 2 registers.

+/* Clear caller saved registers not used to pass return values and leaked
+   condition flags before exiting a cmse_nonsecure_entry function.  */
+
+void
+cmse_nonsecure_entry_clear_before_return (void)
+{
+  uint64_t to_clear_mask[2];
+  uint32_t padding_bits_to_clear = 0;
+  uint32_t * padding_bits_to_clear_ptr = &padding_bits_to_clear;
+  int regno, maxregno = IP_REGNUM;
+  tree result_type;
+  rtx result_rtl;
+
+  to_clear_mask[0] = (1ULL << (NUM_ARG_REGS)) - 1;
+  to_clear_mask[0] |= (1ULL << IP_REGNUM);
+
+  /* If we are not dealing with -mfloat-abi=soft we will need to clear VFP
+     registers.  We also check TARGET_HARD_FLOAT to make sure these are
+     present.  */
+  if (TARGET_HARD_FLOAT)
+    {
+      uint64_t float_mask = (1ULL << (D7_VFP_REGNUM + 1)) - 1;
+      maxregno = LAST_VFP_REGNUM;
+
+      float_mask &= ~((1ULL << FIRST_VFP_REGNUM) - 1);
+      to_clear_mask[0] |= float_mask;
+
+      float_mask = (1ULL << (maxregno - 63)) - 1;
+      to_clear_mask[1] = float_mask;
+
+      /* Make sure we dont clear the two scratch registers used to clear the
+	 relevant FPSCR bits in output_return_instruction.  We have only
+	 implemented the clearing of FP registers for Thumb-2, so we assert
+	 here that VFP was not enabled for Thumb-1 ARMv8-M targets.
+	 */
+      gcc_assert (arm_arch_thumb2);
+      emit_use (gen_rtx_REG (SImode, IP_REGNUM));
+      to_clear_mask[0] &= ~(1ULL << IP_REGNUM);
+      emit_use (gen_rtx_REG (SImode, 4));
+      to_clear_mask[0] &= ~(1ULL << 4);
+    }
+
+  /* If the user has defined registers to be caller saved, these are no longer
+     restored by the function before returning and must thus be cleared for
+     security purposes.  */
+  for (regno = NUM_ARG_REGS; regno < LAST_VFP_REGNUM; regno++)
+    {
+      /* We do not touch registers that can be used to pass arguments as per
+	 the AAPCS, since these should never be made callee-saved by user
+	 options.  */
+      if (regno >= FIRST_VFP_REGNUM && regno <= D7_VFP_REGNUM)
+	continue;
+      if (regno >= IP_REGNUM && regno <= PC_REGNUM)
+	continue;

Please use IN_RANGE.

+      if (call_used_regs[regno])
+	to_clear_mask[regno / 64] |= (1ULL << (regno % 64));
+    }
+
+  /* Make sure we do not clear the registers used to return the result in.  */
+  result_type = TREE_TYPE (DECL_RESULT (current_function_decl));
+  if (!VOID_TYPE_P (result_type))
+    {
+      result_rtl = arm_function_value (result_type, current_function_decl, 0);
+
+      /* No need to check that we return in registers, because we don't
+	 support returning on stack yet.  */
+      to_clear_mask[0]
+	&= ~compute_not_to_clear_mask (result_type, result_rtl, 0,
+				       padding_bits_to_clear_ptr);
+    }
+
+  if (padding_bits_to_clear != 0)
+    {
+      rtx reg_rtx;
+      /* Padding bits to clear is not 0 so we know we are dealing with
+	 returning a composite type, which only uses r0.  Let's make sure that
+	 r1-r3 is cleared too, we will use r1 as a scratch register.  */
+      gcc_assert ((to_clear_mask[0] & 0xe) == 0xe);
+
+      reg_rtx = gen_rtx_REG (SImode, 1);
+

Use R1_REGNUM.

+      /* Fill the lower half of the negated padding_bits_to_clear.  */
+      emit_move_insn (reg_rtx,
+		      GEN_INT ((((~padding_bits_to_clear) << 16u) >> 16u)));
+
+      /* Also fill the top half of the negated padding_bits_to_clear.
+	 */
+      if (((~padding_bits_to_clear) >> 16) > 0)
+	emit_insn (gen_rtx_SET (gen_rtx_ZERO_EXTRACT (SImode, reg_rtx,
+						      GEN_INT (16),
+						      GEN_INT (16)),
+			        GEN_INT ((~padding_bits_to_clear) >> 16)));
+
+      emit_insn (gen_andsi3 (gen_rtx_REG (SImode, 0),
+			     gen_rtx_REG (SImode, 0),
+			     reg_rtx));

Likewise, use R0_REGNUM.

This is ok with the changes above.

Thanks,
Kyrill