From: Bernd Edlinger
To: Richard Biener
CC: gcc-patches@gcc.gnu.org, Richard Earnshaw, Ramana Radhakrishnan,
 Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek
Subject: Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
Date: Wed, 14 Aug 2019 22:26:00 -0000

On 8/14/19 2:00 PM, Richard Biener wrote:
> On Thu, 8 Aug 2019, Bernd Edlinger wrote:
>
>> On 8/2/19 9:01 PM, Bernd Edlinger wrote:
>>> On 8/2/19 3:11 PM, Richard Biener wrote:
>>>> On Tue, 30 Jul 2019, Bernd Edlinger wrote:
>>>>
>>>>>
>>>>> I have no test coverage for the movmisalign optab though, so I
>>>>> rely on your code review for that part.
>>>>
>>>> It looks OK.  I tried to make it trigger on the following on
>>>> i?86 with -msse2:
>>>>
>>>> typedef int v4si __attribute__((vector_size (16)));
>>>>
>>>> struct S { v4si v; } __attribute__((packed));
>>>>
>>>> v4si foo (struct S s)
>>>> {
>>>>   return s.v;
>>>> }
>>>>
>>>
>>> Hmm, the entry_parm needs to be a MEM_P and an unaligned one.
>>> So the test case could be made to trigger it this way:
>>>
>>> typedef int v4si __attribute__((vector_size (16)));
>>>
>>> struct S { v4si v; } __attribute__((packed));
>>>
>>> int t;
>>> v4si foo (struct S a, struct S b, struct S c, struct S d,
>>>           struct S e, struct S f, struct S g, struct S h,
>>>           int i, int j, int k, int l, int m, int n,
>>>           int o, struct S s)
>>> {
>>>   t = o;
>>>   return s.v;
>>> }
>>>
>>
>> Ah, I realized that there are already a couple of very similar
>> test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c,
>> gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c,
>> which also manage to execute the movmisalign code with the latest patch
>> version.  So I thought that it is not necessary to add another one.
>>
>>> However the code path is still not reached, since
>>> targetm.slow_unaligned_access is always FALSE, which is probably
>>> a flaw in my patch.
>>>
>>> So I think,
>>>
>>> +      else if (MEM_P (data->entry_parm)
>>> +               && GET_MODE_ALIGNMENT (promoted_nominal_mode)
>>> +                  > MEM_ALIGN (data->entry_parm)
>>> +               && targetm.slow_unaligned_access (promoted_nominal_mode,
>>> +                                                 MEM_ALIGN (data->entry_parm)))
>>>
>>> should probably better be
>>>
>>> +      else if (MEM_P (data->entry_parm)
>>> +               && GET_MODE_ALIGNMENT (promoted_nominal_mode)
>>> +                  > MEM_ALIGN (data->entry_parm)
>>> +               && (((icode = optab_handler (movmisalign_optab,
>>> +                                            promoted_nominal_mode))
>>> +                    != CODE_FOR_nothing)
>>> +                   || targetm.slow_unaligned_access (promoted_nominal_mode,
>>> +                                                     MEM_ALIGN (data->entry_parm))))
>>>
>>> Right?
>>>
>>> Then the modified test case would use the movmisalign optab.
>>> However nothing changes in the end, since the i386 back-end is used
>>> to work around the middle-end not using the movmisalign optab when
>>> it should do so.
>>>
>>
>> I prefer the second form of the check, as it offers more test coverage,
>> and is probably more correct than the former.
>>
>> Note there are more variations of this misalign check in expr.c,
>> some are somehow odd, like the expansion of MEM_REF and VIEW_CONVERT_EXPR:
>>
>>              && mode != BLKmode
>>              && align < GET_MODE_ALIGNMENT (mode))
>>            {
>>              if ((icode = optab_handler (movmisalign_optab, mode))
>>                  != CODE_FOR_nothing)
>>                [...]
>>              else if (targetm.slow_unaligned_access (mode, align))
>>                temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
>>                                          0, TYPE_UNSIGNED (TREE_TYPE (exp)),
>>                                          (modifier == EXPAND_STACK_PARM
>>                                           ? NULL_RTX : target),
>>                                          mode, mode, false, alt_rtl);
>>
>> I wonder if they are correct this way: why shouldn't we use the
>> movmisalign optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESS ?
>
> Doesn't the code do exactly this?  Prefer movmisalign over
> extract_bit_field?
>

Ah, yes.  How could I miss that.

>>
>>> I wonder if I should try to add a gcc_checking_assert to the mov expand
>>> patterns that the memory is properly aligned ?
>>>
>>
>> Wow, that was a really exciting bug-hunt with those assertions around...
>
> :)
>
>>>> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
>>>>
>>>>        did_conversion = true;
>>>>      }
>>>> +  else if (MEM_P (data->entry_parm)
>>>> +           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
>>>> +              > MEM_ALIGN (data->entry_parm)
>>>>
>>>> we arrive here by-passing
>>>>
>>>>   else if (need_conversion)
>>>>     {
>>>>       /* We did not have an insn to convert directly, or the sequence
>>>>          generated appeared unsafe.  We must first copy the parm to a
>>>>          pseudo reg, and save the conversion until after all
>>>>          parameters have been moved.  */
>>>>
>>>>       int save_tree_used;
>>>>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
>>>>
>>>>       emit_move_insn (tempreg, validated_mem);
>>>>
>>>> but this move instruction is invalid in the same way as the case
>>>> you fix, no?  So wouldn't it be better to do
>>>>
>>>
>>> We could do that, but I supposed that there must be a reason why
>>> assign_parm_setup_stack gets away with that same:
>>>
>>>   if (data->promoted_mode != data->nominal_mode)
>>>     {
>>>       /* Conversion is required.  */
>>>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
>>>
>>>       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
>>>
>>>
>>> So either some back-ends are too permissive with us,
>>> or there is a reason why promoted_mode != nominal_mode
>>> does not happen together with an unaligned entry_parm.
>>> In a way that would be a rather unusual ABI.
>>>
>>
>> To find out if that ever happens I added a couple of checking
>> assertions in the arm mov expand patterns.
>>
>> So far the assertions did (almost) always hold, so it is likely not
>> necessary to fiddle with all those naive move instructions here.
>>
>> So my gut feeling is: leave those places alone until there is a reason
>> for changing them.
>
> Works for me - I wonder if we should add those asserts to generic
> code (guarded with flag_checking) though.
>

Well, yes, but I was scared away by the complexity of emit_move_insn_1.

It could be done, but for the moment I would be happy to have these
checks on one major strict-alignment target.  ARM is a good candidate,
since most instructions work even if they accidentally use unaligned
arguments, so middle-end errors are not always visible in ordinary
tests.  Nevertheless such a mismatch is a blatant violation of the
contract between middle-end and back-end, which should be avoided.

>> However the assertion in movsi triggered a couple of times in the
>> ada testsuite due to expand_builtin_init_descriptor using a
>> BLKmode MEM rtx, which is only 8-bit aligned.  So, I set the
>> ptr_mode alignment there explicitly.
>
> Looks good given we emit ptr_mode moves into it.  Thus OK independently.
>

Thanks, committed as r274487.

>> Several struct-layout-1.dg testcases tripped over misaligned
>> complex_cst constants, fixed by varasm.c (align_variable).
>> This is likely a wrong-code bug, because misaligned complex
>> constants are expanded to a misaligned MEM_REF, but the
>> expansion cannot handle misaligned constants, only packed
>> structure fields.
>
> Hmm.  So your patch overrides user alignment here.  Wouldn't it
> be better to do that more consciously by
>
>   if (! DECL_USER_ALIGN (decl)
>       || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>           && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
>
> ?  And why is the movmisalign optab support missing here?
>

Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl:

  /* If we can't trust the parm stack slot to be aligned enough for its
     ultimate type, don't use that slot after entry.  We'll make another
     stack slot, if we need one.  */
  if (stack_parm
      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
           && targetm.slow_unaligned_access (data->nominal_mode,
                                             MEM_ALIGN (stack_parm)))

which also makes a variable more aligned than it is declared.
But maybe both should also check the movmisalign optab in
addition to slow_unaligned_access ?
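
Concretely, the combined condition would look about like this (only a
sketch against the assign_parm_adjust_stack_rtl code quoted above,
reusing its variable names, and untested; whether the movmisalign check
really belongs here is exactly the open question):

  if (stack_parm
      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
           /* Also take a separate slot when the target has a pattern
              for misaligned moves, not only when plain unaligned
              accesses are slow.  */
           && (optab_handler (movmisalign_optab, data->nominal_mode)
                 != CODE_FOR_nothing
               || targetm.slow_unaligned_access (data->nominal_mode,
                                                 MEM_ALIGN (stack_parm))))
          [...]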
> IMHO whatever code later fails to properly use unaligned loads
> should be fixed instead rather than ignoring user requested alignment.
>
> Can you quote a short testcase that explains what exactly goes wrong?
> The struct-layout ones are awkward to look at...
>

Sure,

$ cat test.c
_Complex float __attribute__((aligned(1))) cf;

void foo (void)
{
  cf = 1.0i;
}

$ arm-linux-gnueabihf-gcc -S test.c
during RTL pass: expand
test.c: In function 'foo':
test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003
    5 |   cf = 1.0i;
      |   ~~~^~~~~~
0x7ba475 gen_movsf(rtx_def*, rtx_def*)
        ../../gcc-trunk/gcc/config/arm/arm.md:7003
0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
        ../../gcc-trunk/gcc/recog.h:318
0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
        ../../gcc-trunk/gcc/expr.c:3695
0xa49914 emit_move_insn(rtx_def*, rtx_def*)
        ../../gcc-trunk/gcc/expr.c:3791
0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
        ../../gcc-trunk/gcc/expr.c:3490
0xa49914 emit_move_insn(rtx_def*, rtx_def*)
        ../../gcc-trunk/gcc/expr.c:3791
0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
        ../../gcc-trunk/gcc/expr.c:5855
0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
        ../../gcc-trunk/gcc/expr.c:5441
0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
        ../../gcc-trunk/gcc/expr.c:4983
0x93396f expand_gimple_stmt_1
        ../../gcc-trunk/gcc/cfgexpand.c:3777
0x93396f expand_gimple_stmt
        ../../gcc-trunk/gcc/cfgexpand.c:3875
0x9392e1 expand_gimple_basic_block
        ../../gcc-trunk/gcc/cfgexpand.c:5915
0x93b046 execute
        ../../gcc-trunk/gcc/cfgexpand.c:6538
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Without the hunk in varasm.c of course.

What happens is that expand_expr_real_2 returns an unaligned mem_ref here:

    case COMPLEX_CST:
      /* Handle evaluating a complex constant in a CONCAT target.  */
      if (original_target && GET_CODE (original_target) == CONCAT)
        {
          [... this path not taken ...]
        }

      /* fall through */

    case STRING_CST:
      temp = expand_expr_constant (exp, 1, modifier);

      /* temp contains a constant address.
         On RISC machines where a constant address isn't valid,
         make some insns to get that address into a register.  */
      if (modifier != EXPAND_CONST_ADDRESS
          && modifier != EXPAND_INITIALIZER
          && modifier != EXPAND_SUM
          && ! memory_address_addr_space_p (mode, XEXP (temp, 0),
                                            MEM_ADDR_SPACE (temp)))
        return replace_equiv_address (temp,
                                      copy_rtx (XEXP (temp, 0)));
      return temp;

The result of expand_expr_real (..., EXPAND_NORMAL) ought to be usable
by emit_move_insn; that is expected just *everywhere* and can't be
changed.

This could probably be fixed in an ugly way in the COMPLEX_CST handler,
but OTOH I don't see any reason why this constant has to be misaligned
when it can be easily aligned, which avoids the need for a misaligned
access.

>> Furthermore gcc.dg/Warray-bounds-33.c was fixed by the
>> change in expr.c (expand_expr_real_1).  Certainly it is invalid
>> to read memory at a function address, but it should not ICE.
>> The problem here is that the MEM_REF has no valid MEM_ALIGN: it looks
>> like A32, so the misaligned code path is not taken, but the alignment
>> is set to A8 below, and then we hit an ICE if the result is used:
>
> So the user accessed it as A32.
>
>>   /* Don't set memory attributes if the base expression is
>>      SSA_NAME that got expanded as a MEM.  In that case, we should
>>      just honor its original memory attributes.  */
>>   if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0))
>>     set_mem_attributes (op0, exp, 0);
>
> Huh, I don't understand this.  'tem' should never be SSA_NAME.

tem is the result of get_inner_reference, why can't that be an SSA_NAME ?

> But set_mem_attributes_minus_bitpos uses get_object_alignment_1
> and that has special treatment for FUNCTION_DECLs that is not
> covered by
>
>           /* When EXP is an actual memory reference then we can use
>              TYPE_ALIGN of a pointer indirection to derive alignment.
>              Do so only if get_pointer_alignment_1 did not reveal absolute
>              alignment knowledge and if using that alignment would
>              improve the situation.  */
>           unsigned int talign;
>           if (!addr_p && !known_alignment
>               && (talign = min_align_of_type (TREE_TYPE (exp)) * BITS_PER_UNIT)
>               && talign > align)
>             align = talign;
>
> which could be moved out of the if-cascade.
>
> That said, setting A8 should eventually result into appropriate
> unaligned expansion, so it seems odd this triggers the assert...
>

The function pointer is really 32-bit aligned in ARM mode to start
with...

The problem is that the code that handles this misaligned access is
skipped, because the mem_rtx initially has no MEM_ATTRS and therefore
MEM_ALIGN == 32, so the path that handles the unaligned access is not
taken.  BUT before the mem_rtx is returned, it is set to MEM_ALIGN = 8
by set_mem_attributes, and we hit the assertion, because the result of
expand_expr_real (..., EXPAND_NORMAL) ought to be usable with
emit_move_insn.

>>
>> Finally gcc.dg/torture/pr48493.c required the change
>> in assign_parm_setup_stack.  This is just not using the
>> correct MEM_ALIGN attribute value, while the memory is
>> actually aligned.
>
> But doesn't
>
>           int align = STACK_SLOT_ALIGNMENT (data->passed_type,
>                                             GET_MODE (data->entry_parm),
>                                             TYPE_ALIGN (data->passed_type));
> +         if (align < (int) GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
> +             && targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
> +                                               align))
> +           align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
>
> hint at that STACK_SLOT_ALIGNMENT is simply bogus for the target?
> That is, the target says, for natural alignment 64 the stack slot
> alignment can only be guaranteed 32.  You can't then simply up it
> but have to use unaligned accesses (or the target/middle-end needs
> to do dynamic stack alignment).
>

Yes, maybe, but STACK_SLOT_ALIGNMENT is used in a few other places as
well, and none of them have a problem, probably because they use
expand_expr, but here we use emit_move_insn:

      if (MEM_P (src))
        {
          [...]
        }
      else
        {
          if (!REG_P (src))
            src = force_reg (GET_MODE (src), src);
          emit_move_insn (dest, src);
        }

So I could restrict that to

          if (!MEM_P (data->entry_parm)
              && align < (int) GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
              && ((optab_handler (movmisalign_optab,
                                  GET_MODE (data->entry_parm))
                   != CODE_FOR_nothing)
                  || targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
                                                    align)))
            align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));

But OTOH even for arguments arriving in unaligned stack slots where
emit_block_move could handle it, that would just work against the
intention of assign_parm_adjust_stack_rtl.

Of course there are limits to how much alignment assign_stack_local can
handle, and exceeding them would result in an assertion in
emit_move_insn.  But in the end, if that happens it is just an
impossible target configuration.
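
For reference, the shape of a testcase that reaches this spot is roughly
the following (a hypothetical reduction, not the actual
gcc.dg/torture/pr48493.c):

  /* Hypothetical reduction, not the real testcase: the typedef asks
     for more alignment than the target guarantees for the incoming
     stack slot, so MEM_ALIGN of the entry_parm slot can be below
     GET_MODE_ALIGNMENT of its mode, yet the copy out of the slot is
     done with emit_move_insn.  */
  typedef long long ll __attribute__ ((aligned (16)));

  ll g;

  void
  foo (int pad, ll x)
  {
    g = x;
  }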
>
>> Note that set_mem_attributes does not
>> always preserve the MEM_ALIGN of the ref, since:
>
> set_mem_attributes sets _all_ attributes from an expression or type.
>

Not really:

  refattrs = MEM_ATTRS (ref);
  if (refattrs)
    {
      /* ??? Can this ever happen?  Calling this routine on a MEM that
         already carries memory attributes should probably be invalid.  */
      [...]
      attrs.align = refattrs->align;
    }
  else
    [...]

  if (objectp || TREE_CODE (t) == INDIRECT_REF)
    attrs.align = MAX (attrs.align, TYPE_ALIGN (type));

>>   /* Default values from pre-existing memory attributes if present.  */
>>   refattrs = MEM_ATTRS (ref);
>>   if (refattrs)
>>     {
>>       /* ??? Can this ever happen?  Calling this routine on a MEM that
>>          already carries memory attributes should probably be invalid.  */
>>       attrs.expr = refattrs->expr;
>>       attrs.offset_known_p = refattrs->offset_known_p;
>>       attrs.offset = refattrs->offset;
>>       attrs.size_known_p = refattrs->size_known_p;
>>       attrs.size = refattrs->size;
>>       attrs.align = refattrs->align;
>>     }
>>
>> but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT
>> the MEM_ATTRS are zero, and a smaller alignment may result.
>
> Not sure what you are saying here.  That
>
>   set_mem_align (MEM:SI A32, 32)
>
> produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting
> the A32 but eventually computing sth lower?  Yeah, that's probably
> an interesting "hole" here.  I'm quite sure that if we'd do
>
>   refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref)
>                              : mem_mode_attrs[(int) GET_MODE (ref)];
>
> we run into issues exactly on strict-align targets ...
>

Yeah, that's scary...

>> Well, with those checks in place it should now be a lot harder to
>> generate invalid code on STRICT_ALIGNMENT targets without running
>> into an ICE.
>>
>> Attached is the latest version of my arm alignment patch.
>>
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
>> Is it OK for trunk?
>
> @@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
>
>        did_conversion = true;
>      }
> +  else if (MEM_P (data->entry_parm)
> +           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +              > MEM_ALIGN (data->entry_parm)
> +           && (((icode = optab_handler (movmisalign_optab,
> +                                        promoted_nominal_mode))
> +                != CODE_FOR_nothing)
> +               || targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                                 MEM_ALIGN (data->entry_parm))))
> +    {
> +      if (icode != CODE_FOR_nothing)
> +        emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
> +      else
> +        rtl = parmreg = extract_bit_field (validated_mem,
> +                        GET_MODE_BITSIZE (promoted_nominal_mode), 0,
> +                        unsignedp, parmreg,
> +                        promoted_nominal_mode, VOIDmode, false, NULL);
> +    }
>    else
>      emit_move_insn (parmreg, validated_mem);
>
> This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) /
> GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm)
> and promoted_nominal_mode.
>

Yes, the idea is just to save some cycles.  Since

  parmreg = gen_reg_rtx (promoted_nominal_mode);

we know that parmreg will also have that mode; plus
emit_move_insn (parmreg, validated_mem), which would be called here,
asserts that:

  gcc_assert (mode != BLKmode
              && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode));

so GET_MODE (validated_mem) == GET_MODE (parmreg) == promoted_nominal_mode.

I still like the current version with promoted_nominal_mode slightly
better, both because of performance and because of the 80-column
restriction. :)

> Not sure if it helps on its own.
>
> I'm nervous about the alignment one since I'm not at all familiar
> with this code...
>
> Thanks,
> Richard.
>

I will send a new version of the patch shortly.


Thanks
Bernd.