From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
From: Bernd Edlinger
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org", Richard Earnshaw, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek
Subject: Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
Date: Fri, 02 Aug 2019 19:01:00 -0000
X-SW-Source: 2019-08/txt/msg00178.txt.bz2

On 8/2/19 3:11 PM, Richard Biener wrote:
> On Tue, 30 Jul 2019, Bernd Edlinger wrote:
>
>>
>> I have no test coverage for the movmisalign optab though, so I
>> rely on your code review for that part.
>
> It looks OK.  I tried to make it trigger on the following on
> i?86 with -msse2:
>
> typedef int v4si __attribute__((vector_size (16)));
>
> struct S { v4si v; } __attribute__((packed));
>
> v4si foo (struct S s)
> {
>   return s.v;
> }
>

Hmm, the entry_parm needs to be a MEM_P and an unaligned one.
So the test case could be made to trigger it this way:

typedef int v4si __attribute__((vector_size (16)));

struct S { v4si v; } __attribute__((packed));

int t;
v4si foo (struct S a, struct S b, struct S c, struct S d,
          struct S e, struct S f, struct S g, struct S h,
          int i, int j, int k, int l, int m, int n,
          int o, struct S s)
{
  t = o;
  return s.v;
}

However the code path is still not reached, since
targetm.slow_unaligned_access is always FALSE, which is probably a flaw
in my patch.

So I think,

+  else if (MEM_P (data->entry_parm)
+           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+              > MEM_ALIGN (data->entry_parm)
+           && targetm.slow_unaligned_access (promoted_nominal_mode,
+                                             MEM_ALIGN (data->entry_parm)))

should probably better be

+  else if (MEM_P (data->entry_parm)
+           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+              > MEM_ALIGN (data->entry_parm)
+           && (((icode = optab_handler (movmisalign_optab,
+                                        promoted_nominal_mode))
+                != CODE_FOR_nothing)
+               || targetm.slow_unaligned_access (promoted_nominal_mode,
+                                                 MEM_ALIGN (data->entry_parm))))

Right?

Then the modified test case would use the movmisalign optab.
However nothing changes in the end, since the i386 back-end is used to
working around the middle end not using the movmisalign optab when it
should do so.

I wonder if I should try to add a gcc_checking_assert to the mov expand
patterns, asserting that the memory is properly aligned?

> but nowadays x86 seems to be happy with regular moves operating on
> unaligned memory, using unaligned moves where necessary.
>
> (insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ])
>         (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11 1229
>      {movv4si_internal}
>      (nil))
>
> and with GCC 4.8 we ended up with the following expansion which is
> also correct.
>
> (insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0)
>         (unspec:V16QI [
>                 (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32])
>             ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu}
>      (nil))
>
> So it seems it has been too long and I don't remember what is
> special with arm that it doesn't work... it possibly simply
> trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN which
> I think is OK-ish?
>

Yes, that is what Richard said as well.

>>>>> Similarly the very same issue should exist on x86_64 which is
>>>>> !STRICT_ALIGNMENT, it's just that the ABI seems to provide the
>>>>> appropriate alignment on the caller side.  So the STRICT_ALIGNMENT
>>>>> check is a wrong one.
>>>>>
>>>>
>>>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
>>>> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
>>>> is always 32-bit align on the DImode memory.  The x86_64 vector
>>>> instructions would look at MEM_ALIGN and do the right thing, yes?
>>>
>>> No, they need to use the movmisalign optab and end up with UNSPECs
>>> for example.
>> Ah, thanks, now I see.
>>
>>>> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
>>>> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
>>>> does not even have to look at MEM_ALIGN except in the movmisalign_optab,
>>>> right?
>>>
>>> Yes, I think we never loosened that.  Note that RTL expansion has to
>>> fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
>>> specifies that x86 is strict-align wrt vector modes.
>>>
>>
>> Yes, I agree, the code would be incorrect for x86 as well when the
>> movmisalign_optab is not used.  So I invoke the movmisalign optab if
>> available and if not fall back to extract_bit_field.  As in
>> assign_parm_setup_stack, assign_parm_setup_reg assumes that
>> data->promoted_mode != data->nominal_mode does not happen with
>> misaligned stack slots.
>>
>>
>> Attached is the v3 of my patch.
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
>>
>> Is it OK for trunk?
>
> Few comments.
>
> @@ -2274,8 +2274,6 @@ struct assign_parm_data_one
>    int partial;
>    BOOL_BITFIELD named_arg : 1;
>    BOOL_BITFIELD passed_pointer : 1;
> -  BOOL_BITFIELD on_stack : 1;
> -  BOOL_BITFIELD loaded_in_reg : 1;
>  };
>
>  /* A subroutine of assign_parms.  Initialize ALL.  */
>
> independently OK.
>
> @@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
>       ultimate type, don't use that slot after entry.  We'll make another
>       stack slot, if we need one.  */
>    if (stack_parm
> -      && ((STRICT_ALIGNMENT
> -           && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
> +      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
> +           && targetm.slow_unaligned_access (data->nominal_mode,
> +                                             MEM_ALIGN (stack_parm)))
>           || (data->nominal_type
>               && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
>               && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
>
> looks like something we should have as a separate commit as well.  It
> also looks obvious to me.
>

Okay, committed as two separate commits: r274023 & r274025

> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
>
>        did_conversion = true;
>      }
> +  else if (MEM_P (data->entry_parm)
> +           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +              > MEM_ALIGN (data->entry_parm)
>
> we arrive here by-passing
>
>       else if (need_conversion)
>         {
>           /* We did not have an insn to convert directly, or the sequence
>              generated appeared unsafe.  We must first copy the parm to a
>              pseudo reg, and save the conversion until after all
>              parameters have been moved.  */
>
>           int save_tree_used;
>           rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
>
>           emit_move_insn (tempreg, validated_mem);
>
> but this move instruction is invalid in the same way as the case
> you fix, no?
> So wouldn't it be better to do
>

We could do that, but I supposed that there must be a reason why
assign_parm_setup_stack gets away with the same:

  if (data->promoted_mode != data->nominal_mode)
    {
      /* Conversion is required.  */
      rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));

      emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));

So either some back-ends are too permissive with us, or there is a
reason why promoted_mode != nominal_mode does not happen together with
an unaligned entry_parm.  In a way that would be a rather unusual ABI.

>   if (moved)
>     /* Nothing to do.  */
>     ;
>   else
>     {
>       if (unaligned)
>         ...
>       else
>         emit_move_insn (...);
>
>       if (need_conversion)
>         ....
>     }
>
> ?  Hopefully whatever "moved" things in the if (moved) case did
> it correctly.
>

It wouldn't.  It uses gen_extend_insn; would that be expected to work
with unaligned memory?

> Can you check whether your patch does anything to the x86 testcase
> posted above?
>

Thanks, it might help to have at least a test case where the pattern is
expanded, even if it does not change anything.

> I'm not very familiar with this code so I'm leaving actual approval
> to somebody else.  Still hope the comments were helpful.
>

Yes they are, thanks a lot.

Thanks
Bernd.