Date: Wed, 14 Aug 2019 11:56:00 -0000
From: Richard Biener
To: Bernd Edlinger
Cc: gcc-patches@gcc.gnu.org, Richard Earnshaw, Ramana Radhakrishnan,
    Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek
Subject: Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

On Fri, 2 Aug 2019, Bernd Edlinger wrote:

> On 8/2/19 3:11 PM, Richard Biener wrote:
> > On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> > 
> >> I have no test coverage for the movmisalign optab though, so I
> >> rely on your code review for that part.
> > 
> > It looks OK.  I tried to make it trigger on the following on
> > i?86 with -msse2:
> > 
> > typedef int v4si __attribute__((vector_size (16)));
> > 
> > struct S { v4si v; } __attribute__((packed));
> > 
> > v4si foo (struct S s)
> > {
> >   return s.v;
> > }
> 
> Hmm, the entry_parm needs to be a MEM_P and an unaligned one.
> So the test case could be made to trigger it this way:
> 
> typedef int v4si __attribute__((vector_size (16)));
> 
> struct S { v4si v; } __attribute__((packed));
> 
> int t;
> v4si foo (struct S a, struct S b, struct S c, struct S d,
>           struct S e, struct S f, struct S g, struct S h,
>           int i, int j, int k, int l, int m, int n,
>           int o, struct S s)
> {
>   t = o;
>   return s.v;
> }
> 
> However the code path is still not reached, since
> targetm.slow_unaligned_access is always FALSE, which is probably a
> flaw in my patch.
> 
> So I think,
> 
> +  else if (MEM_P (data->entry_parm)
> +           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +              > MEM_ALIGN (data->entry_parm)
> +           && targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                             MEM_ALIGN (data->entry_parm)))
> 
> should probably better be
> 
> +  else if (MEM_P (data->entry_parm)
> +           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +              > MEM_ALIGN (data->entry_parm)
> +           && (((icode = optab_handler (movmisalign_optab,
> +                                        promoted_nominal_mode))
> +                != CODE_FOR_nothing)
> +               || targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                                 MEM_ALIGN (data->entry_parm))))
> 
> Right?

Ah, yes.  So it's really the presence of a movmisalign optab that makes
unaligned moves a must, and if it is not present then
targetm.slow_unaligned_access tells whether we need to use the bitfield
extraction/insertion code instead.

> Then the modified test case would use the movmisalign optab.
> However nothing changes in the end, since the i386 back-end is used to
> work around the middle end not using the movmisalign optab when it
> should do so.

Yeah, in the past it would have failed though.
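For reference, a minimal stand-alone program (an illustration only, not
part of the patch) showing the alignment gap the test case relies on:
__attribute__((packed)) drops struct S to 1-byte alignment while the
V4SImode vector wants 16, which is what leaves the entry_parm slot
under-aligned:

#include <stdio.h>

typedef int v4si __attribute__((vector_size (16)));

struct S { v4si v; } __attribute__((packed));

int main (void)
{
  /* Prints 16 for the bare vector type but 1 for the packed struct,
     so a struct S argument slot need not be 16-byte aligned.  */
  printf ("%d %d\n", (int) __alignof__ (v4si), (int) __alignof__ (struct S));
  return 0;
}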
I wonder if movmisalign is still needed for x86...

> I wonder if I should try to add a gcc_checking_assert to the mov expand
> patterns that the memory is properly aligned?

I suppose gen* could add asserts that there is no movmisalign_optab that
would match when expanding a mov.  Perhaps it's enough to guard the
mov_optab use in emit_move_insn_1 that way?  Or even try movmisalign
there...
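Roughly something like the following in emit_move_insn_1 (just a sketch,
untested; x and y are the destination and source operands there):

  machine_mode mode = GET_MODE (x);
  /* If a movmisalign pattern exists for this mode, a plain mov should
     never be handed a MEM that is less aligned than the mode requires.  */
  gcc_checking_assert
    (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing
     || ((!MEM_P (x) || MEM_ALIGN (x) >= GET_MODE_ALIGNMENT (mode))
         && (!MEM_P (y) || MEM_ALIGN (y) >= GET_MODE_ALIGNMENT (mode))));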
> > but nowadays x86 seems to be happy with regular moves operating on
> > unaligned memory, using unaligned moves where necessary.
> > 
> > (insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ])
> >         (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11
> >      1229 {movv4si_internal}
> >      (nil))
> > 
> > and with GCC 4.8 we ended up with the following expansion which is
> > also correct.
> > 
> > (insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0)
> >         (unspec:V16QI [
> >                 (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32])
> >             ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu}
> >      (nil))
> > 
> > So it seems it has been too long and I don't remember what is
> > special with arm that it doesn't work... it possibly simply
> > trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN, which
> > I think is OK-ish?
> 
> Yes, that is what Richard said as well.
> 
> >>>>> Similarly the very same issue should exist on x86_64, which is
> >>>>> !STRICT_ALIGNMENT; it's just that the ABI seems to provide the
> >>>>> appropriate alignment on the caller side.  So the STRICT_ALIGNMENT
> >>>>> check is a wrong one.
> >>>> 
> >>>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT
> >>>> targets just use MEM_ALIGN to select the right instructions.
> >>>> MEM_ALIGN is always 32-bit alignment on the DImode memory.  The
> >>>> x86_64 vector instructions would look at MEM_ALIGN and do the
> >>>> right thing, yes?
> >>> 
> >>> No, they need to use the movmisalign optab and end up with UNSPECs
> >>> for example.
> >> 
> >> Ah, thanks, now I see.
> >> 
> >>>> It seems to be the definition of STRICT_ALIGNMENT targets that all
> >>>> RTL instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so
> >>>> the target does not even have to look at MEM_ALIGN except in the
> >>>> movmisalign_optab, right?
> >>> 
> >>> Yes, I think we never loosened that.  Note that RTL expansion has
> >>> to fix this up for them.  Note that strictly speaking
> >>> SLOW_UNALIGNED_ACCESS specifies that x86 is strict-align wrt
> >>> vector modes.
> >> 
> >> Yes, I agree; the code would be incorrect for x86 as well when the
> >> movmisalign_optab is not used.  So I invoke the movmisalign optab
> >> if available and if not fall back to extract_bit_field.  As in
> >> assign_parm_setup_stack, assign_parm_setup_reg assumes that
> >> data->promoted_mode != data->nominal_mode does not happen with
> >> misaligned stack slots.
> >> 
> >> Attached is the v3 of my patch.
> >> 
> >> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and
> >> arm-linux-gnueabihf.
> >> 
> >> Is it OK for trunk?
> > 
> > Few comments.
> > 
> > @@ -2274,8 +2274,6 @@ struct assign_parm_data_one
> >    int partial;
> >    BOOL_BITFIELD named_arg : 1;
> >    BOOL_BITFIELD passed_pointer : 1;
> > -  BOOL_BITFIELD on_stack : 1;
> > -  BOOL_BITFIELD loaded_in_reg : 1;
> >  };
> > 
> >  /* A subroutine of assign_parms.  Initialize ALL.  */
> > 
> > independently OK.
> > 
> > @@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
> >       ultimate type, don't use that slot after entry.  We'll make another
> >       stack slot, if we need one.  */
> >    if (stack_parm
> > -      && ((STRICT_ALIGNMENT
> > -           && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
> > +      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
> > +           && targetm.slow_unaligned_access (data->nominal_mode,
> > +                                             MEM_ALIGN (stack_parm)))
> >           || (data->nominal_type
> >               && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
> >               && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
> > 
> > looks like something we should have as a separate commit as well.  It
> > also looks obvious to me.
> 
> Okay, committed as two separate commits: r274023 & r274025.
> 
> > @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> > 
> >        did_conversion = true;
> >      }
> > +  else if (MEM_P (data->entry_parm)
> > +           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> > +              > MEM_ALIGN (data->entry_parm)
> > 
> > we arrive here by-passing
> > 
> >   else if (need_conversion)
> >     {
> >       /* We did not have an insn to convert directly, or the sequence
> >          generated appeared unsafe.  We must first copy the parm to a
> >          pseudo reg, and save the conversion until after all
> >          parameters have been moved.  */
> > 
> >       int save_tree_used;
> >       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> > 
> >       emit_move_insn (tempreg, validated_mem);
> > 
> > but this move instruction is invalid in the same way as the case
> > you fix, no?  So wouldn't it be better to do
> 
> We could do that, but I supposed that there must be a reason why
> assign_parm_setup_stack gets away with the same:
> 
>   if (data->promoted_mode != data->nominal_mode)
>     {
>       /* Conversion is required.  */
>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> 
>       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
> 
> So either some back-ends are too permissive with us,
> or there is a reason why promoted_mode != nominal_mode
> does not happen together with unaligned entry_parm.
> In a way that would be a rather unusual ABI.
> 
> >   if (moved)
> >     /* Nothing to do.  */
> >     ;
> >   else
> >     {
> >       if (unaligned)
> >         ...
> >       else
> >         emit_move_insn (...);
> > 
> >       if (need_conversion)
> >         ....
> >     }
> > 
> > ?  Hopefully whatever "moved" things in the if (moved) case did
> > it correctly.
> 
> It wouldn't.  It uses gen_extend_insn; would that be expected to
> work with unaligned memory?

No idea..

> > Can you check whether your patch does anything to the x86 testcase
> > posted above?
> 
> Thanks, it might help to have at least a test case where the pattern
> is expanded, even if it does not change anything.
> 
> > I'm not very familiar with this code so I'm leaving actual approval
> > to somebody else.  Still hope the comments were helpful.
> 
> Yes they are, thanks a lot.

Sorry for the slow response(s).

Richard.