Date: Wed, 14 Aug 2019 11:56:00 -0000
From: Richard Biener
To: Bernd Edlinger
Cc: gcc-patches@gcc.gnu.org, Richard Earnshaw, Ramana Radhakrishnan,
    Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek
Subject: Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

On Fri, 2 Aug 2019, Bernd Edlinger wrote:

> On 8/2/19 3:11 PM, Richard Biener wrote:
> > On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> > 
> >> I have no test coverage for the movmisalign optab though, so I
> >> rely on your code review for that part.
> > 
> > It looks OK.  I tried to make it trigger on the following on
> > i?86 with -msse2:
> > 
> > typedef int v4si __attribute__((vector_size (16)));
> > 
> > struct S { v4si v; } __attribute__((packed));
> > 
> > v4si foo (struct S s)
> > {
> >   return s.v;
> > }
> 
> Hmm, the entry_parm needs to be a MEM_P and an unaligned one.
> So the test case could be made to trigger it this way:
> 
> typedef int v4si __attribute__((vector_size (16)));
> 
> struct S { v4si v; } __attribute__((packed));
> 
> int t;
> v4si foo (struct S a, struct S b, struct S c, struct S d,
>           struct S e, struct S f, struct S g, struct S h,
>           int i, int j, int k, int l, int m, int n,
>           int o, struct S s)
> {
>   t = o;
>   return s.v;
> }
> 
> However the code path is still not reached, since
> targetm.slow_unaligned_access is always FALSE, which is probably a
> flaw in my patch.
> 
> So I think,
> 
> +  else if (MEM_P (data->entry_parm)
> +           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +              > MEM_ALIGN (data->entry_parm)
> +           && targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                             MEM_ALIGN (data->entry_parm)))
> 
> should probably better be
> 
> +  else if (MEM_P (data->entry_parm)
> +           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +              > MEM_ALIGN (data->entry_parm)
> +           && (((icode = optab_handler (movmisalign_optab,
> +                                        promoted_nominal_mode))
> +                != CODE_FOR_nothing)
> +               || targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                                 MEM_ALIGN (data->entry_parm))))
> 
> Right?

Ah, yes.  So it's really the presence of a movmisalign optab that makes
unaligned moves a must, and if it is not present then
targetm.slow_unaligned_access tells whether we need to use the bitfield
extraction/insertion code instead.

> Then the modified test case would use the movmisalign optab.
> However nothing changes in the end, since the i386 back-end is used to
> work around the middle end not using the movmisalign optab when it
> should do so.

Yeah, in the past it would have failed though.
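For reference, a minimal stand-alone program (an illustration only, not
part of the patch) showing the alignment gap the test case relies on:
__attribute__((packed)) drops struct S to 1-byte alignment while the
V4SImode vector wants 16, which is what leaves the entry_parm slot
under-aligned:

#include <stdio.h>

typedef int v4si __attribute__((vector_size (16)));

struct S { v4si v; } __attribute__((packed));

int main (void)
{
  /* Prints 16 for the bare vector type but 1 for the packed struct,
     so a struct S argument slot need not be 16-byte aligned.  */
  printf ("%d %d\n", (int) __alignof__ (v4si), (int) __alignof__ (struct S));
  return 0;
}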
I wonder if movmisalign is still needed for x86...

> I wonder if I should try to add a gcc_checking_assert to the mov expand
> patterns that the memory is properly aligned?

I suppose gen* could add asserts that there is no movmisalign_optab that
would match when expanding a mov.  Perhaps it's enough to guard the
mov_optab use in emit_move_insn_1 that way?  Or even try movmisalign
there...
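Roughly something like the following in emit_move_insn_1 (just a sketch,
untested; x and y are the destination and source operands there):

  machine_mode mode = GET_MODE (x);
  /* If a movmisalign pattern exists for this mode, a plain mov should
     never be handed a MEM that is less aligned than the mode requires.  */
  gcc_checking_assert
    (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing
     || ((!MEM_P (x) || MEM_ALIGN (x) >= GET_MODE_ALIGNMENT (mode))
         && (!MEM_P (y) || MEM_ALIGN (y) >= GET_MODE_ALIGNMENT (mode))));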
> > but nowadays x86 seems to be happy with regular moves operating on
> > unaligned memory, using unaligned moves where necessary.
> > 
> > (insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ])
> >         (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11
> >      1229 {movv4si_internal}
> >      (nil))
> > 
> > and with GCC 4.8 we ended up with the following expansion which is
> > also correct.
> > 
> > (insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0)
> >         (unspec:V16QI [
> >                 (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32])
> >             ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu}
> >      (nil))
> > 
> > So it seems it has been too long and I don't remember what is
> > special with arm that it doesn't work... it possibly simply
> > trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN, which
> > I think is OK-ish?
> 
> Yes, that is what Richard said as well.
> 
> >>>>> Similarly the very same issue should exist on x86_64, which is
> >>>>> !STRICT_ALIGNMENT; it's just that the ABI seems to provide the
> >>>>> appropriate alignment on the caller side.  So the STRICT_ALIGNMENT
> >>>>> check is a wrong one.
> >>>> 
> >>>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT
> >>>> targets just use MEM_ALIGN to select the right instructions.
> >>>> MEM_ALIGN is always 32-bit alignment on the DImode memory.  The
> >>>> x86_64 vector instructions would look at MEM_ALIGN and do the
> >>>> right thing, yes?
> >>> 
> >>> No, they need to use the movmisalign optab and end up with UNSPECs
> >>> for example.
> >> 
> >> Ah, thanks, now I see.
> >> 
> >>>> It seems to be the definition of STRICT_ALIGNMENT targets that all
> >>>> RTL instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so
> >>>> the target does not even have to look at MEM_ALIGN except in the
> >>>> movmisalign_optab, right?
> >>> 
> >>> Yes, I think we never loosened that.  Note that RTL expansion has
> >>> to fix this up for them.  Note that strictly speaking
> >>> SLOW_UNALIGNED_ACCESS specifies that x86 is strict-align wrt
> >>> vector modes.
> >> 
> >> Yes, I agree; the code would be incorrect for x86 as well when the
> >> movmisalign_optab is not used.  So I invoke the movmisalign optab
> >> if available and if not fall back to extract_bit_field.  As in
> >> assign_parm_setup_stack, assign_parm_setup_reg assumes that
> >> data->promoted_mode != data->nominal_mode does not happen with
> >> misaligned stack slots.
> >> 
> >> Attached is the v3 of my patch.
> >> 
> >> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and
> >> arm-linux-gnueabihf.
> >> 
> >> Is it OK for trunk?
> > 
> > Few comments.
> > 
> > @@ -2274,8 +2274,6 @@ struct assign_parm_data_one
> >    int partial;
> >    BOOL_BITFIELD named_arg : 1;
> >    BOOL_BITFIELD passed_pointer : 1;
> > -  BOOL_BITFIELD on_stack : 1;
> > -  BOOL_BITFIELD loaded_in_reg : 1;
> >  };
> > 
> >  /* A subroutine of assign_parms.  Initialize ALL.  */
> > 
> > independently OK.
> > 
> > @@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
> >       ultimate type, don't use that slot after entry.  We'll make another
> >       stack slot, if we need one.  */
> >    if (stack_parm
> > -      && ((STRICT_ALIGNMENT
> > -           && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
> > +      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
> > +           && targetm.slow_unaligned_access (data->nominal_mode,
> > +                                             MEM_ALIGN (stack_parm)))
> >           || (data->nominal_type
> >               && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
> >               && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
> > 
> > looks like something we should have as a separate commit as well.  It
> > also looks obvious to me.
> 
> Okay, committed as two separate commits: r274023 & r274025.
> 
> > @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> > 
> >        did_conversion = true;
> >      }
> > +  else if (MEM_P (data->entry_parm)
> > +           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> > +              > MEM_ALIGN (data->entry_parm)
> > 
> > we arrive here by-passing
> > 
> >   else if (need_conversion)
> >     {
> >       /* We did not have an insn to convert directly, or the sequence
> >          generated appeared unsafe.  We must first copy the parm to a
> >          pseudo reg, and save the conversion until after all
> >          parameters have been moved.  */
> > 
> >       int save_tree_used;
> >       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> > 
> >       emit_move_insn (tempreg, validated_mem);
> > 
> > but this move instruction is invalid in the same way as the case
> > you fix, no?  So wouldn't it be better to do
> 
> We could do that, but I supposed that there must be a reason why
> assign_parm_setup_stack gets away with the same:
> 
>   if (data->promoted_mode != data->nominal_mode)
>     {
>       /* Conversion is required.  */
>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> 
>       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
> 
> So either some back-ends are too permissive with us,
> or there is a reason why promoted_mode != nominal_mode
> does not happen together with unaligned entry_parm.
> In a way that would be a rather unusual ABI.
> 
> >   if (moved)
> >     /* Nothing to do.  */
> >     ;
> >   else
> >     {
> >       if (unaligned)
> >         ...
> >       else
> >         emit_move_insn (...);
> > 
> >       if (need_conversion)
> >         ....
> >     }
> > 
> > ?  Hopefully whatever "moved" things in the if (moved) case did
> > it correctly.
> 
> It wouldn't.  It uses gen_extend_insn; would that be expected to
> work with unaligned memory?

No idea..

> > Can you check whether your patch does anything to the x86 testcase
> > posted above?
> 
> Thanks, it might help to have at least a test case where the pattern
> is expanded, even if it does not change anything.
> 
> > I'm not very familiar with this code so I'm leaving actual approval
> > to somebody else.  Still hope the comments were helpful.
> 
> Yes they are, thanks a lot.

Sorry for the slow response(s).

Richard.