From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
From: Bernd Edlinger
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org", Richard Earnshaw, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek
Subject: Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
Date: Fri, 02 Aug 2019 19:01:00 -0000
X-SW-Source: 2019-08/txt/msg00178.txt.bz2

On 8/2/19 3:11 PM, Richard Biener wrote:
> On Tue, 30 Jul 2019, Bernd Edlinger wrote:
>
>>
>> I have no test coverage for the movmisalign optab though, so I
>> rely on your code review for that part.
>
> It looks OK.  I tried to make it trigger on the following on
> i?86 with -msse2:
>
> typedef int v4si __attribute__((vector_size (16)));
>
> struct S { v4si v; } __attribute__((packed));
>
> v4si foo (struct S s)
> {
>   return s.v;
> }
>

Hmm, the entry_parm needs to be a MEM_P and an unaligned one.
So the test case could be made to trigger it this way:

typedef int v4si __attribute__((vector_size (16)));

struct S { v4si v; } __attribute__((packed));

int t;
v4si foo (struct S a, struct S b, struct S c, struct S d,
          struct S e, struct S f, struct S g, struct S h,
          int i, int j, int k, int l, int m, int n,
          int o, struct S s)
{
  t = o;
  return s.v;
}

However the code path is still not reached, since
targetm.slow_unaligned_access is always FALSE, which is probably a flaw
in my patch.

So I think,

+  else if (MEM_P (data->entry_parm)
+           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+              > MEM_ALIGN (data->entry_parm)
+           && targetm.slow_unaligned_access (promoted_nominal_mode,
+                                             MEM_ALIGN (data->entry_parm)))

should probably better be

+  else if (MEM_P (data->entry_parm)
+           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+              > MEM_ALIGN (data->entry_parm)
+           && (((icode = optab_handler (movmisalign_optab,
+                                        promoted_nominal_mode))
+                != CODE_FOR_nothing)
+               || targetm.slow_unaligned_access (promoted_nominal_mode,
+                                                 MEM_ALIGN (data->entry_parm))))

Right?

Then the modified test case would use the movmisalign optab.
However nothing changes in the end, since the i386 back-end is used to
working around the middle end not using the movmisalign optab when it
should do so.

I wonder if I should try to add a gcc_checking_assert to the mov expand
patterns, asserting that the memory is properly aligned?

> but nowadays x86 seems to be happy with regular moves operating on
> unaligned memory, using unaligned moves where necessary.
>
> (insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ])
>         (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11 1229
>      {movv4si_internal}
>      (nil))
>
> and with GCC 4.8 we ended up with the following expansion which is
> also correct.
>
> (insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0)
>         (unspec:V16QI [
>                 (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32])
>             ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu}
>      (nil))
>
> So it seems it has been too long and I don't remember what is
> special with arm that it doesn't work... it possibly simply
> trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN which
> I think is OK-ish?
>

Yes, that is what Richard said as well.

>>>>> Similarly the very same issue should exist on x86_64 which is
>>>>> !STRICT_ALIGNMENT, it's just that the ABI seems to provide the
>>>>> appropriate alignment on the caller side.  So the STRICT_ALIGNMENT
>>>>> check is a wrong one.
>>>>>
>>>>
>>>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
>>>> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
>>>> is always 32-bit align on the DImode memory.  The x86_64 vector
>>>> instructions would look at MEM_ALIGN and do the right thing, yes?
>>>
>>> No, they need to use the movmisalign optab and end up with UNSPECs
>>> for example.
>> Ah, thanks, now I see.
>>
>>>> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
>>>> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
>>>> does not even have to look at MEM_ALIGN except in the movmisalign_optab,
>>>> right?
>>>
>>> Yes, I think we never loosened that.  Note that RTL expansion has to
>>> fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
>>> specifies that x86 is strict-align wrt vector modes.
>>>
>>
>> Yes, I agree, the code would be incorrect for x86 as well when the
>> movmisalign_optab is not used.  So I invoke the movmisalign optab if
>> available and if not fall back to extract_bit_field.  As in
>> assign_parm_setup_stack, assign_parm_setup_reg assumes that
>> data->promoted_mode != data->nominal_mode does not happen with
>> misaligned stack slots.
>>
>>
>> Attached is the v3 of my patch.
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
>>
>> Is it OK for trunk?
>
> Few comments.
>
> @@ -2274,8 +2274,6 @@ struct assign_parm_data_one
>    int partial;
>    BOOL_BITFIELD named_arg : 1;
>    BOOL_BITFIELD passed_pointer : 1;
> -  BOOL_BITFIELD on_stack : 1;
> -  BOOL_BITFIELD loaded_in_reg : 1;
>  };
>
>  /* A subroutine of assign_parms.  Initialize ALL.  */
>
> independently OK.
>
> @@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
>       ultimate type, don't use that slot after entry.  We'll make another
>       stack slot, if we need one.  */
>    if (stack_parm
> -      && ((STRICT_ALIGNMENT
> -           && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
> +      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
> +           && targetm.slow_unaligned_access (data->nominal_mode,
> +                                             MEM_ALIGN (stack_parm)))
>           || (data->nominal_type
>               && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
>               && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
>
> looks like something we should have as a separate commit as well.  It
> also looks obvious to me.
>

Okay, committed as two separate commits: r274023 & r274025

> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
>
>        did_conversion = true;
>      }
> +  else if (MEM_P (data->entry_parm)
> +           && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +              > MEM_ALIGN (data->entry_parm)
>
> we arrive here by-passing
>
>       else if (need_conversion)
>         {
>           /* We did not have an insn to convert directly, or the sequence
>              generated appeared unsafe.  We must first copy the parm to a
>              pseudo reg, and save the conversion until after all
>              parameters have been moved.  */
>
>           int save_tree_used;
>           rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
>
>           emit_move_insn (tempreg, validated_mem);
>
> but this move instruction is invalid in the same way as the case
> you fix, no?
> So wouldn't it be better to do
>

We could do that, but I supposed that there must be a reason why
assign_parm_setup_stack gets away with the same:

  if (data->promoted_mode != data->nominal_mode)
    {
      /* Conversion is required.  */
      rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));

      emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));

So either some back-ends are too permissive with us, or there is a
reason why promoted_mode != nominal_mode does not happen together with
an unaligned entry_parm.  In a way that would be a rather unusual ABI.

>   if (moved)
>     /* Nothing to do.  */
>     ;
>   else
>     {
>       if (unaligned)
>         ...
>       else
>         emit_move_insn (...);
>
>       if (need_conversion)
>         ....
>     }
>
> ?  Hopefully whatever "moved" things in the if (moved) case did
> it correctly.
>

It wouldn't.  It uses gen_extend_insn; would that be expected to work
with unaligned memory?

> Can you check whether your patch does anything to the x86 testcase
> posted above?
>

Thanks, it might help to have at least a test case where the pattern is
expanded, even if it does not change anything.

> I'm not very familiar with this code so I'm leaving actual approval
> to somebody else.  Still hope the comments were helpful.
>

Yes they are, thanks a lot.

Thanks
Bernd.