From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=4+PK=4E=arm.com=Tamar.Christina@sourceware.org>
Received: from EUR05-DB8-obe.outbound.protection.outlook.com (mail-db8eur05on2052.outbound.protection.outlook.com [40.107.20.52])
	by sourceware.org (Postfix) with ESMTPS id B4C603856B4D
	for <gcc-patches@gcc.gnu.org>; Tue,  6 Dec 2022 10:58:52 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B4C603856B4D
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com;
 s=selector2-armh-onmicrosoft-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=dYDMHgkyZt7xqpZES3Fyeo9mS6ujEC8T2g39pnBf81U=;
 b=uFOonn88GVvJlaLPxQmR72PdvEYSwbBUFvhTwx3CwFAb/I0Vyn6bni5Ks4073YlDAjmp9ZxS0QserGt9pFFIsx4eLDcT+j61oT51R5baQV3ojvlYg+ZzKhbMNrozBnpLAGLLCPEKWtmS7kr2knBr7HWB5gsQVMXxJjH/ZlB9xa8=
Received: from AM7PR03CA0026.eurprd03.prod.outlook.com (2603:10a6:20b:130::36)
 by DU0PR08MB9001.eurprd08.prod.outlook.com (2603:10a6:10:466::22) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.14; Tue, 6 Dec
 2022 10:58:39 +0000
Received: from AM7EUR03FT046.eop-EUR03.prod.protection.outlook.com
 (2603:10a6:20b:130:cafe::ad) by AM7PR03CA0026.outlook.office365.com
 (2603:10a6:20b:130::36) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.10 via Frontend
 Transport; Tue, 6 Dec 2022 10:58:39 +0000
X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123)
 smtp.mailfrom=arm.com; dkim=pass (signature was verified)
 header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com;
Received-SPF: Pass (protection.outlook.com: domain of arm.com designates
 63.35.35.123 as permitted sender) receiver=protection.outlook.com;
 client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com;
 pr=C
Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by
 AM7EUR03FT046.mail.protection.outlook.com (100.127.140.78) with Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.5880.14 via Frontend Transport; Tue, 6 Dec 2022 10:58:37 +0000
Received: ("Tessian outbound b4aebcc5bc64:v130"); Tue, 06 Dec 2022 10:58:37 +0000
X-CR-MTA-TID: 64aa7808
Received: from 59e2a619a84b.1
	by 64aa7808-outbound-1.mta.getcheckrecipient.com id CE0C8440-2E11-4133-A1C6-47A0C956B6E5.1;
	Tue, 06 Dec 2022 10:58:26 +0000
Received: from EUR02-AM0-obe.outbound.protection.outlook.com
    by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 59e2a619a84b.1
    (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384);
    Tue, 06 Dec 2022 10:58:26 +0000
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=DUy4fIOwTlWCJwK261HMZ/v8iHtwNhO5wiqqmGSftkYN+AjbzOUW/CxdfVou0XuSo1pLdkkrmjtULVkukHhFznspP9bv9WzSp4JVTNXlLsy5m0ar/MG1oENj7IbylOlQWYCCZ0+oYjWDcvT42b2CxaBZfLJmvsCVt4nznrut3m9c/utp+rWEeuod1UqA8IDX6kvk1z8HK1CL3VLFmT6qz/Kyzg8a4MfyveVblv3upl8TQDdprBlblT2geq6Icn5Gc5LNZq9IyjOkHTFGWz7lQyO6MblyIeUWjawvOm1Xls61jtY4eE7FaShoWHC6SCCaptP6UUBlwoOVggKXIHSHKA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=dYDMHgkyZt7xqpZES3Fyeo9mS6ujEC8T2g39pnBf81U=;
 b=iyNpnLdNBnK2nHtuZn+02DjCtRfq7pmr6pwdFNFmDohhBlAEfUq84KnmnqhbdnaP4NBQ1UcdNODAQAvnPhLK368P7FdJGNllahL8k4VyTz+TmOgXeTtT0tpjRxJJTdIK/gDSplXQU94UH5m2Rf0a6n7IjiG6tD4M6EUCzvTf6ryM78l3PBOeV5epxVzXNOTHy/Ts8pcy4aShOD5BB+PRmN0hFcku/MGluFfbPMtjYZ00gKJTEPAmoJbGOE+9jfw7RIdPHhC2mYbOJoXeeqNVVEKx7nIeFv4q5QvOINNk+Z/Lg5PNiUde6zkYslmFgaea3PACqW4K7y9juQPiWX5S5A==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass
 header.d=arm.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com;
 s=selector2-armh-onmicrosoft-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=dYDMHgkyZt7xqpZES3Fyeo9mS6ujEC8T2g39pnBf81U=;
 b=uFOonn88GVvJlaLPxQmR72PdvEYSwbBUFvhTwx3CwFAb/I0Vyn6bni5Ks4073YlDAjmp9ZxS0QserGt9pFFIsx4eLDcT+j61oT51R5baQV3ojvlYg+ZzKhbMNrozBnpLAGLLCPEKWtmS7kr2knBr7HWB5gsQVMXxJjH/ZlB9xa8=
Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17)
 by DU0PR08MB9077.eurprd08.prod.outlook.com (2603:10a6:10:471::11) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.14; Tue, 6 Dec
 2022 10:58:24 +0000
Received: from VI1PR08MB5325.eurprd08.prod.outlook.com
 ([fe80::bd2a:aff9:b1a0:2fc7]) by VI1PR08MB5325.eurprd08.prod.outlook.com
 ([fe80::bd2a:aff9:b1a0:2fc7%4]) with mapi id 15.20.5880.014; Tue, 6 Dec 2022
 10:58:24 +0000
From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Sandiford <Richard.Sandiford@arm.com>
CC: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>,
	Richard Earnshaw <Richard.Earnshaw@arm.com>, Marcus Shawcroft
	<Marcus.Shawcroft@arm.com>, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Subject: RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.
Thread-Topic: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.
Thread-Index: AQHY7SAlWreqI6axPUuXHMahz+7ngK4qKs2fgA+wqFCAJwVLw4AAB96w
Date: Tue, 6 Dec 2022 10:58:24 +0000
Message-ID:
 <VI1PR08MB5325AE85029C5D3294F6EF47FF1B9@VI1PR08MB5325.eurprd08.prod.outlook.com>
References: <Y1+4euF0rUwFIjTL@arm.com> <mptfsf2vhzj.fsf@arm.com>
	<VI1PR08MB5325335D195073D1E5B2AB33FF009@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <mptzgc0bzcq.fsf@arm.com>
In-Reply-To: <mptzgc0bzcq.fsf@arm.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-ts-tracking-id: 9F2F6848C808B64B8D0BB09363D2BC91.0
x-checkrecipientchecked: true
Authentication-Results-Original: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=arm.com;
x-ms-traffictypediagnostic:
	VI1PR08MB5325:EE_|DU0PR08MB9077:EE_|AM7EUR03FT046:EE_|DU0PR08MB9001:EE_
X-MS-Office365-Filtering-Correlation-Id: b8a635d1-49f9-4d3e-1137-08dad778d3c5
x-checkrecipientrouted: true
nodisclaimer: true
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-Microsoft-Antispam-Untrusted: BCL:0;
X-Microsoft-Antispam-Message-Info-Original:
 BxvLJ9iE3vTJAQyhALWYfK2o/+R/GS6lUEL5iS4ycgWH2tKL9QbmE+Hx/4Yu+6kNZnmNFU6DhQZLnSK0lYhEjS5KdIAugjpbfHdvVQZNW6O6RiWQkNpApvTldeb+qTlV5tsPhDMXbjvsmZ+sJ+L4tgtDphgLP9yLfTywaUjBYudW/euOVPm0HvylfngxzQKQZsllVwde0fLvhB3tuNQrxjWQJdC77xpi7QGO+8YmK1G4dws+HDTqIclY8UXmaZCtxUR2cjhusSbFCNmJIUaPwqmFeC7Q7on0/WgfcxOX/2A+NVGKvQ3eddULSD3Ct+DQtkZ6zQwmtFlqbSxrmXRjLeoF8x567k9WnnJ1NsKYSyk3+mbZTiCNeptfmXzIl8aIYzvbSuuap9v2OUcSZHkKKloTLdJGvGaDRYAQE7ZBBn5RB8d/nZeNvH1xMu97GBYYwpfQr9YgT4j+L3cwDUxseucFpDINv4AYt6M/62vogecyxui/3LGJe8vOjxyyQpaCzPM1we0bROLRNiDigUIzr46W6iAzcPe3v0GLhj0DbOJe/SGssrTkGDFKQU2outqGM56VlWy5knpz9Zri+A5FRRrEbVFNTW1VJe89cHUgdpdDbm0Of+KRO5cMDnnQOmWcHJtXEfemGl1RvCdSuJG8Yse877aJ9u1xu+YgD5ifEX1XzQ4EcDwhdupZS5vZ8LmjihxOFXh5mXZKhpxQAaXgvmPSesdhlSIUa7hr+9N9SH1/smR6wcIHJ6jtHU1bwi491xYDMkAbYuA5Ba38/v75yg==
X-Forefront-Antispam-Report-Untrusted:
 CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:VI1PR08MB5325.eurprd08.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230022)(4636009)(136003)(39860400002)(366004)(376002)(396003)(346002)(451199015)(71200400001)(2906002)(38070700005)(86362001)(41300700001)(9686003)(186003)(316002)(6636002)(54906003)(6506007)(7696005)(30864003)(83380400001)(53546011)(26005)(5660300002)(6862004)(8936002)(52536014)(33656002)(66476007)(66556008)(66446008)(122000001)(66946007)(76116006)(55016003)(38100700002)(64756008)(478600001)(4326008)(84970400001)(8676002)(579004);DIR:OUT;SFP:1101;
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DU0PR08MB9077
Original-Authentication-Results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=arm.com;
X-EOPAttributedMessage: 0
X-MS-Exchange-Transport-CrossTenantHeadersStripped:
 AM7EUR03FT046.eop-EUR03.prod.protection.outlook.com
X-MS-PublicTrafficType: Email
X-MS-Office365-Filtering-Correlation-Id-Prvs:
	79d59d2f-5967-49c8-8e1d-08dad778cbe5
X-Microsoft-Antispam: BCL:0;
X-Microsoft-Antispam-Message-Info:
	gRV/UZb+0JZKUkFPBdL75bWs74cYBT6sZE2scV24C5gMmxYwASRn/V40Gh67XHIAlc9aXnrwRFPyLtYwkEFy/ppR1bY2WCaRoq7yLZe8Z67zYtOmow6gDjg/R8VEX0jANUYrip09GL/XnoRkes2UgiDis2C/WIC5T78xmY2DIddrg8bjsxDahWQg1WbxukBgc1o68Z8gYcrfFHnCIt4/Y0ZeN/CKPdA04y4avHYN8DRBNz1qb1zAMzMMqNLt38jfqJnPjy5cXfgniQ7WphJRC6VGV+WjiOqsUGOpqQK5PochLjb/Dqi0VNXgJ6dadgoi9tm/3OEKYe+PZ91a7z+iZhLhpptnHsoDGMO4bzfmhx1qqaEGISJGUJnaIo+9niDeR1OWOFXix/VDJMEONN5LsfB8NA14O3/Bv5BaeJRwWFKE3wgjkh4S8xnLlwViI3ftDLUUcvUbgS7ohuoUqZ/RIkBnqUTyehbFhmg6BXArqeWlZo0pPVJxwKPeetxCSwVHdWf3KU8Q1cJCoMkMBN/YhCdlyjqAMGeVXzoTJTfbqh7ASnPNF4f88yfUb7cJT5MV3MpHLz9lHyqvzIIazQy1RpnoWptFdPkLtDkhqxTsJ36qY3n/Z4NKpq3Y7skmTLuYNpgCD3iEvPMfzzQK3830DfkoQg1GkmRVHcx2na7WzhG82yMeVE3+wkJuu3lD3X4oeDKPB2XQTMGYD2dy/4frtCCg30oKZnutQ0aPoXSd/3YlsI+HWsnNyVmXmVHC9GpZJPFelfrCrdY6Ym6S2qUHVg==
X-Forefront-Antispam-Report:
	CIP:63.35.35.123;CTRY:IE;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:64aa7808-outbound-1.mta.getcheckrecipient.com;PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com;CAT:NONE;SFS:(13230022)(4636009)(39860400002)(376002)(346002)(396003)(136003)(451199015)(40470700004)(46966006)(36840700001)(70206006)(40480700001)(4326008)(8676002)(41300700001)(70586007)(6862004)(8936002)(52536014)(316002)(86362001)(55016003)(54906003)(84970400001)(30864003)(81166007)(6636002)(356005)(6506007)(9686003)(2906002)(26005)(7696005)(82740400003)(36860700001)(47076005)(5660300002)(33656002)(478600001)(40460700003)(53546011)(186003)(83380400001)(336012)(82310400005);DIR:OUT;SFP:1101;
X-OriginatorOrg: arm.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Dec 2022 10:58:37.4676
 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: b8a635d1-49f9-4d3e-1137-08dad778d3c5
X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d;Ip=[63.35.35.123];Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com]
X-MS-Exchange-CrossTenant-AuthSource:
	AM7EUR03FT046.eop-EUR03.prod.protection.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DU0PR08MB9001
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,FORGED_SPF_HELO,GIT_PATCH_0,KAM_DMARC_NONE,KAM_LOTSOFHASH,KAM_SHORT,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE,TXREP,UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Tuesday, December 6, 2022 10:28 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.
>
> Tamar Christina <Tamar.Christina@arm.com> writes:
> > Hi,
> >
> >
> >> This name might cause confusion with the SVE iterators, where FULL
> >> means "every bit of the register is used".  How about something like
> >> VMOVE instead?
> >>
> >> With this change, I guess VALL_F16 represents "The set of all modes
> >> for which the vld1 intrinsics are provided" and VMOVE or whatever is
> >> "All Advanced SIMD modes suitable for moving, loading, and storing".
> >> That is, VMOVE extends VALL_F16 with modes that are not manifested
> >> via intrinsics.
> >>
> >
> > Done.
> >
> >> Where is the 2h used, and is it valid syntax in that context?
> >>
> >> Same for later instances of 2h.
> >
> > They are, but they weren't meant to be in this patch.  They belong in
> > a separate FP16 series that I won't get to finish for GCC 13 due to not
> > being able to finish writing all the tests.  I have moved them to that
> > patch series though.
> >
> > While the addp patch series has been killed, this patch is still good
> > standalone and improves codegen as shown in the updated testcase.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): New.
> > 	(mov<mode>, movmisalign<mode>, aarch64_dup_lane<mode>,
> > 	aarch64_store_lane0<mode>, aarch64_simd_vec_set<mode>,
> > 	@aarch64_simd_vec_copy_lane<mode>, vec_set<mode>,
> > 	reduc_<optab>_scal_<mode>, reduc_<fmaxmin>_scal_<mode>,
> > 	aarch64_reduc_<optab>_internal<mode>, aarch64_get_lane<mode>,
> > 	vec_init<mode><Vel>, vec_extract<mode><Vel>): Support V2HF.
> > 	(aarch64_simd_dupv2hf): New.
> > 	* config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
> > 	Add E_V2HFmode.
> > 	* config/aarch64/iterators.md (VHSDF_P): New.
> > 	(V2F, VMOVE, nunits, Vtype, Vmtype, Vetype, stype, VEL,
> > 	Vel, q, vp): Add V2HF.
> > 	* config/arm/types.md (neon_fp_reduc_add_h): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.target/aarch64/sve/slp_1.c: Update testcase.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index f4152160084d6b6f34bd69f0ba6386c1ab50f77e..487a31010245accec28e779661e6c2d578fca4b7 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -19,10 +19,10 @@
> >  ;; <http://www.gnu.org/licenses/>.
> >
> >  (define_expand "mov<mode>"
> > -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> > -	(match_operand:VALL_F16 1 "general_operand"))]
> > +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
> > +	(match_operand:VMOVE 1 "general_operand"))]
> >    "TARGET_SIMD"
> > -  "
> > +{
> >    /* Force the operand into a register if it is not an
> >       immediate whose use can be replaced with xzr.
> >       If the mode is 16 bytes wide, then we will be doing
> > @@ -46,12 +46,11 @@ (define_expand "mov<mode>"
> >        aarch64_expand_vector_init (operands[0], operands[1]);
> >        DONE;
> >      }
> > -  "
> > -)
> > +})
> >
> >  (define_expand "movmisalign<mode>"
> > -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> > -        (match_operand:VALL_F16 1 "general_operand"))]
> > +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
> > +        (match_operand:VMOVE 1 "general_operand"))]
> >    "TARGET_SIMD && !STRICT_ALIGNMENT"
> >  {
> >    /* This pattern is not permitted to fail during expansion: if both arguments
> > @@ -73,6 +72,16 @@ (define_insn "aarch64_simd_dup<mode>"
> >    [(set_attr "type" "neon_dup<q>, neon_from_gp<q>")]
> >  )
> >
> > +(define_insn "aarch64_simd_dupv2hf"
> > +  [(set (match_operand:V2HF 0 "register_operand" "=w")
> > +	(vec_duplicate:V2HF
> > +	  (match_operand:HF 1 "register_operand" "0")))]
>
> Seems like this should be "w" rather than "0", since SLI is a two-register
> instruction.

Yes, but for a dup it's only valid when the same register is used, i.e. it has
to write into the original source register.
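
To illustrate the point, here is a minimal C sketch (an illustration only, not
code from the patch).  It models the 64-bit scalar SLI under the assumption
that the source is shifted left and inserted into the destination, while the
low bits of the destination vacated by the shift keep their previous value.
The helper names sli64 and sli_dup_self_check are made up for this example,
and 0x3c00 is used as the binary16 encoding of 1.0.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "sli Dd, Dn, #shift" on a 64-bit scalar (assumed semantics):
   shift the source left and insert it into the destination, keeping the
   low `shift` bits of the destination unchanged.  Valid for 0 <= shift < 64.  */
static uint64_t
sli64 (uint64_t dest, uint64_t src, unsigned shift)
{
  uint64_t keep = (UINT64_C (1) << shift) - 1;  /* Bits SLI leaves alone.  */
  return (src << shift) | (dest & keep);
}

/* Hypothetical self-check: 0x3c00 is the binary16 encoding of 1.0.  */
static void
sli_dup_self_check (void)
{
  uint64_t h = 0x3c00;

  /* Destination tied to the source (constraint "0"): both 16-bit lanes
     of the low 32 bits end up holding h.  */
  assert ((uint32_t) sli64 (h, h, 16) == 0x3c003c00u);

  /* Destination is some other register holding unrelated bits: lane 0
     keeps those bits, so the result is not a duplicate of h.  */
  assert ((uint32_t) sli64 (0xdead, h, 16) == 0x3c00deadu);
}
```

In this model, tying the operands gives the dup; with an untied "w" source,
lane 0 would come from whatever the destination register held beforehand.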

Thanks,
Tamar

>
> > +  "TARGET_SIMD"
> > +  "@
> > +   sli\\t%d0, %d1, 16"
> > +  [(set_attr "type" "neon_shift_imm")]
> > +)
> > +
> >  (define_insn "aarch64_simd_dup<mode>"
> >    [(set (match_operand:VDQF_F16 0 "register_operand" "=w,w")
> >  	(vec_duplicate:VDQF_F16
> > @@ -85,10 +94,10 @@ (define_insn "aarch64_simd_dup<mode>"
> >  )
> >
> >  (define_insn "aarch64_dup_lane<mode>"
> > -  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> > -	(vec_duplicate:VALL_F16
> > +  [(set (match_operand:VMOVE 0 "register_operand" "=w")
> > +	(vec_duplicate:VMOVE
> >  	  (vec_select:<VEL>
> > -	    (match_operand:VALL_F16 1 "register_operand" "w")
> > +	    (match_operand:VMOVE 1 "register_operand" "w")
> >  	    (parallel [(match_operand:SI 2 "immediate_operand" "i")])
> >            )))]
> >    "TARGET_SIMD"
> > @@ -142,6 +151,29 @@ (define_insn "*aarch64_simd_mov<VDMOV:mode>"
> >  		     mov_reg, neon_move<q>")]
> >  )
> >
> > +(define_insn "*aarch64_simd_movv2hf"
> > +  [(set (match_operand:V2HF 0 "nonimmediate_operand"
> > +		"=w, m,  m,  w, ?r, ?w, ?r, w, w")
> > +	(match_operand:V2HF 1 "general_operand"
> > +		"m,  Dz, w,  w,  w,  r,  r, Dz, Dn"))]
> > +  "TARGET_SIMD_F16INST
> > +   && (register_operand (operands[0], V2HFmode)
> > +       || aarch64_simd_reg_or_zero (operands[1], V2HFmode))"
> > +   "@
> > +    ldr\\t%s0, %1
> > +    str\\twzr, %0
> > +    str\\t%s1, %0
> > +    mov\\t%0.2s[0], %1.2s[0]
> > +    umov\\t%w0, %1.s[0]
> > +    fmov\\t%s0, %1
>
> Should be %w1 instead.
>
> > +    mov\\t%0, %1
>
> I guess this one works with either % (X registers) or %w.  Might still be
> better to use %w anyway, so that it looks less like an oversight.
>
> > +    movi\\t%d0, 0
> > +    * return aarch64_output_simd_mov_immediate (operands[1], 32);"
> > +  [(set_attr "type" "neon_load1_1reg, store_8, neon_store1_1reg,\
> > +		     neon_logic, neon_to_gp, f_mcr,\
> > +		     mov_reg, neon_move, neon_move")]
> > +)
> > +
> >  (define_insn "*aarch64_simd_mov<VQMOV:mode>"
> >    [(set (match_operand:VQMOV 0 "nonimmediate_operand"
> >  		"=w, Umn,  m,  w, ?r, ?w, ?r, w")
> > @@ -182,7 +214,7 @@ (define_insn "*aarch64_simd_mov<VQMOV:mode>"
> >
> >  (define_insn "aarch64_store_lane0<mode>"
> >    [(set (match_operand:<VEL> 0 "memory_operand" "=m")
> > -	(vec_select:<VEL> (match_operand:VALL_F16 1 "register_operand" "w")
> > +	(vec_select:<VEL> (match_operand:VMOVE 1 "register_operand" "w")
> >  			(parallel [(match_operand 2 "const_int_operand" "n")])))]
> >    "TARGET_SIMD
> >     && ENDIAN_LANE_N (<nunits>, INTVAL (operands[2])) == 0"
> > @@ -1035,11 +1067,11 @@ (define_insn "one_cmpl<mode>2"
> >  )
> >
> >  (define_insn "aarch64_simd_vec_set<mode>"
> > -  [(set (match_operand:VALL_F16 0 "register_operand" "=w,w,w")
> > -	(vec_merge:VALL_F16
> > -	    (vec_duplicate:VALL_F16
> > +  [(set (match_operand:VMOVE 0 "register_operand" "=w,w,w")
> > +	(vec_merge:VMOVE
> > +	    (vec_duplicate:VMOVE
> >  		(match_operand:<VEL> 1 "aarch64_simd_nonimmediate_operand" "w,?r,Utv"))
> > -	    (match_operand:VALL_F16 3 "register_operand" "0,0,0")
> > +	    (match_operand:VMOVE 3 "register_operand" "0,0,0")
> >  	    (match_operand:SI 2 "immediate_operand" "i,i,i")))]
> >    "TARGET_SIMD"
> >    {
> > @@ -1061,14 +1093,14 @@ (define_insn "aarch64_simd_vec_set<mode>"
> >  )
> >
> >  (define_insn "@aarch64_simd_vec_copy_lane<mode>"
> > -  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> > -	(vec_merge:VALL_F16
> > -	    (vec_duplicate:VALL_F16
> > +  [(set (match_operand:VMOVE 0 "register_operand" "=w")
> > +	(vec_merge:VMOVE
> > +	    (vec_duplicate:VMOVE
> >  	      (vec_select:<VEL>
> > -		(match_operand:VALL_F16 3 "register_operand" "w")
> > +		(match_operand:VMOVE 3 "register_operand" "w")
> >  		(parallel
> >  		  [(match_operand:SI 4 "immediate_operand" "i")])))
> > -	    (match_operand:VALL_F16 1 "register_operand" "0")
> > +	    (match_operand:VMOVE 1 "register_operand" "0")
> >  	    (match_operand:SI 2 "immediate_operand" "i")))]
> >    "TARGET_SIMD"
> >    {
> > @@ -1376,7 +1408,7 @@ (define_insn "vec_shr_<mode>"
> >  )
> >
> >  (define_expand "vec_set<mode>"
> > -  [(match_operand:VALL_F16 0 "register_operand")
> > +  [(match_operand:VMOVE 0 "register_operand")
> >     (match_operand:<VEL> 1 "aarch64_simd_nonimmediate_operand")
> >     (match_operand:SI 2 "immediate_operand")]
> >    "TARGET_SIMD"
> > @@ -3495,7 +3527,7 @@ (define_insn "popcount<mode>2"
> >  ;; gimple_fold'd to the IFN_REDUC_(MAX|MIN) function.  (This is FP smax/smin).
> >  (define_expand "reduc_<optab>_scal_<mode>"
> >    [(match_operand:<VEL> 0 "register_operand")
> > -   (unspec:<VEL> [(match_operand:VHSDF 1 "register_operand")]
> > +   (unspec:<VEL> [(match_operand:VHSDF_P 1 "register_operand")]
> >  		 FMAXMINV)]
> >    "TARGET_SIMD"
> >    {
> > @@ -3510,7 +3542,7 @@ (define_expand "reduc_<optab>_scal_<mode>"
> >
> >  (define_expand "reduc_<fmaxmin>_scal_<mode>"
> >    [(match_operand:<VEL> 0 "register_operand")
> > -   (unspec:<VEL> [(match_operand:VHSDF 1 "register_operand")]
> > +   (unspec:<VEL> [(match_operand:VHSDF_P 1 "register_operand")]
> >  		 FMAXMINNMV)]
> >    "TARGET_SIMD"
> >    {
> > @@ -3554,8 +3586,8 @@ (define_insn "aarch64_reduc_<optab>_internalv2si"
> >  )
> >
> >  (define_insn "aarch64_reduc_<optab>_internal<mode>"
> > - [(set (match_operand:VHSDF 0 "register_operand" "=w")
> > -       (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")]
> > + [(set (match_operand:VHSDF_P 0 "register_operand" "=w")
> > +       (unspec:VHSDF_P [(match_operand:VHSDF_P 1 "register_operand" "w")]
> >  		      FMAXMINV))]
> >   "TARGET_SIMD"
> >   "<maxmin_uns_op><vp>\\t%<Vetype>0, %1.<Vtype>"
> > @@ -4200,7 +4232,7 @@ (define_insn "*aarch64_get_lane_zero_extend<GPI:mode><VDQQH:mode>"
> >  (define_insn_and_split "aarch64_get_lane<mode>"
> >    [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=?r, w, Utv")
> >  	(vec_select:<VEL>
> > -	  (match_operand:VALL_F16 1 "register_operand" "w, w, w")
> > +	  (match_operand:VMOVE 1 "register_operand" "w, w, w")
> >  	  (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))]
> >    "TARGET_SIMD"
> >    {
> > @@ -7981,7 +8013,7 @@ (define_expand "aarch64_st1<VALL_F16:mode>"
> >  ;; Standard pattern name vec_init<mode><Vel>.
> >
> >  (define_expand "vec_init<mode><Vel>"
> > -  [(match_operand:VALL_F16 0 "register_operand")
> > +  [(match_operand:VMOVE 0 "register_operand")
> >     (match_operand 1 "" "")]
> >    "TARGET_SIMD"
> >  {
> > @@ -8060,7 +8092,7 @@ (define_insn "aarch64_urecpe<mode>"
> >
> >  (define_expand "vec_extract<mode><Vel>"
> >    [(match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand")
> > -   (match_operand:VALL_F16 1 "register_operand")
> > +   (match_operand:VMOVE 1 "register_operand")
> >     (match_operand:SI 2 "immediate_operand")]
> >    "TARGET_SIMD"
> >  {
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 84dbe2f4ea7d03b424602ed98a34e7824217dc91..35671cb86e374f9ded21d0e4944c63bc2cbc0901 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -3566,6 +3566,7 @@ aarch64_classify_vector_mode (machine_mode mode)
> >      case E_V8BFmode:
> >      case E_V4SFmode:
> >      case E_V2DFmode:
> > +    case E_V2HFmode:
> >        return TARGET_SIMD ? VEC_ADVSIMD : 0;
> >
> >      default:
> > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> > index 37d8161a33b1c399d80be82afa67613a087389d4..dfcf86a440e316c2abdbcc646363d39e458d1a91 100644
> > --- a/gcc/config/aarch64/iterators.md
> > +++ b/gcc/config/aarch64/iterators.md
> > @@ -160,6 +160,10 @@ (define_mode_iterator VDQF [V2SF V4SF V2DF])
> >  (define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST")
> >  			     (V8HF "TARGET_SIMD_F16INST")
> >  			     V2SF V4SF V2DF])
> > +;; Advanced SIMD Float modes suitable for pairwise operations.
> > +(define_mode_iterator VHSDF_P [(V4HF "TARGET_SIMD_F16INST")
> > +			       (V8HF "TARGET_SIMD_F16INST")
> > +			       V2SF V4SF V2DF (V2HF "TARGET_SIMD_F16INST")])
>
> Maybe "reduction or pairwise operations"?  Otherwise it isn't obvious why
> V4HF, V8HF and V4SF are included.
>
> >
> >  ;; Advanced SIMD Float modes, and DF.
> >  (define_mode_iterator VDQF_DF [V2SF V4SF V2DF DF])
> > @@ -188,15 +192,23 @@ (define_mode_iterator VDQF_COND [V2SF V2SI V4SF V4SI V2DF V2DI])
> >  (define_mode_iterator VALLF [V2SF V4SF V2DF SF DF])
> >
> >  ;; Advanced SIMD Float modes with 2 elements.
> > -(define_mode_iterator V2F [V2SF V2DF])
> > +(define_mode_iterator V2F [V2SF V2DF V2HF])
> >
> >  ;; All Advanced SIMD modes on which we support any arithmetic operations.
> >  (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
> >
> > -;; All Advanced SIMD modes suitable for moving, loading, and storing.
> > +;; All Advanced SIMD modes suitable for moving, loading, and storing
> > +;; except V2HF.
>
> I'd prefer:
>
> ;; The set of all modes for which vld1 intrinsics are provided.
>
> otherwise it isn't clear why V2HF is a special case.
>
> >  (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
> >  				V4HF V8HF V4BF V8BF V2SF V4SF V2DF])
> >
> > +;; All Advanced SIMD modes suitable for moving, loading, and storing
> > +;; including V2HF
> > +(define_mode_iterator VMOVE [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
> > +			     V4HF V8HF V4BF V8BF V2SF V4SF V2DF
> > +			     (V2HF "TARGET_SIMD_F16INST")])
> > +
> > +
> >  ;; The VALL_F16 modes except the 128-bit 2-element ones.
> >  (define_mode_iterator VALL_F16_NO_V2Q [V8QI V16QI V4HI V8HI V2SI V4SI
> >  				V4HF V8HF V2SF V4SF])
> > @@ -1076,7 +1088,7 @@ (define_mode_attr nunits [(V8QI "8") (V16QI "16")
> >  			  (V2SF "2") (V4SF "4")
> >  			  (V1DF "1") (V2DF "2")
> >  			  (DI "1") (DF "1")
> > -			  (V8DI "8")])
> > +			  (V8DI "8") (V2HF "2")])
> >
> >  ;; Map a mode to the number of bits in it, if the size of the mode
> >  ;; is constant.
> > @@ -1090,6 +1102,7 @@ (define_mode_attr s [(HF "h") (SF "s") (DF "d") (SI "s") (DI "d")])
> >
> >  ;; Give the length suffix letter for a sign- or zero-extension.
> >  (define_mode_attr size [(QI "b") (HI "h") (SI "w")])
> > +(define_mode_attr sizel [(QI "b") (HI "h") (SI "")])
> >
> >  ;; Give the number of bits in the mode
> >  (define_mode_attr sizen [(QI "8") (HI "16") (SI "32") (DI "64")])
>
> Looks like this isn't used in the patch, so could be dropped.
>
> OK with those changes, thanks.
>
> Richard
>
> > @@ -1193,7 +1206,7 @@ (define_mode_attr Vmntype [(V8HI ".8b") (V4SI ".4h")
> >  (define_mode_attr Vetype [(V8QI "b") (V16QI "b")
> >  			  (V4HI "h") (V8HI  "h")
> >  			  (V2SI "s") (V4SI  "s")
> > -			  (V2DI "d")
> > +			  (V2DI "d") (V2HF  "h")
> >  			  (V4HF "h") (V8HF  "h")
> >  			  (V2SF "s") (V4SF  "s")
> >  			  (V2DF "d")
> > @@ -1285,7 +1298,7 @@ (define_mode_attr Vcwtype [(VNx16QI "b") (VNx8QI "h") (VNx4QI "w") (VNx2QI "d")
> >  ;; more accurately.
> >  (define_mode_attr stype [(V8QI "b") (V16QI "b") (V4HI "s") (V8HI "s")
> >  			 (V2SI "s") (V4SI "s") (V2DI "d") (V4HF "s")
> > -			 (V8HF "s") (V2SF "s") (V4SF "s") (V2DF "d")
> > +			 (V8HF "s") (V2SF "s") (V4SF "s") (V2DF "d") (V2HF "s")
> >  			 (HF "s") (SF "s") (DF "d") (QI "b") (HI "s")
> >  			 (SI "s") (DI "d")])
> >
> > @@ -1360,8 +1373,8 @@ (define_mode_attr VEL [(V8QI  "QI") (V16QI "QI")
> >  		       (V4HF "HF") (V8HF  "HF")
> >  		       (V2SF "SF") (V4SF  "SF")
> >  		       (DF   "DF") (V2DF  "DF")
> > -		       (SI   "SI") (HI    "HI")
> > -		       (QI   "QI")
> > +		       (SI   "SI") (V2HF  "HF")
> > +		       (QI   "QI") (HI    "HI")
> >  		       (V4BF "BF") (V8BF "BF")
> >  		       (VNx16QI "QI") (VNx8QI "QI") (VNx4QI "QI") (VNx2QI "QI")
> >  		       (VNx8HI "HI") (VNx4HI "HI") (VNx2HI "HI")
> > @@ -1381,7 +1394,7 @@ (define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
> >  		       (V2SF "sf") (V4SF "sf")
> >  		       (V2DF "df") (DF   "df")
> >  		       (SI   "si") (HI   "hi")
> > -		       (QI   "qi")
> > +		       (QI   "qi") (V2HF "hf")
> >  		       (V4BF "bf") (V8BF "bf")
> >  		       (VNx16QI "qi") (VNx8QI "qi") (VNx4QI "qi") (VNx2QI "qi")
> >  		       (VNx8HI "hi") (VNx4HI "hi") (VNx2HI "hi")
> > @@ -1866,7 +1879,7 @@ (define_mode_attr q [(V8QI "") (V16QI "_q")
> >  		     (V4HF "") (V8HF "_q")
> >  		     (V4BF "") (V8BF "_q")
> >  		     (V2SF "") (V4SF  "_q")
> > -			       (V2DF  "_q")
> > +		     (V2HF "") (V2DF  "_q")
> >  		     (QI "") (HI "") (SI "") (DI "") (HF "") (SF "") (DF "")
> >  		     (V2x8QI "") (V2x16QI "_q")
> >  		     (V2x4HI "") (V2x8HI "_q")
> > @@ -1905,6 +1918,7 @@ (define_mode_attr vp [(V8QI "v") (V16QI "v")
> >  		      (V2SI "p") (V4SI  "v")
> >  		      (V2DI "p") (V2DF  "p")
> >  		      (V2SF "p") (V4SF  "v")
> > +		      (V2HF "p")
> >  		      (V4HF "v") (V8HF  "v")])
> >
> >  (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")
> > diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
> > index 7d0504bdd944e9c0d1b545b0b66a9a1adc808714..3cfbc7a93cca1bea4925853e51d0a147c5722247 100644
> > --- a/gcc/config/arm/types.md
> > +++ b/gcc/config/arm/types.md
> > @@ -483,6 +483,7 @@ (define_attr "autodetect_type"
> >  ; neon_fp_minmax_s_q
> >  ; neon_fp_minmax_d
> >  ; neon_fp_minmax_d_q
> > +; neon_fp_reduc_add_h
> >  ; neon_fp_reduc_add_s
> >  ; neon_fp_reduc_add_s_q
> >  ; neon_fp_reduc_add_d
> > @@ -1033,6 +1034,7 @@ (define_attr "type"
> >    neon_fp_minmax_d,\
> >    neon_fp_minmax_d_q,\
> >  \
> > +  neon_fp_reduc_add_h,\
> >    neon_fp_reduc_add_s,\
> >    neon_fp_reduc_add_s_q,\
> >    neon_fp_reduc_add_d,\
> > @@ -1257,8 +1259,8 @@ (define_attr "is_neon_type" "yes,no"
> >            neon_fp_compare_d, neon_fp_compare_d_q, neon_fp_minmax_s,\
> >            neon_fp_minmax_s_q, neon_fp_minmax_d, neon_fp_minmax_d_q,\
> >            neon_fp_neg_s, neon_fp_neg_s_q, neon_fp_neg_d, neon_fp_neg_d_q,\
> > -          neon_fp_reduc_add_s, neon_fp_reduc_add_s_q, neon_fp_reduc_add_d,\
> > -          neon_fp_reduc_add_d_q, neon_fp_reduc_minmax_s,
> > +          neon_fp_reduc_add_h, neon_fp_reduc_add_s, neon_fp_reduc_add_s_q,\
> > +          neon_fp_reduc_add_d, neon_fp_reduc_add_d_q, neon_fp_reduc_minmax_s,\
> >            neon_fp_reduc_minmax_s_q, neon_fp_reduc_minmax_d,\
> >            neon_fp_reduc_minmax_d_q,\
> >            neon_fp_cvt_narrow_s_q, neon_fp_cvt_narrow_d_q,\
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_1.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_1.c
> > index 07d71a63414b1066ea431e287286ad048515711a..e6021c5a42748701e5326a5c387a39a0bbadc9e5 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/slp_1.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_1.c
> > @@ -30,11 +30,9 @@ vec_slp_##TYPE (TYPE *restrict a, TYPE b, TYPE c, int n)	\
> >  TEST_ALL (VEC_PERM)
> >
> >  /* We should use one DUP for each of the 8-, 16- and 32-bit types,
> > -   although we currently use LD1RW for _Float16.  We should use two
> > -   DUPs for each of the three 64-bit types.  */
> > +   We should use two DUPs for each of the three 64-bit types.  */
> >  /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, [hw]} 2 } } */
> > -/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, [sw]} 2 } } */
> > -/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 1 } } */
> > +/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, [sw]} 3 } } */
> >  /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, [dx]} 9 } } */
> >  /* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
> >  /* { dg-final { scan-assembler-not {\tzip2\t} } } */
> > @@ -53,7 +51,7 @@ TEST_ALL (VEC_PERM)
> >  /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */
> >  /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
> >  /* { dg-final { scan-assembler-not {\tldr} } } */
> > -/* { dg-final { scan-assembler-times {\tstr} 2 } } */
> > -/* { dg-final { scan-assembler-times {\tstr\th[0-9]+} 2 } } */
> > +/* { dg-final { scan-assembler-not {\tstr} } } */
> > +/* { dg-final { scan-assembler-not {\tstr\th[0-9]+} } } */
> >
> >  /* { dg-final { scan-assembler-not {\tuqdec} } } */