From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: Stam.Markianos-Wright@arm.com
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
Subject: [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
From: Stam Markianos-Wright
To: "gcc-patches@gcc.gnu.org"
Cc: Richard Earnshaw, kyrylo.tkachov@arm.com, nickc@redhat.com, ramana.radhakrishnan@arm.com
References: <346a4ee9-8f69-af56-b028-43ad4cd536d5@arm.com> <0b0b4089-7385-5f6c-8604-def1124fb0de@arm.com> <75d16447-bb2f-8907-1435-2d491600ffca@arm.com> <3a564141-a5fc-9ac4-067b-2861e16ca42d@arm.com> <10599202-fdf0-3b70-2655-7f8c57c29384@arm.com>
Message-ID: <57ecc153-4dbf-bdc3-59af-d27375257183@arm.com>
Date: Mon, 10 Feb 2020 13:36:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1
In-Reply-To: <10599202-fdf0-3b70-2655-7f8c57c29384@arm.com>
Content-Type: multipart/mixed; boundary="------------B6DAF1C3DAB41B8DC99E8692"
MIME-Version: 1.0
X-IsSubscribed: yes
X-SW-Source: 2020-02/txt/msg00538.txt.bz2

--------------B6DAF1C3DAB41B8DC99E8692
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-length: 3572

On 2/3/20 11:20 AM, Stam Markianos-Wright wrote:
>
>
> On 1/27/20 3:54 PM, Stam Markianos-Wright wrote:
>>
>> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote:
>>>
>>>
>>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:
>>>>
>>>>
>>>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
>>>>>
>>>>>
>>>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
>>>>>> operations (vector/by element) to the ARM back-end.
>>>>>>
>>>>>> These are:
>>>>>> usdot (vector), dot (by element).
>>>>>>
>>>>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
>>>>>> for ARM they remain optional as of ARMv8.6-a.
>>>>>>
>>>>>> The functions are declared in arm_neon.h, RTL patterns are defined to
>>>>>> generate assembler and tests are added to verify and perform adequate checks.
>>>>>>
>>>>>> Regression testing on arm-none-eabi passed successfully.
>>>>>>
>>>>>> This patch depends on:
>>>>>>
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>>>>>
>>>>>> for ARM CLI updates, and on:
>>>>>>
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>>>>>
>>>>>> for testsuite effective_target update.
>>>>>>
>>>>>> Ok for trunk?
>>>>>
>>>>
>>>> New diff addressing review comments from AArch64 version of the patch.
>>>>
>>>> _Change of order of operands in RTL patterns.
>>>> _Change tests to use check-function-bodies, compile with optimisation and
>>>> check for exact registers.
>>>> _Rename tests to remove "-compile-" in filename.
>>>>
>>>
> .Ping!

Ping :)

Diff re-attached in this ping email is the same as the one posted on 10/01

Thank you!

> .
>>>
>>> Cheers,
>>> Stam
>>>
>>>>>>
>>>>>>
>>>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>>>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>>>>>
>>>>>> PS. I don't have commit rights, so if someone could commit on my behalf,
>>>>>> that would be great :)
>>>>>>
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 2019-11-28  Stam Markianos-Wright
>>>>>>
>>>>>>      * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>>>>>>      (USTERNOP_QUALIFIERS): New define.
>>>>>>      (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>>>>      (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>>>>      (arm_expand_builtin_args):
>>>>>>          Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>>>>>>      (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>>>>>>      * config/arm/arm_neon.h (vusdot_s32): New.
>>>>>>      (vusdot_lane_s32): New.
>>>>>>      (vusdotq_lane_s32): New.
>>>>>>      (vsudot_lane_s32): New.
>>>>>>      (vsudotq_lane_s32): New.
>>>>>>      * config/arm/arm_neon_builtins.def
>>>>>>          (usdot,usdot_lane,sudot_lane): New.
>>>>>>      * config/arm/iterators.md (DOTPROD_I8MM): New.
>>>>>>          (sup, opsuffix): Add UNSPEC_DOT_US and UNSPEC_DOT_SU.
>>>>>>      * config/arm/neon.md (neon_usdot, dot_lane): New.
>>>>>>      * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>>>>>>
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>> 2019-12-12  Stam Markianos-Wright
>>>>>>
>>>>>>      * gcc.target/arm/simd/vdot-2-1.c: New test.
>>>>>>      * gcc.target/arm/simd/vdot-2-2.c: New test.
>>>>>>      * gcc.target/arm/simd/vdot-2-3.c: New test.
>>>>>>      * gcc.target/arm/simd/vdot-2-4.c: New test.
>>>>>>
>>>>>>
>>>>

--------------B6DAF1C3DAB41B8DC99E8692
Content-Type: text/x-patch; charset=UTF-8; name="I8MM-32-final.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="I8MM-32-final.patch"
Content-length: 15372

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index df84560588a..1b4316d0e93 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -86,7 +86,10 @@ enum arm_type_qualifiers
   qualifier_const_void_pointer = 0x802,
   /* Lane indices selected in pairs - must be within range of previous
      argument = a vector.  */
-  qualifier_lane_pair_index = 0x1000
+  qualifier_lane_pair_index = 0x1000,
+  /* Lane indices selected in quadtuplets - must be within range of previous
+     argument = a vector.  */
+  qualifier_lane_quadtup_index = 0x2000
 };
 
 /* The qualifier_internal allows generation of a unary builtin from
@@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
       qualifier_unsigned };
 #define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers)
 
+/* T (T, unsigned T, T).  */
+static enum arm_type_qualifiers
+arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned,
+      qualifier_none };
+#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers)
+
 /* T (T, immediate).  */
 static enum arm_type_qualifiers
 arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
       qualifier_unsigned, qualifier_lane_index };
 #define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers)
 
+/* T (T, unsigned T, T, lane index).  */
+static enum arm_type_qualifiers
+arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned,
+      qualifier_none, qualifier_lane_quadtup_index };
+#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers)
+
+/* T (T, T, unsigned T, lane index).  */
+static enum arm_type_qualifiers
+arm_sumac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none,
+      qualifier_unsigned, qualifier_lane_quadtup_index };
+#define SUMAC_LANE_QUADTUP_QUALIFIERS (arm_sumac_lane_quadtup_qualifiers)
+
 /* T (T, T, immediate).  */
 static enum arm_type_qualifiers
 arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -2148,6 +2172,7 @@ typedef enum {
   ARG_BUILTIN_LANE_INDEX,
   ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX,
   ARG_BUILTIN_LANE_PAIR_INDEX,
+  ARG_BUILTIN_LANE_QUADTUP_INDEX,
   ARG_BUILTIN_NEON_MEMORY,
   ARG_BUILTIN_MEMORY,
   ARG_BUILTIN_STOP
@@ -2296,11 +2321,24 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
               if (CONST_INT_P (op[argc]))
                 {
                   machine_mode vmode = mode[argc - 1];
-                  neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp);
+                  neon_lane_bounds (op[argc], 0,
+                                    GET_MODE_NUNITS (vmode) / 2, exp);
+                }
+              /* If the lane index isn't a constant then error out.  */
+              goto constant_arg;
+
+            case ARG_BUILTIN_LANE_QUADTUP_INDEX:
+              /* Previous argument must be a vector, which this indexes.  */
+              gcc_assert (argc > 0);
+              if (CONST_INT_P (op[argc]))
+                {
+                  machine_mode vmode = mode[argc - 1];
+                  neon_lane_bounds (op[argc], 0,
+                                    GET_MODE_NUNITS (vmode) / 4, exp);
                 }
-              /* If the lane index isn't a constant then the next
-                 case will error.  */
-              /* Fall through.  */
+              /* If the lane index isn't a constant then error out.  */
+              goto constant_arg;
+
             case ARG_BUILTIN_CONSTANT:
 constant_arg:
               if (!(*insn_data[icode].operand[opno].predicate)
@@ -2464,6 +2502,8 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
         args[k] = ARG_BUILTIN_LANE_INDEX;
       else if (d->qualifiers[qualifiers_k] & qualifier_lane_pair_index)
         args[k] = ARG_BUILTIN_LANE_PAIR_INDEX;
+      else if (d->qualifiers[qualifiers_k] & qualifier_lane_quadtup_index)
+        args[k] = ARG_BUILTIN_LANE_QUADTUP_INDEX;
       else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index)
         args[k] = ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX;
       else if (d->qualifiers[qualifiers_k] & qualifier_immediate)
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index db8db53614a..ede89ec2c64 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18738,6 +18738,52 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
   return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
 }
 
+
+/* AdvSIMD Matrix Multiply-Accumulate and Dot Product intrinsics.  */
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+i8mm")
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
+{
+  return __builtin_neon_usdotv8qi_ssus (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a,
+                 int8x8_t __b, const int __index)
+{
+  return __builtin_neon_usdot_lanev8qi_ssuss (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdotq_lane_s32 (int32x4_t __r, uint8x16_t __a,
+                  int8x8_t __b, const int __index)
+{
+  return __builtin_neon_usdot_lanev16qi_ssuss (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vsudot_lane_s32 (int32x2_t __r, int8x8_t __a,
+                 uint8x8_t __b, const int __index)
+{
+  return __builtin_neon_sudot_lanev8qi_sssus (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vsudotq_lane_s32 (int32x4_t __r, int8x16_t __a,
+                  uint8x8_t __b, const int __index)
+{
+  return __builtin_neon_sudot_lanev16qi_sssus (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+
 #pragma GCC pop_options
 
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index e9ff4e501cb..b4537ff5de9 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -352,6 +352,10 @@ VAR2 (UTERNOP, udot, v8qi, v16qi)
 VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi)
 VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
 
+VAR1 (USTERNOP, usdot, v8qi)
+VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi)
+VAR2 (SUMAC_LANE_QUADTUP, sudot_lane, v8qi, v16qi)
+
 VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf)
 VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf)
 VAR4 (TERNOP, vcmla0, v2sf, v4sf, v4hf, v8hf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 7da8b74abc0..afea7f823e0 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -466,6 +466,8 @@
 
 (define_int_iterator DOTPROD [UNSPEC_DOT_S UNSPEC_DOT_U])
 
+(define_int_iterator DOTPROD_I8MM [UNSPEC_DOT_US UNSPEC_DOT_SU])
+
 (define_int_iterator VFMLHALVES [UNSPEC_VFML_LO UNSPEC_VFML_HI])
 
 (define_int_iterator VCADD [UNSPEC_VCADD90 UNSPEC_VCADD270])
@@ -920,6 +922,7 @@
   (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
   (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
   (UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u")
+  (UNSPEC_DOT_US "us") (UNSPEC_DOT_SU "su")
   (UNSPEC_SSAT16 "s") (UNSPEC_USAT16 "u")
 ])
 
@@ -1151,6 +1154,9 @@
 (define_int_attr MRRC [(VUNSPEC_MRRC "MRRC") (VUNSPEC_MRRC2 "MRRC2")])
 
 (define_int_attr opsuffix [(UNSPEC_DOT_S "s8")
-                           (UNSPEC_DOT_U "u8")])
+                           (UNSPEC_DOT_U "u8")
+                           (UNSPEC_DOT_US "s8")
+                           (UNSPEC_DOT_SU "u8")
+                           ])
 
 (define_int_attr smlaw_op [(UNSPEC_SMLAWB "smlawb") (UNSPEC_SMLAWT "smlawt")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index dace9470c41..8b83cba8fb7 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3279,6 +3279,20 @@
   [(set_attr "type" "neon_dot")]
 )
 
+;; These instructions map to the __builtins for the Dot Product operations.
+(define_insn "neon_usdot"
+  [(set (match_operand:VCVTI 0 "register_operand" "=w")
+        (plus:VCVTI
+          (unspec:VCVTI
+            [(match_operand: 2 "register_operand" "w")
+             (match_operand: 3 "register_operand" "w")]
+            UNSPEC_DOT_US)
+          (match_operand:VCVTI 1 "register_operand" "0")))]
+  "TARGET_I8MM"
+  "vusdot.s8\\t%0, %2, %3"
+  [(set_attr "type" "neon_dot")]
+)
+
 ;; These instructions map to the __builtins for the Dot Product
 ;; indexed operations.
 (define_insn "neon_dot_lane"
@@ -3298,6 +3312,25 @@
   [(set_attr "type" "neon_dot")]
 )
 
+;; These instructions map to the __builtins for the Dot Product
+;; indexed operations in the v8.6 I8MM extension.
+(define_insn "neon_dot_lane"
+  [(set (match_operand:VCVTI 0 "register_operand" "=w")
+        (plus:VCVTI
+          (unspec:VCVTI
+            [(match_operand: 2 "register_operand" "w")
+             (match_operand:V8QI 3 "register_operand" "t")
+             (match_operand:SI 4 "immediate_operand" "i")]
+            DOTPROD_I8MM)
+          (match_operand:VCVTI 1 "register_operand" "0")))]
+  "TARGET_I8MM"
+  {
+    operands[4] = GEN_INT (INTVAL (operands[4]));
+    return "vdot.\\t%0, %2, %P3[%c4]";
+  }
+  [(set_attr "type" "neon_dot")]
+)
+
 ;; These expands map to the Dot Product optab the vectorizer checks for.
 ;; The auto-vectorizer expects a dot product builtin that also does an
 ;; accumulation into the provided register.
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index ade6b1af994..0aaff3b4bfc 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -485,6 +485,8 @@
   UNSPEC_VRNDX
   UNSPEC_DOT_S
   UNSPEC_DOT_U
+  UNSPEC_DOT_US
+  UNSPEC_DOT_SU
   UNSPEC_VFML_LO
   UNSPEC_VFML_HI
   UNSPEC_VCADD90
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c
new file mode 100644
index 00000000000..4d5f07b771b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c
@@ -0,0 +1,91 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "-O -save-temps" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_neon.h>
+
+/* Unsigned-Signed Dot Product instructions.  */
+
+/*
+**usfoo:
+** ...
+** vusdot\.s8 d0, d1, d2
+** bx lr
+*/
+int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+  return vusdot_s32 (r, x, y);
+}
+
+/*
+**usfoo_lane:
+** ...
+** vusdot\.s8 d0, d1, d2\[0\]
+** bx lr
+*/
+int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+  return vusdot_lane_s32 (r, x, y, 0);
+}
+
+/*
+**usfooq_lane:
+** ...
+** vusdot\.s8 q0, q1, d4\[1\]
+** bx lr
+*/
+int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y)
+{
+  return vusdotq_lane_s32 (r, x, y, 1);
+}
+
+/* Signed-Unsigned Dot Product instructions.  */
+
+/*
+**sfoo_lane:
+** ...
+** vsudot\.u8 d0, d1, d2\[0\]
+** bx lr
+*/
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y)
+{
+  return vsudot_lane_s32 (r, x, y, 0);
+}
+
+/*
+**sfooq_lane:
+** ...
+** vsudot\.u8 q0, q1, d4\[1\]
+** bx lr
+*/
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y)
+{
+  return vsudotq_lane_s32 (r, x, y, 1);
+}
+
+/*
+**usfoo_untied:
+** ...
+** vusdot\.s8 d1, d2, d3
+** vmov d0, d1 @ v2si
+** bx lr
+*/
+int32x2_t usfoo_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+  return vusdot_s32 (r, x, y);
+}
+
+/*
+**usfoo_lane_untied:
+** ...
+** vusdot\.s8 d1, d2, d3\[0\]
+** vmov d0, d1 @ v2si
+** bx lr
+*/
+int32x2_t usfoo_lane_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+  return vusdot_lane_s32 (r, x, y, 0);
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c
new file mode 100644
index 00000000000..b7b76e27486
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c
@@ -0,0 +1,90 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "-O -save-temps -mbig-endian" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_neon.h>
+
+/* Unsigned-Signed Dot Product instructions.  */
+
+/*
+**usfoo:
+** ...
+** vusdot\.s8 d0, d1, d2
+** bx lr
+*/
+int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+  return vusdot_s32 (r, x, y);
+}
+
+/*
+**usfoo_lane:
+** ...
+** vusdot\.s8 d0, d1, d2\[0\]
+** bx lr
+*/
+int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+  return vusdot_lane_s32 (r, x, y, 0);
+}
+
+/*
+**usfooq_lane:
+** ...
+** vusdot\.s8 q0, q1, d4\[1\]
+** bx lr
+*/
+int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y)
+{
+  return vusdotq_lane_s32 (r, x, y, 1);
+}
+
+/* Signed-Unsigned Dot Product instructions.  */
+
+/*
+**sfoo_lane:
+** ...
+** vsudot\.u8 d0, d1, d2\[0\]
+** bx lr
+*/
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y)
+{
+  return vsudot_lane_s32 (r, x, y, 0);
+}
+
+/*
+**sfooq_lane:
+** ...
+** vsudot\.u8 q0, q1, d4\[1\]
+** bx lr
+*/
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y)
+{
+  return vsudotq_lane_s32 (r, x, y, 1);
+}
+
+/*
+**usfoo_untied:
+** ...
+** vusdot\.s8 d1, d2, d3
+** vmov d0, d1 @ v2si
+** bx lr
+*/
+int32x2_t usfoo_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+  return vusdot_s32 (r, x, y);
+}
+
+/*
+**usfoo_lane_untied:
+** ...
+** vusdot\.s8 d1, d2, d3\[0\]
+** vmov d0, d1 @ v2si
+** bx lr
+*/
+int32x2_t usfoo_lane_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+  return vusdot_lane_s32 (r, x, y, 0);
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c
new file mode 100644
index 00000000000..e14fe8f4433
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c
@@ -0,0 +1,21 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "--save-temps" } */
+
+#include <arm_neon.h>
+
+/* Unsigned-Signed Dot Product instructions.  */
+
+int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+  /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */
+  return vusdot_lane_s32 (r, x, y, -1);
+}
+
+
+int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y)
+{
+  /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */
+  return vusdotq_lane_s32 (r, x, y, 2);
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c
new file mode 100644
index 00000000000..fb7ebb484e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c
@@ -0,0 +1,20 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "--save-temps" } */
+
+#include <arm_neon.h>
+
+/* Signed-Unsigned Dot Product instructions.  */
+
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y)
+{
+  /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */
+  return vsudot_lane_s32 (r, x, y, -1);
+}
+
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y)
+{
+  /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */
+  return vsudotq_lane_s32 (r, x, y, 2);
+}

--------------B6DAF1C3DAB41B8DC99E8692--
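
Note for archive readers: the arithmetic behind vusdot_s32 and vusdot_lane_s32 can be sketched in portable C. This is only a scalar reference model, not code from the patch. The names ref_usdot and ref_usdot_lane are hypothetical, and the model assumes the usual I8MM semantics: each 32-bit accumulator lane adds four unsigned-by-signed byte products. The lane check mirrors the GET_MODE_NUNITS (vmode) / 4 bound in arm_expand_builtin_args, so an 8-byte indexed operand accepts lanes 0 and 1 only.

```c
#include <stdint.h>

/* Hypothetical scalar model of vusdot_s32: r[i] accumulates the dot
   product of the i-th group of four unsigned bytes of a with the i-th
   group of four signed bytes of b.  */
static void
ref_usdot (int32_t r[2], const uint8_t a[8], const int8_t b[8])
{
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 4; j++)
      r[i] += (int32_t) a[4 * i + j] * (int32_t) b[4 * i + j];
}

/* Hypothetical scalar model of vusdot_lane_s32: every accumulator lane
   uses the same group of four signed bytes of b, chosen by lane.
   Returns -1 when lane is outside 0 .. 8 / 4 - 1, which is the case the
   compiler rejects at build time for the real intrinsic.  */
static int
ref_usdot_lane (int32_t r[2], const uint8_t a[8], const int8_t b[8],
                int lane)
{
  if (lane < 0 || lane >= 8 / 4)
    return -1;
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 4; j++)
      r[i] += (int32_t) a[4 * i + j] * (int32_t) b[4 * lane + j];
  return 0;
}
```

The vsudot variants would swap the signedness of the two byte operands. The real intrinsics additionally need a compiler and target that accept -march=armv8.2-a+i8mm, which this model does not.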