From: "Li, Pan2"
To: Richard Biener, Ju-Zhe Zhong
CC: "gcc-patches@gcc.gnu.org", "richard.sandiford@arm.com"
Subject: RE: [PATCH] VECT: Add mask_len_fold_left_plus for in-order floating-point reduction
Date: Wed, 19 Jul 2023 13:37:50 +0000
References: <20230714234500.75826-1-juzhe.zhong@rivai.ai>

Committed as passed both the bootstrap and regression test, thanks Richard.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Richard Biener via Gcc-patches
Sent: Wednesday, July 19, 2023 4:17 PM
To: Ju-Zhe Zhong
Cc: gcc-patches@gcc.gnu.org; richard.sandiford@arm.com
Subject: Re: [PATCH] VECT: Add mask_len_fold_left_plus for in-order floating-point reduction

On Sat, 15 Jul 2023, juzhe.zhong@rivai.ai wrote:

> From: Ju-Zhe Zhong
>
> Hi, Richard and Richi.
>
> This patch adds mask_len_fold_left_plus pattern to support in-order floating-point
> reduction for target support len loop control.
>
> Consider this following case:
> double
> foo2 (double *__restrict a,
>      double init,
>      int *__restrict cond,
>      int n)
> {
>     for (int i = 0; i < n; i++)
>       if (cond[i])
>         init += a[i];
>     return init;
> }
>
> ARM SVE:
>
> ...
> vec_mask_and_60 = loop_mask_54 & mask__23.33_57;
> vect__ifc__35.37_64 = .VCOND_MASK (vec_mask_and_60, vect__8.36_61, { 0.0, ... });
> _36 = .MASK_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, loop_mask_54);
> ...
>
> For RVV, we want to see:
> ...
> _36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, control_mask, loop_len, bias);
> ...

OK.

Richard.

> gcc/ChangeLog:
>
>         * doc/md.texi: Add mask_len_fold_left_plus.
>         * internal-fn.cc (mask_len_fold_left_direct): Ditto.
>         (expand_mask_len_fold_left_optab_fn): Ditto.
>         (direct_mask_len_fold_left_optab_supported_p): Ditto.
>         * internal-fn.def (MASK_LEN_FOLD_LEFT_PLUS): Ditto.
>         * optabs.def (OPTAB_D): Ditto.
>
> ---
>  gcc/doc/md.texi     | 13 +++++++++++++
>  gcc/internal-fn.cc  |  5 +++++
>  gcc/internal-fn.def |  3 +++
>  gcc/optabs.def      |  1 +
>  4 files changed, 22 insertions(+)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index cbcb992e5d7..6f44e66399d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5615,6 +5615,19 @@ no reassociation.
>  Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
>  (operand 3) that specifies which elements of the source vector should be added.
>
> +@cindex @code{mask_len_fold_left_plus_@var{m}} instruction pattern
> +@item @code{mask_len_fold_left_plus_@var{m}}
> +Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> +(operand 3), len operand (operand 4) and bias operand (operand 5) that
> +performs following operations strictly in-order (no reassociation):
> +
> +@smallexample
> +operand0 = operand1;
> +for (i = 0; i < LEN + BIAS; i++)
> +  if (operand3[i])
> +    operand0 += operand2[i];
> +@end smallexample
> +
>  @cindex @code{sdot_prod@var{m}} instruction pattern
>  @item @samp{sdot_prod@var{m}}
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index e698f0bffc7..2bf4fc492fe 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -190,6 +190,7 @@ init_internal_fns ()
>  #define fold_extract_direct { 2, 2, false }
>  #define fold_left_direct { 1, 1, false }
>  #define mask_fold_left_direct { 1, 1, false }
> +#define mask_len_fold_left_direct { 1, 1, false }
>  #define check_ptrs_direct { 0, 0, false }
>
>  const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
> @@ -3890,6 +3891,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab,
>  #define expand_mask_fold_left_optab_fn(FN, STMT, OPTAB) \
>    expand_direct_optab_fn (FN, STMT, OPTAB, 3)
>
> +#define expand_mask_len_fold_left_optab_fn(FN, STMT, OPTAB) \
> +  expand_direct_optab_fn (FN, STMT, OPTAB, 5)
> +
>  #define expand_check_ptrs_optab_fn(FN, STMT, OPTAB) \
>    expand_direct_optab_fn (FN, STMT, OPTAB, 4)
>
> @@ -3997,6 +4001,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  #define direct_fold_extract_optab_supported_p direct_optab_supported_p
>  #define direct_fold_left_optab_supported_p direct_optab_supported_p
>  #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
> +#define direct_mask_len_fold_left_optab_supported_p direct_optab_supported_p
>  #define direct_check_ptrs_optab_supported_p direct_optab_supported_p
>  #define direct_vec_set_optab_supported_p direct_optab_supported_p
>  #define direct_vec_extract_optab_supported_p direct_optab_supported_p
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index ea750a921ed..d3aec51b1f2 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -319,6 +319,9 @@ DEF_INTERNAL_OPTAB_FN (FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
>  DEF_INTERNAL_OPTAB_FN (MASK_FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
>                         mask_fold_left_plus, mask_fold_left)
>
> +DEF_INTERNAL_OPTAB_FN (MASK_LEN_FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
> +                       mask_len_fold_left_plus, mask_len_fold_left)
> +
>  /* Unary math functions.  */
>  DEF_INTERNAL_FLT_FN (ACOS, ECF_CONST, acos, unary)
>  DEF_INTERNAL_FLT_FN (ACOSH, ECF_CONST, acosh, unary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 3dae228fba6..7023392979e 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -385,6 +385,7 @@ OPTAB_D (reduc_ior_scal_optab, "reduc_ior_scal_$a")
>  OPTAB_D (reduc_xor_scal_optab, "reduc_xor_scal_$a")
>  OPTAB_D (fold_left_plus_optab, "fold_left_plus_$a")
>  OPTAB_D (mask_fold_left_plus_optab, "mask_fold_left_plus_$a")
> +OPTAB_D (mask_len_fold_left_plus_optab, "mask_len_fold_left_plus_$a")
>
>  OPTAB_D (extract_last_optab, "extract_last_$a")
>  OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
>

--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
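
For readers of the archive, the semantics documented in the md.texi hunk
above can be modelled in plain scalar C. This is only a sketch under stated
assumptions: the name mask_len_fold_left_plus_ref, the element type double,
and the bool/long parameter types are invented for illustration and are not
part of the patch or of GCC. The loop itself mirrors the @smallexample text:
start from the scalar input, then add the selected elements strictly in
index order over the first LEN + BIAS elements.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scalar reference model of the documented pattern.
   INIT plays the role of operand 1, VEC of operand 2, MASK of operand 3,
   LEN of operand 4 and BIAS of operand 5.  The name and the C types are
   assumptions made for this sketch only.  */
static double
mask_len_fold_left_plus_ref (double init, const double *vec,
                             const bool *mask, long len, long bias)
{
  /* The documented loop runs over the first LEN + BIAS elements.  */
  assert (len + bias >= 0);

  double acc = init;
  for (long i = 0; i < len + bias; i++)
    if (mask[i])
      /* Accumulate left to right; no reassociation of the additions.  */
      acc += vec[i];
  return acc;
}
```

With init = 10.0, vec = {1, 2, 3, 4}, mask = {1, 0, 1, 1}, len = 4 and
bias = 0, the model adds elements 0, 2 and 3 in that order. Passing
bias = -1 with the same inputs shortens the range by one element, so only
elements 0 and 2 are added.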