From: "Li, Pan2"
To: "juzhe.zhong@rivai.ai", gcc-patches
CC: "Wang, Yanzhang", kito.cheng
Subject: RE: [PATCH v1] RISC-V: Support FP nearbyint auto-vectorization
Date: Tue, 26 Sep 2023 03:02:53 +0000
References: <20230926023916.2631146-1-pan2.li@intel.com> <2688721A30954DE1+202309261059190787246@rivai.ai>
In-Reply-To: <2688721A30954DE1+202309261059190787246@rivai.ai>
Content-Type: multipart/alternative; boundary="_000_MW5PR11MB59080E4C51795F3C0C5B2BAAA9C3AMW5PR11MB5908namp_"
MIME-Version: 1.0

--_000_MW5PR11MB59080E4C51795F3C0C5B2BAAA9C3AMW5PR11MB5908namp_
Content-Type: text/plain; charset="us-ascii"

Sure thing, will send another patch for the renaming first.

Pan

From: juzhe.zhong@rivai.ai
Sent: Tuesday, September 26, 2023 10:59 AM
To: Li, Pan2; gcc-patches
Cc: Li, Pan2; Wang, Yanzhang; kito.cheng
Subject: Re: [PATCH v1] RISC-V: Support FP nearbyint auto-vectorization

+static rtx
+gen_nearbyint_const_fp (machine_mode inner_mode)
+{
+  /* The nearbyint needs the same floating point const as ceil.  */
+  return gen_ceil_const_fp (inner_mode);
+}

This is redundant.

This one is redundant as well:

static rtx
gen_floor_const_fp (machine_mode inner_mode)
{
  /* The floor needs the same floating point const as ceil.  */
  return gen_ceil_const_fp (inner_mode);
}

So rename

gen_ceil_const_fp (machine_mode inner_mode)

to

get_fp_rounding_coefficient

________________________________
juzhe.zhong@rivai.ai

From: pan2.li
Date: 2023-09-26 10:39
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP nearbyint auto-vectorization

From: Pan Li

This patch adds auto-vectorization support for the nearbyint API in
math.h. It depends on the -ffast-math option.

A call to nearbyint/nearbyintf such as v2 = nearbyint (v1) is converted
into the insns below (following the implementation of llvm).

* frflags a5
* vfcvt.x.f v3, v1, DYN
* vfcvt.f.x v2, v3
* fsflags a5

However, a floating-point value does not need the conversions above if
it has no fractional part. Take single precision as an example, and
assume the RTZ rounding mode:

+------------+---------------+-----------------+
| raw float  | binary layout | after nearbyint |
+------------+---------------+-----------------+
| 8388607.5  | 0x4affffff    | 8388607.0       |
| 8388608.0  | 0x4b000000    | 8388608.0       |
| 8388609.0  | 0x4b000001    | 8388609.0       |
+------------+---------------+-----------------+

Every single-precision value whose magnitude is >= 8388608.0 (2^23) has
no fractional bits, so it is already an integer. We use vmflt to build a
mask that filters these elements out, and perform the conversions only
on the elements selected by the mask.

Before this patch:
math-nearbyint-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    nearbyint
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  vfabs.v     v2,v1
  vmflt.vf    v0,v2,fa5
  frflags     a7
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  fsflags     a7
  vfsgnj.vv   v2,v2,v1

Please note that VLS modes are also handled by this patch and covered by
the test cases.

gcc/ChangeLog:

        * config/riscv/autovec.md (nearbyint2): New pattern.
        * config/riscv/riscv-protos.h (enum insn_type): New enum.
        (expand_vec_nearbyint): New function decl.
        * config/riscv/riscv-v.cc (gen_nearbyint_const_fp): New function impl.
        (expand_vec_nearbyint): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/unop/test-math.h: Add helper function.
        * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c: New test.

Signed-off-by: Pan Li
---
 gcc/config/riscv/autovec.md                   | 11 ++++
 gcc/config/riscv/riscv-protos.h               |  2 +
 gcc/config/riscv/riscv-v.cc                   | 36 ++++++++++++
 .../riscv/rvv/autovec/unop/math-nearbyint-0.c | 20 +++++++
 .../riscv/rvv/autovec/unop/math-nearbyint-1.c | 20 +++++++
 .../riscv/rvv/autovec/unop/math-nearbyint-2.c | 20 +++++++
 .../riscv/rvv/autovec/unop/math-nearbyint-3.c | 22 +++++++
 .../rvv/autovec/unop/math-nearbyint-run-1.c   | 48 +++++++++++++++
 .../rvv/autovec/unop/math-nearbyint-run-2.c   | 48 +++++++++++++++
 .../riscv/rvv/autovec/unop/test-math.h        | 33 +++++++++++
 .../riscv/rvv/autovec/vls/math-nearbyint-1.c  | 58 +++++++++++++++++++
 11 files changed, 318 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index a005e17457e..b47f086f5e6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2210,6 +2210,7 @@ (define_expand "avg3_ceil"
 ;; Includes:
 ;; - ceil/ceilf
 ;; - floor/floorf
+;; - nearbyint/nearbyintf
 ;; -------------------------------------------------------------------------
 (define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2230,3 +2231,13 @@ (define_expand "floor2"
     DONE;
   }
 )
+
+(define_expand "nearbyint2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+    riscv_vector::expand_vec_nearbyint (operands[0], operands[1], mode, mode);
+    DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 63eb2475705..f87bdef0f71 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -296,6 +296,7 @@ enum insn_type : unsigned int
   UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
   UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
+  UNARY_OP_TAMU_FRM_DYN = UNARY_OP_TAMU | FRM_DYN_P,
   UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P,
   UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P,
 
@@ -460,6 +461,7 @@ void expand_cond_len_binop (unsigned, rtx *);
 void expand_reduction (unsigned, unsigned, rtx *, rtx);
 void expand_vec_ceil (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_floor (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_nearbyint (rtx, rtx, machine_mode, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
                           bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a1ffefb23f3..4749dadf2d4 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3571,6 +3571,13 @@ gen_floor_const_fp (machine_mode inner_mode)
   return gen_ceil_const_fp (inner_mode);
 }
 
+static rtx
+gen_nearbyint_const_fp (machine_mode inner_mode)
+{
+  /* The nearbyint needs the same floating point const as ceil.  */
+  return gen_ceil_const_fp (inner_mode);
+}
+
 static rtx
 emit_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
                          machine_mode vec_fp_mode)
@@ -3676,4 +3683,33 @@ expand_vec_floor (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
   emit_vec_copysign (op_0, op_0, op_1, vec_fp_mode);
 }
 
+void
+expand_vec_nearbyint (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
+                      machine_mode vec_int_mode)
+{
+  /* Step-1: Get the abs float value for mask generation.  */
+  emit_vec_abs (op_0, op_1, vec_fp_mode);
+
+  /* Step-2: Generate the mask on const fp.  */
+  rtx const_fp = gen_nearbyint_const_fp (GET_MODE_INNER (vec_fp_mode));
+  rtx mask = emit_vec_float_cmp_mask (op_0, LT, const_fp, vec_fp_mode);
+
+  /* Step-3: Backup FP exception flags, nearbyint never raises exceptions.  */
+  rtx fflags = gen_reg_rtx (SImode);
+  emit_insn (gen_riscv_frflags (fflags));
+
+  /* Step-4: Convert to integer on mask, with dynamic rounding (aka nearbyint).  */
+  rtx tmp = gen_reg_rtx (vec_int_mode);
+  emit_vec_cvt_x_f (tmp, op_1, mask, UNARY_OP_TAMU_FRM_DYN, vec_fp_mode);
+
+  /* Step-5: Convert to floating-point on mask for the nearbyint result.  */
+  emit_vec_cvt_f_x (op_0, tmp, mask, UNARY_OP_TAMU_FRM_DYN, vec_fp_mode);
+
+  /* Step-6: Restore FP exception flags.  */
+  emit_insn (gen_riscv_fsflags (fflags));
+
+  /* Step-7: Retrieve the sign bit for -0.0.  */
+  emit_vec_copysign (op_0, op_0, op_1, vec_fp_mode);
+}
+
 } // namespace riscv_vector
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c
new file mode 100644
index 00000000000..f67b22ac02d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test__Float16___builtin_nearbyintf16:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e16,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
+** frflags\s+[axt][0-9]+
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** fsflags\s+[axt][0-9]+
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+*/
+TEST_UNARY_CALL (_Float16, __builtin_nearbyintf16)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c
new file mode 100644
index 00000000000..93639863412
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float___builtin_nearbyintf:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
+** frflags\s+[axt][0-9]+
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** fsflags\s+[axt][0-9]+
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+*/
+TEST_UNARY_CALL (float, __builtin_nearbyintf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c
new file mode 100644
index 00000000000..d31de739d2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_double___builtin_nearbyint:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
+** frflags\s+[axt][0-9]+
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** fsflags\s+[axt][0-9]+
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+*/
+TEST_UNARY_CALL (double, __builtin_nearbyint)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c
new file mode 100644
index 00000000000..4fd99505b40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float___builtin_nearbyintf:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
+** frflags\s+[axt][0-9]+
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** fsflags\s+[axt][0-9]+
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** vmerge\.vvm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0
+** ...
+*/
+TEST_COND_UNARY_CALL (float, __builtin_nearbyintf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c
new file mode 100644
index 00000000000..7f49485822a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+float in[ARRAY_SIZE];
+float out[ARRAY_SIZE];
+float ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL (float, __builtin_nearbyintf)
+TEST_ASSERT (float)
+
+TEST_INIT (float, 1.2, 1.0, 1)
+TEST_INIT (float, -1.2, -1.0, 2)
+TEST_INIT (float, 3.0, 3.0, 3)
+TEST_INIT (float, 8388607.5, 8388607.0, 4)
+TEST_INIT (float, 8388609.0, 8388609.0, 5)
+TEST_INIT (float, 0.0, 0.0, 6)
+TEST_INIT (float, -0.0, -0.0, 7)
+TEST_INIT (float, -8388607.5, -8388607.0, 8)
+TEST_INIT (float, -8388608.0, -8388608.0, 9)
+
+int
+main ()
+{
+  unsigned fflags_before = get_fflags ();
+
+  set_rm (FRM_RTZ);
+
+  RUN_TEST (float, 1, __builtin_nearbyintf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 2, __builtin_nearbyintf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 3, __builtin_nearbyintf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 4, __builtin_nearbyintf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 5, __builtin_nearbyintf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 6, __builtin_nearbyintf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 7, __builtin_nearbyintf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 8, __builtin_nearbyintf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 9, __builtin_nearbyintf, in, out, ref, ARRAY_SIZE);
+
+  unsigned fflags_after = get_fflags ();
+
+  if (fflags_before != fflags_after)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c
new file mode 100644
index 00000000000..4f8c7246b5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+double in[ARRAY_SIZE];
+double out[ARRAY_SIZE];
+double ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL (double, __builtin_nearbyint)
+TEST_ASSERT (double)
+
+TEST_INIT (double, 1.2, 1.0, 1)
+TEST_INIT (double, -1.8, -2.0, 2)
+TEST_INIT (double, 3.0, 3.0, 3)
+TEST_INIT (double, 4503599627370495.5, 4503599627370496.0, 4)
+TEST_INIT (double, 4503599627370497.0, 4503599627370497.0, 5)
+TEST_INIT (double, 0.0, 0.0, 6)
+TEST_INIT (double, -0.0, -0.0, 7)
+TEST_INIT (double, -4503599627370495.5, -4503599627370496.0, 8)
+TEST_INIT (double, -4503599627370496.0, -4503599627370496.0, 9)
+
+int
+main ()
+{
+  unsigned fflags_before = get_fflags ();
+
+  set_rm (FRM_RNE);
+
+  RUN_TEST (double, 1, __builtin_nearbyint, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 2, __builtin_nearbyint, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 3, __builtin_nearbyint, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 4, __builtin_nearbyint, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 5, __builtin_nearbyint, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 6, __builtin_nearbyint, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 7, __builtin_nearbyint, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 8, __builtin_nearbyint, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 9, __builtin_nearbyint, in, out, ref, ARRAY_SIZE);
+
+  unsigned fflags_after = get_fflags ();
+
+  if (fflags_before != fflags_after)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
index d035835f370..b63ca56d848 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
@@ -36,3 +36,36 @@
   test_##TYPE##_init_##NUM (IN, REF, SIZE); \
   test_##TYPE##_##CALL (OUT, IN, SIZE); \
   test_##TYPE##_assert (OUT, REF, SIZE);
+
+#define FRM_RNE 0
+#define FRM_RTZ 1
+#define FRM_RDN 2
+#define FRM_RUP 3
+#define FRM_RMM 4
+#define FRM_DYN 7
+
+static inline void
+set_rm (unsigned rm)
+{
+  __asm__ volatile (
+    "fsrm %0"
+    :
+    :"r"(rm)
+    :
+  );
+}
+
+static inline unsigned
+get_fflags ()
+{
+  unsigned fflags = 0;
+
+  __asm__ volatile (
+    "frflags %0"
+    :"=r"(fflags)
+    :
+    :
+  );
+
+  return fflags;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c
new file mode 100644
index 00000000000..8c8498c5982
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 --param=riscv-autovec-lmul=m8 -ffast-math -fdump-tree-optimized" } */
+
+#include "def.h"
+
+DEF_OP_V (nearbyintf16, 1, _Float16, __builtin_nearbyintf16)
+DEF_OP_V (nearbyintf16, 2, _Float16, __builtin_nearbyintf16)
+DEF_OP_V (nearbyintf16, 4, _Float16, __builtin_nearbyintf16)
+DEF_OP_V (nearbyintf16, 8, _Float16, __builtin_nearbyintf16)
+DEF_OP_V (nearbyintf16, 16, _Float16, __builtin_nearbyintf16)
+DEF_OP_V (nearbyintf16, 32, _Float16, __builtin_nearbyintf16)
+DEF_OP_V (nearbyintf16, 64, _Float16, __builtin_nearbyintf16)
+DEF_OP_V (nearbyintf16, 128, _Float16, __builtin_nearbyintf16)
+DEF_OP_V (nearbyintf16, 256, _Float16, __builtin_nearbyintf16)
+DEF_OP_V (nearbyintf16, 512, _Float16, __builtin_nearbyintf16)
+DEF_OP_V (nearbyintf16, 1024, _Float16, __builtin_nearbyintf16)
+DEF_OP_V (nearbyintf16, 2048, _Float16, __builtin_nearbyintf16)
+
+DEF_OP_V (nearbyintf, 1, float, __builtin_nearbyintf)
+DEF_OP_V (nearbyintf, 2, float, __builtin_nearbyintf)
+DEF_OP_V (nearbyintf, 4, float, __builtin_nearbyintf)
+DEF_OP_V (nearbyintf, 8, float, __builtin_nearbyintf)
+DEF_OP_V (nearbyintf, 16, float, __builtin_nearbyintf)
+DEF_OP_V (nearbyintf, 32, float, __builtin_nearbyintf)
+DEF_OP_V (nearbyintf, 64, float, __builtin_nearbyintf)
+DEF_OP_V (nearbyintf, 128, float, __builtin_nearbyintf)
+DEF_OP_V (nearbyintf, 256, float, __builtin_nearbyintf)
+DEF_OP_V (nearbyintf, 512, float, __builtin_nearbyintf)
+DEF_OP_V (nearbyintf, 1024, float, __builtin_nearbyintf)
+
+DEF_OP_V (nearbyint, 1, double, __builtin_nearbyint)
+DEF_OP_V (nearbyint, 2, double, __builtin_nearbyint)
+DEF_OP_V (nearbyint, 4, double, __builtin_nearbyint)
+DEF_OP_V (nearbyint, 8, double, __builtin_nearbyint)
+DEF_OP_V (nearbyint, 16, double, __builtin_nearbyint)
+DEF_OP_V (nearbyint, 32, double, __builtin_nearbyint)
+DEF_OP_V (nearbyint, 64, double, __builtin_nearbyint)
+DEF_OP_V (nearbyint, 128, double, __builtin_nearbyint)
+DEF_OP_V (nearbyint, 256, double, __builtin_nearbyint)
+DEF_OP_V (nearbyint, 512, double, __builtin_nearbyint)
+
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2,2" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4,4" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "16,16" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "32,32" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "64,64" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "128,128" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "256,256" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "512,512" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "1024,1024" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2048,2048" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4096,4096" "optimized" } } */
+/* { dg-final { scan-assembler-times {vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t} 30 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t} 30 } } */
+/* { dg-final { scan-assembler-times {frflags\s+[atx][0-9]+} 30 } } */
+/* { dg-final { scan-assembler-times {fsflags\s+[atx][0-9]+} 30 } } */
--
2.34.1

--_000_MW5PR11MB59080E4C51795F3C0C5B2BAAA9C3AMW5PR11MB5908namp_--