From: "Li, Pan2"
To: "juzhe.zhong@rivai.ai", rguenther
CC: richard.sandiford, gcc-patches, Pan Li, kito.cheng
Subject: RE: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Date: Wed, 1 Mar 2023 15:42:16 +0000
In-Reply-To: <9B9312B81334DB23+2023030122191665205362@rivai.ai>

Thanks all for so
much valuable and helpful material.

As I understand it (please correct me if I am mistaken), for VNx*BI (i.e., * = 1, 2, 4, 8, 16, 32, 64), the precision and mode size need to be adjusted as below.

Precision size [1, 2, 4, 8, 16, 32, 64]
Mode size      [1, 1, 1, 1, 2, 4, 8]

Given that, if we ignore the self-test failure, the adjust_precision part alone is able to fix the bug I mentioned.

genmodes first gets the precision and then uses mode_size = exact_div / 8 to generate the mode size. It also provides adjust_mode_size, which applies after the mode_size generation. The riscv part already has the mode_size_adjust, and the value of mode_size will be overridden by the adjustments.

Unfortunately, the early-stage mode_size generation uses exact_div, which does not honor a precision size < 8 under the adjustment and fails on the exact_div assertion.

Besides the precision adjustment, I am not sure whether we can narrow the problem down to:

1. Define the real size of both the precision and the mode size to align with the RISC-V ISA.
2. Make the general mode_size = precision_size / 8 able to handle both the exact_div case and the case where the dividend is less than the divisor (like 1/8 or 2/8).

Could you please share your professional suggestions about this? Thank you all again and have a nice day!

Pan

From: juzhe.zhong@rivai.ai
Sent: Wednesday, March 1, 2023 10:19 PM
To: rguenther
Cc: richard.sandiford; gcc-patches; Pan Li; Li, Pan2; kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

>> So given the above I think that modeling the size as being the same
>> but with accurate precision would work. It's then only the size of the
>> padding in bytes we cannot represent with poly-int which should be fine.

>> Correct?

Yes.

>> Btw, is storing a VNx1BI and then loading a VNx2BI from the same
>> memory address well-defined? That is, how is the padding handled
>> by the machine load/store instructions?
Storing VNx1BI stores the data at addresses 0 ~ 1/8 poly (1,1) and keeps the memory data at addresses 1/8 poly (1,1) ~ 2/8 poly (1,1) unchanged.
Loading VNx2BI will load 0 ~ 2/8 poly (1,1). Note that 0 ~ 1/8 poly (1,1) is the data that we stored above, and 1/8 poly (1,1) ~ 2/8 poly (1,1) is the original memory data.

You can see here for this case (LLVM):
https://godbolt.org/z/P9e1adrd3

foo:                                    # @foo
        vsetvli a2, zero, e8, mf8, ta, ma
        vsm.v   v0, (a0)
        vsetvli a2, zero, e8, mf4, ta, ma
        vlm.v   v8, (a0)
        vsm.v   v8, (a1)
        ret

We can also do this in GCC as long as we can differentiate VNx1BI and VNx2BI, and GCC, going by precision, does not eliminate the statements even though the modes have the same bytesize.

First we emit vsetvl e8mf8 + vsm for VNx1BI.
Then we emit vsetvl e8mf4 + vlm for VNx2BI.

Thanks.
________________________________
juzhe.zhong@rivai.ai

From: Richard Biener
Date: 2023-03-01 22:03
To: juzhe.zhong
CC: richard.sandiford; gcc-patches; Pan Li; pan2.li; kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

On Wed, 1 Mar 2023, Richard Biener wrote:

> On Wed, 1 Mar 2023, juzhe.zhong@rivai.ai wrote:
>
> > Let me first introduce RVV load/store basics and stack allocation.
> > For scalable vector memory allocation, we allocate memory according to the machine vector length.
> > To get this CPU vector-length value (runtime invariant but unknown at compile time), we have an instruction called csrr vlenb.
> > For example, csrr a5,vlenb stores the vector length of a single register (expressed as a byte size) in the a5 register.
> > The size of a single register in bytes (GET_MODE_SIZE) is the poly value (8,8) bytes. That means that after csrr a5,vlenb, a5 holds the size poly (8,8) bytes.
> >
> > Now, our problem is that VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same bytesize poly (1,1), so their storage consumes the same size.
> > This means that when we want to allocate memory storage or stack for register spilling, we first do csrr a5,vlenb, then srli a5,a5,3 (meaning a5 = a5/8).
> > Then a5 has the bytesize value of poly (1,1). VNx1BI, VNx2BI, VNx4BI and VNx8BI all go through the process described above. They all consume
> > the same memory storage size since we can't model them accurately according to precision or bitsize.
> >
> > They consume the same storage (I agree it's better to model them more accurately in terms of memory storage consumption).
> >
> > Well, even though they consume the same size of memory storage, I can make their memory access behavior (load/store) accurate by
> > emitting the accurate RVV instructions for them according to the RVV ISA.
> >
> > VNx1BI, VNx2BI, VNx4BI and VNx8BI consume the same memory storage with size poly (1,1).
> > The instructions for these modes are as follows:
> > VNx1BI: vsetvl e8mf8 + vlm, loading 1/8 of poly (1,1) storage.
> > VNx2BI: vsetvl e8mf4 + vlm, loading 1/4 of poly (1,1) storage.
> > VNx4BI: vsetvl e8mf2 + vlm, loading 1/2 of poly (1,1) storage.
> > VNx8BI: vsetvl e8m1 + vlm, loading 1 of poly (1,1) storage.
> >
> > So based on these, it's fine that we don't model VNx1BI, VNx2BI, VNx4BI and VNx8BI accurately according to precision or bitsize.
> > This implementation is fine even though their memory storage is not accurate.
> >
> > However, the problem is that since they have the same bytesize, GCC will think they are the same and do some incorrect statement elimination:
> >
> > (Note: Load same memory base)
> > load v0 VNx1BI from base0
> > load v1 VNx2BI from base0
> > load v2 VNx4BI from base0
> > load v3 VNx8BI from base0
> >
> > store v0 base1
> > store v1 base2
> > store v2 base3
> > store v3 base4
> >
> > For this program sequence, GCC will eliminate the last 3 load instructions.
> >
> > Then it will become:
> >
> > load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only loads 1/8 of the poly size (1,1) memory data)
> >
> > store v0 base1
> > store v0 base2
> > store v0 base3
> > store v0 base4
> >
> > This is what we want to fix. I think it is fine as long as we have a way to differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI,
> > so that GCC will not do the incorrect elimination for RVV.
> >
> > I think it can work fine even though these 4 modes consume an inaccurate memory storage size
> > but have accurate load/store memory access behavior.
>
> So given the above I think that modeling the size as being the same
> but with accurate precision would work. It's then only the size of the
> padding in bytes we cannot represent with poly-int which should be fine.
>
> Correct?

Btw, is storing a VNx1BI and then loading a VNx2BI from the same
memory address well-defined? That is, how is the padding handled
by the machine load/store instructions?

Richard.
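
To make the size arithmetic discussed in this thread concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the helpers exact_div, ceil_div, mode_sizes and group_by_size are hypothetical stand-ins and not GCC's genmodes code, and only the constant factor of each poly value is modeled. It shows that dividing the precisions [1, 2, 4, 8, 16, 32, 64] by 8 with rounding up gives the mode sizes [1, 1, 1, 1, 2, 4, 8], whereas an exact division rejects every precision below 8, and that the first four modes become indistinguishable when compared by byte size alone.

```python
# Hypothetical model of the thread's arithmetic; not GCC's actual API.
# Each entry is the constant factor of the VNx<N>BI precision in bits.
PRECISIONS = [1, 2, 4, 8, 16, 32, 64]


def exact_div(dividend: int, divisor: int) -> int:
    """Divide and insist on a zero remainder, like an exact-division assertion."""
    quotient, remainder = divmod(dividend, divisor)
    if remainder != 0:
        raise ValueError(f"{dividend} is not a multiple of {divisor}")
    return quotient


def ceil_div(dividend: int, divisor: int) -> int:
    """Divide rounding up, so a sub-byte precision still occupies one byte."""
    return -(-dividend // divisor)


def mode_sizes(precisions):
    """Byte size for each precision, rounding partial bytes up."""
    return [ceil_div(p, 8) for p in precisions]


def group_by_size(precisions):
    """Group mode names by byte size to show which modes collide on size alone."""
    groups = {}
    for p in precisions:
        groups.setdefault(ceil_div(p, 8), []).append(f"VNx{p}BI")
    return groups


if __name__ == "__main__":
    print("mode sizes:", mode_sizes(PRECISIONS))
    print("modes sharing one byte:", group_by_size(PRECISIONS)[1])
    try:
        exact_div(PRECISIONS[0], 8)
    except ValueError as err:
        print("exact_div fails:", err)
```

Under this simplified model, keeping the precision distinct is what separates VNx1BI, VNx2BI, VNx4BI and VNx8BI, since their byte sizes are identical.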