From: "Li, Pan2"
To: "juzhe.zhong@rivai.ai", richard.sandiford
CC: rguenther, gcc-patches, "Pan Li", kito.cheng
Subject: RE: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
Date: Thu, 2 Mar 2023 06:07:58 +0000

Thanks all for the help. I tried and validated the approach Richard suggested, and it works as expected. Meanwhile, I updated the patch as below (I used the in-reply-to option for send-email, but it looks like that failed). Could you please continue the review?

Additionally, I would like to know whether we can land this patch for the GCC 13 release (which includes RVV).

https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613149.html

Pan

From: juzhe.zhong@rivai.ai
Sent: Thursday, March 2, 2023 6:54 AM
To: richard.sandiford; Li, Pan2
Cc: rguenther; gcc-patches; Pan Li; kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

>> Does the eventual value set by ADJUST_BYTESIZE equal the real number of
>> bytes loaded by vlm.v and stored by vsm.v (after the appropriate vsetvl)?
>> Or is the GCC size larger in some cases than the number of bytes
>> loaded and stored?

For VNx1BI, VNx2BI, VNx4BI and VNx8BI, we allocate a larger amount of memory or stack for register spilling, according to ADJUST_BYTESIZE.
After the appropriate vsetvl, VNx1BI loads/stores 1/8 of ADJUST_BYTESIZE (vsetvl e8mf8).
After the appropriate vsetvl, VNx2BI loads/stores 2/8 of ADJUST_BYTESIZE (vsetvl e8mf4).
After the appropriate vsetvl, VNx4BI loads/stores 4/8 of ADJUST_BYTESIZE (vsetvl e8mf2).
After the appropriate vsetvl, VNx8BI loads/stores 8/8 of ADJUST_BYTESIZE (vsetvl e8m1).

Note: for every RVV machine mode other than these four, ADJUST_BYTESIZE equals the real number of bytes accessed by the load/store instructions that the RVV ISA defines.

As I said, it's fine that we allocate more memory than needed for VNx1BI, VNx2BI and VNx4BI: the RISC-V backend can emit the appropriate vsetvl for each machine_mode to guarantee correctness, as long as GCC doesn't perform incorrect elimination in the middle-end.

Besides, poly (1,1) bytes is 1/8 of the vector register length in bytes, which is already a small number, and it is exactly the number of bytes loaded/stored for VNx8BI. VNx1BI, VNx2BI and VNx4BI do consume more memory than they actually load/store under the appropriate vsetvl, since they have the same ADJUST_BYTESIZE as VNx8BI. However, I think that's fine as long as we can guarantee correctness; optimizing this memory consumption is a minor concern.

>> And does it equal the size of the corresponding LLVM machine type?

For register spilling, LLVM consumes much more memory than GCC: it always does a whole-register load/store (the length of a single vector register) when spilling. That's another story (I won't go into it, since it's quite an ugly implementation); they don't model the types accurately according to the RVV ISA for register spilling.

For normal loads/stores like:

  vbool8_t v2 = *(vbool8_t*)in;
  *(vbool8_t*)(out + 100) = v2;

the load/store instructions they generate are accurate. But even though their instructions access memory accurately, I am not sure whether the size of their machine type is accurate.
For example, in LLVM IR, GCC's VNx1BI is represented as <vscale x 1 x i1> and VNx2BI as <vscale x 2 x i1>. I am not sure what the byte size of <vscale x 1 x i1> and <vscale x 2 x i1> is; I haven't looked into it deeply.

I don't think this question is that important. Whether or not VNx1BI and VNx2BI are modeled accurately by ADJUST_BYTESIZE in GCC, or <vscale x 1 x i1> and <vscale x 2 x i1> by their byte size in LLVM, it is fine for RVV as long as we emit the appropriate vsetvl + vlm/vsm, even if the compiler's memory allocation is not exact in some cases.

________________________________
juzhe.zhong@rivai.ai

From: Richard Sandiford
Date: 2023-03-02 00:14
To: Li, Pan2
CC: juzhe.zhong@rivai.ai; rguenther; gcc-patches; Pan Li; kito.cheng
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

"Li, Pan2" writes:
> Thanks all for so much valuable and helpful material.
>
> As I understand it (please correct me if I am mistaken), for VNx*BI (i.e. 1, 2, 4, 8, 16, 32, 64),
> the precision and mode size need to be adjusted as below.
>
> Precision size [1, 2, 4, 8, 16, 32, 64]
> Mode size      [1, 1, 1, 1,  2,  4,  8]
>
> Given that, if we ignore the self-test failure, the adjust_precision part alone is able to fix the bug I mentioned.
> genmodes first gets the precision, and then uses mode_size = exact_div / 8 to generate the size.
> It also provides adjust_mode_size after the mode_size generation.
>
> The RISC-V part already has the mode_size adjustment, and the value of mode_size will be overridden by the adjustments.

Ah, OK!  In that case, would the following help:

Turn:

  mode_size[E_%smode] = exact_div (mode_precision[E_%smode], BITS_PER_UNIT);

into:

  if (!multiple_p (mode_precision[E_%smode], BITS_PER_UNIT,
                   &mode_size[E_%smode]))
    mode_size[E_%smode] = -1;

where -1 is an "obviously wrong" value.
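Editorial sketch: the proposed flow (placeholder size, port adjustment, final check) can be modelled with plain integers instead of poly_int. This is a hypothetical simplification, not genmodes code: `multiple_p` below is a local scalar stand-in for GCC's poly_int function of the same name, `adjust_bytesize` is a made-up port hook that rounds every sub-byte mask mode up to one byte, and the precisions are the values from Pan's table quoted above.

```c
#include <assert.h>

#define BITS_PER_UNIT 8
#define SIZE_UNKNOWN (-1) /* the "obviously wrong" placeholder */

/* Scalar stand-in for GCC's poly_int multiple_p (A, B, &QUOT): return 1 and
   store A / B in *QUOT if A is an exact multiple of B, otherwise return 0.  */
static int
multiple_p (int a, int b, int *quot)
{
  if (a % b != 0)
    return 0;
  *quot = a / b;
  return 1;
}

/* Step 1 (genmodes): derive the byte size from the precision in bits, or
   leave SIZE_UNKNOWN for the port to fill in later.  */
static int
initial_mode_size (int precision_bits)
{
  int size;
  if (!multiple_p (precision_bits, BITS_PER_UNIT, &size))
    size = SIZE_UNKNOWN;
  return size;
}

/* Step 2 (port): hypothetical ADJUST_BYTESIZE that gives each sub-byte
   mask mode one byte and leaves already-known sizes alone.  */
static int
adjust_bytesize (int size)
{
  return size == SIZE_UNKNOWN ? 1 : size;
}

/* Step 3 (genmodes): once all adjustments are done, no size may still be
   the placeholder.  */
static int
final_mode_size (int precision_bits)
{
  int size = adjust_bytesize (initial_mode_size (precision_bits));
  assert (size != SIZE_UNKNOWN);
  return size;
}
```

Under these assumptions, precisions 1, 2 and 4 first receive the -1 placeholder and end up with a 1-byte size after the adjustment, so `final_mode_size` over [1, 2, 4, 8, 16, 32, 64] reproduces the mode size row [1, 1, 1, 1, 2, 4, 8] without exact_div ever seeing a precision below 8.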
Ports that might hit the -1 are then responsible for setting the size later, via ADJUST_BYTESIZE. After all the adjustments are complete, genmodes asserts that no size is known_eq to -1.

That way, target-independent code doesn't need to guess what the correct behaviour is.

Does the eventual value set by ADJUST_BYTESIZE equal the real number of
bytes loaded by vlm.v and stored by vsm.v (after the appropriate vsetvl)?
And does it equal the size of the corresponding LLVM machine type?
Or is the GCC size larger in some cases than the number of bytes
loaded and stored?

(You and Juzhe have probably answered that question before, sorry, but I'm still not 100% sure of the answer. Personally, I think I would find the ISA behaviour easier to understand if the explanation doesn't involve poly_ints. It would be good to understand things "as the architecture sees them" rather than in terms of GCC concepts.)

Thanks,
Richard

> Unfortunately, the early-stage mode_size generation uses exact_div, which doesn't handle a precision smaller than 8
> with the adjustment, and fails on the exact_div assertions.
>
> Besides the precision adjustment, I am not sure whether we can narrow the problem down to:
>
> 1. Define the real size of both the precision and the mode size to align with the RISC-V ISA.
> 2. Make the general mode_size = precision_size / 8 handle both the exact_div case and the case where the dividend is smaller than the divisor (like 1/8 or 2/8).
>
> Could you please share your suggestions about this? Thank you all again and have a nice day!
>
> Pan
>
> From: juzhe.zhong@rivai.ai
> Sent: Wednesday, March 1, 2023 10:19 PM
> To: rguenther
> Cc: richard.sandiford; gcc-patches; Pan Li; Li, Pan2; kito.cheng
> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
>
>>> So given the above I think that modeling the size as being the same
>>> but with accurate precision would work.
>>> It's then only the size of the
>>> padding in bytes we cannot represent with poly-int which should be fine.
>
>>> Correct?
> Yes.
>
>>> Btw, is storing a VNx1BI and then loading a VNx2BI from the same
>>> memory address well-defined?  That is, how is the padding handled
>>> by the machine load/store instructions?
>
> Storing a VNx1BI stores the data at bytes 0 to 1/8 of poly (1,1) and leaves the memory from 1/8 to 2/8 of poly (1,1) unchanged.
> Loading a VNx2BI loads bytes 0 to 2/8 of poly (1,1): bytes 0 to 1/8 of poly (1,1) are the data we stored above, and bytes 1/8 to 2/8 of poly (1,1) are the original memory data.
> You can see this case here (LLVM):
> https://godbolt.org/z/P9e1adrd3
> foo:                                    # @foo
>         vsetvli a2, zero, e8, mf8, ta, ma
>         vsm.v   v0, (a0)
>         vsetvli a2, zero, e8, mf4, ta, ma
>         vlm.v   v8, (a0)
>         vsm.v   v8, (a1)
>         ret
>
> We can do the same in GCC as long as we can differentiate VNx1BI and VNx2BI, and GCC does not eliminate statements that differ only in precision even though
> they have the same byte size.
>
> First we emit vsetvl e8mf8 + vsm for VNx1BI.
> Then we emit vsetvl e8mf4 + vlm for VNx2BI.
>
> Thanks.
> ________________________________
> juzhe.zhong@rivai.ai
>
> From: Richard Biener
> Date: 2023-03-01 22:03
> To: juzhe.zhong
> CC: richard.sandiford; gcc-patches; Pan Li; pan2.li; kito.cheng
> Subject: Re: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment
> On Wed, 1 Mar 2023, Richard Biener wrote:
>
>> On Wed, 1 Mar 2023, juzhe.zhong@rivai.ai wrote:
>>
>> > Let me first introduce RVV load/store basics and stack allocation.
>> > For scalable vector memory allocation, we allocate memory according to the machine vector length.
>> > To get this CPU vector-length value (runtime invariant but unknown at compile time), we have an instruction: csrr vlenb.
>> > For example, csrr a5,vlenb stores the length of a single vector register (as a byte size) in register a5.
>> > The size of a single register in bytes (GET_MODE_SIZE) is the poly value (8,8). That means after csrr a5,vlenb, a5 holds poly (8,8) bytes.
>> >
>> > Now, our problem is that VNx1BI, VNx2BI, VNx4BI and VNx8BI have the same byte size, poly (1,1), so their storage takes up the same space.
>> > When we want to allocate memory or stack for register spilling, we first do csrr a5,vlenb, then srli a5,a5,3 (i.e. a5 = a5/8).
>> > a5 then holds the byte size poly (1,1). VNx1BI, VNx2BI, VNx4BI and VNx8BI all go through the process described above, so they all consume
>> > the same amount of memory, since we can't model them accurately according to their precision or bitsize.
>> >
>> > They consume the same storage (I agree it's better to model them more accurately in terms of memory consumption).
>> >
>> > Even though they consume the same amount of memory, I can make their memory access behavior (load/store) accurate by
>> > emitting the accurate RVV instructions for them according to the RVV ISA.
>> >
>> > VNx1BI, VNx2BI, VNx4BI and VNx8BI consume the same memory storage, of size poly (1,1).
>> > The instructions for these modes are as follows:
>> > VNx1BI: vsetvl e8mf8 + vlm, loading 1/8 of the poly (1,1) storage.
>> > VNx2BI: vsetvl e8mf4 + vlm, loading 1/4 of the poly (1,1) storage.
>> > VNx4BI: vsetvl e8mf2 + vlm, loading 1/2 of the poly (1,1) storage.
>> > VNx8BI: vsetvl e8m1 + vlm, loading all of the poly (1,1) storage.
>> >
>> > So based on this, it's fine that we don't model VNx1BI, VNx2BI, VNx4BI and VNx8BI accurately according to precision or bitsize.
>> > This implementation is fine even though their memory storage size is not accurate.
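Editorial sketch: the per-mode access sizes above can be checked numerically by fixing a VLEN. The sketch assumes vl = VLMAX (as with `vsetvli a2, zero, ...`), SEW = 8, and the RVV 1.0 rule that vlm.v/vsm.v transfer ceil(vl/8) bytes; `struct vtype_lmul`, `mask_access_bytes` and `mask_storage_bytes` are hypothetical helper names, not GCC or intrinsic APIs.

```c
#include <assert.h>

/* LMUL of the SEW=8 vtype used for each mask mode, as NUM/DEN:
   e8mf8 = 1/8 (VNx1BI), e8mf4 = 1/4 (VNx2BI), e8mf2 = 1/2 (VNx4BI),
   e8m1 = 1/1 (VNx8BI).  */
struct vtype_lmul
{
  int num;
  int den;
};

/* Bytes transferred by vlm.v / vsm.v after "vsetvli a2, zero, e8, <LMUL>":
   vl = VLMAX = VLEN / 8 * LMUL mask bits, moved as ceil (vl / 8) bytes.  */
static int
mask_access_bytes (int vlen_bits, struct vtype_lmul lmul)
{
  int vlmax = vlen_bits / 8 * lmul.num / lmul.den;
  return (vlmax + 7) / 8;
}

/* Storage reserved for each of VNx1BI..VNx8BI: poly (1,1) bytes, computed
   by reading vlenb and shifting right by 3.  */
static int
mask_storage_bytes (int vlen_bits)
{
  int vlenb = vlen_bits / 8; /* value read by csrr a5, vlenb */
  return vlenb >> 3;         /* srli a5, a5, 3, i.e. vlenb / 8 */
}
```

For VLEN = 512 (vlenb = 64), the shared storage is 8 bytes and the four modes access 1, 2, 4 and 8 bytes of it, i.e. 1/8, 1/4, 1/2 and all of it. For small VLEN the ceil() rounding matters: with VLEN = 128 the storage is 2 bytes and VNx1BI still accesses a whole byte.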
>> >
>> > However, the problem is that since they have the same byte size, GCC will think they are the same and perform some incorrect statement elimination:
>> >
>> > (Note: all loads use the same memory base)
>> > load v0 VNx1BI from base0
>> > load v1 VNx2BI from base0
>> > load v2 VNx4BI from base0
>> > load v3 VNx8BI from base0
>> >
>> > store v0 base1
>> > store v1 base2
>> > store v2 base3
>> > store v3 base4
>> >
>> > For this program sequence, GCC will eliminate the last 3 load instructions.
>> >
>> > It then becomes:
>> >
>> > load v0 VNx1BI from base0 ===> vsetvl e8mf8 + vlm (only loads 1/8 of the poly (1,1) memory data)
>> >
>> > store v0 base1
>> > store v0 base2
>> > store v0 base3
>> > store v0 base4
>> >
>> > This is what we want to fix. I think it is enough to have a way to differentiate VNx1BI, VNx2BI, VNx4BI and VNx8BI
>> > so that GCC does not do the incorrect elimination for RVV.
>> >
>> > I think it can work fine even though these 4 modes consume an inaccurate memory storage size,
>> > as long as their load/store data access behavior is accurate.
>>
>> So given the above I think that modeling the size as being the same
>> but with accurate precision would work.  It's then only the size of the
>> padding in bytes we cannot represent with poly-int which should be fine.
>>
>> Correct?
>
> Btw, is storing a VNx1BI and then loading a VNx2BI from the same
> memory address well-defined?  That is, how is the padding handled
> by the machine load/store instructions?
>
> Richard.
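Editorial sketch of why identical byte sizes cause the incorrect elimination described in the thread: a toy redundancy check that treats two loads from the same base as interchangeable when their mode descriptions compare equal. This is a hypothetical model, not GCC's actual CSE logic; `struct mask_mode`, `same_by_size`, `same_by_size_and_precision` and `surviving_loads` are made-up names, and the size/precision values are taken from Pan's table (size 1, precision 1, 2, 4, 8).

```c
#include <assert.h>

/* Hypothetical toy model of a mask mode.  All four modes share the byte size
   poly (1,1), represented here by its first coefficient; only the precision
   (also a first coefficient) tells them apart.  */
struct mask_mode
{
  const char *name;
  int size_bytes;
  int precision_bits;
};

/* Mode comparison keyed on byte size only: the situation in the thread.  */
static int
same_by_size (struct mask_mode a, struct mask_mode b)
{
  return a.size_bytes == b.size_bytes;
}

/* Mode comparison that also requires the precisions to match.  */
static int
same_by_size_and_precision (struct mask_mode a, struct mask_mode b)
{
  return a.size_bytes == b.size_bytes
         && a.precision_bits == b.precision_bits;
}

/* Count the loads from a single base address that survive when a load is
   treated as redundant if SAME says it matches any earlier load.  */
static int
surviving_loads (const struct mask_mode *loads, int n,
                 int (*same) (struct mask_mode, struct mask_mode))
{
  int kept = 0;
  for (int i = 0; i < n; i++)
    {
      int redundant = 0;
      for (int j = 0; j < i && !redundant; j++)
        redundant = same (loads[j], loads[i]);
      if (!redundant)
        kept++;
    }
  return kept;
}
```

With `loads` holding VNx1BI, VNx2BI, VNx4BI and VNx8BI as `{ 1, 1 }`, `{ 1, 2 }`, `{ 1, 4 }` and `{ 1, 8 }`, comparing on size alone keeps only the first load, matching the collapsed sequence above, while including the precision keeps all four. That corresponds to the "same size, accurate precision" approach agreed on in the thread.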