From: "Li, Pan2"
To: "juzhe.zhong@rivai.ai", "gcc-patches@gcc.gnu.org"
CC: "richard.sandiford@arm.com", "rguenther@suse.de"
Subject: RE: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer
Date: Thu, 18 May 2023 11:27:50 +0000
References: <20230516102319.163752-1-juzhe.zhong@rivai.ai>
In-Reply-To: <20230516102319.163752-1-juzhe.zhong@rivai.ai>

Synced with today's (5/18/2023) upstream; passed the bootstrap and regression tests on x86.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of juzhe.zhong@rivai.ai
Sent: Tuesday, May 16, 2023 6:23 PM
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com; rguenther@suse.de; Ju-Zhe Zhong
Subject: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

From: Ju-Zhe Zhong

This patch implements a decrementing IV for the length approach in loop control.

Address the comment from Kewen: incorporate the implementation into
"vect_set_loop_controls_directly" instead of a standalone function.

Address the comment from Richard: use MIN_EXPR to handle the following 3 cases:
    1. single rgroup.
    2. multiple rgroup for SLP.
    3. multiple rgroup for non-SLP (tested on vec_pack_trunc).

gcc/ChangeLog:

        * tree-vect-loop-manip.cc (vect_adjust_loop_lens): New function.
        (vect_set_loop_controls_directly): Add decrement IV support.
        (vect_set_loop_condition_partial_vectors): Ditto.
        * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): New variable.
        (vect_get_loop_len): Add decrement IV support.
        * tree-vect-stmts.cc (vectorizable_store): Ditto.
        (vectorizable_load): Ditto.
        * tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New macro.
        (vect_get_loop_len): Add decrement IV support.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h: New test.
        * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h: New test.
        * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c: New test.
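For readers following the thread, the idea of a decrementing IV combined with a MIN operation can be modelled in plain scalar C. This is only a rough sketch under stated assumptions, not the code the patch generates: the names min_size and fill_with_decrementing_iv and the parameter vf (standing in for the number of elements one vector iteration can handle) are hypothetical, and the inner for loop merely stands in for a single length-controlled vector store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper standing in for MIN_EXPR: the smaller of two values.  */
static size_t
min_size (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Scalar model (an assumption, not GCC's actual output) of a loop that is
   controlled by a decrementing IV.  Stores X into DST[0..N-1] in chunks of
   at most VF elements and returns the number of loop iterations.  */
static size_t
fill_with_decrementing_iv (int *dst, size_t n, int x, size_t vf)
{
  assert (vf > 0);

  size_t remaining = n;  /* The decrementing IV: elements still to process.  */
  size_t iterations = 0;

  while (remaining > 0)
    {
      /* Length for this iteration: never more than what is left.  */
      size_t len = min_size (remaining, vf);

      /* Models one length-controlled vector store of LEN elements.  */
      for (size_t i = 0; i < len; ++i)
        dst[i] = x;

      dst += len;
      remaining -= len;  /* The IV counts down towards zero.  */
      ++iterations;
    }

  return iterations;
}
```

With n = 17 and vf = 4 the model processes chunks of 4, 4, 4, 4 and 1 elements, so the final iteration is the only partial one and nothing past element 16 is written.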
---
 .../rvv/autovec/partial/multiple_rgroup-1.c   |   6 +
 .../rvv/autovec/partial/multiple_rgroup-1.h   | 304 ++++++++++
 .../rvv/autovec/partial/multiple_rgroup-2.c   |   6 +
 .../rvv/autovec/partial/multiple_rgroup-2.h   | 546 ++++++++++++++++++
 .../autovec/partial/multiple_rgroup_run-1.c   |  19 +
 .../autovec/partial/multiple_rgroup_run-2.c   |  19 +
 gcc/tree-vect-loop-manip.cc                   | 184 +++++-
 gcc/tree-vect-loop.cc                         |  37 +-
 gcc/tree-vect-stmts.cc                        |   9 +-
 gcc/tree-vectorizer.h                         |  13 +-
 10 files changed, 1132 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c
new file mode 100644
index 00000000000..69cc3be78f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax" } */
+
+#include "multiple_rgroup-1.h"
+
+TEST_ALL (test_1)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h
new file mode 100644
index 00000000000..fbc49f4855d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h
@@ -0,0 +1,304 @@
+#include
+#include
+
+#define test_1(TYPE1, TYPE2) \
+  void __attribute__ ((noinline, noclone)) \
+  test_1_##TYPE1_##TYPE2 (TYPE1 *__restrict f, TYPE2 *__restrict d, TYPE1 x, \
+                          TYPE1 x2, TYPE2 y, int n) \
+  { \
+    for (int i = 0; i < n; ++i) \
+      { \
+        f[i * 2 + 0] = x; \
+        f[i * 2 + 1] = x2; \
+        d[i] = y; \
+      } \
+  }
+
+#define run_1(TYPE1, TYPE2) \
+  int n_1_##TYPE1_##TYPE2 = 1; \
+  TYPE1 x_1_##TYPE1 = 117; \
+  TYPE1 x2_1_##TYPE1 = 232; \
+  TYPE2 y_1_##TYPE2 = 9762; \
+  TYPE1 f_1_##TYPE1[2 * 2 + 1] = {0}; \
+  TYPE2 d_1_##TYPE2[2] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_1_##TYPE1, d_1_##TYPE2, x_1_##TYPE1, x2_1_##TYPE1, \
+                          y_1_##TYPE2, n_1_##TYPE1_##TYPE2); \
+  for (int i = 0; i < n_1_##TYPE1_##TYPE2; ++i) \
+    { \
+      if (f_1_##TYPE1[i * 2 + 0] != x_1_##TYPE1) \
+        __builtin_abort (); \
+      if (f_1_##TYPE1[i * 2 + 1] != x2_1_##TYPE1) \
+        __builtin_abort (); \
+      if (d_1_##TYPE2[i] != y_1_##TYPE2) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_1_##TYPE1_##TYPE2; i < n_1_##TYPE1_##TYPE2 + 1; ++i) \
+    { \
+      if (f_1_##TYPE1[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_1_##TYPE1[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (d_1_##TYPE2[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_2(TYPE1, TYPE2) \
+  int n_2_##TYPE1_##TYPE2 = 17; \
+  TYPE1 x_2_##TYPE1 = 133; \
+  TYPE1 x2_2_##TYPE1 = 94; \
+  TYPE2 y_2_##TYPE2 = 8672; \
+  TYPE1 f_2_##TYPE1[18 * 2 + 1] = {0}; \
+  TYPE2 d_2_##TYPE2[18] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_2_##TYPE1, d_2_##TYPE2, x_2_##TYPE1, x2_2_##TYPE1, \
+                          y_2_##TYPE2, n_2_##TYPE1_##TYPE2); \
+  for (int i = 0; i < n_2_##TYPE1_##TYPE2; ++i) \
+    { \
+      if (f_2_##TYPE1[i * 2 + 0] != x_2_##TYPE1) \
+        __builtin_abort (); \
+      if (f_2_##TYPE1[i * 2 + 1] != x2_2_##TYPE1) \
+        __builtin_abort (); \
+      if (d_2_##TYPE2[i] != y_2_##TYPE2) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_2_##TYPE1_##TYPE2; i < n_2_##TYPE1_##TYPE2 + 1; ++i) \
+    { \
+      if (f_2_##TYPE1[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_2_##TYPE1[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (d_2_##TYPE2[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_3(TYPE1, TYPE2) \
+  int n_3_##TYPE1_##TYPE2 = 32; \
+  TYPE1 x_3_##TYPE1 = 233; \
+  TYPE1 x2_3_##TYPE1 = 78; \
+  TYPE2 y_3_##TYPE2 = 1234; \
+  TYPE1 f_3_##TYPE1[33 * 2 + 1] = {0}; \
+  TYPE2 d_3_##TYPE2[33] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_3_##TYPE1, d_3_##TYPE2, x_3_##TYPE1, x2_3_##TYPE1, \
+                          y_3_##TYPE2, n_3_##TYPE1_##TYPE2); \
+  for (int i = 0; i < n_3_##TYPE1_##TYPE2; ++i) \
+    { \
+      if (f_3_##TYPE1[i * 2 + 0] != x_3_##TYPE1) \
+        __builtin_abort (); \
+      if (f_3_##TYPE1[i * 2 + 1] != x2_3_##TYPE1) \
+        __builtin_abort (); \
+      if (d_3_##TYPE2[i] != y_3_##TYPE2) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_3_##TYPE1_##TYPE2; i < n_3_##TYPE1_##TYPE2 + 1; ++i) \
+    { \
+      if (f_3_##TYPE1[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_3_##TYPE1[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (d_3_##TYPE2[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_4(TYPE1, TYPE2) \
+  int n_4_##TYPE1_##TYPE2 = 128; \
+  TYPE1 x_4_##TYPE1 = 222; \
+  TYPE1 x2_4_##TYPE1 = 59; \
+  TYPE2 y_4_##TYPE2 = 4321; \
+  TYPE1 f_4_##TYPE1[129 * 2 + 1] = {0}; \
+  TYPE2 d_4_##TYPE2[129] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_4_##TYPE1, d_4_##TYPE2, x_4_##TYPE1, x2_4_##TYPE1, \
+                          y_4_##TYPE2, n_4_##TYPE1_##TYPE2); \
+  for (int i = 0; i < n_4_##TYPE1_##TYPE2; ++i) \
+    { \
+      if (f_4_##TYPE1[i * 2 + 0] != x_4_##TYPE1) \
+        __builtin_abort (); \
+      if (f_4_##TYPE1[i * 2 + 1] != x2_4_##TYPE1) \
+        __builtin_abort (); \
+      if (d_4_##TYPE2[i] != y_4_##TYPE2) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_4_##TYPE1_##TYPE2; i < n_4_##TYPE1_##TYPE2 + 1; ++i) \
+    { \
+      if (f_4_##TYPE1[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_4_##TYPE1[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (d_4_##TYPE2[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_5(TYPE1, TYPE2) \
+  int n_5_##TYPE1_##TYPE2 = 177; \
+  TYPE1 x_5_##TYPE1 = 111; \
+  TYPE1 x2_5_##TYPE1 = 189; \
+  TYPE2 y_5_##TYPE2 = 5555; \
+  TYPE1 f_5_##TYPE1[178 * 2 + 1] = {0}; \
+  TYPE2 d_5_##TYPE2[178] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_5_##TYPE1, d_5_##TYPE2, x_5_##TYPE1, x2_5_##TYPE1, \
+                          y_5_##TYPE2, n_5_##TYPE1_##TYPE2); \
+  for (int i = 0; i < n_5_##TYPE1_##TYPE2; ++i) \
+    { \
+      if (f_5_##TYPE1[i * 2 + 0] != x_5_##TYPE1) \
+        __builtin_abort (); \
+      if (f_5_##TYPE1[i * 2 + 1] != x2_5_##TYPE1) \
+        __builtin_abort (); \
+      if (d_5_##TYPE2[i] != y_5_##TYPE2) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_5_##TYPE1_##TYPE2; i < n_5_##TYPE1_##TYPE2 + 1; ++i) \
+    { \
+      if (f_5_##TYPE1[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_5_##TYPE1[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (d_5_##TYPE2[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_6(TYPE1, TYPE2) \
+  int n_6_##TYPE1_##TYPE2 = 255; \
+  TYPE1 x_6_##TYPE1 = 123; \
+  TYPE1 x2_6_##TYPE1 = 132; \
+  TYPE2 y_6_##TYPE2 = 6655; \
+  TYPE1 f_6_##TYPE1[256 * 2 + 1] = {0}; \
+  TYPE2 d_6_##TYPE2[256] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_6_##TYPE1, d_6_##TYPE2, x_6_##TYPE1, x2_6_##TYPE1, \
+                          y_6_##TYPE2, n_6_##TYPE1_##TYPE2); \
+  for (int i = 0; i < n_6_##TYPE1_##TYPE2; ++i) \
+    { \
+      if (f_6_##TYPE1[i * 2 + 0] != x_6_##TYPE1) \
+        __builtin_abort (); \
+      if (f_6_##TYPE1[i * 2 + 1] != x2_6_##TYPE1) \
+        __builtin_abort (); \
+      if (d_6_##TYPE2[i] != y_6_##TYPE2) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_6_##TYPE1_##TYPE2; i < n_6_##TYPE1_##TYPE2 + 1; ++i) \
+    { \
+      if (f_6_##TYPE1[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_6_##TYPE1[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (d_6_##TYPE2[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_7(TYPE1, TYPE2) \
+  int n_7_##TYPE1_##TYPE2 = 333; \
+  TYPE1 x_7_##TYPE1 = 39; \
+  TYPE1 x2_7_##TYPE1 = 59; \
+  TYPE2 y_7_##TYPE2 = 5968; \
+  TYPE1 f_7_##TYPE1[334 * 2 + 1] = {0}; \
+  TYPE2 d_7_##TYPE2[334] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_7_##TYPE1, d_7_##TYPE2, x_7_##TYPE1, x2_7_##TYPE1, \
+                          y_7_##TYPE2, n_7_##TYPE1_##TYPE2); \
+  for (int i = 0; i < n_7_##TYPE1_##TYPE2; ++i) \
+    { \
+      if (f_7_##TYPE1[i * 2 + 0] != x_7_##TYPE1) \
+        __builtin_abort (); \
+      if (f_7_##TYPE1[i * 2 + 1] != x2_7_##TYPE1) \
+        __builtin_abort (); \
+      if (d_7_##TYPE2[i] != y_7_##TYPE2) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_7_##TYPE1_##TYPE2; i < n_7_##TYPE1_##TYPE2 + 1; ++i) \
+    { \
+      if (f_7_##TYPE1[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_7_##TYPE1[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (d_7_##TYPE2[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_8(TYPE1, TYPE2) \
+  int n_8_##TYPE1_##TYPE2 = 512; \
+  TYPE1 x_8_##TYPE1 = 71; \
+  TYPE1 x2_8_##TYPE1 = 255; \
+  TYPE2 y_8_##TYPE2 = 3366; \
+  TYPE1 f_8_##TYPE1[513 * 2 + 1] = {0}; \
+  TYPE2 d_8_##TYPE2[513] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_8_##TYPE1, d_8_##TYPE2, x_8_##TYPE1, x2_8_##TYPE1, \
+                          y_8_##TYPE2, n_8_##TYPE1_##TYPE2); \
+  for (int i = 0; i < n_8_##TYPE1_##TYPE2; ++i) \
+    { \
+      if (f_8_##TYPE1[i * 2 + 0] != x_8_##TYPE1) \
+        __builtin_abort (); \
+      if (f_8_##TYPE1[i * 2 + 1] != x2_8_##TYPE1) \
+        __builtin_abort (); \
+      if (d_8_##TYPE2[i] != y_8_##TYPE2) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_8_##TYPE1_##TYPE2; i < n_8_##TYPE1_##TYPE2 + 1; ++i) \
+    { \
+      if (f_8_##TYPE1[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_8_##TYPE1[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (d_8_##TYPE2[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_9(TYPE1, TYPE2) \
+  int n_9_##TYPE1_##TYPE2 = 637; \
+  TYPE1 x_9_##TYPE1 = 157; \
+  TYPE1 x2_9_##TYPE1 = 89; \
+  TYPE2 y_9_##TYPE2 = 5511; \
+  TYPE1 f_9_##TYPE1[638 * 2 + 1] = {0}; \
+  TYPE2 d_9_##TYPE2[638] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_9_##TYPE1, d_9_##TYPE2, x_9_##TYPE1, x2_9_##TYPE1, \
+                          y_9_##TYPE2, n_9_##TYPE1_##TYPE2); \
+  for (int i = 0; i < n_9_##TYPE1_##TYPE2; ++i) \
+    { \
+      if (f_9_##TYPE1[i * 2 + 0] != x_9_##TYPE1) \
+        __builtin_abort (); \
+      if (f_9_##TYPE1[i * 2 + 1] != x2_9_##TYPE1) \
+        __builtin_abort (); \
+      if (d_9_##TYPE2[i] != y_9_##TYPE2) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_9_##TYPE1_##TYPE2; i < n_9_##TYPE1_##TYPE2 + 1; ++i) \
+    { \
+      if (f_9_##TYPE1[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_9_##TYPE1[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (d_9_##TYPE2[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_10(TYPE1, TYPE2) \
+  int n_10_##TYPE1_##TYPE2 = 777; \
+  TYPE1 x_10_##TYPE1 = 203; \
+  TYPE1 x2_10_##TYPE1 = 200; \
+  TYPE2 y_10_##TYPE2 = 2023; \
+  TYPE1 f_10_##TYPE1[778 * 2 + 1] = {0}; \
+  TYPE2 d_10_##TYPE2[778] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_10_##TYPE1, d_10_##TYPE2, x_10_##TYPE1, \
+                          x2_10_##TYPE1, y_10_##TYPE2, n_10_##TYPE1_##TYPE2); \
+  for (int i = 0; i < n_10_##TYPE1_##TYPE2; ++i) \
+    { \
+      if (f_10_##TYPE1[i * 2 + 0] != x_10_##TYPE1) \
+        __builtin_abort (); \
+      if (f_10_##TYPE1[i * 2 + 1] != x2_10_##TYPE1) \
+        __builtin_abort (); \
+      if (d_10_##TYPE2[i] != y_10_##TYPE2) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_10_##TYPE1_##TYPE2; i < n_10_##TYPE1_##TYPE2 + 1; ++i) \
+    { \
+      if (f_10_##TYPE1[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_10_##TYPE1[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (d_10_##TYPE2[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define TEST_ALL(T) \
+  T (int8_t, int16_t) \
+  T (uint8_t, uint16_t) \
+  T (int16_t, int32_t) \
+  T (uint16_t, uint32_t) \
+  T (int32_t, int64_t) \
+  T (uint32_t, uint64_t) \
+  T (float, double)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c
new file mode 100644
index 00000000000..d1c41907547
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax" } */
+
+#include "multiple_rgroup-2.h"
+
+TEST_ALL (test_1)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h
new file mode 100644
index 00000000000..045a76de45f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h
@@ -0,0 +1,546 @@
+#include
+#include
+
+#define test_1(TYPE1, TYPE2, TYPE3) \
+  void __attribute__ ((noinline, noclone)) \
+  test_1_##TYPE1_##TYPE2 (TYPE1 *__restrict f, TYPE2 *__restrict d, \
+                          TYPE3 *__restrict e, TYPE1 x, TYPE1 x2, TYPE1 x3, \
+                          TYPE1 x4, TYPE2 y, TYPE2 y2, TYPE3 z, int n) \
+  { \
+    for (int i = 0; i < n; ++i) \
+      { \
+        f[i * 4 + 0] = x; \
+        f[i * 4 + 1] = x2; \
+        f[i * 4 + 2] = x3; \
+        f[i * 4 + 3] = x4; \
+        d[i * 2 + 0] = y; \
+        d[i * 2 + 1] = y2; \
+        e[i] = z; \
+      } \
+  }
+
+#define run_1(TYPE1, TYPE2, TYPE3) \
+  int n_1_##TYPE1_##TYPE2_##TYPE3 = 1; \
+  TYPE1 x_1_##TYPE1 = 117; \
+  TYPE1 x2_1_##TYPE1 = 232; \
+  TYPE1 x3_1_##TYPE1 = 127; \
+  TYPE1 x4_1_##TYPE1 = 11; \
+  TYPE2 y_1_##TYPE2 = 9762; \
+  TYPE2 y2_1_##TYPE2 = 6279; \
+  TYPE3 z_1_##TYPE3 = 5891663; \
+  TYPE1 f_1_##TYPE1[2 * 4 + 1] = {0}; \
+  TYPE2 d_1_##TYPE2[2 * 2 + 1] = {0}; \
+  TYPE3 e_1_##TYPE3[2] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_1_##TYPE1, d_1_##TYPE2, e_1_##TYPE3, x_1_##TYPE1, \
+                          x2_1_##TYPE1, x3_1_##TYPE1, x4_1_##TYPE1, \
+                          y_1_##TYPE2, y2_1_##TYPE2, z_1_##TYPE3, \
+                          n_1_##TYPE1_##TYPE2_##TYPE3); \
+  for (int i = 0; i < n_1_##TYPE1_##TYPE2_##TYPE3; ++i) \
+    { \
+      if (f_1_##TYPE1[i * 4 + 0] != x_1_##TYPE1) \
+        __builtin_abort (); \
+      if (f_1_##TYPE1[i * 4 + 1] != x2_1_##TYPE1) \
+        __builtin_abort (); \
+      if (f_1_##TYPE1[i * 4 + 2] != x3_1_##TYPE1) \
+        __builtin_abort (); \
+      if (f_1_##TYPE1[i * 4 + 3] != x4_1_##TYPE1) \
+        __builtin_abort (); \
+      if (d_1_##TYPE2[i * 2 + 0] != y_1_##TYPE2) \
+        __builtin_abort (); \
+      if (d_1_##TYPE2[i * 2 + 1] != y2_1_##TYPE2) \
+        __builtin_abort (); \
+      if (e_1_##TYPE3[i] != z_1_##TYPE3) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_1_##TYPE1_##TYPE2_##TYPE3; \
+       i < n_1_##TYPE1_##TYPE2_##TYPE3 + 1; ++i) \
+    { \
+      if (f_1_##TYPE1[i * 4 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_1_##TYPE1[i * 4 + 1] != 0) \
+        __builtin_abort (); \
+      if (f_1_##TYPE1[i * 4 + 2] != 0) \
+        __builtin_abort (); \
+      if (f_1_##TYPE1[i * 4 + 3] != 0) \
+        __builtin_abort (); \
+      if (d_1_##TYPE2[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (d_1_##TYPE2[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (e_1_##TYPE3[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_2(TYPE1, TYPE2, TYPE3) \
+  int n_2_##TYPE1_##TYPE2_##TYPE3 = 17; \
+  TYPE1 x_2_##TYPE1 = 107; \
+  TYPE1 x2_2_##TYPE1 = 202; \
+  TYPE1 x3_2_##TYPE1 = 17; \
+  TYPE1 x4_2_##TYPE1 = 53; \
+  TYPE2 y_2_##TYPE2 = 5566; \
+  TYPE2 y2_2_##TYPE2 = 7926; \
+  TYPE3 z_2_##TYPE3 = 781545971; \
+  TYPE1 f_2_##TYPE1[18 * 4 + 1] = {0}; \
+  TYPE2 d_2_##TYPE2[18 * 2 + 1] = {0}; \
+  TYPE3 e_2_##TYPE3[18] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_2_##TYPE1, d_2_##TYPE2, e_2_##TYPE3, x_2_##TYPE1, \
+                          x2_2_##TYPE1, x3_2_##TYPE1, x4_2_##TYPE1, \
+                          y_2_##TYPE2, y2_2_##TYPE2, z_2_##TYPE3, \
+                          n_2_##TYPE1_##TYPE2_##TYPE3); \
+  for (int i = 0; i < n_2_##TYPE1_##TYPE2_##TYPE3; ++i) \
+    { \
+      if (f_2_##TYPE1[i * 4 + 0] != x_2_##TYPE1) \
+        __builtin_abort (); \
+      if (f_2_##TYPE1[i * 4 + 1] != x2_2_##TYPE1) \
+        __builtin_abort (); \
+      if (f_2_##TYPE1[i * 4 + 2] != x3_2_##TYPE1) \
+        __builtin_abort (); \
+      if (f_2_##TYPE1[i * 4 + 3] != x4_2_##TYPE1) \
+        __builtin_abort (); \
+      if (d_2_##TYPE2[i * 2 + 0] != y_2_##TYPE2) \
+        __builtin_abort (); \
+      if (d_2_##TYPE2[i * 2 + 1] != y2_2_##TYPE2) \
+        __builtin_abort (); \
+      if (e_2_##TYPE3[i] != z_2_##TYPE3) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_2_##TYPE1_##TYPE2_##TYPE3; \
+       i < n_2_##TYPE1_##TYPE2_##TYPE3 + 1; ++i) \
+    { \
+      if (f_2_##TYPE1[i * 4 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_2_##TYPE1[i * 4 + 1] != 0) \
+        __builtin_abort (); \
+      if (f_2_##TYPE1[i * 4 + 2] != 0) \
+        __builtin_abort (); \
+      if (f_2_##TYPE1[i * 4 + 3] != 0) \
+        __builtin_abort (); \
+      if (d_2_##TYPE2[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (d_2_##TYPE2[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (e_2_##TYPE3[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_3(TYPE1, TYPE2, TYPE3) \
+  int n_3_##TYPE1_##TYPE2_##TYPE3 = 32; \
+  TYPE1 x_3_##TYPE1 = 109; \
+  TYPE1 x2_3_##TYPE1 = 239; \
+  TYPE1 x3_3_##TYPE1 = 151; \
+  TYPE1 x4_3_##TYPE1 = 3; \
+  TYPE2 y_3_##TYPE2 = 1234; \
+  TYPE2 y2_3_##TYPE2 = 4321; \
+  TYPE3 z_3_##TYPE3 = 145615615; \
+  TYPE1 f_3_##TYPE1[33 * 4 + 1] = {0}; \
+  TYPE2 d_3_##TYPE2[33 * 2 + 1] = {0}; \
+  TYPE3 e_3_##TYPE3[33] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_3_##TYPE1, d_3_##TYPE2, e_3_##TYPE3, x_3_##TYPE1, \
+                          x2_3_##TYPE1, x3_3_##TYPE1, x4_3_##TYPE1, \
+                          y_3_##TYPE2, y2_3_##TYPE2, z_3_##TYPE3, \
+                          n_3_##TYPE1_##TYPE2_##TYPE3); \
+  for (int i = 0; i < n_3_##TYPE1_##TYPE2_##TYPE3; ++i) \
+    { \
+      if (f_3_##TYPE1[i * 4 + 0] != x_3_##TYPE1) \
+        __builtin_abort (); \
+      if (f_3_##TYPE1[i * 4 + 1] != x2_3_##TYPE1) \
+        __builtin_abort (); \
+      if (f_3_##TYPE1[i * 4 + 2] != x3_3_##TYPE1) \
+        __builtin_abort (); \
+      if (f_3_##TYPE1[i * 4 + 3] != x4_3_##TYPE1) \
+        __builtin_abort (); \
+      if (d_3_##TYPE2[i * 2 + 0] != y_3_##TYPE2) \
+        __builtin_abort (); \
+      if (d_3_##TYPE2[i * 2 + 1] != y2_3_##TYPE2) \
+        __builtin_abort (); \
+      if (e_3_##TYPE3[i] != z_3_##TYPE3) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_3_##TYPE1_##TYPE2_##TYPE3; \
+       i < n_3_##TYPE1_##TYPE2_##TYPE3 + 1; ++i) \
+    { \
+      if (f_3_##TYPE1[i * 4 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_3_##TYPE1[i * 4 + 1] != 0) \
+        __builtin_abort (); \
+      if (f_3_##TYPE1[i * 4 + 2] != 0) \
+        __builtin_abort (); \
+      if (f_3_##TYPE1[i * 4 + 3] != 0) \
+        __builtin_abort (); \
+      if (d_3_##TYPE2[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (d_3_##TYPE2[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (e_3_##TYPE3[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_4(TYPE1, TYPE2, TYPE3) \
+  int n_4_##TYPE1_##TYPE2_##TYPE3 = 128; \
+  TYPE1 x_4_##TYPE1 = 239; \
+  TYPE1 x2_4_##TYPE1 = 132; \
+  TYPE1 x3_4_##TYPE1 = 39; \
+  TYPE1 x4_4_##TYPE1 = 48; \
+  TYPE2 y_4_##TYPE2 = 1036; \
+  TYPE2 y2_4_##TYPE2 = 3665; \
+  TYPE3 z_4_##TYPE3 = 5145656; \
+  TYPE1 f_4_##TYPE1[129 * 4 + 1] = {0}; \
+  TYPE2 d_4_##TYPE2[129 * 2 + 1] = {0}; \
+  TYPE3 e_4_##TYPE3[129] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_4_##TYPE1, d_4_##TYPE2, e_4_##TYPE3, x_4_##TYPE1, \
+                          x2_4_##TYPE1, x3_4_##TYPE1, x4_4_##TYPE1, \
+                          y_4_##TYPE2, y2_4_##TYPE2, z_4_##TYPE3, \
+                          n_4_##TYPE1_##TYPE2_##TYPE3); \
+  for (int i = 0; i < n_4_##TYPE1_##TYPE2_##TYPE3; ++i) \
+    { \
+      if (f_4_##TYPE1[i * 4 + 0] != x_4_##TYPE1) \
+        __builtin_abort (); \
+      if (f_4_##TYPE1[i * 4 + 1] != x2_4_##TYPE1) \
+        __builtin_abort (); \
+      if (f_4_##TYPE1[i * 4 + 2] != x3_4_##TYPE1) \
+        __builtin_abort (); \
+      if (f_4_##TYPE1[i * 4 + 3] != x4_4_##TYPE1) \
+        __builtin_abort (); \
+      if (d_4_##TYPE2[i * 2 + 0] != y_4_##TYPE2) \
+        __builtin_abort (); \
+      if (d_4_##TYPE2[i * 2 + 1] != y2_4_##TYPE2) \
+        __builtin_abort (); \
+      if (e_4_##TYPE3[i] != z_4_##TYPE3) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_4_##TYPE1_##TYPE2_##TYPE3; \
+       i < n_4_##TYPE1_##TYPE2_##TYPE3 + 1; ++i) \
+    { \
+      if (f_4_##TYPE1[i * 4 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_4_##TYPE1[i * 4 + 1] != 0) \
+        __builtin_abort (); \
+      if (f_4_##TYPE1[i * 4 + 2] != 0) \
+        __builtin_abort (); \
+      if (f_4_##TYPE1[i * 4 + 3] != 0) \
+        __builtin_abort (); \
+      if (d_4_##TYPE2[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (d_4_##TYPE2[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (e_4_##TYPE3[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_5(TYPE1, TYPE2, TYPE3) \
+  int n_5_##TYPE1_##TYPE2_##TYPE3 = 177; \
+  TYPE1 x_5_##TYPE1 = 239; \
+  TYPE1 x2_5_##TYPE1 = 132; \
+  TYPE1 x3_5_##TYPE1 = 39; \
+  TYPE1 x4_5_##TYPE1 = 48; \
+  TYPE2 y_5_##TYPE2 = 1036; \
+  TYPE2 y2_5_##TYPE2 = 3665; \
+  TYPE3 z_5_##TYPE3 = 5145656; \
+  TYPE1 f_5_##TYPE1[178 * 4 + 1] = {0}; \
+  TYPE2 d_5_##TYPE2[178 * 2 + 1] = {0}; \
+  TYPE3 e_5_##TYPE3[178] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_5_##TYPE1, d_5_##TYPE2, e_5_##TYPE3, x_5_##TYPE1, \
+                          x2_5_##TYPE1, x3_5_##TYPE1, x4_5_##TYPE1, \
+                          y_5_##TYPE2, y2_5_##TYPE2, z_5_##TYPE3, \
+                          n_5_##TYPE1_##TYPE2_##TYPE3); \
+  for (int i = 0; i < n_5_##TYPE1_##TYPE2_##TYPE3; ++i) \
+    { \
+      if (f_5_##TYPE1[i * 4 + 0] != x_5_##TYPE1) \
+        __builtin_abort (); \
+      if (f_5_##TYPE1[i * 4 + 1] != x2_5_##TYPE1) \
+        __builtin_abort (); \
+      if (f_5_##TYPE1[i * 4 + 2] != x3_5_##TYPE1) \
+        __builtin_abort (); \
+      if (f_5_##TYPE1[i * 4 + 3] != x4_5_##TYPE1) \
+        __builtin_abort (); \
+      if (d_5_##TYPE2[i * 2 + 0] != y_5_##TYPE2) \
+        __builtin_abort (); \
+      if (d_5_##TYPE2[i * 2 + 1] != y2_5_##TYPE2) \
+        __builtin_abort (); \
+      if (e_5_##TYPE3[i] != z_5_##TYPE3) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_5_##TYPE1_##TYPE2_##TYPE3; \
+       i < n_5_##TYPE1_##TYPE2_##TYPE3 + 1; ++i) \
+    { \
+      if (f_5_##TYPE1[i * 4 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_5_##TYPE1[i * 4 + 1] != 0) \
+        __builtin_abort (); \
+      if (f_5_##TYPE1[i * 4 + 2] != 0) \
+        __builtin_abort (); \
+      if (f_5_##TYPE1[i * 4 + 3] != 0) \
+        __builtin_abort (); \
+      if (d_5_##TYPE2[i * 2 + 0] != 0) \
+        __builtin_abort (); \
+      if (d_5_##TYPE2[i * 2 + 1] != 0) \
+        __builtin_abort (); \
+      if (e_5_##TYPE3[i] != 0) \
+        __builtin_abort (); \
+    }
+
+#define run_6(TYPE1, TYPE2, TYPE3) \
+  int n_6_##TYPE1_##TYPE2_##TYPE3 = 255; \
+  TYPE1 x_6_##TYPE1 = 239; \
+  TYPE1 x2_6_##TYPE1 = 132; \
+  TYPE1 x3_6_##TYPE1 = 39; \
+  TYPE1 x4_6_##TYPE1 = 48; \
+  TYPE2 y_6_##TYPE2 = 1036; \
+  TYPE2 y2_6_##TYPE2 = 3665; \
+  TYPE3 z_6_##TYPE3 = 5145656; \
+  TYPE1 f_6_##TYPE1[256 * 4 + 1] = {0}; \
+  TYPE2 d_6_##TYPE2[256 * 2 + 1] = {0}; \
+  TYPE3 e_6_##TYPE3[256] = {0}; \
+  test_1_##TYPE1_##TYPE2 (f_6_##TYPE1, d_6_##TYPE2, e_6_##TYPE3, x_6_##TYPE1, \
+                          x2_6_##TYPE1, x3_6_##TYPE1, x4_6_##TYPE1, \
+                          y_6_##TYPE2, y2_6_##TYPE2, z_6_##TYPE3, \
+                          n_6_##TYPE1_##TYPE2_##TYPE3); \
+  for (int i = 0; i < n_6_##TYPE1_##TYPE2_##TYPE3; ++i) \
+    { \
+      if (f_6_##TYPE1[i * 4 + 0] != x_6_##TYPE1) \
+        __builtin_abort (); \
+      if (f_6_##TYPE1[i * 4 + 1] != x2_6_##TYPE1) \
+        __builtin_abort (); \
+      if (f_6_##TYPE1[i * 4 + 2] != x3_6_##TYPE1) \
+        __builtin_abort (); \
+      if (f_6_##TYPE1[i * 4 + 3] != x4_6_##TYPE1) \
+        __builtin_abort (); \
+      if (d_6_##TYPE2[i * 2 + 0] != y_6_##TYPE2) \
+        __builtin_abort (); \
+      if (d_6_##TYPE2[i * 2 + 1] != y2_6_##TYPE2) \
+        __builtin_abort (); \
+      if (e_6_##TYPE3[i] != z_6_##TYPE3) \
+        __builtin_abort (); \
+    } \
+  for (int i = n_6_##TYPE1_##TYPE2_##TYPE3; \
+       i < n_6_##TYPE1_##TYPE2_##TYPE3 + 1; ++i) \
+    { \
+      if (f_6_##TYPE1[i * 4 + 0] != 0) \
+        __builtin_abort (); \
+      if (f_6_##TYPE1[i * 4 + 1] != 0) \
+        __builtin_abort (); \
+ if (f_6_##TYPE1[i * 4 + 2] !=3D 0) = \ + __builtin_abort (); \ + if (f_6_##TYPE1[i * 4 + 3] !=3D 0) = \ + __builtin_abort (); \ + if (d_6_##TYPE2[i * 2 + 0] !=3D 0) = \ + __builtin_abort (); \ + if (d_6_##TYPE2[i * 2 + 1] !=3D 0) = \ + __builtin_abort (); \ + if (e_6_##TYPE3[i] !=3D 0) = \ + __builtin_abort (); \ + } + +#define run_7(TYPE1, TYPE2, TYPE3) = \ + int n_7_##TYPE1_##TYPE2_##TYPE3 =3D 333; = \ + TYPE1 x_7_##TYPE1 =3D 239; = \ + TYPE1 x2_7_##TYPE1 =3D 132; = \ + TYPE1 x3_7_##TYPE1 =3D 39; = \ + TYPE1 x4_7_##TYPE1 =3D 48; = \ + TYPE2 y_7_##TYPE2 =3D 1036; = \ + TYPE2 y2_7_##TYPE2 =3D 3665; = \ + TYPE3 z_7_##TYPE3 =3D 5145656; = \ + TYPE1 f_7_##TYPE1[334 * 4 + 1] =3D {0}; = \ + TYPE2 d_7_##TYPE2[334 * 2 + 1] =3D {0}; = \ + TYPE3 e_7_##TYPE3[334] =3D {0}; = \ + test_1_##TYPE1_##TYPE2 (f_7_##TYPE1, d_7_##TYPE2, e_7_##TYPE3, x_7_##TYP= E1, \ + x2_7_##TYPE1, x3_7_##TYPE1, x4_7_##TYPE1, \ + y_7_##TYPE2, y2_7_##TYPE2, z_7_##TYPE3, \ + n_7_##TYPE1_##TYPE2_##TYPE3); \ + for (int i =3D 0; i < n_7_##TYPE1_##TYPE2_##TYPE3; ++i) = \ + { = \ + if (f_7_##TYPE1[i * 4 + 0] !=3D x_7_##TYPE1) = \ + __builtin_abort (); \ + if (f_7_##TYPE1[i * 4 + 1] !=3D x2_7_##TYPE1) = \ + __builtin_abort (); \ + if (f_7_##TYPE1[i * 4 + 2] !=3D x3_7_##TYPE1) = \ + __builtin_abort (); \ + if (f_7_##TYPE1[i * 4 + 3] !=3D x4_7_##TYPE1) = \ + __builtin_abort (); \ + if (d_7_##TYPE2[i * 2 + 0] !=3D y_7_##TYPE2) = \ + __builtin_abort (); \ + if (d_7_##TYPE2[i * 2 + 1] !=3D y2_7_##TYPE2) = \ + __builtin_abort (); \ + if (e_7_##TYPE3[i] !=3D z_7_##TYPE3) = \ + __builtin_abort (); \ + } = \ + for (int i =3D n_7_##TYPE1_##TYPE2_##TYPE3; = \ + i < n_7_##TYPE1_##TYPE2_##TYPE3 + 1; ++i) = \ + { = \ + if (f_7_##TYPE1[i * 4 + 0] !=3D 0) = \ + __builtin_abort (); \ + if (f_7_##TYPE1[i * 4 + 1] !=3D 0) = \ + __builtin_abort (); \ + if (f_7_##TYPE1[i * 4 + 2] !=3D 0) = \ + __builtin_abort (); \ + if (f_7_##TYPE1[i * 4 + 3] !=3D 0) = \ + __builtin_abort (); \ + if (d_7_##TYPE2[i * 2 + 0] !=3D 0) = \ + 
__builtin_abort (); \ + if (d_7_##TYPE2[i * 2 + 1] !=3D 0) = \ + __builtin_abort (); \ + if (e_7_##TYPE3[i] !=3D 0) = \ + __builtin_abort (); \ + } + +#define run_8(TYPE1, TYPE2, TYPE3) = \ + int n_8_##TYPE1_##TYPE2_##TYPE3 =3D 512; = \ + TYPE1 x_8_##TYPE1 =3D 239; = \ + TYPE1 x2_8_##TYPE1 =3D 132; = \ + TYPE1 x3_8_##TYPE1 =3D 39; = \ + TYPE1 x4_8_##TYPE1 =3D 48; = \ + TYPE2 y_8_##TYPE2 =3D 1036; = \ + TYPE2 y2_8_##TYPE2 =3D 3665; = \ + TYPE3 z_8_##TYPE3 =3D 5145656; = \ + TYPE1 f_8_##TYPE1[513 * 4 + 1] =3D {0}; = \ + TYPE2 d_8_##TYPE2[513 * 2 + 1] =3D {0}; = \ + TYPE3 e_8_##TYPE3[513] =3D {0}; = \ + test_1_##TYPE1_##TYPE2 (f_8_##TYPE1, d_8_##TYPE2, e_8_##TYPE3, x_8_##TYP= E1, \ + x2_8_##TYPE1, x3_8_##TYPE1, x4_8_##TYPE1, \ + y_8_##TYPE2, y2_8_##TYPE2, z_8_##TYPE3, \ + n_8_##TYPE1_##TYPE2_##TYPE3); \ + for (int i =3D 0; i < n_8_##TYPE1_##TYPE2_##TYPE3; ++i) = \ + { = \ + if (f_8_##TYPE1[i * 4 + 0] !=3D x_8_##TYPE1) = \ + __builtin_abort (); \ + if (f_8_##TYPE1[i * 4 + 1] !=3D x2_8_##TYPE1) = \ + __builtin_abort (); \ + if (f_8_##TYPE1[i * 4 + 2] !=3D x3_8_##TYPE1) = \ + __builtin_abort (); \ + if (f_8_##TYPE1[i * 4 + 3] !=3D x4_8_##TYPE1) = \ + __builtin_abort (); \ + if (d_8_##TYPE2[i * 2 + 0] !=3D y_8_##TYPE2) = \ + __builtin_abort (); \ + if (d_8_##TYPE2[i * 2 + 1] !=3D y2_8_##TYPE2) = \ + __builtin_abort (); \ + if (e_8_##TYPE3[i] !=3D z_8_##TYPE3) = \ + __builtin_abort (); \ + } = \ + for (int i =3D n_8_##TYPE1_##TYPE2_##TYPE3; = \ + i < n_8_##TYPE1_##TYPE2_##TYPE3 + 1; ++i) = \ + { = \ + if (f_8_##TYPE1[i * 4 + 0] !=3D 0) = \ + __builtin_abort (); \ + if (f_8_##TYPE1[i * 4 + 1] !=3D 0) = \ + __builtin_abort (); \ + if (f_8_##TYPE1[i * 4 + 2] !=3D 0) = \ + __builtin_abort (); \ + if (f_8_##TYPE1[i * 4 + 3] !=3D 0) = \ + __builtin_abort (); \ + if (d_8_##TYPE2[i * 2 + 0] !=3D 0) = \ + __builtin_abort (); \ + if (d_8_##TYPE2[i * 2 + 1] !=3D 0) = \ + __builtin_abort (); \ + if (e_8_##TYPE3[i] !=3D 0) = \ + __builtin_abort (); \ + } + +#define run_9(TYPE1, TYPE2, 
TYPE3) = \ + int n_9_##TYPE1_##TYPE2_##TYPE3 =3D 637; = \ + TYPE1 x_9_##TYPE1 =3D 222; = \ + TYPE1 x2_9_##TYPE1 =3D 111; = \ + TYPE1 x3_9_##TYPE1 =3D 11; = \ + TYPE1 x4_9_##TYPE1 =3D 7; = \ + TYPE2 y_9_##TYPE2 =3D 2034; = \ + TYPE2 y2_9_##TYPE2 =3D 6987; = \ + TYPE3 z_9_##TYPE3 =3D 1564616; = \ + TYPE1 f_9_##TYPE1[638 * 4 + 1] =3D {0}; = \ + TYPE2 d_9_##TYPE2[638 * 2 + 1] =3D {0}; = \ + TYPE3 e_9_##TYPE3[638] =3D {0}; = \ + test_1_##TYPE1_##TYPE2 (f_9_##TYPE1, d_9_##TYPE2, e_9_##TYPE3, x_9_##TYP= E1, \ + x2_9_##TYPE1, x3_9_##TYPE1, x4_9_##TYPE1, \ + y_9_##TYPE2, y2_9_##TYPE2, z_9_##TYPE3, \ + n_9_##TYPE1_##TYPE2_##TYPE3); \ + for (int i =3D 0; i < n_9_##TYPE1_##TYPE2_##TYPE3; ++i) = \ + { = \ + if (f_9_##TYPE1[i * 4 + 0] !=3D x_9_##TYPE1) = \ + __builtin_abort (); \ + if (f_9_##TYPE1[i * 4 + 1] !=3D x2_9_##TYPE1) = \ + __builtin_abort (); \ + if (f_9_##TYPE1[i * 4 + 2] !=3D x3_9_##TYPE1) = \ + __builtin_abort (); \ + if (f_9_##TYPE1[i * 4 + 3] !=3D x4_9_##TYPE1) = \ + __builtin_abort (); \ + if (d_9_##TYPE2[i * 2 + 0] !=3D y_9_##TYPE2) = \ + __builtin_abort (); \ + if (d_9_##TYPE2[i * 2 + 1] !=3D y2_9_##TYPE2) = \ + __builtin_abort (); \ + if (e_9_##TYPE3[i] !=3D z_9_##TYPE3) = \ + __builtin_abort (); \ + } = \ + for (int i =3D n_9_##TYPE1_##TYPE2_##TYPE3; = \ + i < n_9_##TYPE1_##TYPE2_##TYPE3 + 1; ++i) = \ + { = \ + if (f_9_##TYPE1[i * 4 + 0] !=3D 0) = \ + __builtin_abort (); \ + if (f_9_##TYPE1[i * 4 + 1] !=3D 0) = \ + __builtin_abort (); \ + if (f_9_##TYPE1[i * 4 + 2] !=3D 0) = \ + __builtin_abort (); \ + if (f_9_##TYPE1[i * 4 + 3] !=3D 0) = \ + __builtin_abort (); \ + if (d_9_##TYPE2[i * 2 + 0] !=3D 0) = \ + __builtin_abort (); \ + if (d_9_##TYPE2[i * 2 + 1] !=3D 0) = \ + __builtin_abort (); \ + if (e_9_##TYPE3[i] !=3D 0) = \ + __builtin_abort (); \ + } + +#define run_10(TYPE1, TYPE2, TYPE3) = \ + int n_10_##TYPE1_##TYPE2_##TYPE3 =3D 777; = \ + TYPE1 x_10_##TYPE1 =3D 222; = \ + TYPE1 x2_10_##TYPE1 =3D 111; = \ + TYPE1 x3_10_##TYPE1 =3D 11; = \ + TYPE1 
x4_10_##TYPE1 =3D 7; = \ + TYPE2 y_10_##TYPE2 =3D 2034; = \ + TYPE2 y2_10_##TYPE2 =3D 6987; = \ + TYPE3 z_10_##TYPE3 =3D 1564616; = \ + TYPE1 f_10_##TYPE1[778 * 4 + 1] =3D {0}; = \ + TYPE2 d_10_##TYPE2[778 * 2 + 1] =3D {0}; = \ + TYPE3 e_10_##TYPE3[778] =3D {0}; = \ + test_1_##TYPE1_##TYPE2 (f_10_##TYPE1, d_10_##TYPE2, e_10_##TYPE3, x_10_#= #TYPE1, \ + x2_10_##TYPE1, x3_10_##TYPE1, x4_10_##TYPE1, \ + y_10_##TYPE2, y2_10_##TYPE2, z_10_##TYPE3, \ + n_10_##TYPE1_##TYPE2_##TYPE3); \ + for (int i =3D 0; i < n_10_##TYPE1_##TYPE2_##TYPE3; ++i) = \ + { = \ + if (f_10_##TYPE1[i * 4 + 0] !=3D x_10_##TYPE1) = \ + __builtin_abort (); \ + if (f_10_##TYPE1[i * 4 + 1] !=3D x2_10_##TYPE1) = \ + __builtin_abort (); \ + if (f_10_##TYPE1[i * 4 + 2] !=3D x3_10_##TYPE1) = \ + __builtin_abort (); \ + if (f_10_##TYPE1[i * 4 + 3] !=3D x4_10_##TYPE1) = \ + __builtin_abort (); \ + if (d_10_##TYPE2[i * 2 + 0] !=3D y_10_##TYPE2) = \ + __builtin_abort (); \ + if (d_10_##TYPE2[i * 2 + 1] !=3D y2_10_##TYPE2) = \ + __builtin_abort (); \ + if (e_10_##TYPE3[i] !=3D z_10_##TYPE3) = \ + __builtin_abort (); \ + } = \ + for (int i =3D n_10_##TYPE1_##TYPE2_##TYPE3; = \ + i < n_10_##TYPE1_##TYPE2_##TYPE3 + 1; ++i) = \ + { = \ + if (f_10_##TYPE1[i * 4 + 0] !=3D 0) = \ + __builtin_abort (); \ + if (f_10_##TYPE1[i * 4 + 1] !=3D 0) = \ + __builtin_abort (); \ + if (f_10_##TYPE1[i * 4 + 2] !=3D 0) = \ + __builtin_abort (); \ + if (f_10_##TYPE1[i * 4 + 3] !=3D 0) = \ + __builtin_abort (); \ + if (d_10_##TYPE2[i * 2 + 0] !=3D 0) = \ + __builtin_abort (); \ + if (d_10_##TYPE2[i * 2 + 1] !=3D 0) = \ + __builtin_abort (); \ + if (e_10_##TYPE3[i] !=3D 0) = \ + __builtin_abort (); \ + } + +#define TEST_ALL(T) = \ + T (int8_t, int16_t, int32_t) = \ + T (uint8_t, uint16_t, uint32_t) = \ + T (int16_t, int32_t, int64_t) = \ + T (uint16_t, uint32_t, uint64_t) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rg= roup_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_= 
rgroup_run-1.c new file mode 100644 index 00000000000..d3e187eae68 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c @@ -0,0 +1,19 @@ +/* { dg-do run { target { riscv_vector } } } */ +/* { dg-additional-options "--param riscv-autovec-preference=3Dfixed-vlmax" } */ + +#include "multiple_rgroup-1.c" + +int main (void) +{ + TEST_ALL (run_1) + TEST_ALL (run_2) + TEST_ALL (run_3) + TEST_ALL (run_4) + TEST_ALL (run_5) + TEST_ALL (run_6) + TEST_ALL (run_7) + TEST_ALL (run_8) + TEST_ALL (run_9) + TEST_ALL (run_10) + return 0; +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rg= roup_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_= rgroup_run-2.c new file mode 100644 index 00000000000..5166c9e35a0 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c @@ -0,0 +1,19 @@ +/* { dg-do run { target { riscv_vector } } } */ +/* { dg-additional-options "--param riscv-autovec-preference=3Dfixed-vlmax" } */ + +#include "multiple_rgroup-2.c" + +int main (void) +{ + TEST_ALL (run_1) + TEST_ALL (run_2) + TEST_ALL (run_3) + TEST_ALL (run_4) + TEST_ALL (run_5) + TEST_ALL (run_6) + TEST_ALL (run_7) + TEST_ALL (run_8) + TEST_ALL (run_9) + TEST_ALL (run_10) + return 0; +} diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc inde= x ff6159e08d5..5344cb194e0 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -385,6 +385,66 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup= _controls *dest_rgm, return false; } =20 +/* Try to adjust loop lens for non-SLP multiple-rgroups. 
+ + _36 =3D MIN_EXPR ; + + First length (MIN (X, VF/N)): + loop_len_15 =3D MIN_EXPR <_36, VF/N>; + + Second length: + tmp =3D _36 - loop_len_15; + loop_len_16 =3D MIN (tmp, VF/N); + + Third length: + tmp2 =3D tmp - loop_len_16; + loop_len_17 =3D MIN (tmp2, VF/N); + + Fourth length: + tmp3 =3D tmp2 - loop_len_17; + loop_len_18 =3D MIN (tmp3, VF/N); */ + +static void +vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq, + rgroup_controls *dest_rgm, + rgroup_controls *src_rgm, tree step) +{ + tree ctrl_type =3D dest_rgm->type; + poly_uint64 nitems_per_ctrl + =3D TYPE_VECTOR_SUBPARTS (ctrl_type) * dest_rgm->factor; + tree length_limit =3D build_int_cst (iv_type, nitems_per_ctrl); + + for (unsigned int i =3D 0; i < dest_rgm->controls.length (); ++i) + { + if (!step) + step =3D src_rgm->controls[i / dest_rgm->controls.length ()]; + tree ctrl =3D dest_rgm->controls[i]; + if (i =3D=3D 0) + { + /* First iteration: MIN (X, VF/N) capped to the range [0, VF/N]. */ + gassign *assign + =3D gimple_build_assign (ctrl, MIN_EXPR, step, length_limit); + gimple_seq_add_stmt (seq, assign); + } + else if (i =3D=3D dest_rgm->controls.length () - 1) + { + /* Last iteration: Remainder capped to the range [0, VF/N]. */ + gassign *assign =3D gimple_build_assign (ctrl, MINUS_EXPR, step, + dest_rgm->controls[i - 1]); + gimple_seq_add_stmt (seq, assign); + } + else + { + /* (MIN (remain, VF*I/N)) capped to the range [0, VF/N]. */ + step =3D gimple_build (seq, MINUS_EXPR, iv_type, step, + dest_rgm->controls[i - 1]); + gassign *assign + =3D gimple_build_assign (ctrl, MIN_EXPR, step, length_limit); + gimple_seq_add_stmt (seq, assign); + } + } +} + /* Helper for vect_set_loop_condition_partial_vectors. Generate definitio= ns for all the rgroup controls in RGC and return a control that is nonzero when the loop needs to iterate. 
Add any new preheader statements to @@= -468,9 +528,78 @@ vect_set_loop_controls_directly (class loop *loop, loop_= vec_info loop_vinfo, gimple_stmt_iterator incr_gsi; bool insert_after; standard_iv_increment_position (loop, &incr_gsi, &insert_after); - create_iv (build_int_cst (iv_type, 0), PLUS_EXPR, nitems_step, NULL_TREE= , - loop, &incr_gsi, insert_after, &index_before_incr, - &index_after_incr); + if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)) + { + nitems_total =3D gimple_convert (preheader_seq, iv_type, nitems_tota= l); + tree step =3D make_ssa_name (iv_type); + /* Create decrement IV. */ + create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gs= i, + insert_after, &index_before_incr, &index_after_incr); + tree temp =3D gimple_build (header_seq, MIN_EXPR, iv_type, + index_before_incr, nitems_step); + gimple_seq_add_stmt (header_seq, gimple_build_assign (step, temp)); + + if (rgc->max_nscalars_per_iter =3D=3D 1) + { + /* single rgroup: + ... + _10 =3D (unsigned long) count_12(D); + ... + # ivtmp_9 =3D PHI + _36 =3D MIN_EXPR ; + ... + vect__4.8_28 =3D .LEN_LOAD (_17, 32B, _36, 0); + ... + ivtmp_35 =3D ivtmp_9 - _36; + ... + if (ivtmp_35 !=3D 0) + goto ; [83.33%] + else + goto ; [16.67%] + */ + gassign *assign =3D gimple_build_assign (rgc->controls[0], step); + gimple_seq_add_stmt (header_seq, assign); + } + else + { + /* Multiple rgroup (SLP): + ... + _38 =3D (unsigned long) bnd.7_29; + _39 =3D _38 * 2; + ... + # ivtmp_41 =3D PHI + ... + _43 =3D MIN_EXPR ; + loop_len_26 =3D MIN_EXPR <_43, 16>; + loop_len_25 =3D _43 - loop_len_26; + ... + .LEN_STORE (_6, 8B, loop_len_26, ...); + ... + .LEN_STORE (_25, 8B, loop_len_25, ...); + _33 =3D loop_len_26 / 2; + ... + .LEN_STORE (_8, 16B, _33, ...); + _36 =3D loop_len_25 / 2; + ... + .LEN_STORE (_15, 16B, _36, ...); + ivtmp_42 =3D ivtmp_41 - _43; + ... 
+ if (ivtmp_42 !=3D 0) + goto ; [83.33%] + else + goto ; [16.67%] + */ + vect_adjust_loop_lens_control (iv_type, header_seq, rgc, NULL, step); + } + return index_after_incr; + } + else + { + /* Create increment IV. */ + create_iv (build_int_cst (iv_type, 0), PLUS_EXPR, nitems_step, NULL_= TREE, + loop, &incr_gsi, insert_after, &index_before_incr, + &index_after_incr); + } =20 tree zero_index =3D build_int_cst (compare_type, 0); tree test_index, test_limit, first_limit; @@ -704,6 +833,7 @@ vect_set_l= oop_condition_partial_vectors (class loop *loop, =20 bool use_masks_p =3D LOOP_VINFO_FULLY_MASKED_P (loop_vinfo); tree compare_type =3D LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo); + tree iv_type =3D LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo); unsigned int compare_precision =3D TYPE_PRECISION (compare_type); tree orig_niters =3D niters; =20 @@ -753,6 +883,54 @@ vect_set_loop_condition_partial_vectors (class loop *l= oop, continue; } =20 + if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) + && rgc->max_nscalars_per_iter =3D=3D 1 + && rgc !=3D &LOOP_VINFO_LENS (loop_vinfo)[0]) + { + /* Multiple rgroup (non-SLP): + ... + _38 =3D (unsigned long) n_12(D); + ... + # ivtmp_38 =3D PHI + ... + _40 =3D MIN_EXPR ; + loop_len_21 =3D MIN_EXPR <_40, POLY_INT_CST [2, 2]>; + _41 =3D _40 - loop_len_21; + loop_len_20 =3D MIN_EXPR <_41, POLY_INT_CST [2, 2]>; + _42 =3D _40 - loop_len_20; + loop_len_19 =3D MIN_EXPR <_42, POLY_INT_CST [2, 2]>; + _43 =3D _40 - loop_len_19; + loop_len_16 =3D MIN_EXPR <_43, POLY_INT_CST [2, 2]>; + ... + vect__4.8_15 =3D .LEN_LOAD (_6, 64B, loop_len_21, 0); + ... + vect__4.9_8 =3D .LEN_LOAD (_13, 64B, loop_len_20, 0); + ... + vect__4.10_28 =3D .LEN_LOAD (_46, 64B, loop_len_19, 0); + ... + vect__4.11_30 =3D .LEN_LOAD (_49, 64B, loop_len_16, 0); + vect__7.13_31 =3D VEC_PACK_TRUNC_EXPR ; + vect__7.13_32 =3D VEC_PACK_TRUNC_EXPR <...>; + vect__7.12_33 =3D VEC_PACK_TRUNC_EXPR <...>; + ... 
+ .LEN_STORE (_14, 16B, _40, vect__7.12_33, 0); + ivtmp_39 =3D ivtmp_38 - _40; + ... + if (ivtmp_39 !=3D 0) + goto ; [92.31%] + else + goto ; [7.69%] + */ + rgroup_controls *sub_rgc + =3D &(*controls)[nmasks / rgc->controls.length () - 1]; + if (!sub_rgc->controls.is_empty ()) + { + vect_adjust_loop_lens_control (iv_type, &header_seq, rgc, + sub_rgc, NULL_TREE); + continue; + } + } + /* See whether zero-based IV would ever generate all-false masks or zero length before wrapping around. */ bool might_wrap_p =3D vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc); diff= --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index ed0166fedab..6= f49bdee009 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -973,6 +973,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, ve= c_info_shared *shared) vectorizable (false), can_use_partial_vectors_p (param_vect_partial_vector_usage !=3D 0), using_partial_vectors_p (false), + using_decrementing_iv_p (false), epil_using_partial_vectors_p (false), partial_load_store_bias (0), peeling_for_gaps (false), @@ -2725,6 +2726,16 @@ start_over: && !vect_verify_loop_lens (loop_vinfo)) LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) =3D false; =20 + /* If we're vectorizing a loop that uses length "controls" and + can iterate more than once, we apply the decrementing IV approach + in loop control. */ + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) + && !LOOP_VINFO_LENS (loop_vinfo).is_empty () + && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) + && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo), + LOOP_VINFO_VECT_FACTOR (loop_vinfo)))) + LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) =3D true; + /* If we're vectorizing an epilogue loop, the vectorized loop either nee= ds to be able to handle fewer than VF scalars, or needs to have a lower = VF than the main loop. 
*/ @@ -10364,12 +10375,14 @@ vect_record_loop_len (loop_vec_info loop_vinfo, v= ec_loop_lens *lens, rgroup that operates on NVECTORS vectors, where 0 <=3D INDEX < NVECTORS= . */ =20 tree -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens, - unsigned int nvectors, unsigned int index) +vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi, + vec_loop_lens *lens, unsigned int nvectors, tree vectype, + unsigned int index) { rgroup_controls *rgl =3D &(*lens)[nvectors - 1]; bool use_bias_adjusted_len =3D LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) !=3D 0; + tree iv_type =3D LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo); =20 /* Populate the rgroup's len array, if this is the first time we've used it. */ @@ -10400,6 +10413,26 @@ vect_get_loop_len (loop_vec_info loop_vinfo, vec_l= oop_lens *lens, =20 if (use_bias_adjusted_len) return rgl->bias_adjusted_ctrl; + else if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)) + { + tree loop_len =3D rgl->controls[index]; + poly_int64 nunits1 =3D TYPE_VECTOR_SUBPARTS (rgl->type); + poly_int64 nunits2 =3D TYPE_VECTOR_SUBPARTS (vectype); + if (maybe_ne (nunits1, nunits2)) + { + /* A loop len for data type X can be reused for data type Y + if X has N times more elements than Y and if Y's elements + are N times bigger than X's. 
*/ + gcc_assert (multiple_p (nunits1, nunits2)); + unsigned int factor =3D exact_div (nunits1, nunits2).to_constant (); + gimple_seq seq =3D NULL; + loop_len =3D gimple_build (&seq, RDIV_EXPR, iv_type, loop_len, + build_int_cst (iv_type, factor)); + if (seq) + gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT); + } + return loop_len; + } else return rgl->controls[index]; } diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 7313191b= 0db..b5e4bc59355 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -8795,8 +8795,9 @@ vectorizable_store (vec_info *vinfo, else if (loop_lens) { tree final_len - =3D vect_get_loop_len (loop_vinfo, loop_lens, - vec_num * ncopies, vec_num * j + i); + =3D vect_get_loop_len (loop_vinfo, gsi, loop_lens, + vec_num * ncopies, vectype, + vec_num * j + i); tree ptr =3D build_int_cst (ref_type, align * BITS_PER_UNIT); machine_mode vmode =3D TYPE_MODE (vectype); opt_machine_mode new_ovmode @@ -10151,8 +10152,8 @@ vectorizable_load (vec_info *vinfo, else if (loop_lens && memory_access_type !=3D VMAT_INVARIANT) { tree final_len - =3D vect_get_loop_len (loop_vinfo, loop_lens, - vec_num * ncopies, + =3D vect_get_loop_len (loop_vinfo, gsi, loop_lens, + vec_num * ncopies, vectype, vec_num * j + i); tree ptr =3D build_int_cst (ref_type, align * BITS_PER_UNIT); diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 9cf2fb23fe= 3..8af3b35324e 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -818,6 +818,13 @@ public: the vector loop can handle fewer than VF scalars. */ bool using_partial_vectors_p; =20 + /* True if we've decided to use a decrementing loop control IV that coun= ts + scalars. This can be done for any loop that: + + (a) uses length "controls"; and + (b) can iterate more than once. */ + bool using_decrementing_iv_p; + /* True if we've decided to use partially-populated vectors for the epilogue of loop. 
*/ bool epil_using_partial_vectors_p; @@ -890,6 +897,7 @@ public: #define LOOP_VINFO_VECTORIZABLE_P(L) (L)->vectorizable #define LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P(L) (L)->can_use_partial_vecto= rs_p #define LOOP_VINFO_USING_PARTIAL_VECTORS_P(L) (L)->using_partial_vect= ors_p +#define LOOP_VINFO_USING_DECREMENTING_IV_P(L) (L)->using_decrementing_iv_p #define LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P(L) = \ (L)->epil_using_partial_vectors_p #define LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS(L) (L)->partial_load_store_bias= @@ -2293,8 +2301,9 @@ extern tree vect_get_loop_mask (gimple_stmt_iterator= *, vec_loop_masks *, unsigned int, tree, unsigned int); extern void vect_record_loop_len (loop_vec_info, vec_loop_lens *, unsigned= int, tree, unsigned int); -extern tree vect_get_loop_len (loop_vec_info, vec_loop_lens *, unsigned in= t, - unsigned int); +extern tree vect_get_loop_len (loop_vec_info, gimple_stmt_iterator *, + vec_loop_lens *, unsigned int, tree, + unsigned int); extern gimple_seq vect_gen_len (tree, tree, tree, tree); extern stmt_vec_= info info_for_reduction (vec_info *, stmt_vec_info); extern bool reduction= _fn_for_scalar_code (code_helper, internal_fn *); -- 2.36.1
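For readers following the length arithmetic described in the comment above vect_adjust_loop_lens_control, here is a minimal scalar sketch in plain C. It is not part of the patch and does not use any GCC internals: the helper names split_loop_lens and min_size are hypothetical, and VF and the number of controls are assumed to be compile-time-style constants rather than poly_ints. It follows the comment's scheme (every length is MIN (what is left, VF/N)), which may differ in detail from the GIMPLE the patch actually emits for the last control.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.  */
static size_t
min_size (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Sketch of one vector iteration under a decrementing IV.
   REMAINING is the number of scalars still to process, VF the
   vectorization factor and NCTRLS the number of length controls in
   the rgroup (assumed to divide VF).  Writes the per-vector lengths
   to LENS and returns the IV value for the next iteration.  */
size_t
split_loop_lens (size_t remaining, size_t vf, size_t nctrls, size_t *lens)
{
  /* _36 = MIN_EXPR <ivtmp, VF>: scalars handled by this iteration.  */
  size_t total = min_size (remaining, vf);
  /* VF/N: the most a single vector in the rgroup can hold.  */
  size_t limit = vf / nctrls;
  size_t left = total;

  for (size_t i = 0; i < nctrls; ++i)
    {
      /* loop_len_i = MIN (left, VF/N); later vectors get what is left.  */
      lens[i] = min_size (left, limit);
      left -= lens[i];
    }

  /* ivtmp_next = ivtmp - _36; the loop exits when this reaches zero.  */
  return remaining - total;
}
```

With 10 scalars left, VF = 16 and four controls, the sketch yields lengths 4, 4, 2, 0 and a next IV value of 0, so the loop would exit after this iteration.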