From: Feng Xue OS
To: Richard Biener
CC: Tamar Christina, "gcc-patches@gcc.gnu.org"
Subject: Re: [PATCH 6/6] vect: Optimize order of lane-reducing statements in loop def-use cycles [PR114440]
Date: Fri, 14 Jun 2024 04:02:52 +0000

Regenerated the patch because of changes in the patches it depends on.

Thanks,
Feng

---
gcc/
	PR tree-optimization/114440
	* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
	reduc_result_pos.
	* tree-vect-loop.cc (vect_transform_reduction): Generate lane-reducing
	statements in an optimized order.
---
 gcc/tree-vect-loop.cc | 51 ++++++++++++++++++++++++++++++++++++++-----
 gcc/tree-vectorizer.h |  6 +++++
 2 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fb9259d115c..de7a9bab990 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8734,7 +8734,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
     }
 
   bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
-  gcc_assert (single_defuse_cycle || lane_reducing_op_p (code));
+  bool lane_reducing = lane_reducing_op_p (code);
+  gcc_assert (single_defuse_cycle || lane_reducing);
 
   /* Create the destination vector  */
   tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
@@ -8751,6 +8752,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
     }
   else
     {
+      int result_pos = 0;
+
       /* The input vectype of the reduction PHI determines copies of
 	 vectorized def-use cycles, which might be more than effective copies
 	 of vectorized lane-reducing reduction statements.  This could be
@@ -8780,9 +8783,9 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 	       sum_v2 = sum_v2;  // copy
 	       sum_v3 = sum_v3;  // copy
 
-	       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
-	       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
-	       sum_v2 = sum_v2;  // copy
+	       sum_v0 = sum_v0;  // copy
+	       sum_v1 = SAD (s0_v1[i: 0 ~ 7 ], s1_v1[i: 0 ~ 7 ], sum_v1);
+	       sum_v2 = SAD (s0_v2[i: 8 ~ 15], s1_v2[i: 8 ~ 15], sum_v2);
 	       sum_v3 = sum_v3;  // copy
 
 	       sum_v0 += n_v0[i: 0  ~ 3 ];
@@ -8790,7 +8793,20 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 	       sum_v2 += n_v2[i: 8  ~ 11];
 	       sum_v3 += n_v3[i: 12 ~ 15];
 	     }
-	*/
+
+	 Moreover, for a higher instruction parallelism in final vectorized
+	 loop, it is considered to make those effective vectorized
+	 lane-reducing statements be distributed evenly among all def-use
+	 cycles. In the above example, SADs are generated into other cycles
+	 rather than that of DOT_PROD.  */
+
+      if (stmt_ncopies < ncopies)
+	{
+	  gcc_assert (lane_reducing);
+	  result_pos = reduc_info->reduc_result_pos;
+	  reduc_info->reduc_result_pos = (result_pos + stmt_ncopies) % ncopies;
+	  gcc_assert (result_pos >= 0 && result_pos < ncopies);
+	}
 
       for (i = 0; i < MIN (3, (int) op.num_ops); i++)
 	{
@@ -8826,7 +8842,30 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 					   op.ops[i], &vec_oprnds[i], vectype);
 
 	  if (used_ncopies < ncopies)
-	    vec_oprnds[i].safe_grow_cleared (ncopies);
+	    {
+	      vec_oprnds[i].safe_grow_cleared (ncopies);
+
+	      /* Find suitable def-use cycles to generate vectorized
+		 statements into, and reorder operands based on the
+		 selection.  */
+	      if (i != reduc_index && result_pos)
+		{
+		  int count = ncopies - used_ncopies;
+		  int start = result_pos - count;
+
+		  if (start < 0)
+		    {
+		      count = result_pos;
+		      start = 0;
+		    }
+
+		  for (int j = used_ncopies - 1; j >= start; j--)
+		    {
+		      std::swap (vec_oprnds[i][j], vec_oprnds[i][j + count]);
+		      gcc_assert (!vec_oprnds[i][j]);
+		    }
+		}
+	    }
 	}
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 3f7db707d97..b9bc9d432ee 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1402,6 +1402,12 @@ public:
   /* The vector type for performing the actual reduction.  */
   tree reduc_vectype;
 
+  /* For loop reduction with multiple vectorized results (ncopies > 1), a
+     lane-reducing operation participating in it may not use all of those
+     results, this field specifies result index starting from which any
+     following lane-reducing operation would be assigned to.  */
+  int reduc_result_pos;
+
   /* If IS_REDUC_INFO is true and if the vector code is performing
      N scalar reductions in parallel, this variable gives the initial
      scalar values of those N reductions.  */
-- 
2.17.1

________________________________________
From: Feng Xue OS
Sent: Thursday, May 30, 2024 10:56 PM
To: Richard Biener
Cc: Tamar Christina; gcc-patches@gcc.gnu.org
Subject: [PATCH 6/6] vect: Optimize order of lane-reducing statements in loop def-use cycles [PR114440]

When multiple lane-reducing operations in a loop reduction chain are
transformed, the corresponding vectorized statements are currently always
generated into def-use cycles starting from index 0.  Def-use cycles with
smaller indices therefore contain more statements, which means longer
instruction dependency chains.
For example:

   int sum = 0;
   for (i)
     {
       sum += d0[i] * d1[i];      // dot-prod <vector(16) char>
       sum += w[i];               // widen-sum <vector(16) char>
       sum += abs(s0[i] - s1[i]); // sad <vector(8) short>
     }

Original transformation result:

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy
     }

For higher instruction-level parallelism in the final vectorized loop, it is
better to distribute the effective vectorized lane-reducing statements evenly
among all def-use cycles.  In the transformation below, DOT_PROD, WIDEN_SUM
and the SADs are generated into different cycles, so the dependency between
them is eliminated.

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = sum_v0;  // copy
       sum_v1 = WIDEN_SUM (w_v1[i: 0 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = sum_v0;  // copy
       sum_v1 = sum_v1;  // copy
       sum_v2 = SAD (s0_v2[i: 0 ~ 7 ], s1_v2[i: 0 ~ 7 ], sum_v2);
       sum_v3 = SAD (s0_v3[i: 8 ~ 15], s1_v3[i: 8 ~ 15], sum_v3);
     }

Thanks,
Feng

---
gcc/
	PR tree-optimization/114440
	* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
	reduc_result_pos.
	* tree-vect-loop.cc (vect_transform_reduction): Generate lane-reducing
	statements in an optimized order.
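
As a rough standalone illustration of how the start cycle rotates from one lane-reducing statement to the next, the bookkeeping can be modelled outside of GCC as below. This is only a sketch: the function name assign_start_cycles and the plain std::vector<int> input are made up for the example, and only the update rule for reduc_result_pos is taken from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model, not GCC code.  STMT_NCOPIES holds, for each
// lane-reducing statement of one reduction chain, the number of effective
// vectorized copies it needs; NCOPIES is the number of def-use cycles.
// Returns the index of the first def-use cycle chosen for each statement.
std::vector<int>
assign_start_cycles (const std::vector<int> &stmt_ncopies, int ncopies)
{
  int reduc_result_pos = 0;  // plays the role of the new per-reduction field
  std::vector<int> starts;

  for (int copies : stmt_ncopies)
    {
      int result_pos = 0;

      // Only a statement that leaves some cycles unused takes part in the
      // rotation; a statement using every cycle always starts at cycle 0.
      if (copies < ncopies)
	{
	  result_pos = reduc_result_pos;
	  reduc_result_pos = (result_pos + copies) % ncopies;
	  assert (result_pos >= 0 && result_pos < ncopies);
	}

      starts.push_back (result_pos);
    }

  return starts;
}
```

For the example above (four def-use cycles, one copy each for DOT_PROD and WIDEN_SUM, two copies for SAD), assign_start_cycles ({1, 1, 2}, 4) gives start cycles 0, 1 and 2, that is DOT_PROD in cycle 0, WIDEN_SUM in cycle 1 and the two SADs in cycles 2 and 3.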
---
 gcc/tree-vect-loop.cc | 51 ++++++++++++++++++++++++++++++++++++++-----
 gcc/tree-vectorizer.h |  6 +++++
 2 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b5849dbb08a..4807f529506 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8703,7 +8703,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
     }
 
   bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
-  gcc_assert (single_defuse_cycle || lane_reducing_op_p (code));
+  bool lane_reducing = lane_reducing_op_p (code);
+  gcc_assert (single_defuse_cycle || lane_reducing);
 
   /* Create the destination vector  */
   tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
@@ -8720,6 +8721,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
     }
   else
     {
+      int result_pos = 0;
+
       /* The input vectype of the reduction PHI determines copies of
 	 vectorized def-use cycles, which might be more than effective copies
 	 of vectorized lane-reducing reduction statements.  This could be
@@ -8749,9 +8752,9 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 	       sum_v2 = sum_v2;  // copy
 	       sum_v3 = sum_v3;  // copy
 
-	       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
-	       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
-	       sum_v2 = sum_v2;  // copy
+	       sum_v0 = sum_v0;  // copy
+	       sum_v1 = SAD (s0_v1[i: 0 ~ 7 ], s1_v1[i: 0 ~ 7 ], sum_v1);
+	       sum_v2 = SAD (s0_v2[i: 8 ~ 15], s1_v2[i: 8 ~ 15], sum_v2);
 	       sum_v3 = sum_v3;  // copy
 
 	       sum_v0 += n_v0[i: 0  ~ 3 ];
@@ -8759,7 +8762,20 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 	       sum_v2 += n_v2[i: 8  ~ 11];
 	       sum_v3 += n_v3[i: 12 ~ 15];
 	     }
-	*/
+
+	 Moreover, for a higher instruction parallelism in final vectorized
+	 loop, it is considered to make those effective vectorized
+	 lane-reducing statements be distributed evenly among all def-use
+	 cycles. In the above example, SADs are generated into other cycles
+	 rather than that of DOT_PROD.  */
+
+      if (stmt_ncopies < ncopies)
+	{
+	  gcc_assert (lane_reducing);
+	  result_pos = reduc_info->reduc_result_pos;
+	  reduc_info->reduc_result_pos = (result_pos + stmt_ncopies) % ncopies;
+	  gcc_assert (result_pos >= 0 && result_pos < ncopies);
+	}
 
       for (i = 0; i < MIN (3, (int) op.num_ops); i++)
 	{
@@ -8792,7 +8808,30 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 					   op.ops[i], &vec_oprnds[i], vectype);
 
 	  if (used_ncopies < ncopies)
-	    vec_oprnds[i].safe_grow_cleared (ncopies);
+	    {
+	      vec_oprnds[i].safe_grow_cleared (ncopies);
+
+	      /* Find suitable def-use cycles to generate vectorized
+		 statements into, and reorder operands based on the
+		 selection.  */
+	      if (i != reduc_index && result_pos)
+		{
+		  int count = ncopies - used_ncopies;
+		  int start = result_pos - count;
+
+		  if (start < 0)
+		    {
+		      count = result_pos;
+		      start = 0;
+		    }
+
+		  for (int j = used_ncopies - 1; j >= start; j--)
+		    {
+		      std::swap (vec_oprnds[i][j], vec_oprnds[i][j + count]);
+		      gcc_assert (!vec_oprnds[i][j]);
+		    }
+		}
+	    }
 	}
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index ca810869592..d64729ac953 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1402,6 +1402,12 @@ public:
   /* The vector type for performing the actual reduction.  */
   tree reduc_vectype;
 
+  /* For loop reduction with multiple vectorized results (ncopies > 1), a
+     lane-reducing operation participating in it may not use all of those
+     results, this field specifies result index starting from which any
+     following lane-reducing operation would be assigned to.  */
+  int reduc_result_pos;
+
   /* If IS_REDUC_INFO is true and if the vector code is performing
      N scalar reductions in parallel, this variable gives the initial
      scalar values of those N reductions.  */
-- 
2.17.1
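
The operand reordering done in the last tree-vect-loop.cc hunk can also be tried in isolation. The following is a minimal sketch, assuming that operands are modelled as non-zero ints with 0 standing for a cleared (empty) slot; the function name place_operands is made up, and std::vector::resize stands in for safe_grow_cleared. The swap loop itself follows the patch.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical model, not GCC code.  OPRNDS holds the used_ncopies
// vectorized operands of one lane-reducing statement (all non-zero).
// The result has NCOPIES slots, with the operands moved so that they
// begin at def-use cycle RESULT_POS; 0 marks a cycle that only gets a
// plain copy of the reduction value.
std::vector<int>
place_operands (std::vector<int> oprnds, int ncopies, int result_pos)
{
  int used_ncopies = (int) oprnds.size ();
  assert (used_ncopies < ncopies);
  assert (result_pos >= 0 && result_pos < ncopies);

  oprnds.resize (ncopies, 0);  // stands in for safe_grow_cleared (ncopies)

  if (result_pos)
    {
      int count = ncopies - used_ncopies;  // number of empty slots
      int start = result_pos - count;

      // Not enough empty slots in front of RESULT_POS: shift every
      // operand by RESULT_POS instead.
      if (start < 0)
	{
	  count = result_pos;
	  start = 0;
	}

      // Walk backwards so each operand is swapped into a still-empty slot.
      for (int j = used_ncopies - 1; j >= start; j--)
	{
	  std::swap (oprnds[j], oprnds[j + count]);
	  assert (oprnds[j] == 0);
	}
    }

  return oprnds;
}
```

With four cycles and two operands {1, 2}, a start position of 2 gives {0, 0, 1, 2} and a start position of 1 gives {0, 1, 2, 0}. A start position of 3 gives {1, 0, 0, 2}, so in this model the statement then occupies cycles 3 and 0, wrapping around the end.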