From mboxrd@z Thu Jan 1 00:00:00 1970
From: Di Zhao OS
To: "gcc-patches@gcc.gnu.org"
CC: Richard Biener
Subject: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width
Date: Thu, 14 Sep 2023 12:43:31 +0000
Content-Type: multipart/mixed; boundary="_002_SN6PR01MB4240D8EE68CF46689F9C243AE8F7ASN6PR01MB4240prod_"
MIME-Version: 1.0

--_002_SN6PR01MB4240D8EE68CF46689F9C243AE8F7ASN6PR01MB4240prod_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

This is a new version of the patch on "nested FMA". Sorry for updating this after so long; I've been studying and writing micro cases to sort out the cause of the regression.

First, following the previous discussion:
(https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629080.html)

1. From testing more altered cases, I don't think the problem is that reassociation works locally. Specifically:

  1) On the example with multiplications:

        tmp1 = a + c * c + d * d + x * y;
        tmp2 = x * tmp1;
        result += (a + c + d + tmp2);

     Given "result" rewritten by width=2, the performance is worse if we rewrite "tmp1" with width=2. In contrast, if we remove the multiplications from the example (and make "tmp1" not single-use), and still rewrite "result" by width=2, then rewriting "tmp1" with width=2 is better. (This makes sense, because the tree's depth at "result" is still smaller if we rewrite "tmp1".)

  2) I tried to modify the assembly code of the example without FMA, so the width of "result" is 4. On Ampere1 there's no obvious improvement.

So although this is an interesting problem, it doesn't seem to be the cause of the regression.

2. From the assembly code of the case with FMA, one problem is that rewriting "tmp1" to parallel didn't decrease the minimum CPU cycles (taking MULT_EXPRs into account), but increased code size, so the overhead is increased.

   a) When "tmp1" is not rewritten to parallel:

        fmadd d31, d2, d2, d30
        fmadd d31, d3, d3, d31
        fmadd d31, d4, d5, d31  //"tmp1"
        fmadd d31, d31, d4, d3

   b) When "tmp1" is rewritten to parallel:

        fmul  d31, d4, d5
        fmadd d27, d2, d2, d30
        fmadd d31, d3, d3, d31
        fadd  d31, d31, d27     //"tmp1"
        fmadd d31, d31, d4, d3

For version a), there are 3 dependent FMAs to calculate "tmp1". For version b), there are also 3 dependent instructions in the longer path: the 1st, 3rd and 4th.

So it seems to me the current get_reassociation_width algorithm isn't optimal in the presence of FMA. I therefore modified the patch to improve get_reassociation_width, rather than check for code patterns. (There could be some other complicated factors that make the regression more obvious when there's "nested FMA", but with this patch that should be avoided or reduced.)

With this patch, a 508.namd_r 1-copy run shows a 7% improvement on Ampere1; on Intel Xeon the improvement is about 3%. While I'm still collecting data on other CPUs, I'd like to know what you think of this.

About the changes in the patch:

1. When the op list forms a complete FMA chain, try to search for a smaller width considering the benefit of using FMA. With a smaller width, the increase in code size is smaller when breaking the chain.

2. To avoid regressions, I included the other patch (https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629203.html) on this tracker again. This is because more FMAs will be kept with 1., so we need to rule out the loop-dependent FMA chains when param_avoid_fma_max_bits is set.

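For illustration only, the width search described in change 1 can be modelled in a few lines of standalone C. This is a rough sketch that loosely follows the loop in the attached patch, not the patch code itself: the function names narrow_width_for_fma_chain, ceil_div and ceil_log2 are hypothetical stand-ins (the patch uses GCC-internal helpers inside get_reassociation_width), and cycles_best is simply passed in as an assumed input rather than computed.

```c
#include <assert.h>

/* Ceiling of a / b for positive ints (hypothetical helper). */
static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* Smallest e such that (1 << e) >= n, for n >= 1 (hypothetical helper). */
static int ceil_log2(int n)
{
    int e = 0;
    while ((1 << e) < n)
        e++;
    return e;
}

/* Sketch of the search for a smaller reassociation width on a complete
   FMA chain.  MULT_NUM is the number of multiplications in the op list,
   WIDTH is the width chosen so far, and CYCLES_BEST is the cycle
   estimate for that width, which does not count the multiplication
   itself (an assumed input here).  Returns the possibly reduced width.  */
static int narrow_width_for_fma_chain(int mult_num, int width, int cycles_best)
{
    assert(mult_num >= 1 && width >= 1);

    int width_min = 1;
    while (width > width_min) {
        int width_mid = (width + width_min) / 2;

        /* Estimated cycles with width_mid chains of dependent FMAs,
           plus the additions needed to merge the partial sums.  */
        int attempt_cycles = ceil_div(mult_num, width_mid)
                             + ceil_log2(width_mid);

        /* cycles_best omits the multiplication cycle, so compare
           against cycles_best + 1.  */
        if (cycles_best + 1 >= attempt_cycles) {
            width = width_mid;
            cycles_best = attempt_cycles - 1;
        } else if (width_min < width_mid) {
            width_min = width_mid;
        } else {
            break;
        }
    }
    return width;
}
```

With mult_num = 3, a starting width of 2 and an assumed cycles_best of 2 (roughly the shape of the "tmp1" example above), the sketch returns 1: the serial chain of three FMAs is estimated to be no slower than the parallel form, so the chain is kept intact.
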
Thanks, Di Zhao ---- PR tree-optimization/110279 gcc/ChangeLog: * tree-ssa-reassoc.cc (rank_ops_for_better_parallelism_p): New function to check whether ranking the ops results in better parallelism. (get_reassociation_width): Add new parameters. Search for smaller width considering the benefit of FMA. (rank_ops_for_fma): Change return value to be number of MULT_EXPRs. (reassociate_bb): For 3 ops, refine the condition to call swap_ops_for_binary_stmt. gcc/testsuite/ChangeLog: * gcc.dg/pr110279.c: New test. --_002_SN6PR01MB4240D8EE68CF46689F9C243AE8F7ASN6PR01MB4240prod_ Content-Type: application/octet-stream; name="0001-Consider-FMA-in-get_reassociation_width.patch" Content-Description: 0001-Consider-FMA-in-get_reassociation_width.patch Content-Disposition: attachment; filename="0001-Consider-FMA-in-get_reassociation_width.patch"; size=9254; creation-date="Thu, 14 Sep 2023 12:21:17 GMT"; modification-date="Thu, 14 Sep 2023 12:43:30 GMT" Content-Transfer-Encoding: base64 RnJvbSAzNTMwOWZlYTAzMzQxMzk3N2E0ZTViOTI3YTI2ZGI3YjRjMTQ0MmU4IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiAiZHpoYW8uYW1wZXJlIiA8ZGkuemhhb0BhbXBlcmVjb21wdXRp bmcuY29tPgpEYXRlOiBUaHUsIDE0IFNlcCAyMDIzIDE2OjQ4OjIwICswODAwClN1YmplY3Q6IFtQ QVRDSF0gQ29uc2lkZXIgRk1BIGluIGdldF9yZWFzc29jaWF0aW9uX3dpZHRoCgotLS0KIGdjYy90 ZXN0c3VpdGUvZ2NjLmRnL3ByMTEwMjc5LmMgfCAgNjIgKysrKysrKysrKysrKysKIGdjYy90cmVl LXNzYS1yZWFzc29jLmNjICAgICAgICAgfCAxNDcgKysrKysrKysrKysrKysrKysrKysrKysrKysr Ky0tLS0KIDIgZmlsZXMgY2hhbmdlZCwgMTk0IGluc2VydGlvbnMoKyksIDE1IGRlbGV0aW9ucygt KQogY3JlYXRlIG1vZGUgMTAwNjQ0IGdjYy90ZXN0c3VpdGUvZ2NjLmRnL3ByMTEwMjc5LmMKCmRp ZmYgLS1naXQgYS9nY2MvdGVzdHN1aXRlL2djYy5kZy9wcjExMDI3OS5jIGIvZ2NjL3Rlc3RzdWl0 ZS9nY2MuZGcvcHIxMTAyNzkuYwpuZXcgZmlsZSBtb2RlIDEwMDY0NAppbmRleCAwMDAwMDAwMDAw MC4uOWRjNzI2NThiZmYKLS0tIC9kZXYvbnVsbAorKysgYi9nY2MvdGVzdHN1aXRlL2djYy5kZy9w cjExMDI3OS5jCkBAIC0wLDAgKzEsNjIgQEAKKy8qIHsgZGctZG8gY29tcGlsZSB9ICovCisvKiB7 IGRnLW9wdGlvbnMgIi1PZmFzdCAtLXBhcmFtIGF2b2lkLWZtYS1tYXgtYml0cz01MTIgLS1wYXJh 
bSB0cmVlLXJlYXNzb2Mtd2lkdGg9NCAtZmR1bXAtdHJlZS13aWRlbmluZ19tdWwtZGV0YWlscyIg fSAqLworLyogeyBkZy1hZGRpdGlvbmFsLW9wdGlvbnMgIi1tYXJjaD1hcm12OC4yLWEiIH0gKi8K KworI2RlZmluZSBMT09QX0NPVU5UIDgwMDAwMDAwMAordHlwZWRlZiBkb3VibGUgZGF0YV9lOwor CisvKiBDaGVjayB0aGF0IEZNQXMgd2l0aCBiYWNrZWRnZSBkZXBlbmRlbmN5IGFyZSBhdm9pZGVk LiBPdGhlcndpc2UgdGhlcmUgd29uJ3QKKyAgIGJlIEZNQSBnZW5lcmF0ZWQgd2l0aCAiLS1wYXJh bSBhdm9pZC1mbWEtbWF4LWJpdHM9NTEyIi4gICAqLworCitmb28xIChkYXRhX2UgYSwgZGF0YV9l IGIsIGRhdGFfZSBjLCBkYXRhX2UgZCkKK3sKKyAgZGF0YV9lIHJlc3VsdCA9IDA7CisKKyAgZm9y IChpbnQgaWMgPSAwOyBpYyA8IExPT1BfQ09VTlQ7IGljKyspCisgICAgeworICAgICAgcmVzdWx0 ICs9IChhICogYiArIGMgKiBkKTsKKworICAgICAgYSAtPSAwLjE7CisgICAgICBiICs9IDAuOTsK KyAgICAgIGMgKj0gMS4wMjsKKyAgICAgIGQgKj0gMC42MTsKKyAgICB9CisKKyAgcmV0dXJuIHJl c3VsdDsKK30KKworZm9vMiAoZGF0YV9lIGEsIGRhdGFfZSBiLCBkYXRhX2UgYywgZGF0YV9lIGQp Cit7CisgIGRhdGFfZSByZXN1bHQgPSAwOworCisgIGZvciAoaW50IGljID0gMDsgaWMgPCBMT09Q X0NPVU5UOyBpYysrKQorICAgIHsKKyAgICAgIHJlc3VsdCArPSBhICogYiArIHJlc3VsdCArIGMg KiBkOworCisgICAgICBhIC09IDAuMTsKKyAgICAgIGIgKz0gMC45OworICAgICAgYyAqPSAxLjAy OworICAgICAgZCAqPSAwLjYxOworICAgIH0KKworICByZXR1cm4gcmVzdWx0OworfQorCitmb28z IChkYXRhX2UgYSwgZGF0YV9lIGIsIGRhdGFfZSBjLCBkYXRhX2UgZCkKK3sKKyAgZGF0YV9lIHJl c3VsdCA9IDA7CisKKyAgZm9yIChpbnQgaWMgPSAwOyBpYyA8IExPT1BfQ09VTlQ7IGljKyspCisg ICAgeworICAgICAgcmVzdWx0ICs9IHJlc3VsdCArIGEgKiBiICsgYyAqIGQ7CisKKyAgICAgIGEg LT0gMC4xOworICAgICAgYiArPSAwLjk7CisgICAgICBjICo9IDEuMDI7CisgICAgICBkICo9IDAu NjE7CisgICAgfQorCisgIHJldHVybiByZXN1bHQ7Cit9CisKKy8qIHsgZGctZmluYWwgeyBzY2Fu LXRyZWUtZHVtcC10aW1lcyAiR2VuZXJhdGVkIEZNQSIgMyAid2lkZW5pbmdfbXVsIn0gfSAqLwpk aWZmIC0tZ2l0IGEvZ2NjL3RyZWUtc3NhLXJlYXNzb2MuY2MgYi9nY2MvdHJlZS1zc2EtcmVhc3Nv Yy5jYwppbmRleCBlZGEwM2JmOThhNi4uOTRkYjExZWRkNGIgMTAwNjQ0Ci0tLSBhL2djYy90cmVl LXNzYS1yZWFzc29jLmNjCisrKyBiL2djYy90cmVlLXNzYS1yZWFzc29jLmNjCkBAIC01NDI3LDE3 ICs1NDI3LDk2IEBAIGdldF9yZXF1aXJlZF9jeWNsZXMgKGludCBvcHNfbnVtLCBpbnQgY3B1X3dp 
ZHRoKQogICByZXR1cm4gcmVzOwogfQogCisvKiBHaXZlbiB0aGF0IExIUyBpcyB0aGUgcmVzdWx0 IFNTQV9OQU1FIG9mIE9QUywgcmV0dXJucyB3aGV0aGVyIHJhbmtpbmcgdGhlIG9wcworICAgcmVz dWx0cyBpbiBiZXR0ZXIgcGFyYWxsZWxpc20uICAqLworc3RhdGljIGJvb2wKK3Jhbmtfb3BzX2Zv cl9iZXR0ZXJfcGFyYWxsZWxpc21fcCAodmVjPG9wZXJhbmRfZW50cnkgKj4gKm9wcywgdHJlZSBs aHMpCit7CisgIC8qIElmIHRoZXJlJ3MgY29kZSBsaWtlICJhY2MgPSBhICogYiArIGMgKiBkICsg YWNjIiBpbiBhIHRpZ2h0IGxvb3AsIHNvbWUKKyAgICAgdWFyY2hzIGNhbiBleGVjdXRlIHJlc3Vs dHMgbGlrZToKKworCV8xID0gYSAqIGI7CisJXzIgPSAuRk1BIChjLCBkLCBfMSk7CisJYWNjXzEg PSBhY2NfMCArIF8yOworCisgICAgIGluIHBhcmFsbGVsLCB3aGlsZSB0dXJuaW5nIGl0IGludG8K KworCV8xID0gLkZNQShhLCBiLCBhY2NfMCk7CisJYWNjXzEgPSAuRk1BKGMsIGQsIF8xKTsKKwor ICAgICBoaW5kZXJzIHRoYXQsIGJlY2F1c2UgdGhlbiB0aGUgZmlyc3QgRk1BIGRlcGVuZHMgb24g dGhlIHJlc3VsdCBvZiBwcmVjZWRpbmcKKyAgICAgaXRlcmF0aW9uLiAgKi8KKyAgaWYgKG1heWJl X2xlICh0cmVlX3RvX3BvbHlfaW50NjQgKFRZUEVfU0laRSAoVFJFRV9UWVBFIChsaHMpKSksCisJ CXBhcmFtX2F2b2lkX2ZtYV9tYXhfYml0cykpCisgICAgeworICAgICAgLyogTG9vayBmb3IgY3Jv c3MgYmFja2VkZ2UgZGVwZW5kZW5jeToKKwkxLiBMSFMgaXMgYSBwaGkgYXJndW1lbnQgaW4gdGhl IHNhbWUgYmFzaWMgYmxvY2sgaXQgaXMgZGVmaW5lZC4KKwkyLiBBbmQgdGhlIHJlc3VsdCBvZiB0 aGUgcGhpIG5vZGUgaXMgdXNlZCBpbiBPUFMuICAqLworICAgICAgYmFzaWNfYmxvY2sgYmIgPSBn aW1wbGVfYmIgKFNTQV9OQU1FX0RFRl9TVE1UIChsaHMpKTsKKyAgICAgIGdpbXBsZV9zdG10X2l0 ZXJhdG9yIGdzaTsKKyAgICAgIGZvciAoZ3NpID0gZ3NpX3N0YXJ0X3BoaXMgKGJiKTsgIWdzaV9l bmRfcCAoZ3NpKTsgZ3NpX25leHQgKCZnc2kpKQorCXsKKwkgIGdwaGkgKnBoaSA9IGR5bl9jYXN0 PGdwaGkgKj4gKGdzaV9zdG10IChnc2kpKTsKKwkgIGZvciAodW5zaWduZWQgaSA9IDA7IGkgPCBn aW1wbGVfcGhpX251bV9hcmdzIChwaGkpOyArK2kpCisJICAgIHsKKwkgICAgICB0cmVlIG9wID0g UEhJX0FSR19ERUYgKHBoaSwgaSk7CisJICAgICAgaWYgKCEob3AgPT0gbGhzICYmIGdpbXBsZV9w aGlfYXJnX2VkZ2UgKHBoaSwgaSktPnNyYyA9PSBiYikpCisJCWNvbnRpbnVlOworCSAgICAgIHRy ZWUgcGhpX3Jlc3VsdCA9IGdpbXBsZV9waGlfcmVzdWx0IChwaGkpOworCSAgICAgIG9wZXJhbmRf ZW50cnkgKm9lOworCSAgICAgIHVuc2lnbmVkIGludCBqOworCSAgICAgIEZPUl9FQUNIX1ZFQ19F 
TFQgKCpvcHMsIGosIG9lKQorCQl7CisJCSAgaWYgKFRSRUVfQ09ERSAob2UtPm9wKSAhPSBTU0Ff TkFNRSkKKwkJICAgIGNvbnRpbnVlOworCisJCSAgLyogUmVzdWx0IG9mIHBoaSBpcyBvcGVyYW5k IG9mIFBMVVNfRVhQUi4gICovCisJCSAgaWYgKG9lLT5vcCA9PSBwaGlfcmVzdWx0KQorCQkgICAg cmV0dXJuIHRydWU7CisKKwkJICAvKiBDaGVjayBpcyByZXN1bHQgb2YgcGhpIGlzIG9wZXJhbmQg b2YgTVVMVF9FWFBSLiAgKi8KKwkJICBnaW1wbGUgKmRlZl9zdG10ID0gU1NBX05BTUVfREVGX1NU TVQgKG9lLT5vcCk7CisJCSAgaWYgKGlzX2dpbXBsZV9hc3NpZ24gKGRlZl9zdG10KQorCQkgICAg ICAmJiBnaW1wbGVfYXNzaWduX3Joc19jb2RlIChkZWZfc3RtdCkgPT0gTkVHQVRFX0VYUFIpCisJ CSAgICB7CisJCSAgICAgIHRyZWUgcmhzID0gZ2ltcGxlX2Fzc2lnbl9yaHMxIChkZWZfc3RtdCk7 CisJCSAgICAgIGlmIChUUkVFX0NPREUgKHJocykgPT0gU1NBX05BTUUpCisJCQl7CisJCQkgIGlm IChyaHMgPT0gcGhpX3Jlc3VsdCkKKwkJCSAgICByZXR1cm4gdHJ1ZTsKKwkJCSAgZGVmX3N0bXQg PSBTU0FfTkFNRV9ERUZfU1RNVCAocmhzKTsKKwkJCX0KKwkJICAgIH0KKwkJICBpZiAoaXNfZ2lt cGxlX2Fzc2lnbiAoZGVmX3N0bXQpCisJCSAgICAgICYmIGdpbXBsZV9hc3NpZ25fcmhzX2NvZGUg KGRlZl9zdG10KSA9PSBNVUxUX0VYUFIpCisJCSAgICB7CisJCSAgICAgIGlmIChnaW1wbGVfYXNz aWduX3JoczEgKGRlZl9zdG10KSA9PSBwaGlfcmVzdWx0CisJCQkgIHx8IGdpbXBsZV9hc3NpZ25f cmhzMiAoZGVmX3N0bXQpID09IHBoaV9yZXN1bHQpCisJCQlyZXR1cm4gdHJ1ZTsKKwkJICAgIH0K KwkJfQorCSAgICB9CisJfQorICAgIH0KKworICByZXR1cm4gZmFsc2U7Cit9CisKIC8qIFJldHVy bnMgYW4gb3B0aW1hbCBudW1iZXIgb2YgcmVnaXN0ZXJzIHRvIHVzZSBmb3IgY29tcHV0YXRpb24g b2YKLSAgIGdpdmVuIHN0YXRlbWVudHMuICAqLworICAgZ2l2ZW4gc3RhdGVtZW50cy4KKworICAg TVVMVF9OVU0gaXMgdGhlIG51bWJlciBvZiBNVUxUX0VYUFJzIGluIE9QUy4gIExIUyBpcyB0aGUg cmVzdWx0IFNTQV9OQU1FIG9mCisgICB0aGUgb3BlcmF0b3JzLiAgKi8KIAogc3RhdGljIGludAot Z2V0X3JlYXNzb2NpYXRpb25fd2lkdGggKGludCBvcHNfbnVtLCBlbnVtIHRyZWVfY29kZSBvcGMs Ci0JCQkgbWFjaGluZV9tb2RlIG1vZGUpCitnZXRfcmVhc3NvY2lhdGlvbl93aWR0aCAodmVjPG9w ZXJhbmRfZW50cnkgKj4gKm9wcywgaW50IG11bHRfbnVtLCB0cmVlIGxocywKKwkJCSBlbnVtIHRy ZWVfY29kZSBvcGMsIG1hY2hpbmVfbW9kZSBtb2RlKQogewogICBpbnQgcGFyYW1fd2lkdGggPSBw YXJhbV90cmVlX3JlYXNzb2Nfd2lkdGg7CiAgIGludCB3aWR0aDsKICAgaW50IHdpZHRoX21pbjsK 
ICAgaW50IGN5Y2xlc19iZXN0OworICBpbnQgb3BzX251bSA9IG9wcy0+bGVuZ3RoICgpOwogCiAg IGlmIChwYXJhbV93aWR0aCA+IDApCiAgICAgd2lkdGggPSBwYXJhbV93aWR0aDsKQEAgLTU0Njgs NiArNTU0NywzNyBAQCBnZXRfcmVhc3NvY2lhdGlvbl93aWR0aCAoaW50IG9wc19udW0sIGVudW0g dHJlZV9jb2RlIG9wYywKIAlicmVhazsKICAgICB9CiAKKyAgLyogRm9yIGEgY29tcGxldGUgRk1B IGNoYWluLCByZXdyaXRpbmcgdG8gcGFyYWxsZWwgcmVkdWNlcyB0aGUgbnVtYmVyIG9mIEZNQSwK KyAgICAgc28gdGhlIGNvZGUgc2l6ZSBpbmNyZWFzZXMuICBDaGVjayBpZiBmZXdlciBwYXJ0aXRp b25zIHJlc3VsdHMgaW4gYmV0dGVyCisgICAgIChvciBzYW1lKSBjeWNsZSBudW1iZXIuICAqLwor ICBpZiAobXVsdF9udW0gPj0gb3BzX251bSAtIDEgJiYgd2lkdGggPiAxKQorICAgIHsKKyAgICAg IHdpZHRoX21pbiA9IDE7CisgICAgICB3aGlsZSAod2lkdGggPiB3aWR0aF9taW4pCisJeworCSAg aW50IHdpZHRoX21pZCA9ICh3aWR0aCArIHdpZHRoX21pbikgLyAyOworCSAgaW50IGVsb2cgPSBl eGFjdF9sb2cyICh3aWR0aF9taWQpOworCSAgZWxvZyA9IGVsb2cgPj0gMCA/IGVsb2cgOiBmbG9v cl9sb2cyICh3aWR0aF9taWQpICsgMTsKKwkgIGludCBhdHRlbXB0X2N5Y2xlcyA9IENFSUwgKG11 bHRfbnVtLCB3aWR0aF9taWQpICsgZWxvZzsKKwkgIC8qIFNpbmNlIENZQ0xFU19CRVNUIGRvZXNu J3QgY291bnQgdGhlIGNpcmNsZSBvZiBtdWx0aXBsaWNhdGlvbnMsCisJICAgICBjb21wYXJlIHdp dGggQ1lDTEVTX0JFU1QgKyAxLiAgKi8KKwkgIGlmIChjeWNsZXNfYmVzdCArIDEgPj0gYXR0ZW1w dF9jeWNsZXMpCisJICAgIHsKKwkgICAgICB3aWR0aCA9IHdpZHRoX21pZDsKKwkgICAgICBjeWNs ZXNfYmVzdCA9IGF0dGVtcHRfY3ljbGVzIC0gMTsKKwkgICAgfQorCSAgZWxzZSBpZiAod2lkdGhf bWluIDwgd2lkdGhfbWlkKQorCSAgICB3aWR0aF9taW4gPSB3aWR0aF9taWQ7CisJICBlbHNlCisJ ICAgIGJyZWFrOworCX0KKyAgICB9CisKKyAgLyogSWYgdGhlcmUncyBsb29wIGRlcGVuZGVudCBG TUEgcmVzdWx0LCByZXdyaXRlIHRvIGF2b2lkIHRoYXQuICBUaGlzIGlzCisgICAgIGJldHRlciB0 aGFuIHNraXBwaW5nIHRoZSBGTUEgY2FuZGlkYXRlcyBpbiB3aWRlbmluZ19tdWwuICAqLworICBp ZiAod2lkdGggPT0gMSAmJiBtdWx0X251bSAmJiByYW5rX29wc19mb3JfYmV0dGVyX3BhcmFsbGVs aXNtX3AgKG9wcywgbGhzKSkKKyAgICByZXR1cm4gMjsKKwogICByZXR1cm4gd2lkdGg7CiB9CiAK QEAgLTY3ODAsOCArNjg5MCwxMCBAQCB0cmFuc2Zvcm1fc3RtdF90b19tdWx0aXBseSAoZ2ltcGxl X3N0bXRfaXRlcmF0b3IgKmdzaSwgZ2ltcGxlICpzdG10LAogICAgUmVhcnJhbmdlIG9wcyB0byAt 
PiBlICsgYSAqIGIgKyBjICogZCBnZW5lcmF0ZXM6CiAKICAgIF80ICA9IC5GTUEgKGNfNyhEKSwg ZF84KEQpLCBfMyk7Ci0gICBfMTEgPSAuRk1BIChhXzUoRCksIGJfNihEKSwgXzQpOyAgKi8KLXN0 YXRpYyBib29sCisgICBfMTEgPSAuRk1BIChhXzUoRCksIGJfNihEKSwgXzQpOworCisgICBSZXR1 cm4gdGhlIHJldHVybiBudW1iZXIgb2YgTVVMVF9FWFBScyBpbiB0aGUgY2hhaW4uICAqLworc3Rh dGljIHVuc2lnbmVkCiByYW5rX29wc19mb3JfZm1hICh2ZWM8b3BlcmFuZF9lbnRyeSAqPiAqb3Bz KQogewogICBvcGVyYW5kX2VudHJ5ICpvZTsKQEAgLTY4MTMsNyArNjkyNSw4IEBAIHJhbmtfb3Bz X2Zvcl9mbWEgKHZlYzxvcGVyYW5kX2VudHJ5ICo+ICpvcHMpCiAgICAgIFB1dHRpbmcgb3BzIHRo YXQgbm90IGRlZiBmcm9tIG11bHQgaW4gZnJvbnQgY2FuIGdlbmVyYXRlIG1vcmUgRk1Bcy4KIAog ICAgICAyLiBJZiBhbGwgb3BzIGFyZSBkZWZpbmVkIHdpdGggbXVsdCwgd2UgZG9uJ3QgbmVlZCB0 byByZWFycmFuZ2UgdGhlbS4gICovCi0gIGlmIChvcHNfbXVsdC5sZW5ndGggKCkgPj0gMiAmJiBv cHNfbXVsdC5sZW5ndGggKCkgIT0gb3BzX2xlbmd0aCkKKyAgdW5zaWduZWQgbXVsdF9udW0gPSBv cHNfbXVsdC5sZW5ndGggKCk7CisgIGlmIChtdWx0X251bSA+PSAyICYmIG11bHRfbnVtICE9IG9w c19sZW5ndGgpCiAgICAgewogICAgICAgLyogUHV0IG5vLW11bHQgb3BzIGFuZCBtdWx0IG9wcyBh bHRlcm5hdGVseSBhdCB0aGUgZW5kIG9mIHRoZQogCSBxdWV1ZSwgd2hpY2ggaXMgY29uZHVjaXZl IHRvIGdlbmVyYXRpbmcgbW9yZSBGTUEgYW5kIHJlZHVjaW5nIHRoZQpAQCAtNjgyOSw5ICs2OTQy LDggQEAgcmFua19vcHNfZm9yX2ZtYSAodmVjPG9wZXJhbmRfZW50cnkgKj4gKm9wcykKIAkgIGlm IChvcGluZGV4ID4gMCkKIAkgICAgb3BpbmRleC0tOwogCX0KLSAgICAgIHJldHVybiB0cnVlOwog ICAgIH0KLSAgcmV0dXJuIGZhbHNlOworICByZXR1cm4gbXVsdF9udW07CiB9CiAvKiBSZWFzc29j aWF0ZSBleHByZXNzaW9ucyBpbiBiYXNpYyBibG9jayBCQiBhbmQgaXRzIHBvc3QtZG9taW5hdG9y IGFzCiAgICBjaGlsZHJlbi4KQEAgLTY5OTUsOSArNzEwNywxMCBAQCByZWFzc29jaWF0ZV9iYiAo YmFzaWNfYmxvY2sgYmIpCiAJICAgICAgZWxzZQogCQl7CiAJCSAgbWFjaGluZV9tb2RlIG1vZGUg PSBUWVBFX01PREUgKFRSRUVfVFlQRSAobGhzKSk7Ci0JCSAgaW50IG9wc19udW0gPSBvcHMubGVu Z3RoICgpOworCQkgIHVuc2lnbmVkIG9wc19udW0gPSBvcHMubGVuZ3RoICgpOwogCQkgIGludCB3 aWR0aDsKLQkJICBib29sIGhhc19mbWEgPSBmYWxzZTsKKwkJICAvKiBOdW1iZXIgb2YgTVVMVF9F WFBScyBpbiB0aGUgb3AgbGlzdC4gICovCisJCSAgdW5zaWduZWQgbXVsdF9udW0gPSAwOwogCiAJ 
CSAgLyogRm9yIGJpbmFyeSBiaXQgb3BlcmF0aW9ucywgaWYgdGhlcmUgYXJlIGF0IGxlYXN0IDMK IAkJICAgICBvcGVyYW5kcyBhbmQgdGhlIGxhc3Qgb3BlcmFuZCBpbiBPUFMgaXMgYSBjb25zdGFu dCwKQEAgLTcwMjAsMTYgKzcxMzMsMTggQEAgcmVhc3NvY2lhdGVfYmIgKGJhc2ljX2Jsb2NrIGJi KQogCQkJCQkJICAgICAgb3B0X3R5cGUpCiAJCSAgICAgICYmIChyaHNfY29kZSA9PSBQTFVTX0VY UFIgfHwgcmhzX2NvZGUgPT0gTUlOVVNfRVhQUikpCiAJCSAgICB7Ci0JCSAgICAgIGhhc19mbWEg PSByYW5rX29wc19mb3JfZm1hICgmb3BzKTsKKwkJICAgICAgbXVsdF9udW0gPSByYW5rX29wc19m b3JfZm1hICgmb3BzKTsKIAkJICAgIH0KIAogCQkgIC8qIE9ubHkgcmV3cml0ZSB0aGUgZXhwcmVz c2lvbiB0cmVlIHRvIHBhcmFsbGVsIGluIHRoZQogCQkgICAgIGxhc3QgcmVhc3NvYyBwYXNzIHRv IGF2b2lkIHVzZWxlc3Mgd29yayBiYWNrLWFuZC1mb3J0aAogCQkgICAgIHdpdGggaW5pdGlhbCBs aW5lYXJpemF0aW9uLiAgKi8KKwkJICBib29sIGhhc19mbWEgPSBtdWx0X251bSA+PSAyICYmIG11 bHRfbnVtICE9IG9wc19udW07CiAJCSAgaWYgKCFyZWFzc29jX2luc2VydF9wb3dpX3AKLQkJICAg ICAgJiYgb3BzLmxlbmd0aCAoKSA+IDMKLQkJICAgICAgJiYgKHdpZHRoID0gZ2V0X3JlYXNzb2Np YXRpb25fd2lkdGggKG9wc19udW0sIHJoc19jb2RlLAotCQkJCQkJCSAgIG1vZGUpKSA+IDEpCisJ CSAgICAgICYmIG9wc19udW0gPiAzCisJCSAgICAgICYmICh3aWR0aCA9IGdldF9yZWFzc29jaWF0 aW9uX3dpZHRoICgmb3BzLCBtdWx0X251bSwgbGhzLAorCQkJCQkJCSAgIHJoc19jb2RlLCBtb2Rl KSkKKwkJCSAgID4gMSkKIAkJICAgIHsKIAkJICAgICAgaWYgKGR1bXBfZmlsZSAmJiAoZHVtcF9m bGFncyAmIFRERl9ERVRBSUxTKSkKIAkJCWZwcmludGYgKGR1bXBfZmlsZSwKQEAgLTcwNDYsNyAr NzE2MSw5IEBAIHJlYXNzb2NpYXRlX2JiIChiYXNpY19ibG9jayBiYikKIAkJCSB0byBtYWtlIHN1 cmUgdGhlIG9uZXMgdGhhdCBnZXQgdGhlIGRvdWJsZQogCQkJIGJpbmFyeSBvcCBhcmUgY2hvc2Vu IHdpc2VseS4gICovCiAJCSAgICAgIGludCBsZW4gPSBvcHMubGVuZ3RoICgpOwotCQkgICAgICBp ZiAobGVuID49IDMgJiYgIWhhc19mbWEpCisJCSAgICAgIGlmIChsZW4gPj0gMworCQkJICAmJiAo IWhhc19mbWEKKwkJCSAgICAgIHx8IHJhbmtfb3BzX2Zvcl9iZXR0ZXJfcGFyYWxsZWxpc21fcCAo Jm9wcywgbGhzKSkpCiAJCQlzd2FwX29wc19mb3JfYmluYXJ5X3N0bXQgKG9wcywgbGVuIC0gMyk7 CiAKIAkJICAgICAgbmV3X2xocyA9IHJld3JpdGVfZXhwcl90cmVlIChzdG10LCByaHNfY29kZSwg MCwgb3BzLAotLSAKMi4yNS4xCgo= --_002_SN6PR01MB4240D8EE68CF46689F9C243AE8F7ASN6PR01MB4240prod_--