From: Feng Xue OS
To: Richard Biener
CC: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles
Date: Wed, 26 Jun 2024 14:54:10 +0000

This patch has also been adjusted to match changes in two of its dependent patches.

When multiple lane-reducing operations in a loop reduction chain are
transformed, the corresponding vectorized statements were originally
generated into def-use cycles starting from index 0. Def-use cycles with
smaller indices therefore contain more statements, which means longer
instruction dependency chains. For example:

   int sum = 0;
   for (i)
     {
       sum += d0[i] * d1[i];      // dot-prod <vector(16) char>
       sum += w[i];               // widen-sum <vector(16) char>
       sum += abs(s0[i] - s1[i]); // sad <vector(8) short>
     }

Original transformation result:

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy
     }

For higher instruction-level parallelism in the final vectorized loop, the
effective vectorized lane-reducing statements should be distributed evenly
among all def-use cycles. In the transformed result below, DOT_PROD,
WIDEN_SUM and the SADs are generated into distinct cycles, so the
instruction dependency between them is eliminated.

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = sum_v0;  // copy
       sum_v1 = WIDEN_SUM (w_v1[i: 0 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = sum_v0;  // copy
       sum_v1 = sum_v1;  // copy
       sum_v2 = SAD (s0_v2[i: 0 ~ 7 ], s1_v2[i: 0 ~ 7 ], sum_v2);
       sum_v3 = SAD (s0_v3[i: 8 ~ 15], s1_v3[i: 8 ~ 15], sum_v3);
     }

2024-03-22  Feng Xue

gcc/
	PR tree-optimization/114440
	* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
	reduc_result_pos.
	* tree-vect-loop.cc (vect_transform_reduction): Generate lane-reducing
	statements in an optimized order.
---
 gcc/tree-vect-loop.cc | 43 +++++++++++++++++++++++++++++++++++++++----
 gcc/tree-vectorizer.h |  6 ++++++
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6bfb0e72905..783c4f2b153 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8841,9 +8841,9 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 	       sum_v2 = sum_v2;  // copy
 	       sum_v3 = sum_v3;  // copy
 
-	       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
-	       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
-	       sum_v2 = sum_v2;  // copy
+	       sum_v0 = sum_v0;  // copy
+	       sum_v1 = SAD (s0_v1[i: 0 ~ 7 ], s1_v1[i: 0 ~ 7 ], sum_v1);
+	       sum_v2 = SAD (s0_v2[i: 8 ~ 15], s1_v2[i: 8 ~ 15], sum_v2);
 	       sum_v3 = sum_v3;  // copy
 
 	       sum_v0 += n_v0[i: 0  ~ 3 ];
@@ -8851,7 +8851,12 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 	       sum_v2 += n_v2[i: 8  ~ 11];
 	       sum_v3 += n_v3[i: 12 ~ 15];
 	     }
-	*/
+
+	 Moreover, for higher instruction-level parallelism in the final
+	 vectorized loop, the effective vectorized lane-reducing statements
+	 are distributed evenly among all def-use cycles.  In the above
+	 example, the SADs are generated into cycles other than that of
+	 DOT_PROD.  */
       unsigned using_ncopies = vec_oprnds[0].length ();
       unsigned reduc_ncopies = vec_oprnds[reduc_index].length ();
 
@@ -8864,6 +8869,36 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 	      gcc_assert (vec_oprnds[i].length () == using_ncopies);
 	      vec_oprnds[i].safe_grow_cleared (reduc_ncopies);
 	    }
+
+	  /* Find suitable def-use cycles to generate vectorized statements
+	     into, and reorder operands based on the selection.  */
+	  unsigned curr_pos = reduc_info->reduc_result_pos;
+	  unsigned next_pos = (curr_pos + using_ncopies) % reduc_ncopies;
+
+	  gcc_assert (curr_pos < reduc_ncopies);
+	  reduc_info->reduc_result_pos = next_pos;
+
+	  if (curr_pos)
+	    {
+	      unsigned count = reduc_ncopies - using_ncopies;
+	      unsigned start = curr_pos - count;
+
+	      if ((int) start < 0)
+		{
+		  count = curr_pos;
+		  start = 0;
+		}
+
+	      for (unsigned i = 0; i < op.num_ops - 1; i++)
+		{
+		  for (unsigned j = using_ncopies; j > start; j--)
+		    {
+		      unsigned k = j - 1;
+		      std::swap (vec_oprnds[i][k], vec_oprnds[i][k + count]);
+		      gcc_assert (!vec_oprnds[i][k]);
+		    }
+		}
+	    }
 	}
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 94736736dcc..64c6571a293 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1402,6 +1402,12 @@ public:
   /* The vector type for performing the actual reduction.  */
   tree reduc_vectype;
 
+  /* For a loop reduction with multiple vectorized results (ncopies > 1),
+     a lane-reducing operation participating in it may not use all of
+     those results.  This field gives the result index from which the
+     next lane-reducing operation is assigned.  */
+  unsigned int reduc_result_pos;
+
   /* If IS_REDUC_INFO is true and if the vector code is performing
      N scalar reductions in parallel, this variable gives the initial
      scalar values of those N reductions.  */
-- 
2.17.1

________________________________________
From: Feng Xue OS
Sent: Thursday, June 20, 2024 2:02 PM
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles

This patch has been updated with some new changes.

When multiple lane-reducing operations in a loop reduction chain are
transformed, the corresponding vectorized statements were originally
generated into def-use cycles starting from index 0. Def-use cycles with
smaller indices therefore contain more statements, which means longer
instruction dependency chains. For example:

   int sum = 0;
   for (i)
     {
       sum += d0[i] * d1[i];      // dot-prod
       sum += w[i];               // widen-sum
       sum += abs(s0[i] - s1[i]); // sad
     }

Original transformation result:

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy
     }

For higher instruction-level parallelism in the final vectorized loop, the
effective vectorized lane-reducing statements should be distributed evenly
among all def-use cycles. In the transformed result below, DOT_PROD,
WIDEN_SUM and the SADs are generated into distinct cycles, so the
instruction dependency between them is eliminated.

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = sum_v0;  // copy
       sum_v1 = WIDEN_SUM (w_v1[i: 0 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = sum_v0;  // copy
       sum_v1 = sum_v1;  // copy
       sum_v2 = SAD (s0_v2[i: 0 ~ 7 ], s1_v2[i: 0 ~ 7 ], sum_v2);
       sum_v3 = SAD (s0_v3[i: 8 ~ 15], s1_v3[i: 8 ~ 15], sum_v3);
     }

2024-03-22  Feng Xue

gcc/
	PR tree-optimization/114440
	* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
	reduc_result_pos.
	* tree-vect-loop.cc (vect_transform_reduction): Generate lane-reducing
	statements in an optimized order.
---
 gcc/tree-vect-loop.cc | 43 +++++++++++++++++++++++++++++++++++++++----
 gcc/tree-vectorizer.h |  6 ++++++
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5a27a2c3d9c..adee54350d4 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8821,9 +8821,9 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 	       sum_v2 = sum_v2;  // copy
 	       sum_v3 = sum_v3;  // copy
 
-	       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
-	       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
-	       sum_v2 = sum_v2;  // copy
+	       sum_v0 = sum_v0;  // copy
+	       sum_v1 = SAD (s0_v1[i: 0 ~ 7 ], s1_v1[i: 0 ~ 7 ], sum_v1);
+	       sum_v2 = SAD (s0_v2[i: 8 ~ 15], s1_v2[i: 8 ~ 15], sum_v2);
 	       sum_v3 = sum_v3;  // copy
 
 	       sum_v0 += n_v0[i: 0  ~ 3 ];
@@ -8831,7 +8831,12 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 	       sum_v2 += n_v2[i: 8  ~ 11];
 	       sum_v3 += n_v3[i: 12 ~ 15];
 	     }
-	*/
+
+	 Moreover, for higher instruction-level parallelism in the final
+	 vectorized loop, the effective vectorized lane-reducing statements
+	 are distributed evenly among all def-use cycles.  In the above
+	 example, the SADs are generated into cycles other than that of
+	 DOT_PROD.  */
       tree phi_vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (reduc_info);
       unsigned all_ncopies = vect_get_num_copies (loop_vinfo, phi_vectype_in);
       unsigned use_ncopies = vec_oprnds[0].length ();
@@ -8855,6 +8860,36 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 	      gcc_assert (vec_oprnds[i].length () == use_ncopies);
 	      vec_oprnds[i].safe_grow_cleared (all_ncopies);
 	    }
+
+	  /* Find suitable def-use cycles to generate vectorized statements
+	     into, and reorder operands based on the selection.  */
+	  unsigned curr_pos = reduc_info->reduc_result_pos;
+	  unsigned next_pos = (curr_pos + use_ncopies) % all_ncopies;
+
+	  gcc_assert (curr_pos < all_ncopies);
+	  reduc_info->reduc_result_pos = next_pos;
+
+	  if (curr_pos)
+	    {
+	      unsigned count = all_ncopies - use_ncopies;
+	      unsigned start = curr_pos - count;
+
+	      if ((int) start < 0)
+		{
+		  count = curr_pos;
+		  start = 0;
+		}
+
+	      for (unsigned i = 0; i < op.num_ops - 1; i++)
+		{
+		  for (unsigned j = use_ncopies; j > start; j--)
+		    {
+		      unsigned k = j - 1;
+		      std::swap (vec_oprnds[i][k], vec_oprnds[i][k + count]);
+		      gcc_assert (!vec_oprnds[i][k]);
+		    }
+		}
+	    }
 	}
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 94736736dcc..64c6571a293 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1402,6 +1402,12 @@ public:
   /* The vector type for performing the actual reduction.  */
   tree reduc_vectype;
 
+  /* For a loop reduction with multiple vectorized results (ncopies > 1),
+     a lane-reducing operation participating in it may not use all of
+     those results.  This field gives the result index from which the
+     next lane-reducing operation is assigned.  */
+  unsigned int reduc_result_pos;
+
   /* If IS_REDUC_INFO is true and if the vector code is performing
      N scalar reductions in parallel, this variable gives the initial
      scalar values of those N reductions.  */
-- 
2.17.1

________________________________________
From: Feng Xue OS
Sent: Sunday, June 16, 2024 3:32 PM
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles

When multiple lane-reducing operations in a loop reduction chain are
transformed, the corresponding vectorized statements were originally
generated into def-use cycles starting from index 0. Def-use cycles with
smaller indices therefore contain more statements, which means longer
instruction dependency chains. For example:

   int sum = 0;
   for (i)
     {
       sum += d0[i] * d1[i];      // dot-prod
       sum += w[i];               // widen-sum
       sum += abs(s0[i] - s1[i]); // sad
     }

Original transformation result:

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy
     }

For higher instruction-level parallelism in the final vectorized loop, the
effective vectorized lane-reducing statements should be distributed evenly
among all def-use cycles. When transformed this way, DOT_PROD, WIDEN_SUM
and the SADs are generated into distinct cycles, so the instruction
dependency between them is eliminated.

Thanks,
Feng
---
gcc/
	PR tree-optimization/114440
	* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
	reduc_result_pos.
	* tree-vect-loop.cc (vect_transform_reduction): Generate lane-reducing
	statements in an optimized order.
---
 gcc/tree-vect-loop.cc | 39 +++++++++++++++++++++++++++++++++++----
 gcc/tree-vectorizer.h |  6 ++++++
 2 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6d91665a341..c7e13d655d8 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8828,9 +8828,9 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 	       sum_v2 = sum_v2;  // copy
 	       sum_v3 = sum_v3;  // copy
 
-	       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
-	       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
-	       sum_v2 = sum_v2;  // copy
+	       sum_v0 = sum_v0;  // copy
+	       sum_v1 = SAD (s0_v1[i: 0 ~ 7 ], s1_v1[i: 0 ~ 7 ], sum_v1);
+	       sum_v2 = SAD (s0_v2[i: 8 ~ 15], s1_v2[i: 8 ~ 15], sum_v2);
 	       sum_v3 = sum_v3;  // copy
 
 	       sum_v0 += n_v0[i: 0  ~ 3 ];
@@ -8838,14 +8838,45 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 	       sum_v2 += n_v2[i: 8  ~ 11];
 	       sum_v3 += n_v3[i: 12 ~ 15];
 	     }
-	*/
+
+	 Moreover, for higher instruction-level parallelism in the final
+	 vectorized loop, the effective vectorized lane-reducing statements
+	 are distributed evenly among all def-use cycles.  In the above
+	 example, the SADs are generated into cycles other than that of
+	 DOT_PROD.  */
       unsigned using_ncopies = vec_oprnds[0].length ();
       unsigned reduc_ncopies = vec_oprnds[reduc_index].length ();
+      unsigned result_pos = reduc_info->reduc_result_pos;
+
+      reduc_info->reduc_result_pos
+	= (result_pos + using_ncopies) % reduc_ncopies;
+      gcc_assert (result_pos < reduc_ncopies);
 
       for (unsigned i = 0; i < op.num_ops - 1; i++)
 	{
 	  gcc_assert (vec_oprnds[i].length () == using_ncopies);
 	  vec_oprnds[i].safe_grow_cleared (reduc_ncopies);
+
+	  /* Find suitable def-use cycles to generate vectorized statements
+	     into, and reorder operands based on the selection.  */
+	  if (result_pos)
+	    {
+	      unsigned count = reduc_ncopies - using_ncopies;
+	      unsigned start = result_pos - count;
+
+	      if ((int) start < 0)
+		{
+		  count = result_pos;
+		  start = 0;
+		}
+
+	      for (unsigned j = using_ncopies; j > start; j--)
+		{
+		  unsigned k = j - 1;
+		  std::swap (vec_oprnds[i][k], vec_oprnds[i][k + count]);
+		  gcc_assert (!vec_oprnds[i][k]);
+		}
+	    }
 	}
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 94736736dcc..64c6571a293 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1402,6 +1402,12 @@ public:
   /* The vector type for performing the actual reduction.  */
   tree reduc_vectype;
 
+  /* For a loop reduction with multiple vectorized results (ncopies > 1),
+     a lane-reducing operation participating in it may not use all of
+     those results.  This field gives the result index from which the
+     next lane-reducing operation is assigned.  */
+  unsigned int reduc_result_pos;
+
   /* If IS_REDUC_INFO is true and if the vector code is performing
      N scalar reductions in parallel, this variable gives the initial
      scalar values of those N reductions.  */
-- 
2.17.1
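
The operand rotation that the patch adds to `vect_transform_reduction` can be replayed outside GCC. The following is a minimal standalone C++ sketch of that step, not GCC code. `Operands`, `place_operands` and `distribute_example` are hypothetical names. A `std::vector<std::string>` stands in for `vec_oprnds[i]`, and an empty string marks a slot cleared by `safe_grow_cleared`. The sketch assumes four `vector(4) int` partial sums, which corresponds to 128-bit vectors. It also assumes that each statement needs fewer copies than there are def-use cycles.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one operand list vec_oprnds[i]: an empty string
// marks a slot left cleared by safe_grow_cleared (NULL_TREE in GCC), i.e. a
// def-use cycle that only receives a copy statement ("sum_vN = sum_vN;").
using Operands = std::vector<std::string>;

// Sketch of the reordering step.  On entry every entry of OPRNDS is valid
// (USING_NCOPIES of them).  They are moved so that they occupy cycles
// RESULT_POS, RESULT_POS + 1, ... (modulo REDUC_NCOPIES), and RESULT_POS is
// advanced for the next lane-reducing statement.
void place_operands (Operands &oprnds, unsigned reduc_ncopies,
                     unsigned &result_pos)
{
  unsigned using_ncopies = oprnds.size ();
  // Assumed precondition of this sketch: fewer copies than cycles.
  assert (using_ncopies < reduc_ncopies);
  oprnds.resize (reduc_ncopies);  // new slots are empty strings

  unsigned curr_pos = result_pos;
  assert (curr_pos < reduc_ncopies);
  result_pos = (curr_pos + using_ncopies) % reduc_ncopies;

  if (curr_pos)
    {
      unsigned count = reduc_ncopies - using_ncopies;
      unsigned start = 0;

      // Same decision as the patch's "(int) start < 0" test, written
      // without the unsigned-to-int cast.
      if (curr_pos < count)
        count = curr_pos;          // no wrap-around: shift all by CURR_POS
      else
        start = curr_pos - count;  // entries [0, START) stay at the front

      // Walk downwards so that every swap moves a valid entry into a slot
      // that is already empty.
      for (unsigned j = using_ncopies; j > start; j--)
        {
          unsigned k = j - 1;
          std::swap (oprnds[k], oprnds[k + count]);
          assert (oprnds[k].empty ());
        }
    }
}

// Hypothetical driver for the commit-message example: DOT_PROD and
// WIDEN_SUM need one copy each, SAD on vector(8) short needs two.  Returns
// the non-copy statements placed in each def-use cycle sum_v0 .. sum_v3.
std::vector<Operands> distribute_example ()
{
  const unsigned reduc_ncopies = 4;
  unsigned result_pos = 0;  // models reduc_info->reduc_result_pos
  std::vector<Operands> cycles (reduc_ncopies);
  std::vector<Operands> stmts = {Operands{"DOT_PROD"}, Operands{"WIDEN_SUM"},
                                 Operands{"SAD(0~7)", "SAD(8~15)"}};

  for (Operands &stmt : stmts)
    {
      place_operands (stmt, reduc_ncopies, result_pos);
      for (unsigned c = 0; c < reduc_ncopies; c++)
        if (!stmt[c].empty ())
          cycles[c].push_back (stmt[c]);
    }
  return cycles;
}
```

With these inputs, `distribute_example ()` places DOT_PROD in cycle 0, WIDEN_SUM in cycle 1, and the two SAD halves in cycles 2 and 3. This matches the optimized transformation described in the patch.

Now consider a wrap-around case: two operands placed starting at position 3 of 4. The operands land in cycles 3 and 0, and the position advances to 1.

The loop iterates from the highest valid index downwards. That ordering guarantees that every swap moves an operand into a slot that is still empty, which is the condition the patch's `gcc_assert (!vec_oprnds[i][k])` checks.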