From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tamar Christina
To: Tamar Christina, Richard Biener
CC: Richard Sandiford, nd, "gcc-patches@gcc.gnu.org"
Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.
Date: Fri, 4 Jun 2021 10:12:51 +0000
List-Id: Gcc-patches mailing list

Hi Richi,

Attached is re-spun patch. tree_nop_conversion_p was very handy in cleaning up the patch, Thanks!

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master if Richard S has no comments?

Thanks,
Tamar

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it and clarify other dot prod optabs.
	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
	mismatch types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.

--- inline copy of patch ---

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..9fad3322b3f1eb2a836833bb390df78f0cd9734b 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5438,13 +5438,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+udot ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+usdot ==
+   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+        if (subtype == optab_vector_mixed_sign)
+          return usdot_prod_optab;
+
+        return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..d9b64441d0e0726afee89dc9c937350451e7670d 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
            ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+        ;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+        {
+          subtype = optab_vector_mixed_sign;
+          /* Same as optab_vector_mixed_sign but flip the operands. */
+          std::swap (op0, op1);
+        }
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+        subtype = optab_vector_mixed_sign;
+      else
+        gcc_unreachable ();
+
+      widen_pattern_optab
+        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..0128891852fcd74fe31cd338614e90a26256b4bd 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
                 && !SCALAR_FLOAT_TYPE_P (rhs1_type))
                || (!INTEGRAL_TYPE_P (lhs_type)
                    && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-          || !types_compatible_p (rhs1_type, rhs2_type)
+          /* rhs1_type and rhs2_type may differ in sign. */
+          || !tree_nop_conversion_p (rhs1_type, rhs2_type)
           || !useless_type_conversion_p (lhs_type, rhs3_type)
           || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
                        2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..756d2867b678d0d8394202c6adb03d9cd26029e7 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6662,6 +6662,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+           != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7189,7 +7195,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
         {
           if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..82123b96313e6783ea214b9259805d65c07d8858 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
                                  tree itype, tree *vecotype_out,
-                                 tree *vecitype_out = NULL)
+                                 tree *vecitype_out = NULL,
+                                 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -487,10 +488,14 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
 }
 
 /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
-   is narrower than type, storing the supertype in *COMMON_TYPE if so. */
+   is narrower than type, storing the supertype in *COMMON_TYPE if so.
+   If UNPROM_TYPE then accept that *COMMON_TYPE and NEW_TYPE may be of
+   different signs but equal precision and that the resulting
+   multiplication of them be compatible with UNPROM_TYPE. */
 
 static bool
-vect_joust_widened_type (tree type, tree new_type, tree *common_type)
+vect_joust_widened_type (tree type, tree new_type, tree *common_type,
+                         tree unprom_type = NULL)
 {
   if (types_compatible_p (*common_type, new_type))
     return true;
@@ -514,7 +519,18 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
                                 TYPE_PRECISION (new_type));
   precision *= 2;
-  if (precision * 2 > TYPE_PRECISION (type))
+
+  /* Check if the mismatch is only in the sign and if we have
+     UNPROM_TYPE then allow it if there is enough precision to
+     not lose any information during the conversion. */
+  if (unprom_type
+      && TYPE_SIGN (unprom_type) == SIGNED
+      && tree_nop_conversion_p (*common_type, new_type))
+    return true;
+
+  /* The resulting application is unsigned, check if we have enough
+     precision to perform the operation. */
+  if (precision * 2 > TYPE_PRECISION (unprom_type ? unprom_type : type))
     return false;
 
   *common_type = build_nonstandard_integer_type (precision, false);
@@ -532,6 +548,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If UNPROM_TYPE then allow that the signs of the operands
+   may differ in signs but not in precision and that the resulting type
+   of the operation on the operands is compatible with UNPROM_TYPE.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists. */
 
@@ -539,7 +559,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
                       tree_code widened_code, bool shift_p,
                       unsigned int max_nops,
-                      vect_unpromoted_value *unprom, tree *common_type)
+                      vect_unpromoted_value *unprom, tree *common_type,
+                      tree unprom_type = NULL)
 {
   /* Check for an integer operation with the right code. */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -600,7 +621,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
             = vinfo->lookup_def (this_unprom->op);
           nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
                                        widened_code, shift_p, max_nops,
-                                       this_unprom, common_type);
+                                       this_unprom, common_type,
+                                       unprom_type);
           if (nops == 0)
             return 0;
 
@@ -617,7 +639,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
               if (i == 0)
                 *common_type = this_unprom->type;
               else if (!vect_joust_widened_type (type, this_unprom->type,
-                                                 common_type))
+                                                 common_type, unprom_type))
                 return 0;
             }
         }
@@ -799,12 +821,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT. */
+   result in the corresponding elements of RESULT.
+
+   If ALLOW_SHORT_SIGN_MISMATCH then don't convert the types if they only
+   differ by sign. */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
                      tree *result, tree type, vect_unpromoted_value *unprom,
-                     tree vectype)
+                     tree vectype, bool allow_short_sign_mismatch = false)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -812,8 +837,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
         if (unprom[j].op == unprom[i].op)
           break;
+
       if (j < i)
         result[i] = result[j];
+      else if (allow_short_sign_mismatch
+               && tree_nop_conversion_p (type, unprom[i].type))
+        result[i] = unprom[i].op;
       else
         result[i] = vect_convert_input (vinfo, stmt_info,
                                         type, &unprom[i], vectype);
@@ -888,21 +917,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
      sum_0 = phi <init, sum_1>
      S1 x_t = ...
      S2 y_t = ...
-     S3 x_T = (TYPE1) x_t;
-     S4 y_T = (TYPE1) y_t;
+     S3 x_T = (TYPE3) x_t;
+     S4 y_T = (TYPE4) y_t;
      S5 prod = x_T * y_T;
      [S6 prod = (TYPE2) prod; #optional]
      S7 sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
+   bigger and must be the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -939,15 +971,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+          DY = (TYPE2) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+          DDPROD = (TYPE3) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between DX, DY and DPROD can differ. The sign of DPROD
+       is one of the signs of DX or DY.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -986,20 +1019,29 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      inside the loop (in case we are analyzing an outer-loop). */
   vect_unpromoted_value unprom0[2];
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-                             false, 2, unprom0, &half_type))
+                             false, 2, unprom0, &half_type,
+                             TREE_TYPE (unprom_mult.op)))
     return NULL;
 
+  /* Check to see if there is a sign change happening in the operands of the
+     multiplication and pick the appropriate optab subtype. */
+  enum optab_subtype subtype;
+  if (TYPE_SIGN (unprom0[0].type) == TYPE_SIGN (unprom0[1].type))
+    subtype = optab_default;
+  else
+    subtype = optab_vector_mixed_sign;
+
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-                                        type_out, &half_vectype))
+                                        type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types. */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-                       unprom0, half_vectype);
+                       unprom0, half_vectype, true);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,

> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Wednesday, June 2, 2021 10:28 AM
> To: Richard Biener
> Cc: Richard Sandiford; nd;
> gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
>
> Ping,
>
> Did you have any comments Richard S?
>
> Otherwise I'll proceed with respining according to Richi's comments.
>
> Regards,
> Tamar
>
> > -----Original Message-----
> > From: Richard Biener
> > Sent: Wednesday, May 26, 2021 9:57 AM
> > To: Tamar Christina
> > Cc: gcc-patches@gcc.gnu.org; nd; Richard Sandiford
> >
> > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > where the sign for the multiplicant changes.
> >
> > On Tue, 25 May 2021, Tamar Christina wrote:
> >
> > > Hi Richi,
> > >
> > > Here's a respun version of the patch.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> >
> > index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..13e405edd765dde704c64348d2d0b3cd88f0af7c 100644
> > --- a/gcc/tree-cfg.c
> > +++ b/gcc/tree-cfg.c
> > @@ -4421,7 +4421,9 @@ verify_gimple_assign_ternary (gassign *stmt)
> >                 && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> >                || (!INTEGRAL_TYPE_P (lhs_type)
> >                    && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > -          || !types_compatible_p (rhs1_type, rhs2_type)
> > +          || (!types_compatible_p (rhs1_type, rhs2_type)
> > +              && TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type)
> > +              && TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))
> >
> > I think this doesn't capture the constraints - instead please do
> >
> > -          || !types_compatible_p (rhs1_type, rhs2_type)
> > +          /* rhs1_type and rhs2_type may differ in sign. */
> > +          || !tree_nop_conversion_p (rhs1_type, rhs2_type)
> >
> >
> > +/* Determine the optab_subtype to use for the given CODE and STMT. For
> > +   most CODE this will be optab_vector, however for certain operations such as
> > +   DOT_PROD_EXPR where the operation can different signs for the operands we
> > +   need to be able to pick the right optabs. */
> > +
> > +static enum optab_subtype
> > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> >
> > vect_determine_optab_subkind would be a better name. 'code' is
> > redundant (or should better match stmt_vinfo->stmts code). I wonder
> > if it might be clearer to compute the subtype where we compute 'code'
> > and the relation to stmt_info is obvious, I mean here:
> >
> >   /* 3. Check the operands of the operation. The first operands are defined
> >      inside the loop body. The last operand is the reduction variable,
> >      which is defined by the loop-header-phi. */
> >
> >   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> >   STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
> >   gassign *stmt = as_a <gassign *> (stmt_info->stmt);
> >   enum tree_code code = gimple_assign_rhs_code (stmt);
> >   bool lane_reduc_code_p
> >     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
> >
> > so just add
> >
> >   enum optab_subtype optab_query_kind = optab_vector;
> >   if (code == DOT_PROD_EXPR
> >       && )
> >     optab_query_kind = optab_vector_mixed_sign;
> >
> > in this place and avoid adding the new function?
> >
> > I'm not too familiar with the pattern recog code, a 2nd eye would be
> > prefered (Richard?), but
> >
> > +  /* Check if the mismatch is only in the sign and if we have
> > +     allow_short_sign_mismatch then allow it. */
> > +  if (unprom_type
> > +      && TYPE_SIGN (unprom_type) == SIGNED
> > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > +    {
> > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > +      tree eq_type
> > +        = build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > +                                          sign);
> > +
> > +      if (types_compatible_p (*common_type, eq_type))
> > +        return true;
> > +    }
> >
> > looks somewhat complicated - is that equal to
> >
> >   if (unprom_type
> >       && tree_nop_conversion_p (*common_type, new_type))
> >     return true;
> >
> > ? That is, *common_type and new_type only differ in sign?
> >
> > @@ -812,8 +844,13 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
> >        for (j = 0; j < i; ++j)
> >          if (unprom[j].op == unprom[i].op)
> >            break;
> > +      bool only_sign = allow_short_sign_mismatch
> > +                       && TYPE_SIGN (type) != TYPE_SIGN (unprom[i].type)
> > +                       && TYPE_PRECISION (type) == TYPE_PRECISION (unprom[i].type);
> >
> > this could use the same tree_nop_conversion_p predicate.
> >
> > Otherwise the patch looks good.
> >
> > Thanks,
> > Richard.
> > > > > > > > > Thanks, > > > Tamar > > > > > > gcc/ChangeLog: > > > > > > * optabs.def (usdot_prod_optab): New. > > > * doc/md.texi: Document it and clarify other dot prod optabs. > > > * optabs-tree.h (enum optab_subtype): Add > > optab_vector_mixed_sign. > > > * optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab. > > > * optabs.c (expand_widen_pattern_expr): Likewise. > > > * tree-cfg.c (verify_gimple_assign_ternary): Likewise. > > > * tree-vect-loop.c (vect_determine_dot_kind): New. > > > (vectorizable_reduction): Query dot-product kind. > > > * tree-vect-patterns.c (vect_supportable_direct_optab_p): Take > > optional > > > optab subtype. > > > (vect_joust_widened_type, vect_widened_op_tree): Optionally > > ignore > > > mismatch types. > > > (vect_recog_dot_prod_pattern): Support usdot_prod_optab. > > > > > > > > > > -----Original Message----- > > > > From: Richard Biener > > > > Sent: Monday, May 10, 2021 2:29 PM > > > > To: Tamar Christina > > > > Cc: gcc-patches@gcc.gnu.org; nd > > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for > > > > dot-product where the sign for the multiplicant changes. > > > > > > > > On Mon, 10 May 2021, Tamar Christina wrote: > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: Richard Biener > > > > > > Sent: Monday, May 10, 2021 12:40 PM > > > > > > To: Tamar Christina > > > > > > Cc: gcc-patches@gcc.gnu.org; nd > > > > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot- > > product > > > > > > where the sign for the multiplicant changes. 
> > > > > >
> > > > > > On Fri, 7 May 2021, Tamar Christina wrote:
> > > > > >
> > > > > > > Hi Richi,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Richard Biener
> > > > > > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > > > > > To: Tamar Christina
> > > > > > > > Cc: gcc-patches@gcc.gnu.org; nd
> > > > > > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for
> > > > > > > > dot-product where the sign for the multiplicant changes.
> > > > > > > >
> > > > > > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > This patch adds support for a dot product where the signs
> > > > > > > > > of the multiplication arguments differ, i.e. one is
> > > > > > > > > signed and one is unsigned but the precisions are the same.
> > > > > > > > >
> > > > > > > > > #define N 480
> > > > > > > > > #define SIGNEDNESS_1 unsigned
> > > > > > > > > #define SIGNEDNESS_2 signed
> > > > > > > > > #define SIGNEDNESS_3 signed
> > > > > > > > > #define SIGNEDNESS_4 unsigned
> > > > > > > > >
> > > > > > > > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > > > > > > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > > > > > > >    SIGNEDNESS_4 char *restrict b)
> > > > > > > > > {
> > > > > > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > > > > > >     {
> > > > > > > > >       int av = a[i];
> > > > > > > > >       int bv = b[i];
> > > > > > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > > > > > >       res += mult;
> > > > > > > > >     }
> > > > > > > > >   return res;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > The operations are performed as if the operands were
> > > > > > > > > extended to a 32-bit value.
> > > > > > > > > As such this operation isn't valid if there is an
> > > > > > > > > intermediate conversion to an unsigned value, i.e. if
> > > > > > > > > SIGNEDNESS_2 is unsigned.
> > > > > > > > >
> > > > > > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4
> > > > > > > > > are flipped the same optab is used but the operands are
> > > > > > > > > flipped in the optab expansion.
> > > > > > > > >
> > > > > > > > > To support this the patch extends the dot-product
> > > > > > > > > detection to optionally ignore operands with different
> > > > > > > > > signs and stores this information in the optab subtype
> > > > > > > > > which is now made a bitfield.
> > > > > > > > >
> > > > > > > > > The subtype can now additionally control which optab an
> > > > > > > > > EXPR can expand to.
> > > > > > > > >
> > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > > > > > >
> > > > > > > > > Ok for master?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Tamar
> > > > > > > > >
> > > > > > > > > gcc/ChangeLog:
> > > > > > > > >
> > > > > > > > > 	* optabs.def (usdot_prod_optab): New.
> > > > > > > > > 	* doc/md.texi: Document it.
> > > > > > > > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > > > > > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > > > > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > > > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > > > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > > > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > > > > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> > > > > > > > > 	optional optab subtype.
> > > > > > > > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally
> > > > > > > > > 	ignore mismatch types.
> > > > > > > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
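To make the validity condition in the patch description concrete (the operation is not valid when SIGNEDNESS_2 is unsigned), here is a minimal sketch of a single loop iteration with each choice of intermediate type. The function names are hypothetical, and fixed-width types are used in place of `short` and `int` on the assumption that they are 16 and 32 bits wide, as on the targets discussed.

```c
#include <stdint.h>

/* One iteration of the loop with SIGNEDNESS_2 = signed:
   the 16-bit intermediate keeps the sign of the product.  */
uint32_t step_signed_short(uint32_t res, int8_t a, uint8_t b)
{
    int32_t av = a;                       /* sign extended */
    int32_t bv = b;                       /* zero extended */
    int16_t mult = (int16_t)(av * bv);    /* product fits in 16 bits */
    res += (uint32_t)(int32_t)mult;       /* sign extended again */
    return res;
}

/* One iteration with SIGNEDNESS_2 = unsigned: a negative product
   is wrapped to 16 bits and then zero extended, so the upper bits
   of the 32-bit product are lost.  */
uint32_t step_unsigned_short(uint32_t res, int8_t a, uint8_t b)
{
    int32_t av = a;
    int32_t bv = b;
    uint16_t mult = (uint16_t)(av * bv);  /* wraps modulo 65536 */
    res += (uint32_t)mult;                /* zero extended */
    return res;
}
```

With res = 10, a = -1 and b = 2, the signed variant gives 8, which is what a 32-bit mixed-sign dot product would accumulate, while the unsigned variant gives 65544. That difference is why the pattern must be rejected in the second case.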
> > > > > > > > > > > > > > > > > > --- inline copy of patch -- diff --git a/gcc/doc/md.texi > > > > > > > > > b/gcc/doc/md.texi index > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd > > > > > > > > f2 > > > > > > > > > e66bc80d7d23 100644 > > > > > > > > > --- a/gcc/doc/md.texi > > > > > > > > > +++ b/gcc/doc/md.texi > > > > > > > > > @@ -5440,11 +5440,13 @@ Like > > @samp{fold_left_plus_@var{m}}, > > > > > > > > > but > > > > > > > > takes > > > > > > > > > an additional mask operand @item > > > > > > > > > @samp{sdot_prod@var{m}} > > > > > > @cindex > > > > > > > > > @code{udot_prod@var{m}} instruction pattern @itemx > > > > > > > > > @samp{udot_prod@var{m}} > > > > > > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern > > @itemx > > > > > > > > > +@samp{usdot_prod@var{m}} > > > > > > > > > Compute the sum of the products of two signed/unsigned > > > > elements. > > > > > > > > > -Operand 1 and operand 2 are of the same mode. Their > > > > > > > > > product, which is of a -wider mode, is computed and added= to > operand 3. > > > > > > > > > Operand 3 is of a mode equal or -wider than the mode of > > > > > > > > > the product. The result is placed in operand 0, which > > > > > > > > > -is of the same mode > > > > > > as operand 3. > > > > > > > > > +Operand 1 and operand 2 are of the same mode but may > > > > > > > > > +differ in > > > > > > signs. > > > > > > > > > +Their product, which is of a wider mode, is computed > > > > > > > > > +and added to > > > > > > > > operand 3. > > > > > > > > > +Operand 3 is of a mode equal or wider than the mode of > > > > > > > > > +the > > > > product. > > > > > > > > > +The result is placed in operand 0, which is of the same > > > > > > > > > +mode as > > > > > > operand 3. > > > > > > > > > > > > > > > > This doesn't really say what the 's', 'u' and 'us' specify. 
> > > > > > > > Since we're doing a widen multiplication and then a
> > > > > > > > non-widening addition we only need to know the effective
> > > > > > > > sign of the multiplication so I think the existing 's' and 'u'
> > > > > > > > are enough to cover all cases?
> > > > > > >
> > > > > > > The existing 's' and 'u' enforce that both operands of the
> > > > > > > multiplication are of the same sign.  So e.g. for 'u' both
> > > > > > > operands must be unsigned.
> > > > > > >
> > > > > > > In the `us` case one can be signed and one unsigned.
> > > > > > > Operationally this does a sign extension to the wider type
> > > > > > > for the signed value, and the unsigned value gets zero
> > > > > > > extended first, and then converts it to unsigned to perform
> > > > > > > the unsigned multiplication, conforming to the C
> > > > > > > promotion rules.
> > > > > > >
> > > > > > > TL;DR: Without a new optab I can't tell during expansion
> > > > > > > which semantic the operation had at the gimple/C level as
> > > > > > > modes don't carry signs.
> > > > > > >
> > > > > > > Long version:
> > > > > > >
> > > > > > > The problem with using the existing patterns, because of
> > > > > > > their enforcement of `av` and `bv` being the same sign, is
> > > > > > > that we can't remove the explicit sign extensions, but the
> > > > > > > multiplication must be done on the sign/zero extended char
> > > > > > > input in the same sign.
> > > > > > >
> > > > > > > Which means (unless I am mistaken) to get the correct
> > > > > > > result, you can use neither `udot` nor `sdot` as
> > > > > > > semantically these would zero or sign extend both operands
> > > > > > > from char to int to perform the multiplication in the same
> > > > > > > sign.  Whereas in this case, one parameter is zero extended
> > > > > > > and one parameter is sign extended and the result is always an
> > > > > > > unsigned number.
> > > > > > >
> > > > > > > So basically
> > > > > > >
> > > > > > > udot<unsigned c, unsigned a, unsigned b> ==
> > > > > > >    c = zero-ext (a) * zero-ext (b)
> > > > > > > sdot<signed c, signed a, signed b> ==
> > > > > > >    c = sign-ext (a) * sign-ext (b)
> > > > > > > usdot<unsigned c, unsigned a, signed b> ==
> > > > > > >    c = ((unsigned-conv) sign-ext (a)) * zero-ext (b)
> > > > > > >
> > > > > > > So semantically the existing optabs won't fit here.  udot
> > > > > > > would internally promote to unsigned types before the
> > > > > > > multiplication so the result of the multiplication would be
> > > > > > > wrong.  sdot would promote both to signed and do signed
> > > > > > > multiplication, so the result is also wrong.
> > > > > > >
> > > > > > > Now if I relax the constraint on the signs of udot and sdot
> > > > > > > there are two problems:
> > > > > > > RTL Modes don't contain signs.  So a target can't tell me
> > > > > > > how the operands will be promoted.
> > > > > > > So:
> > > > > > >
> > > > > > > 1) I can't really check which semantics the target will
> > > > > > >    adhere to on expansion.
> > > > > > > 2) at expand time I have no way to differentiate between the
> > > > > > >    two instruction variants, given just modes
> > > > > > >    I can't tell whether I expand to the normal dot-product
> > > > > > >    or the new instruction.
> > > > > >
> > > > > > Ah, OK.  Indeed with such a weird instruction the new variant
> > > > > > makes sense.
> > > > > > Still can you please amend the optab documentation to say
> > > > > > which operand is unsigned and which is signed?  Just 'may differ
> > > > > > in signs' is bad.
> > > > >
> > > > > Sure, will expand on it.
> > > > >
> > > > > >
> > > > > > Since the multiplication is commutative I wonder why you need
> > > > > > to handle both signed_to_unsigned and unsigned_to_signed - we
> > > > > > should just enforce a canonical order (like the optab does).
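As a reference point for the three semantics contrasted above, here is a minimal scalar sketch in C. The function names `udot_ref`, `sdot_ref` and `usdot_ref` are hypothetical and are not GCC or target APIs. The operand order chosen for `usdot_ref` (first input unsigned, second signed) is only an assumption for illustration, since the thread itself is still settling which operand is which.

```c
#include <stddef.h>
#include <stdint.h>

/* udot: both inputs are zero extended before the multiplication.  */
uint32_t udot_ref(uint32_t acc, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        acc += (uint32_t)a[i] * (uint32_t)b[i];
    return acc;
}

/* sdot: both inputs are sign extended before the multiplication.  */
int32_t sdot_ref(int32_t acc, const int8_t *a, const int8_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}

/* usdot (operand order assumed): a is zero extended, b is sign
   extended, the product is formed in 32 bits and accumulated
   modulo 2^32 into an unsigned result.  */
uint32_t usdot_ref(uint32_t acc, const uint8_t *a, const int8_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        acc += (uint32_t)((int32_t)a[i] * (int32_t)b[i]);
    return acc;
}
```

Feeding the same byte patterns to all three shows why neither existing optab can stand in for the mixed-sign one: for the bytes {200, 1} and {0xFF, 2}, the mixed-sign reading gives -198 (as a 32-bit value), the all-unsigned reading gives 51002, and the all-signed reading gives 58. Because the multiplication is commutative, a caller holding the operands in the other order can simply swap them, which is the canonical-order point made in the review.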
> > > > > > > > > > Sure, I thought it would have been better to change the order at > > > > > expand time, but can do so at detection time. > > > > > > > > > > > I also think it's a particular bad fit for the bad > > > > > > optab_for_tree_code API - would any of that improve when using > > > > > > a direct internal function here? > > > > > > > > > > Somewhat, but this has considerable knock on effects, e.g. > > > > > currently DOT_PROD is treated as a widening operation and so is > > > > > handled by supportable_widening_operation which does not support > > > > > calls. There's > > a > > > > > significant number of places which work on the tree EXPR > > > > > (including > > > > constant folding) which all need to be changed. > > > > > > > > > > > In particular all the changes around optab_subtype look like > > > > > > they make a bad API worse ... at least a single > > > > > > optab_vector_mixed_sign should suffice here, no need to make it= a > flags kind. > > > > > > > > > > The reason I did so is because depending on where the query is > > > > > done it does use different subtypes currently. During detection > > > > > it uses optab_default, and during vectorization optab_vector. > > > > > For this instruction this difference doesn't seem to be used, > > > > > but did not want to > > > > lose this information in case something depended on it. > > > > > > > > > > But can make it just one. > > > > > > > > > > > > > > > > > + /* If we have a sign changing dot product we need to check > > > > > > + that > > the > > > > > > + promoted type if unsigned has at least the same > > > > > > + precision as the > > > > > > final > > > > > > + type of the dot-product. 
*/ if (subtype !=3D > > > > > > + optab_default) > > > > > > + { > > > > > > + tree mult_type =3D TREE_TYPE (unprom_mult.op); > > > > > > + if (TYPE_SIGN (mult_type) =3D=3D UNSIGNED > > > > > > + && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type)= ) > > > > > > + return NULL; > > > > > > + } > > > > > > > > > > > > I don't understand this - how do we ever arrive at a result > > > > > > with less > > > > precision? > > > > > > > > > > The user could have manually truncated the results, i.e. in the > > > > > detection code notice `mult` > > > > > > > > > > int av =3D a[i]; > > > > > int bv =3D b[i]; > > > > > SIGNEDNESS_2 short mult =3D av * bv; > > > > > res +=3D mult; > > > > > > > > > > which is a short, so it's manually truncating the multiplication > > > > > which is done as int by the instruction. If `mult` is unsigned > > > > > then it will truncate the result if the signed input to usdot > > > > > was negative, unless the Intermediate calculation is of the same > > > > > precision as the instruction. i.e. if mult is unsigned int then > > > > > there's no truncation going on, it's casting from int to > > > > > unsigned int so it's safe to use then as the instruction does the= same > thing internally. > > > > > > > > It looks to me that we simply should only ever allow sing-changes > > > > from multiplication result to the sum. At least your example > > > > above is not > > special to > > > > mixed sign multiplications, no? > > > > > > > > > > And why's this not an issue for signed multiplication? > > > > > > > > > > It is, but in that case it's handled by the type jousting, which > > > > > doesn't allow the type mismatch. i.e. 
> > > > > > > > > > #define SIGNEDNESS_1 unsigned > > > > > #define SIGNEDNESS_2 unsigned > > > > > #define SIGNEDNESS_3 signed > > > > > #define SIGNEDNESS_4 signed > > > > > > > > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int > > > > > res, > > > > > SIGNEDNESS_3 char *restrict a, > > > > > SIGNEDNESS_4 char *restrict b) { > > > > > for (__INTPTR_TYPE__ i =3D 0; i < N; ++i) > > > > > { > > > > > int av =3D a[i]; > > > > > int bv =3D b[i]; > > > > > SIGNEDNESS_2 short mult =3D av * bv; > > > > > res +=3D mult; > > > > > } > > > > > return res; > > > > > } > > > > > > > > > > Is also not detected as a dot product. By adding the carve out > > > > > to the widen multiplication detection it now allows this case > > > > > through so I handle it in the detection code. Thinking about it > > > > > now, it seems more logical to add this case handling inside the > > > > > type jousting code as I don't think it's ever something you'd wan= t. > > > > > > > > Yeah, I think we only need to look through sign changes on the > > multiplication > > > > result. > > > > > > > > > > Also... > > > > > > > > > > > > + /* If we have a sign changing dot-product the dot-product > > > > > > + itself does > > > > > > any > > > > > > + sign conversions, so consume the type and use the > > > > > > + unpromoted > > > > types. > > > > > > */ > > > > > > + tree mult_arg1, mult_arg2; > > > > > > + if (subtype =3D=3D optab_default) > > > > > > + { > > > > > > + mult_arg1 =3D mult_oprnd[0]; > > > > > > + mult_arg2 =3D mult_oprnd[1]; > > > > > > + } > > > > > > + else > > > > > > + { > > > > > > + mult_arg1 =3D unprom0[0].op; > > > > > > + mult_arg2 =3D unprom0[1].op; > > > > > > + } > > > > > > pattern_stmt =3D gimple_build_assign (var, DOT_PROD_EXPR, > > > > > > - mult_oprnd[0], mult_oprnd= [1], > > > > > > oprnd1); > > > > > > + mult_arg1, mult_arg2, > > > > > > + oprnd1); > > > > > > > > > > > > I thought DOT_PROD always performs the promotion. 
Maybe > > > > mult_oprnd > > > > > > and unprom0 are just misnamed here? > > > > > > > > > > Somewhat, in a normal dot-product the sign of the multiplication > > > > > are the same here as the "unpromoted" types. So after > > vect_convert_input > > > > > these two types are the same. > > > > > > > > > > However because here the sign changes and to maintain the > > > > > semantics > > of > > > > > the C code there's an extra conversion here to get the arguments > > > > > in the same sign. That needs to be stripped before given to the > > > > > instruction which does the conversion internally. > > > > > > > > Yes, but then why's that not done by the detection code? That is, > > > > does it (mis-)handle the (int)short_a * (int)(unsigned > > > > short)short_b where we'd want the mixed-sign handling and not > > > > strip the unsigned short conversion from short_b? > > > > > > > > Richard. > > > > > > > > > > > > > > Regards, > > > > > Tamar > > > > > > > > > > > > > > > > > Richard. > > > > > > > > > > > > > Regards, > > > > > > > Tamar > > > > > > > > > > > > > > > > > > > > > > > The tree.def docs say the sum is also possibly widening > > > > > > > > but I don't see this covered by the optab so we should > > > > > > > > eventually remove this feature from the tree side. In > > > > > > > > fact the tree-cfg.c verifier requires the addition to be > > > > > > > > not widening - thus only tree.def needs > > > > > > adjustment. > > > > > > > > > > > > > > > > > @cindex @code{ssad@var{m}} instruction pattern @item > > > > > > > > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h > > > > > > > > > b/gcc/optabs-tree.h index > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f > > > > > > > > 19 > > > > > > > > > 90e0548ba08d 100644 > > > > > > > > > --- a/gcc/optabs-tree.h > > > > > > > > > +++ b/gcc/optabs-tree.h > > > > > > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3= . 
> > If > > > > > > > > > not > > > > > > see > > > > > > > > > shift amount vs. machines that take a vector for the > > > > > > > > > shift > > amount. > > > > > > > > > */ enum optab_subtype { > > > > > > > > > - optab_default, > > > > > > > > > - optab_scalar, > > > > > > > > > - optab_vector > > > > > > > > > + optab_default =3D 1 << 0, optab_scalar =3D 1 << 1, > > > > > > > > > + optab_vector =3D 1 << 2, optab_signed_to_unsigned =3D = 1 > > > > > > > > > + << 3, optab_unsigned_to_signed =3D > > > > > > > > > + 1 << 4 > > > > > > > > > }; > > > > > > > > > > > > > > > > > > +/* Override the OrEqual-operator so we can use > > optab_subtype > > > > > > > > > +as a bit flag. */ inline enum optab_subtype& operator > > > > > > > > > +|=3D (enum > > > > > > > > optab_subtype& > > > > > > > > > +a, enum optab_subtype b) { > > > > > > > > > + return a =3D static_cast(static_cast<= int>(a) > > > > > > > > > + | > static_cast(b)); } > > > > > > > > > + > > > > > > > > > +/* Override the Or-operator so we can use optab_subtype > > > > > > > > > +as a bit flag. */ inline enum optab_subtype operator | > > > > > > > > > +(enum optab_subtype a, enum optab_subtype b) { > > > > > > > > > + return static_cast(static_cast(a= ) > > > > > > > > > + | static_cast(b)); } > > > > > > > > > + > > > > > > > > > /* Return the optab used for computing the given > > > > > > > > > operation on the type > > > > > > > > given by > > > > > > > > > the second argument. The third argument > > > > > > > > > distinguishes between the > > > > > > > > types of > > > > > > > > > vector shifts and rotates. 
*/ diff --git > > > > > > > > > a/gcc/optabs-tree.c b/gcc/optabs-tree.c index > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea > > > > > > > > 1e > > > > > > > > > 5c22b7453072 100644 > > > > > > > > > --- a/gcc/optabs-tree.c > > > > > > > > > +++ b/gcc/optabs-tree.c > > > > > > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum > tree_code > > > > code, > > > > > > > > const_tree type, > > > > > > > > > return TYPE_UNSIGNED (type) ? usum_widen_optab : > > > > > > > > > ssum_widen_optab; > > > > > > > > > > > > > > > > > > case DOT_PROD_EXPR: > > > > > > > > > - return TYPE_UNSIGNED (type) ? udot_prod_optab : > > > > > > sdot_prod_optab; > > > > > > > > > + { > > > > > > > > > + gcc_assert (subtype & optab_default > > > > > > > > > + || subtype & optab_vector > > > > > > > > > + || subtype & optab_signed_to_unsigned > > > > > > > > > + || subtype & optab_unsigned_to_signed); > > > > > > > > > + > > > > > > > > > + if (subtype & (optab_unsigned_to_signed | > > > > > > > > optab_signed_to_unsigned)) > > > > > > > > > + return usdot_prod_optab; > > > > > > > > > + > > > > > > > > > + return (TYPE_UNSIGNED (type) ? udot_prod_optab : > > > > > > > > sdot_prod_optab); > > > > > > > > > + } > > > > > > > > > > > > > > > > > > case SAD_EXPR: > > > > > > > > > return TYPE_UNSIGNED (type) ? 
usad_optab : > > > > > > > > > ssad_optab; diff --git a/gcc/optabs.c b/gcc/optabs.c > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac > > > > > > > > 67 > > > > > > > > > 8597c0d00098 100644 > > > > > > > > > --- a/gcc/optabs.c > > > > > > > > > +++ b/gcc/optabs.c > > > > > > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops > > ops, > > > > > > > > > rtx op0, > > > > > > > > rtx op1, rtx wide_op, > > > > > > > > > bool sbool =3D false; > > > > > > > > > > > > > > > > > > oprnd0 =3D ops->op0; > > > > > > > > > + if (nops >=3D 2) > > > > > > > > > + oprnd1 =3D ops->op1; > > > > > > > > > + if (nops >=3D 3) > > > > > > > > > + oprnd2 =3D ops->op2; > > > > > > > > > + > > > > > > > > > tmode0 =3D TYPE_MODE (TREE_TYPE (oprnd0)); > > > > > > > > > if (ops->code =3D=3D VEC_UNPACK_FIX_TRUNC_HI_EXPR > > > > > > > > > || ops->code =3D=3D VEC_UNPACK_FIX_TRUNC_LO_EXPR) > @@ > > > > > > > > > - > > > > 285,6 > > > > > > > > +290,27 > > > > > > > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx > > > > > > > > > op1, rtx > > > > > > > > wide_op, > > > > > > > > > ? vec_unpacks_sbool_hi_optab : > > > > vec_unpacks_sbool_lo_optab); > > > > > > > > > sbool =3D true; > > > > > > > > > } > > > > > > > > > + else if (ops->code =3D=3D DOT_PROD_EXPR) > > > > > > > > > + { > > > > > > > > > + enum optab_subtype subtype =3D optab_default; > > > > > > > > > + signop sign1 =3D TYPE_SIGN (TREE_TYPE (oprnd0)); > > > > > > > > > + signop sign2 =3D TYPE_SIGN (TREE_TYPE (oprnd1)); > > > > > > > > > + if (sign1 =3D=3D sign2) > > > > > > > > > + ; > > > > > > > > > + else if (sign1 =3D=3D SIGNED && sign2 =3D=3D UNSIG= NED) > > > > > > > > > + { > > > > > > > > > + subtype |=3D optab_signed_to_unsigned; > > > > > > > > > + /* Same as optab_unsigned_to_signed but flip the > > > > operands. 
*/ > > > > > > > > > + std::swap (op0, op1); > > > > > > > > > + } > > > > > > > > > + else if (sign1 =3D=3D UNSIGNED && sign2 =3D=3D SIG= NED) > > > > > > > > > + subtype |=3D optab_unsigned_to_signed; > > > > > > > > > + else > > > > > > > > > + gcc_unreachable (); > > > > > > > > > + > > > > > > > > > + widen_pattern_optab > > > > > > > > > + =3D optab_for_tree_code (ops->code, TREE_TYPE > (oprnd0), > > > > subtype); > > > > > > > > > + } > > > > > > > > > else > > > > > > > > > widen_pattern_optab > > > > > > > > > =3D optab_for_tree_code (ops->code, TREE_TYPE > > > > > > > > > (oprnd0), optab_default); @@ -298,10 +324,7 @@ > > > > expand_widen_pattern_expr > > > > > > > > (sepops ops, rtx op0, rtx op1, rtx wide_op, > > > > > > > > > gcc_assert (icode !=3D CODE_FOR_nothing); > > > > > > > > > > > > > > > > > > if (nops >=3D 2) > > > > > > > > > - { > > > > > > > > > - oprnd1 =3D ops->op1; > > > > > > > > > - tmode1 =3D TYPE_MODE (TREE_TYPE (oprnd1)); > > > > > > > > > - } > > > > > > > > > + tmode1 =3D TYPE_MODE (TREE_TYPE (oprnd1)); > > > > > > > > > else if (sbool) > > > > > > > > > { > > > > > > > > > nops =3D 2; > > > > > > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops > > ops, > > > > rtx > > > > > > > > > op0, > > > > > > > > rtx op1, rtx wide_op, > > > > > > > > > { > > > > > > > > > gcc_assert (tmode1 =3D=3D tmode0); > > > > > > > > > gcc_assert (op1); > > > > > > > > > - oprnd2 =3D ops->op2; > > > > > > > > > wmode =3D TYPE_MODE (TREE_TYPE (oprnd2)); > > > > > > > > > } > > > > > > > > > > > > > > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd > > > > > > > > b7c > > > > > > > > > 18615baae928 100644 > > > > > > > > > --- a/gcc/optabs.def > > > > > > > > > +++ b/gcc/optabs.def > > > > > > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, > > "uavg$a3_ceil") > > > > > > > > OPTAB_D > > > > > > > > > 
(sdot_prod_optab, "sdot_prod$I$a") OPTAB_D > > > > (ssum_widen_optab, > > > > > > > > > "widen_ssum$I$a3") OPTAB_D (udot_prod_optab, > > > > "udot_prod$I$a") > > > > > > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a") > > > > > > > > > OPTAB_D (usum_widen_optab, "widen_usum$I$a3") > > OPTAB_D > > > > > > > > (usad_optab, > > > > > > > > > "usad$I$a") OPTAB_D (ssad_optab, "ssad$I$a") diff --git > > > > > > > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb > > > > > > > > 00 > > > > > > > > > 808fd2678b42 100644 > > > > > > > > > --- a/gcc/tree-cfg.c > > > > > > > > > +++ b/gcc/tree-cfg.c > > > > > > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary > > > > > > > > > (gassign > > > > *stmt) > > > > > > > > > && !SCALAR_FLOAT_TYPE_P (rhs1_type)) > > > > > > > > > || (!INTEGRAL_TYPE_P (lhs_type) > > > > > > > > > && !SCALAR_FLOAT_TYPE_P (lhs_type)))) > > > > > > > > > - || !types_compatible_p (rhs1_type, rhs2_type) > > > > > > > > > + || (!types_compatible_p (rhs1_type, rhs2_type) > > > > > > > > > + && TYPE_SIGN (rhs1_type) =3D=3D TYPE_SIGN > > > > (rhs2_type)) > > > > > > > > > > > > > > > > That's not restrictive enough. I suggest you use > > > > > > > > > > > > > > > > && element_precision (rhs1_type) !=3D > > > > > > > > element_precision > > > > > > > > (rhs2_type) > > > > > > > > > > > > > > > > instead. > > > > > > > > > > > > > > > > As said, I'm not sure all the changes in this patch are req= uired. > > > > > > > > > > > > > > > > Please elaborate. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Richard. 
> > > > > > > > > > > > > > > > > || !useless_type_conversion_p (lhs_type, rhs3_type) > > > > > > > > > || maybe_lt (GET_MODE_SIZE (element_mode > > > > (rhs3_type)), > > > > > > > > > 2 * GET_MODE_SIZE (element_mode > > > > (rhs1_type)))) > > > > > > > > diff --git > > > > > > > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1 > > > > > > > > 9f > > > > > > > > > ec29ec6e4176 100644 > > > > > > > > > --- a/gcc/tree-vect-loop.c > > > > > > > > > +++ b/gcc/tree-vect-loop.c > > > > > > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum > > tree_code > > > > > > code, > > > > > > > > tree vop[3], tree mask, > > > > > > > > > } > > > > > > > > > } > > > > > > > > > > > > > > > > > > +/* Determine the optab_subtype to use for the given > > > > > > > > > +CODE > > and > > > > STMT. > > > > > > > > For > > > > > > > > > + most CODE this will be optab_vector, however for > > > > > > > > > + certain operations > > > > > > > > such as > > > > > > > > > + DOT_PROD_EXPR where the operation can different > > > > > > > > > + signs for the > > > > > > > > operands we > > > > > > > > > + need to be able to pick the right optabs. 
*/ > > > > > > > > > + > > > > > > > > > +static enum optab_subtype vect_determine_dot_kind > > > > > > > > > +(tree_code code, stmt_vec_info > > > > > > > > > +stmt_vinfo) { > > > > > > > > > + enum optab_subtype subtype =3D optab_vector; > > > > > > > > > + switch (code) > > > > > > > > > + { > > > > > > > > > + case DOT_PROD_EXPR: > > > > > > > > > + { > > > > > > > > > + gassign *stmt =3D as_a > (STMT_VINFO_STMT > > > > (stmt_vinfo)); > > > > > > > > > + signop rhs1_sign =3D TYPE_SIGN (TREE_TYPE > > > > > > > > > +(gimple_assign_rhs1 > > > > > > > > (stmt))); > > > > > > > > > + signop rhs2_sign =3D TYPE_SIGN (TREE_TYPE > > > > > > > > > +(gimple_assign_rhs2 > > > > > > > > (stmt))); > > > > > > > > > + if (rhs1_sign !=3D rhs2_sign) > > > > > > > > > + subtype |=3D optab_unsigned_to_signed; > > > > > > > > > + break; > > > > > > > > > + } > > > > > > > > > + default: > > > > > > > > > + break; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + return subtype; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > /* Function vectorizable_reduction. > > > > > > > > > > > > > > > > > > Check if STMT_INFO performs a reduction operation > > > > > > > > > that can be > > > > > > > > vectorized. > > > > > > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction > > > > > > > > > (loop_vec_info > > > > > > > > loop_vinfo, > > > > > > > > > bool ok =3D true; > > > > > > > > > > > > > > > > > > /* 4.1. 
check support for the operation in the loo= p */ > > > > > > > > > - optab optab =3D optab_for_tree_code (code, vectype= _in, > > > > > > optab_vector); > > > > > > > > > + enum optab_subtype subtype =3D > > > > > > > > > + vect_determine_dot_kind (code, > > > > > > > > stmt_info); > > > > > > > > > + optab optab =3D optab_for_tree_code (code, > > > > > > > > > + vectype_in, subtype); > > > > > > > > > if (!optab) > > > > > > > > > { > > > > > > > > > if (dump_enabled_p ()) diff --git > > > > > > > > > a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f > > > > > > > > a84 > > > > > > > > > 942316846d5e 100644 > > > > > > > > > --- a/gcc/tree-vect-patterns.c > > > > > > > > > +++ b/gcc/tree-vect-patterns.c > > > > > > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge > (vec_info > > > > > > > > > *vinfo, tree > > > > > > > > > var) static bool vect_supportable_direct_optab_p > > > > > > > > > (vec_info *vinfo, tree otype, tree_code code, > > > > > > > > > tree itype, tree *vecotype_out, > > > > > > > > > - tree *vecitype_out =3D NULL) > > > > > > > > > + tree *vecitype_out =3D NULL, > > > > > > > > > + enum optab_subtype > subtype =3D > > > > > > > > optab_default) > > > > > > > > > { > > > > > > > > > tree vecitype =3D get_vectype_for_scalar_type (vinfo, = itype); > > > > > > > > > if (!vecitype) > > > > > > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p > > (vec_info > > > > > > > > > *vinfo, > > > > > > > > tree otype, tree_code code, > > > > > > > > > if (!vecotype) > > > > > > > > > return false; > > > > > > > > > > > > > > > > > > - optab optab =3D optab_for_tree_code (code, vecitype, > > > > > > > > > optab_default); > > > > > > > > > + optab optab =3D optab_for_tree_code (code, vecitype, > > > > > > > > > + subtype); > > > > > > > > > if (!optab) > > > > > > > > > return false; > > > > > > > 
> > > > > > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > >  /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
> > > > > > > > > -   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
> > > > > > > > > +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> > > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that *COMMON_TYPE and NEW_TYPE
> > > > > > > > > +   may be of different signs but equal precision.  */
> > > > > > > > >
> > > > > > > > >  static bool
> > > > > > > > > -vect_joust_widened_type (tree type, tree new_type, tree *common_type)
> > > > > > > > > +vect_joust_widened_type (tree type, tree new_type, tree *common_type,
> > > > > > > > > +                         bool allow_short_sign_mismatch = false)
> > > > > > > > >  {
> > > > > > > > >    if (types_compatible_p (*common_type, new_type))
> > > > > > > > >      return true;
> > > > > > > > >
> > > > > > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > > > > > +  if (allow_short_sign_mismatch
> > > > > > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > > > > > > +    {
> > > > > > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > > > > > +      tree eq_type
> > > > > > > > > +        = build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > > > > > > > > +                                          sign);
> > > > > > > > > +
> > > > > > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > > > > > +        return true;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > > > > > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> > > > > > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED (*common_type)))
> > > > > > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
> > > > > > > > >     to a type that (a) is narrower than the result of STMT_INFO and
> > > > > > > > >     (b) can hold all leaf operand values.
> > > > > > > > >
> > > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the operands
> > > > > > > > > +   may differ in signs but not in precision.
> > > > > > > > > +
> > > > > > > > >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> > > > > > > > >     exists.  */
> > > > > > > > >
> > > > > > > > > @@ -539,7 +560,8 @@ static unsigned int
> > > > > > > > >  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > > >                        tree_code widened_code, bool shift_p,
> > > > > > > > >                        unsigned int max_nops,
> > > > > > > > > -                      vect_unpromoted_value *unprom, tree *common_type)
> > > > > > > > > +                      vect_unpromoted_value *unprom, tree *common_type,
> > > > > > > > > +                      bool allow_short_sign_mismatch = false)
> > > > > > > > >  {
> > > > > > > > >    /* Check for an integer operation with the right code.  */
> > > > > > > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> > > > > > > > > @@ -600,7 +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > > >                  = vinfo->lookup_def (this_unprom->op);
> > > > > > > > >                nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
> > > > > > > > >                                             widened_code, shift_p, max_nops,
> > > > > > > > > -                                           this_unprom, common_type);
> > > > > > > > > +                                           this_unprom, common_type,
> > > > > > > > > +                                           allow_short_sign_mismatch);
> > > > > > > > >                if (nops == 0)
> > > > > > > > >                  return 0;
> > > > > > > > >
> > > > > > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > > >                if (i == 0)
> > > > > > > > >                  *common_type = this_unprom->type;
> > > > > > > > >                else if (!vect_joust_widened_type (type, this_unprom->type,
> > > > > > > > > -                                                 common_type))
> > > > > > > > > +                                                 common_type,
> > > > > > > > > +                                                 allow_short_sign_mismatch))
> > > > > > > > >                  return 0;
> > > > > > > > >              }
> > > > > > > > >          }
> > > > > > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> > > > > > > > >
> > > > > > > > >     Try to find the following pattern:
> > > > > > > > >
> > > > > > > > > -     type x_t, y_t;
> > > > > > > > > +     type1a x_t
> > > > > > > > > +     type1b y_t;
> > > > > > > > >       TYPE1 prod;
> > > > > > > > >       TYPE2 sum = init;
> > > > > > > > >     loop:
> > > > > > > > >       sum_0 = phi <init, sum_1>
> > > > > > > > >       S1  x_t = ...
> > > > > > > > >       S2  y_t = ...
> > > > > > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > > > > > +     S4  y_T = (TYPE4) y_t;
> > > > > > > > >       S5  prod = x_T * y_T;
> > > > > > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > > > > > >       S7  sum_1 = prod + sum_0;
> > > > > > > > >
> > > > > > > > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> > > > > > > > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > > > > > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > > > > > > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > > > > > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > > > > > > > > +   bigger and must be the same sign. This is a special case of a reduction
> > > > > > > > >     computation.
> > > > > > > > >
> > > > > > > > >     Input:
> > > > > > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > > > >
> > > > > > > > >    /* Look for the following pattern
> > > > > > > > >            DX = (TYPE1) X;
> > > > > > > > > -          DY = (TYPE1) Y;
> > > > > > > > > +          DY = (TYPE2) Y;
> > > > > > > > >            DPROD = DX * DY;
> > > > > > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > > > > > +          DDPROD = (TYPE3) DPROD;
> > > > > > > > >            sum_1 = DDPROD + sum_0;
> > > > > > > > >       In which
> > > > > > > > >       - DX is double the size of X
> > > > > > > > >       - DY is double the size of Y
> > > > > > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > > > > > -       between DX, DY and DPROD can differ.
> > > > > > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > > > > > +       is one of the signs of DX or DY.
> > > > > > > > >       - sum is the same size of DPROD or bigger
> > > > > > > > >       - sum has been recognized as a reduction variable.
> > > > > > > > >
> > > > > > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > > > > > >    vect_unpromoted_value unprom0[2];
> > > > > > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> > > > > > > > > -                             false, 2, unprom0, &half_type))
> > > > > > > > > +                             false, 2, unprom0, &half_type, true))
> > > > > > > > >      return NULL;
> > > > > > > > >
> > > > > > > > > +  /* Check to see if there is a sign change happening in the operands of the
> > > > > > > > > +     multiplication and pick the appropriate optab subtype.  */
> > > > > > > > > +  enum optab_subtype subtype;
> > > > > > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > > > > > +    subtype = optab_default;
> > > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > > > > > +           && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > > > > > +    subtype = optab_signed_to_unsigned;
> > > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > > > > > +           && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > > > > > +    subtype = optab_unsigned_to_signed;
> > > > > > > > > +  else
> > > > > > > > > +    gcc_unreachable ();
> > > > > > > > > +
> > > > > > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > > > > > +     type of the dot-product.  */
> > > > > > > > > +  if (subtype != optab_default)
> > > > > > > > > +    {
> > > > > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > > > > +          && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > > > > > +        return NULL;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
> > > > > > > > >
> > > > > > > > >    tree half_vectype;
> > > > > > > > >    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
> > > > > > > > > -                                        type_out, &half_vectype))
> > > > > > > > > +                                        type_out, &half_vectype, subtype))
> > > > > > > > >      return NULL;
> > > > > > > > >
> > > > > > > > >    /* Get the inputs in the appropriate types.  */
> > > > > > > > > @@ -1002,8 +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > > > >                         unprom0, half_vectype);
> > > > > > > > >
> > > > > > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > > > > > +
> > > > > > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > > > > > > +  tree mult_arg1, mult_arg2;
> > > > > > > > > +  if (subtype == optab_default)
> > > > > > > > > +    {
> > > > > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > > > > +    }
> > > > > > > > > +  else
> > > > > > > > > +    {
> > > > > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > > > > +    }
> > > > > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > > > > -                                      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > > > > > > +                                      mult_arg1, mult_arg2, oprnd1);
> > > > > > > > >
> > > > > > > > >    return pattern_stmt;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener
> > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener
> > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imend

--_002_VI1PR08MB532593667144D0CA998C2723FF3B9VI1PR08MB5325eurp_
Content-Type: application/octet-stream; name="rb14433.patch"
Content-Description: rb14433.patch
Content-Disposition: attachment; filename="rb14433.patch"; size=16555;
	creation-date="Fri, 04 Jun 2021 10:11:00 GMT";
	modification-date="Fri, 04 Jun 2021 10:11:00 GMT"
Content-Transfer-Encoding: base64

ZGlmZiAtLWdpdCBhL2djYy9kb2MvbWQudGV4aSBiL2djYy9kb2MvbWQudGV4aQppbmRleCBkMTY2
YTBkZWJlZGY0ZDhlZGY1NWM4NDJiY2Y0ZmY0NjkwYjNlOWNlLi45ZmFkMzMyMmIzZjFlYjJhODM2 ODMzYmIzOTBkZjc4ZjBjZDk3MzRiIDEwMDY0NAotLS0gYS9nY2MvZG9jL21kLnRleGkKKysrIGIv Z2NjL2RvYy9tZC50ZXhpCkBAIC01NDM4LDEzICs1NDM4LDU1IEBAIExpa2UgQHNhbXB7Zm9sZF9s ZWZ0X3BsdXNfQHZhcnttfX0sIGJ1dCB0YWtlcyBhbiBhZGRpdGlvbmFsIG1hc2sgb3BlcmFuZAog CiBAY2luZGV4IEBjb2Rle3Nkb3RfcHJvZEB2YXJ7bX19IGluc3RydWN0aW9uIHBhdHRlcm4KIEBp dGVtIEBzYW1we3Nkb3RfcHJvZEB2YXJ7bX19CisKK0NvbXB1dGUgdGhlIHN1bSBvZiB0aGUgcHJv ZHVjdHMgb2YgdHdvIHNpZ25lZCBlbGVtZW50cy4KK09wZXJhbmQgMSBhbmQgb3BlcmFuZCAyIGFy ZSBvZiB0aGUgc2FtZSBtb2RlLiBUaGVpcgorcHJvZHVjdCwgd2hpY2ggaXMgb2YgYSB3aWRlciBt b2RlLCBpcyBjb21wdXRlZCBhbmQgYWRkZWQgdG8gb3BlcmFuZCAzLgorT3BlcmFuZCAzIGlzIG9m IGEgbW9kZSBlcXVhbCBvciB3aWRlciB0aGFuIHRoZSBtb2RlIG9mIHRoZSBwcm9kdWN0LiBUaGUK K3Jlc3VsdCBpcyBwbGFjZWQgaW4gb3BlcmFuZCAwLCB3aGljaCBpcyBvZiB0aGUgc2FtZSBtb2Rl IGFzIG9wZXJhbmQgMy4KKworU2VtYW50aWNhbGx5IHRoZSBleHByZXNzaW9ucyBwZXJmb3JtIHRo ZSBtdWx0aXBsaWNhdGlvbiBpbiB0aGUgZm9sbG93aW5nIHNpZ25zCisKK0BzbWFsbGV4YW1wbGUK K3Nkb3Q8c2lnbmVkIGMsIHNpZ25lZCBhLCBzaWduZWQgYj4gPT0KKyAgIHJlcyA9IHNpZ24tZXh0 IChhKSAqIHNpZ24tZXh0IChiKSArIGMKK0Bkb3Rze30KK0BlbmQgc21hbGxleGFtcGxlCisKIEBj aW5kZXggQGNvZGV7dWRvdF9wcm9kQHZhcnttfX0gaW5zdHJ1Y3Rpb24gcGF0dGVybgotQGl0ZW14 IEBzYW1we3Vkb3RfcHJvZEB2YXJ7bX19Ci1Db21wdXRlIHRoZSBzdW0gb2YgdGhlIHByb2R1Y3Rz IG9mIHR3byBzaWduZWQvdW5zaWduZWQgZWxlbWVudHMuCi1PcGVyYW5kIDEgYW5kIG9wZXJhbmQg MiBhcmUgb2YgdGhlIHNhbWUgbW9kZS4gVGhlaXIgcHJvZHVjdCwgd2hpY2ggaXMgb2YgYQotd2lk ZXIgbW9kZSwgaXMgY29tcHV0ZWQgYW5kIGFkZGVkIHRvIG9wZXJhbmQgMy4gT3BlcmFuZCAzIGlz IG9mIGEgbW9kZSBlcXVhbCBvcgotd2lkZXIgdGhhbiB0aGUgbW9kZSBvZiB0aGUgcHJvZHVjdC4g VGhlIHJlc3VsdCBpcyBwbGFjZWQgaW4gb3BlcmFuZCAwLCB3aGljaAotaXMgb2YgdGhlIHNhbWUg bW9kZSBhcyBvcGVyYW5kIDMuCitAaXRlbSBAc2FtcHt1ZG90X3Byb2RAdmFye219fQorCitDb21w dXRlIHRoZSBzdW0gb2YgdGhlIHByb2R1Y3RzIG9mIHR3byB1bnNpZ25lZCBlbGVtZW50cy4KK09w ZXJhbmQgMSBhbmQgb3BlcmFuZCAyIGFyZSBvZiB0aGUgc2FtZSBtb2RlLiBUaGVpcgorcHJvZHVj 
dCwgd2hpY2ggaXMgb2YgYSB3aWRlciBtb2RlLCBpcyBjb21wdXRlZCBhbmQgYWRkZWQgdG8gb3Bl cmFuZCAzLgorT3BlcmFuZCAzIGlzIG9mIGEgbW9kZSBlcXVhbCBvciB3aWRlciB0aGFuIHRoZSBt b2RlIG9mIHRoZSBwcm9kdWN0LiBUaGUKK3Jlc3VsdCBpcyBwbGFjZWQgaW4gb3BlcmFuZCAwLCB3 aGljaCBpcyBvZiB0aGUgc2FtZSBtb2RlIGFzIG9wZXJhbmQgMy4KKworU2VtYW50aWNhbGx5IHRo ZSBleHByZXNzaW9ucyBwZXJmb3JtIHRoZSBtdWx0aXBsaWNhdGlvbiBpbiB0aGUgZm9sbG93aW5n IHNpZ25zCisKK0BzbWFsbGV4YW1wbGUKK3Vkb3Q8dW5zaWduZWQgYywgdW5zaWduZWQgYSwgdW5z aWduZWQgYj4gPT0KKyAgIHJlcyA9IHplcm8tZXh0IChhKSAqIHplcm8tZXh0IChiKSArIGMKK0Bk b3Rze30KK0BlbmQgc21hbGxleGFtcGxlCisKKworCitAY2luZGV4IEBjb2Rle3VzZG90X3Byb2RA dmFye219fSBpbnN0cnVjdGlvbiBwYXR0ZXJuCitAaXRlbSBAc2FtcHt1c2RvdF9wcm9kQHZhcntt fX0KK0NvbXB1dGUgdGhlIHN1bSBvZiB0aGUgcHJvZHVjdHMgb2YgZWxlbWVudHMgb2YgZGlmZmVy ZW50IHNpZ25zLgorT3BlcmFuZCAxIG11c3QgYmUgdW5zaWduZWQgYW5kIG9wZXJhbmQgMiBzaWdu ZWQuIFRoZWlyCitwcm9kdWN0LCB3aGljaCBpcyBvZiBhIHdpZGVyIG1vZGUsIGlzIGNvbXB1dGVk IGFuZCBhZGRlZCB0byBvcGVyYW5kIDMuCitPcGVyYW5kIDMgaXMgb2YgYSBtb2RlIGVxdWFsIG9y IHdpZGVyIHRoYW4gdGhlIG1vZGUgb2YgdGhlIHByb2R1Y3QuIFRoZQorcmVzdWx0IGlzIHBsYWNl ZCBpbiBvcGVyYW5kIDAsIHdoaWNoIGlzIG9mIHRoZSBzYW1lIG1vZGUgYXMgb3BlcmFuZCAzLgor CitTZW1hbnRpY2FsbHkgdGhlIGV4cHJlc3Npb25zIHBlcmZvcm0gdGhlIG11bHRpcGxpY2F0aW9u IGluIHRoZSBmb2xsb3dpbmcgc2lnbnMKKworQHNtYWxsZXhhbXBsZQordXNkb3Q8dW5zaWduZWQg YywgdW5zaWduZWQgYSwgc2lnbmVkIGI+ID09CisgICByZXMgPSAoKHVuc2lnbmVkLWNvbnYpIHNp Z24tZXh0IChhKSkgKiB6ZXJvLWV4dCAoYikgKyBjCitAZG90c3t9CitAZW5kIHNtYWxsZXhhbXBs ZQogCiBAY2luZGV4IEBjb2Rle3NzYWRAdmFye219fSBpbnN0cnVjdGlvbiBwYXR0ZXJuCiBAaXRl bSBAc2FtcHtzc2FkQHZhcnttfX0KZGlmZiAtLWdpdCBhL2djYy9vcHRhYnMtdHJlZS5oIGIvZ2Nj L29wdGFicy10cmVlLmgKaW5kZXggYzNhYWExYTQxNjk5MWU4NTZkM2UyNGRhNDU5NjhhOTJlYmFk YTgyYy4uZmJkMmIwNmI4ZGJmZDU2MGRmYjY2YjMxNDgzMGU2YjU2NGIzN2FiYiAxMDA2NDQKLS0t IGEvZ2NjL29wdGFicy10cmVlLmgKKysrIGIvZ2NjL29wdGFicy10cmVlLmgKQEAgLTI5LDcgKzI5 LDggQEAgZW51bSBvcHRhYl9zdWJ0eXBlCiB7CiAgIG9wdGFiX2RlZmF1bHQsCiAgIG9wdGFiX3Nj 
YWxhciwKLSAgb3B0YWJfdmVjdG9yCisgIG9wdGFiX3ZlY3RvciwKKyAgb3B0YWJfdmVjdG9yX21p eGVkX3NpZ24KIH07CiAKIC8qIFJldHVybiB0aGUgb3B0YWIgdXNlZCBmb3IgY29tcHV0aW5nIHRo ZSBnaXZlbiBvcGVyYXRpb24gb24gdGhlIHR5cGUgZ2l2ZW4gYnkKZGlmZiAtLWdpdCBhL2djYy9v cHRhYnMtdHJlZS5jIGIvZ2NjL29wdGFicy10cmVlLmMKaW5kZXggOTVmZmUzOTdjMjNlODBjMTA1 YWZlYTUyZTlkNDcyMTZiZjUyZjU1YS4uZWViNWFlZWQzMjAyY2M2OTcxYjY0NDc5OTRiYzUzMTFl OWMwMTBiYiAxMDA2NDQKLS0tIGEvZ2NjL29wdGFicy10cmVlLmMKKysrIGIvZ2NjL29wdGFicy10 cmVlLmMKQEAgLTEyNyw3ICsxMjcsMTIgQEAgb3B0YWJfZm9yX3RyZWVfY29kZSAoZW51bSB0cmVl X2NvZGUgY29kZSwgY29uc3RfdHJlZSB0eXBlLAogICAgICAgcmV0dXJuIFRZUEVfVU5TSUdORUQg KHR5cGUpID8gdXN1bV93aWRlbl9vcHRhYiA6IHNzdW1fd2lkZW5fb3B0YWI7CiAKICAgICBjYXNl IERPVF9QUk9EX0VYUFI6Ci0gICAgICByZXR1cm4gVFlQRV9VTlNJR05FRCAodHlwZSkgPyB1ZG90 X3Byb2Rfb3B0YWIgOiBzZG90X3Byb2Rfb3B0YWI7CisgICAgICB7CisJaWYgKHN1YnR5cGUgPT0g b3B0YWJfdmVjdG9yX21peGVkX3NpZ24pCisJICByZXR1cm4gdXNkb3RfcHJvZF9vcHRhYjsKKwor CXJldHVybiAoVFlQRV9VTlNJR05FRCAodHlwZSkgPyB1ZG90X3Byb2Rfb3B0YWIgOiBzZG90X3By b2Rfb3B0YWIpOworICAgICAgfQogCiAgICAgY2FzZSBTQURfRVhQUjoKICAgICAgIHJldHVybiBU WVBFX1VOU0lHTkVEICh0eXBlKSA/IHVzYWRfb3B0YWIgOiBzc2FkX29wdGFiOwpkaWZmIC0tZ2l0 IGEvZ2NjL29wdGFicy5jIGIvZ2NjL29wdGFicy5jCmluZGV4IGY0NjE0YTM5NDU4Nzc4NzI5M2Rj OGI2ODBhMzg5MDFmNzkwNmY2MWMuLmQ5YjY0NDQxZDBlMDcyNmFmZWU4OWRjOWM5MzczNTA0NTFl NzY3MGQgMTAwNjQ0Ci0tLSBhL2djYy9vcHRhYnMuYworKysgYi9nY2Mvb3B0YWJzLmMKQEAgLTI2 Miw2ICsyNjIsMTEgQEAgZXhwYW5kX3dpZGVuX3BhdHRlcm5fZXhwciAoc2Vwb3BzIG9wcywgcnR4 IG9wMCwgcnR4IG9wMSwgcnR4IHdpZGVfb3AsCiAgIGJvb2wgc2Jvb2wgPSBmYWxzZTsKIAogICBv cHJuZDAgPSBvcHMtPm9wMDsKKyAgaWYgKG5vcHMgPj0gMikKKyAgICBvcHJuZDEgPSBvcHMtPm9w MTsKKyAgaWYgKG5vcHMgPj0gMykKKyAgICBvcHJuZDIgPSBvcHMtPm9wMjsKKwogICB0bW9kZTAg PSBUWVBFX01PREUgKFRSRUVfVFlQRSAob3BybmQwKSk7CiAgIGlmIChvcHMtPmNvZGUgPT0gVkVD X1VOUEFDS19GSVhfVFJVTkNfSElfRVhQUgogICAgICAgfHwgb3BzLT5jb2RlID09IFZFQ19VTlBB Q0tfRklYX1RSVU5DX0xPX0VYUFIpCkBAIC0yODUsNiArMjkwLDI3IEBAIGV4cGFuZF93aWRlbl9w 
YXR0ZXJuX2V4cHIgKHNlcG9wcyBvcHMsIHJ0eCBvcDAsIHJ0eCBvcDEsIHJ0eCB3aWRlX29wLAog CSAgID8gdmVjX3VucGFja3Nfc2Jvb2xfaGlfb3B0YWIgOiB2ZWNfdW5wYWNrc19zYm9vbF9sb19v cHRhYik7CiAgICAgICBzYm9vbCA9IHRydWU7CiAgICAgfQorICBlbHNlIGlmIChvcHMtPmNvZGUg PT0gRE9UX1BST0RfRVhQUikKKyAgICB7CisgICAgICBlbnVtIG9wdGFiX3N1YnR5cGUgc3VidHlw ZSA9IG9wdGFiX2RlZmF1bHQ7CisgICAgICBzaWdub3Agc2lnbjEgPSBUWVBFX1NJR04gKFRSRUVf VFlQRSAob3BybmQwKSk7CisgICAgICBzaWdub3Agc2lnbjIgPSBUWVBFX1NJR04gKFRSRUVfVFlQ RSAob3BybmQxKSk7CisgICAgICBpZiAoc2lnbjEgPT0gc2lnbjIpCisJOworICAgICAgZWxzZSBp ZiAoc2lnbjEgPT0gU0lHTkVEICYmIHNpZ24yID09IFVOU0lHTkVEKQorCXsKKwkgIHN1YnR5cGUg PSBvcHRhYl92ZWN0b3JfbWl4ZWRfc2lnbjsKKwkgIC8qIFNhbWUgYXMgb3B0YWJfdmVjdG9yX21p eGVkX3NpZ24gYnV0IGZsaXAgdGhlIG9wZXJhbmRzLiAgKi8KKwkgIHN0ZDo6c3dhcCAob3AwLCBv cDEpOworCX0KKyAgICAgIGVsc2UgaWYgKHNpZ24xID09IFVOU0lHTkVEICYmIHNpZ24yID09IFNJ R05FRCkKKwlzdWJ0eXBlID0gb3B0YWJfdmVjdG9yX21peGVkX3NpZ247CisgICAgICBlbHNlCisJ Z2NjX3VucmVhY2hhYmxlICgpOworCisgICAgICB3aWRlbl9wYXR0ZXJuX29wdGFiCisJPSBvcHRh Yl9mb3JfdHJlZV9jb2RlIChvcHMtPmNvZGUsIFRSRUVfVFlQRSAob3BybmQwKSwgc3VidHlwZSk7 CisgICAgfQogICBlbHNlCiAgICAgd2lkZW5fcGF0dGVybl9vcHRhYgogICAgICAgPSBvcHRhYl9m b3JfdHJlZV9jb2RlIChvcHMtPmNvZGUsIFRSRUVfVFlQRSAob3BybmQwKSwgb3B0YWJfZGVmYXVs dCk7CkBAIC0yOTgsMTAgKzMyNCw3IEBAIGV4cGFuZF93aWRlbl9wYXR0ZXJuX2V4cHIgKHNlcG9w cyBvcHMsIHJ0eCBvcDAsIHJ0eCBvcDEsIHJ0eCB3aWRlX29wLAogICBnY2NfYXNzZXJ0IChpY29k ZSAhPSBDT0RFX0ZPUl9ub3RoaW5nKTsKIAogICBpZiAobm9wcyA+PSAyKQotICAgIHsKLSAgICAg IG9wcm5kMSA9IG9wcy0+b3AxOwotICAgICAgdG1vZGUxID0gVFlQRV9NT0RFIChUUkVFX1RZUEUg KG9wcm5kMSkpOwotICAgIH0KKyAgICB0bW9kZTEgPSBUWVBFX01PREUgKFRSRUVfVFlQRSAob3By bmQxKSk7CiAgIGVsc2UgaWYgKHNib29sKQogICAgIHsKICAgICAgIG5vcHMgPSAyOwpAQCAtMzE2 LDcgKzMzOSw2IEBAIGV4cGFuZF93aWRlbl9wYXR0ZXJuX2V4cHIgKHNlcG9wcyBvcHMsIHJ0eCBv cDAsIHJ0eCBvcDEsIHJ0eCB3aWRlX29wLAogICAgIHsKICAgICAgIGdjY19hc3NlcnQgKHRtb2Rl MSA9PSB0bW9kZTApOwogICAgICAgZ2NjX2Fzc2VydCAob3AxKTsKLSAgICAgIG9wcm5kMiA9IG9w 
cy0+b3AyOwogICAgICAgd21vZGUgPSBUWVBFX01PREUgKFRSRUVfVFlQRSAob3BybmQyKSk7CiAg ICAgfQogCmRpZmYgLS1naXQgYS9nY2Mvb3B0YWJzLmRlZiBiL2djYy9vcHRhYnMuZGVmCmluZGV4 IGIxOTJhOWQwNzBiOGFhNzJlNTY3NmIyZWFhMDIwYjViZGQ3ZmZjYzguLmY0NzBjMjE2ODM3OGNl Yzg0MGVkZjdmYmRiN2MxODYxNWJhYWU5MjggMTAwNjQ0Ci0tLSBhL2djYy9vcHRhYnMuZGVmCisr KyBiL2djYy9vcHRhYnMuZGVmCkBAIC0zNTIsNiArMzUyLDcgQEAgT1BUQUJfRCAodWF2Z19jZWls X29wdGFiLCAidWF2ZyRhM19jZWlsIikKIE9QVEFCX0QgKHNkb3RfcHJvZF9vcHRhYiwgInNkb3Rf cHJvZCRJJGEiKQogT1BUQUJfRCAoc3N1bV93aWRlbl9vcHRhYiwgIndpZGVuX3NzdW0kSSRhMyIp CiBPUFRBQl9EICh1ZG90X3Byb2Rfb3B0YWIsICJ1ZG90X3Byb2QkSSRhIikKK09QVEFCX0QgKHVz ZG90X3Byb2Rfb3B0YWIsICJ1c2RvdF9wcm9kJEkkYSIpCiBPUFRBQl9EICh1c3VtX3dpZGVuX29w dGFiLCAid2lkZW5fdXN1bSRJJGEzIikKIE9QVEFCX0QgKHVzYWRfb3B0YWIsICJ1c2FkJEkkYSIp CiBPUFRBQl9EIChzc2FkX29wdGFiLCAic3NhZCRJJGEiKQpkaWZmIC0tZ2l0IGEvZ2NjL3RyZWUt Y2ZnLmMgYi9nY2MvdHJlZS1jZmcuYwppbmRleCA3ZTNhYWU1ZjljMjhhNDlmZWVkYzdjYzY2ZThh YzBkNDc2YjlmMjhhLi4wMTI4ODkxODUyZmNkNzRmZTMxY2QzMzg2MTRlOTBhMjYyNTZiNGJkIDEw MDY0NAotLS0gYS9nY2MvdHJlZS1jZmcuYworKysgYi9nY2MvdHJlZS1jZmcuYwpAQCAtNDQyMSw3 ICs0NDIxLDggQEAgdmVyaWZ5X2dpbXBsZV9hc3NpZ25fdGVybmFyeSAoZ2Fzc2lnbiAqc3RtdCkK IAkJICAmJiAhU0NBTEFSX0ZMT0FUX1RZUEVfUCAocmhzMV90eXBlKSkKIAkJIHx8ICghSU5URUdS QUxfVFlQRV9QIChsaHNfdHlwZSkKIAkJICAgICAmJiAhU0NBTEFSX0ZMT0FUX1RZUEVfUCAobGhz X3R5cGUpKSkpCi0JICAgIHx8ICF0eXBlc19jb21wYXRpYmxlX3AgKHJoczFfdHlwZSwgcmhzMl90 eXBlKQorCSAgICAvKiByaHMxX3R5cGUgYW5kIHJoczJfdHlwZSBtYXkgZGlmZmVyIGluIHNpZ24u ICAqLworCSAgICB8fCAhdHJlZV9ub3BfY29udmVyc2lvbl9wIChyaHMxX3R5cGUsIHJoczJfdHlw ZSkKIAkgICAgfHwgIXVzZWxlc3NfdHlwZV9jb252ZXJzaW9uX3AgKGxoc190eXBlLCByaHMzX3R5 cGUpCiAJICAgIHx8IG1heWJlX2x0IChHRVRfTU9ERV9TSVpFIChlbGVtZW50X21vZGUgKHJoczNf dHlwZSkpLAogCQkJIDIgKiBHRVRfTU9ERV9TSVpFIChlbGVtZW50X21vZGUgKHJoczFfdHlwZSkp KSkKZGlmZiAtLWdpdCBhL2djYy90cmVlLXZlY3QtbG9vcC5jIGIvZ2NjL3RyZWUtdmVjdC1sb29w LmMKaW5kZXggOTNmYTI5MjhlMDAxYzE1NGJkNGE5YTczYWMxZGJiYmY3M2M0NTZkZi4uNzU2ZDI4 
NjdiNjc4ZDBkODM5NDIwMmM2YWRiMDNkOWNkMjYwMjllNyAxMDA2NDQKLS0tIGEvZ2NjL3RyZWUt dmVjdC1sb29wLmMKKysrIGIvZ2NjL3RyZWUtdmVjdC1sb29wLmMKQEAgLTY2NjIsNiArNjY2Miwx MiBAQCB2ZWN0b3JpemFibGVfcmVkdWN0aW9uIChsb29wX3ZlY19pbmZvIGxvb3BfdmluZm8sCiAg IGJvb2wgbGFuZV9yZWR1Y19jb2RlX3AKICAgICA9IChjb2RlID09IERPVF9QUk9EX0VYUFIgfHwg Y29kZSA9PSBXSURFTl9TVU1fRVhQUiB8fCBjb2RlID09IFNBRF9FWFBSKTsKICAgaW50IG9wX3R5 cGUgPSBUUkVFX0NPREVfTEVOR1RIIChjb2RlKTsKKyAgZW51bSBvcHRhYl9zdWJ0eXBlIG9wdGFi X3F1ZXJ5X2tpbmQgPSBvcHRhYl92ZWN0b3I7CisgIGlmIChjb2RlID09IERPVF9QUk9EX0VYUFIK KyAgICAgICYmIFRZUEVfU0lHTiAoVFJFRV9UWVBFIChnaW1wbGVfYXNzaWduX3JoczEgKHN0bXQp KSkKKwkgICAhPSBUWVBFX1NJR04gKFRSRUVfVFlQRSAoZ2ltcGxlX2Fzc2lnbl9yaHMyIChzdG10 KSkpKQorICAgIG9wdGFiX3F1ZXJ5X2tpbmQgPSBvcHRhYl92ZWN0b3JfbWl4ZWRfc2lnbjsKKwog CiAgIHNjYWxhcl9kZXN0ID0gZ2ltcGxlX2Fzc2lnbl9saHMgKHN0bXQpOwogICBzY2FsYXJfdHlw ZSA9IFRSRUVfVFlQRSAoc2NhbGFyX2Rlc3QpOwpAQCAtNzE4OSw3ICs3MTk1LDcgQEAgdmVjdG9y aXphYmxlX3JlZHVjdGlvbiAobG9vcF92ZWNfaW5mbyBsb29wX3ZpbmZvLAogICAgICAgYm9vbCBv ayA9IHRydWU7CiAKICAgICAgIC8qIDQuMS4gY2hlY2sgc3VwcG9ydCBmb3IgdGhlIG9wZXJhdGlv biBpbiB0aGUgbG9vcCAgKi8KLSAgICAgIG9wdGFiIG9wdGFiID0gb3B0YWJfZm9yX3RyZWVfY29k ZSAoY29kZSwgdmVjdHlwZV9pbiwgb3B0YWJfdmVjdG9yKTsKKyAgICAgIG9wdGFiIG9wdGFiID0g b3B0YWJfZm9yX3RyZWVfY29kZSAoY29kZSwgdmVjdHlwZV9pbiwgb3B0YWJfcXVlcnlfa2luZCk7 CiAgICAgICBpZiAoIW9wdGFiKQogCXsKIAkgIGlmIChkdW1wX2VuYWJsZWRfcCAoKSkKZGlmZiAt LWdpdCBhL2djYy90cmVlLXZlY3QtcGF0dGVybnMuYyBiL2djYy90cmVlLXZlY3QtcGF0dGVybnMu YwppbmRleCA0NDFkNmNkMjhjNGVhZGVkN2FiZDc1NjE2NDg5MGRiY2ZmZDJmM2I4Li44MjEyM2I5 NjMxM2U2NzgzZWEyMTRiOTI1OTgwNWQ2NWMwN2Q4ODU4IDEwMDY0NAotLS0gYS9nY2MvdHJlZS12 ZWN0LXBhdHRlcm5zLmMKKysrIGIvZ2NjL3RyZWUtdmVjdC1wYXR0ZXJucy5jCkBAIC0yMDEsNyAr MjAxLDggQEAgdmVjdF9nZXRfZXh0ZXJuYWxfZGVmX2VkZ2UgKHZlY19pbmZvICp2aW5mbywgdHJl ZSB2YXIpCiBzdGF0aWMgYm9vbAogdmVjdF9zdXBwb3J0YWJsZV9kaXJlY3Rfb3B0YWJfcCAodmVj X2luZm8gKnZpbmZvLCB0cmVlIG90eXBlLCB0cmVlX2NvZGUgY29kZSwKIAkJCQkgdHJlZSBpdHlw 
ZSwgdHJlZSAqdmVjb3R5cGVfb3V0LAotCQkJCSB0cmVlICp2ZWNpdHlwZV9vdXQgPSBOVUxMKQor CQkJCSB0cmVlICp2ZWNpdHlwZV9vdXQgPSBOVUxMLAorCQkJCSBlbnVtIG9wdGFiX3N1YnR5cGUg c3VidHlwZSA9IG9wdGFiX2RlZmF1bHQpCiB7CiAgIHRyZWUgdmVjaXR5cGUgPSBnZXRfdmVjdHlw ZV9mb3Jfc2NhbGFyX3R5cGUgKHZpbmZvLCBpdHlwZSk7CiAgIGlmICghdmVjaXR5cGUpCkBAIC0y MTEsNyArMjEyLDcgQEAgdmVjdF9zdXBwb3J0YWJsZV9kaXJlY3Rfb3B0YWJfcCAodmVjX2luZm8g KnZpbmZvLCB0cmVlIG90eXBlLCB0cmVlX2NvZGUgY29kZSwKICAgaWYgKCF2ZWNvdHlwZSkKICAg ICByZXR1cm4gZmFsc2U7CiAKLSAgb3B0YWIgb3B0YWIgPSBvcHRhYl9mb3JfdHJlZV9jb2RlIChj b2RlLCB2ZWNpdHlwZSwgb3B0YWJfZGVmYXVsdCk7CisgIG9wdGFiIG9wdGFiID0gb3B0YWJfZm9y X3RyZWVfY29kZSAoY29kZSwgdmVjaXR5cGUsIHN1YnR5cGUpOwogICBpZiAoIW9wdGFiKQogICAg IHJldHVybiBmYWxzZTsKIApAQCAtNDg3LDEwICs0ODgsMTQgQEAgdmVjdF9qb3VzdF93aWRlbmVk X2ludGVnZXIgKHRyZWUgdHlwZSwgYm9vbCBzaGlmdF9wLCB0cmVlIG9wLAogfQogCiAvKiBSZXR1 cm4gdHJ1ZSBpZiB0aGUgY29tbW9uIHN1cGVydHlwZSBvZiBORVdfVFlQRSBhbmQgKkNPTU1PTl9U WVBFCi0gICBpcyBuYXJyb3dlciB0aGFuIHR5cGUsIHN0b3JpbmcgdGhlIHN1cGVydHlwZSBpbiAq Q09NTU9OX1RZUEUgaWYgc28uICAqLworICAgaXMgbmFycm93ZXIgdGhhbiB0eXBlLCBzdG9yaW5n IHRoZSBzdXBlcnR5cGUgaW4gKkNPTU1PTl9UWVBFIGlmIHNvLgorICAgSWYgVU5QUk9NX1RZUEUg dGhlbiBhY2NlcHQgdGhhdCAqQ09NTU9OX1RZUEUgYW5kIE5FV19UWVBFIG1heSBiZSBvZgorICAg ZGlmZmVyZW50IHNpZ25zIGJ1dCBlcXVhbCBwcmVjaXNpb24gYW5kIHRoYXQgdGhlIHJlc3VsdGlu ZworICAgbXVsdGlwbGljYXRpb24gb2YgdGhlbSBiZSBjb21wYXRpYmxlIHdpdGggVU5QUk9NX1RZ UEUuICAgKi8KIAogc3RhdGljIGJvb2wKLXZlY3Rfam91c3Rfd2lkZW5lZF90eXBlICh0cmVlIHR5 cGUsIHRyZWUgbmV3X3R5cGUsIHRyZWUgKmNvbW1vbl90eXBlKQordmVjdF9qb3VzdF93aWRlbmVk X3R5cGUgKHRyZWUgdHlwZSwgdHJlZSBuZXdfdHlwZSwgdHJlZSAqY29tbW9uX3R5cGUsCisJCQkg dHJlZSB1bnByb21fdHlwZSA9IE5VTEwpCiB7CiAgIGlmICh0eXBlc19jb21wYXRpYmxlX3AgKCpj b21tb25fdHlwZSwgbmV3X3R5cGUpKQogICAgIHJldHVybiB0cnVlOwpAQCAtNTE0LDcgKzUxOSwx OCBAQCB2ZWN0X2pvdXN0X3dpZGVuZWRfdHlwZSAodHJlZSB0eXBlLCB0cmVlIG5ld190eXBlLCB0 cmVlICpjb21tb25fdHlwZSkKICAgdW5zaWduZWQgaW50IHByZWNpc2lvbiA9IE1BWCAoVFlQRV9Q 
UkVDSVNJT04gKCpjb21tb25fdHlwZSksCiAJCQkJVFlQRV9QUkVDSVNJT04gKG5ld190eXBlKSk7 CiAgIHByZWNpc2lvbiAqPSAyOwotICBpZiAocHJlY2lzaW9uICogMiA+IFRZUEVfUFJFQ0lTSU9O ICh0eXBlKSkKKworICAvKiBDaGVjayBpZiB0aGUgbWlzbWF0Y2ggaXMgb25seSBpbiB0aGUgc2ln biBhbmQgaWYgd2UgaGF2ZQorICAgICBVTlBST01fVFlQRSB0aGVuIGFsbG93IGl0IGlmIHRoZXJl IGlzIGVub3VnaCBwcmVjaXNpb24gdG8KKyAgICAgbm90IGxvc2UgYW55IGluZm9ybWF0aW9uIGR1 cmluZyB0aGUgY29udmVyc2lvbi4gICovCisgIGlmICh1bnByb21fdHlwZQorICAgICAgJiYgVFlQ RV9TSUdOICh1bnByb21fdHlwZSkgPT0gU0lHTkVECisgICAgICAmJiB0cmVlX25vcF9jb252ZXJz aW9uX3AgKCpjb21tb25fdHlwZSwgbmV3X3R5cGUpKQorCXJldHVybiB0cnVlOworCisgIC8qIFRo ZSByZXN1bHRpbmcgYXBwbGljYXRpb24gaXMgdW5zaWduZWQsIGNoZWNrIGlmIHdlIGhhdmUgZW5v dWdoCisgICAgIHByZWNpc2lvbiB0byBwZXJmb3JtIHRoZSBvcGVyYXRpb24uICAqLworICBpZiAo cHJlY2lzaW9uICogMiA+IFRZUEVfUFJFQ0lTSU9OICh1bnByb21fdHlwZSA/IHVucHJvbV90eXBl IDogdHlwZSkpCiAgICAgcmV0dXJuIGZhbHNlOwogCiAgICpjb21tb25fdHlwZSA9IGJ1aWxkX25v bnN0YW5kYXJkX2ludGVnZXJfdHlwZSAocHJlY2lzaW9uLCBmYWxzZSk7CkBAIC01MzIsNiArNTQ4 LDEwIEBAIHZlY3Rfam91c3Rfd2lkZW5lZF90eXBlICh0cmVlIHR5cGUsIHRyZWUgbmV3X3R5cGUs IHRyZWUgKmNvbW1vbl90eXBlKQogICAgdG8gYSB0eXBlIHRoYXQgKGEpIGlzIG5hcnJvd2VyIHRo YW4gdGhlIHJlc3VsdCBvZiBTVE1UX0lORk8gYW5kCiAgICAoYikgY2FuIGhvbGQgYWxsIGxlYWYg b3BlcmFuZCB2YWx1ZXMuCiAKKyAgIElmIFVOUFJPTV9UWVBFIHRoZW4gYWxsb3cgdGhhdCB0aGUg c2lnbnMgb2YgdGhlIG9wZXJhbmRzCisgICBtYXkgZGlmZmVyIGluIHNpZ25zIGJ1dCBub3QgaW4g cHJlY2lzaW9uIGFuZCB0aGF0IHRoZSByZXN1bHRpbmcgdHlwZQorICAgb2YgdGhlIG9wZXJhdGlv biBvbiB0aGUgb3BlcmFuZHMgaXMgY29tcGF0aWJsZSB3aXRoIFVOUFJPTV9UWVBFLgorCiAgICBS ZXR1cm4gMCBpZiBTVE1UX0lORk8gaXNuJ3Qgc3VjaCBhIHRyZWUsIG9yIGlmIG5vIHN1Y2ggQ09N TU9OX1RZUEUKICAgIGV4aXN0cy4gICovCiAKQEAgLTUzOSw3ICs1NTksOCBAQCBzdGF0aWMgdW5z aWduZWQgaW50CiB2ZWN0X3dpZGVuZWRfb3BfdHJlZSAodmVjX2luZm8gKnZpbmZvLCBzdG10X3Zl Y19pbmZvIHN0bXRfaW5mbywgdHJlZV9jb2RlIGNvZGUsCiAJCSAgICAgIHRyZWVfY29kZSB3aWRl bmVkX2NvZGUsIGJvb2wgc2hpZnRfcCwKIAkJICAgICAgdW5zaWduZWQgaW50IG1heF9ub3BzLAot 
CQkgICAgICB2ZWN0X3VucHJvbW90ZWRfdmFsdWUgKnVucHJvbSwgdHJlZSAqY29tbW9uX3R5cGUp CisJCSAgICAgIHZlY3RfdW5wcm9tb3RlZF92YWx1ZSAqdW5wcm9tLCB0cmVlICpjb21tb25fdHlw ZSwKKwkJICAgICAgdHJlZSB1bnByb21fdHlwZSA9IE5VTEwpCiB7CiAgIC8qIENoZWNrIGZvciBh biBpbnRlZ2VyIG9wZXJhdGlvbiB3aXRoIHRoZSByaWdodCBjb2RlLiAgKi8KICAgZ2Fzc2lnbiAq YXNzaWduID0gZHluX2Nhc3QgPGdhc3NpZ24gKj4gKHN0bXRfaW5mby0+c3RtdCk7CkBAIC02MDAs NyArNjIxLDggQEAgdmVjdF93aWRlbmVkX29wX3RyZWUgKHZlY19pbmZvICp2aW5mbywgc3RtdF92 ZWNfaW5mbyBzdG10X2luZm8sIHRyZWVfY29kZSBjb2RlLAogCQk9IHZpbmZvLT5sb29rdXBfZGVm ICh0aGlzX3VucHJvbS0+b3ApOwogCSAgICAgIG5vcHMgPSB2ZWN0X3dpZGVuZWRfb3BfdHJlZSAo dmluZm8sIGRlZl9zdG10X2luZm8sIGNvZGUsCiAJCQkJCSAgIHdpZGVuZWRfY29kZSwgc2hpZnRf cCwgbWF4X25vcHMsCi0JCQkJCSAgIHRoaXNfdW5wcm9tLCBjb21tb25fdHlwZSk7CisJCQkJCSAg IHRoaXNfdW5wcm9tLCBjb21tb25fdHlwZSwKKwkJCQkJICAgdW5wcm9tX3R5cGUpOwogCSAgICAg IGlmIChub3BzID09IDApCiAJCXJldHVybiAwOwogCkBAIC02MTcsNyArNjM5LDcgQEAgdmVjdF93 aWRlbmVkX29wX3RyZWUgKHZlY19pbmZvICp2aW5mbywgc3RtdF92ZWNfaW5mbyBzdG10X2luZm8s IHRyZWVfY29kZSBjb2RlLAogCSAgICAgIGlmIChpID09IDApCiAJCSpjb21tb25fdHlwZSA9IHRo aXNfdW5wcm9tLT50eXBlOwogCSAgICAgIGVsc2UgaWYgKCF2ZWN0X2pvdXN0X3dpZGVuZWRfdHlw ZSAodHlwZSwgdGhpc191bnByb20tPnR5cGUsCi0JCQkJCQkgY29tbW9uX3R5cGUpKQorCQkJCQkJ IGNvbW1vbl90eXBlLCB1bnByb21fdHlwZSkpCiAJCXJldHVybiAwOwogCSAgICB9CiAJfQpAQCAt Nzk5LDEyICs4MjEsMTUgQEAgdmVjdF9jb252ZXJ0X2lucHV0ICh2ZWNfaW5mbyAqdmluZm8sIHN0 bXRfdmVjX2luZm8gc3RtdF9pbmZvLCB0cmVlIHR5cGUsCiB9CiAKIC8qIEludm9rZSB2ZWN0X2Nv bnZlcnRfaW5wdXQgZm9yIE4gZWxlbWVudHMgb2YgVU5QUk9NIGFuZCBzdG9yZSB0aGUKLSAgIHJl c3VsdCBpbiB0aGUgY29ycmVzcG9uZGluZyBlbGVtZW50cyBvZiBSRVNVTFQuICAqLworICAgcmVz dWx0IGluIHRoZSBjb3JyZXNwb25kaW5nIGVsZW1lbnRzIG9mIFJFU1VMVC4KKworICAgSWYgQUxM T1dfU0hPUlRfU0lHTl9NSVNNQVRDSCB0aGVuIGRvbid0IGNvbnZlcnQgdGhlIHR5cGVzIGlmIHRo ZXkgb25seQorICAgZGlmZmVyIGJ5IHNpZ24uICAqLwogCiBzdGF0aWMgdm9pZAogdmVjdF9jb252 ZXJ0X2lucHV0cyAodmVjX2luZm8gKnZpbmZvLCBzdG10X3ZlY19pbmZvIHN0bXRfaW5mbywgdW5z 
aWduZWQgaW50IG4sCiAJCSAgICAgdHJlZSAqcmVzdWx0LCB0cmVlIHR5cGUsIHZlY3RfdW5wcm9t b3RlZF92YWx1ZSAqdW5wcm9tLAotCQkgICAgIHRyZWUgdmVjdHlwZSkKKwkJICAgICB0cmVlIHZl Y3R5cGUsIGJvb2wgYWxsb3dfc2hvcnRfc2lnbl9taXNtYXRjaCA9IGZhbHNlKQogewogICBmb3Ig KHVuc2lnbmVkIGludCBpID0gMDsgaSA8IG47ICsraSkKICAgICB7CkBAIC04MTIsOCArODM3LDEy IEBAIHZlY3RfY29udmVydF9pbnB1dHMgKHZlY19pbmZvICp2aW5mbywgc3RtdF92ZWNfaW5mbyBz dG10X2luZm8sIHVuc2lnbmVkIGludCBuLAogICAgICAgZm9yIChqID0gMDsgaiA8IGk7ICsraikK IAlpZiAodW5wcm9tW2pdLm9wID09IHVucHJvbVtpXS5vcCkKIAkgIGJyZWFrOworCiAgICAgICBp ZiAoaiA8IGkpCiAJcmVzdWx0W2ldID0gcmVzdWx0W2pdOworICAgICAgZWxzZSBpZiAoYWxsb3df c2hvcnRfc2lnbl9taXNtYXRjaAorCSAgICAgICAmJiB0cmVlX25vcF9jb252ZXJzaW9uX3AgKHR5 cGUsIHVucHJvbVtpXS50eXBlKSkKKwlyZXN1bHRbaV0gPSB1bnByb21baV0ub3A7CiAgICAgICBl bHNlCiAJcmVzdWx0W2ldID0gdmVjdF9jb252ZXJ0X2lucHV0ICh2aW5mbywgc3RtdF9pbmZvLAog CQkJCQl0eXBlLCAmdW5wcm9tW2ldLCB2ZWN0eXBlKTsKQEAgLTg4OCwyMSArOTE3LDI0IEBAIHZl Y3RfcmVhc3NvY2lhdGluZ19yZWR1Y3Rpb25fcCAodmVjX2luZm8gKnZpbmZvLAogCiAgICBUcnkg dG8gZmluZCB0aGUgZm9sbG93aW5nIHBhdHRlcm46CiAKLSAgICAgdHlwZSB4X3QsIHlfdDsKKyAg ICAgdHlwZTFhIHhfdAorICAgICB0eXBlMWIgeV90OwogICAgICBUWVBFMSBwcm9kOwogICAgICBU WVBFMiBzdW0gPSBpbml0OwogICAgbG9vcDoKICAgICAgc3VtXzAgPSBwaGkgPGluaXQsIHN1bV8x PgogICAgICBTMSAgeF90ID0gLi4uCiAgICAgIFMyICB5X3QgPSAuLi4KLSAgICAgUzMgIHhfVCA9 IChUWVBFMSkgeF90OwotICAgICBTNCAgeV9UID0gKFRZUEUxKSB5X3Q7CisgICAgIFMzICB4X1Qg PSAoVFlQRTMpIHhfdDsKKyAgICAgUzQgIHlfVCA9IChUWVBFNCkgeV90OwogICAgICBTNSAgcHJv ZCA9IHhfVCAqIHlfVDsKICAgICAgW1M2ICBwcm9kID0gKFRZUEUyKSBwcm9kOyAgI29wdGlvbmFs XQogICAgICBTNyAgc3VtXzEgPSBwcm9kICsgc3VtXzA7CiAKLSAgIHdoZXJlICdUWVBFMScgaXMg ZXhhY3RseSBkb3VibGUgdGhlIHNpemUgb2YgdHlwZSAndHlwZScsIGFuZCAnVFlQRTInIGlzIHRo ZQotICAgc2FtZSBzaXplIG9mICdUWVBFMScgb3IgYmlnZ2VyLiBUaGlzIGlzIGEgc3BlY2lhbCBj YXNlIG9mIGEgcmVkdWN0aW9uCisgICB3aGVyZSAnVFlQRTEnIGlzIGV4YWN0bHkgZG91YmxlIHRo ZSBzaXplIG9mIHR5cGUgJ3R5cGUxYScgYW5kICd0eXBlMWInLAorICAgdGhlIHNpZ24gb2YgJ1RZ 
UEUxJyBtdXN0IGJlIG9uZSBvZiAndHlwZTFhJyBvciAndHlwZTFiJyBidXQgdGhlIHNpZ24gb2YK KyAgICd0eXBlMWEnIGFuZCAndHlwZTFiJyBjYW4gZGlmZmVyLiAnVFlQRTInIGlzIHRoZSBzYW1l IHNpemUgb2YgJ1RZUEUxJyBvcgorICAgYmlnZ2VyIGFuZCBtdXN0IGJlIHRoZSBzYW1lIHNpZ24u IFRoaXMgaXMgYSBzcGVjaWFsIGNhc2Ugb2YgYSByZWR1Y3Rpb24KICAgIGNvbXB1dGF0aW9uLgog CiAgICBJbnB1dDoKQEAgLTkzOSwxNSArOTcxLDE2IEBAIHZlY3RfcmVjb2dfZG90X3Byb2RfcGF0 dGVybiAodmVjX2luZm8gKnZpbmZvLAogCiAgIC8qIExvb2sgZm9yIHRoZSBmb2xsb3dpbmcgcGF0 dGVybgogICAgICAgICAgIERYID0gKFRZUEUxKSBYOwotICAgICAgICAgIERZID0gKFRZUEUxKSBZ OworCSAgRFkgPSAoVFlQRTIpIFk7CiAgICAgICAgICAgRFBST0QgPSBEWCAqIERZOwotICAgICAg ICAgIEREUFJPRCA9IChUWVBFMikgRFBST0Q7CisJICBERFBST0QgPSAoVFlQRTMpIERQUk9EOwog ICAgICAgICAgIHN1bV8xID0gRERQUk9EICsgc3VtXzA7CiAgICAgIEluIHdoaWNoCiAgICAgIC0g RFggaXMgZG91YmxlIHRoZSBzaXplIG9mIFgKICAgICAgLSBEWSBpcyBkb3VibGUgdGhlIHNpemUg b2YgWQogICAgICAtIERYLCBEWSwgRFBST0QgYWxsIGhhdmUgdGhlIHNhbWUgdHlwZSBidXQgdGhl IHNpZ24KLSAgICAgICBiZXR3ZWVuIERYLCBEWSBhbmQgRFBST0QgY2FuIGRpZmZlci4KKyAgICAg ICBiZXR3ZWVuIERYLCBEWSBhbmQgRFBST0QgY2FuIGRpZmZlci4gVGhlIHNpZ24gb2YgRFBST0QK KyAgICAgICBpcyBvbmUgb2YgdGhlIHNpZ25zIG9mIERYIG9yIERZLgogICAgICAtIHN1bSBpcyB0 aGUgc2FtZSBzaXplIG9mIERQUk9EIG9yIGJpZ2dlcgogICAgICAtIHN1bSBoYXMgYmVlbiByZWNv Z25pemVkIGFzIGEgcmVkdWN0aW9uIHZhcmlhYmxlLgogCkBAIC05ODYsMjAgKzEwMTksMjkgQEAg dmVjdF9yZWNvZ19kb3RfcHJvZF9wYXR0ZXJuICh2ZWNfaW5mbyAqdmluZm8sCiAgICAgIGluc2lk ZSB0aGUgbG9vcCAoaW4gY2FzZSB3ZSBhcmUgYW5hbHl6aW5nIGFuIG91dGVyLWxvb3ApLiAgKi8K ICAgdmVjdF91bnByb21vdGVkX3ZhbHVlIHVucHJvbTBbMl07CiAgIGlmICghdmVjdF93aWRlbmVk X29wX3RyZWUgKHZpbmZvLCBtdWx0X3ZpbmZvLCBNVUxUX0VYUFIsIFdJREVOX01VTFRfRVhQUiwK LQkJCSAgICAgZmFsc2UsIDIsIHVucHJvbTAsICZoYWxmX3R5cGUpKQorCQkJICAgICBmYWxzZSwg MiwgdW5wcm9tMCwgJmhhbGZfdHlwZSwKKwkJCSAgICAgVFJFRV9UWVBFICh1bnByb21fbXVsdC5v cCkpKQogICAgIHJldHVybiBOVUxMOwogCisgIC8qIENoZWNrIHRvIHNlZSBpZiB0aGVyZSBpcyBh IHNpZ24gY2hhbmdlIGhhcHBlbmluZyBpbiB0aGUgb3BlcmFuZHMgb2YgdGhlCisgICAgIG11bHRp 
cGxpY2F0aW9uIGFuZCBwaWNrIHRoZSBhcHByb3ByaWF0ZSBvcHRhYiBzdWJ0eXBlLiAgKi8KKyAg ZW51bSBvcHRhYl9zdWJ0eXBlIHN1YnR5cGU7CisgIGlmIChUWVBFX1NJR04gKHVucHJvbTBbMF0u dHlwZSkgPT0gVFlQRV9TSUdOICh1bnByb20wWzFdLnR5cGUpKQorICAgIHN1YnR5cGUgPSBvcHRh Yl9kZWZhdWx0OworICBlbHNlCisgICAgc3VidHlwZSA9IG9wdGFiX3ZlY3Rvcl9taXhlZF9zaWdu OworCiAgIHZlY3RfcGF0dGVybl9kZXRlY3RlZCAoInZlY3RfcmVjb2dfZG90X3Byb2RfcGF0dGVy biIsIGxhc3Rfc3RtdCk7CiAKICAgdHJlZSBoYWxmX3ZlY3R5cGU7CiAgIGlmICghdmVjdF9zdXBw b3J0YWJsZV9kaXJlY3Rfb3B0YWJfcCAodmluZm8sIHR5cGUsIERPVF9QUk9EX0VYUFIsIGhhbGZf dHlwZSwKLQkJCQkJdHlwZV9vdXQsICZoYWxmX3ZlY3R5cGUpKQorCQkJCQl0eXBlX291dCwgJmhh bGZfdmVjdHlwZSwgc3VidHlwZSkpCiAgICAgcmV0dXJuIE5VTEw7CiAKICAgLyogR2V0IHRoZSBp bnB1dHMgaW4gdGhlIGFwcHJvcHJpYXRlIHR5cGVzLiAgKi8KICAgdHJlZSBtdWx0X29wcm5kWzJd OwogICB2ZWN0X2NvbnZlcnRfaW5wdXRzICh2aW5mbywgc3RtdF92aW5mbywgMiwgbXVsdF9vcHJu ZCwgaGFsZl90eXBlLAotCQkgICAgICAgdW5wcm9tMCwgaGFsZl92ZWN0eXBlKTsKKwkJICAgICAg IHVucHJvbTAsIGhhbGZfdmVjdHlwZSwgdHJ1ZSk7CiAKICAgdmFyID0gdmVjdF9yZWNvZ190ZW1w X3NzYV92YXIgKHR5cGUsIE5VTEwpOwogICBwYXR0ZXJuX3N0bXQgPSBnaW1wbGVfYnVpbGRfYXNz aWduICh2YXIsIERPVF9QUk9EX0VYUFIsCg== --_002_VI1PR08MB532593667144D0CA998C2723FF3B9VI1PR08MB5325eurp_--