From: Jan Beulich
To: "H.J. Lu"
Cc: Binutils
Subject: x86: further optimization opportunities
Date: Fri, 26 Aug 2022 14:12:24 +0200
Message-ID: <7e2041c5-57c0-21a0-7246-54b8196f3c9c@suse.com>

H.J.,

over time I've accumulated a list of possible transformations we could
do in addition to what we do already. Some are a little exotic, so may
not be worth it. Hence I'd like to ask for your view on things, if you
don't mind.

1) {,X}OR r,0 and AND/TEST r,~0 --> TEST r,r
   Except for 32-bit forms in 64-bit mode.
   Note that ADD/CMP/SUB can't be replaced this way, because TEST leaves
   AF undefined. But perhaps IMUL r,1 can be, unless we fear people
   depend on a particular implementation's setting of PF, SF, and ZF.

2) AND r,0 and perhaps IMUL r,r,0 --> XOR r,r

3) {,V}PCMPEQQ --> e.g. {,V}PCMPEQD
   {,V}PCMPGTQ --> {,V}PXOR
   when both source operands match, for a 1 byte shorter encoding. Some
   of the respective AVX512 forms can be transformed into KX{,N}OR*.

4) P{AND{,N},{,X}OR} and {AND{,N},{,X}OR}PD --> {AND{,N},{,X}OR}PS
   MOVDQ{A,U} and MOV{A,U}PD --> MOV{A,U}PS
   for saving the prefix byte. Perhaps only when -Os.

5) PSHUFD --> SHUFPS
   with identical register operands, and again perhaps only when -Os.

6) VPCMP{,U}{B,W,D,Q} and VPCOM{,U}{B,W,D,Q} --> VPCMP{EQ,GT}{B,W,D,Q}
   where suitable, saving the immediate byte and in the latter case also
   possibly allowing for the shorter VEX2 encoding.

7) VPSUB{,U}S{B,W,D,Q} --> VPXOR
   VPCMPGT{B,W,D,Q} (pre-AVX512) --> VPXOR
   when both source operands are identical.

8) VFMADD{P,S}{S,D} et al --> VFMADD{132,231,213}{P,S}{S,D}
   when one operand is suitably repeated. (This requires CpuFMA to be
   explicitly enabled, as that's not a prereq to CpuFMA4.)

9) MOVZX with 64-bit destination to drop the REX64 prefix (the 32-bit
   destination form zero-extends into the full register anyway).

10) RET/RETF/LRET with immediate of zero to immediate-less form.

11) 32-bit TEST with {8..15}-bit immediate in 16-bit mode.

12) MOVABS displacement optimization with -Os, using 32-bit addressing
    mode as applicable.

13) BT{,C,R,S} with in-range immediate to operand-size-prefix-less
    forms. For memory operands only by reducing nominal operand size
    (for register operands going from 16- to 32-bit operand size is
    okay) and with an adjustment to the displacement as necessary
    (perhaps leaving alone ones with LOCK prefix).

14) BT{,C,R,S} with memory operand and out-of-range immediate,
    transforming the upper immediate bits into an adjustment to the
    displacement. Accompanied by a warning, as the upper bits would no
    longer end up being ignored. The SDM in fact suggests this as a
    model assemblers might follow.

Note that examples of 4 and 5 can actually be found in Linux's crypto
code.
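
To make 14) a little more concrete, here is a minimal sketch of the
arithmetic in Python. It is not gas code: fold_bt_immediate and its
parameters are names made up for illustration, and folding in units of
the operand size is an assumption about how the transformation might be
done, not something settled above.

```python
def fold_bt_immediate(imm, disp, opsize_bits):
    """Split a BT{,C,R,S} bit offset into (in-range immediate, new displacement).

    Hypothetical helper, not part of binutils. imm is the bit offset as
    written in the source, disp the original displacement in bytes, and
    opsize_bits the operand size (16, 32, or 64).
    """
    if opsize_bits not in (16, 32, 64):
        raise ValueError("operand size must be 16, 32, or 64 bits")
    if imm < 0:
        raise ValueError("negative bit offsets are not handled in this sketch")
    # Low bits: the only part of an immediate bit offset the CPU honours.
    in_range = imm % opsize_bits
    # High bits: whole operand-sized units to skip, converted to bytes.
    adjust = (imm // opsize_bits) * (opsize_bits // 8)
    return in_range, disp + adjust


if __name__ == "__main__":
    # bt $37, 8(%rax) with 32-bit operand size
    print(fold_bt_immediate(37, 8, 32))  # prints (5, 12)
```

Untransformed, the CPU would test bit 5 of the dword at 8(%rax), as the
upper immediate bits get ignored. Transformed, it tests bit 5 of the
dword at 12(%rax), i.e. bit 37 of the bit string starting at 8(%rax).
That change in behavior is why a warning seems appropriate.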