From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR02-VI1-obe.outbound.protection.outlook.com (mail-vi1eur02on2068.outbound.protection.outlook.com [40.107.241.68]) by sourceware.org (Postfix) with ESMTPS id C2D4D3861937 for ; Thu, 28 Sep 2023 09:29:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C2D4D3861937 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.com ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=FoJsxRzgEhIdL+WNr3PKBT3ufbQH+nkdgmtFtB4qo41CNNMRYatKv93d0oDOgEDEvlCy9vf33ZuQicUsIVhRSw8n/rNObTdlP1EyYZI9b6Kk2Z4TSeRo5vdOBesaFbdtzhYBs8u/v202jkC/Q9QAJFOgCyvcZ3cQHRaAWoJvDknw5zK1pnJGD4ESBPdpATi3qEkA5VnW7tqkrmoymg0T1RaEA/BMUZAs8/TVfmnJRhaK6ZXQYcwkv7/A1gQxdERTzZGJVI3cPQCAe/oTweVOvOy52+22QuROXiy7RJiBt0mEUkHE12D1h40CV1jCM4h2Ef9WSEnwHWIf2qRsYKFmbg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=gw5uS7zmuaYfJ8l6Qc2tbGu+CDhwPJb8/2UbvVf1ZcI=; b=j+jnehG+exrAQXQvqb6NPq0Gk4fz0AYVq6TjvJnNHrkmU6+oCy8AVMCkPTxw+PdkaNREAyADS/XVmzF+S/gOuqs3jSzMZ8lPYnAev/GBi+BnmNCZdUFffuSWbIDVOgO8/Daf3AcqDVDOHFZM6OeLgZNOfzIKB1qhjk5sok65IUiqtjpH+1wgcaJuIg8RBqINthHE52Mk0pnT7FmtjJFCRYYqzb9P+9RgS3NbA0gEgCQKeqBNQAB1IJF9HGBE4z00GDMMSaq+nUvePSHfGO1xWJS6cNs/PoozBGrXsMFjwqzbpNCBbifsqCxj8083MuAY8qCJE9oKFijSt69noFRhQg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=gw5uS7zmuaYfJ8l6Qc2tbGu+CDhwPJb8/2UbvVf1ZcI=; b=kMTxMltqEukI4fjbc+mVmC8TunAoCqD1AOCguLHFVNGNfEZKKl4ZOKj+uv/rk7eFSnkPDk7Ro06YxpIFymYgl5h5NInoBs6oc+pbsa1ey12S3KclmmnACzPGFoCZck7E1yb2uC5fmaP+3seGcqba+xquo1N6+gIGjaIjKRM2bI+uOHF8dpTpdqb26+uZQ+LAs/nyq0LCESWCC7ITee69jTDwJk2wmLeauOxDHivILvKFboSoWTw9tVeWDh4bpiAXkUYgCBdRQbaQlZZN+fDfinMwwOJH/Ux4Jfk/ndj2QwhnHh9B50mM1QODZSWp8vp1AxxaEW0WkKfwOKnTRxJNhg== Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com; Received: from DU2PR04MB8790.eurprd04.prod.outlook.com (2603:10a6:10:2e1::23) by PAXPR04MB9277.eurprd04.prod.outlook.com (2603:10a6:102:2b9::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6813.28; Thu, 28 Sep 2023 09:29:51 +0000 Received: from DU2PR04MB8790.eurprd04.prod.outlook.com ([fe80::9f5d:8bed:7a5b:e75a]) by DU2PR04MB8790.eurprd04.prod.outlook.com ([fe80::9f5d:8bed:7a5b:e75a%6]) with mapi id 15.20.6838.016; Thu, 28 Sep 2023 09:29:50 +0000 Message-ID: Date: Thu, 28 Sep 2023 11:29:48 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.15.1 Subject: Re: [PATCH 5/8] Support APX NDD optimized encoding. Content-Language: en-US To: "Cui, Lili" Cc: hongjiu.lu@intel.com, "Hu, Lin1" , binutils@sourceware.org References: <20230919152527.497773-1-lili.cui@intel.com> <20230919152527.497773-6-lili.cui@intel.com> From: Jan Beulich In-Reply-To: <20230919152527.497773-6-lili.cui@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-ClientProxiedBy: FR3P281CA0094.DEUP281.PROD.OUTLOOK.COM (2603:10a6:d10:a1::9) To DU2PR04MB8790.eurprd04.prod.outlook.com (2603:10a6:10:2e1::23) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DU2PR04MB8790:EE_|PAXPR04MB9277:EE_ X-MS-Office365-Filtering-Correlation-Id: 9f969b69-5665-49b7-5798-08dbc0057709 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 4Jwba4D3sCZT/PnmRPNtxHLeXQB1XNyLdz6+l0QUwU+VjyHFj276vKiwXwlPFUq2nuVINbmFWkrihAhlMGiOis/YdTN7drLHYN1PQu/vI3+pD0HgxOv++G1j7g3DnWftiHG6vgdz9dvHR1EkTeVR99eNNDzKWvhclJbksv5ONuZkXfA8dnytrzF4c0s+rUWWf6yFDxPWK6NAy0KdbkwiUvbY8fk9aBXFoBSCLSEd+YQ47+ApifnBi7oLwpd89aVlOS3Wnj0SfvKtJo3OZqA2DR9CvfbdaAwQXF+x9Xee0U5xLpnc0sjdMv7mtnalu62Ut8xZa49AzkBb0/oMTvebtlA99zePC8VYI4PSDQj8vvvWbKwo2XoxCk3kXyftP9qvfD+3y6JXs2nSkiPAV8tkp55A0QOFsY/f9PTcjpu+hI2q1Z5BkfAXpFGNTrc+HK+Ocgvow1Bi1l4LN08uvq+1A5BlKEnkj/4w3cf1kX+WCYlNxmR6z/LedPUiUh6P4Oc1P1x7Ap5Wt9eGajMKDq4yYrCq8bXog6ipYWEg0x+TGndQOS+KkCoBL5fcSSDwM8nJME9wWqPLBu2oGWI5ITGznYyXHmv0P3PYzRtwXZDD3PyixVtAsVUnMQrH9N7XvE/6rz6+Ge9ePjv8xidDzz0n2A== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:DU2PR04MB8790.eurprd04.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230031)(39860400002)(396003)(136003)(346002)(376002)(366004)(230922051799003)(1800799009)(186009)(451199024)(64100799003)(4326008)(5660300002)(8676002)(6512007)(8936002)(36756003)(6506007)(2616005)(53546011)(66476007)(26005)(86362001)(41300700001)(66556008)(478600001)(66946007)(31696002)(6916009)(2906002)(316002)(6486002)(38100700002)(83380400001)(31686004)(43740500002)(45980500001);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?B?N2ZUc0ZUZjM4M2tnc3YxdDYvY0xKazNjcmw4VCtMZmRJUDlqWWU0K1BjT3dp?= =?utf-8?B?b2VUTDlPZmxBVzN6ckZpUUtYcS9OQTdzZVVnTTJLUUJOSkJ1SkxmSjBHQktC?= =?utf-8?B?YWh1WkdKNklDK0lweEZzVkdJaTczWjdkTytmaHF4SldTczZXb2VqU1VSditq?= =?utf-8?B?ZTVRc2p4ek41clBZNk1JZnJDdFJXSkRDV1BFb2JQVzhPNUtBdVd1YkQwVFhz?= =?utf-8?B?TVZzTTI1UFV5Z01qZk1HajVzMFR5cFFndXdSQUxuOUQ0cStFZkxzOVc4UHNj?= =?utf-8?B?UFhURWVuUFVlMFA1ZGJscEtLUWxjQUl5K2lJZVBqTENXbUdiZWc2blZtcHEy?= =?utf-8?B?ZHNzdGNrdlU0L09GbEZHYlJZSWlLSlZqMGRham1UTnV4QlRHSk5yVXBtdFpq?= =?utf-8?B?dmpvaDZhTThqUlVIcDVJbW9sOHdTVzQ5eGVyckhrQzR0TDdadmtqOHlUYWRv?= =?utf-8?B?YlhjblZYQWcyWDBsSjJGeTBrK0VJckMzb3ZsNjN2eFM0cDk3allKYTBqZlBE?= =?utf-8?B?d21LY0NGZzRHRmhqNm9JaWFub1ZxeWlKaFB3L251V2prNEdDVG9LU3pnU3U2?= =?utf-8?B?NGVXREk3Ulc3SzlpY3k4aGRWK2liMTQyRlltUDJWUlpDcnZFK1B5bkhuV0dt?= =?utf-8?B?VERDUk5LTCtuN3R6di95WXk2UWZMTUZJSHo3UFNYMWRnK3lOTnZucEI5djdm?= =?utf-8?B?Q1ZHVTZGQ1k5bTg1bmtxOGdPeERJYkIraHdEb05oL0dKemFZSjc2ZnBhaU56?= =?utf-8?B?MXUrSFJPUUhtQUJvRGx6NnRGUTJIWXN2ZjNVb0FtZDFkb1dDWmNWWlY2dWNy?= =?utf-8?B?aGxHK1BLYU13RXpjTFlQWjFqeHo2NUVqdy9kL1NYSm1zY1dQZnA0YmxpOGZo?= =?utf-8?B?d1J5UVBOZHdDOUlPSnRyWlVZc0E1VHMvaVozNEt4cFYyWFMvdGlBVGFVQ2RS?= =?utf-8?B?N0pCZGxUUFA1MzVvQnR1Z0FSTFBRYTNveGM1aSswekllRmhnSjF1MlZBYkRC?= =?utf-8?B?eS9DZDg3VTRSVDRmdnhnUXhDdlU2NldVTE5iWEF1ejBBdysrc0RUMmxtSWxv?= =?utf-8?B?T2s2d1JGb2pDekwzb0VzeUt0cVhCdGJwWXlIa2UyTUxLMGdwQ3d5UDJOWjVt?= =?utf-8?B?eWJ5dUlwMzF2eVhJY2hNR0FTSXB3eXRjNVF4WlFGcytsamhyOXkvRzdDckli?= =?utf-8?B?Z1JIWjcwUms5MndwZG5BZWNNTjhmbHVIYXlRaWc2OEFIV0o5WmF1dTloZExL?= =?utf-8?B?a3pOTzJWbkRJMjVWZk1QRDV6MURucGhTZzZPdFEvOHVDbG9YSzM3QUNkU1Fh?= =?utf-8?B?NlFHZ2NaRUplUWVhSHBhbmFDeXZRNW0vUXBlYUlBcjFyQ3V1QjViWEJmckJY?= =?utf-8?B?UjZWVk0yR2dLRVJnQVNPZFFpWTNxM1E5VVhuaGpFeHpoeTJLMUhvTWpkbEho?= =?utf-8?B?OHV0Z0l6TXFNaFMvVElTTmRTUG52Zy9Ba3Uzd1o2UFM2dkRMNXZBSVRGeHNF?= =?utf-8?B?RlRlMUdiVDhtQ20wZjZYak9QVmlFd25DSWdvYTBvbVdRVHJxd28xQi9jSXdJ?= =?utf-8?B?L0NITGYwN1g2dTVEcDM4YkpuOFg2cStMVHQwdzIrSUVzckw2c1g5azBvL2F0?= =?utf-8?B?cGM3d3hnbm91OXlpVVdsL2ovMmxWdHhGMXNOdmhaSkVuamtzbEdtUzZ1OWtI?= =?utf-8?B?YnpXaUhCbUtFeFVqamJPdmZmUU8zdW5JWDdVN0FWWWV3UTJuSGFORVN3NXFG?= =?utf-8?B?OVdjK2hwUDVIek9ocTlmSnlHNGNhY1BvYTR4V0xhLzBiTWYzbzRjWUk2amJM?= =?utf-8?B?MHpJMlNJaXFOTEtuV282QVRkbjQ5RlhPVjZhZUxFQzZRWmlWM3c3MWpVUURH?= =?utf-8?B?Q2xUalBKeVJQbG9oQUZuVzJWWlJaKzRVaWZuUXpvcFZpaU5FZXdTbDY5Q3By?= =?utf-8?B?T2YvcDZ4UXREdXE1K3Y1a1I1M0RtSmJHM2ltMjdhNmdNdTJkb0FQenBUbksz?= =?utf-8?B?WUo3Y2NibUNuTDNrRkxnSWlIdGdSZHJhMWVUNkc1QllIWnFpQjAvdlVTd1FZ?= =?utf-8?B?VTJyZGttV2p6bkJIVHdKMU1wSUVpVzBnZ1BnOXlWNHA2d2RoQW1sMHRCWS9K?= =?utf-8?Q?TPC7bSr2zWKCaQd1ra1O9Dq5T?= X-OriginatorOrg: suse.com X-MS-Exchange-CrossTenant-Network-Message-Id: 9f969b69-5665-49b7-5798-08dbc0057709 X-MS-Exchange-CrossTenant-AuthSource: DU2PR04MB8790.eurprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 28 Sep 2023 09:29:50.8723 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: f7a17af6-1c5c-4a36-aa8b-f5be247aa4ba X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: 5AlNF2s8nErEYu1JPLjpZMzDXqvsYmY4bDIMS7E23jaxhH497fXRie8v73iNpQzZHnUMwia4p+pwhOJW8eht9A== X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAXPR04MB9277 X-Spam-Status: No, score=-3026.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SCC_10_SHORT_WORD_LINES,SCC_5_SHORT_WORD_LINES,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 19.09.2023 17:25, Cui, Lili wrote: > --- a/gas/config/tc-i386.c > +++ b/gas/config/tc-i386.c > @@ -7091,6 +7091,46 @@ check_EgprOperands (const insn_template *t) > return 0; > } > > +/* Optimize APX NDD insns to non-NDD insns. */ > + > +static int "bool" please when the function merely returns a yes/no indicator. > +optimize_NDD_to_nonNDD (const insn_template *t) > +{ > + if (t->opcode_modifier.vexvvvv > + && t->opcode_space == SPACE_EVEXMAP4 > + && i.reg_operands >= 2 See the remark near the bottom of the changes to this file: This condition is likely insufficient, as - further insns allowing ND may not be treated this way (CCMPscc, CTESTscc, and one of the CFCMOVcc forms at the very least), - {nf} uses will want excluding, as it would be merely a waste of time to try to re-match with fewer operands. > + && (i.types[i.operands - 1].bitfield.dword > + || i.types[i.operands - 1].bitfield.qword)) Why do you exclude byte and word operations? Imo what you want to check is class being Reg. > + { > + int tmp_flag = -1; Either type or name need to change: A variable of this name wants to be "bool". I would have suggested "dupl" as the name, but that how doesn't fit how the variable is used below. (Of course I'm also not happy with the use of plain int here and below, but that's a wider issue throughout the source file. Still it would be nice if new code properly used unsigned int whenever only non-negative values are to be held.) > + int dest = i.operands - 1; > + int src1 = (i.operands > 2) ? i.operands - 2 : 0; > + int src2 = (i.operands > 3) ? i.operands - 3 : 0; > + > + if (i.op[src1].regs == i.op[dest].regs) > + tmp_flag = src2; > + /* adcx and adox don't have D bit. */ IMUL doesn't either, yet ought to also be eligible? We have a "commutative" flag already - can't you extend its use to the purposes you have here? I expect this would simplify ... > + else if (i.op[src2].regs == i.op[dest].regs Just to mention it: Both this and the earlier similar equality check aren't entirely legal, as you haven't previously checked that the respective operands actually are register ones. Analysis tools (like the UB sanitizer) may choke on this, even if otherwise this shouldn't be a problem from what I can tell. > + && (t->opcode_modifier.d > + || t->mnem_off == MN_adcx > + || t->mnem_off == MN_adox) > + && (t->mnem_off != MN_sub) > + && (t->mnem_off != MN_sbb)) ... this condition. > + tmp_flag = src1; > + if (tmp_flag != -1) > + { > + --i.operands; > + --i.reg_operands; > + --i.tm.operands; > + > + if (tmp_flag != src2) > + swap_2_operands (tmp_flag, src2); Nit: There's once again something wrong with indentation here. > + return 1; > + } > + } > + return 0; > +} > + > /* Helper function for the progress() macro in match_template(). */ > static INLINE enum i386_error progress (enum i386_error new, > enum i386_error last, > @@ -7562,6 +7602,15 @@ match_template (char mnem_suffix) > slip through to break. */ > } > > + /* If we can optimize a NDD insn to non-NDD insn, like > + add %r16, %r8, %r8 -> add %r16, %r8, then rematch template. */ > + if (optimize_NDD_to_nonNDD (t)) I don't think such an optimization should be done without any form of -O. As to the function name, maybe better optimize_NDD_to_REX2()? > + { > + t = current_templates->start; > + --t; > + continue; So the decrement is to compensate the loop continuation. Nevertheless imo this wants spelling "t = current_templates->start - 1". Yet like above note that a good UB checker may object to this subtraction of 1, unless you build upon templates making it here not being part of the first group in i386_optab[]. (If any such dependency exists, it needs spelling out in a suitable place, i.e. in particular one that's hard to overlook when doing re-arrangements.) > + } > + > /* Check if VEX/EVEX encoding requirements can be satisfied. */ > if (VEX_check_encoding (t)) > { I also wonder whether this doesn't need moving further down. I expect at least VEX_check_encoding() may need running first. But I'm worried anyway that this patch comes too early in the series. {evex} prefixes or use of {nf} would need skipping the optimization attempt, and getting all the conditions right is likely easier to see when all the rest of the infrastructure is in place. > --- /dev/null > +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s > @@ -0,0 +1,115 @@ > +# Check 64bit APX NDD instructions with optimized encoding > + > + .allow_index_reg > + .text > +_start: > +inc %ax,%ax > +inc %eax,%eax > +inc %rax,%rax > +inc %r16,%r16 > +add %r31b,%r8b,%r8b > +add %r31b,%r8b,%r31b > +addb %r31b,%r8b,%r8b > +add %r31,%r8,%r8 > +addq %r31,%r8,%r8 > +add %r31d,%r8d,%r8d > +addl %r31d,%r8d,%r8d > +add %r31w,%r8w,%r8w > +addw %r31w,%r8w,%r8w > +{store} add %r31,%r8,%r8 > +{load} add %r31,%r8,%r8 > +add %r31,(%r8),%r31 > +add (%r31),%r8,%r8 > +add $0x12344433,%r15,%r15 > +add $0xfffffffff4332211,%r8,%r8 > +dec %r17,%r17 > +not %r17,%r17 > +neg %r17,%r17 > +sub %r15,%r17,%r17 > +sub %r15d,(%r8),%r15d > +sub (%r15,%rax,1),%r16,%r16 > +sub $0x1234,%r30,%r30 > +sbb %r15,%r17,%r17 > +sbb %r15,(%r8),%r15 > +sbb (%r15,%rax,1),%r16,%r16 > +sbb $0x1234,%r30,%r30 > +adc %r15,%r17,%r17 > +adc %r15d,(%r8),%r15d > +adc (%r15,%rax,1),%r16,%r16 > +adc $0x1234,%r30,%r30 > +or %r15,%r17,%r17 > +or %r15d,(%r8),%r15d > +or (%r15,%rax,1),%r16,%r16 > +or $0x1234,%r30,%r30 > +xor %r15,%r17,%r17 > +xor %r15d,(%r8),%r15d > +xor (%r15,%rax,1),%r16,%r16 > +xor $0x1234,%r30,%r30 > +and %r15,%r17,%r17 > +and %r15d,(%r8),%r15d > +and (%r15,%rax,1),%r16,%r16 > +and $0x1234,%r30,%r30 > +and $0x1234,%r30 > +ror %r31,%r31 > +rorb %r31b,%r31b Please be consistent with omitting (or having) suffixes (further up you simply test both variants, but that's not the case throughout ... > +ror $0x2,%r12,%r12 > +rol %r31,%r31 > +rolb %r31b,%r31b > +rol $0x2,%r12,%r12 > +rcr %r31,%r31 > +rcrb %r31b,%r31b > +rcr $0x2,%r12b,%r12b > +rcr $0x2,%r12,%r12 > +rcl %r31,%r31 > +rclb %r31b,%r31b > +rcl $0x2,%r12b,%r12b > +rcl $0x2,%r12,%r12 > +shl %r31,%r31 > +shlb %r31b,%r31b > +shl $0x2,%r12b,%r12b > +shl $0x2,%r12,%r12 > +sar %r31,%r31 > +sarb %r31b,%r31b > +sar $0x2,%r12b,%r12b > +sar $0x2,%r12,%r12 > +shl %r31,%r31 > +shlb %r31b,%r31b > +shl $0x2,%r12b,%r12b > +shl $0x2,%r12,%r12 > +shr %r31,%r31 > +shrb %r31b,%r31b ... here. Jan