From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR02-AM0-obe.outbound.protection.outlook.com (mail-am0eur02on2077.outbound.protection.outlook.com [40.107.247.77]) by sourceware.org (Postfix) with ESMTPS id A895C3858D39 for ; Tue, 19 Sep 2023 15:44:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A895C3858D39 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.com ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=hofQ9g3pgiACFHr87ESOjmp1C1nvkYRtCeTp7lQLXZlKkFBPMIpWi5G9wE04H5Oflwp0QQZceJCUBmNScokvxSoIxlqCDhBkuRTqV3vYbHC7Jjj0ZOQi+B9N9tQksSRMMbMrrV1CVPZj1sqFcJThL2pQEJ076kOIgzqz9qKgV9Bhx92YAcmX4myiscbcuGmBWniPWJpw6OJ2yy5RcEworIMaxGzUtMFzJw3R6HQGLQ5V5znAz/WzL1qqtnqTdcphkx5wfVYPA+h6EbDty3c9YAhW36Mkf4XjwENpzbLXLfeYTFnDjdcP9aDv/Zmvg3h1RrEZIAJ1Hu93Ru4FlBvwSw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=fbBmsylpyGl5gCdDx/IAahsZWumwVrwVTnPneUHWg64=; b=e5Ract28hVM+jGljRkG1P/4BfTRy8RZMUHW+ZqoHDdG8WRArrp/9R0jfnd/6U2S8gCuIwHoMmyNXCNjv9bIITVRMATV/hzcCYxi8f5CN2UOnq/84snBWjmN6uxD/W4Z0Cbg/YJStV2dRXmvoAGdY569YQkYS5yrmIKlXNgSFhPo5lfQ+Zn3Hz6LLBukP+g420hUVxtBxtUykkBFuol2J1zur1EdTDnRyFt/F+Hp8YxPJRCfnHu2zqvkAvDzpT/MS0teC5tahGFHlWGSr94GqHdxcpd6xUwRUrcq7x9JSOHHYEdEoTF6/Y6uhB6fC8/nuEYUoDmwxtnh9nrK/WR/eOg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=fbBmsylpyGl5gCdDx/IAahsZWumwVrwVTnPneUHWg64=; b=WX7xT4WofI07kWKk1nwn6bF0g596ejBQcKABWW62jnrbdYeucbSKx7up7IwkI260G6sKnOmQqfyKbJzDLTpORk03Uxg1ij0/4l026ENA4eIteomd0asbrjRMP7335HaVn2YR7rNu5bDRinRefzWQMEnU8cOOTi6Aasf74Pr6FJdmbUYS2Op1TnzPc8yrITFm9yx+redovcky+E5pBN4KuJr8MA7PZWeBWsFLOCQrdIxCeum0pH9RQ0ejtUafFcszTZvRAG3IqIe7BwIAsdC0kpqLdQQrwsVdduzMUf/9EbRqM1CDk5eTouNPRRmwS2bjF8jwSCNinCadIFcn02JGfw== Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com; Received: from DU2PR04MB8790.eurprd04.prod.outlook.com (2603:10a6:10:2e1::23) by AM9PR04MB8194.eurprd04.prod.outlook.com (2603:10a6:20b:3e6::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6792.27; Tue, 19 Sep 2023 15:44:28 +0000 Received: from DU2PR04MB8790.eurprd04.prod.outlook.com ([fe80::f749:b27f:2187:6654]) by DU2PR04MB8790.eurprd04.prod.outlook.com ([fe80::f749:b27f:2187:6654%6]) with mapi id 15.20.6792.026; Tue, 19 Sep 2023 15:44:28 +0000 Message-ID: Date: Tue, 19 Sep 2023 17:44:26 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.15.1 Subject: [PATCH v2 1/4] x86: fold certain VEX and EVEX templates Content-Language: en-US To: Binutils Cc: "H.J. Lu" , Lili Cui References: From: Jan Beulich In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-ClientProxiedBy: FR0P281CA0046.DEUP281.PROD.OUTLOOK.COM (2603:10a6:d10:48::23) To DU2PR04MB8790.eurprd04.prod.outlook.com (2603:10a6:10:2e1::23) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DU2PR04MB8790:EE_|AM9PR04MB8194:EE_ X-MS-Office365-Filtering-Correlation-Id: de6a00be-2420-435e-edc0-08dbb9274f08 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: M7dvAsX4whdzzyVM2zTlqEUAheqygswOe1bSb4wfAW8y5lwfnuxn6fQxTSvE77uZrqazwPwq6/YcIxfd7fKSGaqgblf9HwDQiBA0yjx07E2LiCH235ABV3E2UGBAJ+Eu1RFb8u0rHiyxQYh6XTlwrbpzlly3X922IFCbLAEle9S7ks2rF1NY79NxzBgUrqeSz4bRgEJQCqLtCNj5m8/vhalxwFUhLm9ayKKOs/P3XG+g67aBzzGsgQ9L9US5IEzOMFCMkf8Iup9N+kk/eqJDY0fIq1E8NOovljYbJvIcqg5FzWf1K/JULDYodwOT/uoVG6JD8bZura3gDTE5926+gwjWiDHBmRvAStaLvlFjmM3BumKXTbEWm9cKsI1Iq1GG75RmxxFS1t+OaCoN7IobSwPeuQR3OVRRnuvTiuE9sth5+HEqrV5tmG7NPXsPizypR1Kq03acooD3B7Vrf+SpWS0EpUc34WIBpSVyHdf3ikJ/LIfICo8+n3QJZ3yVmnOZsQ1+/SSQhE4X/f3+0Ohs5YaxgR0r20P6QNcS/2qeNXpKf115mNgpHuzTmfcPNzWkT1JhcWTKkoOtAvHF0AGQQ1Bb499ZpfkeopCHBGTfO2MfoQORizwG+rEHBX4OrhQlFW78ZDXJlgfE9r6Wqtmj+w== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:DU2PR04MB8790.eurprd04.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230031)(396003)(39860400002)(366004)(136003)(376002)(346002)(451199024)(186009)(1800799009)(36756003)(31686004)(5660300002)(30864003)(8676002)(83380400001)(86362001)(41300700001)(4326008)(8936002)(6506007)(6486002)(2616005)(26005)(6512007)(66899024)(2906002)(66946007)(66556008)(478600001)(66476007)(31696002)(38100700002)(54906003)(316002)(6916009)(19627235002)(43740500002)(45980500001);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?B?SklqUXdKOVRCMGJHcVJOTVRyYlpqY0MzclhreVpkd2p5U3dlOGZXamh5TkNt?= =?utf-8?B?LzZOR3U0ZnlqZnRoWjFaRUJadm1GTjhhejNDN3Qwby83QndsbW9Vc1RBY3lB?= =?utf-8?B?a0s5cEF2RENSMC9ROFpPZGRJaUNONUt6a2J5bWMrSVBqVUJReDd0bVQyQndu?= =?utf-8?B?cmNzUmoyREh3Ty9CQUJ5SHFaUVpmUUxpSEc4NG9mcElsWGY2TU5saGhiR3NO?= =?utf-8?B?SlVpam5pYkNKVjdyT0NvUlV3WHRwQ3RqcTRVajlDVzRKRUNvd1ZUeFpEK3Zh?= =?utf-8?B?Q1pGeWZrT2ZsNzd2eWFaekxyUzVMMTRmd2x6anZmK3VUTW93clUvVFNmVDZs?= =?utf-8?B?Zk81ZExGT3E2U1gzbWE5am9xZ1pSc2lENURZZlFJUXRhT2VaTSsvTlZKOWlz?= =?utf-8?B?MWNlT25LM2ttM2ZYdXNCeXVORWQxbWVQY0pTdXZiSUplUTVXajlZZEJWZnBI?= =?utf-8?B?Mkk3Z2hSTjRiUWYrc3ZtYTVQK3pDd2NyaVIvSGZPT0UzaE5SNjNubTc0RU9x?= =?utf-8?B?T2Exb21oRUJoSVRKOEljT0JoRHptQUxNN3ZLZ1lYQldsd3FmaWZWMFIzYXpV?= =?utf-8?B?dkJORk8wMzFBSFl6aFpMNkRtamRjSnViNjdTekh5cHlzZWYydlZKcDJIZ2NO?= =?utf-8?B?S0YxczhJaHErWml4bk9UVk5ZalhqRnA3bVdLOThZb1BRdzA5UnFFeTZPeHIw?= =?utf-8?B?eGtwdExGL3FSZ2MyOVpaMThxMlVYeUZyNkgvWkZYMm1mV2pGZ0hkcXBEeVZy?= =?utf-8?B?VzNvUUZ2SHh3SGxxL2pyVmR4eFR4Tkw2azhiQ0tKaERobnk5RWc5ZWdoZWNL?= =?utf-8?B?UFdzdHlwSFl4TUMxWXNiVXF2c28waWtwV0UvR2M3OXEwQWNLRzBMZ3ZUSDBT?= =?utf-8?B?S29rWThjejZmUE1TaU42SE5ERUU5L0kxTktCSnlwVjNhR0EwMkpYdTNFMUhG?= =?utf-8?B?Ti9pazhuMm94SnBSSWhYZFhFMG1kWkVhbEQ0OTFkOXAxOFl6c0I2Um1oWXJ3?= =?utf-8?B?c2xMZVYrWTRzQytRTGxhR0VMeXVTMk80UjEvdmFEeFlpZzRyRHpKN29WRjZM?= =?utf-8?B?VUlLWSt2SFhMOTdJaFdndGEwRjNXWVg5VjhodmZFZXIwenpnOVlCUlpFR3U1?= =?utf-8?B?Zi9DRXFuRE5yZjk0MXZVN2NUQjBwcWcrWE9UcGo2YWdJd280L0R6SG9DTjB0?= =?utf-8?B?OEhKREZ1RGpKbFRJL2ltbTlKbk8reS9vanZYZUFET3UrSFdEMjRuSnpjOTN4?= =?utf-8?B?TWhBSDlWUGRpOG45Qkg5TG5oVnVpVjhhckUya2JlZlF3UDNJbXFXdVY4aWFZ?= =?utf-8?B?TmFweDhSOXNUZXBLZ0VtNDdHa3EzZVJJQkI1WUlaOFVzNmhYdC9XK1pmWEJI?= =?utf-8?B?NjJpaFY1dVZYcXJnNHVRdnhNZGtpd0NWem5VTFA1OTlTQm4xeEZqb2c0MFh6?= =?utf-8?B?QytRMDcxdGFDZmRlcm1pNzdvQWV2eEVndmthSEdGUS9xUXFMcnd2TGx4V25a?= =?utf-8?B?eUVZaUZpTEk3SFNMQS9YL0tvQ0VhTmJMU0NLdFRGQ0RETkd4UTJqN2VhVEpS?= =?utf-8?B?ZzV4bjgvMHNJem1WNndGNjhOeHo3TWNXVXU4TU9GRDljZGtZaUs1ZU93UC9a?= =?utf-8?B?T25JM3J3dU05bThPOFhrTytmeVRPamhHWjI5ekxtRWczc3MydTIwRkdLZVQ1?= =?utf-8?B?L0tWZTd0OG9WSUI4WVhKa2dyb0J4TEptQWZtVm5wS0w3VFhJRUwxV1NDSmxX?= =?utf-8?B?UnlGVWVNTXNDemwveWZEZU5hTmhLbnd6ZExDSkU3Y2lsOWdiZjhiVHlCWmNC?= =?utf-8?B?V3RPWXBrMm9XZEJSdlg3MVB3d3FDdXpxMmlycDRRS0pZdnlRcnI4MThUSklC?= =?utf-8?B?YUFsWXZFdHJzNloza2IrMDVnL29icDZPek1OaXdLZU8ycXlRazYycDVpZkxs?= =?utf-8?B?RWRzOENNZWFRTk02SFhJU1Z1a2tjd0d3R1QrUnY2RXNJRDA0ZW0vYkhrcTJm?= =?utf-8?B?UVFQa2taWk1VU0paYXFveTVIakg2SGlyOXNNRG5iOWd3eTFiL2FLQ2hwMEc3?= =?utf-8?B?Q1Z5V3ZuTDgzbmpXd0pJSXUxcTJUZ0RZZ1RsZlkrKy92cTFoM2NQY0NEM3VS?= =?utf-8?Q?uyUkjQu5QmOl0OPzM9cNFsHPF?= X-OriginatorOrg: suse.com X-MS-Exchange-CrossTenant-Network-Message-Id: de6a00be-2420-435e-edc0-08dbb9274f08 X-MS-Exchange-CrossTenant-AuthSource: DU2PR04MB8790.eurprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 19 Sep 2023 15:44:28.6593 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: f7a17af6-1c5c-4a36-aa8b-f5be247aa4ba X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: sfrJMb8ze7nC589nYvYWB5WmRr0bTtVW4EcgsLFufdblGvagScqeLVHbm/MSiEhQ9I+OEPNgTpppLLOUvMhRKQ== X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM9PR04MB8194 X-Spam-Status: No, score=-3026.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: In anticipation of APX introduce logic to reduce the number of templates we have now, allowing to limit some the number of ones we then need to gain. The fundamental requirements are that - attributes be compatible, which specifically means VexW needs to be the same in the templates (which often isn't the case, for VEX encodings having far more WIG tha, EVEX ones), - the EVEX form being AVX512F (with or without AVX512VL), not any of its extensions (the same will then be required for APX - it'll need to be APX_F). Note that in check_register() there's now a redundant zmm check. Since this logic will need revisiting for APX anyway, I'd like to keep it that way for now. (Similarly a couple of if()-s which could be folded are kept separate, to reduce code churn when adding APX support.) --- RFC: Of course there are quite a few code changes, so there is the question of the savings being worth it. The AVX512F constraint may be possible to relax, but the change is big enough already. --- v2: Deal with need_evex_encoding() being unreliable ahead of operand parsing (the difference really is largely benign here, but becomes relevant in subsequent changes). --- a/gas/config/tc-i386.c +++ b/gas/config/tc-i386.c @@ -436,6 +436,7 @@ struct _i386_insn vex_encoding_vex, vex_encoding_vex3, vex_encoding_evex, + vex_encoding_evex512, vex_encoding_error } vec_encoding; @@ -1872,6 +1873,13 @@ cpu_flags_and_not (i386_cpu_flags x, i38 static const i386_cpu_flags avx512 = CPU_ANY_AVX512F_FLAGS; +static INLINE bool need_evex_encoding (void) +{ + return i.vec_encoding == vex_encoding_evex + || i.vec_encoding == vex_encoding_evex512 + || i.mask.reg; +} + #define CPU_FLAGS_ARCH_MATCH 0x1 #define CPU_FLAGS_64BIT_MATCH 0x2 @@ -1899,6 +1907,29 @@ cpu_flags_match (const insn_template *t) /* This instruction is available only on some archs. */ i386_cpu_flags cpu = cpu_arch_flags; + /* Dual VEX/EVEX templates may need stripping of one of the flags. */ + if (t->opcode_modifier.vex && t->opcode_modifier.evex) + { + /* Dual AVX/AVX512F templates need to retain AVX512F only if we already + know that EVEX encoding will be needed. */ + if ((x.bitfield.cpuavx || x.bitfield.cpuavx2) + && x.bitfield.cpuavx512f) + { + if (need_evex_encoding ()) + { + x.bitfield.cpuavx = 0; + x.bitfield.cpuavx2 = 0; + } + /* need_evex_encoding() isn't reliable before operands were + parsed. */ + else if (i.operands) + { + x.bitfield.cpuavx512f = 0; + x.bitfield.cpuavx512vl = 0; + } + } + } + /* AVX512VL is no standalone feature - match it and then strip it. */ if (x.bitfield.cpuavx512vl && !cpu.bitfield.cpuavx512vl) return match; @@ -1924,6 +1955,8 @@ cpu_flags_match (const insn_template *t) && (!x.bitfield.cpupclmulqdq || cpu.bitfield.cpupclmulqdq)) match |= CPU_FLAGS_ARCH_MATCH; } + else if (x.bitfield.cpuavx2 && cpu.bitfield.cpuavx2) + match |= CPU_FLAGS_ARCH_MATCH; else if (x.bitfield.cpuavx512f) { /* We need to check a few extra flags with AVX512F. */ @@ -3646,6 +3679,27 @@ install_template (const insn_template *t i.tm = *t; + /* Dual VEX/EVEX templates need stripping one of the possible variants. */ + if (t->opcode_modifier.vex && t->opcode_modifier.evex) + { + if ((is_cpu (t, CpuAVX) || is_cpu (t, CpuAVX2)) + && is_cpu (t, CpuAVX512F)) + { + if (need_evex_encoding ()) + { + i.tm.opcode_modifier.vex = 0; + i.tm.cpu.bitfield.cpuavx = 0; + if (is_cpu (&i.tm, CpuAVX2)) + i.tm.cpu.bitfield.isa = 0; + } + else + { + i.tm.opcode_modifier.evex = 0; + i.tm.cpu.bitfield.cpuavx512f = 0; + } + } + } + /* Note that for pseudo prefixes this produces a length of 1. But for them the length isn't interesting at all. */ for (l = 1; l < 4; ++l) @@ -4553,6 +4607,8 @@ optimize_encoding (void) i.tm.opcode_modifier.vex = VEX128; i.tm.opcode_modifier.vexw = VEXW0; i.tm.opcode_modifier.evex = 0; + i.vec_encoding = vex_encoding_vex; + i.mask.reg = NULL; } else if (optimize > 1) i.tm.opcode_modifier.evex = EVEX128; @@ -5438,6 +5494,11 @@ md_assemble (char *line) if (optimize && !i.no_optimize && i.tm.opcode_modifier.optimize) optimize_encoding (); + /* Past optimization there's no need to distinguish vex_encoding_evex and + vex_encoding_evex512 anymore. */ + if (i.vec_encoding == vex_encoding_evex512) + i.vec_encoding = vex_encoding_evex; + if (use_unaligned_vector_move) encode_with_unaligned_vector_move (); @@ -5467,6 +5528,7 @@ md_assemble (char *line) if (i.tm.operand_types[j].bitfield.tmmword) i.xstate |= xstate_tmm; else if (i.tm.operand_types[j].bitfield.zmmword + && !i.tm.opcode_modifier.vex && vector_size >= VSZ512) i.xstate |= xstate_zmm; else if (i.tm.operand_types[j].bitfield.ymmword @@ -6468,7 +6530,8 @@ check_VecOperands (const insn_template * cpu = cpu_flags_and (cpu_flags_from_attr (t->cpu), avx512); if (!cpu_flags_all_zero (&cpu) && !is_cpu (t, CpuAVX512VL) - && !cpu_arch_flags.bitfield.cpuavx512vl) + && !cpu_arch_flags.bitfield.cpuavx512vl + && (!t->opcode_modifier.vex || need_evex_encoding ())) { for (op = 0; op < t->operands; ++op) { @@ -6779,6 +6842,8 @@ check_VecOperands (const insn_template * /* Check vector Disp8 operand. */ if (t->opcode_modifier.disp8memshift + && (!t->opcode_modifier.vex + || need_evex_encoding ()) && i.disp_encoding <= disp_encoding_8bit) { if (i.broadcast.type || i.broadcast.bytes) @@ -6874,7 +6939,8 @@ VEX_check_encoding (const insn_template return 1; } - if (i.vec_encoding == vex_encoding_evex) + if (i.vec_encoding == vex_encoding_evex + || i.vec_encoding == vex_encoding_evex512) { /* This instruction must be encoded with EVEX prefix. */ if (!is_evex_encoding (t)) @@ -11211,6 +11277,10 @@ s_insn (int dummy ATTRIBUTE_UNUSED) goto done; } + /* No need to distinguish vex_encoding_evex and vex_encoding_evex512. */ + if (i.vec_encoding == vex_encoding_evex512) + i.vec_encoding = vex_encoding_evex; + /* Are we to emit ModR/M encoding? */ if (!i.short_form && (i.mem_operands @@ -11633,6 +11703,12 @@ RC_SAE_specifier (const char *pstr) return NULL; } + if (i.vec_encoding == vex_encoding_default) + i.vec_encoding = vex_encoding_evex512; + else if (i.vec_encoding != vex_encoding_evex + && i.vec_encoding != vex_encoding_evex512) + return NULL; + i.rounding.type = RC_NamesTable[j].type; return (char *)(pstr + RC_NamesTable[j].len); @@ -11692,6 +11768,12 @@ check_VecOperations (char *op_string) } op_string++; + if (i.vec_encoding == vex_encoding_default) + i.vec_encoding = vex_encoding_evex; + else if (i.vec_encoding != vex_encoding_evex + && i.vec_encoding != vex_encoding_evex512) + goto unknown_vec_op; + i.broadcast.type = bcst_type; i.broadcast.operand = this_operand; @@ -13953,8 +14035,17 @@ static bool check_register (const reg_en } } - if (vector_size < VSZ512 && r->reg_type.bitfield.zmmword) - return false; + if (r->reg_type.bitfield.zmmword) + { + if (vector_size < VSZ512) + return false; + + if (i.vec_encoding == vex_encoding_default) + i.vec_encoding = vex_encoding_evex512; + else if (i.vec_encoding != vex_encoding_evex + && i.vec_encoding != vex_encoding_evex512) + i.vec_encoding = vex_encoding_error; + } if (vector_size < VSZ256 && r->reg_type.bitfield.ymmword) return false; @@ -13979,7 +14070,8 @@ static bool check_register (const reg_en || flag_code != CODE_64BIT) return false; - if (i.vec_encoding == vex_encoding_default) + if (i.vec_encoding == vex_encoding_default + || i.vec_encoding == vex_encoding_evex512) i.vec_encoding = vex_encoding_evex; else if (i.vec_encoding != vex_encoding_evex) i.vec_encoding = vex_encoding_error; --- a/gas/config/tc-i386-intel.c +++ b/gas/config/tc-i386-intel.c @@ -209,6 +209,11 @@ operatorT i386_operator (const char *nam || i386_types[j].sz[0] > 8 || (i386_types[j].sz[0] & (i386_types[j].sz[0] - 1))) return O_illegal; + if (i.vec_encoding == vex_encoding_default) + i.vec_encoding = vex_encoding_evex; + else if (i.vec_encoding != vex_encoding_evex + && i.vec_encoding != vex_encoding_evex512) + return O_illegal; if (!i.broadcast.bytes && !i.broadcast.type) { i.broadcast.bytes = i386_types[j].sz[0]; --- a/opcodes/i386-opc.tbl +++ b/opcodes/i386-opc.tbl @@ -131,6 +131,8 @@ #define EVexLIG EVex=EVEXLIG #define EVexDYN EVex=EVEXDYN +#define Disp8ShiftVL Disp8MemShift=DISP8_SHIFT_VL + #define Vsz256 Vsz=VSZ256 #define Vsz512 Vsz=VSZ512 @@ -1518,8 +1520,8 @@ vdivs, 0x5e, AVX, Modrm|Vex vdppd, 0x6641, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM } vdpps, 0x6640, AVX, Modrm|Vex|Space0F3A|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } vextractf128, 0x6619, AVX, Modrm|Vex=2|Space0F3A|VexW=1|NoSuf, { Imm8, RegYMM, Unspecified|BaseIndex|RegXMM } -vextractps, 0x6617, AVX, Modrm|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex } -vextractps, 0x6617, AVX|x64, RegMem|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, RegXMM, Reg64 } +vextractps, 0x6617, AVX|AVX512F, Modrm|Vex128|EVex128|Space0F3A|VexWIG|Disp8MemShift=2|NoSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex } +vextractps, 0x6617, AVX|AVX512F|x64, RegMem|Vex128|EVex128|Space0F3A|VexWIG|NoSuf, { Imm8, RegXMM, Reg64 } vhaddpd, 0x667c, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } vhaddps, 0xf27c, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } vhsubpd, 0x667d, AVX, Modrm|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } @@ -1541,7 +1543,7 @@ vmovap, 0x28, AVX, D|Modrm| // by Intel AVX spec). To avoid extra template in gcc x86 backend and // support assembler for AMD64, we accept 64bit operand on vmovd so // that we can use one template for both SSE and AVX instructions. -vmovd, 0x666e, AVX, D|Modrm|Vex=1|Space0F|NoSuf, { Reg32|Unspecified|BaseIndex, RegXMM } +vmovd, 0x666e, AVX|AVX512F, D|Modrm|Vex128|EVex128|Space0F|Disp8MemShift=2|NoSuf, { Reg32|Unspecified|BaseIndex, RegXMM } vmovd, 0x667e, AVX|x64, D|RegMem|Vex=1|Space0F|VexW=2|NoSuf|Size64, { RegXMM, Reg64 } vmovddup, 0xf212, AVX, Modrm|Vex|Space0F|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM } vmovddup, 0xf212, AVX, Modrm|Vex=2|Space0F|VexWIG|NoSuf, { Unspecified|BaseIndex|RegYMM, RegYMM } @@ -1559,7 +1561,7 @@ vmovntdqa, 0x662a, AVX|AVX2, Modrm|Vex|S vmovntp, 0x2b, AVX, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex } vmovq, 0xf37e, AVX, Load|Modrm|Vex=1|Space0F|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM } vmovq, 0x66d6, AVX, Modrm|Vex=1|Space0F|VexWIG|NoSuf, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM } -vmovq, 0x666e, AVX|x64, D|Modrm|Vex=1|Space0F|VexW=2|NoSuf, { Reg64|Unspecified|BaseIndex, RegXMM } +vmovq, 0x666e, AVX|AVX512F|x64, D|Modrm|Vex128|EVex128|Space0F|VexW1|Disp8MemShift=3|NoSuf, { Reg64|Unspecified|BaseIndex, RegXMM } vmovs, 0x10, AVX, D|Modrm|VexLIG|Space0F|VexWIG|NoSuf, { |Unspecified|BaseIndex, RegXMM } vmovs, 0x10, AVX, D|Modrm|VexLIG|Space0F|VexVVVV|VexWIG|NoSuf, { RegXMM, RegXMM, RegXMM } vmovshdup, 0xf316, AVX, Modrm|Vex|Space0F|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM } @@ -1599,8 +1601,10 @@ vpcmpgtq, 0x6637, AVX|AVX2, Modrm|Vex|Sp vpcmpistri, 0x6663, AVX, Modrm|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM } vpcmpistrm, 0x6662, AVX, Modrm|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM } vperm2f128, 0x6606, AVX, Modrm|Vex256|Space0F3A|VexVVVV|VexW0|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM } -vpermilp, 0x660c | , AVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } -vpermilp, 0x6604 | , AVX, Modrm|Vex|Space0F3A|VexW0|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM } +vpermilps, 0x660c, AVX|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } +vpermilps, 0x6604, AVX|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F3A|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vpermilpd, 0x660d, AVX, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } +vpermilpd, 0x6605, AVX, Modrm|Vex|Space0F3A|VexW0|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM } vpextr, 0x6616, AVX|, Modrm|Vex|Space0F3A||NoSuf, { Imm8, RegXMM, |Unspecified|BaseIndex } vpextrw, 0x66c5, AVX, Load|Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_sSuf, { Imm8, RegXMM, Reg32|Reg64 } vpextr, 0x6614 | , AVX, RegMem|Vex|Space0F3A|VexWIG|NoSuf, { Imm8, RegXMM, Reg32|Reg64 } @@ -1632,18 +1636,18 @@ vpminub, 0x66da, AVX|AVX2, Modrm|C|Vex|S vpminud, 0x663b, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } vpminuw, 0x663a, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } vpmovmskb, 0x66d7, AVX|AVX2, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_sSuf, { RegXMM|RegYMM, Reg32|Reg64 } -vpmovsxbd, 0x6621, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM } -vpmovsxbq, 0x6622, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Word|Unspecified|BaseIndex|RegXMM, RegXMM } +vpmovsxbd, 0x6621, AVX|AVX512F|AVX512VL, Modrm|Vex128|EVex128|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM } +vpmovsxbq, 0x6622, AVX|AVX512F|AVX512VL, Modrm|Vex128|EVex128|Masking|Space0F38|VexWIG|Disp8MemShift=1|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM } vpmovsxbw, 0x6620, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM } vpmovsxdq, 0x6625, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM } -vpmovsxwd, 0x6623, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM } -vpmovsxwq, 0x6624, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM } -vpmovzxbd, 0x6631, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM } -vpmovzxbq, 0x6632, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Word|Unspecified|BaseIndex|RegXMM, RegXMM } +vpmovsxwd, 0x6623, AVX|AVX512F|AVX512VL, Modrm|Vex128|EVex128|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM } +vpmovsxwq, 0x6624, AVX|AVX512F|AVX512VL, Modrm|Vex128|EVex128|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM } +vpmovzxbd, 0x6631, AVX|AVX512F|AVX512VL, Modrm|Vex128|EVex128|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM } +vpmovzxbq, 0x6632, AVX|AVX512F|AVX512VL, Modrm|Vex128|EVex128|Masking|Space0F38|VexWIG|Disp8MemShift=1|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM } vpmovzxbw, 0x6630, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM } vpmovzxdq, 0x6635, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM } -vpmovzxwd, 0x6633, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM } -vpmovzxwq, 0x6634, AVX, Modrm|Vex|Space0F38|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM } +vpmovzxwd, 0x6633, AVX|AVX512F|AVX512VL, Modrm|Vex128|EVex128|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM } +vpmovzxwq, 0x6634, AVX|AVX512F|AVX512VL, Modrm|Vex128|EVex128|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM } vpmuldq, 0x6628, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } vpmulhrsw, 0x660b, AVX|AVX2, Modrm|Vex|Space0F38|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } vpmulhuw, 0x66e4, AVX|AVX2, Modrm|C|Vex|Space0F|VexVVVV|VexWIG|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } @@ -1710,39 +1714,40 @@ vzeroupper, 0x77, AVX, Vex|Space0F|VexWI // 256bit integer AVX2 instructions. -vpmovsxbd, 0x6621, AVX2, Modrm|Vex=2|Space0F38|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegYMM } -vpmovsxbq, 0x6622, AVX2, Modrm|Vex=2|Space0F38|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegYMM } +vpmovsxbd, 0x6621, AVX2|AVX512F|AVX512VL, Modrm|Vex256|EVex256|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegYMM } +vpmovsxbq, 0x6622, AVX2|AVX512F|AVX512VL, Modrm|Vex256|EVex256|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegYMM } vpmovsxbw, 0x6620, AVX2, Modrm|Vex=2|Space0F38|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegYMM } vpmovsxdq, 0x6625, AVX2, Modrm|Vex=2|Space0F38|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegYMM } -vpmovsxwd, 0x6623, AVX2, Modrm|Vex=2|Space0F38|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegYMM } -vpmovsxwq, 0x6624, AVX2, Modrm|Vex=2|Space0F38|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegYMM } -vpmovzxbd, 0x6631, AVX2, Modrm|Vex=2|Space0F38|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegYMM } -vpmovzxbq, 0x6632, AVX2, Modrm|Vex=2|Space0F38|VexWIG|NoSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegYMM } +vpmovsxwd, 0x6623, AVX2|AVX512F|AVX512VL, Modrm|Vex256|EVex256|Masking|Space0F38|VexWIG|Disp8MemShift=4|NoSuf, { RegXMM|Unspecified|BaseIndex, RegYMM } +vpmovsxwq, 0x6624, AVX2|AVX512F|AVX512VL, Modrm|Vex256|EVex256|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegYMM } +vpmovzxbd, 0x6631, AVX2|AVX512F|AVX512VL, Modrm|Vex256|EVex256|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegYMM } +vpmovzxbq, 0x6632, AVX2|AVX512F|AVX512VL, Modrm|Vex256|EVex256|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegYMM } vpmovzxbw, 0x6630, AVX2, Modrm|Vex=2|Space0F38|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegYMM } vpmovzxdq, 0x6635, AVX2, Modrm|Vex=2|Space0F38|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegYMM } -vpmovzxwd, 0x6633, AVX2, Modrm|Vex=2|Space0F38|VexWIG|NoSuf, { Unspecified|BaseIndex|RegXMM, RegYMM } -vpmovzxwq, 0x6634, AVX2, Modrm|Vex=2|Space0F38|VexWIG|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegYMM } +vpmovzxwd, 0x6633, AVX2|AVX512F|AVX512VL, Modrm|Vex256|EVex256|Masking|Space0F38|VexWIG|Disp8MemShift=4|NoSuf, { RegXMM|Unspecified|BaseIndex, RegYMM } +vpmovzxwq, 0x6634, AVX2|AVX512F|AVX512VL, Modrm|Vex256|EVex256|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegYMM } // New AVX2 instructions. vbroadcasti128, 0x665A, AVX2, Modrm|Vex=2|Space0F38|VexW=1|NoSuf, { Xmmword|Unspecified|BaseIndex, RegYMM } vbroadcastsd, 0x6619, AVX2, Modrm|Vex=2|Space0F38|VexW=1|NoSuf, { RegXMM, RegYMM } -vbroadcastss, 0x6618, AVX2, Modrm|Vex|Space0F38|VexW=1|NoSuf, { RegXMM, RegXMM|RegYMM } +vbroadcastss, 0x6618, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexW0|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } vpblendd, 0x6602, AVX2, Modrm|Vex|Space0F3A|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } vpbroadcast, 0x6678 | , AVX2, Modrm|Vex|Space0F38|VexW0|NoSuf, { |Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM } -vpbroadcast, 0x6658 | , AVX2, Modrm|Vex|Space0F38|VexW0|NoSuf|Optimize, { |Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM } +vpbroadcastd, 0x6658, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexW0|Disp8MemShift|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vpbroadcastq, 0x6659, AVX2, Modrm|Vex|Space0F38|VexW0|NoSuf|Optimize, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM } vperm2i128, 0x6646, AVX2, Modrm|Vex=2|Space0F3A|VexVVVV|VexW0|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM } -vpermd, 0x6636, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|NoSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM } -vpermpd, 0x6601, AVX2, Modrm|Vex=2|Space0F3A|VexW1|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM } -vpermps, 0x6616, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|NoSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM } -vpermq, 0x6600, AVX2, Modrm|Vex=2|Space0F3A|VexW1|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM } +vpermd, 0x6636, AVX2|AVX512F, Modrm|Vex256|EVexDYN|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM } +vpermpd, 0x6601, AVX2|AVX512F, Modrm|Vex256|EVexDYN|Masking|Space0F3A|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM } +vpermps, 0x6616, AVX2|AVX512F, Modrm|Vex256|EVexDYN|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM } +vpermq, 0x6600, AVX2|AVX512F, Modrm|Vex256|EVexDYN|Masking|Space0F3A|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM } vextracti128, 0x6639, AVX2, Modrm|Vex=2|Space0F3A|VexW=1|NoSuf, { Imm8, RegYMM, Unspecified|BaseIndex|RegXMM } vinserti128, 0x6638, AVX2, Modrm|Vex256|Space0F3A|VexVVVV|VexW0|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegYMM, RegYMM } vpmaskmov, 0x668e, AVX2, Modrm|Vex|Space0F38|VexVVVV||CheckOperandSize|NoSuf, { RegXMM|RegYMM, RegXMM|RegYMM, Xmmword|Ymmword|Unspecified|BaseIndex } vpmaskmov, 0x668c, AVX2, Modrm|Vex|Space0F38|VexVVVV||CheckOperandSize|NoSuf, { Xmmword|Ymmword|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM } -vpsllv, 0x6647, AVX2, Modrm|Vex|Space0F38|VexVVVV||CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } -vpsravd, 0x6646, AVX2, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } -vpsrlv, 0x6645, AVX2, Modrm|Vex|Space0F38|VexVVVV||CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } +vpsllv, 0x6647, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } +vpsravd, 0x6646, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } +vpsrlv, 0x6645, AVX2|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } // AVX gather instructions vgatherdpd, 0x6692, AVX2, Modrm|Vex|Space0F38|VexVVVV|VexW1|SwapSources|CheckOperandSize|NoSuf|VecSIB128, { RegXMM|RegYMM, Qword|Unspecified|BaseIndex, RegXMM|RegYMM } @@ -1779,7 +1784,7 @@ vpclmulhqhqdq, 0x6644/0x11, AVX|PCLMULQD vgf2p8affineinvqb, 0x66cf, AVX|GFNI, Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } vgf2p8affineqb, 0x66ce, AVX|GFNI, Modrm|Vex|Space0F3A|VexVVVV|VexW1|CheckOperandSize|NoSuf, { Imm8, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } -vgf2p8mulb, 0x66cf, AVX|GFNI, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM } +vgf2p8mulb, 0x66cf, GFNI|AVX|AVX512F, Modrm|Vex|EVexDYN|Masking|Space0F38|VexVVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } // FSGSBASE, RDRND and F16C @@ -2082,8 +2087,6 @@ vpclmulhqhqdq, 0x6644/0x11, VPCLMULQDQ, // AVX512F instructions. -#define Disp8ShiftVL Disp8MemShift=DISP8_SHIFT_VL - , 0x6615, AVX512F, Modrm|Masking|Space0F38|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } vprorv, 0x6614, AVX512F, Modrm|Masking|Space0F38|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } -vpsllv, 0x6647, AVX512F, Modrm|Masking|Space0F38|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } -vpsrav, 0x6646, AVX512F, Modrm|Masking|Space0F38|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } -vpsrlv, 0x6645, AVX512F, Modrm|Masking|Space0F38|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } +vpsravq, 0x6646, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } vpternlog, 0x6625, AVX512F, Modrm|Masking|Space0F3A|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } vbroadcastf32x4, 0x661A, AVX512F, Modrm|Masking|Space0F38|VexW=1|Disp8MemShift=4|NoSuf, { XMMword|Unspecified|BaseIndex, RegYMM|RegZMM } @@ -2153,10 +2154,9 @@ vbroadcasti32x4, 0x665A, AVX512F, Modrm| vbroadcastf64x4, 0x661B, AVX512F, Modrm|EVex=1|Masking|Space0F38|VexW=2|Disp8MemShift=5|NoSuf, { YMMword|Unspecified|BaseIndex, RegZMM } vbroadcasti64x4, 0x665B, AVX512F, Modrm|EVex=1|Masking|Space0F38|VexW=2|Disp8MemShift=5|NoSuf, { YMMword|Unspecified|BaseIndex, RegZMM } -vbroadcastss, 0x6618, AVX512F, Modrm|Masking|Space0F38|VexW0|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } vbroadcastsd, 0x6619, AVX512F, Modrm|Masking|Space0F38|VexW1|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM } -vpbroadcast, 0x6658 | , AVX512F, Modrm|Masking|Space0F38||Disp8MemShift|NoSuf, { RegXMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vpbroadcastq, 0x6659, AVX512F, Modrm|Masking|Space0F38|VexW1|Disp8MemShift|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } vpbroadcast, 0x667c, AVX512F, Modrm|Masking|Space0F38||NoSuf, { , RegXMM|RegYMM|RegZMM } vcmpp, 0xC2/0x, AVX512F, Modrm|Masking|Space0F|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|ImmExt|SAE, { RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask } @@ -2246,9 +2246,6 @@ vextracti32x4, 0x6639, AVX512F, Modrm|Ma vextractf64x4, 0x661B, AVX512F, Modrm|EVex=1|Masking|Space0F3A|VexW=2|Disp8MemShift=5|NoSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex } vextracti64x4, 0x663B, AVX512F, Modrm|EVex=1|Masking|Space0F3A|VexW=2|Disp8MemShift=5|NoSuf, { Imm8, RegZMM, RegYMM|Unspecified|BaseIndex } -vextractps, 0x6617, AVX512F, Modrm|EVex128|Space0F3A|VexWIG|Disp8MemShift=2|NoSuf, { Imm8, RegXMM, Reg32|Dword|Unspecified|BaseIndex } -vextractps, 0x6617, AVX512F|x64, RegMem|EVex128|Space0F3A|VexWIG|NoSuf, { Imm8, RegXMM, Reg64 } - vfixupimmp, 0x6654, AVX512F, Modrm|Masking|Space0F3A|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } vfixupimms, 0x6655, AVX512F, Modrm|EVexLIG|Masking|Space0F3A|VexVVVV||Disp8MemShift|NoSuf|SAE, { Imm8|Imm8S, RegXMM||Unspecified|BaseIndex, RegXMM, RegXMM } @@ -2304,8 +2301,6 @@ vmovap, 0x28, AVX512F, D|Mo vmovntp, 0x2B, AVX512F, Modrm|Space0F||Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM, XMMword|YMMword|ZMMword|Unspecified|BaseIndex } vmovup, 0x10, AVX512F, D|Modrm|Masking|Space0F||Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } -vmovd, 0x666E, AVX512F, D|Modrm|EVex=2|Space0F|Disp8MemShift=2|NoSuf, { Reg32|Unspecified|BaseIndex, RegXMM } - vmovddup, 0xF212, AVX512F, Modrm|Masking|Space0F|VexW=2|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Unspecified|BaseIndex, RegYMM|RegZMM } vmovdqa64, 0x666F, AVX512F, D|Modrm|Masking|Space0F|VexW=2|Disp8ShiftVL|CheckOperandSize|NoSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } @@ -2322,7 +2317,6 @@ vmovhp, 0x17, AVX512F, Modr vmovlp, 0x12, AVX512F, Modrm|EVexLIG|Space0F|VexVVVV||Disp8MemShift=3|NoSuf, { Qword|Unspecified|BaseIndex, RegXMM, RegXMM } vmovlp, 0x13, AVX512F, Modrm|EVexLIG|Space0F||Disp8MemShift=3|NoSuf, { RegXMM, Qword|Unspecified|BaseIndex } -vmovq, 0x666E, AVX512F|x64, D|Modrm|EVex128|Space0F|VexW1|Disp8MemShift=3|NoSuf, { Reg64|Unspecified|BaseIndex, RegXMM } vmovq, 0xF37E, AVX512F, Load|Modrm|EVex=2|Space0F|VexW1|Disp8MemShift=3|NoSuf, { Qword|Unspecified|BaseIndex|RegXMM, RegXMM } vmovq, 0x66D6, AVX512F, Modrm|EVex=2|Space0F|VexW1|Disp8MemShift=3|NoSuf, { RegXMM, Qword|Unspecified|BaseIndex|RegXMM } @@ -2360,15 +2354,10 @@ vpcmpu, 0x661e/, AVX vptestm, 0x6627, AVX512F, Modrm|Masking|Space0F38|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask } vptestnm, 0xf327, AVX512F, Modrm|Masking|Space0F38|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegMask } -vpermd, 0x6636, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM } -vpermps, 0x6616, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM } - -vpermilp, 0x6604 | , AVX512F, Modrm|Masking|Space0F3A||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } -vpermilp, 0x660C | , AVX512F, Modrm|Masking|Space0F38|VexVVVV||Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM||Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } +vpermilpd, 0x6605, AVX512F, Modrm|Masking|Space0F3A|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vpermilpd, 0x660d, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } -vpermpd, 0x6601, AVX512F, Modrm|Masking|Space0F3A|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM } vpermpd, 0x6616, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM } -vpermq, 0x6600, AVX512F, Modrm|Masking|Space0F3A|VexW=2|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8|Imm8S, RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM } vpermq, 0x6636, AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegYMM|RegZMM, RegYMM|RegZMM } vpmovdb, 0xF331, AVX512F, Modrm|EVex=1|Masking|Space0F38|VexW=1|Disp8MemShift=4|NoSuf, { RegZMM, RegXMM|Unspecified|BaseIndex } @@ -2593,31 +2582,11 @@ vpmovsqw, 0xF324, AVX512F|AVX512VL, Modr vpmovusqw, 0xF314, AVX512F|AVX512VL, Modrm|EVex=2|Masking|Space0F38|VexW0|Disp8MemShift=2|NoSuf, { RegXMM, RegXMM|Dword|Unspecified|BaseIndex } vpmovusqw, 0xF314, AVX512F|AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexW0|Disp8MemShift=3|NoSuf, { RegYMM, RegXMM|Qword|Unspecified|BaseIndex } -vpmovsxbd, 0x6621, AVX512F|AVX512VL, Modrm|EVex=2|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM } -vpmovsxbd, 0x6621, AVX512F|AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegYMM } -vpmovzxbd, 0x6631, AVX512F|AVX512VL, Modrm|EVex=2|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM } -vpmovzxbd, 0x6631, AVX512F|AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegYMM } - -vpmovsxbq, 0x6622, AVX512F|AVX512VL, Modrm|EVex=2|Masking|Space0F38|VexWIG|Disp8MemShift=1|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM } -vpmovsxbq, 0x6622, AVX512F|AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegYMM } -vpmovzxbq, 0x6632, AVX512F|AVX512VL, Modrm|EVex=2|Masking|Space0F38|VexWIG|Disp8MemShift=1|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM } -vpmovzxbq, 0x6632, AVX512F|AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegYMM } - vpmovsxdq, 0x6625, AVX512F|AVX512VL, Modrm|EVex=2|Masking|Space0F38|VexW0|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM } vpmovsxdq, 0x6625, AVX512F|AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexW=1|Disp8MemShift=4|NoSuf, { RegXMM|Unspecified|BaseIndex, RegYMM } vpmovzxdq, 0x6635, AVX512F|AVX512VL, Modrm|EVex=2|Masking|Space0F38|VexW0|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM } vpmovzxdq, 0x6635, AVX512F|AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexW=1|Disp8MemShift=4|NoSuf, { RegXMM|Unspecified|BaseIndex, RegYMM } -vpmovsxwd, 0x6623, AVX512F|AVX512VL, Modrm|EVex=2|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM } -vpmovsxwd, 0x6623, AVX512F|AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexWIG|Disp8MemShift=4|NoSuf, { RegXMM|Unspecified|BaseIndex, RegYMM } -vpmovzxwd, 0x6633, AVX512F|AVX512VL, Modrm|EVex=2|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegXMM } -vpmovzxwd, 0x6633, AVX512F|AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexWIG|Disp8MemShift=4|NoSuf, { RegXMM|Unspecified|BaseIndex, RegYMM } - -vpmovsxwq, 0x6624, AVX512F|AVX512VL, Modrm|EVex=2|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM } -vpmovsxwq, 0x6624, AVX512F|AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegYMM } -vpmovzxwq, 0x6634, AVX512F|AVX512VL, Modrm|EVex=2|Masking|Space0F38|VexWIG|Disp8MemShift=2|NoSuf, { RegXMM|Dword|Unspecified|BaseIndex, RegXMM } -vpmovzxwq, 0x6634, AVX512F|AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexWIG|Disp8MemShift=3|NoSuf, { RegXMM|Qword|Unspecified|BaseIndex, RegYMM } - // AVX512VL instructions end. // AVX512BW instructions. @@ -2960,7 +2929,6 @@ vpshufbitqmb, 0x668f, AVX512_BITALG, Mod vgf2p8affineinvqb, 0x66cf, GFNI|AVX512F, Modrm|Masking|Space0F3A|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } vgf2p8affineqb, 0x66ce, GFNI|AVX512F, Modrm|Masking|Space0F3A|VexVVVV|VexW1|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Qword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } -vgf2p8mulb, 0x66cf, GFNI|AVX512F, Modrm|Masking|Space0F38|VexVVVV|VexW0|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } // AVX512 + GFNI instructions end