From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) by sourceware.org (Postfix) with ESMTPS id DA9C43858D32 for ; Mon, 11 Dec 2023 08:34:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DA9C43858D32 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org DA9C43858D32 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::334 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702283696; cv=none; b=hb/HFCEotsds/JbSVNFYvfCGUAJJB1k5aIQ1qrC7jvR3LCx9Zya3d/XV5XqkZCqLbA3M798zZBbOFzCiGh9xdczOheNs8pj46FhLVn0DuOpPoG9NMQcYwu6zxCzVjP5y6xDqLr1mkGKFqjKw2GPoI2VKLRPCy9bOq3AIlAHkZ+8= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702283696; c=relaxed/simple; bh=KyAM87Ma9Df+NRqLNTV/bZxtiwwQWPjrjSQZqhCnYiw=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=uuRHvepus5If42YR5G3Xe2cxTeml06h+Z4oX4K21GHLp95Phs8240y2Hfw1/vSV+ySPcUgLLOdWlGK6eUUMjrDVR3DdAcJj3K2G0nrmCU3mkB04UdPjESmjIl51Wluy9g9Tjb46uudsxRvK36w7RXUPAoTljUy/nOS2OMl+iysI= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x334.google.com with SMTP id 5b1f17b1804b1-40c39e936b4so19247105e9.1 for ; Mon, 11 Dec 2023 00:34:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1702283691; x=1702888491; darn=sourceware.org; h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc :to:content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=/QUDgVvIBt3fzwvAAmX0HZUcj5RiGMNkQM36VFcT4cs=; b=NlJgBzkBK/560XkQxgCjy07YA7lNXwhdVZuLrKrUkmI7VXNHpJ0Ub1x2efqdLMr9/E grL1K3ggOQJ6/JYTvIOEJrC2YSiCBf9aQh/AsxAqBXXjByjjhz5zAWxiO8/NBk9rU4/0 bsAZxWHCQ8S2YfEZnU8mOIe9aORueXLrFBPvg4w6kZX1oisX2fiKWbRAeqlUJKHSRhCG pYwq7M0nKLAPq/AmqX5y8WHRyeaLGhLf/TC3WH6fm2wPCq7rLXrwZmItSlvmhX3CakjG /d1YlunZPAS/7IpA+p9M1H0ZF8jjDYllwXYCY7zizJNDn2nqs0eaFzqKmHZne5Z6Rglu MPIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702283691; x=1702888491; h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc :to:content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=/QUDgVvIBt3fzwvAAmX0HZUcj5RiGMNkQM36VFcT4cs=; b=JSPUb8SSPIJ/SWQtIt9mL3tXnXdm2XfvklUXcNhE64EZd2D2PJepzSrMR7/dbS5rUu CEPIrt56tpAXmJXBM+66PfmAPtneyjw+VZSUvBT61bmwYmJzd/EWmCxXM3zJyzAew/UD nF1ccBE1RE4iUmrLUvovIhy+IOgfSy8/dMW6So/n8Lo4AMsIGyAKAoaB0J4z7pqubeMJ +4hvGQoqkH3fVKVEtAxbF+gSibxGRRzUulJpyt5L21oXtsXF/IL12ymdIJN1fVu2xgdQ X3tJOrQmEGEO8ArgcFeukvYoqzh7J4ddGW6IJ28Ts7keu54Brov4Xl4Cr7LjJlx7lj/h c4MQ== X-Gm-Message-State: AOJu0YzfkkJ9ngzf6/rTOVf6JyxNQdUVac9kij2fvTt8XHQBDfBR+BWO g5xowahak6sN5YlNDVX3lV2IU7FUPJNmQDXmNWI0 X-Google-Smtp-Source: AGHT+IHHohwpPuEuS6FrAsupEnsdxv8x119cfSQnXUFlLI4SJmDG49lnpPAvYAAwW61qZYxmPjkrtg== X-Received: by 2002:a05:600c:3b06:b0:40b:5e22:2e8 with SMTP id m6-20020a05600c3b0600b0040b5e2202e8mr906201wms.84.1702283691595; Mon, 11 Dec 2023 00:34:51 -0800 (PST) Received: from [10.156.60.236] (ip-037-024-206-209.um08.pools.vodafone-ip.de. [37.24.206.209]) by smtp.gmail.com with ESMTPSA id fc17-20020a05600c525100b0040c42681fcesm5303207wmb.15.2023.12.11.00.34.50 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 11 Dec 2023 00:34:51 -0800 (PST) Message-ID: <0bb5fbcd-f58e-48ad-a5ee-3413b026f903@suse.com> Date: Mon, 11 Dec 2023 09:34:32 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Content-Language: en-US To: "Cui, Lili" Cc: "Lu, Hongjiu" , "binutils@sourceware.org" References: <20231124070213.3886483-1-lili.cui@intel.com> <20231124070213.3886483-4-lili.cui@intel.com> <546c8890-0526-49a3-8310-319358bf55c2@suse.com> From: Jan Beulich Autocrypt: addr=jbeulich@suse.com; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3026.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 08.12.2023 16:21, Cui, Lili wrote: >> -----Original Message----- >> From: Jan Beulich >> Sent: Thursday, December 7, 2023 8:39 PM >> >> On 24.11.2023 08:02, Cui, Lili wrote: >>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t) >>> >>> /* Dual VEX/EVEX templates need stripping one of the possible variants. */ >>> if (t->opcode_modifier.vex && t->opcode_modifier.evex) >>> - { >>> - if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2) >>> - || maybe_cpu (t, CpuFMA)) >>> - && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL))) >>> + { >>> + if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA) >>> + || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || >> APX_F(CpuCMPCCXADD) >>> + || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || >> APX_F(CpuAVX512DQ) >>> + || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2)) >>> { >>> if (need_evex_encoding ()) >> >> There are several issues here: >> - Why did you need to change (to the worse) the original code? >> - Why did you not model the addition after that original code? >> - How come APX_F (CpuAVX512*) constructs appear here, when no AVX512 >> insn can be VEX-encoded? > > I don't understand what you mean, we have this combination. > > kmov, 0x90, AVX512BW&(AVX512BW|APX_F), Modrm|Vex128|EVex128|Space0F|VexW1||NoSuf, { RegMask||Unspecified|BaseIndex, RegMask } Oh, I'm sorry: I forgot about the mask register insns. >> - If these new macros are really needed for whatever reason, they shouldn't >> be added to opcodes/i386-opc.h when they're useful only in the assembler. >> - Style requires a blank before the opening parenthesis in function >> invocations (which also covers function-like macro invocations). >> >> I think I asked before: How is it that you get away without altering >> cpu_flags_match(), containing related and quite similar logic? >> > > For the original logic ( ... || ... ) && ( ... || ...), the content in the first bracket and the content in the following brackets can be combined arbitrarily. I think it is Inaccurate. In which way? If there are issues with the existing code, these issues want taking care of in separate (prereq) patches. Of course there are assumptions made here about the CPU combinations that can (and cannot) occur in any of our templates. Similar assumptions are imo fine to make in the APX additions. Note how I used two nested if()s despite that not having been necessary at that time. I did so in anticipation that for APX you'd want to add another (separate) inner if(), rather than altering the one that's there. > So I give examples one by one for each identified combination. Which examples are you talking about? I see none given in your reply. > Just found cpu_flags_match() has similar logic, I think the following is the only code related to CPUID alerts, but none of our combinations are related to cpuavx. > > if (all.bitfield.cpuavx) > { > /* We need to check SSE2AVX with AVX. */ > if (!t->opcode_modifier.sse2avx > || (sse2avx && !i.prefix[DATA_PREFIX])) > match |= CPU_FLAGS_ARCH_MATCH; > } Not sure why you pick out this one. This special case is needed for sse2avx; I don't see how it's related here. What I've been pointing you at is the code in that function which follows a similar "Dual VEX/EVEX templates ..." comment. >>> @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry >> *r) >>> if (!cpu_arch_flags.bitfield.cpuapx_f >>> || flag_code != CODE_64BIT) >>> return false; >>> + >>> + /* When using RegRex2, dual VEX/EVEX templates need to be marked as >> EVEX. >>> + For the later install_template function. */ >>> + if (current_templates->start->opcode_modifier.vex >>> + && current_templates->start->opcode_modifier.evex) >>> + i.vec_encoding = vex_encoding_evex; >> >> I'm afraid I don't understand the 2nd sentence of the comment. This may be >> related to my question regarding cpu_flags_match() further up. >> >> The first sentence isn't quite correct either - you don't mark any template here >> (and you can't, because we don't even know yet which template we're going >> to use). >> >> Finally - do you really need the .evex check here? (I won't exclude that this >> yields a better diagnostic in certain cases, but this wants clarifying if so.) >> > > If you look at install_template(), you'll see that before this function we need to know if the current encoding is evex. "This function" being check_register()? If so, then no, we can't know up front whether EVEX encoding is going to be needed, as operand parsing happens ahead of template selection. If instead you mean "that function" and hence install_template(), then yes, we need to know whether to use EVEX there. Yet how does that result in a need for the .evex check here? (Or maybe your reply was really to the first of the three parts of my earlier one?) But anyway - as said earlier on, using current_templates here looks wrong in the first place. check_register() deals with only a register, without regard to the context it is used in (with the sole exception of allow_pseudo_reg). May I remind you that earlier on I already indicated that I suspect you'll need a new enumerator to put in i.vec_encoding for this new purpose? > We need to check opcode_modifier.evex here, it is a fix for issues caused by the merge of VEX and EVEX. > if (t->opcode_modifier.vex && t->opcode_modifier.evex) > { > if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA) > || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD) > || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ) > || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2)) > { > if (need_evex_encoding ()) > { >[...] >>> @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {} >>> >>> invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { >>> Oword|Unspecified|BaseIndex, Reg32 } invept, 0x660f3880, EPT&x64, >>> Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 } >>> +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { >>> +Oword|Unspecified|BaseIndex, Reg64 } >>> invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { >>> Oword|Unspecified|BaseIndex, Reg32 } invvpid, 0x660f3881, EPT&x64, >>> Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 } >>> +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { >>> +Oword|Unspecified|BaseIndex, Reg64 } >> >> Seeing these: Are there any Map4 encodings which aren't EVex128? If not >> (and if you're also not hiddenly aware of some appearing in the near future), >> please consider making EVexMap4 include this right away. Even if in the longer >> run other encodings appear, it'll then be easy to simply replace all the >> EVexMap4 uses in a purely mechanical way. Until then shorter template lines >> are preferable. >> > > Would you mind defining it this way? Since #define EVex128 is behind it. Considering that you don't like unnecessary changes. > > +#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex=EVEX128 The order of #define-s doesn't matter. There's no reason not to use EVex128 here even if it's #define-d only a few lines later. >>> @@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {} >>> >>> // BMI2 instructions. >>> >>> -bzhi, 0xf5, BMI2, >>> >> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No >> _bSuf|No >>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, >>> Reg32|Reg64 } -mulx, 0xf2f6, BMI2, >>> >> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS >> uf|No_sSu >>> f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 } >>> -pdep, 0xf2f5, BMI2, >>> >> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS >> uf|No_sSu >>> f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 } >>> -pext, 0xf3f5, BMI2, >>> >> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS >> uf|No_sSu >>> f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 } >>> -rorx, 0xf2f0, BMI2, >>> >> Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSu >> f, { >>> Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, >> Reg32|Reg64 >>> } -sarx, 0xf3f7, BMI2, >>> >> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No >> _bSuf|No >>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, >>> Reg32|Reg64 } -shlx, 0x66f7, BMI2, >>> >> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No >> _bSuf|No >>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, >>> Reg32|Reg64 } -shrx, 0xf2f7, BMI2, >>> >> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No >> _bSuf|No >>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, >>> Reg32|Reg64 } >>> +bzhi, 0xf5, BMI2&(BMI2|APX_F), >>> >> +Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapS >> ources|N >>> +o_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, >>> +Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 } >> >> Hmm, I had specifically suggested a pre-processor macro to use in place of the >> open-coded BMI2&(BMI2|APX_F). Is there a reason you didn't use that (here >> and below)? > > There are many different types of combinations, and each combination appears relatively few times, so I think adding a #define for each combination feels a bit wasteful. I never suggested using multiple #define-s. I suggested a single APX_F() macro which would be used uniformly here and elsewhere (here: APX_F(BMI2)). And that macro would come with a comment explaining why the expression is the (seemingly strange) way it is. Right now there's no such explanation anywhere, and it would also be hard to find a good (central) place where to put it. Jan