From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by sourceware.org (Postfix) with ESMTPS id B4F1C3858297 for ; Thu, 7 Dec 2023 12:38:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B4F1C3858297 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org B4F1C3858297 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::331 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701952739; cv=none; b=Bw9eBnE4/apdC+FVvN+DIru3gGDei5DNXDAbV5QkL7YuG+whZ3bq3d1a62l56ouQF970kwNot9BihrwmWKn1QdXMq3SuRz61JYofKMxneB9TmUdtjBsQSon+uuTzS8ixuj7R2Gj0drmJCH44e2wEbvgwxdw4iilE0F9D+qbl1jQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701952739; c=relaxed/simple; bh=QqV9mNaxno8DuK3ejABm9h5R2AoIhptUR4T3QdvPLgU=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=lexYvUeXaNLekodr6hfRyn9AvASuejLshLUu8t+HjnzMPmAh7VTLfX/WXIH28BlIGkrRtXgAb9lrzj0j/dmTRRgfdwitZgGoT4zcdqaf/pp9UZDwQrRpMhEenVluSLcG5otAkoIM/A9QKuKfODz4YekWDWAc/nbamrnQKkN5tb8= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x331.google.com with SMTP id 5b1f17b1804b1-40c19f5f822so5143765e9.1 for ; Thu, 07 Dec 2023 04:38:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1701952735; x=1702557535; darn=sourceware.org; h=content-transfer-encoding:in-reply-to:autocrypt:from :content-language:references:cc:to:subject:user-agent:mime-version :date:message-id:from:to:cc:subject:date:message-id:reply-to; bh=qQJTkIdvF+wCQnQ6i8NKz9ECzJ2qFnmlK4bCT/Hb/dI=; b=GZoHwyEoVetlYsOv4XLuqCRLtxp/hcv3Bbyv9AGKY6rxcIk+eVOZflcjsX9eGW28If FCPeMr4bTsz5qW8e1BxLumrqy0sutz5srLWt99RRVd3wXSOYwEiAtNBqHLCRKsdwSsbs te1MqMw6dFPSD77yJwtyLZBRS9/a5s66LU3k9bnEijvNSzq3Kill0MmAvKEqAGQ54qxo yoyBMkLIl+d5tUuU3nqQlrBdKEpOpglbl0qxBZtlO5cbXTrQJmv25srqwD0WkUErs/sL A/m6uvL9UEJyHwiy8f5/Y1bLv8Sbb/LkkUzU0nEwo07mKUP9iKXzZ/x1mV00qn/PnSZj wxmA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701952735; x=1702557535; h=content-transfer-encoding:in-reply-to:autocrypt:from :content-language:references:cc:to:subject:user-agent:mime-version :date:message-id:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=qQJTkIdvF+wCQnQ6i8NKz9ECzJ2qFnmlK4bCT/Hb/dI=; b=OUO1H+UtIqLma3jI29pjUNq6VwedHocqTrxdlpHVm1jRrSEUe5CcdQvc7bCeo7n5QS OCRbieBRzVu54y/5FkM9PpQBp5+MSGOspz7H89zkrFljOOr1twqbVBWyMDYYb/N5oiCa xdjMi0yofng5S4m1xzRlBdrVshKD0LGjA/ca8vlbPbN34yJy/9R3X/LPcjXfS6RuA5Y5 aGGb/GVpvQSmMhB9bkdV5rRbkzJod8BU2FSvcnv2iI9Trykxs6MbZM8GFReg6pkJIsAA oUyj5JmV0cSbet+oaESIeT8XGZh0ZDTa9kGSAqZGy2kmbrCq07i89Ci5xMVONkQkx42n mbRA== X-Gm-Message-State: AOJu0YwuEnsIVsEGZsQpRkHtMJYuieMOWKFuYk/K2fmouNtmAl40Dob8 YhdSd4PKjnqFdHna5mQTB84t X-Google-Smtp-Source: AGHT+IGsTHbboxd2we39GcFkU6MVfQKqgoaZE5hY9cLqhICJHz00Us9Yb7L/OR1uBfNU+vAI7RR5kg== X-Received: by 2002:a05:600c:2d8c:b0:40b:5e4a:2367 with SMTP id i12-20020a05600c2d8c00b0040b5e4a2367mr2685810wmg.105.1701952735338; Thu, 07 Dec 2023 04:38:55 -0800 (PST) Received: from [10.156.60.236] (ip-037-024-206-209.um08.pools.vodafone-ip.de. [37.24.206.209]) by smtp.gmail.com with ESMTPSA id p8-20020a05600c358800b0040b4ccdcffbsm1796999wmq.2.2023.12.07.04.38.54 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 07 Dec 2023 04:38:55 -0800 (PST) Message-ID: <546c8890-0526-49a3-8310-319358bf55c2@suse.com> Date: Thu, 7 Dec 2023 13:38:54 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix To: "Cui, Lili" Cc: hongjiu.lu@intel.com, binutils@sourceware.org References: <20231124070213.3886483-1-lili.cui@intel.com> <20231124070213.3886483-4-lili.cui@intel.com> Content-Language: en-US From: Jan Beulich Autocrypt: addr=jbeulich@suse.com; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL In-Reply-To: <20231124070213.3886483-4-lili.cui@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3026.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 24.11.2023 08:02, Cui, Lili wrote: > --- a/gas/config/tc-i386.c > +++ b/gas/config/tc-i386.c > @@ -409,6 +409,9 @@ struct _i386_insn > /* Compressed disp8*N attribute. */ > unsigned int memshift; > > + /* No CSPAZO flags update.*/ > + bool has_nf; As before I don't see the point in adding this field when it's not used in the change. Note that this is unrelated to the introduction of the NF attribute right here, which has a reason. > @@ -3670,10 +3673,11 @@ install_template (const insn_template *t) > > /* Dual VEX/EVEX templates need stripping one of the possible variants. */ > if (t->opcode_modifier.vex && t->opcode_modifier.evex) > - { > - if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2) > - || maybe_cpu (t, CpuFMA)) > - && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL))) > + { > + if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA) > + || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD) > + || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ) > + || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2)) > { > if (need_evex_encoding ()) There are several issues here: - Why did you need to change (to the worse) the original code? - Why did you not model the addition after that original code? - How come APX_F (CpuAVX512*) constructs appear here, when no AVX512 insn can be VEX-encoded? - If these new macros are really needed for whatever reason, they shouldn't be added to opcodes/i386-opc.h when they're useful only in the assembler. - Style requires a blank before the opening parenthesis in function invocations (which also covers function-like macro invocations). I think I asked before: How is it that you get away without altering cpu_flags_match(), containing related and quite similar logic? > @@ -3873,6 +3877,14 @@ is_any_vex_encoding (const insn_template *t) > return t->opcode_modifier.vex || t->opcode_modifier.evex; > } > > +static INLINE bool > +is_apx_evex_encoding (void) > +{ > + return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4 > + || (i.vex.register_specifier > + && i.vex.register_specifier->reg_flags & RegRex2); > +} If you want this to be a function despite being used just once, you'll need to add a comment mentioning the constraint when calling it (or else the use of i.rex2 in particular is confusing). I'm sure I commented on this before, and I thought such a comment had already appeared. > @@ -5655,17 +5693,17 @@ md_assemble (char *line) > instruction already has a prefix, we need to convert old > registers to new ones. */ > > - if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte > - && (i.op[0].regs->reg_flags & RegRex64) != 0) > - || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte > - && (i.op[1].regs->reg_flags & RegRex64) != 0) > - || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte) > - || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte)) > - && (i.rex != 0 || i.rex2 != 0))) > + if (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte > + && (i.op[0].regs->reg_flags & RegRex64) != 0) > + || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte > + && (i.op[1].regs->reg_flags & RegRex64) != 0) > + || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte) > + || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte)) > + && (i.rex != 0 || i.rex2 != 0)))) I'm having trouble spotting the change here: There's an outer pair of parentheses being added, but that's for no reason unless there's another change well hidden. Please clarify. > { > int x; > > - if (!i.rex2) > + if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm)) > i.rex |= REX_OPCODE; Why the change to is_apx_rex2_encoding()? If that's wanted / needed here, shouldn't that be put in place by the earlier patch? > @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry *r) > if (!cpu_arch_flags.bitfield.cpuapx_f > || flag_code != CODE_64BIT) > return false; > + > + /* When using RegRex2, dual VEX/EVEX templates need to be marked as EVEX. > + For the later install_template function. */ > + if (current_templates->start->opcode_modifier.vex > + && current_templates->start->opcode_modifier.evex) > + i.vec_encoding = vex_encoding_evex; I'm afraid I don't understand the 2nd sentence of the comment. This may be related to my question regarding cpu_flags_match() further up. The first sentence isn't quite correct either - you don't mark any template here (and you can't, because we don't even know yet which template we're going to use). Finally - do you really need the .evex check here? (I won't exclude that this yields a better diagnostic in certain cases, but this wants clarifying if so.) > --- a/gas/testsuite/gas/i386/x86-64.exp > +++ b/gas/testsuite/gas/i386/x86-64.exp > @@ -250,7 +250,7 @@ run_dump_test "x86-64-sse-noavx" > run_dump_test "x86-64-movbe" > run_dump_test "x86-64-movbe-intel" > run_dump_test "x86-64-movbe-suffix" > -run_list_test "x86-64-inval-movbe" "-al" > +run_list_test "x86-64-inval-movbe" "-I${srcdir}/$subdir -march=+noapx_f -al" I can see why you add the -march=, as we've been through this before. But why the -I ? > @@ -896,7 +897,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {} > load:Load:0, store:Store:0, + > vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, + > - rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0> > + rex:REX:x64, rex2:REX2:APX_F, nooptimize:NoOptimize:0> This change wants to go into the earlier patch? > @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {} > > invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 } > invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 } > +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 } > invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 } > invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 } > +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 } Seeing these: Are there any Map4 encodings which aren't EVex128? If not (and if you're also not hiddenly aware of some appearing in the near future), please consider making EVexMap4 include this right away. Even if in the longer run other encodings appear, it'll then be easy to simply replace all the EVexMap4 uses in a purely mechanical way. Until then shorter template lines are preferable. > @@ -1437,7 +1443,6 @@ xgetbv, 0xf01d0, Xsave, NoSuf, {} > xsetbv, 0xf01d1, Xsave, NoSuf, {} > > // xsaveopt > - > xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex } > xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex } Iirc the earlier patch added that blank line. Why would you do such back and forth? > @@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {} > > // BMI2 instructions. > > -bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 } > -mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 } > -pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 } > -pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 } > -rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 } > -sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 } > -shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 } > -shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 } > +bzhi, 0xf5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 } Hmm, I had specifically suggested a pre-processor macro to use in place of the open-coded BMI2&(BMI2|APX_F). Is there a reason you didn't use that (here and below)? Jan