From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <76ebd8f2-63a0-4759-a86c-35528c3d63d7@suse.com>
Date: Tue, 30 Apr 2024 18:26:02 +0200
MIME-Version: 1.0
Subject: Re: [PATCH] Support APX zero-upper
To: "Cui, Lili"
Cc: hjl.tools@gmail.com, binutils@sourceware.org
References: <20240428105424.2428135-1-lili.cui@intel.com>
From: Jan Beulich
In-Reply-To: <20240428105424.2428135-1-lili.cui@intel.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 28.04.2024 12:54, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -1920,7 +1920,7 @@ static INLINE bool need_evex_encoding (const insn_template *t)
>    return i.encoding == encoding_evex
>      || i.encoding == encoding_evex512
>      || (t->opcode_modifier.vex && i.encoding == encoding_egpr)
> -    || i.mask.reg;
> +    || i.mask.reg || t->opcode_modifier.zu;
>  }

I wonder if this is really needed. Can you clarify why/how you found a need
to add this?

> @@ -3980,7 +3980,8 @@ is_apx_evex_encoding (void)
>  {
>    return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4 || i.has_nf
>      || (i.vex.register_specifier
> -        && (i.vex.register_specifier->reg_flags & RegRex2));
> +        && (i.vex.register_specifier->reg_flags & RegRex2))
> +    || i.tm.opcode_modifier.zu;

This isn't needed - "i.tm.opcode_space == SPACE_EVEXMAP4" already covers all
you need.

> @@ -4285,8 +4286,9 @@ build_apx_evex_prefix (void)
>      i.vex.bytes[3] &= ~0x08;
>
>    /* Encode the NDD bit of the instruction promoted from the legacy
> -     space. */
> -  if (i.vex.register_specifier && i.tm.opcode_space == SPACE_EVEXMAP4)
> +     space. ZU shares the same bit with NDD. */
> +  if ((i.vex.register_specifier && i.tm.opcode_space == SPACE_EVEXMAP4)
> +      || i.tm.opcode_modifier.zu)
>      i.vex.bytes[3] |= 0x10;
>
>    /* Encode the NF bit. */
> @@ -9204,7 +9206,7 @@ match_template (char mnem_suffix)
>        /* APX insns acting on byte operands are WIG, yet that can't be expressed
>           in the templates (they're also covering word/dword/qword operands). */
>        if (t->opcode_space == SPACE_EVEXMAP4 && !t->opcode_modifier.vexw &&
> -          i.types[i.operands - 1].bitfield.byte)
> +          i.types[i.operands - 1].bitfield.byte && !t->opcode_modifier.zu)

With a change request at the bottom this won't be needed anymore either, I
think.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-zu-inval.s
> @@ -0,0 +1,28 @@
> +# Check illegal APX-ZU instructions
> +
> +        .allow_index_reg
> +        .text
> +_start:
> +        imulzub $0xa,%bl,%al
> +        imulzud $0xa,%ebx,%eax
> +        imulzu $0xa,%rbx,%rax
> +        imulzub $0xaaaa,%bl,%al
> +        imulzud $0xaaaa,%ebx,%eax
> +        imulzu $0xaaaa,%rbx,%rax
> +        imulzu $0xaaaa,%ebx,%rax
> +        imulzu $0xaaaa,%ebx,%rax
> +        setzuno %eax
> +        setzub %bx
> +        setzuae %r8w
> +        setzue %r9w
> +        setzune %r10d
> +        setzube %eax
> +        setzua %bx
> +        setzus %r18w
> +        setzuns %r19w
> +        setzup %r20d
> +        setzunp %r21w
> +        setzul %r22w
> +        setzuge %r23d
> +        setzule %r24w
> +        setzug %r25w

How about having at least one case with a 64-bit register here, too? Further
perhaps better also have one use of %ah, %ch, %dh, or %bh here.

> @@ -14060,3 +14077,15 @@ JMPABS_Fixup (instr_info *ins, int bytemode, int sizeflag)
>      return OP_IMREG (ins, bytemode, sizeflag);
>    return OP_OFF64 (ins, bytemode, sizeflag);
>  }
> +
> +static bool
> +IMUL_Fixup (instr_info *ins, int bytemode, int sizeflag)
> +{
> +  /* Although imul do not support NDD, the EVEX.ND bit is used to control
> +     whether its destination register has its upper bits zeroed when OSIZE
> +     is 16b. */
> +  if (ins->vex.nd)
> +    ins->mnemonicendp = stpcpy (ins->obuf, "imulzu");

Despite the comment this handling isn't restricted to 16-bit operand size.

> +  return OP_G (ins, bytemode, sizeflag);
> +}

Further for SETZUcc I can't even spot how you check that EVEX.NDD=1.
With EVEX.NDD=0 aiui this is ordinary SETcc, just EVEX-encoded.

> --- a/opcodes/i386-opc.h
> +++ b/opcodes/i386-opc.h
> @@ -753,6 +753,9 @@ enum
>    /* Instrucion requires REX2 prefix. */
>    Rex2,
>
> +  /* Support zero upper */
> +  ZU,
> +
>    /* The last bitfield in i386_opcode_modifier. */
>    Opcode_Modifier_Num
>  };
> @@ -800,6 +803,7 @@ typedef struct i386_opcode_modifier
>    unsigned int noegpr:1;
>    unsigned int nf:1;
>    unsigned int rex2:1;
> +  unsigned int zu:1;
>  } i386_opcode_modifier;

Does this really need to be a new attribute? I would have expected a new
OperandConstraint value would suffice.

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -399,8 +399,10 @@ imul, 0xfaf, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|U
>  imul, 0xaf, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVexMap4|NF, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  imul, 0x6b, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  imul, 0x6b, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> +imulzu, 0x6b, APX_F, Modrm|No_bSuf|No_sSuf|EVexMap4|NF|ZU, { Imm8S, Reg16|Unspecified|BaseIndex, Reg16 }
>  imul, 0x69, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  imul, 0x69, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVexMap4|NF, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> +imulzu, 0x69, APX_F, Modrm|No_bSuf|No_sSuf|EVexMap4|NF|ZU, { Imm16, Reg16|Unspecified|BaseIndex, Reg16 }
>  // imul with 2 operands mimics imul with 3 by putting the register in
>  // both i.rm.reg & i.rm.regmem fields. RegKludge enables this
>  // transformation.

There's (once again) another adjustment wanted below here.
> @@ -528,6 +530,7 @@ loopne, 0xe0, x64, JumpByte|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Disp8 }
>
>  // Set byte on flag instructions.
>  set, 0xf9/0, i386, Modrm|No_wSuf|No_lSuf|No_sSuf|No_qSuf, { Reg8|Unspecified|BaseIndex }
> +setzu, 0xf24/0, APX_F, Modrm|No_wSuf|No_lSuf|No_sSuf|No_qSuf|EVexMap4|ZU, { Reg8 }

Didn't we kind of agree to also permit

set, 0xf24/0, APX_F, Modrm|No_bSuf|No_sSuf|EVexMap4|ZU, { Reg32|Reg64 }

? This then also makes more noticeable the question regarding EVEX.W: In the
latter form, the register used selects it. In the form you add it ought to
be EVexWIG, though, I would say (matching the .IGNORED in the spec).

Plus, as per one of the comments on the disassembler, don't we also need yet
another line permitting "{evex} setz %dl" and the like to be used?

Jan
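
For illustration only, the two disassembler remarks above (the "imulzu" mnemonic not being tied to 16-bit operand size, and SETZUcc needing a check that EVEX.ND is 1) could be modelled roughly as below. This is a standalone sketch, not binutils code: imul_mnemonic, setcc_mnemonic and their parameters are hypothetical names, and falling back to plain "imul" when ND is set with a 32- or 64-bit operand size is an assumption, not something the patch or this thread settles.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper, not part of binutils: choose the mnemonic for an
   EVEX-encoded IMUL.  Per the quoted patch comment, EVEX.ND requests
   zeroing of the destination's upper bits only when the operand size is
   16 bits, so "imulzu" is picked only in that case.  Printing plain
   "imul" for ND=1 with other operand sizes is an assumption.  */
static const char *
imul_mnemonic (bool evex_nd, unsigned int opsize_bits)
{
  if (evex_nd && opsize_bits == 16)
    return "imulzu";
  return "imul";
}

/* Hypothetical helper: build the mnemonic for an EVEX-encoded SETcc into
   BUF.  With EVEX.ND=0 this is ordinary SETcc, so the "zu" infix is
   emitted only when ND=1.  CC is the condition suffix, e.g. "ne".
   Returns what snprintf returns (the untruncated length).  */
static int
setcc_mnemonic (char *buf, size_t len, bool evex_nd, const char *cc)
{
  return snprintf (buf, len, "set%s%s", evex_nd ? "zu" : "", cc);
}
```

In an actual fixup routine the operand size would presumably be derived from the decoded prefixes rather than passed in as a number; the explicit parameter merely keeps the sketch self-contained.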