From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x229.google.com (mail-lj1-x229.google.com [IPv6:2a00:1450:4864:20::229]) by sourceware.org (Postfix) with ESMTPS id 5AB7D3849AD3 for ; Tue, 30 Apr 2024 06:18:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5AB7D3849AD3 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 5AB7D3849AD3 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::229 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1714457909; cv=none; b=ghn6YpVv58QohUbI/j11avketoWyuYkiOO4kNNR0Nbr6dlGy9rEXQERdF4VWr+m6sruweD8RNhnEy3Bp1RqBKn6T9wIYfMP5KmVrTRUm5WkXDv1bZTJxRzpdUjcSfR3D/faSogskZkgbgJrJ2VQtBcCSxdWgSrZAd1MoyNIsqaE= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1714457909; c=relaxed/simple; bh=AncrZP3D4xQLd4wnh9a1kzW620Jr2PFC9tURM8EtqZY=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=OVCbPp2tfrEAT3v4uk88rNDE3zlhkT5OyIWuuwaeVkGjDcAJUY9wmEoEXnDI88R5QdZlbM1bzylTsPQbhIo/K8/vItOOE1abVPDtm8m3td7X4YJVBTkwr6gNPlYX45xgTWDSzAEZCrHT9z+1YZr1xJyzMWkh9gMVklixoA7EbsY= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lj1-x229.google.com with SMTP id 38308e7fff4ca-2e09138a2b1so22657121fa.3 for ; Mon, 29 Apr 2024 23:18:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1714457906; x=1715062706; darn=sourceware.org; h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc :to:content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=FKiz+NZp9iDnU4ZVR90/b+lkx5n73Yjpgz8gVvdov+U=; b=Vx3fPlH9B7ZfmN9BpjwO1Tmj/mmSJixQa+4mwyD8HS5VLReaZjEajoRa70OayKkqMv u+WBNdCPuvmod2H9o0O4lmw/pz9+hfvlrMqRiP8O5skcpDEAYRfHqJRyIkZjT88kBF8W /X9Aq7oUem8FCweMSTkJXTs2cP0fKyI/SgKVlrEY2IFJ+Ve7lipolhowKDQ+KoAHbCOc iV51qavKF8D74bMqfyLE+OX1RZ4d7vSPKf627hml7/51cBszY8WIOdOGFHohq7ybW5wL 0mu78NRV9R2RKtF78KSThjnl0KvZGhDSGk9A8p7sqHphftC2+DvgC1l0cuJYeFHpn6DL 8b6Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714457906; x=1715062706; h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc :to:content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=FKiz+NZp9iDnU4ZVR90/b+lkx5n73Yjpgz8gVvdov+U=; b=xFs7f46xMlm1My4Upx/Lq1K5yqWXtV1cUtGmIk/ke0YNE4RFz1N89uMyWkVxv+Niqh UGyHzmT3KvHbEiRvaU3R8sOiUBeiocIDWvjEYIXg3oe+nJgqZbNuGKZGpZZHPIW59pLL Wd6eCv6LQ5PPiEy2CCqFG8D50RzdGdRZThaY4kTXBBbb0ZVrvgsnsd46u0AFbkICJhUd nr5TxHpNpL/10Ss+dfQsFaOuT2BqiUa2TZPS0M7X4ztQX17fhoxY3LfNs90rlifevUsU kMXzlqzGLL7Qdcqob5gAl+ham9pMQoRF+55tC1ZGh0aocWHQYnsMV6ADKiIdq+nZKoD+ OJ7A== X-Forwarded-Encrypted: i=1; AJvYcCWpKfp+AeTmUjrE8Dp1XgH8lPBKc96Y3KvFlTTcv9sXApOT4O+HciY3s6nYL65WsNWHlejd3A89RpRJC5gPVYEV6BAG1iaZyQ== X-Gm-Message-State: AOJu0YzLq31Dg5Od/zjMFRkgNjWb2gEA79hcA9zMfW40AC+/EpP0rGtb Yr24/ZDKfxenKAZiTBeV6AoZZ/9nWY1x3q4uV9wjzcT8bRoTQjs1KLUpCDTNYQ== X-Google-Smtp-Source: AGHT+IEqFTzu5cUzTJ8nw+e+yNxb7WoR2S2O9UBKT6uIP6tba2y5j+SZzf86xELTxgEiVfJs2dxmhA== X-Received: by 2002:a2e:a608:0:b0:2e0:14bd:18f2 with SMTP id v8-20020a2ea608000000b002e014bd18f2mr4671410ljp.47.1714457905678; Mon, 29 Apr 2024 23:18:25 -0700 (PDT) Received: from [10.156.60.236] (ip-037-024-206-209.um08.pools.vodafone-ip.de. [37.24.206.209]) by smtp.gmail.com with ESMTPSA id bi24-20020a05600c3d9800b0041be4065adasm11055081wmb.22.2024.04.29.23.18.25 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 29 Apr 2024 23:18:25 -0700 (PDT) Message-ID: Date: Tue, 30 Apr 2024 08:18:24 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 2/3] x86: Drop SwapSources Content-Language: en-US To: "Cui, Lili" Cc: "hjl.tools@gmail.com" , "binutils@sourceware.org" References: <20240424072356.2433122-1-lili.cui@intel.com> <20240424072356.2433122-3-lili.cui@intel.com> <8549a5b2-d8b1-451e-90c1-74ee29dadf3a@suse.com> <11644d0a-1b51-4c31-82ac-92bb39fd59df@suse.com> <787a00d3-7852-4c7d-ad03-bfa0ecc5a8bd@suse.com> From: Jan Beulich Autocrypt: addr=jbeulich@suse.com; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3024.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 30.04.2024 04:56, Cui, Lili wrote: >> On 29.04.2024 15:41, Cui, Lili wrote: >>>> On 29.04.2024 14:23, Cui, Lili wrote: >>>>>> On 28.04.2024 06:47, Cui, Lili wrote: >>>>>>>> On 26.04.2024 10:14, Cui, Lili wrote: >>>>>>>>>> On 24.04.2024 09:23, Cui, Lili wrote: >>>>>>>>>>> --- a/gas/config/tc-i386.c >>>>>>>>>>> +++ b/gas/config/tc-i386.c >>>>>>>>>>> @@ -10434,6 +10434,14 @@ build_modrm_byte (void) >>>>>>>>>>> >>>>>>>>>>> switch (i.tm.opcode_modifier.vexvvvv) >>>>>>>>>>> { >>>>>>>>>>> + case VexVVVV_SRC2: >>>>>>>>>>> + if (source != op) >>>>>>>>>>> + { >>>>>>>>>>> + v = source++; >>>>>>>>>>> + break; >>>>>>>>>>> + } >>>>>>>>>>> + /* For XOP: vpshl* and vpsha*. */ >>>>>>>>>>> + /* Fall through. */ >>>>>>>>>>> case VexVVVV_SRC1: >>>>>>>>>> >>>>>>>>>> This falling-through is odd and hence needs a better comment >>>>>>>>>> (then also covering vprot*, which afaict is similarly affected). >>>>>>>>>> The reason for this is the XOP.W-controlled operand swapping, >>>>>>>>>> if I'm not mistaken? In which case perhaps instead of the >>>>>>>>>> fall-through here the logic swapping the operands should >>>>>>>>>> replace >>>>>>>>>> VexVVVV_SRC2 by >>>>>>>> VexVVVV_SRC1? >>>>>>>>>> >>>>>>>>> >>>>>>>>> Yes, vprot* should be included, and it is related to >>>>>>>>> XOP.W-controlled >>>>>>>> operand swapping, the comments says " /* Only the first two >>>>>>>> register operands need reversing, alongside flipping VEX.W. */ >>>>>>>> ", But there is actually a memory operand, not two register operands. >>>>>>>>> >>>>>>>>> I think VexVVVV_SRC2 makes more sense here, it matches the >>>>>>>>> actual >>>>>>>> situation, we want to use vvvv to encode the first operand. >>>>>>>>> >>>>>>>>> Opcode table: >>>>>>>>> vprot, 0x90 | , XOP, >>>>>>>>> D|Modrm|Vex128|SpaceXOP09|VexVVVV_Src2|VexW0|NoSuf, >>>>>> { RegXMM, >>>>>>>>> RegXMM|Unspecified|BaseIndex, RegXMM } >>>>>>>>> >>>>>>>>> testcase: >>>>>>>>> vprotb (%rax),%xmm12,%xmm15 >>>>>>>>> vprotb %xmm15,(%r12),%xmm0 >>>>>>>> >>>>>>>> VexVVVV_Src2 is appropriate for the latter, yes, but not for the >>>>>>>> former. That uses VexVVVV_Src1 layout. Hence my suggestion to >>>>>>>> replace the attribute when swapping operands. >>>>>>>> >>>>>>> >>>>>>> If replace the Src2VVVV| VexW0 with Src1VVVV| VexW1 and swapping >>>>>> operands. We can put VexVVVV_SRC1 before VexVVVV_SRC2, but we >> still >>>>>> need to add "(!is_cpu (&i.tm, CpuXOP) || source == op" under >>>>>> VexVVVV_SRC1 , and match_template also needs to be adjusted (I >> made >>>>>> a simple modification and it still failed, I think continuing like >>>>>> this may go against the original intention). >>>>>>> >>>>>>> switch (i.tm.opcode_modifier.vexvvvv) >>>>>>> { >>>>>>> /* VEX.vvvv encodes the first source register operand. */ >>>>>>> case VexVVVV_SRC1: >>>>>>> if (!is_cpu (&i.tm, CpuXOP) || source == op) >>>>>>> { >>>>>>> v = dest - 1; >>>>>>> break; >>>>>>> } >>>>>>> /* For XOP: vpshl*, vpsha* and vprot*. */ >>>>>>> /* Fall through. */ >>>>>>> /* VEX.vvvv encodes the last source register operand. */ >>>>>>> case VexVVVV_SRC2: >>>>>>> v = source++; >>>>>>> break; >>>>>>> /* VEX.vvvv encodes the destination register operand. */ >>>>>>> case VexVVVV_DST: >>>>>>> v = dest--; >>>>>>> break; >>>>>>> default: >>>>>>> v = ~0; >>>>>>> break; >>>>>>> } >>>>>>> >>>>>>> Do you think we should add a separate patch 4 for XOP that removes >>>>>>> the >>>>>> special handling in match_template and completes its template? so >>>>>> we don't have to add special handling for src1vvvv or src2vvvv. >>>>>> This might go against your desire to reduce template size, but it >>>>>> would help simplify the logic. I'd like to know your thoughts. >>>>>> >>>>>> Indeed. You'd effectively revert earlier folding that I did. And >>>>>> the adjustment I suggested earlier ought to be small/simple enough. >>>>>> >>>>> >>>>> So, I continued working on the previous suggestion. With the >>>>> following >>>> modification and it worked. >>>>> >>>>> @@ -8932,7 +8932,7 @@ match_template (char mnem_suffix) >>>>> || is_cpu (t, CpuAPX_F)); >>>>> if (!operand_type_match (overlap0, i.types[0]) >>>>> || !operand_type_match (overlap1, i.types[j]) >>>>> - || (t->operands == 3 >>>>> + || (t->operands == 3 && !is_cpu (t, CpuXOP) >>>>> && !operand_type_match (overlap2, i.types[1])) >>>>> || (check_register >>>>> && !operand_type_register_match (i.types[0], >>>> >>>> Just to mention it - this certainly isn't what I suggested. In fact I >>>> seem to vaguely recall that something similar was once proposed >>>> during the original APX work as well, where I then objected, too. >>>> >>>>> But I found that there are 4 test files that failed, I didn't find >>>>> the doc on >>>> how to encode vprotb but I guess that is because I changed the >>>> default template from Src2VVVV| VexW0 to Src1VVVV| VexW1, then all >>>> the related test cases needed to be modified. Do you have any comments >> here? >>>>> >>>>> regexp "^[ ]*[a-f0-9]+: 8f e9 40 90 >>>> d8[ ]+vprotb %xmm7,%xmm0,%xmm3$" >>>>> line " 11ed: 8f e9 f8 90 df vprotb %xmm7,%xmm0,%xmm3" >>>> >>>> Well, while changing the templates is in principle possible, and the >>>> resulting code would still be correct, changing encodings it usually not a >> good idea. >>>> Thus when it can be avoided, it should be avoided, imo. Hence why I >>>> didn't suggest this, but to amend the code doing the operand swapping >>>> (for the case where operand order is controlled by XOP.W). >>>> >>> >>> I mistakenly thought this was what you wanted, and I also think this >> modification is unacceptable. Could you elaborate further on your original >> suggestion? >> >> At the bottom of match_template() we have >> >> case Opcode_VexW: >> /* Only the first two register operands need reversing, alongside >> flipping VEX.W. */ >> i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1; >> >> This is where I think you further want to adjust >> i.tm.opcode_modifier.vexvvvv. >> > > "vprotb %xmm7,%xmm0,%xmm3" doesn't go through this place, it finds the matching template directly (Src1VVVV|VexW1|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM }). Of course, for not having a memory operand (and no pseudo prefix). But that's not the problem here - the problem is that for build_modrm_byte() you want to make adjustments when operands need swapping (compared to what the templates say). > I inserted a judgment to solve this problem, it's a bit ugly. I'd like to abandon this optimization. Didn't you have a working form earlier on? Could we use that, leaving the XOP stuff untouched for now, for me to see about looking into later? > To avoid confusion, I post all the changes. > > --- a/gas/config/tc-i386.c > +++ b/gas/config/tc-i386.c > @@ -8932,7 +8932,7 @@ match_template (char mnem_suffix) > || is_cpu (t, CpuAPX_F)); > if (!operand_type_match (overlap0, i.types[0]) > || !operand_type_match (overlap1, i.types[j]) > - || (t->operands == 3 > + || (t->operands == 3 && !is_cpu (t, CpuXOP) --> This place also needs to change. > && !operand_type_match (overlap2, i.types[1])) > || (check_register > && !operand_type_register_match (i.types[0], > @@ -9035,6 +9035,10 @@ match_template (char mnem_suffix) > specific_error = progress (i.error); > continue; > } > + if (is_cpu (t, CpuXOP) && operand_types[0].bitfield.baseindex --> It's ugly, and still has some test cases failing for other XOP instructions. > + && i.types[0].bitfield.class == RegSIMD > + && t->opcode_modifier.vexw == VEXW1) > + found_reverse_match = Opcode_VexW; > break; > } > } Yeah, as indicated before: These probably shouldn't be changed like this. > --- a/opcodes/i386-opc.tbl > +++ b/opcodes/i386-opc.tbl > @@ -1938,10 +1938,10 @@ vpmacsww, 0x95, XOP, Modrm|Vex128|SpaceXOP08|Src1VVVV|VexW0|NoSuf, { RegXMM, Reg > vpmadcsswd, 0xa6, XOP, Modrm|Vex128|SpaceXOP08|Src1VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM } > vpmadcswd, 0xb6, XOP, Modrm|Vex128|SpaceXOP08|Src1VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM } > vpperm, 0xa3, XOP, D|Modrm|Vex128|SpaceXOP08|Src1VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM } > -vprot, 0x90 | , XOP, D|Modrm|Vex128|SpaceXOP09|Src2VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM } > +vprot, 0x90 | , XOP, D|Modrm|Vex128|SpaceXOP09|Src1VVVV|VexW1|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM } > vprot, 0xc0 | , XOP, Modrm|Vex128|SpaceXOP08|VexW0|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM } > -vpsha, 0x98 | , XOP, D|Modrm|Vex128|SpaceXOP09|Src2VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM } > -vpshl, 0x94 | , XOP, D|Modrm|Vex128|SpaceXOP09|Src2VVVV|VexW0|NoSuf, { RegXMM, RegXMM|Unspecified|BaseIndex, RegXMM } > +vpsha, 0x98 | , XOP, D|Modrm|Vex128|SpaceXOP09|Src1VVVV|VexW1|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM } > +vpshl, 0x94 | , XOP, D|Modrm|Vex128|SpaceXOP09|Src1VVVV|VexW1|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM, RegXMM } Aiui (and as per earlier discussion) this leads to encoding changes, which we want to avoid whenever possible. Jan