From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x231.google.com (mail-lj1-x231.google.com [IPv6:2a00:1450:4864:20::231]) by sourceware.org (Postfix) with ESMTPS id 5D2FA3858D35 for ; Mon, 29 Apr 2024 13:49:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5D2FA3858D35 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 5D2FA3858D35 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::231 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1714398543; cv=none; b=hJzXmCjQQGrqB6m0DnHgu0fiHtX8gCdamBJGttREEnE+8zt/9hrSC23oBhn7fFYf+a6eEXvEDnh1riT21AJargCDtIY1ztyAVytcjydc3ukruXO6dn9uLouVPLKHcgZ29zl2a4XMScHzWYRJ32azdCFoI6GHU3InDRwt+YrDOiw= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1714398543; c=relaxed/simple; bh=0DEXY7lyTYB/ThAUTYH4Y9q74qagsQbuR4qxW110AgQ=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=xrOFsyP2pCHTrzjCgfbumgayJlv/33ME8U7yg3b5/9Z9wJNBy1XclAAC3OVsyJ+azrwW8bKDtTK2x365KC+wHCC1kuSR9dVXZeO2rYevulUluzPFDnUKeKQh8BaRPLOks6f47e2WE3szCG/7yWD/Uko7w/Je38YzeXWIQz11a5o= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lj1-x231.google.com with SMTP id 38308e7fff4ca-2de232989aaso53953541fa.1 for ; Mon, 29 Apr 2024 06:49:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1714398540; x=1715003340; darn=sourceware.org; h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc :to:content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=k9phYgFwkyV+pVcLaGUpEFdO33IuWyPMqrMFn9CtfAQ=; b=OQz+Bzs8Cu9PPZqbvF1Dn+ratMlZV1hQVZtQT4fQ4w+FC7SPg6bq5fVJOB2FOqP3GX RyR9pqyyA5b4ILtcoGo1GqliU6zgWGlrgB/KEbLgYi/A2hrewCKBIRLIqVGxlL7Gk0+V MFn4/JIsMphFz+wqq0VdO1iXZYjkzlvxtQUWulDr5twfCl+8WPud4ZqlDz4pc8yARR9Y 3gFK2CHVPN0Ht3iGM8kGvKqcIw5USCo6fDcSfoIhE9bOPJzNeTg3NkMDS/9DCLav+PYb AjI7SfIaEnn92q13ZuEWVmNW0lkTvjaLwA7gwgTlMy/KnXsDLw0m+ZdhIozbGa45WIJp 1vnQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714398540; x=1715003340; h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc :to:content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=k9phYgFwkyV+pVcLaGUpEFdO33IuWyPMqrMFn9CtfAQ=; b=ZW8jFYxGqyIK9W6BBZIpPrG7WBcCEqCNeTHUPHbwGi7VtgwxKQHXM83mJJ/MWSz3AW 1Zfe/jO2h8v5q1Z9ert3Q0RiGaN60/QNKQnsoLtPazd6rQi3oCRolbf2TZVvDRu7L54R jgUuUiknbgvV5He4AV3JiUEeuXd3w+T6a9RIc5xL+8GWcthNyZKmTOC8Ma047aUENXiu PnhdzCENCDxEQwKYFxgtV7BWg77hhvdsgtoNALJq1wTjrIsPPZ0jlEb3Kubu4ZoxBZwr XuJBDYUVgSEtaryZSJzJg6okJGEZgvZmEStMhOdSe7gLW/RQlUoLk35v6N21xsev/pvq 2X8Q== X-Forwarded-Encrypted: i=1; AJvYcCVHLFOL5lzvipYUx1vXhUTLoq5/i5hdfDu2LiG6+d0MOe3ErXW5sRPdP1DqdSm1acy6HZvxLXv9+r/npNjwI2+2myVtQxC5mg== X-Gm-Message-State: AOJu0YxKCEhbezr8QtarGWdcEcQrQsWJzvg8jvCwONwHSUhVgGxjY14i kKXnad5xN8XMt6cj18eC8vrNWp9iFLjDHfYwRQnBUHXNbfbAfrgsOpznN6J6evTwjEywiePBw9U = X-Google-Smtp-Source: AGHT+IGXh6MVLHAJ5/S9PugS4SXOemz1rW/s4WrO2W47XBQROz3AHhq77zG+frDYJW393xnJVIC98Q== X-Received: by 2002:a2e:87c3:0:b0:2da:d986:e387 with SMTP id v3-20020a2e87c3000000b002dad986e387mr5634486ljj.50.1714398539690; Mon, 29 Apr 2024 06:48:59 -0700 (PDT) Received: from [10.156.60.236] (ip-037-024-206-209.um08.pools.vodafone-ip.de. [37.24.206.209]) by smtp.gmail.com with ESMTPSA id z9-20020a05600c0a0900b0041668162b45sm44796542wmp.26.2024.04.29.06.48.59 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 29 Apr 2024 06:48:59 -0700 (PDT) Message-ID: <787a00d3-7852-4c7d-ad03-bfa0ecc5a8bd@suse.com> Date: Mon, 29 Apr 2024 15:49:01 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 2/3] x86: Drop SwapSources Content-Language: en-US To: "Cui, Lili" Cc: "hjl.tools@gmail.com" , "binutils@sourceware.org" References: <20240424072356.2433122-1-lili.cui@intel.com> <20240424072356.2433122-3-lili.cui@intel.com> <8549a5b2-d8b1-451e-90c1-74ee29dadf3a@suse.com> <11644d0a-1b51-4c31-82ac-92bb39fd59df@suse.com> From: Jan Beulich Autocrypt: addr=jbeulich@suse.com; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3024.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 29.04.2024 15:41, Cui, Lili wrote: >> On 29.04.2024 14:23, Cui, Lili wrote: >>>> On 28.04.2024 06:47, Cui, Lili wrote: >>>>>> On 26.04.2024 10:14, Cui, Lili wrote: >>>>>>>> On 24.04.2024 09:23, Cui, Lili wrote: >>>>>>>>> --- a/gas/config/tc-i386.c >>>>>>>>> +++ b/gas/config/tc-i386.c >>>>>>>>> @@ -10434,6 +10434,14 @@ build_modrm_byte (void) >>>>>>>>> >>>>>>>>> switch (i.tm.opcode_modifier.vexvvvv) >>>>>>>>> { >>>>>>>>> + case VexVVVV_SRC2: >>>>>>>>> + if (source != op) >>>>>>>>> + { >>>>>>>>> + v = source++; >>>>>>>>> + break; >>>>>>>>> + } >>>>>>>>> + /* For XOP: vpshl* and vpsha*. */ >>>>>>>>> + /* Fall through. */ >>>>>>>>> case VexVVVV_SRC1: >>>>>>>> >>>>>>>> This falling-through is odd and hence needs a better comment >>>>>>>> (then also covering vprot*, which afaict is similarly affected). >>>>>>>> The reason for this is the XOP.W-controlled operand swapping, if >>>>>>>> I'm not mistaken? In which case perhaps instead of the >>>>>>>> fall-through here the logic swapping the operands should replace >>>>>>>> VexVVVV_SRC2 by >>>>>> VexVVVV_SRC1? >>>>>>>> >>>>>>> >>>>>>> Yes, vprot* should be included, and it is related to >>>>>>> XOP.W-controlled >>>>>> operand swapping, the comments says " /* Only the first two >>>>>> register operands need reversing, alongside flipping VEX.W. */ ", >>>>>> But there is actually a memory operand, not two register operands. >>>>>>> >>>>>>> I think VexVVVV_SRC2 makes more sense here, it matches the actual >>>>>> situation, we want to use vvvv to encode the first operand. >>>>>>> >>>>>>> Opcode table: >>>>>>> vprot, 0x90 | , XOP, >>>>>>> D|Modrm|Vex128|SpaceXOP09|VexVVVV_Src2|VexW0|NoSuf, >>>> { RegXMM, >>>>>>> RegXMM|Unspecified|BaseIndex, RegXMM } >>>>>>> >>>>>>> testcase: >>>>>>> vprotb (%rax),%xmm12,%xmm15 >>>>>>> vprotb %xmm15,(%r12),%xmm0 >>>>>> >>>>>> VexVVVV_Src2 is appropriate for the latter, yes, but not for the >>>>>> former. That uses VexVVVV_Src1 layout. Hence my suggestion to >>>>>> replace the attribute when swapping operands. >>>>>> >>>>> >>>>> If replace the Src2VVVV| VexW0 with Src1VVVV| VexW1 and swapping >>>> operands. We can put VexVVVV_SRC1 before VexVVVV_SRC2, but we still >>>> need to add "(!is_cpu (&i.tm, CpuXOP) || source == op" under >>>> VexVVVV_SRC1 , and match_template also needs to be adjusted (I made a >>>> simple modification and it still failed, I think continuing like this >>>> may go against the original intention). >>>>> >>>>> switch (i.tm.opcode_modifier.vexvvvv) >>>>> { >>>>> /* VEX.vvvv encodes the first source register operand. */ >>>>> case VexVVVV_SRC1: >>>>> if (!is_cpu (&i.tm, CpuXOP) || source == op) >>>>> { >>>>> v = dest - 1; >>>>> break; >>>>> } >>>>> /* For XOP: vpshl*, vpsha* and vprot*. */ >>>>> /* Fall through. */ >>>>> /* VEX.vvvv encodes the last source register operand. */ >>>>> case VexVVVV_SRC2: >>>>> v = source++; >>>>> break; >>>>> /* VEX.vvvv encodes the destination register operand. */ >>>>> case VexVVVV_DST: >>>>> v = dest--; >>>>> break; >>>>> default: >>>>> v = ~0; >>>>> break; >>>>> } >>>>> >>>>> Do you think we should add a separate patch 4 for XOP that removes >>>>> the >>>> special handling in match_template and completes its template? so we >>>> don't have to add special handling for src1vvvv or src2vvvv. This >>>> might go against your desire to reduce template size, but it would >>>> help simplify the logic. I'd like to know your thoughts. >>>> >>>> Indeed. You'd effectively revert earlier folding that I did. And the >>>> adjustment I suggested earlier ought to be small/simple enough. >>>> >>> >>> So, I continued working on the previous suggestion. With the following >> modification and it worked. >>> >>> @@ -8932,7 +8932,7 @@ match_template (char mnem_suffix) >>> || is_cpu (t, CpuAPX_F)); >>> if (!operand_type_match (overlap0, i.types[0]) >>> || !operand_type_match (overlap1, i.types[j]) >>> - || (t->operands == 3 >>> + || (t->operands == 3 && !is_cpu (t, CpuXOP) >>> && !operand_type_match (overlap2, i.types[1])) >>> || (check_register >>> && !operand_type_register_match (i.types[0], >> >> Just to mention it - this certainly isn't what I suggested. In fact I seem to >> vaguely recall that something similar was once proposed during the original >> APX work as well, where I then objected, too. >> >>> But I found that there are 4 test files that failed, I didn't find the doc on >> how to encode vprotb but I guess that is because I changed the default >> template from Src2VVVV| VexW0 to Src1VVVV| VexW1, then all the related >> test cases needed to be modified. Do you have any comments here? >>> >>> regexp "^[ ]*[a-f0-9]+: 8f e9 40 90 >> d8[ ]+vprotb %xmm7,%xmm0,%xmm3$" >>> line " 11ed: 8f e9 f8 90 df vprotb %xmm7,%xmm0,%xmm3" >> >> Well, while changing the templates is in principle possible, and the resulting >> code would still be correct, changing encodings it usually not a good idea. >> Thus when it can be avoided, it should be avoided, imo. Hence why I didn't >> suggest this, but to amend the code doing the operand swapping (for the >> case where operand order is controlled by XOP.W). >> > > I mistakenly thought this was what you wanted, and I also think this modification is unacceptable. Could you elaborate further on your original suggestion? At the bottom of match_template() we have case Opcode_VexW: /* Only the first two register operands need reversing, alongside flipping VEX.W. */ i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1; This is where I think you further want to adjust i.tm.opcode_modifier.vexvvvv. Jan