From: Jan Beulich
Date: Tue, 12 Dec 2023 13:39:38 +0100
Subject: Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
To: "Cui, Lili"
Cc: "Lu, Hongjiu", "binutils@sourceware.org"
References: <20231124070213.3886483-1-lili.cui@intel.com> <20231124070213.3886483-4-lili.cui@intel.com> <546c8890-0526-49a3-8310-319358bf55c2@suse.com> <0bb5fbcd-f58e-48ad-a5ee-3413b026f903@suse.com> <61ef66ac-ae1c-4c57-b800-475437e225e6@suse.com>

On 12.12.2023 13:32, Cui, Lili wrote:
>>>>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
>>>>>>>
>>>>>>>    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>>>> -    {
>>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>>>>>> -           || maybe_cpu (t, CpuFMA))
>>>>>>> -          && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
>>>>>>> +  {
>>>>>>> +     if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>>>>> +         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
>>>>>>> +         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
>>>>>>> +         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>>>>>>         {
>>>>>>>           if (need_evex_encoding ())
>>>>>>
>>>>>> There are several issues here:
>>>>>> - Why did you need to change (to the worse) the original code?
>>>>>> - Why did you not model the addition after that original code?
>>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no AVX512
>>>>>>   insn can be VEX-encoded?
>>>>>
>>>>> I don't understand what you mean, we have this combination.
>>>>>
>>>>> kmov, 0x90, AVX512BW&(AVX512BW|APX_F),
>>>>> Modrm|Vex128|EVex128|Space0F|VexW1||NoSuf, {
>>>>> RegMask||Unspecified|BaseIndex, RegMask }
>>>>
>>>> Oh, I'm sorry: I forgot about the mask register insns.
>>>>
>>>>>> - If these new macros are really needed for whatever reason, they shouldn't
>>>>>>   be added to opcodes/i386-opc.h when they're useful only in the assembler.
>>>>>> - Style requires a blank before the opening parenthesis in function
>>>>>>   invocations (which also covers function-like macro invocations).
>>>>>>
>>>>>> I think I asked before: How is it that you get away without
>>>>>> altering cpu_flags_match(), containing related and quite similar logic?
>>>>>>
>>>>>
>>>>> For the original logic ( ... || ... ) && ( ... || ...), the content
>>>>> in the first bracket and the content in the following brackets can be
>>>>> combined arbitrarily. I think it is Inaccurate.
>>>>
>>>> In which way? If there are issues with the existing code, these
>>>> issues want taking care of in separate (prereq) patches. Of course
>>>> there are assumptions made here about the CPU combinations that can
>>>> (and cannot) occur in any of our templates. Similar assumptions are imo
>>>> fine to make in the APX additions.
>>>>
>>>> Note how I used two nested if()s despite that not having been
>>>> necessary at that time. I did so in anticipation that for APX you'd
>>>> want to add another (separate) inner if(), rather than altering the
>>>> one that's there.
>>>
>>> Could we remove the CPU check here? it's a bit ugly and has limited effectiveness.
>>>
>>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>     {
>>>        if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
>>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
>>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>
>> I agree on the "a bit ugly" part, but taking what's there right now I don't
>> understand "has limited effectiveness". Of course you can remove any code
>> you want, provided you can prove nothing breaks.
>>
>
> Here is install_template().
> All I can say is that after removing the CPU check, no test cases failed. I know it's hard to convince you to delete this place, or what do you suggest to do with this? APX requires this, otherwise the test cases will fail.
>
> -     if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
> -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
> -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> -       {

So be it then (assuming you don't delete any pre-existing code there). As
said, I expect this will bite us later.

>>>>> Just found cpu_flags_match() has similar logic, I think the
>>>>> following is the only code related to CPUID alerts, but none of our
>>>>> combinations are related to cpuavx.
>>>>>
>>>>>       if (all.bitfield.cpuavx)
>>>>>         {
>>>>>           /* We need to check SSE2AVX with AVX.  */
>>>>>           if (!t->opcode_modifier.sse2avx
>>>>>               || (sse2avx && !i.prefix[DATA_PREFIX]))
>>>>>             match |= CPU_FLAGS_ARCH_MATCH;
>>>>>         }
>>>>
>>>> Not sure why you pick out this one. This special case is needed for
>>>> sse2avx; I don't see how it's related here. What I've been pointing
>>>> you at is the code in that function which follows a similar
>>>> "Dual VEX/EVEX templates ..." comment.
>>>>
>>>
>>> I know you're talking about this code, I'm just guessing what it does? Don't know what I missed.
>>
>> You pulled out this sse2avx code. Hence I was expecting you to tell me why
>> you consider it relevant here.
>>
> Here is cpu_flag_match().
>
> I rechecked the code, maybe you want to say I missed the outer loop.
>
>       cpu = cpu_flags_and (any, active);
>       if (cpu_flags_all_zero (&any) || !cpu_flags_all_zero (&cpu))
>         {
>           if (all.bitfield.cpuavx)
>             {
>               /* We need to check SSE2AVX with AVX.  */
>               if (!t->opcode_modifier.sse2avx
>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
>                 match |= CPU_FLAGS_ARCH_MATCH;
>             }
>           else
>             match |= CPU_FLAGS_ARCH_MATCH;
>         }

No, ...

>>> For example
>>>
>>> .arch .nobmi
>>> andn (%eax), %eax, %eax
>>>
>>> ----------------------------------------------------------------------
>>>   if (flag_code != CODE_64BIT)
>>>     active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
>>>   else
>>>     active = cpu_arch_flags;             ---> cpubmi = 0;
>>>   cpu = cpu_flags_and (all, active);     ---> cpuapx =1; cpubmi = 0;
>>>   if (cpu_flags_equal (&cpu, &all))      ---> &cpu and &all are not same.
>>>     {
>>>       ...
>>>     }
>>>   Return CPU_FLAGS_64BIT_MATCH
>>> ----------------------------------------------------------------------
>>> Then we will report an arch error.
>>>
>>>       if (supported != CPU_FLAGS_PERFECT_MATCH)
>>>         {
>>>           as_bad (_("`%s' is not supported on `%s%s'"),
>>>                   insn_name (current_templates.start),
>>>                   cpu_arch_name ? cpu_arch_name : default_arch,
>>>                   cpu_sub_arch_name ? cpu_sub_arch_name : "");
>>>           return NULL;
>>>         }
>>
>> Which is what we want, I think (for the particular example you picked)? Yet
>> again, I don't think I can see what you're trying to tell me. I also have to
>> confess I've lost track of whether we're discussing install_template(),
>> cpu_flag_match(), or both. For example in install_template() you may indeed
>> be able to get away with little or no changes, as long as there's no used
>> features tracking for APX (see the early ELF-specific part of output_insn()).
>> Things would be somewhat inconsistent then, but that may be tolerable (as
>> long as properly justified in the patch description). Not getting this into
>> proper shape right with the introduction of APX may bite us later, though.
>>
>
> Here is cpu_flag_match().
> I just want to say that for the APX part we don't need to handle it in the "Double VEX/EVEX Template...".

... I was referring to the dual VEX/EVEX logic. I have to admit I still don't
understand how you get away without touching that, but if everything works,
all is fine of course.

Jan
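
For readers following the ".arch .nobmi" / andn trace quoted above, here is a
minimal sketch of the "all features required" check it walks through. It is a
toy model only: the real code operates on gas's i386_cpu_flags structure, and
every toy_* / TOY_* name below is hypothetical. It also assumes, as the quoted
trace suggests, that the template's "all" set holds both APX_F and BMI.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for gas's i386_cpu_flags: one bit per feature.  */
typedef unsigned int toy_cpu_flags;

enum
{
  TOY_CPU_BMI   = 1 << 0,
  TOY_CPU_APX_F = 1 << 1
};

/* Toy counterpart of cpu_flags_and ().  */
static toy_cpu_flags
toy_flags_and (toy_cpu_flags x, toy_cpu_flags y)
{
  return x & y;
}

/* Toy counterpart of cpu_flags_equal ().  */
static bool
toy_flags_equal (toy_cpu_flags x, toy_cpu_flags y)
{
  return x == y;
}

/* Return true when every feature in ALL (what the template requires) is
   also present in ACTIVE (what the current .arch setting permits).  */
static bool
toy_arch_match (toy_cpu_flags all, toy_cpu_flags active)
{
  toy_cpu_flags cpu = toy_flags_and (all, active);

  return toy_flags_equal (cpu, all);
}

/* Replays the quoted example with the assumed template requirements.  */
static void
toy_demo (void)
{
  toy_cpu_flags all = TOY_CPU_APX_F | TOY_CPU_BMI;

  /* After ".arch .nobmi" the BMI bit is gone from ACTIVE: no arch match,
     so the assembler would go on to report the insn as unsupported.  */
  assert (!toy_arch_match (all, TOY_CPU_APX_F));

  /* With BMI still enabled the same template matches.  */
  assert (toy_arch_match (all, TOY_CPU_APX_F | TOY_CPU_BMI));
}
```

Calling toy_demo () from a small main () exercises both cases. The sketch
deliberately leaves out the separate "any" set and the dual VEX/EVEX handling
discussed in this thread.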