From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <7add52dd-e2ab-4a65-8636-f5bb41d4d45c@suse.com>
Date: Wed, 3 Apr 2024 10:19:00 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH 4/5] x86/APX: extend SSE2AVX coverage
Content-Language: en-US
To: "Cui, Lili"
Cc: "H.J. Lu", Binutils
References: <155929a3-eb8b-4b82-a4ca-84ab6de34b97@suse.com>
From: Jan Beulich
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 03.04.2024 09:59, Cui, Lili wrote:
>>> This conversion is clever, although the mnemonic has changed, but
>>> considering it is controlled by -msse2avx, maybe we can mention in the
>>> option that it might change the mnemonic. Judging from the option name
>>> alone, it is difficult for users to predict that the mnemonic will change
>>> (traditionally, it seems to just add V).
>>
>> I don't think doc adjustment is needed here. We already have at least one
>> example where the mnemonic also changes: CVTPI2PD -> VCVTDQ2PD.
>>
>
> Oh, there has been such a conversion before. Another thing that comes to mind is that sse2avx was previously used to support sse to vex conversion. This option works on machines that don't support evex. We now extend sse to evex, which makes this option unavailable on machines that do not support the evex instruction (e.g. hybrid machines like Alderlake). Do you think we should add a new option?

That's a question I've tentatively answered with "No". SSE => VEX requires
systems supporting AVX. SSE-with-eGPR requires systems with APX.
SSE-with-eGPR => EVEX similarly can rely on APX being there, and I expect
all such systems will support at least AVX10/128.
If that is deemed a wrong assumption, then indeed we may need to consider
adding a new option (but not -msse2avx512 as you suggest further down, as
SSE only ever covers 128-bit operations; -msse2avx10 maybe).

>>>> Should we also convert %xmm-only templates (to consistently permit
>>>> use of {evex})? Or should we reject use of {evex}, but then also that
>>>> of {vex}/{vex3}?
>>>
>>> Do you mean SHA and KeyLocker?
>>
>> No, I mean templates with all XMM operands and no memory ones. Such
>> don't use eGPR-s, yet could be converted to their EVEX counterparts, too (by
>> way of the programmer adding {evex} to the _legacy_ insn). Hence the
>> question on how to treat {evex} there, and then also {vex} / {vex3}. Take, for
>> example, MOVHLPS or MOVLHPS.
>
> I'm not sure if you want to support this conversion under -sse2avx. I think this conversion is only used by people writing assembler by hand.

Aiui -msse2avx is there mainly for hand-written assembly. Compilers will
do better insn selection on their own anyway.

> As for adding a prefix to convert sse to vex or evex, I think this requirement doesn't make much sense at the moment, maybe in the future if evex is faster than the vex instruction we can provide an option like sse2avx512 to achieve this conversion.

That's not my point. Consider this example:

	.text
sse2avx:
		movlhps	%xmm0, %xmm1
	{vex}	movlhps	%xmm0, %xmm1
	{vex3}	movlhps	%xmm0, %xmm1
	{evex}	movlhps	%xmm0, %xmm1

		movlps	(%rax), %xmm1
	{vex}	movlps	(%rax), %xmm1
	{vex3}	movlps	(%rax), %xmm1
	{evex}	movlps	(%rax), %xmm1

Other than the {evex}-prefixed lines, everything assembles smoothly prior
to the patch here. IOW even {vex3} has an effect on the non-VEX mnemonic.
With my patch as it is now, the 2nd {evex}-prefixed line assembles fine,
while the 1st doesn't. This is simply inconsistent. Hence why I see two
options: Disallow all three pseudo-prefixes on legacy mnemonics, or permit
{evex} consistently, too.

Jan