From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [IPv6:2a00:1450:4864:20::32b]) by sourceware.org (Postfix) with ESMTPS id 5730C3858C78 for ; Mon, 11 Dec 2023 12:27:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5730C3858C78 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 5730C3858C78 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::32b ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702297679; cv=none; b=khGKYNyopn7TpeMUvvsh2NOahMAQz6QRkmsXl9YEeiOiRLHTgyRtTuulRTjpMOUYs/iepVsk9HXn7ivtbghsaT99wBzvxwySvejq53O8LLWmrIKh1dq5/G6H2HO/t8vGjsmIyTI6XtBoV6iYi7qCS+PpEaryW2argJ8bLsowvZ8= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702297679; c=relaxed/simple; bh=wusd0BV5JGAIJg9iSLd+Ft6o3u8Qc2heOYS0pt3Jjmk=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=wdRQ0EoUv7tdSm68XxA5xAeQYlSgkrdnQF3V7D5QtHtxN+3OfK14ELU2f995Wln+wsjWiqaANHMpkvVBKNb/RvJw9HOXDwkLuEI/rGyhbC4YB9GruZmP7mXv8MCmGM04B6VuWqtc7+MvF7Kjlnmw5vHdPqqMtYfmkxhMNA4+wCw= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x32b.google.com with SMTP id 5b1f17b1804b1-40c2d50bfbfso19838485e9.0 for ; Mon, 11 Dec 2023 04:27:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1702297676; x=1702902476; darn=sourceware.org; h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc :to:content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=mi2SYLqd2HTSoSfdMJzCVqd8nsLaY0Q9S2e4pjc+93A=; b=Oz7HFGVh12vU42itS5asTTR8wVRSos4BbcWiB/l2wuxkNUU8u9CDGaPCmlTVEvdtIH xqVAFocA0JWXzUCBRmPig+KAvl0u0i1HycVrlA4Ld2RrVGx1/NrXjLTbmSGD8IdR8a/C XrRdiOBBYtFwCH8gH7SnH4EZP6KO2pD5aWuK7gIv165s0B5r5gnO+lmI2N/MLnQIFxH3 amyXQVja8/oq9baTs+ddBTrNPXQMiltm3A3A2u6c9B1QFnEAaY0t3AB8aVTK168vWyoh wO5NEGH0EeZJ4TNL/6/k6C7a0IJiLHVV32ICBHB8WTgo6IuE7Dm3Trv4nZic3Ja88vZq FUXA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702297676; x=1702902476; h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc :to:content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=mi2SYLqd2HTSoSfdMJzCVqd8nsLaY0Q9S2e4pjc+93A=; b=B2q+jVShCbxbmj3moZZs6vPoJjAJClDnp2r+kE0B5rEK4Z+/ZchT4Vpj1E4ceNEa4r ukuPGyIzQq+N3Gt4AVOGCyLtKvHNqWDHQEpduMl1fIXG8ozxsI0uxV44Tp92w9A81SXL I9voWxouaXq//qR6O6URtDKuhPCv822d+NCYqwlJbd4Jyq58pVlUhUWwgEtANDbxp6s+ YyE/JjjuEIzbkaXqm3U0I7xC6tvvpmfADVWx4J4xtZiGUbMzgyOUTIeBBLqahqbqdgx3 UPcmMY8D/uCPWftv+lG0FODJUOUlIiXrwNX6uCIOSvaEtliCjac9NNMDv39ZQGfckI5F GY9A== X-Gm-Message-State: AOJu0YwJwwh5aHgjpU9vdFcLXCMX2PGdHa7XYn+FZsfUjur+iUv88zhv 8kjT6sO7AobQsLGjCv281Clh X-Google-Smtp-Source: AGHT+IEjO1lesmRAqN68hWNkSEMUwFmOOoOV7z1d/iCNiYHOuOjQMlt0DwPKhTa6IngfqnWczixZnw== X-Received: by 2002:a05:600c:287:b0:40b:5e4a:2379 with SMTP id 7-20020a05600c028700b0040b5e4a2379mr2257515wmk.123.1702297675996; Mon, 11 Dec 2023 04:27:55 -0800 (PST) Received: from [10.156.60.236] (ip-037-024-206-209.um08.pools.vodafone-ip.de. [37.24.206.209]) by smtp.gmail.com with ESMTPSA id e12-20020a05600c4e4c00b0040b398f0585sm13056622wmq.9.2023.12.11.04.27.55 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 11 Dec 2023 04:27:55 -0800 (PST) Message-ID: Date: Mon, 11 Dec 2023 13:27:45 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 8/9] Support APX NDD optimized encoding. Content-Language: en-US To: "Cui, Lili" Cc: hongjiu.lu@intel.com, "Hu, Lin1" , binutils@sourceware.org References: <20231124070213.3886483-1-lili.cui@intel.com> <20231124070213.3886483-8-lili.cui@intel.com> From: Jan Beulich Autocrypt: addr=jbeulich@suse.com; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL In-Reply-To: <20231124070213.3886483-8-lili.cui@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3025.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SCC_5_SHORT_WORD_LINES,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 24.11.2023 08:02, Cui, Lili wrote: > --- a/gas/config/tc-i386.c > +++ b/gas/config/tc-i386.c > @@ -7148,6 +7148,58 @@ check_APX_operands (const insn_template *t) > return 0; > } > > +/* Check if the instruction use the REX registers. */ > +static bool > +check_RexOperands () > +{ > + for (unsigned int op = 0; op < i.operands; op++) > + { > + if (i.types[op].bitfield.class != Reg) > + continue; > + > + if (i.op[op].regs->reg_flags & (RegRex | RegRex64)) > + return true; > + } > + > + if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64))) > + || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64)))) > + return true; > + > + /* Check pseudo prefix {rex} are valid. */ > + return i.rex_encoding; Can this actually happen, when we're converting from EVEX to legacy? (Initially I wanted to ask about "rex" and alike prefixes, i.e. the non- pseudo ones.) > +} > + > +/* Optimize APX NDD insns to legacy insns. */ > +static unsigned int > +can_convert_NDD_to_legacy (const insn_template *t) > +{ > + unsigned int match_dest_op = ~0; > + > + if (t->opcode_modifier.vexvvvv == VexVVVV_DST No new callers are expected to appear (any time soon) and the sole caller has checked this already. Also with this check, ... > + && t->opcode_space == SPACE_EVEXMAP4 ... what (further) effect is this one intended to have? > + && !i.has_nf > + && i.reg_operands >= 2) > + { > + unsigned int dest = i.operands - 1; > + unsigned int src1 = i.operands - 2; > + unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0; > + > + if (i.types[src1].bitfield.class == Reg > + && i.op[src1].regs == i.op[dest].regs) > + match_dest_op = src1; > + /* If the first operand is the same as the third operand, > + these instructions need to support the ability to commutative > + the first two operands and still not change the semantics in order > + to be optimized. */ > + else if (i.types[src2].bitfield.class == Reg > + && i.op[src2].regs == i.op[dest].regs > + && optimize > 1 > + && t->opcode_modifier.commutative) Based on the "cheap conditions first" principle and to also be better in line with the comment, may I suggest + else if (optimize > 1 + && t->opcode_modifier.commutative + && i.types[src2].bitfield.class == Reg + && i.op[src2].regs == i.op[dest].regs) ? > + match_dest_op = src2; > + } > + return match_dest_op; > +} > + > /* Helper function for the progress() macro in match_template(). */ > static INLINE enum i386_error progress (enum i386_error new, > enum i386_error last, > @@ -7675,6 +7727,61 @@ match_template (char mnem_suffix) > i.memshift = memshift; > } > > + /* If we can optimize a NDD insn to legacy insn, like > + add %r16, %r8, %r8 -> add %r16, %r8, > + add %r8, %r16, %r8 -> add %r16, %r8, then rematch template. > + Note that the semantics have not been changed. */ > + if (optimize > + && !i.no_optimize > + && i.vec_encoding != vex_encoding_evex > + && t + 1 < current_templates->end > + && !t[1].opcode_modifier.evex > + && t[1].opcode_space <= SPACE_0F38 > + && t->opcode_modifier.vexvvvv == VexVVVV_DST) > + { > + unsigned int match_dest_op = can_convert_NDD_to_legacy (t); > + size_match = true; This would perhaps better ... > + if (match_dest_op != (unsigned int) ~0) > + { ... live here > + /* We ensure that the next template has the same input > + operands as the original matching template by the first > + opernd (ATT), thus avoiding the error caused by the wrong order > + of insns in i386.tbl. */ I'm sorry, but I (still) can't make sense of this last part of the comment, after the comma. > + overlap0 = operand_type_and (i.types[0], > + t[1].operand_types[0]); > + if (t->opcode_modifier.d) > + overlap1 = operand_type_and (i.types[0], > + t[1].operand_types[1]); > + if (!operand_type_match (overlap0, i.types[0]) > + && (!t->opcode_modifier.d > + || (t->opcode_modifier.d > + && !operand_type_match (overlap1, i.types[0])))) What's wrong with the simpler && (!t->opcode_modifier.d || !operand_type_match (overlap1, i.types[0]))) ? > + size_match = false; Yet still, and despite the improved comment, I don't really see what all of this is about. What cases would be mis-handled if this wasn't there? > + if (size_match > + /* Optimizing some non-legacy-map0/1 without REX/REX2 prefix will be valuable. */ > + && (t[1].opcode_space <= SPACE_0F Where a comment is placed is meaningful to understanding what it's about. The wayy you have it, is says "non-legacy-map0/1" on a check that the (next) encoding is map0 or map1. I think this wants moving down by a line, and even then also re-wording: If I didn't (vaguely) recall context, I don't think I could derive what is meant. Iirc this is about legacy encodings being one byte shorter for certain 0f38 space insns when they don't require a REX prefix to encode. How about something like "Some non-legacy-map0/1 insns can be shorter when legacy-encoded and when no REX prefix is required"? > + || (!check_EgprOperands (t + 1) > + && !check_RexOperands () I'm not going to insist that you adjust this, but these two calls side by side demonstrate a curious inconsistency: The former requires t to be passed in. If you keep it like that, I may change this down the road, the more that the t-related aspect isn't relevant here at all (and could hence be moved out of the function to the single place where it is needed). > + && !i.op[i.operands - 1].regs->reg_type.bitfield.qword))) > + { > + unsigned int src1 = i.operands - 2; > + unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0; > + > + if (match_dest_op == src2) > + swap_2_operands (match_dest_op, src1); Isn't it wrong (albeit benign) to swap when i.operands == 2? IOW wouldn't if (i.reg_operands > 2 && match_dest_op == i.operands - 3) swap_2_operands (match_dest_op, i.operands - 2); be more in line with what's actually wanted? > --- /dev/null > +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s > @@ -0,0 +1,123 @@ > +# Check 64bit APX NDD instructions with optimized encoding > + > + .text > +_start: > +add %r31,%r8,%r8 > +addb %r31b,%r8b,%r8b > +{store} add %r31,%r8,%r8 > +{load} add %r31,%r8,%r8 > +add %r31,(%r8),%r31 > +add (%r31),%r8,%r8 > +add $0x12344433,%r15,%r15 > +add $0xfffffffff4332211,%r8,%r8 > +inc %r31,%r31 > +incb %r31b,%r31b > +sub %r15,%r17,%r17 > +subb %r15b,%r17b,%r17b > +sub %r15,(%r8),%r15 > +sub (%r15,%rax,1),%r16,%r16 > +sub $0x1234,%r30,%r30 > +dec %r17,%r17 > +decb %r17b,%r17b > +sbb %r15,%r17,%r17 > +sbbb %r15b,%r17b,%r17b > +sbb %r15,(%r8),%r15 > +sbb (%r15,%rax,1),%r16,%r16 > +sbb $0x1234,%r30,%r30 > +and %r15,%r17,%r17 > +andb %r15b,%r17b,%r17b > +and %r15,(%r8),%r15 > +and (%r15,%rax,1),%r16,%r16 > +and $0x1234,%r30,%r30 > +or %r15,%r17,%r17 > +orb %r15b,%r17b,%r17b > +or %r15,(%r8),%r15 > +or (%r15,%rax,1),%r16,%r16 > +or $0x1234,%r30,%r30 > +xor %r15,%r17,%r17 > +xorb %r15b,%r17b,%r17b > +xor %r15,(%r8),%r15 > +xor (%r15,%rax,1),%r16,%r16 > +xor $0x1234,%r30,%r30 > +adc %r15,%r17,%r17 > +adcb %r15b,%r17b,%r17b > +adc %r15,(%r8),%r15 > +adc (%r15,%rax,1),%r16,%r16 > +adc $0x1234,%r30,%r30 > +neg %r17,%r17 > +negb %r17b,%r17b > +not %r17,%r17 > +notb %r17b,%r17b > +imul 0x90909(%eax),%edx,%edx > +imul 0x909(%rax,%r31,8),%rdx,%rdx > +imul %rdx,%rax,%rdx > +rol %r31,%r31 > +rolb %r31b,%r31b > +rol $0x2,%r12,%r12 > +rolb $0x2,%r12b,%r12b > +ror %r31,%r31 > +rorb %r31b,%r31b > +ror $0x2,%r12,%r12 > +rorb $0x2,%r12b,%r12b > +rcl %r31,%r31 > +rclb %r31b,%r31b > +rcl $0x2,%r12,%r12 > +rclb $0x2,%r12b,%r12b > +rcr %r31,%r31 > +rcrb %r31b,%r31b > +rcr $0x2,%r12,%r12 > +rcrb $0x2,%r12b,%r12b > +sal %r31,%r31 > +salb %r31b,%r31b > +sal $0x2,%r12,%r12 > +salb $0x2,%r12b,%r12b > +shl %r31,%r31 > +shlb %r31b,%r31b > +shl $0x2,%r12,%r12 > +shlb $0x2,%r12b,%r12b > +shr %r31,%r31 > +shrb %r31b,%r31b > +shr $0x2,%r12,%r12 > +shrb $0x2,%r12b,%r12b > +sar %r31,%r31 > +sarb %r31b,%r31b > +sar $0x2,%r12,%r12 > +sarb $0x2,%r12b,%r12b > +shld $0x1,%r12,(%rax),%r12 > +shld $0x2,%r8,%r12,%r12 > +shld $0x2,%r8,%r12,%r8 > +shld %cl,%r9,(%rax),%r9 > +shld %cl,%r12,%r16,%r16 > +shld %cl,%r12,%r16,%r12 > +shrd $0x1,%r12,(%rax),%r12 > +shrd $0x1,%r13,%r12,%r12 > +shrd $0x1,%r13,%r12,%r13 > +shrd %cl,%r9,(%rax),%r9 > +shrd %cl,%r12,%r16,%r16 > +shrd %cl,%r12,%r16,%r12 > +cmovo 0x90909090(%eax),%edx,%edx > +cmovno 0x90909090(%eax),%edx,%edx > +cmovb 0x90909090(%eax),%edx,%edx > +cmovae 0x90909090(%eax),%edx,%edx > +cmove 0x90909090(%eax),%edx,%edx > +cmovne 0x90909090(%eax),%edx,%edx > +cmovbe 0x90909090(%eax),%edx,%edx > +cmova 0x90909090(%eax),%edx,%edx > +cmovs 0x90909090(%eax),%edx,%edx > +cmovns 0x90909090(%eax),%edx,%edx > +cmovp 0x90909090(%eax),%edx,%edx > +cmovnp 0x90909090(%eax),%edx,%edx > +cmovl 0x90909090(%eax),%edx,%edx > +cmovge 0x90909090(%eax),%edx,%edx > +cmovle 0x90909090(%eax),%edx,%edx > +cmovg 0x90909090(%eax),%edx,%edx > +adcx %ebx,%eax,%eax > +adcx %eax,%ebx,%eax > +adcx %rbx,%rax,%rax > +adcx %r15,%r8,%r8 Might this better be adcx %r15d,%r8d,%r8d to avoid having two exclusion criteria (REX register use and REX.W set)? Or maybe even split to further separate source and destination: adcx %eax,%r8d,%r8d adcx %r15d,%eax,%eax ? > +adcx (%edx,%ecx,1),%eax,%eax > +adox %ebx,%eax,%eax > +adox %eax,%ebx,%eax > +adox %rbx,%rax,%rax > +adox %r15,%r8,%r8 > +adox (%edx,%ecx,1),%eax,%eax Same here then. Jan