From: Hongtao Liu
Date: Tue, 14 May 2024 16:46:04 +0800
Subject: Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md
To: Roger Sayle
Cc: gcc-patches@gcc.gnu.org, Uros Bizjak
In-Reply-To: <001801daa4b7$62d704c0$28850e40$@nextmovesoftware.com>

On Mon, May 13, 2024 at 5:57 AM Roger Sayle wrote:
>
>
> This patch improves the way that the x86 backend recognizes and
> expands AVX512's bitwise ternary logic (vpternlog) instructions.

I like the patch.

1 file changed, 25 insertions(+), 1 deletion(-)
gcc/config/i386/i386-expand.cc | 26 +++++++++++++++++++++++++-

modified gcc/config/i386/i386-expand.cc
@@ -25601,6 +25601,7 @@ ix86_gen_bcst_mem (machine_mode mode, rtx x)
 int
 ix86_ternlog_idx (rtx op, rtx *args)
 {
+  /* Nice dynamic programming :)  */
   int idx0, idx1;

   if (!op)
@@ -25651,6 +25652,7 @@ ix86_ternlog_idx (rtx op, rtx *args)
           return 0xaa;
         }
       /* Maximum of one volatile memory reference per expression.  */
+      /* According to the comment above, shouldn't this be && ?  */
       if (side_effects_p (op) || side_effects_p (args[2]))
         return -1;
       if (rtx_equal_p (op, args[2]))
@@ -25666,6 +25668,8 @@ ix86_ternlog_idx (rtx op, rtx *args)

     case SUBREG:
       if (!VECTOR_MODE_P (GET_MODE (SUBREG_REG (op)))
+          /* It could be TI/OI/XImode since these are just bit operations,
+             so is VECTOR_MODE_P needed at all?  */
           || GET_MODE_SIZE (GET_MODE (SUBREG_REG (op)))
              != GET_MODE_SIZE (GET_MODE (op)))
         return -1;
@@ -25701,7 +25705,7 @@ ix86_ternlog_idx (rtx op, rtx *args)
     case UNSPEC:
       if (XINT (op, 1) != UNSPEC_VTERNLOG
           || XVECLEN (op, 0) != 4
-          || CONST_INT_P (XVECEXP (op, 0, 3)))
+          || !CONST_INT_P (XVECEXP (op, 0, 3)))
         return -1;

       /* TODO: Handle permuted operands.  */
@@ -25778,10 +25782,13 @@ ix86_ternlog_operand_p (rtx op)
       /* Prefer pxor.  */
       if (ix86_ternlog_leaf_p (XEXP (op, 0), mode)
           && (ix86_ternlog_leaf_p (op1, mode)
+              /* Please add a comment: this is because we already have one_cmpl2.  */
               || vector_all_ones_operand (op1, mode)))
         return false;
       break;

+      /* Wouldn't pternlog match (SUBREG: (REG))???  It should also be excluded.
+         Similarly for SUBREG: (AND/IOR/XOR)?  */
     default:
       break;
     }
@@ -25865,25 +25872,35 @@ ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2, int idx,

     case 0x0a: /* ~a&c */
       if ((!op1 || !side_effects_p (op1))
+          /* Shouldn't op1 always be a register_operand with no side effects when it exists?
+             _vternlog_mask only supports register_operand for op1.
+             ix86_ternlog_idx only assigns a REG to args[1].
+             Ditto for op0.  Also, shouldn't we add op2 && register_operand (op2, mode)
+             to avoid a segmentation fault?  */
           && register_operand (op0, mode)
           && register_operand (op2, mode))
         return ix86_expand_ternlog_andnot (mode, op0, op1, target);
+      /* op2 instead of op1???  */
       break;

     case 0x0c: /* ~a&b */
       if ((!op2 || !side_effects_p (op2))
           && register_operand (op0, mode)
           && register_operand (op1, mode))
+        /* If op0 and op1 exist, they must be register_operands?  So just op0 && op1?  */
         return ix86_expand_ternlog_andnot (mode, op0, op1, target);
       break;

     case 0x0f: /* ~a */
       if ((!op1 || !side_effects_p (op1))
+          /* No need for !side_effects_p for op1?  */
+          /* Ditto.  */
           && (!op2 || !side_effects_p (op2)))
         {
           if (GET_MODE (op0) != mode)
             op0 = gen_lowpart (mode, op0);
           if (!TARGET_64BIT && !register_operand (op0, mode))
+            /* op0 must be a register_operand when it exists, no?  */
             op0 = force_reg (mode, op0);
           emit_move_insn (target, gen_rtx_XOR (mode, op0, CONSTM1_RTX (mode)));
           return target;
@@ -25894,6 +25911,7 @@ ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2, int idx,
       if ((!op0 || !side_effects_p (op0))
           && register_operand (op1, mode)
           && register_operand (op2, mode))
+        /* op1 && op2 && register_operand (op2, mode)??  */
         return ix86_expand_ternlog_andnot (mode, op1, op2, target);
       break;

@@ -25901,12 +25919,14 @@ ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2, int idx,
       if ((!op2 || !side_effects_p (op2))
           && register_operand (op0, mode)
           && register_operand (op1, mode))
+        /* op0 && op1?  */
         return ix86_expand_ternlog_andnot (mode, op1, op0, target);
       break;

     case 0x33: /* ~b */
       if ((!op0 || !side_effects_p (op0))
           && (!op2 || !side_effects_p (op2)))
+        /* op1 && (!op2 || !side_effects_p (op2)) ?  */
         {
           if (GET_MODE (op1) != mode)
             op1 = gen_lowpart (mode, op1);
@@ -26051,6 +26071,10 @@ ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2, int idx,
           tmp2 = ix86_gen_bcst_mem (mode, op2);
           if (!tmp2)
             tmp2 = validize_mem (force_const_mem (mode, op2));
+          /* Can we use ix86_expand_vector_move here?  It will try to move
+             the integer to a GPR and broadcast the GPR to the vector register.
+             It should be faster than a constant pool, and PR115021 should be
+             solved another way instead of this workaround.  */
         }
       else
         tmp2 = op2;

-- 
BR,
Hongtao
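The case labels above (0x0a for ~a&c, 0x0c for ~a&b, 0x0f for ~a, 0x33 for ~b) can be checked by hand with the idea behind what the review calls "nice dynamic programming" in ix86_ternlog_idx: each leaf gets a truth-table index (a = 0xf0, b = 0xcc, c = 0xaa), and each NOT/AND/IOR/XOR node combines its operands' indices with the same bitwise operation. Below is a minimal C++ sketch of that idea on a toy expression tree, not GCC's RTL code. Expr, Kind, ternlog_idx and ternlog_eval are hypothetical illustration names, and ternlog_eval assumes the vpternlog bit order in which the first source supplies bit 2 of the immediate index.

```cpp
#include <cassert>
#include <cstdint>

// Truth-table indices of the three leaves.  Bit i of the 8-bit immediate is
// the result for input bits (a, b, c) = (bit 2, bit 1, bit 0) of i, so the
// tables for a, b and c on their own are 0xf0, 0xcc and 0xaa.
constexpr int kIdxA = 0xf0;
constexpr int kIdxB = 0xcc;
constexpr int kIdxC = 0xaa;

// Hypothetical toy expression tree standing in for RTL.
enum class Kind { A, B, C, Not, And, Ior, Xor };

struct Expr
{
  Kind kind;
  const Expr *x;  // first operand, nullptr for a leaf
  const Expr *y;  // second operand, nullptr for a leaf or Not
};

// Bottom-up: a node's index is its operands' indices combined with the
// node's own bitwise operation, so the root yields the immediate.
int
ternlog_idx (const Expr &e)
{
  switch (e.kind)
    {
    case Kind::A: return kIdxA;
    case Kind::B: return kIdxB;
    case Kind::C: return kIdxC;
    case Kind::Not: return ~ternlog_idx (*e.x) & 0xff;
    case Kind::And: return ternlog_idx (*e.x) & ternlog_idx (*e.y);
    case Kind::Ior: return ternlog_idx (*e.x) | ternlog_idx (*e.y);
    case Kind::Xor: return ternlog_idx (*e.x) ^ ternlog_idx (*e.y);
    }
  return -1;
}

// Bitwise reference semantics for one 64-bit lane (assumed bit order):
// result bit k is bit ((a_k << 2) | (b_k << 1) | c_k) of IMM.
std::uint64_t
ternlog_eval (std::uint8_t imm, std::uint64_t a, std::uint64_t b,
              std::uint64_t c)
{
  std::uint64_t r = 0;
  for (int k = 0; k < 64; k++)
    {
      unsigned i = static_cast<unsigned> (((a >> k & 1) << 2)
                                          | ((b >> k & 1) << 1)
                                          | (c >> k & 1));
      r |= static_cast<std::uint64_t> ((imm >> i) & 1) << k;
    }
  return r;
}

// Leaves and the expressions from the 0x0a, 0x0c, 0x0f and 0x33 cases.
const Expr leaf_a{Kind::A, nullptr, nullptr};
const Expr leaf_b{Kind::B, nullptr, nullptr};
const Expr leaf_c{Kind::C, nullptr, nullptr};
const Expr not_a{Kind::Not, &leaf_a, nullptr};       // ~a
const Expr not_b{Kind::Not, &leaf_b, nullptr};       // ~b
const Expr not_a_and_c{Kind::And, &not_a, &leaf_c};  // ~a&c
const Expr not_a_and_b{Kind::And, &not_a, &leaf_b};  // ~a&b
```

With this encoding, ternlog_idx (not_a_and_c) returns 0x0a and ternlog_eval (0x0a, a, b, c) equals ~a & c bit for bit. Assuming ix86_expand_ternlog_andnot (mode, x, y, target) computes ~x & y, as the 0x0c case suggests, the 0x0a case needs op2 (c) rather than op1 (b) as its second operand, which is the "op2 instead of op1???" question above.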