From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Roger Sayle"
To:
Cc: "'Uros Bizjak'"
Subject: [x86 PATCH] PR target/113231: Improved costs in Scalar-To-Vector (STV) pass.
Date: Sat, 6 Jan 2024 13:30:34 -0000
Message-ID: <03c401da40a4$8819fe30$984dfa90$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_03C5_01DA40A4.881A7360"

This is a multipart message in MIME format.

------=_NextPart_000_03C5_01DA40A4.881A7360
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit


This patch improves the cost/gain calculation used during the i386 backend's
SImode/DImode scalar-to-vector (STV) conversion pass.  The current code
handles loads and stores, but doesn't consider that converting other
scalar operations with a memory destination requires an explicit load
before and an explicit store after the vector equivalent.

To ease the review, the significant change looks like:

         /* For operations on memory operands, include the overhead
            of explicit load and store instructions.  */
         if (MEM_P (dst))
           igain += !optimize_insn_for_size_p ()
                    ? (m * (ix86_cost->int_load[2]
                            + ix86_cost->int_store[2])
                       - (ix86_cost->sse_load[sse_cost_idx] +
                          ix86_cost->sse_store[sse_cost_idx]))
                    : -COSTS_N_BYTES (8);

However, the patch itself is complicated by a change in indentation,
which leads to a number of lines with only whitespace changes.
For architectures where integer load/store costs are the same as
vector load/store costs, there should be no change without -Os/-Oz.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-01-06  Roger Sayle

gcc/ChangeLog
        PR target/113231
        * config/i386/i386-features.cc (compute_convert_gain): Include
        the overhead of explicit load and store (movd) instructions when
        converting non-store scalar operations with memory destinations.

gcc/testsuite/ChangeLog
        PR target/113231
        * gcc.target/i386/pr113231.c: New test case.

Thanks again,
Roger
--

------=_NextPart_000_03C5_01DA40A4.881A7360
Content-Type: text/plain; name="patchvc.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="patchvc.txt"

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 4ae3e75..3677aef 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -563,183 +563,195 @@ general_scalar_chain::compute_convert_gain ()
       else if (MEM_P (src) && REG_P (dst))
         igain += m * ix86_cost->int_load[2] - ix86_cost->sse_load[sse_cost_idx];
       else
-        switch (GET_CODE (src))
-          {
-          case ASHIFT:
-          case ASHIFTRT:
-          case LSHIFTRT:
-            if (m == 2)
-              {
-                if (INTVAL (XEXP (src, 1)) >= 32)
-                  igain += ix86_cost->add;
-                /* Gain for extend highpart case.  */
-                else if (GET_CODE (XEXP (src, 0)) == ASHIFT)
-                  igain += ix86_cost->shift_const - ix86_cost->sse_op;
-                else
-                  igain += ix86_cost->shift_const;
-              }
-
-            igain += ix86_cost->shift_const - ix86_cost->sse_op;
+        {
+          /* For operations on memory operands, include the overhead
+             of explicit load and store instructions.  */
+          if (MEM_P (dst))
+            igain += !optimize_insn_for_size_p ()
+                     ? (m * (ix86_cost->int_load[2]
+                             + ix86_cost->int_store[2])
+                        - (ix86_cost->sse_load[sse_cost_idx] +
+                           ix86_cost->sse_store[sse_cost_idx]))
+                     : -COSTS_N_BYTES (8);
 
-            if (CONST_INT_P (XEXP (src, 0)))
-              igain -= vector_const_cost (XEXP (src, 0));
-            break;
+          switch (GET_CODE (src))
+            {
+            case ASHIFT:
+            case ASHIFTRT:
+            case LSHIFTRT:
+              if (m == 2)
+                {
+                  if (INTVAL (XEXP (src, 1)) >= 32)
+                    igain += ix86_cost->add;
+                  /* Gain for extend highpart case.  */
+                  else if (GET_CODE (XEXP (src, 0)) == ASHIFT)
+                    igain += ix86_cost->shift_const - ix86_cost->sse_op;
+                  else
+                    igain += ix86_cost->shift_const;
+                }
 
-          case ROTATE:
-          case ROTATERT:
-            igain += m * ix86_cost->shift_const;
-            if (TARGET_AVX512VL)
-              igain -= ix86_cost->sse_op;
-            else if (smode == DImode)
-              {
-                int bits = INTVAL (XEXP (src, 1));
-                if ((bits & 0x0f) == 0)
-                  igain -= ix86_cost->sse_op;
-                else if ((bits & 0x07) == 0)
-                  igain -= 2 * ix86_cost->sse_op;
-                else
-                  igain -= 3 * ix86_cost->sse_op;
-              }
-            else if (INTVAL (XEXP (src, 1)) == 16)
-              igain -= ix86_cost->sse_op;
-            else
-              igain -= 2 * ix86_cost->sse_op;
-            break;
+              igain += ix86_cost->shift_const - ix86_cost->sse_op;
 
-          case AND:
-          case IOR:
-          case XOR:
-          case PLUS:
-          case MINUS:
-            igain += m * ix86_cost->add - ix86_cost->sse_op;
-            /* Additional gain for andnot for targets without BMI.  */
-            if (GET_CODE (XEXP (src, 0)) == NOT
-                && !TARGET_BMI)
-              igain += m * ix86_cost->add;
-
-            if (CONST_INT_P (XEXP (src, 0)))
-              igain -= vector_const_cost (XEXP (src, 0));
-            if (CONST_INT_P (XEXP (src, 1)))
-              igain -= vector_const_cost (XEXP (src, 1));
-            if (MEM_P (XEXP (src, 1)))
-              {
-                if (optimize_insn_for_size_p ())
-                  igain -= COSTS_N_BYTES (m == 2 ? 3 : 5);
-                else
-                  igain += m * ix86_cost->int_load[2]
-                           - ix86_cost->sse_load[sse_cost_idx];
-              }
-            break;
+              if (CONST_INT_P (XEXP (src, 0)))
+                igain -= vector_const_cost (XEXP (src, 0));
+              break;
 
-          case NEG:
-          case NOT:
-            igain -= ix86_cost->sse_op + COSTS_N_INSNS (1);
+            case ROTATE:
+            case ROTATERT:
+              igain += m * ix86_cost->shift_const;
+              if (TARGET_AVX512VL)
+                igain -= ix86_cost->sse_op;
+              else if (smode == DImode)
+                {
+                  int bits = INTVAL (XEXP (src, 1));
+                  if ((bits & 0x0f) == 0)
+                    igain -= ix86_cost->sse_op;
+                  else if ((bits & 0x07) == 0)
+                    igain -= 2 * ix86_cost->sse_op;
+                  else
+                    igain -= 3 * ix86_cost->sse_op;
+                }
+              else if (INTVAL (XEXP (src, 1)) == 16)
+                igain -= ix86_cost->sse_op;
+              else
+                igain -= 2 * ix86_cost->sse_op;
+              break;
 
-            if (GET_CODE (XEXP (src, 0)) != ABS)
-              {
+            case AND:
+            case IOR:
+            case XOR:
+            case PLUS:
+            case MINUS:
+              igain += m * ix86_cost->add - ix86_cost->sse_op;
+              /* Additional gain for andnot for targets without BMI.  */
+              if (GET_CODE (XEXP (src, 0)) == NOT
+                  && !TARGET_BMI)
                 igain += m * ix86_cost->add;
-                break;
-              }
-            /* FALLTHRU */
-
-          case ABS:
-          case SMAX:
-          case SMIN:
-          case UMAX:
-          case UMIN:
-            /* We do not have any conditional move cost, estimate it as a
-               reg-reg move.  Comparisons are costed as adds.  */
-            igain += m * (COSTS_N_INSNS (2) + ix86_cost->add);
-            /* Integer SSE ops are all costed the same.  */
-            igain -= ix86_cost->sse_op;
-            break;
 
-          case COMPARE:
-            if (XEXP (src, 1) != const0_rtx)
-              {
-                /* cmp vs. pxor;pshufd;ptest.  */
-                igain += COSTS_N_INSNS (m - 3);
-              }
-            else if (GET_CODE (XEXP (src, 0)) != AND)
-              {
-                /* test vs. pshufd;ptest.  */
-                igain += COSTS_N_INSNS (m - 2);
-              }
-            else if (GET_CODE (XEXP (XEXP (src, 0), 0)) != NOT)
-              {
-                /* and;test vs. pshufd;ptest.  */
-                igain += COSTS_N_INSNS (2 * m - 2);
-              }
-            else if (TARGET_BMI)
-              {
-                /* andn;test vs. pandn;pshufd;ptest.  */
-                igain += COSTS_N_INSNS (2 * m - 3);
-              }
-            else
-              {
-                /* not;and;test vs. pandn;pshufd;ptest.  */
-                igain += COSTS_N_INSNS (3 * m - 3);
-              }
-            break;
+              if (CONST_INT_P (XEXP (src, 0)))
+                igain -= vector_const_cost (XEXP (src, 0));
+              if (CONST_INT_P (XEXP (src, 1)))
+                igain -= vector_const_cost (XEXP (src, 1));
+              if (MEM_P (XEXP (src, 1)))
+                {
+                  if (optimize_insn_for_size_p ())
+                    igain -= COSTS_N_BYTES (m == 2 ? 3 : 5);
+                  else
+                    igain += m * ix86_cost->int_load[2]
+                             - ix86_cost->sse_load[sse_cost_idx];
+                }
+              break;
 
-          case CONST_INT:
-            if (REG_P (dst))
-              {
-                if (optimize_insn_for_size_p ())
-                  {
-                    /* xor (2 bytes) vs. xorps (3 bytes).  */
-                    if (src == const0_rtx)
-                      igain -= COSTS_N_BYTES (1);
-                    /* movdi_internal vs. movv2di_internal.  */
-                    /* => mov (5 bytes) vs. movaps (7 bytes).  */
-                    else if (x86_64_immediate_operand (src, SImode))
-                      igain -= COSTS_N_BYTES (2);
-                    else
-                      /* ??? Larger immediate constants are placed in the
-                         constant pool, where the size benefit/impact of
-                         STV conversion is affected by whether and how
-                         often each constant pool entry is shared/reused.
-                         The value below is empirically derived from the
-                         CSiBE benchmark (and the optimal value may drift
-                         over time).  */
-                      igain += COSTS_N_BYTES (0);
-                  }
-                else
-                  {
-                    /* DImode can be immediate for TARGET_64BIT
-                       and SImode always.  */
-                    igain += m * COSTS_N_INSNS (1);
-                    igain -= vector_const_cost (src);
-                  }
-              }
-            else if (MEM_P (dst))
-              {
-                igain += (m * ix86_cost->int_store[2]
-                          - ix86_cost->sse_store[sse_cost_idx]);
-                igain -= vector_const_cost (src);
-              }
-            break;
+            case NEG:
+            case NOT:
+              igain -= ix86_cost->sse_op + COSTS_N_INSNS (1);
 
-          case VEC_SELECT:
-            if (XVECEXP (XEXP (src, 1), 0, 0) == const0_rtx)
-              {
-                // movd (4 bytes) replaced with movdqa (4 bytes).
-                if (!optimize_insn_for_size_p ())
-                  igain += ix86_cost->sse_to_integer - ix86_cost->xmm_move;
-              }
-            else
-              {
-                // pshufd; movd replaced with pshufd.
-                if (optimize_insn_for_size_p ())
-                  igain += COSTS_N_BYTES (4);
-                else
-                  igain += ix86_cost->sse_to_integer;
-              }
-            break;
+              if (GET_CODE (XEXP (src, 0)) != ABS)
+                {
+                  igain += m * ix86_cost->add;
+                  break;
+                }
+              /* FALLTHRU */
+
+            case ABS:
+            case SMAX:
+            case SMIN:
+            case UMAX:
+            case UMIN:
+              /* We do not have any conditional move cost, estimate it as a
+                 reg-reg move.  Comparisons are costed as adds.  */
+              igain += m * (COSTS_N_INSNS (2) + ix86_cost->add);
+              /* Integer SSE ops are all costed the same.  */
+              igain -= ix86_cost->sse_op;
+              break;
 
-          default:
-            gcc_unreachable ();
-          }
+            case COMPARE:
+              if (XEXP (src, 1) != const0_rtx)
+                {
+                  /* cmp vs. pxor;pshufd;ptest.  */
+                  igain += COSTS_N_INSNS (m - 3);
+                }
+              else if (GET_CODE (XEXP (src, 0)) != AND)
+                {
+                  /* test vs. pshufd;ptest.  */
+                  igain += COSTS_N_INSNS (m - 2);
+                }
+              else if (GET_CODE (XEXP (XEXP (src, 0), 0)) != NOT)
+                {
+                  /* and;test vs. pshufd;ptest.  */
+                  igain += COSTS_N_INSNS (2 * m - 2);
+                }
+              else if (TARGET_BMI)
+                {
+                  /* andn;test vs. pandn;pshufd;ptest.  */
+                  igain += COSTS_N_INSNS (2 * m - 3);
+                }
+              else
+                {
+                  /* not;and;test vs. pandn;pshufd;ptest.  */
+                  igain += COSTS_N_INSNS (3 * m - 3);
+                }
+              break;
+
+            case CONST_INT:
+              if (REG_P (dst))
+                {
+                  if (optimize_insn_for_size_p ())
+                    {
+                      /* xor (2 bytes) vs. xorps (3 bytes).  */
+                      if (src == const0_rtx)
+                        igain -= COSTS_N_BYTES (1);
+                      /* movdi_internal vs. movv2di_internal.  */
+                      /* => mov (5 bytes) vs. movaps (7 bytes).  */
+                      else if (x86_64_immediate_operand (src, SImode))
+                        igain -= COSTS_N_BYTES (2);
+                      else
+                        /* ??? Larger immediate constants are placed in the
+                           constant pool, where the size benefit/impact of
+                           STV conversion is affected by whether and how
+                           often each constant pool entry is shared/reused.
+                           The value below is empirically derived from the
+                           CSiBE benchmark (and the optimal value may drift
+                           over time).  */
+                        igain += COSTS_N_BYTES (0);
+                    }
+                  else
+                    {
+                      /* DImode can be immediate for TARGET_64BIT
+                         and SImode always.  */
+                      igain += m * COSTS_N_INSNS (1);
+                      igain -= vector_const_cost (src);
+                    }
+                }
+              else if (MEM_P (dst))
+                {
+                  igain += (m * ix86_cost->int_store[2]
+                            - ix86_cost->sse_store[sse_cost_idx]);
+                  igain -= vector_const_cost (src);
+                }
+              break;
+
+            case VEC_SELECT:
+              if (XVECEXP (XEXP (src, 1), 0, 0) == const0_rtx)
+                {
+                  // movd (4 bytes) replaced with movdqa (4 bytes).
+                  if (!optimize_insn_for_size_p ())
+                    igain += ix86_cost->sse_to_integer - ix86_cost->xmm_move;
+                }
+              else
+                {
+                  // pshufd; movd replaced with pshufd.
+                  if (optimize_insn_for_size_p ())
+                    igain += COSTS_N_BYTES (4);
+                  else
+                    igain += ix86_cost->sse_to_integer;
+                }
+              break;
+
+            default:
+              gcc_unreachable ();
+            }
+        }
 
       if (igain != 0 && dump_file)
         {
diff --git a/gcc/testsuite/gcc.target/i386/pr113231.c b/gcc/testsuite/gcc.target/i386/pr113231.c
new file mode 100644
index 0000000..f9dcd9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr113231.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+void foo(int *i) { *i *= 2; }
+void bar(int *i) { *i <<= 2; }
+void baz(int *i) { *i >>= 2; }
+
+/* { dg-final { scan-assembler-not "movd" } } */

------=_NextPart_000_03C5_01DA40A4.881A7360--