From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Roger Sayle"
To:
Cc: "'Uros Bizjak'"
Subject: [x86 PATCH] Fine tune STV register conversion costs for -Os.
Date: Mon, 23 Oct 2023 15:47:43 +0100
Message-ID: <008701da05bf$e2196b20$a64c4160$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0088_01DA05C8.43E01D10"

This is a multipart message in MIME format.

------=_NextPart_000_0088_01DA05C8.43E01D10
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

The eagle-eyed may have spotted that my recent testcases for DImode
shifts on x86_64 included -mno-stv in the dg-options.
This is because the Scalar-To-Vector (STV) pass currently transforms
these shifts to use SSE vector operations, producing larger code even
with -Os.  The issue is that compute_convert_gain currently
underestimates the size of the instructions required for inter-unit
moves, which is corrected by the patch below.

For the simple test case:

unsigned long long shl1(unsigned long long x) { return x << 1; }

without this patch, GCC -m32 -Os -mavx2 currently generates:

shl1:   push   %ebp                  // 1 byte
        mov    %esp,%ebp             // 2 bytes
        vmovq  0x8(%ebp),%xmm0       // 5 bytes
        pop    %ebp                  // 1 byte
        vpaddq %xmm0,%xmm0,%xmm0     // 4 bytes
        vmovd  %xmm0,%eax            // 4 bytes
        vpextrd $0x1,%xmm0,%edx      // 6 bytes
        ret                          // 1 byte  = 24 bytes total

with this patch, we now generate the shorter

shl1:   push   %ebp                  // 1 byte
        mov    %esp,%ebp             // 2 bytes
        mov    0x8(%ebp),%eax        // 3 bytes
        mov    0xc(%ebp),%edx        // 3 bytes
        pop    %ebp                  // 1 byte
        add    %eax,%eax             // 2 bytes
        adc    %edx,%edx             // 2 bytes
        ret                          // 1 byte  = 15 bytes total

Benchmarking with CSiBE shows that this patch saves 1361 bytes when
compiling with -m32 -Os, and 172 bytes when compiling with -Os.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2023-10-23  Roger Sayle

gcc/ChangeLog
        * config/i386/i386-features.cc (compute_convert_gain): Provide
        more accurate values (sizes) for inter-unit moves with -Os.


Thanks in advance,
Roger
--

------=_NextPart_000_0088_01DA05C8.43E01D10
Content-Type: text/plain; name="patchoz.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="patchoz.txt"

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index cead397..6fac67e 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -752,11 +752,33 @@ general_scalar_chain::compute_convert_gain ()
     fprintf (dump_file, " Instruction conversion gain: %d\n", gain);
 
   /* Cost the integer to sse and sse to integer moves.  */
-  cost += n_sse_to_integer * ix86_cost->sse_to_integer;
-  /* ??? integer_to_sse but we only have that in the RA cost table.
-     Assume sse_to_integer/integer_to_sse are the same which they
-     are at the moment.  */
-  cost += n_integer_to_sse * ix86_cost->sse_to_integer;
+  if (!optimize_function_for_size_p (cfun))
+    {
+      cost += n_sse_to_integer * ix86_cost->sse_to_integer;
+      /* ??? integer_to_sse but we only have that in the RA cost table.
+         Assume sse_to_integer/integer_to_sse are the same which they
+         are at the moment.  */
+      cost += n_integer_to_sse * ix86_cost->sse_to_integer;
+    }
+  else if (TARGET_64BIT || smode == SImode)
+    {
+      cost += n_sse_to_integer * COSTS_N_BYTES (4);
+      cost += n_integer_to_sse * COSTS_N_BYTES (4);
+    }
+  else if (TARGET_SSE4_1)
+    {
+      /* vmovd (4 bytes) + vpextrd (6 bytes).  */
+      cost += n_sse_to_integer * COSTS_N_BYTES (10);
+      /* vmovd (4 bytes) + vpinsrd (6 bytes).  */
+      cost += n_integer_to_sse * COSTS_N_BYTES (10);
+    }
+  else
+    {
+      /* movd (4 bytes) + psrlq (5 bytes) + movd (4 bytes).  */
+      cost += n_sse_to_integer * COSTS_N_BYTES (13);
+      /* movd (4 bytes) + movd (4 bytes) + unpckldq (4 bytes).  */
+      cost += n_integer_to_sse * COSTS_N_BYTES (12);
+    }
 
   if (dump_file)
     fprintf (dump_file, " Registers conversion cost: %d\n", cost);

------=_NextPart_000_0088_01DA05C8.43E01D10--
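
The -Os branches of the patch can be read as a small lookup from target
and mode to bytes per inter-unit move. The following is only a rough
standalone sketch of that lookup, not GCC code: struct target_model,
enum mode_model and all function names are hypothetical stand-ins for
TARGET_64BIT, TARGET_SSE4_1 and smode, and it works in raw bytes rather
than through COSTS_N_BYTES.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the target flags tested by the patch.  */
struct target_model {
  bool is_64bit;    /* models TARGET_64BIT */
  bool has_sse4_1;  /* models TARGET_SSE4_1 */
};

/* Hypothetical stand-in for the chain's scalar mode (smode).  */
enum mode_model { MODE_SI, MODE_DI };

/* Bytes needed for one SSE -> integer register move under -Os.  */
static int sse_to_integer_bytes(struct target_model t, enum mode_model m)
{
  if (t.is_64bit || m == MODE_SI)
    return 4;             /* a single 4-byte move */
  if (t.has_sse4_1)
    return 4 + 6;         /* vmovd + vpextrd */
  return 4 + 5 + 4;       /* movd + psrlq + movd */
}

/* Bytes needed for one integer -> SSE register move under -Os.  */
static int integer_to_sse_bytes(struct target_model t, enum mode_model m)
{
  if (t.is_64bit || m == MODE_SI)
    return 4;             /* a single 4-byte move */
  if (t.has_sse4_1)
    return 4 + 6;         /* vmovd + vpinsrd */
  return 4 + 4 + 4;       /* movd + movd + unpack */
}

/* Total size, in bytes, of all inter-unit moves for one chain.  */
static int conversion_bytes(struct target_model t, enum mode_model m,
                            int n_sse_to_integer, int n_integer_to_sse)
{
  return n_sse_to_integer * sse_to_integer_bytes(t, m)
         + n_integer_to_sse * integer_to_sse_bytes(t, m);
}

/* Sanity check against the shl1 listing: with -m32 -mavx2 the result
   leaves the vector unit through vmovd + vpextrd, i.e. 10 bytes.
   Treating that as exactly one SSE -> integer move and no
   integer -> SSE move is an assumption read off the listing.  */
static void model_self_check(void)
{
  struct target_model ia32_avx2 = { false, true };
  assert(conversion_bytes(ia32_avx2, MODE_DI, 1, 0) == 10);
}
```

Under these assumptions, the 10 bytes charged for the DImode result on a
32-bit SSE4.1 target match the vmovd and vpextrd pair in the unpatched
listing in the message body.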