From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Roger Sayle"
Cc: "'Uros Bizjak'", "'Hongtao Liu'"
Subject: [x86_64 PATCH] PR target/112992: Optimize mode for broadcast of constants.
Date: Fri, 22 Dec 2023 10:25:39 -0000
Message-ID: <027c01da34c1$369974d0$a3cc5e70$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_027D_01DA34C1.369974D0"

This is a multipart message in MIME format.

------=_NextPart_000_027D_01DA34C1.369974D0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

This patch resolves the second part of PR target/112992, building upon
Hongtao Liu's solution to the first part.

The issue addressed by this patch is that when initializing vectors by
broadcasting integer constants, the compiler has the flexibility to
select the most appropriate vector mode to perform the broadcast, as
long as the resulting vector has an identical bit pattern.  For
example, the following constants are all equivalent:

V4SImode {0x01010101, 0x01010101, 0x01010101, 0x01010101 }
V8HImode {0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101 }
V16QImode {0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, ... 0x01 }

So instruction sequences that construct any of these can be used to
construct the others (with a suitable cast/SUBREG).

On x86_64, it turns out that broadcasts of SImode constants are
preferred, as DImode constants often require a longer movabs
instruction, and HImode and QImode broadcasts require multiple uops on
some architectures.  Hence, an SImode broadcast is always at least as
short and as fast as the alternatives.

Examples of this improvement can be seen in the testsuite.

gcc.target/i386/pr102021.c
Before:
   0:   48 b8 0c 00 0c 00 0c    movabs $0xc000c000c000c,%rax
   7:   00 0c 00
   a:   62 f2 fd 28 7c c0       vpbroadcastq %rax,%ymm0
  10:   c3                      retq

After:
   0:   b8 0c 00 0c 00          mov    $0xc000c,%eax
   5:   62 f2 7d 28 7c c0       vpbroadcastd %eax,%ymm0
   b:   c3                      retq

and
gcc.target/i386/pr90773-17.c:
Before:
   0:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 7
   7:   b8 0c 00 00 00          mov    $0xc,%eax
   c:   62 f2 7d 08 7a c0       vpbroadcastb %eax,%xmm0
  12:   62 f1 7f 08 7f 02       vmovdqu8 %xmm0,(%rdx)
  18:   c7 42 0f 0c 0c 0c 0c    movl   $0xc0c0c0c,0xf(%rdx)
  1f:   c3                      retq

After:
   0:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 7
   7:   b8 0c 0c 0c 0c          mov    $0xc0c0c0c,%eax
   c:   62 f2 7d 08 7c c0       vpbroadcastd %eax,%xmm0
  12:   62 f1 7f 08 7f 02       vmovdqu8 %xmm0,(%rdx)
  18:   c7 42 0f 0c 0c 0c 0c    movl   $0xc0c0c0c,0xf(%rdx)
  1f:   c3                      retq

where, according to Agner Fog's instruction tables, broadcastd is
slightly faster on some microarchitectures, for example Knights Landing.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2023-12-21  Roger Sayle

gcc/ChangeLog
	PR target/112992
	* config/i386/i386-expand.cc
	(ix86_convert_const_wide_int_to_broadcast): Allow call to
	ix86_expand_vector_init_duplicate to fail, and return NULL_RTX.
	(ix86_broadcast_from_constant): Revert recent change; Return a
	suitable MEMREF independently of mode/target combinations.
	(ix86_expand_vector_move): Allow ix86_expand_vector_init_duplicate
	to decide whether expansion is possible/preferable.  Only try
	forcing DImode constants to memory (and trying again) if calling
	ix86_expand_vector_init_duplicate fails with a DImode immediate
	constant.
	(ix86_expand_vector_init_duplicate)
	: Try using V4SImode for suitable immediate constants.
	: Try using V8SImode for suitable constants.
	: Use constant pool for AVX without AVX2.
	: Fail for CONST_INT_P, i.e. use constant pool.
	: Likewise.
	: For CONST_INT_P try using V4SImode via widen.
	: For CONST_INT_P try using V8HImode via widen.
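
For readers who want to convince themselves of the bit-pattern
equivalence described in the message body, here is a minimal sketch in
portable C.  It is not part of the patch and does not use any GCC
internals: the helper broadcast() and the two checking functions are
hypothetical names made up for this illustration, and a "vector" is
modelled as a plain byte array.  The constants are taken from the
V4SImode/V8HImode/V16QImode example and from the pr102021.c
disassembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill OUT (TOTAL bytes) by repeating one ELEM of
   WIDTH bytes.  This models broadcasting a scalar constant into a
   vector; TOTAL is assumed to be a multiple of WIDTH.  */
static void
broadcast (uint8_t *out, size_t total, const void *elem, size_t width)
{
  for (size_t i = 0; i + width <= total; i += width)
    memcpy (out + i, elem, width);
}

/* 128-bit case: broadcasting 0x01010101 as 4 x 32 bits, 0x0101 as
   8 x 16 bits and 0x01 as 16 x 8 bits.  Returns 1 if all three byte
   images are identical, 0 otherwise.  */
int
same_bits_128 (void)
{
  uint32_t si = 0x01010101u;
  uint16_t hi = 0x0101u;
  uint8_t qi = 0x01u;
  uint8_t a[16], b[16], c[16];

  broadcast (a, sizeof a, &si, sizeof si);
  broadcast (b, sizeof b, &hi, sizeof hi);
  broadcast (c, sizeof c, &qi, sizeof qi);
  return memcmp (a, b, sizeof a) == 0 && memcmp (b, c, sizeof b) == 0;
}

/* 256-bit case using the pr102021.c constants: the 64-bit immediate
   0xc000c000c000c, the 32-bit immediate 0xc000c and the 16-bit value
   0xc all describe the same 32 bytes.  */
int
same_bits_256 (void)
{
  uint64_t di = 0x000c000c000c000cull;
  uint32_t si = 0x000c000cu;
  uint16_t hi = 0x000cu;
  uint8_t a[32], b[32], c[32];

  broadcast (a, sizeof a, &di, sizeof di);
  broadcast (b, sizeof b, &si, sizeof si);
  broadcast (c, sizeof c, &hi, sizeof hi);
  return memcmp (a, b, sizeof a) == 0 && memcmp (b, c, sizeof b) == 0;
}
```

Calling same_bits_128() and same_bits_256() from a small main() and
asserting that each returns 1 is enough to check the claim.  Because
each wider constant is itself a repetition of the narrower one, the
comparison does not depend on the byte order of the host.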