From: "Roger Sayle"
To: "'Hongtao Liu'"
Cc: "'Uros Bizjak'"
References: <027c01da34c1$369974d0$a3cc5e70$@nextmovesoftware.com>
Subject: RE: [x86_64 PATCH] PR target/112992: Optimize mode for broadcast of constants.
Date: Sat, 6 Jan 2024 22:53:54 -0000
Message-ID: <01e401da40f3$3ac28c70$b047a550$@nextmovesoftware.com>

Hi Hongtao,

Many thanks for the review.  This revised patch implements several of
your suggestions, specifically to use pshufd for V4SImode and
punpcklqdq for V2DImode.  These changes are demonstrated by the
examples below:

typedef unsigned int v4si __attribute((vector_size(16)));
typedef unsigned long long v2di __attribute((vector_size(16)));

v4si foo() { return (v4si){1,1,1,1}; }
v2di bar() { return (v2di){1,1}; }

The previous version of my patch generated:

foo:    movdqa  .LC0(%rip), %xmm0
        ret
bar:    movdqa  .LC1(%rip), %xmm0
        ret

With this revised version, -O2 generates:

foo:    movl    $1, %eax
        movd    %eax, %xmm0
        pshufd  $0, %xmm0, %xmm0
        ret
bar:    movl    $1, %eax
        movq    %rax, %xmm0
        punpcklqdq      %xmm0, %xmm0
        ret

However, if it's OK with you, I'd prefer to allow this function to
return false, safely falling back to emitting a vector load from the
constant pool rather than ICEing from a gcc_assert.  For one thing,
this isn't an unrecoverable correctness issue, but at worst a missed
optimization.  The deeper reason is that this usefully provides a
handle for tuning on different microarchitectures.  On some (AMD?)
machines, where !TARGET_INTER_UNIT_MOVES_TO_VEC, the first form above
may be preferable to the second.  Currently the start of
ix86_convert_const_wide_int_to_broadcast disables broadcasts for
!TARGET_INTER_UNIT_MOVES_TO_VEC even when an implementation doesn't
require an inter-unit move, such as a broadcast from memory.  I plan
follow-up patches that benefit from this flexibility.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

gcc/ChangeLog
        PR target/112992
        * config/i386/i386-expand.cc
        (ix86_convert_const_wide_int_to_broadcast): Allow call to
        ix86_expand_vector_init_duplicate to fail, and return NULL_RTX.
        (ix86_broadcast_from_constant): Revert recent change; return a
        suitable MEMREF independently of mode/target combinations.
        (ix86_expand_vector_move): Allow ix86_expand_vector_init_duplicate
        to decide whether expansion is possible/preferable.  Only try
        forcing DImode constants to memory (and trying again) if calling
        ix86_expand_vector_init_duplicate fails with a DImode immediate
        constant.
        (ix86_expand_vector_init_duplicate) : Try using V4SImode for
        suitable immediate constants.
        : Try using V8SImode for suitable constants.
        : Fail for CONST_INT_P, i.e. use constant pool.
        : Likewise.
        : For CONST_INT_P try using V4SImode via widen.
        : For CONST_INT_P try using V8HImode via widen.
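
As an illustration of the ChangeLog item "Try using V4SImode for
suitable immediate constants", here is a minimal sketch of one
plausible reading of "suitable": a 64-bit constant whose two 32-bit
halves are identical.  For such a constant, broadcasting it to two
64-bit lanes produces the same 128 bits as broadcasting its low half
to four 32-bit lanes.  The helper names halves_match and
same_bits_as_v4si_broadcast are hypothetical and are not taken from
the patch; the exact test used in i386-expand.cc may differ.

```c
#include <stdint.h>
#include <string.h>

/* Same vector types as the examples in this message.  */
typedef unsigned int v4si __attribute__((vector_size(16)));
typedef unsigned long long v2di __attribute__((vector_size(16)));

/* Hypothetical helper (not from the patch): nonzero if the upper and
   lower 32-bit halves of C are identical.  */
static int halves_match(uint64_t c)
{
    return (uint32_t)c == (uint32_t)(c >> 32);
}

/* Hypothetical helper (not from the patch): nonzero if the V2DI
   broadcast {c, c} has exactly the same bit pattern as the V4SI
   broadcast of the low 32 bits of C.  */
static int same_bits_as_v4si_broadcast(uint64_t c)
{
    uint32_t lo = (uint32_t)c;
    v2di wide = { c, c };
    v4si narrow = { lo, lo, lo, lo };
    return memcmp(&wide, &narrow, sizeof wide) == 0;
}
```

Under this reading, the constant 1 used by bar() above does not
qualify, because its upper half is 0 and its lower half is 1.  That is
consistent with bar() being expanded as movq plus punpcklqdq rather
than as a 32-bit broadcast.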