From: Hongtao Liu
Date: Mon, 8 Jan 2024 09:58:56 +0800
Subject: Re: [x86_64 PATCH] PR target/112992: Optimize mode for broadcast of constants.
References: <027c01da34c1$369974d0$a3cc5e70$@nextmovesoftware.com> <01e401da40f3$3ac28c70$b047a550$@nextmovesoftware.com>
In-Reply-To: <01e401da40f3$3ac28c70$b047a550$@nextmovesoftware.com>
To: Roger Sayle
Cc: gcc-patches@gcc.gnu.org, Uros Bizjak

On Sun, Jan 7, 2024 at 6:53 AM Roger Sayle wrote:
>
> Hi Hongtao,
>
> Many thanks for the review.  This revised patch implements several
> of your suggestions, specifically to use pshufd for V4SImode and
> punpcklqdq for V2DImode.  These changes are demonstrated by the
> examples below:
>
> typedef unsigned int v4si __attribute((vector_size(16)));
> typedef unsigned long long v2di __attribute((vector_size(16)));
>
> v4si foo() { return (v4si){1,1,1,1}; }
> v2di bar() { return (v2di){1,1}; }
>
> The previous version of my patch generated:
>
> foo:    movdqa  .LC0(%rip), %xmm0
>         ret
> bar:    movdqa  .LC1(%rip), %xmm0
>         ret
>
> with this revised version, -O2 generates:
>
> foo:    movl    $1, %eax
>         movd    %eax, %xmm0
>         pshufd  $0, %xmm0, %xmm0
>         ret
> bar:    movl    $1, %eax
>         movq    %rax, %xmm0
>         punpcklqdq      %xmm0, %xmm0
>         ret
>
> However, if it's OK with you, I'd prefer to allow this function to
> return false, safely falling back to emitting a vector load from
> the constant pool rather than ICEing from a gcc_assert.  For one

Sure, that makes sense.

> thing this isn't an unrecoverable correctness issue, but at worst
> a missed optimization.  The deeper reason is that this usefully
> provides a handle for tuning on different microarchitectures.
> On some (AMD?) machines, where !TARGET_INTER_UNIT_MOVES_TO_VEC,
> the first form above may be preferable to the second.  Currently
> the start of ix86_convert_const_wide_int_to_broadcast disables
> broadcasts for !TARGET_INTER_UNIT_MOVES_TO_VEC even when an
> implementation doesn't require an inter unit move, such as a
> broadcast from memory.  I plan follow-up patches that benefit
> from this flexibility.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

Ok.

>
> gcc/ChangeLog
>         PR target/112992
>         * config/i386/i386-expand.cc
>         (ix86_convert_const_wide_int_to_broadcast): Allow call to
>         ix86_expand_vector_init_duplicate to fail, and return NULL_RTX.
>         (ix86_broadcast_from_constant): Revert recent change; Return a
>         suitable MEMREF independently of mode/target combinations.
>         (ix86_expand_vector_move): Allow ix86_expand_vector_init_duplicate
>         to decide whether expansion is possible/preferable.  Only try
>         forcing DImode constants to memory (and trying again) if calling
>         ix86_expand_vector_init_duplicate fails with a DImode immediate
>         constant.
>         (ix86_expand_vector_init_duplicate) : Try using
>         V4SImode for suitable immediate constants.
>         : Try using V8SImode for suitable constants.
>         : Fail for CONST_INT_P, i.e. use constant pool.
>         : Likewise.
>         : For CONST_INT_P try using V4SImode via widen.
>         : For CONST_INT_P try using V8HImode via widen.
>
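
For anyone following the thread, here is a small standalone sketch of the
idea behind "try using V4SImode for suitable immediate constants".  It is
plain C using the GCC vector extensions, not code from the patch.  It
assumes that "suitable" means the upper and lower 32-bit halves of the
64-bit element are equal; that reading is my assumption, and the helper
names (same_halves, splat_v2di, splat_v4si, same_bits, self_check) are
hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Vector types as in the quoted message. */
typedef unsigned int v4si __attribute__((vector_size(16)));
typedef unsigned long long v2di __attribute__((vector_size(16)));

/* The two test functions from the quoted message. */
v4si foo(void) { return (v4si){1, 1, 1, 1}; }
v2di bar(void) { return (v2di){1, 1}; }

/* Hypothetical predicate (an assumption, not a GCC function): a 64-bit
   element can be rebuilt from 32-bit lanes if both halves are equal.  */
static int same_halves(uint64_t x)
{
  return (uint32_t)x == (uint32_t)(x >> 32);
}

/* Broadcast a 64-bit value into both 64-bit lanes. */
static v2di splat_v2di(uint64_t x) { return (v2di){x, x}; }

/* Broadcast a 32-bit value into all four 32-bit lanes. */
static v4si splat_v4si(uint32_t x) { return (v4si){x, x, x, x}; }

/* Return 1 if the two 16-byte vectors hold identical bits. */
static int same_bits(v2di a, v4si b)
{
  return memcmp(&a, &b, sizeof a) == 0;
}

/* Check both cases: a constant with matching halves, and the constant 1
   from bar(), whose halves differ.  */
void self_check(void)
{
  uint64_t narrowable = 0x0000000500000005ULL;
  assert(same_halves(narrowable));
  assert(same_bits(splat_v2di(narrowable), splat_v4si((uint32_t)narrowable)));

  assert(!same_halves(1));
  assert(!same_bits(splat_v2di(1), splat_v4si(1)));
}
```

Under that assumption, the constant 1 in bar() above is not narrowable to
32-bit lanes, since its upper half is zero, which is consistent with the
movq/punpcklqdq sequence quoted for it rather than a pshufd.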