From: Uros Bizjak
Date: Thu, 20 Jul 2023 10:10:47 +0200
Subject: Re: [PATCH] Optimize vlddqu to vmovdqu for TARGET_AVX
To: liuhongt
Cc: gcc-patches@gcc.gnu.org, hubicka@ucw.cz
In-Reply-To: <20230720073516.2171485-1-hongtao.liu@intel.com>

On Thu, Jul 20, 2023 at 9:35 AM liuhongt wrote:
>
> For Intel processors, after TARGET_AVX, vmovdqu is optimized as fast
> as vlddqu, UNSPEC_LDDQU can be removed to enable more optimizations.
> Can someone confirm this with AMD folks?
> If AMD doesn't like such optimization, I'll put my optimization under
> micro-architecture tuning.

The instruction is reachable only as __builtin_ia32_lddqu* (aka
_mm_lddqu_si*), so it was chosen by the programmer for a reason. I
think that in this case, the compiler should not be too smart and
change the instruction behind the programmer's back. The caveats are
also explained at length in the ISA manual.

Uros.

> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> If AMD also like such optimization, Ok for trunk?
>
> gcc/ChangeLog:
>
>         * config/i386/sse.md (_lddqu): Change to
>         define_expand, expand as simple move when TARGET_AVX
>         && ( == 16 || !TARGET_AVX256_SPLIT_UNALIGNED_LOAD).
>         The original define_insn is renamed to
>         ..
>         (_lddqu): .. this.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/vlddqu_vinserti128.c: New test.
> ---
>  gcc/config/i386/sse.md                         | 15 ++++++++++++++-
>  .../gcc.target/i386/vlddqu_vinserti128.c       | 11 +++++++++++
>  2 files changed, 25 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 2d81347c7b6..d571a78f4c4 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -1835,7 +1835,20 @@ (define_peephole2
>    [(set (match_dup 4) (match_dup 1))]
>    "operands[4] = adjust_address (operands[0], V2DFmode, 0);")
>
> -(define_insn "_lddqu"
> +(define_expand "_lddqu"
> +  [(set (match_operand:VI1 0 "register_operand")
> +       (unspec:VI1 [(match_operand:VI1 1 "memory_operand")]
> +                   UNSPEC_LDDQU))]
> +  "TARGET_SSE3"
> +{
> +  if (TARGET_AVX && ( == 16 || !TARGET_AVX256_SPLIT_UNALIGNED_LOAD))
> +    {
> +      emit_move_insn (operands[0], operands[1]);
> +      DONE;
> +    }
> +})
> +
> +(define_insn "*_lddqu"
>    [(set (match_operand:VI1 0 "register_operand" "=x")
>         (unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m")]
>                     UNSPEC_LDDQU))]
> diff --git a/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c b/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
> new file mode 100644
> index 00000000000..29699a5fa7f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O2" } */
> +/* { dg-final { scan-assembler-times "vbroadcasti128" 1 } } */
> +/* { dg-final { scan-assembler-not {(?n)vlddqu.*xmm} } } */
> +
> +#include
> +__m256i foo(void *data) {
> +    __m128i X1 = _mm_lddqu_si128((__m128i*)data);
> +    __m256i V1 = _mm256_broadcastsi128_si256 (X1);
> +    return V1;
> +}
> --
> 2.39.1.388.g2fc9e9ca3c
>
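A minimal C sketch of the programmer-side view of this thread. It assumes an x86-64 GCC or Clang that supports the target attribute and __builtin_cpu_supports, and the function names are hypothetical. Both functions compute the same value as foo() in the new test, but only broadcast_lddqu explicitly asks for LDDQU. Code that just wants an unaligned load can write _mm_loadu_si128, which the compiler is free to fold into vbroadcasti128 with a memory operand. The exact instructions emitted depend on the GCC version and flags, so compare the output of gcc -O2 -mavx2 -S rather than taking this as given.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helpers for illustration. The target("avx2") attributes
   let this file build without -mavx2, so callers can check CPU support
   at run time before calling them. */

/* Mirrors foo() in the new test: the programmer explicitly asked for
   LDDQU through the intrinsic. */
__attribute__((target("avx2")))
__m256i broadcast_lddqu(const void *data)
{
    __m128i x = _mm_lddqu_si128((const __m128i *)data);
    return _mm256_broadcastsi128_si256(x);
}

/* Same computation written with a plain unaligned load, which leaves
   the compiler free to choose the load instruction. */
__attribute__((target("avx2")))
__m256i broadcast_loadu(const void *data)
{
    __m128i x = _mm_loadu_si128((const __m128i *)data);
    return _mm256_broadcastsi128_si256(x);
}

/* Returns 1 if both variants give identical 256-bit results for a
   misaligned 16-byte source and both lanes hold those 16 bytes. */
__attribute__((target("avx2")))
int same_result(void)
{
    unsigned char buf[64];
    for (int i = 0; i < 64; i++)
        buf[i] = (unsigned char)(i * 7 + 1);

    const unsigned char *p = buf + 3;   /* deliberately misaligned */
    __m256i a = broadcast_lddqu(p);
    __m256i b = broadcast_loadu(p);

    unsigned char ra[32], rb[32];
    _mm256_storeu_si256((__m256i *)ra, a);
    _mm256_storeu_si256((__m256i *)rb, b);

    return memcmp(ra, rb, sizeof ra) == 0
           && memcmp(ra, p, 16) == 0        /* low lane */
           && memcmp(ra + 16, p, 16) == 0;  /* high lane */
}
```

Calling same_result() after checking __builtin_cpu_supports("avx2") should return 1, because the broadcast places the same 16 loaded bytes in each 128-bit lane regardless of which load intrinsic produced them.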