From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yb1-xb2f.google.com (mail-yb1-xb2f.google.com [IPv6:2607:f8b0:4864:20::b2f]) by sourceware.org (Postfix) with ESMTPS id D34023858C41 for ; Mon, 21 Aug 2023 13:09:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D34023858C41 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yb1-xb2f.google.com with SMTP id 3f1490d57ef6-d7225259f52so3408423276.0 for ; Mon, 21 Aug 2023 06:09:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1692623341; x=1693228141; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=5afsX+S9l91I1g6fNAQ2eAz1/KDbdh4XFCWGv3H5NUE=; b=A02a+hJGRvqsAEup/0dQWfnxnJd7usDN2YjsuR8eu2XO2O5SJFeTTPCPg9HGsI9d19 AV1pwWaS3789F14SaIZw7Sba+7EwL0pdw49YGKaAkJiShzGG3AtduT4wyu7vcF6rgzOL pOdKMzo/ZJrL4I4WrUYY8zdBni4Bpc2k4SJGz4ns4LI7aILwW7HEKModiLL/banUFB9t Bah4CUPq0lpDj9qDMNHw6OdDwPDkbyGM3BnrXcJ1TSBFL/Q/+lsXZFdte7j/5pucjpAG Acracb9ljSWOIxL0NdczuN17CtNzPQwIe5OrO03YR5uBdJSUokI7mubqv8PJCOqBg0oO ifCQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692623341; x=1693228141; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5afsX+S9l91I1g6fNAQ2eAz1/KDbdh4XFCWGv3H5NUE=; b=i0PiqvOet5az2+Mg7qzam7QmSYWnlwMlcBh+gH9wkQ1NiBCwUnZRa1X3HVCMi/s97y uln6aEnUX1jZUdqODWdhGo5Z9587dXOfpIXjK/8yJ5XSWMNqHU/DcKfFkQJaHGfW3ip6 gb3l7VdZ0TD6Tgw3i9IR+fMBQ46Jb43SHO9lNC3Y+8dXqcGLnVGEOqTwZ0GC1lq8fcN8 9Tp9EDOSFNZfjJdOUWNf1H/irXsVdDRk4pP+ZZCHOqBypypANyyinIgZsTHt3YWjuyLr cnOwHIfX3n/2fOc1OX7KQfLONjLiMs4h3x+I8WmAV0rpnr+Oc1c+SjNSUySz1qvTXBKV E67g== X-Gm-Message-State: AOJu0Yx4I4JtldqTyV/Ao5/Pwke4SlRAVdv/AW7gaDoELxcGkQu+P9N6 iP3zSNPWzQ4Y6xzdUsm91020f5ci4ehrzB03FAo= X-Google-Smtp-Source: AGHT+IGkcBiCMJ8qU443nfTbOKuj0SEM8d7H9njgElRCoVda8kvaaNAK7b8FdMlpVh4yeRlapdxHqj9JYExgCHvZYRU= X-Received: by 2002:a25:d794:0:b0:d49:bfe4:9c50 with SMTP id o142-20020a25d794000000b00d49bfe49c50mr5703336ybg.18.1692623341224; Mon, 21 Aug 2023 06:09:01 -0700 (PDT) MIME-Version: 1.0 References: <20230821122515.5A627385697D@sourceware.org> In-Reply-To: From: Hongtao Liu Date: Mon, 21 Aug 2023 21:08:49 +0800 Message-ID: Subject: Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c To: Richard Biener Cc: gcc-patches@gcc.gnu.org, hongtao.liu@intel.com, hjl.tools@gmail.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-7.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Aug 21, 2023 at 8:59=E2=80=AFPM Richard Biener = wrote: > > On Mon, 21 Aug 2023, Hongtao Liu wrote: > > > On Mon, Aug 21, 2023 at 8:25?PM Richard Biener via Gcc-patches > > wrote: > > > > > > The following fixes the gcc.target/i386/pr87007-5.c testcase which > > > changed code generation again after the recent sinking improvements. > > > We now have > > > > > > vxorps %xmm0, %xmm0, %xmm0 > > > vsqrtsd d2(%rip), %xmm0, %xmm0 > > > > > > and an unnecessary xor again in one case, the other vsqrtsd has > > > a register source and a properly zeroing load: > > > > > > vmovsd d3(%rip), %xmm0 > > > testl %esi, %esi > > > jg .L11 > > > .L3: > > > vsqrtsd %xmm0, %xmm0, %xmm0 > > > > > > the following patch XFAILs the scan. I'm not sure what's at > > > fault here, there are no loops in the CFG, but somehow > > > r84:DF=3Dsqrt(['d2']) gets a pxor but r84:DF=3Dsqrt(r83:DF) > > > doesn't. I guess I don't really understand what > > > remove_partial_avx_dependency is supposed to do so can't > > > really assess whether the pxor is necessary or not. > > There's a false dependency on xmm0 when the source operand in the > > pattern is memory, the pattern only takes xmm0 as dest, but the output > > instruction takes xmm0 also as input(the second source operand), > > that's why we need an pxor here. > > OK, so XFAIL is wrong, we should instead scan for one xorps then > (like it was in the past). > > > When the source operand in the pattern is register_operand, we can > > reuse the register_operand for the second source operand. The > > instructions here are not very obvious, the more representative one > > should be vsqrtsd %xmm1, %xmm1(rused one), %xmm0. > > > > > > OK? > > Can we add -fno-XXX to disable the optimization to make the assembly > > more stable? > > Not sure, we could feed GIMPLE IR to RTL expansion instead of > feeding a complex testcase through the pipeline, but I'm not sure > what we were originally supposed to test (the PR trail is a bit > large). > > > Or current codegen should be optimal(for the sinking), then Ok for the = patch. > > So like the following (I've just adjusted the comments to reflect the > pxor is necessary). > > OK? OK. > > Richard. > > From 7bed9399ae736c20a677ccf7e7fc4d2751a32327 Mon Sep 17 00:00:00 2001 > From: Richard Biener > Date: Mon, 21 Aug 2023 14:09:48 +0200 > Subject: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c > To: gcc-patches@gcc.gnu.org > > The following fixes the gcc.target/i386/pr87007-5.c testcase which > changed code generation again after the recent sinking improvements. > We now have > > vxorps %xmm0, %xmm0, %xmm0 > vsqrtsd d2(%rip), %xmm0, %xmm0 > > and a necessary xor again in one case, the other vsqrtsd has > a register source and a properly zeroing load: > > vmovsd d3(%rip), %xmm0 > testl %esi, %esi > jg .L11 > .L3: > vsqrtsd %xmm0, %xmm0, %xmm0 > > the following patch adjusts the scan. > > * gcc.target/i386/pr87007-5.c: Update comment, adjust subtest. > --- > gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gc= c.target/i386/pr87007-5.c > index a6cdf11522e..8f2dc947f6c 100644 > --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c > +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c > @@ -1,6 +1,8 @@ > /* { dg-do compile } */ > /* { dg-options "-Ofast -march=3Dskylake-avx512 -mfpmath=3Dsse -fno-tree= -vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */ > -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register = to avoid partial dependence. */ > +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt > + are sunk out of the loop and the loop is elided. One vsqrtsd with > + memory operand needs a xor to avoid partial dependence. */ > > #include > > @@ -17,4 +19,4 @@ foo (int n, int k) > > /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */ > /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */ > -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } = */ > +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } = */ > -- > 2.35.3 > --=20 BR, Hongtao