From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <hubicka@kam.mff.cuni.cz>
Received: from akamas.troja.mff.cuni.cz (akamas.n.mff.cuni.cz [195.113.16.19])
 by sourceware.org (Postfix) with ESMTPS id 0A2EB385ED40;
 Thu, 27 Jan 2022 12:04:36 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0A2EB385ED40
Received: from nikam.ms.mff.cuni.cz (nikam.kam.mff.cuni.cz [195.113.17.177])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by akamas.troja.mff.cuni.cz (Postfix) with ESMTPS id 7823040067;
 Thu, 27 Jan 2022 13:04:34 +0100 (CET)
Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202)
 id 720CD2812EC; Thu, 27 Jan 2022 13:04:34 +0100 (CET)
Date: Thu, 27 Jan 2022 13:04:34 +0100
From: Jan Hubicka <hubicka@kam.mff.cuni.cz>
To: rguenther at suse dot de <gcc-bugzilla@gcc.gnu.org>
Cc: gcc-bugs@gcc.gnu.org
Subject: Re: [Bug rtl-optimization/102178] [12 Regression] SPECFP 2006
 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22
Message-ID: <YfKKUj9Kvx1j8xtE@kam.mff.cuni.cz>
References: <bug-102178-4@http.gcc.gnu.org/bugzilla/>
 <bug-102178-4-ndb9EebTFx@http.gcc.gnu.org/bugzilla/>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <bug-102178-4-ndb9EebTFx@http.gcc.gnu.org/bugzilla/>
X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, KAM_NUMSUBJECT, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 27 Jan 2022 12:04:37 -0000

> I would say so.  It saves code size and also uop space unless the two
> can magically fuse to a immediate to %xmm move (I doubt that).
I made a simple benchmark:

double a=10;
int
main()
{
        long int i;
        double sum=0,val1,val2,val3,val4;
         for (i=0;i<1000000000;i++)
         {
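#if 1   /* 1: variants 1 and 2 (go through %r8); 0: variant 3 (direct vmovq load) */
#if 1   /* 1: variant 1 (movabsq immediate); 0: variant 2 (movq load into %r8) */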
                asm __volatile__("movabsq $0x3ff03db8fde2ef4e, %%r8;vmovq   %%r8, %0": "=x"(val1): :"r8","xmm11");
                asm __volatile__("movabsq $0x3ff03db8fde2ef4e, %%r8;vmovq   %%r8, %0": "=x"(val2): :"r8","xmm11");
                asm __volatile__("movabsq $0x3ff03db8fde2ef4e, %%r8;vmovq   %%r8, %0": "=x"(val3): :"r8","xmm11");
                asm __volatile__("movabsq $0x3ff03db8fde2ef4e, %%r8;vmovq   %%r8, %0": "=x"(val4): :"r8","xmm11");
#else
                asm __volatile__("movq %1, %%r8;vmovq   %%r8, %0": "=x"(val1):"m"(a) :"r8","xmm11");
                asm __volatile__("movq %1, %%r8;vmovq   %%r8, %0": "=x"(val2):"m"(a) :"r8","xmm11");
                asm __volatile__("movq %1, %%r8;vmovq   %%r8, %0": "=x"(val3):"m"(a) :"r8","xmm11");
                asm __volatile__("movq %1, %%r8;vmovq   %%r8, %0": "=x"(val4):"m"(a) :"r8","xmm11");
#endif
#else
                asm __volatile__("vmovq   %1, %0": "=x"(val1):"m"(a) :"r8","xmm11");
                asm __volatile__("vmovq   %1, %0": "=x"(val2):"m"(a) :"r8","xmm11");
                asm __volatile__("vmovq   %1, %0": "=x"(val3):"m"(a) :"r8","xmm11");
                asm __volatile__("vmovq   %1, %0": "=x"(val4):"m"(a) :"r8","xmm11");
#endif
                sum+=val1+val2+val3+val4;
                 }
         return sum;
}

and indeed the third variant runs in 1.2s, while the first two both take
2.4s on my Zen 2 laptop.
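
For reference, here is a portable sketch of what the movabsq + vmovq pair in
the first variant computes: it puts the bit pattern of a double into an
integer register and reinterprets it in an SSE register. The helper names
double_from_bits and bits_from_double are hypothetical, not taken from GCC,
and the sketch assumes IEEE-754 binary64 doubles. It is not a replacement for
the timing loop above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit integer pattern as a double.
   This is the portable analogue of "movabsq $imm, %r8; vmovq %r8, %xmm". */
static double double_from_bits(uint64_t bits)
{
        double d;
        assert(sizeof d == sizeof bits);   /* assumes 64-bit double */
        memcpy(&d, &bits, sizeof d);
        return d;
}

/* Hypothetical helper: the inverse, the double's raw bit pattern. */
static uint64_t bits_from_double(double d)
{
        uint64_t bits;
        assert(sizeof d == sizeof bits);   /* assumes 64-bit double */
        memcpy(&bits, &d, sizeof bits);
        return bits;
}
```

Calling double_from_bits(0x3ff03db8fde2ef4eULL) gives the constant used in the
immediate variant, roughly 1.015. Whether the compiler turns the memcpy into a
GPR-to-XMM move or into a load from the constant pool is its own choice, which
is the trade-off the benchmark measures.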