From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <amonakov@ispras.ru>
Received: from mail.ispras.ru (mail.ispras.ru [83.149.199.84])
 by sourceware.org (Postfix) with ESMTPS id D59663850425
 for <gcc-help@gcc.gnu.org>; Wed, 11 May 2022 13:26:08 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D59663850425
Received: from [10.10.3.121] (unknown [10.10.3.121])
 by mail.ispras.ru (Postfix) with ESMTPS id C357B4076265;
 Wed, 11 May 2022 13:26:05 +0000 (UTC)
Date: Wed, 11 May 2022 16:26:05 +0300 (MSK)
From: Alexander Monakov <amonakov@ispras.ru>
To: Paul Zimmermann <Paul.Zimmermann@inria.fr>
cc: gcc-help@gcc.gnu.org, stephane.glondu@inria.fr, marc.glisse@inria.fr, 
 sibid@uvic.ca
Subject: Re: slowdown with -std=gnu18 with respect to -std=c99
In-Reply-To: <2b4e81-fee-e79-5ea0-bf658f20b4c2@ispras.ru>
Message-ID: <9b56647e-46bb-9a79-d9a0-439e2f35ee27@ispras.ru>
References: <mw1qxbc954.fsf@tomate.loria.fr>
 <9f7e3aa9-8d46-1fbb-75b-1c8ad9a667f@ispras.ru>
 <c8517377-e695-06fe-0be4-b7e409d471b9@inria.fr>
 <c1a686bd-d2fa-a934-f931-6bf96f11e3d9@inria.fr>
 <4d36d96-2de9-f8ac-2d52-ea32b1cc6d9@grove.saclay.inria.fr>
 <74dc894-7774-e5bb-81-c5955c94ee4@ispras.ru> <mw7d6zf6iv.fsf@tomate.loria.fr>
 <2b4e81-fee-e79-5ea0-bf658f20b4c2@ispras.ru>
MIME-Version: 1.0
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 KAM_NUMSUBJECT, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 8BIT
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: gcc-help@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-help mailing list <gcc-help.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-help>,
 <mailto:gcc-help-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-help/>
List-Post: <mailto:gcc-help@gcc.gnu.org>
List-Help: <mailto:gcc-help-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-help>,
 <mailto:gcc-help-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2022 13:26:10 -0000

On Fri, 6 May 2022, Alexander Monakov wrote:

> The primary issue here is false dependency on vcvtss2sd instruction. In the
> snippet shown in Stéphane's email, the slower variant begins with
> 
>     vcvtss2sd   -0x4(%rsp),%xmm1,%xmm1
> 
> The cvtss2sd instruction is specified to take the upper bits of SSE register
> unmodified, so here it merges high bits of xmm1 with results of float->double
> conversion (in low bits) into new xmm1. Unless the CPU can track dependencies
> separately for vector register components, it has to delay this instruction
> until the previous computation that modified xmm1 has completed (AMD Zen2 is
> an example of a microarchitecture that apparently can).

For future reference, my statement in parentheses was a bit inaccurate: Zen 2
avoids the false dependency provided that xmm1 carries all-zeroes in the high
bits after being idiomatically zeroed (i.e. via pxor). Thanks to Andreas Abel
for pointing out that there's a limitation.

(Nevertheless, the "blessed" state seemingly survives context switches, so
it's quite useful, including for this testcase.)
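
To illustrate the zeroing idiom mentioned above, here is a minimal sketch in C
using SSE2 intrinsics. The helper name float_to_double_nodep is hypothetical
and not taken from the testcase. It assumes the compiler materializes
_mm_setzero_pd() with a zeroing idiom such as pxor/xorps, which is typical
but not guaranteed; the compiler remains free to pick other instructions.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from the original testcase): convert a float
   to a double so that the high bits of the destination come from a
   freshly zeroed register rather than from whatever value the register
   held before. */
static double float_to_double_nodep(float x)
{
#if defined(__SSE2__)
    /* Assumption: this is emitted as a zeroing idiom (pxor/xorps). */
    __m128d zero = _mm_setzero_pd();
    /* cvtss2sd: low 64 bits = (double)x, high 64 bits copied from 'zero'. */
    __m128d r = _mm_cvtss_sd(zero, _mm_set_ss(x));
    return _mm_cvtsd_f64(r);
#else
    /* Fallback for targets without SSE2: a plain conversion. */
    return (double)x;
#endif
}
```

Whether this actually removes the stall should be checked by inspecting the
generated assembly (e.g. with objdump -d) and timing on the CPU in question.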

Alexander