From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <441B0909922042ECA7BFC320B7238D6F@H270>
From: "Stefan Kanthak" 
To: "Michael Matz" 
Cc: "Gabriel Paubert" ,
References: <20210805094243.GA340@lt-gp.iram.es> <4C9973413BE546A0A847447291A2977F@H270> <20210805135908.GA10666@lt-gp.iram.es> <28386DCA46C94330BF16F227D2B01536@H270>
In-Reply-To: 
Subject: Re: Suboptimal code generated for __builtin_trunc on AMD64 without SSE4.1
Date: Fri, 6 Aug 2021 16:32:48 +0200
Organization: Me, myself & IT
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
List-Id: Gcc mailing list 
X-List-Received-Date: Fri, 06 Aug 2021 14:42:32 -0000

Michael Matz wrote:

> Hello,
>
> On Fri, 6 Aug 2021, Stefan Kanthak wrote:
>
>> For -ffast-math, where the sign of -0.0 is not handled and the spurious
>> invalid floating-point exception for |argument| >= 2**63 is acceptable,
>
> This claim would need to be proven in the wild.

I should have left the "when" after the "and" which I originally had
written...

> |argument| > 2**52 are already integer, and shouldn't generate a spurious
> exception from the various to-int conversions, not even in fast-math mode
> for some relevant set of applications (at least SPECcpu).
>
> Btw, have you made speed measurements with your improvements?

No.

> The size improvements are obvious, but speed changes can be fairly
> unintuitive, e.g. there were old K8 CPUs where the memory loads for
> constants are actually faster than the equivalent sequence of shifting
> and masking for the >= compares. That's an irrelevant CPU now, but it
> shows that intuition about speed consequences can be wrong.

I know. I also know of CPUs that can't load a 16-byte wide XMM register
in one go, but had to split the load into 2 8-byte loads.
If the constant happens to be present in L1 cache, it MAY load as fast
as an immediate. BUT: on current CPUs, the code GCC generates

        movsd   .LC1(%rip), %xmm2
        movsd   .LC0(%rip), %xmm4
        movapd  %xmm0, %xmm3
        movapd  %xmm0, %xmm1
        andpd   %xmm2, %xmm3
        ucomisd %xmm3, %xmm4
        jbe     38 <_trunc+0x38>

needs
- 4 cycles if the movsd are executed in parallel and the movapd are
  handled by the register renamer,
- 5 cycles if the movsd and the movapd are executed in parallel,
- 7 cycles else,
plus an unknown number of cycles if the constants are not in L1.

The proposed

        movq    rax, xmm0
        add     rax, rax
        shr     rax, 53
        cmp     eax, 53+1023
        jae     return

needs 5 cycles (moves from XMM to GPR are AFAIK not handled by the
register renamer).

Stefan