From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf1-x435.google.com (mail-pf1-x435.google.com [IPv6:2607:f8b0:4864:20::435]) by sourceware.org (Postfix) with ESMTPS id 633FD3858D28 for ; Wed, 20 Mar 2024 19:34:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 633FD3858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=dabbelt.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=dabbelt.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 633FD3858D28 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::435 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710963279; cv=none; b=ir22vhaU3W9YNqDYXRLx8oAjhqISCFnLwpE2ZJT4aC8IqFK7g88jsaIA+OG4DNyIL+grD/+yCOj5R/wbEVwcKJUG6jtzLuhvnaU7x0G7ge5+GRNspl8UCNhkSXZke33YnwNM4UvO7/55TruOWG/67P6g2818ZFJeU1gT/MCuVQU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710963279; c=relaxed/simple; bh=h1c2JxEziA/AOVIWxHcdJnmbAq+6XulGVOrB7Cff/DY=; h=DKIM-Signature:Date:Subject:From:To:Message-ID:Mime-Version; b=S4pXMHgOwvDiRCsAGA1ZSKFu5ws9yxJ5XckaH+e/O9Pp73yM7nk5pfYt9+bG3WdacFzBVPNC2a8I43osjIgP+PO7S0q8cZIUPZTXsPoUiucAXwO4wZKf+Qdb9540CRxqOJcDwxOCuqwBE7p+BEUErIO+brSgQuWXtV3SHkO/ujM= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pf1-x435.google.com with SMTP id d2e1a72fcca58-6e6ca2ac094so288708b3a.0 for ; Wed, 20 Mar 2024 12:34:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dabbelt-com.20230601.gappssmtp.com; s=20230601; t=1710963273; x=1711568073; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:message-id:to:from:cc :in-reply-to:subject:date:from:to:cc:subject:date:message-id :reply-to; bh=UcTrI4lkLvdTCPVcUK4Ra/4NMaaXbPw4IOsDwT+0krY=; b=M+aaClGpUntY38dBkL3c87gvuJmRxmHhmHGGQ7y9ALtn4y0wHX/8tT/J3MrM8vxt7d bgYaECxkK+EJbIIsn2hRkiYRB5C2Lk/JrT5Cst189XO+Tl0PUkI34wJwpi4i2BwGWpCB TKsPFs6DBWeLQz3kXWu0VTZfhmS5ld2tOfAnbks5/FiHJjK/JK0RusdcoaAlksZKtzuk EzJZ/jw43rgt4Ym/hW2o7Oj3NG6VGSNN8DKF0sG7mGNozB4MYGn1rqf7KntzF4IDKU00 SBTneBgQI+DP1+9ZMLEbJg/z1I8F2XdVWE1V5eT/iqcykE6MvEDH534YzQBQva92C5y8 yfqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710963273; x=1711568073; h=content-transfer-encoding:mime-version:message-id:to:from:cc :in-reply-to:subject:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=UcTrI4lkLvdTCPVcUK4Ra/4NMaaXbPw4IOsDwT+0krY=; b=j11xgn4dTdJgElbPVaUOfBwZ/YTLzG0om8xJAwYQCrRDGqwaj1qXbMcmF4gA5l3LxT vKQqltOf9C3ximLwJ4Co3BrwL6v9tgtM2+p+Lk6M50Dl5rmGeDkPVr1VHvBGmfFT5ubs u0Mkba8C5YTCDXBTx0fMx7xhMHux7p9e4AoYKdURyX4JRo8X2eV+JhCkn04YH3vQyGt1 jrBrWqaDiC7fU1U8pvcLjrjJNLtcDqr72HBmBvpeToApqa5BZZka7rey9KRD/WFIiO+d huBa7GzDkg7QKJymyKKRTEAglyGipsu9wTaNJ+ED30pwwf2UWz07+DbOQGHK2jZPylgc MXXw== X-Forwarded-Encrypted: i=1; AJvYcCVbsx2R8nAxxvYPBuw+PVx06keOy8yZb6dhAW8XHR9sdVXOiewszu9rLkSth5sdiTJZu7gQrjhDCpn1+W2h6WuRVIfCCZmd9g== X-Gm-Message-State: AOJu0Yw1wjmtO8QU8vHz4g3TibjjZvispUz0Fhbvm6NXb5gB83zWW/11 AcQc+Cxr/uhMqhvXFgyc6FAsccyzssm/zLU6Lm6RqUyoylcFXIJlPRPtHk6ehyo= X-Google-Smtp-Source: AGHT+IHg0EG4dbgLm5YfoP/YoJkHX25j6B72xVQkgTIZZd+AUvvXcM1WTePGLJL/U6AjFH9jJHnhLQ== X-Received: by 2002:a05:6a20:3b05:b0:1a3:495e:3f17 with SMTP id c5-20020a056a203b0500b001a3495e3f17mr5845506pzh.24.1710963273036; Wed, 20 Mar 2024 12:34:33 -0700 (PDT) Received: from localhost ([50.213.54.97]) by smtp.gmail.com with ESMTPSA id y31-20020a056a00181f00b006e6b710e41fsm11965394pfa.211.2024.03.20.12.34.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 Mar 2024 12:34:32 -0700 (PDT) Date: Wed, 20 Mar 2024 12:34:32 -0700 (PDT) X-Google-Original-Date: Wed, 20 Mar 2024 12:34:30 PDT (-0700) Subject: Re: RISC-V: Use convert instructions instead of calling library functions In-Reply-To: <194c1579-0f0e-4f2c-9107-45591d5051dd@ventanamicro.com> CC: jeffreyalaw@gmail.com, jivanhakobyan9@gmail.com, gcc-patches@gcc.gnu.org From: Palmer Dabbelt To: Jeff Law Message-ID: Mime-Version: 1.0 (MHng) Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, 20 Mar 2024 11:54:34 PDT (-0700), Jeff Law wrote: > > > On 3/19/24 10:23 AM, Palmer Dabbelt wrote: >> On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreyalaw@gmail.com wrote: >>> >>> >>> On 3/18/24 3:09 AM, Jivan Hakobyan wrote: >>>> As RV has round instructions it is reasonable to use them instead of >>>> calling the library functions. >>>> >>>> With my patch for the following C code: >>>> double foo(double a) { >>>>      return ceil(a); >>>> } >>>> >>>> GCC generates the following ASM code (before it was tail call) >>>> foo: >>>>          fabs.d  fa4,fa0 >>>>          lui     a5,%hi(.LC0) >>>>          fld     fa3,%lo(.LC0)(a5) >>>>          flt.d   a5,fa4,fa3 >>>>          beq     a5,zero,.L3 >>>>          fcvt.l.d a5,fa0,rup >> >> I'm not sure exactly what context this is in, but my reading of >> "according to the current rounding mode" means we'd usually use the >> dynamic rounding mode. > As Andrew W. noted, we're dealing with ceil and thus rup is the correct > rounding mode to use here. > > > >> >> My only worry here is that when we were doing the other patterns we >> decided not to do rint.  I can't remember exactly why, but from reading >> the docs we probably just skipped it due to the inexact handling and Zfa >> having an instruction that just does this.  FP stuff is always a bit of >> a time sink, so at that point it probably just fell off the priority list. > rint is supposed to raise FE_INEXACT, so it's actually a good match for > RISC-V fcvt semantics as they appropriately raise FE_INEXACT. > > nearby* do not arise FE_INEXACT and thus would rely on the new Zfa > instructions where we have ones that do not raise FE_INEXACT or they > need to be conditional on flag_fp_int_builtin_inexact. One could > reasonably argue that when flag_fp_int_builtin_inexact is enabled that a > call to nearby* ought to be converted into a call to rint*. > > >> >> I'm not really an FP guy, so I usually just poke around what the other >> ports generate and try to figure out what's going on.  arm64 has the Zfa >> instruction and x86 FP is complicated, so I'm not sure exactly who else >> to look at for this sort of stuff.  From just looking at the code, >> though, I think there's two issues -- I'm not really an FP person, >> though, so take this with a grain of salt: > Right. And the condition under which we use the new sequence for > ceil/round actually borrows from x86. Essentially we only use the new > sequence when we've been told we don't care about FE_INEXACT or fp > exceptions in general. > >> >> IIUC what we've got here doesn't actually set the inexact flag for the >> bounds check failure case, as we're just loading up an exact constant >> and doing the bounds check.  We're also not clamping to INT_MAX-type >> values, but not sure if we're supposed to.  I think we could fix both of >> those by adjusting the expansion to something like > The state of FE_INEXACT is a don't care here due the condition on the > expansion code. > > >> >>              fabs.d  fa4,fa0 >>              lui     a5,%hi(.LC0) >>              fld     fa3,%lo(.LC0)(a5) >>              flt.d   a5,fa4,fa3 >>              bne     a5,zero,.L3 >>              mv      fa0, fa3 >>     .L3: >>              fcvt.l.d a5,fa0,rup >>              fcvt.d.l        fa4,a5 >>              fsgnj.d fa0,fa4,fa0 >>              ret >> >> and then adjusting the constant to be an epsilon larger than INT_MAX so >> it'll still trigger the clamping but also inexact. > I think Jivan's sequence is more correct. It's not just INT_MAX here > that's concerning, there's a whole class of values that cause problems. Ya, sorry, I thought I'd replied to Andrew's email somewhere -- I'd just managed to confuse myself about how the FP stuff works, I also think Jivan's code is correct now. >> There's also a pair of changes to the ISA in 2020 that added the >> conversion inexact handling requirement, it was a grey area before.  I >> don't remember exactly what happened there, but I did remember it >> happening.  I don't think anyone cares all that much about the >> performance of systems that target the older ISAs, so maybe we just >> restrict the non-libcall expansion to ISAs that contain the new wording? > I think all this got sufficiently cleaned up. The spec is explicit > about when FE_INEXACT gets raised on the fcvt instructions. I referred > to it repeatedly when analyzing Jivan's work. We still have support for stuf like -misa-spec=2.2 (and some 2019 releases with clunky version numbers). Those all predate the convert/inexact wording that got added. Though if FE_INEXACT is a don't care here, then I think it doesn't matter if the wording got changed. In that case I think this is fine, so Reviewed-by: Palmer Dabbelt > We can hash through the final items in a few weeks once the trunk > re-opens for development. I think those were all the isuses on my end ;) > > > Jeff