From: Matt Turner
Date: Mon, 13 Mar 2023 11:19:58 -0400
Subject: Re: [PATCH 4/4] math: Improve fmodf
To: Adhemerval Zanella
Cc: libc-alpha@sourceware.org, Wilco Dijkstra, "H . J . Lu", kirill
References: <20230310175900.2388957-1-adhemerval.zanella@linaro.org> <20230310175900.2388957-5-adhemerval.zanella@linaro.org>
In-Reply-To: <20230310175900.2388957-5-adhemerval.zanella@linaro.org>

On Fri, Mar 10, 2023 at 1:01 PM Adhemerval Zanella via Libc-alpha wrote:
>
> This uses a new algorithm similar to the one proposed earlier [1].
> With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
> the simplest implementation is:
>
>   mx * 2^ex == 2 * mx * 2^(ex - 1)
>
>   while (ex > ey)
>     {
>       mx *= 2;
>       --ex;
>       mx %= my;
>     }
>
> With mx/my being the mantissas of single-precision floating-point
> values, the argument reduction can be improved to 8 bits per step
> (which is the bit width of uint32_t minus MANTISSA_WIDTH plus the
> sign bit):
>
>   while (ex > ey)
>     {
>       mx <<= 8;
>       ex -= 8;
>       mx %= my;
>     }
>
> The implementation uses builtin clz and ctz, along with shifts to
> convert hx/hy back to floats.
> Unlike the original patch, this patch assumes that modulo/divide
> operations are slow, so it uses multiplication by inverse values.
>
> I see the following performance improvements using the fmod benchtests
> (only the 'mean' result is shown):
>
> Architecture     | Input           | master   | patch
> -----------------|-----------------|----------|--------
> x86_64 (Ryzen 9) | subnormals      | 17.2549  | 12.3214
> x86_64 (Ryzen 9) | normal          | 85.4096  | 52.6625
> x86_64 (Ryzen 9) | close-exponents | 19.1072  | 17.4622
> aarch64 (N1)     | subnormal       | 10.2182  | 6.81778
> aarch64 (N1)     | normal          | 60.0616  | 158.339

Is this line correct? 60 -> 158?
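
For anyone following the quoted description, the 8-bits-per-step reduction can be sketched in C roughly as below. This is only an illustration and not the code from the patch: the name fmodf_sketch is hypothetical, it decomposes the inputs with frexpf/ldexpf instead of clz/ctz and shifts, it uses a plain % instead of multiplication by an inverse, and it assumes both inputs are positive, finite, and nonzero.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical illustration of the quoted loop; NOT the glibc code.
   Assumes x > 0 and y > 0, both finite.  */
float fmodf_sketch(float x, float y)
{
    if (x < y)
        return x;               /* already reduced */

    /* Write x = mx * 2^ex and y = my * 2^ey with integer mantissas
       mx, my in [2^23, 2^24).  frexpf returns a fraction in [0.5, 1),
       so scaling it by 2^24 gives an exact 24-bit integer.  */
    int ex, ey;
    uint32_t mx = (uint32_t) ldexpf(frexpf(x, &ex), 24);
    uint32_t my = (uint32_t) ldexpf(frexpf(y, &ey), 24);
    ex -= 24;
    ey -= 24;

    /* x >= y implies ex >= ey, so n counts the remaining bit shifts.  */
    int n = ex - ey;
    uint32_t r = mx % my;

    /* r < my < 2^24, so r << 8 still fits in 32 bits: this is the
       "8 bits per step" headroom mentioned in the description.  */
    while (n >= 8) {
        r = (r << 8) % my;
        n -= 8;
    }
    r = (r << n) % my;          /* leftover 0..7 bits */

    /* r < 2^24 converts to float exactly; scale back by 2^ey.  */
    return ldexpf((float) r, ey);
}
```

With these assumptions fmodf_sketch(5.5f, 2.0f) gives 1.5f. Since every intermediate value is an exact integer, the result is computed without rounding; the per-step % is the operation the patch description says it replaces with a multiplication.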