From: "H.J. Lu"
Date: Fri, 17 Mar 2023 09:07:45 -0700
Subject: Re: [PATCH v2 3/5] math: Improve fmod
To: Wilco Dijkstra
Cc: Adhemerval Zanella Netto, "libc-alpha@sourceware.org", kirill
References: <20230315205910.4120377-1-adhemerval.zanella@linaro.org> <20230315205910.4120377-4-adhemerval.zanella@linaro.org>

On Fri, Mar 17, 2023 at 7:55 AM Wilco Dijkstra wrote:
>
> Hi Adhemerval,
>
> >> It's these cases where x87 is still faster than the generic version:
> >>
> >>> E5-2640 | close-exponents | 39.298 | 22.2742
> >>>
> >>> i7-4510U | close-exponents | 29.463 | 22.8572
> >>
> >> Are these mostly x < y or cases where the exponent difference is just over 11 and
> >> thus we do not use the fast path?
> >
> > In fact the fast path will be used on ~83% of the cases (849 from 1024 entries).
> > Profiling shows that the initial checks might be the culprit, since generic
> > compat wrapper uses compiler builtins that might map to fp instructions. But even
> > trying to mimic did not improve much. It seems that for some CPU the integer
> > operations to create the final floating number is what is costly.
>
> If it is mostly the fast path we could further tune it and reduce instruction counts.
> It takes 6 if statements to enter this fast path, we could reduce that to 3. There are
> several large constants which could be simplified (older x86 cores might have
> issues with multiple 10-byte MOVABS in the instruction stream).
>
> Also I think your results for generic above use the wrapper, so we'd still get the
> > 20% speedup which should make things closer.
>

The current __ieee754_fmod doesn't set errno nor does x87 __ieee754_fmod.
A wrapper will avoid setting errno.

--
H.J.