From: Adhemerval Zanella
To: libc-alpha@sourceware.org, Wilco Dijkstra, "H . J . Lu"
Subject: [PATCH v2 0/5] Improve fmod and fmodf
Date: Wed, 15 Mar 2023 17:59:05 -0300
Message-Id: <20230315205910.4120377-1-adhemerval.zanella@linaro.org>

This is an updated version of a previous submission by Kirill
Okhotnikov [1] that aimed to improve the fmod implementation. I
extended it with:

  1. Proper benchmarks for both single and double precision. The
     inputs are divided into three subsets: subnormals, normal
     numbers, and close exponents. Each subset uses a list of
     randomly generated values.

  2. Use of math_config.h definitions instead of math_private.h (so
     the code might eventually make its way back into
     optimized-routines).

  3. The same strategy implemented for the float version.

  4. Tuning of the final division to use multiplication by the
     inverse instead of a direct modulo. It showed better performance
     on both the x86_64 and aarch64 chips I have tested.

  5. Removal of the SVID error handling wrapper.
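The integer-mantissa approach, with the modulo replaced by a multiplication by a precomputed inverse (in the spirit of item 4), can be illustrated with a minimal sketch. This is not the code from the patch: `fmod_sketch` and `rem_by_reciprocal` are hypothetical names, the exponent difference is reduced 11 bits at a time for clarity rather than speed, errno handling is left out, and the reciprocal step assumes the GCC/Clang `unsigned __int128` extension on a 64-bit target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MANT_BITS 52
#define MANT_MASK ((UINT64_C(1) << MANT_BITS) - 1)
#define SIGN_MASK (UINT64_C(1) << 63)
#define EXP_INF   UINT64_C(0x7ff0000000000000)

static uint64_t as_u64(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double as_f64(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Hypothetical helper: a mod d, with the quotient estimated as the high
   64 bits of a * inv, where inv = UINT64_MAX / d is computed once by the
   caller.  The estimate is never too large and at most 2 too small, so
   at most two subtractions correct it.  Needs unsigned __int128.  */
static uint64_t rem_by_reciprocal(uint64_t a, uint64_t d, uint64_t inv)
{
  uint64_t q = (uint64_t) (((unsigned __int128) a * inv) >> 64);
  uint64_t r = a - q * d;   /* q * d <= a, so no wrap-around */
  while (r >= d)
    r -= d;
  return r;
}

/* Hypothetical sketch of fmod for IEEE binary64 using only integer
   arithmetic on the mantissas; errno is not set.  */
double fmod_sketch(double x, double y)
{
  uint64_t ix = as_u64(x), iy = as_u64(y);
  uint64_t sx = ix & SIGN_MASK;
  uint64_t ax = ix & ~SIGN_MASK, ay = iy & ~SIGN_MASK;

  /* x is Inf/NaN, y is NaN, or y is zero: the result is NaN.  */
  if (ax >= EXP_INF || ay > EXP_INF || ay == 0)
    return (x * y) / (x * y);
  if (ax < ay)              /* |x| < |y|, including y = +-Inf */
    return x;
  if (ax == ay)             /* |x| == |y|: signed zero */
    return as_f64(sx);

  /* value = m * 2^(e - 1075); subnormals use e = 1 and no implicit bit.  */
  int ex = (int) (ax >> MANT_BITS), ey = (int) (ay >> MANT_BITS);
  uint64_t mx = ax & MANT_MASK, my = ay & MANT_MASK;
  if (ex == 0)
    ex = 1;
  else
    mx |= UINT64_C(1) << MANT_BITS;
  if (ey == 0)
    ey = 1;
  else
    my |= UINT64_C(1) << MANT_BITS;

  /* r = (mx * 2^(ex - ey)) mod my.  Since r < my < 2^53, shifting by at
     most 11 bits per step keeps r << k below 2^64.  */
  uint64_t inv = UINT64_MAX / my;
  uint64_t r = rem_by_reciprocal(mx, my, inv);
  for (int d = ex - ey; d > 0;)
    {
      int k = d < 11 ? d : 11;
      r = rem_by_reciprocal(r << k, my, inv);
      d -= k;
    }
  if (r == 0)
    return as_f64(sx);

  /* The result is r * 2^(ey - 1075); renormalize and re-encode.  */
  int e = ey;
  while (r < (UINT64_C(1) << MANT_BITS) && e > 1)
    {
      r <<= 1;
      e--;
    }
  uint64_t bits = r < (UINT64_C(1) << MANT_BITS)
                  ? r                                   /* subnormal */
                  : ((uint64_t) e << MANT_BITS) | (r & MANT_MASK);
  return as_f64(sx | bits);
}
```

For example, `fmod_sketch(10.0, 3.0)` yields 1.0 and `fmod_sketch(-7.5, 2.0)` yields -1.5, keeping the sign of x. Because the estimate floor(a * inv / 2^64) with inv = UINT64_MAX / d never exceeds the true quotient and falls short by at most 2, the correction loop is bounded, which is what makes trading a hardware division for a multiplication safe here.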
The performance shows a good improvement over the current algorithm
for fmod (using gcc 11; lower is better):

Architecture     | Input           | master   | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals      | 19.1584  | 9.40992
x86_64 (Ryzen 9) | normal          | 1016.51  | 296.738
x86_64 (Ryzen 9) | close-exponents | 18.4428  | 13.119
aarch64 (N1)     | subnormals      | 11.153   | 4.33313
aarch64 (N1)     | normal          | 528.649  | 158.339
aarch64 (N1)     | close-exponents | 11.4517  | 5.76138

I also see similar improvements on arm-linux-gnueabihf running on the
N1 aarch64 chips, where it relies heavily on soft-fp implementations
(for modulo, clz, ctz, and multiplication), except for the
close-exponents inputs, which are slower:

Architecture     | Input           | master   | patch
-----------------|-----------------|----------|--------
armhf (N1)       | subnormals      | 15.7284  | 14.5746
armhf (N1)       | normal          | 837.525  | 241.738
armhf (N1)       | close-exponents | 16.2111  | 22.457

fmodf shows a more moderate improvement (the armhf close-exponents
case is again slower):

Architecture     | Input           | master   | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals      | 17.2549  | 9.35776
x86_64 (Ryzen 9) | normal          | 85.4096  | 46.2761
x86_64 (Ryzen 9) | close-exponents | 19.1072  | 12.6199
aarch64 (N1)     | subnormals      | 10.2182  | 4.39188
aarch64 (N1)     | normal          | 60.0616  | 18.3888
aarch64 (N1)     | close-exponents | 11.5256  | 5.93518
armhf (N1)       | subnormals      | 11.6662  | 7.75977
armhf (N1)       | normal          | 69.2759  | 31.623
armhf (N1)       | close-exponents | 13.6472  | 15.6689

I also compared against H.J.'s proposal to use fprem on x86_64 [2] and
against a recent suggestion on libc-alpha [3]; in both cases this
implementation shows better performance.

Changes from v1:
  * Remove SVID error handling wrapper.
  * Extend testing for subnormals with different signs.
  * Code cleanup.
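The tables report raw figures, so one way to read them is the ratio master / patch, assuming (as the word "improvement" implies) that they are timings where lower is better. The hypothetical helpers below copy the double-precision fmod rows from the tables, compute that ratio per row, and count the rows where the patched code is slower; the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the double-precision fmod tables.  */
struct bench_row
{
  const char *arch;
  const char *input;
  double master;   /* figure for the current algorithm */
  double patch;    /* figure with this series applied */
};

static const struct bench_row fmod_rows[] = {
  { "x86_64 (Ryzen 9)", "subnormals",      19.1584, 9.40992 },
  { "x86_64 (Ryzen 9)", "normal",          1016.51, 296.738 },
  { "x86_64 (Ryzen 9)", "close-exponents", 18.4428, 13.119  },
  { "aarch64 (N1)",     "subnormals",      11.153,  4.33313 },
  { "aarch64 (N1)",     "normal",          528.649, 158.339 },
  { "aarch64 (N1)",     "close-exponents", 11.4517, 5.76138 },
  { "armhf (N1)",       "subnormals",      15.7284, 14.5746 },
  { "armhf (N1)",       "normal",          837.525, 241.738 },
  { "armhf (N1)",       "close-exponents", 16.2111, 22.457  },
};

/* master / patch: a value above 1 means the patched code is faster.  */
double speedup(double master, double patch)
{
  assert(patch > 0.0);
  return master / patch;
}

/* Number of rows where the patched figure is higher, i.e. slower.  */
size_t count_regressions(const struct bench_row *rows, size_t n)
{
  size_t slower = 0;
  for (size_t i = 0; i < n; i++)
    if (rows[i].patch > rows[i].master)
      slower++;
  return slower;
}

/* Print one "arch | input | ratio" line per row.  */
void print_speedups(const struct bench_row *rows, size_t n)
{
  for (size_t i = 0; i < n; i++)
    printf("%-16s | %-15s | %.2fx\n", rows[i].arch, rows[i].input,
           speedup(rows[i].master, rows[i].patch));
}
```

For instance, the x86_64 normal row gives 1016.51 / 296.738, about 3.43x, while the armhf close-exponents row gives about 0.72, the only fmod row below 1.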
[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
[2] https://patchwork.sourceware.org/project/glibc/patch/20230309183312.205763-1-hjl.tools@gmail.com/
[3] https://sourceware.org/pipermail/libc-alpha/2023-March/146164.html

Adhemerval Zanella (5):
  benchtests: Add fmod benchmark
  benchtests: Add fmodf benchmark
  math: Improve fmod
  math: Improve fmodf
  math: Remove the error handling wrapper from fmod and fmodf

 benchtests/Makefile                           |    2 +
 benchtests/fmod-inputs                        | 2182 +++++++++++++++++
 benchtests/fmodf-inputs                       | 2182 +++++++++++++++++
 math/Versions                                 |    4 +
 math/libm-test-fmod.inc                       |   11 +
 math/w_fmod_compat.c                          |   13 +-
 math/w_fmodf_compat.c                         |    6 +-
 sysdeps/i386/fpu/w_fmod_compat.c              |   14 +
 sysdeps/i386/fpu/w_fmodf_compat.c             |   14 +
 sysdeps/ieee754/dbl-64/e_fmod.c               |  245 +-
 sysdeps/ieee754/dbl-64/math_config.h          |   70 +
 sysdeps/ieee754/dbl-64/math_err.c             |    6 +
 sysdeps/ieee754/dbl-64/w_fmod.c               |    1 +
 sysdeps/ieee754/flt-32/e_fmodf.c              |  241 +-
 sysdeps/ieee754/flt-32/math_config.h          |   48 +
 sysdeps/ieee754/flt-32/math_errf.c            |    6 +
 sysdeps/ieee754/flt-32/w_fmodf.c              |    1 +
 sysdeps/m68k/m680x0/fpu/w_fmod_compat.c       |   14 +
 sysdeps/m68k/m680x0/fpu/w_fmodf_compat.c      |   14 +
 sysdeps/unix/sysv/linux/aarch64/libm.abilist  |    2 +
 sysdeps/unix/sysv/linux/alpha/libm.abilist    |    2 +
 sysdeps/unix/sysv/linux/arm/be/libm.abilist   |    2 +
 sysdeps/unix/sysv/linux/arm/le/libm.abilist   |    2 +
 sysdeps/unix/sysv/linux/hppa/libm.abilist     |    2 +
 .../sysv/linux/m68k/coldfire/libm.abilist     |    2 +
 .../sysv/linux/microblaze/be/libm.abilist     |    2 +
 .../sysv/linux/microblaze/le/libm.abilist     |    2 +
 .../unix/sysv/linux/mips/mips32/libm.abilist  |    2 +
 .../unix/sysv/linux/mips/mips64/libm.abilist  |    2 +
 sysdeps/unix/sysv/linux/nios2/libm.abilist    |    2 +
 .../linux/powerpc/powerpc32/fpu/libm.abilist  |    2 +
 .../powerpc/powerpc32/nofpu/libm.abilist      |    2 +
 .../linux/powerpc/powerpc64/be/libm.abilist   |    2 +
 .../linux/powerpc/powerpc64/le/libm.abilist   |    2 +
 .../unix/sysv/linux/s390/s390-32/libm.abilist |    2 +
 .../unix/sysv/linux/s390/s390-64/libm.abilist |    2 +
 sysdeps/unix/sysv/linux/sh/be/libm.abilist    |    2 +
 sysdeps/unix/sysv/linux/sh/le/libm.abilist    |    2 +
 .../sysv/linux/sparc/sparc32/libm.abilist     |    2 +
 .../sysv/linux/sparc/sparc64/libm.abilist     |    2 +
 .../unix/sysv/linux/x86_64/64/libm.abilist    |    2 +
 .../unix/sysv/linux/x86_64/x32/libm.abilist   |    2 +
 42 files changed, 4923 insertions(+), 197 deletions(-)
 create mode 100644 benchtests/fmod-inputs
 create mode 100644 benchtests/fmodf-inputs
 create mode 100644 sysdeps/i386/fpu/w_fmod_compat.c
 create mode 100644 sysdeps/i386/fpu/w_fmodf_compat.c
 create mode 100644 sysdeps/ieee754/dbl-64/w_fmod.c
 create mode 100644 sysdeps/ieee754/flt-32/w_fmodf.c
 create mode 100644 sysdeps/m68k/m680x0/fpu/w_fmod_compat.c
 create mode 100644 sysdeps/m68k/m680x0/fpu/w_fmodf_compat.c

-- 
2.34.1