Date: Thu, 16 Mar 2023 17:39:22 -0300
Subject: Re: [PATCH v2 3/5] math: Improve fmod
To: Wilco Dijkstra, "H.J. Lu"
Cc: "libc-alpha@sourceware.org", kirill
References: <20230315205910.4120377-1-adhemerval.zanella@linaro.org> <20230315205910.4120377-4-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella Netto
Organization: Linaro

On 16/03/23 13:13, Wilco Dijkstra wrote:
> Hi,
>
> It's these cases where x87 is still faster than the generic version:
>
>> E5-2640          | close-exponents | 39.298   | 22.2742
>>
>> i7-4510U         | close-exponents | 29.463   | 22.8572
>
> Are these mostly x < y or cases where the exponent difference is just over 11 and
> thus we do not use the fast path?

In fact, the fast path is used in ~83% of the cases (849 of 1024 entries).
Profiling shows that the initial checks might be the culprit, since the generic
compat wrapper uses compiler builtins that might map to FP instructions. But even
trying to mimic them did not improve much. It seems that on some CPUs the integer
operations needed to create the final floating-point number are what is costly.

>
>> I am also checking a algorithm change to use simple loop for the normal inputs,
>> where integer modulo operation is used instead of inverse multiplication.
>
> Adding another fast path for a wider range of exponent difference could be faster
> than the generic modulo loop. This could do 2 modulo steps and maybe handle
> tail zeroes (which I think is what HJ's testcase will benefit from).
>
> For really large exponent differences, the generic modulo code could process 30
> or 60 bits per iteration instead of just 11. It's more complex (so would be a separate
> patch) but it should help CPUs with relatively high latency multipliers.

Yes, it might be an improvement indeed.