From: Adhemerval Zanella
Date: Thu, 11 Nov 2021 14:13:31 -0300
Subject: Re: [PATCH v3 5/7] math: Remove powerpc e_hypot
To: Wilco Dijkstra, "Paul A. Clarke"
Cc: "libc-alpha@sourceware.org", Tulio Magno Quites Machado Filho
Message-ID: <384b240c-29c3-af14-05e6-951f00178cff@linaro.org>
References: <20211101202059.1026032-1-adhemerval.zanella@linaro.org>
 <20211101202059.1026032-6-adhemerval.zanella@linaro.org>
 <20211109192800.GA4930@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com>
 <37a5bc8c-a9ec-952d-427e-62632f7f7a0a@linaro.org>

On 11/11/2021 14:05, Wilco Dijkstra wrote:
> Hi Adhemerval,
>
>> On 10/11/2021 11:34, Wilco Dijkstra wrote:
>>> I think the new algorithm will always be slower due to the dependent sqrt and
>>> division. So it's hard to improve unless we only use it for special cases (eg. when
>>> ax and ay are close). Returning sqrt (fma (ax, ax, ay * ay)) is about twice as fast
>>> and gives just over 1 ULP, so we're losing a lot of performance for a small ULP
>>> improvement.
>>
>> My main drive for this change is to remove the arch-specific implementation in
>> favor of an implementation that might be optimized better by the compiler
>> without the need for extra hacks by arch-specific hooks (as I did for power7).
>
> I'm all for having a single optimized generic implementation like we did for other
> math functions. In general there is little scope for compiler optimizations due to
> conservative FP settings - it is all down to highly optimizing both the algorithm
> and implementation.
>
>> Another option is to use the powerpc implementation, which favors FP over integer,
>> as the default one.
>
> That is the fastest implementation. It is less accurate though (~1.04ULP with FMA
> and ~1.21ULP without FMA), so I'm not sure that would be acceptable.

This should not be worse than the current default (the powerpc one is essentially
the same as the default, but using FP operations).

>
> I did some quick optimizations on the new algorithm, on Neoverse N1 my fastest
> version is less than 10% slower than the powerpc version, and has ~0.94 ULP error.

Do you mean besides the optimized nan/inf checks?  I can check if it helps on
powerpc.