Message-ID: <180c751f-80ec-638e-6564-43ab4802d4dc@linaro.org>
Date: Tue, 7 Dec 2021 10:19:50 -0300
Subject: Re: [PATCH v4 06/12] math: Remove powerpc e_hypot
To: "Paul A. Clarke"
Cc: libc-alpha@sourceware.org, Tulio Magno Quites Machado Filho
References: <20211203000103.737833-1-adhemerval.zanella@linaro.org>
 <20211203000103.737833-7-adhemerval.zanella@linaro.org>
 <9ab1d04d-e6f7-0ca6-0541-374a8a55ab09@linaro.org>
 <20211206212932.GB48332@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com>
From: Adhemerval Zanella
In-Reply-To: <20211206212932.GB48332@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com>

On 06/12/2021 18:29, Paul A. Clarke wrote:
> On Mon, Dec 06, 2021 at 02:12:27PM -0300, Adhemerval Zanella wrote:
>> On 02/12/2021 21:00, Adhemerval Zanella wrote:
>>> The generic implementation shows only slightly worse performance:
>>>
>>> POWER9        reciprocal-throughput   latency
>>> master                      13.4024   14.0967
>>> new hypot                   14.8479   15.8061
>>>
>>> POWER8        reciprocal-throughput   latency
>>> master                      15.5767   16.8885
>>> new hypot                   16.5371   18.4057
>>>
>>> One way to improve this might be to make gcc generate xsmaxdp/xsmindp
>>> for fmax/fmin (it only does so with -ffast-math; clang does with the
>>> default options).
>>>
>>> Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
>>> (power9).
>>
>> Hi Tulio/Paul,
>>
>> This is the only missing patch in this set and I would like to check with
>> you, powerpc maintainers, whether it would be ok to push it.
>> The resulting performance difference, including the latest one that
>> removes the wrappers, is slightly better:
>>
>> POWER9        reciprocal-throughput   latency
>> master                      13.4024   14.0967
>> new hypot                   11.9206   13.9871
>>
>> POWER8        reciprocal-throughput   latency
>> master                      15.5767   16.8885
>> new hypot                   15.3541   18.0856
>
> For Power10 / master:
>
>   "hypot": {
>    "workload-random": {
>     "duration": 5.28242e+08,
>     "iterations": 4.8e+07,
>     "reciprocal-throughput": 8.28478,
>     "latency": 13.7253,
>     "max-throughput": 1.20703e+08,
>     "min-throughput": 7.2858e+07
>    }
>   }
>
> For Power10 / new hypot:
>
>   "hypot": {
>    "workload-random": {
>     "duration": 5.30731e+08,
>     "iterations": 5.2e+07,
>     "reciprocal-throughput": 7.21945,
>     "latency": 13.1933,
>     "max-throughput": 1.38515e+08,
>     "min-throughput": 7.57963e+07
>    }
>   }

So I assume it would be ok to push this patch based on the power10
performance numbers, right?

>
>> The POWER8 latency difference seems to be due to branch misprediction
>> in the max/min selection.  In fact, if I use xsmaxdp/xsmindp on the
>> USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8:
>>
>> POWER8            reciprocal-throughput   latency
>> xsmaxdp/xsmindp                 12.8959   16.2082
>>
>> POWER9 is not affected (I don't see any performance difference by
>> using xsmaxdp/xsmindp).
>>
>> The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math
>> for some reason; clang uses them with the default -O2 option.
>
> I believe clang may be wrong here, in that the instructions do not
> properly handle NaN for the fmin/fmax semantics without -ffast-math.

At least on power8/power9 I don't see any test-double-{fmax,fmin}
regressions if I use xsmaxdp/xsmindp instead of the default implementation.
And libm-test-{fmax,fmin}.inc does extensively test both default and
signaling NaNs.