From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-106663-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 106567 invoked by alias); 5 Nov 2019 18:55:39 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Return-Path: <adhemerval.zanella@linaro.org>
Subject: Re: [PATCH 01/17] S390: Use load-fp-integer instruction for nearbyint
 functions.
To: libc-alpha@sourceware.org
References: <1572881244-6781-1-git-send-email-stli@linux.ibm.com>
 <fc754475-f2cd-cb07-c527-1ccc567a4868@linaro.org>
 <ee15f9ef-bb5c-293d-fd91-94f6a0ad549c@linux.ibm.com>
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Openpgp: preference=signencrypt
Message-ID: <4cdb552e-5b56-4e43-a33b-44ec9892cc3f@linaro.org>
Date: Tue, 05 Nov 2019 18:55:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.9.0
MIME-Version: 1.0
In-Reply-To: <ee15f9ef-bb5c-293d-fd91-94f6a0ad549c@linux.ibm.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-SW-Source: 2019-11/txt/msg00134.txt.bz2



On 05/11/2019 12:49, Stefan Liebler wrote:
> On 11/4/19 7:22 PM, Adhemerval Zanella wrote:
>>
>>
>> On 04/11/2019 12:27, Stefan Liebler wrote:
>>> If compiled with z196 zarch support, the load-fp-integer instruction
>>> is used to implement nearbyint, nearbyintf, nearbyintl.
>>> Otherwise the common-code implementation is used.
>>
>>> +
>>> +double
>>> +__nearbyint (double x)
>>> +{
>>> +  double y;
>>> +  /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
>>> +     x to the nearest integer according to current rounding mode (M3-field: 0)
>>> +     where inexact exceptions are suppressed (M4-field: 4).  */
>>> +  __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
>>> +  return y;
>>> +}
>>> +libm_alias_double (__nearbyint, nearbyint)
>>
>> At least with recent gcc, __builtin_nearbyint generates the expected fidbra
>> instruction for -march=z196.  I wonder if we could start to simplify some
>> math symbol implementations, where new architectures/extensions provide
>> a direct implementation, by mapping them directly to compiler builtins.
>>
>> I would expect to:
>>
>>    1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/
>>       since I highly doubt these micro-optimizations really pay off with
>>       recent architectures and compiler versions.
>>
>>    2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as:
>>
>>       * sysdeps/ieee754/dbl-64/s_nearbyint.c
>>             [...]
>>       double
>>       __nearbyint (double x)
>>       {
>>       #if __USE_NEARBYINT_BUILTIN
>>         return __builtin_nearbyint (x);
>>       #else
>>         /* Use generic implementation.  */
>>       #endif
>>       }
>>
>>    3. Define the __USE_<SYMBOL>_BUILTIN for each architecture.
>>
>> It would allow us to simplify some architectures, aarch64 for instance.
>>
> 
> Currently the long double builtins generate an extra, unneeded stack frame compared to the inline assembly. But this needs to be fixed in gcc.
> 
> E.g. if built for s390 (31bit), where the fidbra & co instructions are not available, the builtins generate a call to libc, which would end in an infinite loop.  I will run some tests on s390 starting with the current minimum gcc 6.2 to be sure that the instructions are used.  I have never built glibc with other compilers like clang.  Is there a special need to check this behavior?

I think Google maintains some branches with clang support (google/grte/*),
but there is no known effort to sync these with master.  So I see no need
to focus on non-gcc compilers for now.

> 
> In general I can start with those functions where the builtins can be used on s390, but I won't move all wordsize-64 functions and adjust them to use the builtins with this patch series.
> This means for now, I start with using builtins for nearbyint, rint, floor, ceil, trunc, round and copysign.
> 
> Afterwards the same can be done for the remaining functions.
> 
> I will create a separate header file, e.g. sysdeps/generic/math-use-builtins.h, in the same way as fix-fp-int-compare-invalid.h.
> The generic version contains all USE_XYZ_BUILTIN macros defined to 0,
> and each architecture can provide its own file with other settings.
> For each function XYZ there will be three macros, e.g. USE_NEARBYINT_BUILTIN, USE_NEARBYINTF_BUILTIN, USE_NEARBYINTL_BUILTIN.
> How about this?
> 

I think it is a fair start, with the adjustments pointed out by Joseph.
I will check out the wordsize-64 refactor to avoid duplicating the
implementations.
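
To make the proposed scheme concrete, here is a rough single-file sketch of
how the pieces could fit together.  It is only an illustration under some
assumptions: the #ifndef block stands in for the generic
math-use-builtins.h, sketch_nearbyint and generic_round_to_integer are
made-up names, and the fallback body is a simplified placeholder rather
than the real generic code.  Defaulting the macro to 0 matters because of
the point raised above: if the compiler cannot expand the builtin inline,
it emits a call to nearbyint, which would recurse if this function were
nearbyint itself.

```c
/* Stand-in for the proposed sysdeps/generic/math-use-builtins.h: the
   macro defaults to 0.  An architecture-specific copy of the header
   would set it to 1 only when the builtin is known to expand inline.  */
#ifndef USE_NEARBYINT_BUILTIN
# define USE_NEARBYINT_BUILTIN 0
#endif

/* Simplified placeholder for the generic code path (an assumption, not
   glibc's s_nearbyint.c).  Adding and subtracting 2^52 rounds to an
   integer in the current rounding mode, but unlike a real nearbyint it
   may raise the inexact exception and it does not preserve the sign of
   a zero result.  */
static double
generic_round_to_integer (double x)
{
  const double two52 = 4503599627370496.0;	/* 2^52 */
  volatile double t;	/* volatile keeps the rounding step at run time */

  if (!(x > -two52 && x < two52))
    return x;		/* Already integral, infinite, or NaN.  */
  if (x >= 0.0)
    {
      t = x + two52;
      t = t - two52;
    }
  else
    {
      t = x - two52;
      t = t + two52;
    }
  return t;
}

/* Illustrative name; in glibc this would be __nearbyint.  */
double
sketch_nearbyint (double x)
{
#if USE_NEARBYINT_BUILTIN
  return __builtin_nearbyint (x);
#else
  return generic_round_to_integer (x);
#endif
}
```

With the macro left at 0 and the default round-to-nearest mode,
sketch_nearbyint (2.5) takes the fallback path and yields 2.0.  The float
and long double variants would follow the same pattern with
USE_NEARBYINTF_BUILTIN and USE_NEARBYINTL_BUILTIN.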