From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=IIrV=7H=linaro.org=adhemerval.zanella@sourceware.org>
Received: from mail-oi1-x232.google.com (mail-oi1-x232.google.com [IPv6:2607:f8b0:4864:20::232])
	by sourceware.org (Postfix) with ESMTPS id 2CDAA3858C83
	for <libc-alpha@sourceware.org>; Wed, 15 Mar 2023 17:51:02 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2CDAA3858C83
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org
Received: by mail-oi1-x232.google.com with SMTP id bj30so14756756oib.6
        for <libc-alpha@sourceware.org>; Wed, 15 Mar 2023 10:51:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google; t=1678902661;
        h=content-transfer-encoding:in-reply-to:organization:from:references
         :cc:to:content-language:subject:user-agent:mime-version:date
         :message-id:from:to:cc:subject:date:message-id:reply-to;
        bh=Wib67622FA+9z46+mQArMRhU5zB8keCM5sOJcyw98xM=;
        b=ogkx4skkqtI5mSdJHRcTrCluK9Ke86s8bOaS4/gIxjS0La59rB5/tG0XTOESzFnzx0
         fcYICHQeIQ4IZ/9VgJz8udXwshtHeFyMZZbaSARI7fK5YG0IbMNH29vFSzgahO7ytxj5
         5p0lWJhcO6Is0h0VSw70J9uKRDfFj2lX6QVKqXHfH0Ox5aU4s2V7yQTvizLyaNW39xeJ
         bfivEcc2uavYxeMU+jID/OeKz5xv0uYsoarPO6EzXme70Pc6cFxmu5RBtai+MGphA6iV
         jgllNH0WbWLsdMH+4nabpcXP+mflpp1GBJ9WqvLu6ikOKBVvttr4zv3fSFEAnQ8Wxv39
         TZqg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112; t=1678902661;
        h=content-transfer-encoding:in-reply-to:organization:from:references
         :cc:to:content-language:subject:user-agent:mime-version:date
         :message-id:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=Wib67622FA+9z46+mQArMRhU5zB8keCM5sOJcyw98xM=;
        b=heFmzpZK3KcICiv1+GqDXw6qtIGND2TtYPKtFXepxU05GMZSK9owQuqcqtJMjKpv7N
         Tp2T0/k7Vgfy/VIsSQlby679Fvf5cJuOySbvv2VSRugTDi73vjycSmyrvEyJhFhqSwYT
         vmtYZ5tdX8rc8LaWSh4UH5hvMOOWob3yWO6f+AN5bZDQQohY53U/o5ViGge2RrzeRU+Z
         NUbm/8fOzOdLuLYJ5XlryoCEfUz+XjCHzCrrz6WPzp+UR+lMlzjMkTT6xaaMiBuyhFB+
         pZ4JrBUq2KDFlPuxlPG/5zXJq0TnirVKo1mYX7A+lbGK5uL53ffIizX8cF2N3fb/bO9b
         2/uw==
X-Gm-Message-State: AO0yUKVlw5ISAydHAOaJWCCWdnWetHDy7xpbUaxUkf0Y2E1/ruBSFBUp
	K8LdZAwnDONhH43gC/bEYhL5mhL/RRt7qIhoWDa3Wg==
X-Google-Smtp-Source: AK7set8QUf8sT4tZdc1S36YgsNvY9uCaK/abR80vpndeUxpo+VK2v6xMGinHFNiOfy23aXWBVIfODA==
X-Received: by 2002:a54:4018:0:b0:384:32e0:e0b5 with SMTP id x24-20020a544018000000b0038432e0e0b5mr1449019oie.11.1678902661239;
        Wed, 15 Mar 2023 10:51:01 -0700 (PDT)
Received: from ?IPV6:2804:1b3:a7c0:544b:8d01:8008:2c0e:dce7? ([2804:1b3:a7c0:544b:8d01:8008:2c0e:dce7])
        by smtp.gmail.com with ESMTPSA id r64-20020acada43000000b003646062e83bsm2426181oig.29.2023.03.15.10.50.59
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Wed, 15 Mar 2023 10:51:00 -0700 (PDT)
Message-ID: <420f7c21-8dcd-078c-fbc2-6ee8b7d2af53@linaro.org>
Date: Wed, 15 Mar 2023 14:50:57 -0300
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
 Gecko/20100101 Thunderbird/102.8.0
Subject: Re: [PATCH 4/4] math: Improve fmodf
Content-Language: en-US
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
 "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>,
 "H . J . Lu" <hjl.tools@gmail.com>
Cc: kirill <kirill.okhotnikov@gmail.com>
References: <20230310175900.2388957-1-adhemerval.zanella@linaro.org>
 <20230310175900.2388957-5-adhemerval.zanella@linaro.org>
 <PAWPR08MB89827C35016A4F09DD92BF3983BE9@PAWPR08MB8982.eurprd08.prod.outlook.com>
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Organization: Linaro
In-Reply-To: <PAWPR08MB89827C35016A4F09DD92BF3983BE9@PAWPR08MB8982.eurprd08.prod.outlook.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <libc-alpha.sourceware.org>


On 14/03/23 13:42, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
> This looks good overall. I guess we still needs error handling and the wrapper
> disabled since it spends 1/3 of the time in the useless wrapper but that can be
> a separate patch.

The wrapper should be simple to add.

> 
> A few comments below, there are a few things that look wrong, and I think most
> of the helper functions just add extra complexity and branches for no gain:
> 
> 
> +  uint32_t sx = hx & SIGN_MASK;
> +  /* Get |x| and |y|.  */
> +  hx ^= sx;
> +  hy &= ~SIGN_MASK;
> +
> +  /* Special cases:
> +     - If x or y is a Nan, NaN is returned.
> +     - If x is an inifinity, a NaN is returned.
> +     - If y is zero, Nan is returned.
> +     - If x is +0/-0, and y is not zero, +0/-0 is returned.  */
> +  if (__glibc_unlikely (hy == 0        || hx >= EXPONENT_MASK || hy > EXPONENT_MASK))
> +    return (x * y) / (x * y);
> +
> +  if (__glibc_unlikely (hx <= hy))
> +    {
> +      if (hx < hy)
> +       return x;
> +      return sx ? -0.0 : 0.0;
> 
> This should be return asfloat (sx);

Ack.

> 
> +    }
> +
> +  int ex = get_unbiased_exponent (hx);
> +  int ey = get_unbiased_exponent (hy);
> 
> Should be hx >> MANTISSA_WIDTH since we cleared the sign bits (now we don't
> need to add get_unbiased_exponent).

Ack.

> 
> +  /* Common case where exponents are close: ey >= -103 and |x/y| < 2^8,  */
> +  if (__glibc_likely (ey > MANTISSA_WIDTH && ex - ey <= EXPONENT_WIDTH))
> +    {
> +      uint32_t mx = get_explicit_mantissa (hx);
> +      uint32_t my = get_explicit_mantissa (hy);
> 
> Note this is equivalent to:
> 
> mx = (hx & MANTISSA_MASK) | (MANTISSA_MASK + 1);

Ack.

> 
> So we don't need get_explicit_mantissa (or we could change it to do the above).
> If we do this before the if statement, we don't need to repeat it below.
> 
> +      uint32_t d = (ex == ey) ? (mx - my) : (mx << (ex - ey)) % my;
> +      if (d == 0)
> +       return 0.0;
> 
> Looks like a bug, should be asfloat (sx)?

In fact I think this check is not really required since we already if
inputs are equal.

> 
> +      return make_float (d, ey - 1, sx);
> +    }
> +
> +  /* Special case, both x and y are subnormal.  */
> +  if (__glibc_unlikely (ex == 0 && ey == 0))
> +    return asfloat (hx % hy);
> 
> Similarly, shouldn't this be asfloat (sx | (hx % hy))?

Yes, I will add tests to check for it.

> 
> +  /* Convert |x| and |y| to 'mx + 2^ex' and 'my + 2^ey'.  Assume that hx is
> +     not subnormal by conditions above.  */
> +  uint32_t mx = get_explicit_mantissa (hx);
> +  ex--;
> +
> +  uint32_t my = get_explicit_mantissa (hy);
> 
> If we set mx/my above then this isn't needed.

Right, I think we can mask out the mantissa and set the implicit bit then.

> 
> +  int lead_zeros_my = EXPONENT_WIDTH;
> +  if (__glibc_likely (ey > 0))
> +    ey--;
> +  else
> +    {
> +      my = get_mantissa (hy);
> 
> This is really my = hy; since we know hy is positive denormal.

Indeed.

> 
> +      lead_zeros_my = __builtin_clz (my);
> +    }
> +
> +  int tail_zeros_my = __builtin_ctz (my);
> +  int sides_zeroes = lead_zeros_my + tail_zeros_my;
> +  int exp_diff = ex - ey;
> +
> +  int right_shift = exp_diff < tail_zeros_my ? exp_diff : tail_zeros_my;
> +  my >>= right_shift;
> +  exp_diff -= right_shift;
> +  ey += right_shift;
> +
> +  int left_shift = exp_diff < EXPONENT_WIDTH ? exp_diff : EXPONENT_WIDTH;
> +  mx <<= left_shift;
> +  exp_diff -= left_shift;
> +
> +  mx %= my;
> +
> +  if (__glibc_unlikely (mx == 0))
> +    return sx ? -0.0 : 0.0;
> 
> Should be asfloat (sx);

Ack.

> 
> +  if (exp_diff == 0)
> +    return make_float (my, ey, sx);
> +
> +  /* Assume modulo/divide operation is slow, so use multiplication with invert
> +     values.  */
> +  uint32_t inv_hy = UINT32_MAX / my;
> +  while (exp_diff > sides_zeroes) {
> +    exp_diff -= sides_zeroes;
> +    uint32_t hd = (mx * inv_hy) >> (BIT_WIDTH - sides_zeroes);
> +    mx <<= sides_zeroes;
> +    mx -= hd * my;
> +    while (__glibc_unlikely (mx > my))
> +      mx -= my;
> +  }
> +  uint32_t hd = (mx * inv_hy) >> (BIT_WIDTH - exp_diff);
> +  mx <<= exp_diff;
> +  mx -= hd * my;
> +  while (__glibc_unlikely (mx > my))
> +    mx -= my;
> +
> +  return make_float (mx, ey, sx);
>  }
>  libm_alias_finite (__ieee754_fmodf, __fmodf)
> diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h
> index 23045f59d6..cdab3a36ef 100644
> --- a/sysdeps/ieee754/flt-32/math_config.h
> +++ b/sysdeps/ieee754/flt-32/math_config.h
> @@ -110,6 +110,95 @@ issignalingf_inline (float x)
>    return 2 * (ix ^ 0x00400000) > 2 * 0x7fc00000UL;
>  }
>  
> +#define BIT_WIDTH       32
> +#define MANTISSA_WIDTH  23
> +#define EXPONENT_WIDTH  8
> +#define MANTISSA_MASK   0x007fffff
> +#define EXPONENT_MASK   0x7f800000
> +#define EXP_MANT_MASK   0x7fffffff
> +#define QUIET_NAN_MASK  0x00400000
> +#define SIGN_MASK       0x80000000
> +
> +static inline bool
> +is_nan (uint32_t x)
> +{
> +  return (x & EXP_MANT_MASK) > EXPONENT_MASK;
> +}
> +
> +static inline bool
> +is_quiet_nan (uint32_t x)
> +{
> +   return (x & EXP_MANT_MASK) == (EXPONENT_MASK | QUIET_NAN_MASK);
> +}
> +
> +static inline bool
> +is_inf_or_nan (uint32_t x)
> +{
> +  return (x & EXPONENT_MASK) == EXPONENT_MASK;
> +}
> +
> +static inline uint16_t
> +get_unbiased_exponent (uint32_t x)
> +{
> +  return (x & EXPONENT_MASK) >> MANTISSA_WIDTH;
> +}
> +
> +/* Return mantissa with the implicit bit set iff X is a normal number.  */
> +static inline uint32_t
> +get_explicit_mantissa (uint32_t x)
> +{
> +  uint32_t p1 = (get_unbiased_exponent (x) > 0 && !is_inf_or_nan (x)
> +    ? (MANTISSA_MASK + 1) : 0);
> +  uint32_t p2 = (x & MANTISSA_MASK);
> +  return p1 | p2;
> +}
> 
> I don't think we need this (and anything called by it).
> 
> +static inline uint32_t
> +set_mantissa (uint32_t x, uint32_t m)
> +{
> +  m &= MANTISSA_MASK;
> +  x &= ~(MANTISSA_MASK);
> +  return x |= m;
> +}
> +
> +static inline uint32_t
> +get_mantissa (uint32_t x)
> +{
> +  return x & MANTISSA_MASK;
> +}
> +
> +static inline uint32_t
> +set_unbiased_exponent (uint32_t x, uint32_t e)
> +{
> +  e = (e << MANTISSA_WIDTH) & EXPONENT_MASK;
> +  x &= ~(EXPONENT_MASK);
> +  return x |= e;
> +}
> +
> +/* Convert integer number X, unbiased exponent EP, and sign S to double:
> +
> +   result = X * 2^(EP+1 - exponent_bias)
> +
> +   NB: zero is not supported.  */
> +static inline double
> +make_float (uint32_t x, int ep, uint32_t s)
> +{
> +  int lz = __builtin_clz (x) - EXPONENT_WIDTH;
> +  x <<= lz;
> +  ep -= lz;
> +
> +  uint32_t r = 0;
> +  if (__glibc_likely (ep >= 0))
> +    {
> +      r = set_mantissa (r, x);
> +      r = set_unbiased_exponent (r, ep + 1);
> +    }
> +  else
> +    r = set_mantissa (r, x >> -ep);
> +
> +  return asfloat (r | s);
> +}
> 
> This is overly complex, make_float is as trivial as:
> 
> static inline double
> make_float (uint32_t x, int ep, uint32_t s)
> {
>   int lz = __builtin_clz (x) - EXPONENT_WIDTH;
>   x <<= lz;
>   ep -= lz;
> 
>   if (__glibc_unlikely (ep < 0))
>   {
>     x >> -ep;
>     ep = 0;
>   }
>   return asfloat (s + x + (ep << MANTISSA_WIDTH));
> }
> 
> And now you don't need set_unbiased_exponent and set_mantissa either.

Indeed it is simpler, thanks.

> 
> Cheers,
> Wilco