From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi1-x232.google.com (mail-oi1-x232.google.com [IPv6:2607:f8b0:4864:20::232]) by sourceware.org (Postfix) with ESMTPS id 2CDAA3858C83 for ; Wed, 15 Mar 2023 17:51:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2CDAA3858C83 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-oi1-x232.google.com with SMTP id bj30so14756756oib.6 for ; Wed, 15 Mar 2023 10:51:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1678902661; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=Wib67622FA+9z46+mQArMRhU5zB8keCM5sOJcyw98xM=; b=ogkx4skkqtI5mSdJHRcTrCluK9Ke86s8bOaS4/gIxjS0La59rB5/tG0XTOESzFnzx0 fcYICHQeIQ4IZ/9VgJz8udXwshtHeFyMZZbaSARI7fK5YG0IbMNH29vFSzgahO7ytxj5 5p0lWJhcO6Is0h0VSw70J9uKRDfFj2lX6QVKqXHfH0Ox5aU4s2V7yQTvizLyaNW39xeJ bfivEcc2uavYxeMU+jID/OeKz5xv0uYsoarPO6EzXme70Pc6cFxmu5RBtai+MGphA6iV jgllNH0WbWLsdMH+4nabpcXP+mflpp1GBJ9WqvLu6ikOKBVvttr4zv3fSFEAnQ8Wxv39 TZqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1678902661; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Wib67622FA+9z46+mQArMRhU5zB8keCM5sOJcyw98xM=; b=heFmzpZK3KcICiv1+GqDXw6qtIGND2TtYPKtFXepxU05GMZSK9owQuqcqtJMjKpv7N Tp2T0/k7Vgfy/VIsSQlby679Fvf5cJuOySbvv2VSRugTDi73vjycSmyrvEyJhFhqSwYT vmtYZ5tdX8rc8LaWSh4UH5hvMOOWob3yWO6f+AN5bZDQQohY53U/o5ViGge2RrzeRU+Z NUbm/8fOzOdLuLYJ5XlryoCEfUz+XjCHzCrrz6WPzp+UR+lMlzjMkTT6xaaMiBuyhFB+ pZ4JrBUq2KDFlPuxlPG/5zXJq0TnirVKo1mYX7A+lbGK5uL53ffIizX8cF2N3fb/bO9b 2/uw== X-Gm-Message-State: AO0yUKVlw5ISAydHAOaJWCCWdnWetHDy7xpbUaxUkf0Y2E1/ruBSFBUp K8LdZAwnDONhH43gC/bEYhL5mhL/RRt7qIhoWDa3Wg== X-Google-Smtp-Source: AK7set8QUf8sT4tZdc1S36YgsNvY9uCaK/abR80vpndeUxpo+VK2v6xMGinHFNiOfy23aXWBVIfODA== X-Received: by 2002:a54:4018:0:b0:384:32e0:e0b5 with SMTP id x24-20020a544018000000b0038432e0e0b5mr1449019oie.11.1678902661239; Wed, 15 Mar 2023 10:51:01 -0700 (PDT) Received: from ?IPV6:2804:1b3:a7c0:544b:8d01:8008:2c0e:dce7? ([2804:1b3:a7c0:544b:8d01:8008:2c0e:dce7]) by smtp.gmail.com with ESMTPSA id r64-20020acada43000000b003646062e83bsm2426181oig.29.2023.03.15.10.50.59 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 15 Mar 2023 10:51:00 -0700 (PDT) Message-ID: <420f7c21-8dcd-078c-fbc2-6ee8b7d2af53@linaro.org> Date: Wed, 15 Mar 2023 14:50:57 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.8.0 Subject: Re: [PATCH 4/4] math: Improve fmodf Content-Language: en-US To: Wilco Dijkstra , "libc-alpha@sourceware.org" , "H . J . Lu" Cc: kirill References: <20230310175900.2388957-1-adhemerval.zanella@linaro.org> <20230310175900.2388957-5-adhemerval.zanella@linaro.org> From: Adhemerval Zanella Netto Organization: Linaro In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 14/03/23 13:42, Wilco Dijkstra wrote: > Hi Adhemerval, > > This looks good overall. I guess we still needs error handling and the wrapper > disabled since it spends 1/3 of the time in the useless wrapper but that can be > a separate patch. The wrapper should be simple to add. > > A few comments below, there are a few things that look wrong, and I think most > of the helper functions just add extra complexity and branches for no gain: > > > +  uint32_t sx = hx & SIGN_MASK; > +  /* Get |x| and |y|.  */ > +  hx ^= sx; > +  hy &= ~SIGN_MASK; > + > +  /* Special cases: > +     - If x or y is a Nan, NaN is returned. > +     - If x is an inifinity, a NaN is returned. > +     - If y is zero, Nan is returned. > +     - If x is +0/-0, and y is not zero, +0/-0 is returned.  */ > +  if (__glibc_unlikely (hy == 0        || hx >= EXPONENT_MASK || hy > EXPONENT_MASK)) > +    return (x * y) / (x * y); > + > +  if (__glibc_unlikely (hx <= hy)) > +    { > +      if (hx < hy) > +       return x; > +      return sx ? -0.0 : 0.0; > > This should be return asfloat (sx); Ack. > > +    } > + > +  int ex = get_unbiased_exponent (hx); > +  int ey = get_unbiased_exponent (hy); > > Should be hx >> MANTISSA_WIDTH since we cleared the sign bits (now we don't > need to add get_unbiased_exponent). Ack. > > +  /* Common case where exponents are close: ey >= -103 and |x/y| < 2^8,  */ > +  if (__glibc_likely (ey > MANTISSA_WIDTH && ex - ey <= EXPONENT_WIDTH)) > +    { > +      uint32_t mx = get_explicit_mantissa (hx); > +      uint32_t my = get_explicit_mantissa (hy); > > Note this is equivalent to: > > mx = (hx & MANTISSA_MASK) | (MANTISSA_MASK + 1); Ack. > > So we don't need get_explicit_mantissa (or we could change it to do the above). > If we do this before the if statement, we don't need to repeat it below. > > +      uint32_t d = (ex == ey) ? (mx - my) : (mx << (ex - ey)) % my; > +      if (d == 0) > +       return 0.0; > > Looks like a bug, should be asfloat (sx)? In fact I think this check is not really required since we already if inputs are equal. > > +      return make_float (d, ey - 1, sx); > +    } > + > +  /* Special case, both x and y are subnormal.  */ > +  if (__glibc_unlikely (ex == 0 && ey == 0)) > +    return asfloat (hx % hy); > > Similarly, shouldn't this be asfloat (sx | (hx % hy))? Yes, I will add tests to check for it. > > +  /* Convert |x| and |y| to 'mx + 2^ex' and 'my + 2^ey'.  Assume that hx is > +     not subnormal by conditions above.  */ > +  uint32_t mx = get_explicit_mantissa (hx); > +  ex--; > + > +  uint32_t my = get_explicit_mantissa (hy); > > If we set mx/my above then this isn't needed. Right, I think we can mask out the mantissa and set the implicit bit then. > > +  int lead_zeros_my = EXPONENT_WIDTH; > +  if (__glibc_likely (ey > 0)) > +    ey--; > +  else > +    { > +      my = get_mantissa (hy); > > This is really my = hy; since we know hy is positive denormal. Indeed. > > +      lead_zeros_my = __builtin_clz (my); > +    } > + > +  int tail_zeros_my = __builtin_ctz (my); > +  int sides_zeroes = lead_zeros_my + tail_zeros_my; > +  int exp_diff = ex - ey; > + > +  int right_shift = exp_diff < tail_zeros_my ? exp_diff : tail_zeros_my; > +  my >>= right_shift; > +  exp_diff -= right_shift; > +  ey += right_shift; > + > +  int left_shift = exp_diff < EXPONENT_WIDTH ? exp_diff : EXPONENT_WIDTH; > +  mx <<= left_shift; > +  exp_diff -= left_shift; > + > +  mx %= my; > + > +  if (__glibc_unlikely (mx == 0)) > +    return sx ? -0.0 : 0.0; > > Should be asfloat (sx); Ack. > > +  if (exp_diff == 0) > +    return make_float (my, ey, sx); > + > +  /* Assume modulo/divide operation is slow, so use multiplication with invert > +     values.  */ > +  uint32_t inv_hy = UINT32_MAX / my; > +  while (exp_diff > sides_zeroes) { > +    exp_diff -= sides_zeroes; > +    uint32_t hd = (mx * inv_hy) >> (BIT_WIDTH - sides_zeroes); > +    mx <<= sides_zeroes; > +    mx -= hd * my; > +    while (__glibc_unlikely (mx > my)) > +      mx -= my; > +  } > +  uint32_t hd = (mx * inv_hy) >> (BIT_WIDTH - exp_diff); > +  mx <<= exp_diff; > +  mx -= hd * my; > +  while (__glibc_unlikely (mx > my)) > +    mx -= my; > + > +  return make_float (mx, ey, sx); >  } >  libm_alias_finite (__ieee754_fmodf, __fmodf) > diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h > index 23045f59d6..cdab3a36ef 100644 > --- a/sysdeps/ieee754/flt-32/math_config.h > +++ b/sysdeps/ieee754/flt-32/math_config.h > @@ -110,6 +110,95 @@ issignalingf_inline (float x) >    return 2 * (ix ^ 0x00400000) > 2 * 0x7fc00000UL; >  } >   > +#define BIT_WIDTH       32 > +#define MANTISSA_WIDTH  23 > +#define EXPONENT_WIDTH  8 > +#define MANTISSA_MASK   0x007fffff > +#define EXPONENT_MASK   0x7f800000 > +#define EXP_MANT_MASK   0x7fffffff > +#define QUIET_NAN_MASK  0x00400000 > +#define SIGN_MASK       0x80000000 > + > +static inline bool > +is_nan (uint32_t x) > +{ > +  return (x & EXP_MANT_MASK) > EXPONENT_MASK; > +} > + > +static inline bool > +is_quiet_nan (uint32_t x) > +{ > +   return (x & EXP_MANT_MASK) == (EXPONENT_MASK | QUIET_NAN_MASK); > +} > + > +static inline bool > +is_inf_or_nan (uint32_t x) > +{ > +  return (x & EXPONENT_MASK) == EXPONENT_MASK; > +} > + > +static inline uint16_t > +get_unbiased_exponent (uint32_t x) > +{ > +  return (x & EXPONENT_MASK) >> MANTISSA_WIDTH; > +} > + > +/* Return mantissa with the implicit bit set iff X is a normal number.  */ > +static inline uint32_t > +get_explicit_mantissa (uint32_t x) > +{ > +  uint32_t p1 = (get_unbiased_exponent (x) > 0 && !is_inf_or_nan (x) > +    ? (MANTISSA_MASK + 1) : 0); > +  uint32_t p2 = (x & MANTISSA_MASK); > +  return p1 | p2; > +} > > I don't think we need this (and anything called by it). > > +static inline uint32_t > +set_mantissa (uint32_t x, uint32_t m) > +{ > +  m &= MANTISSA_MASK; > +  x &= ~(MANTISSA_MASK); > +  return x |= m; > +} > + > +static inline uint32_t > +get_mantissa (uint32_t x) > +{ > +  return x & MANTISSA_MASK; > +} > + > +static inline uint32_t > +set_unbiased_exponent (uint32_t x, uint32_t e) > +{ > +  e = (e << MANTISSA_WIDTH) & EXPONENT_MASK; > +  x &= ~(EXPONENT_MASK); > +  return x |= e; > +} > + > +/* Convert integer number X, unbiased exponent EP, and sign S to double: > + > +   result = X * 2^(EP+1 - exponent_bias) > + > +   NB: zero is not supported.  */ > +static inline double > +make_float (uint32_t x, int ep, uint32_t s) > +{ > +  int lz = __builtin_clz (x) - EXPONENT_WIDTH; > +  x <<= lz; > +  ep -= lz; > + > +  uint32_t r = 0; > +  if (__glibc_likely (ep >= 0)) > +    { > +      r = set_mantissa (r, x); > +      r = set_unbiased_exponent (r, ep + 1); > +    } > +  else > +    r = set_mantissa (r, x >> -ep); > + > +  return asfloat (r | s); > +} > > This is overly complex, make_float is as trivial as: > > static inline double > make_float (uint32_t x, int ep, uint32_t s) > { >   int lz = __builtin_clz (x) - EXPONENT_WIDTH; >   x <<= lz; >   ep -= lz; > > if (__glibc_unlikely (ep < 0)) > { > x >> -ep; > ep = 0; > } > return asfloat (s + x + (ep << MANTISSA_WIDTH)); > } > > And now you don't need set_unbiased_exponent and set_mantissa either. Indeed it is simpler, thanks. > > Cheers, > Wilco