From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-108991-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 112490 invoked by alias); 28 Jan 2020 10:47:16 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 112476 invoked by uid 89); 28 Jan 2020 10:47:16 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-14.3 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.1 spammy=claim
X-HELO: mail.bitwise.fi
Subject: Re: [PATCH 1/2] ieee754: Remove slow paths from asin and acos
To: Szabolcs Nagy <Szabolcs.Nagy@arm.com>,
 "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>
Cc: nd <nd@arm.com>
References: <20200127104511.16618-1-anssi.hannula@bitwise.fi>
 <714b667d-7345-bcd4-ca30-7ecbb7098bbc@arm.com>
From: Anssi Hannula <anssi.hannula@bitwise.fi>
Message-ID: <158bbf4a-ecb2-34da-f84e-ec05598e3746@bitwise.fi>
Date: Tue, 28 Jan 2020 10:51:00 -0000
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101
 Thunderbird/68.4.2
MIME-Version: 1.0
In-Reply-To: <714b667d-7345-bcd4-ca30-7ecbb7098bbc@arm.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-SW-Source: 2020-01/txt/msg00595.txt.bz2

On 27.1.2020 14.51, Szabolcs Nagy wrote:
> On 27/01/2020 10:45, Anssi Hannula wrote:
>> asin and acos have slow paths for rounding the last bit that cause some
>> calls to be 500-1500x slower than average calls.
>>
>> These slow paths are rare, a test of a trillion (1.000.000.000.000)
>> random inputs between -1 and 1 showed 32870 slow calls for acos and 4473
>> for asin, with most occurrences between -1.0 .. -0.9 and 0.9 .. 1.0.
>>
>> The slow paths claim correct rounding and use __sin32() and __cos32()
>> (which compare two result candidates and return the closest one) as the
>> final step, with the second result candidate (res1) having a small offset
>> applied from res. This suggests that res and res1 are intended to be 1
>> ULP apart (which makes sense for rounding), barring bugs, allowing us to
>> pick either one and still remain within 1 ULP of the exact result.
>>
>> Remove the slow paths as the accuracy is better than 1 ULP even without
>> them, which is enough for glibc.
>>
>> Also remove code comments claiming correctly rounded results.
> the patch looks good to me.
>
>> After slow path removal, checking the accuracy of 14.400.000.000 random
>> asin() and acos() inputs showed only three incorrectly rounded
>> (error > 0.5 ULP) results:
>> - asin(-0x1.ee2b43286db75p-1) (0.500002 ULP, same as before)
>> - asin(-0x1.f692ba202abcp-4)  (0.500003 ULP, same as before)
>> - asin(-0x1.9915e876fc062p-1) (0.50000000001 ULP, previously exact)
>> The first two had the same error even before this commit, and they did
>> not use the slow path at all.
>>
>> Checking 4934 known randomly found previously-slow-path asin inputs
>> shows 25 calls with incorrectly rounded results, with a maximum error of
>> 0.500000002 ULP (for 0x1.fcd5742999ab8p-1). The previous slow-path code
>> rounded all these inputs correctly (error < 0.5 ULP).
>> The observed average speed increase was 130x.
>>
>> Checking 36240 known randomly found previously-slow-path acos inputs
>> shows 42 calls with incorrectly rounded results, with a maximum error of
>> 0.500000008 ULP (for 0x1.f63845056f35ep-1). The previous "exact"
>> slow-path code showed 34 calls with incorrectly rounded results, with the
>> same maximum error of 0.500000008 ULP (for 0x1.f63845056f35ep-1).
>> The observed average speed increase was 130x.
>>
>> The functions could likely be trimmed more while keeping acceptable
>> accuracy, but this at least gets rid of the egregiously slow cases.
>>
>> Tested on x86_64.
>> ---
> ...
>>  double
>>  SECTION
>>  __ieee754_asin(double x){
>> @@ -100,13 +88,7 @@ __ieee754_asin(double x){
>>        if (res == res+1.00014*cor) return res;
>>        else {
>>  	__doasin(x,0,w);
>> -	if (w[0]==(w[0]+1.00000001*w[1])) return w[0];
>> -	else {
>> -	  y=fabs(x);
>> -	  res=fabs(w[0]);
>> -	  res1=fabs(w[0]+1.1*w[1]);
>> -	  return (m>0)?__sin32(y,res,res1):-__sin32(y,res,res1);
>> -	}
>> +	return w[0];
> if we assume the code is correctly rounded (though your tests
> show it isnt), then we can derive error bounds from the
> checks, e.g.
>
>   if (res == res+1.00014*cor) return res;
>
> here means that the computed result is res + cor, where
> |cor| < 0.5 ulp (in nearest rounding mode), correct rounding
> would require the exact result res + xcor with
> |xcor| < 1.00014*|cor| < .50007 ulp.
>
> so in this case i think that earlier if can be removed too
> with a comment saying that worst case error is expected to
> be .50007 ulp in nearest rounding mode.

Indeed, I suspected as much.
I wasn't really comfortable trying to derive error bounds (and/or remove
more code) as the original results were not 100% accurate and I'm not
too much of an expert on this.

> it would be nice to do similar analysis in other cases
> and have some comments in the code about expected error
> bounds. (but if that seems too much work i'm not against
> the proposed patch.)


I'd prefer to keep the patch as-is as long as that is acceptable, as
I've spent a bit too much time on this already :)

Thanks for taking a look.

-- 
Anssi Hannula / Bitwise Oy
+358 503803997