From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wilco Dijkstra
To: "patrick.mcgehearty@oracle.com"
CC: "libc-alpha@sourceware.org", nd
Subject: Re: [PATCH] improves exp() and expf() performance on Sparc.
Date: Mon, 11 Sep 2017 18:50:00 -0000
X-SW-Source: 2017-09/txt/msg00460.txt.bz2

Patrick wrote:
> When the differences are this large and the new max is faster than the
> old min, I don't see a need in doing further performance testing.

Agreed, the new version is so much faster that there is really no contest.
What isn't obvious is how much penalty the very large tables have.
So while I think further improvements are feasible, that shouldn't
hold up adding it to generic code.

> Moving on to expf, the comparison for individual values shows an
> improvement in the range of 15x. benchtests does not measure expf().

We do now have an expf benchmark, see:
https://sourceware.org/ml/libc-alpha/2017-08/msg01126.html

> The Szabolcs code appears to provide similar benefits.
> There were some discussion of accuracy and of possible changes to the
> algorithm, perhaps by using a larger table. The Sparc code uses a larger
> table and thus may be more accurate for some ulp sensitive values. Or it
> may be a non-issue since both algorithms are using double precision for
> computation.

Part of the discussion was to further improve performance by reducing the
polynomial degree and increasing the table size for a small increase in ULP
error (still well below 1 ULP). Another aspect discussed was what one should
do for non-nearest rounding modes - I don't believe we should expect math
functions to be perfect in those modes if that means complicating or even
slowing down round-to-nearest (while this is no longer a critical performance
bug on most targets after I fixed the fenv implementation, it still causes
significant slowdowns in many math functions).

Talking about tables, the Sparc version uses very large tables, which may be
why it didn't do as well in the expf benchmark or when running wrf_s (1.9%
slower). This appears to be inherent to the algorithm used - while it seems
feasible to almost halve the tables, it would mean lower throughput and
increased latency.

> Wilco Dijkstra compared the new Sparc code to Szabolcs code on aarch64
> and found Szabolcs code to be 10% faster on aarch64. That result is
> close enough to justify testing on Sparc. In addition to a performance
> comparison, we'd want to compare accuracy to see if there are notable
> differences.

Accuracy is unlikely to be an issue given both are already far more accurate
than strictly necessary. For testing I would suggest running the expf trace
as well as wrf_s, both built and ifunced in the same way (as Joseph already
suggested).

Wilco