From: Szabolcs Nagy
To: Patrick McGehearty, libc-alpha@sourceware.org
Cc: nd@arm.com
Subject: Re: [PATCH] v12 Improves __ieee754_exp() performance by 6-11% on aarch64/sparc/x86.
Date: Thu, 15 Mar 2018 12:01:00 -0000
In-Reply-To: <1521087720-23806-1-git-send-email-patrick.mcgehearty@oracle.com>

On 15/03/18 04:22, Patrick McGehearty wrote:
> New with this version:
> Only updates to e_exp.c and eexp.tbl plus revised
> libm-test-ulps for aarch64/sparc/x86_64 as removal of slowexp()
> was accomplished by prior patch.
>
> Summary of patch rationale
>
> These changes will be active for all platforms that don't provide
> their own exp() routines. They will also be active for ieee754
> versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
> erf.
>
> Typical performance gains are 6% on aarch64, 28% on Sparc s7 and 11%
> on x86_64 based on the glibc_perf tests.
>

i think this is the wrong measurement for this algorithm:
it uses two different methods, for about |x|<1 and |x|>1.
the first is fast (but uses yet another table),
the second is slow (!) and the branches that decide
between them can easily mispredict in sensible workloads.

so i think on all targets (including sparc) one could
do better by using a single method (assuming that can
give similar speed to the fast method but on the entire
input range).

> Glibc correctness tests for exp() and expf() were run. Within the test
> suite 1 input value was found to cause a 1 ulp difference when
> "FE_TONEAREST" rounding mode is set. No differences in exp()
> were seen for the tested values for the other rounding modes.
>
> When tested over a range of 10 million input values, the new code
> gets a 1 ulp error approximately 1.6 times per 1000 values.
> That rate was similar for all four rounding modes.
> The patch uses a 64 entry scaling table. The existing
> code uses a 512 entry table.
>
> Further optimization is possible in the handling of rounding
> modes. Using get_rounding_mode and libc_fesetround() instead of
> SET_RESTORE_ROUND provides a measurable gain for Sparc.
> Unfortunately, on x86, one works with sse fp unit rounding mode while
> the other works on x87 fp unit rounding mode. Adding libc_fegetround,
> libc_fegetroundf and libc_fegetroundl to match libc_fesetround()
> should not be too large a task but outside the scope of this patch.

the rounding mode setting should be completely removed.
(after analysis of the worst-case non-nearest rounding errors)

i think the non-nearest error of this algorithm should be at
most 1ulp without the rounding mode change, and functions
using exp may see a 1-2ulp error increase in non-nearest
rounding modes (but if that's too high, that should be
fixed at the call site).

but i don't want you to spend time changing the code,
i'll post my exp variant soon, so you can benchmark it
on sparc, then we can continue the discussion depending
on the results.