From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Sender: libc-alpha-owner@sourceware.org
Message-ID: <58D40B60.1040302@arm.com>
Date: Thu, 23 Mar 2017 17:52:00 -0000
From: Szabolcs Nagy
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.8.0
MIME-Version: 1.0
To: Stefan Liebler
Subject: Re: [PATCH 1/2] Optimize generic spinlock code and use C11 like atomic macros.
References: <1481905917-15654-1-git-send-email-stli@linux.vnet.ibm.com>
 <5857CF10.1060100@arm.com>
 <628f6311-239c-5eea-572c-c2acae6fcbee@linux.vnet.ibm.com>
 <1487017743.16322.80.camel@redhat.com>
 <60a34645-17e4-6693-1343-03c55b0c47ad@linux.vnet.ibm.com>
 <1487437038.20203.68.camel@redhat.com>
 <25ad863b-6f20-bfb7-95e6-3b04a2b3eee8@linux.vnet.ibm.com>
 <1487598702.20203.138.camel@redhat.com>
 <9c3fc2b3-57b6-b160-3f97-5ce3be05f4c0@linux.vnet.ibm.com>
 <58D2746A.90405@arm.com>
 <12a89cb5-13fa-1520-0240-a839542ee61a@linux.vnet.ibm.com>
In-Reply-To: <12a89cb5-13fa-1520-0240-a839542ee61a@linux.vnet.ibm.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
X-SW-Source: 2017-03/txt/msg00560.txt.bz2

On 23/03/17 16:15, Stefan Liebler wrote:
> On 03/22/2017 01:56 PM, Szabolcs Nagy wrote:
>> the performance of the uncontended case can be improved
>> slightly by reverting the unlock change (the release
>> store is stronger than the barrier was, conceptually
>> there is a barrier before and after an armv8 release
>> store to prevent an independent load-acquire to get
>> reordered with it in either direction)
>>
> Thus you mean something like the following?
> atomic_thread_fence_release ();
> atomic_store_relaxed (lock, 0);
> (Info: I've used scripts/build-many-glibcs.py to get the following objdumps. The
> sysdeps/powerpc/nptl/pthread_spin_unlock.c is using atomic_store_release, too. For the following
> powerpc64-linux-gnu objdumps, I've removed the powerpc-spinlock implementation to see the differences)
> =>aarch64-linux-gnu
> 0000000000000000 :
>    0:   d5033bbf    dmb  ish
>    4:   b900001f    str  wzr, [x0]
>    8:   52800000    mov  w0, #0x0    // #0
>    c:   d65f03c0    ret
> =>powerpc64-linux-gnu:
> 0000000000000000 <.pthread_spin_unlock>:
>    0:   7c 69 1b 78    mr      r9,r3
>    4:   7c 20 04 ac    lwsync
>    8:   39 40 00 00    li      r10,0
>    c:   38 60 00 00    li      r3,0
>   10:   91 49 00 00    stw     r10,0(r9)
>   14:   4e 80 00 20    blr
>
> Here is the upstream code as comparison:
> atomic_full_barrier ();
> *lock = 0;
> =>aarch64-linux-gnu
> 0000000000000000 :
>    0:   aa0003e1    mov  x1, x0
>    4:   d5033bbf    dmb  ish
>    8:   52800000    mov  w0, #0x0    // #0
>    c:   b900003f    str  wzr, [x1]
>   10:   d65f03c0    ret
> =>powerpc64-linux-gnu:
> 0000000000000000 <.pthread_spin_unlock>:
>    0:   7c 69 1b 78    mr      r9,r3
>    4:   7c 00 04 ac    hwsync
>    8:   39 40 00 00    li      r10,0
>    c:   38 60 00 00    li      r3,0
>   10:   91 49 00 00    stw     r10,0(r9)
>   14:   4e 80 00 20    blr
>

i compared this (full barrier) to

> And the code of my latest patch:
> atomic_store_release (lock, 0);
> =>aarch64-linux-gnu
> 0000000000000000 :
>    0:   889ffc1f    stlr wzr, [x0]
>    4:   52800000    mov  w0, #0x0    // #0
>    8:   d65f03c0    ret
> =>powerpc64-linux-gnu:
> 0000000000000000 <.pthread_spin_unlock>:
>    0:   7c 69 1b 78    mr      r9,r3
>    4:   7c 20 04 ac    lwsync
>    8:   39 40 00 00    li      r10,0
>    c:   38 60 00 00    li      r3,0
>   10:   91 49 00 00    stw     r10,0(r9)
>   14:   4e 80 00 20    blr
>

to this (release store).

but meanwhile i convinced myself that stlr makes more
sense architecturally (even though on the particular
implementation i tested this on it was slower).

so i'd prefer keeping the atomic_store_release.
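
for reference, the three unlock variants compared in this thread can be sketched with plain C11 <stdatomic.h> instead of the glibc-internal macros. this is only an illustration under assumptions: the names spinlock_t, spin_trylock and spin_unlock_* are made up for this sketch, and the mapping of atomic_store_release, atomic_thread_fence_release and atomic_full_barrier onto C11 calls is approximate (in particular a seq_cst fence followed by a relaxed store stands in for the upstream "full barrier + plain store"). the instructions a compiler emits for each variant depend on the target and the compiler version.

```c
#include <stdatomic.h>

/* Hypothetical lock type for this sketch: 0 = free, 1 = held. */
typedef atomic_int spinlock_t;

/* Try to take the lock once; returns nonzero on success.
   The acquire exchange pairs with the release in the unlock variants. */
int spin_trylock(spinlock_t *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0;
}

/* Variant 1: release store, roughly atomic_store_release (lock, 0). */
void spin_unlock_release_store(spinlock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}

/* Variant 2: release fence followed by a relaxed store, roughly
   atomic_thread_fence_release (); atomic_store_relaxed (lock, 0); */
void spin_unlock_fence_relaxed(spinlock_t *lock)
{
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(lock, 0, memory_order_relaxed);
}

/* Variant 3: full barrier followed by a store; an approximation of
   atomic_full_barrier (); *lock = 0; using a seq_cst fence. */
void spin_unlock_full_barrier(spinlock_t *lock)
{
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(lock, 0, memory_order_relaxed);
}
```

all three variants leave the lock word at 0, so a following spin_trylock succeeds; they differ only in how strongly the store is ordered against surrounding memory accesses, which is what the objdumps above show.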