From: Richard Biener
To: Jan Hubicka
Cc: Andrew Pinski, Rong Xu, GCC Patches, David Li, reply@codereview.appspotmail.com
Subject: Re: [google 4.7] atomic update of profile counters (issue6965050)
Date: Fri, 21 Dec 2012 09:55:00 -0000

On Fri, Dec 21, 2012 at 10:13 AM, Jan Hubicka wrote:
>> On Thu, Dec 20, 2012 at 8:20 AM, Jan Hubicka wrote:
>> >> On Wed, Dec 19, 2012 at 4:29 PM, Andrew Pinski wrote:
>> >> >
>> >> > On Wed, Dec 19, 2012 at 12:08 PM, Rong Xu wrote:
>> >> > > Hi,
>> >> > >
>> >> > > This patch adds support for atomic updates of the profile counters.
>> >> > > Tested with google internal benchmarks and fdo kernel build.
>> >> >
>> >> > I think you should use the __atomic_ functions instead of the __sync_
>> >> > functions, as they allow better performance for simple counters since you
>> >> > can use __ATOMIC_RELAXED.
>> >>
>> >> You are right. I think __ATOMIC_RELAXED should be OK here.
>> >> Thanks for the suggestion.
>> >>
>> >> >
>> >> > And this would be useful for the trunk also. I was going to implement
>> >> > this exact thing this week but some other important stuff came up.
>> >>
>> >> I'll post a trunk patch later.
>> >
>> > Yes, I like that patch, too. Even if the costs are quite high (and this is why
>> > atomic updates were sort of voted down in the past), the alternative of using
>> > TLS has problems with too much per-thread memory.
>>
>> Actually, sometimes (on some processors) atomic increments are cheaper
>> than doing a regular increment, mainly because there is an
>> instruction which can handle it in the L2 cache rather than populating
>> the L1. Octeon is one such processor where this is true.
>
> One reason for the large divergence may be the fact that we optimize the counter
> update code. Perhaps declaring counters volatile will prevent load/store motion
> and reduce the racing, too.

Well, that will make it slower, too. The best benchmark to check for all this
stuff is tramp3d. I remember that ICC, back when it emitted a function call for
each counter update, was about 100000x slower instrumented than w/o
instrumentation (that is, I never waited long enough to make it finish even
one iteration ...)
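(For concreteness, the three flavours of counter update being weighed here look
roughly like this; the counter variable and function name are made up for
illustration, they are not taken from the patch:)

    #include <stdint.h>

    static int64_t counter;   /* stand-in for one gcov counter slot */

    void
    count_edge (void)
    {
      /* Plain update as emitted today: a non-atomic load/add/store that
         can race between threads, but that the optimizers may hoist,
         sink or combine.  */
      counter++;

      /* __sync_ builtin: atomic, but with full-barrier semantics.  */
      __sync_fetch_and_add (&counter, 1);

      /* __atomic_ builtin with relaxed ordering: still atomic, but with
         no ordering constraints, usually the cheapest atomic form.  */
      __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED);
    }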
Thus, it's very important that counter updates are subject to loop invariant /
store motion (and SCEV const-prop)! GCC does a wonderful job here at the
moment; please do not regress here.

Richard.

> Honza
>>
>> Thanks,
>> Andrew Pinski
>>
>> >
>> > While there are even more alternatives, like recording the changes and
>> > committing them in blocks (say at function return), I guess some solution is
>> > better than no solution.
>> >
>> > Thanks,
>> > Honza
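(To make the store-motion point concrete: in a counted loop a plain counter
update can collapse to a single store after the loop, while a volatile or
atomic counter pins one memory update to every iteration. The sketch below is
illustrative only, with made-up names; it is not taken from GCC's
instrumentation code:)

    #include <stdint.h>

    static int64_t counter;            /* plain counter: optimizable */
    static volatile int64_t vcounter;  /* volatile counter: every update
                                          must reach memory */

    void
    loop_plain (int n)
    {
      for (int i = 0; i < n; i++)
        counter++;    /* at -O2, store motion plus final-value replacement
                         (SCEV const-prop) can reduce this to a single
                         "counter += n" after the loop */
    }

    void
    loop_volatile (int n)
    {
      for (int i = 0; i < n; i++)
        vcounter++;   /* a volatile (or atomic) update has to stay in the
                         loop, one load/store per iteration -- the cost
                         being discussed above */
    }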