From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 20425 invoked by alias); 20 Dec 2012 19:35:39 -0000 Received: (qmail 20317 invoked by uid 22791); 20 Dec 2012 19:35:38 -0000 X-SWARE-Spam-Status: No, hits=-4.9 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,KHOP_RCVD_TRUST,KHOP_THREADED,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mail-qa0-f49.google.com (HELO mail-qa0-f49.google.com) (209.85.216.49) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 20 Dec 2012 19:35:31 +0000 Received: by mail-qa0-f49.google.com with SMTP id r4so2877575qaq.15 for ; Thu, 20 Dec 2012 11:35:30 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:x-gm-message-state; bh=lLYbBVmBhGxgVLWNCc1vCj/pI2/iZbwHDq6oWGxIEos=; b=QW4VTsmMZft4XqrilbtUsu9roNSyA2wCZi4nIA4L6FJvfquwBetaVQZlcGC8HVPF+3 C74q6nf32S1qrYpSLscpHxC1qFAm07CwoO8m/U8f50W7RYVq8GKYobpTDFBBNiXmDKru /zvf819yrfRwgp5euF8ZuZ6d2J6pC07tfbJkqZoJr/ak/G17yTWoDpruYt8PeN6a3xV2 ocYYtfojkPZt8ykN6e60bNxm2YytVTBz0SR5jiwo/IR9POfqqsHPgnB4zNH8viA4WVE9 97ITdIRcXhohzNmznpR8BLQ8IhepgCgahuc8hVpa4IkrhUzJuss8P4eIl6cNBqH23cTA 5tRw== MIME-Version: 1.0 Received: by 10.49.128.231 with SMTP id nr7mr6040864qeb.49.1356032130363; Thu, 20 Dec 2012 11:35:30 -0800 (PST) Received: by 10.229.10.77 with HTTP; Thu, 20 Dec 2012 11:35:30 -0800 (PST) In-Reply-To: References: <20121219200828.73DB9106927@rong.mtv.corp.google.com> <20121220162054.GA26643@atrey.karlin.mff.cuni.cz> Date: Thu, 20 Dec 2012 19:35:00 -0000 Message-ID: Subject: Re: [google 4.7] atomic update of profile counters (issue6965050) From: Rong Xu To: Andrew Pinski Cc: Jan Hubicka , GCC Patches , David Li , reply@codereview.appspotmail.com Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQkKHLcpqh26/Cu70dB/o6hkqHbYue9APucAhCKw7AwaAqxDKyETuymoQ55Pf0KjFd01nNfSl7uIHX/GhxA3pIEJ0nMW0c24uAVU8jRRmFeFMgnS8pWyCwESvfWUL4k+qQ86CNZnnkKPFpKlOLLXxvIKFDJ2n6dOYlbdtPkW1DEPf408QY3G4g5FXkg9zXLCxe4nBHCc X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2012-12/txt/msg01267.txt.bz2 we have this patch primarily for getting valid profile counts. we observe that for some high-threaded programs, we are getting poor counter due to data racing of counter update (like counter value is only 15% of what it supposed to be for a 10-thread program). In general, enabling atomic updates slows down programs. (for my some of my toy programs, it has 3x slow down.) And that the reason I use options to control value and edge profile count. -Rong On Thu, Dec 20, 2012 at 8:57 AM, Andrew Pinski wrote: > On Thu, Dec 20, 2012 at 8:20 AM, Jan Hubicka wrote: >>> On Wed, Dec 19, 2012 at 4:29 PM, Andrew Pinski wrote: >>> > >>> > On Wed, Dec 19, 2012 at 12:08 PM, Rong Xu wrote: >>> > > Hi, >>> > > >>> > > This patch adds the supprot of atomic update the profile counters. >>> > > Tested with google internal benchmarks and fdo kernel build. >>> > >>> > I think you should use the __atomic_ functions instead of __sync_ >>> > functions as they allow better performance for simple counters as you >>> > can use __ATOMIC_RELAXED. >>> >>> You are right. I think __ATOMIC_RELAXED should be OK here. >>> Thanks for the suggestion. >>> >>> > >>> > And this would be useful for the trunk also. I was going to implement >>> > this exact thing this week but some other important stuff came up. >>> >>> I'll post trunk patch later. >> >> Yes, I like that patch, too. Even if the costs are quite high (and this is why >> atomic updates was sort of voted down in the past) the alternative of using TLS >> has problems with too-much per-thread memory. > > Actually sometimes (on some processors) atomic increments are cheaper > than doing a regular incremental. Mainly because there is an > instruction which can handle it in the L2 cache rather than populating > the L1. Octeon is one such processor where this is true. > > Thanks, > Andrew Pinski > >> >> While there are even more alternatives, like recording the changes and >> commmiting them in blocks (say at function return), I guess some solution is >> better than no solution. >> >> Thanks, >> Honza