Subject: Re: State of AutoFDO in GCC
From: Andi Kleen
To: Jan Hubicka
Cc: 172060045@hdu.edu.cn, gcc, Eugene.Rozenfeld@microsoft.com
Date: Mon, 10 May 2021 08:36:06 -0700

On 5/9/2021 10:01 AM, Jan Hubicka wrote:
>>> With my tests, AutoFDO could achieve almost half of the effect of
>>> instrumentation FDO on real applications such as MySQL 8.0.20 .
>> Likely this could be improved with some of the missing changes. Apparently
>> discriminator support is worth quite a bit especially on dense C++ code
>> bases. Without that, gcc autofdo only works on line numbers, which can be
>> very limiting if single lines have a lot of basic blocks.
>>
>> Sadly discriminator support is currently only on the old Google branch and
>> not in gcc mainline
>>
>> Longer term it would probably be best to replace all this with some custom
>> specialized binary annotation instead of stretching DWARF beyond its limits.
> I think it makes sense to pick the AutoFDO to the lists of things that
> would be nice to fix in GCC12. I guess first we need to solve the
> issues with the tool producing autofdo gcda files and once that works
> setup some benchmarking so we know how things compare to FDO and they
> get tested.

It should work with my updated branch and latest perf. This is an old
version before the great LLVMification and the regressions, so it builds
on its own. I removed all the checks that broke with new perf versions,
at least as far as I could test.

https://github.com/andikleen/autofdo/tree/perf-future

Longer term I'm thinking the autofdo tools setup as currently done is
not the right way anyways.

The problem is that to get good results you need to keep autofdo running
for a long time, but that causes perf to write every sample to disk,
producing gigantic perf.data files. For the gcc bootstrap runs we had
cases where it reached GBs. This is all quite unnecessary: all it needs
is some statistics on basic block edges, so there is no need to keep
every sample.

At a minimum we probably need to figure out how to run it online in perf
pipe mode to avoid the temporary files. But the actual algorithms in
create_gcov are rather simple, so maybe it could be converted into a
simple online tool that does the profiling.

And then we need some tooling to find the profile data for a given
binary in this combined output. Today it's rather difficult to patch
build systems to always point to the right files. This would need some
metadata or at least a file name convention for gcov.

This would allow longer term profiling of full real systems with
existing build systems.

> If you point me to the discriminator patches I can try to figure out how
> hard would be to mainline them.

It's difficult to find now because it was a branch in the old SVN that
wasn't converted. Sadly the great git conversion was quite lossy. IIRC
it was separate patches in the google gcc_4_8 SVN branch (of which I
don't seem to have a copy either), but in _4_9 they squashed everything
in autofdo together. It could be recovered if someone has a backup of
the old SVN repository and converts the google_4_8 branch to git.
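
As a rough illustration of the point above that only per-edge statistics
are needed, not the samples themselves, here is a minimal Python sketch
of an online aggregator. It is not how create_gcov works. The input
format (one "FROM TO" pair of hex addresses per line) and the names
parse_branch_pairs and aggregate_edges are made up for the example and
are not something perf emits as-is.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# An edge is (branch source address, branch target address).
Edge = Tuple[int, int]


def parse_branch_pairs(lines: Iterable[str]) -> Iterator[Edge]:
    """Yield (from, to) pairs from lines of hypothetical 'FROM TO' hex text."""
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        try:
            yield int(fields[0], 16), int(fields[1], 16)
        except ValueError:
            continue  # skip lines whose fields are not hex numbers


def aggregate_edges(pairs: Iterable[Edge]) -> Counter:
    """Fold a stream of edges into counters without storing the samples."""
    counts: Counter = Counter()
    for edge in pairs:
        counts[edge] += 1
    return counts
```

Because both functions consume an iterator, they could be fed from a
pipe (for example sys.stdin), and memory use would grow with the number
of distinct edges rather than with the number of samples.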

Here in the 4_9 version you can search for get_discriminator and diff it
against newer versions:

https://github.com/andikleen/gcc-old-svn/blob/6ff70bb2ef3cc0a5c6940030a89546bf40e70891/gcc/auto-profile.c#L393

and all the changes to other files were in the complete autofdo patch:

https://github.com/andikleen/gcc-old-svn/commit/d71978a93358a397fb80b20f3a65caad3d9addf1#diff-94f0fa7f897ccce65856dc5a98bae4bf6957a346766613d79414c976d093aa4a

You can also search for discriminator there.

The basic idea was quite simple. You have a unique value for the dwarf
discriminator for each basic block, and then you include that in all the
autofdo location comparisons.

> I am not too sure about custom
> annotations though - storing info about regions of code segment is
> always a pain and it is quite nice that dwarf provides support for that.
> But I guess we could go step by step. I first need a working setup ;)

The thing about custom annotations is that it would make autofdo
independent of the early inliner, which is one of the main reasons for
the strange structure and placement of the autofdo passes. If we had an
independent stable identifier for code regions this could all be done in
better places. Maybe a similar thing could be done in dwarf, but it
would seem a stretch because it's really designed around source code.

-Andi
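
To illustrate the discriminator idea described above (a unique
discriminator per basic block, included in every location comparison),
here is a small Python sketch. It does not mirror the data structures in
auto-profile.c. The Sample type, the accumulate function, and the choice
to sum counts that share a key are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Sample(NamedTuple):
    function: str        # enclosing function name
    line: int            # source line of the sampled instruction
    discriminator: int   # per-basic-block value on that line (hypothetical data)
    count: int           # number of samples attributed to this location


Key = Tuple[str, int, int]


def accumulate(samples: Iterable[Sample], use_discriminator: bool) -> Dict[Key, int]:
    """Sum sample counts per location key.

    With use_discriminator=False the discriminator is forced to 0, so all
    basic blocks on one source line collapse into a single entry.
    """
    totals: Dict[Key, int] = defaultdict(int)
    for s in samples:
        disc = s.discriminator if use_discriminator else 0
        totals[(s.function, s.line, disc)] += s.count
    return dict(totals)


# Three basic blocks that all sit on line 10 of a function "f".
samples = [
    Sample("f", 10, 0, 100),
    Sample("f", 10, 1, 5),
    Sample("f", 10, 2, 900),
]
by_line = accumulate(samples, use_discriminator=False)
by_block = accumulate(samples, use_discriminator=True)
```

With line numbers only, the hot and cold blocks on line 10 become
indistinguishable. With the discriminator in the key, each block keeps
its own count.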