Subject: Re: Register Allocation Graph Coloring algorithm and Others
From:
To: Leslie Zhai
CC: GCC Development, LLVM Developers Mailing List
Date: Mon, 18 Dec 2017 16:42:00 -0000

Leslie Zhai writes:

> * Memory (20 - 100 cycles) is more expensive than Register (1 cycle),
>   but spill code has to be emitted when no PhysReg is available

As Vladimir said, the cache makes this kind of analysis much more
tricky.  It's not necessarily the case that memory=bad and
register=good.  Since there are tradeoffs involved, one needs to
evaluate different strategies to determine *when* memory is worse than
a register.  It may very well be the case that leaving something in
memory frees up a register for something much more important.

All of the register allocation algorithms try to determine this kind
of thing through various heuristics.  Which heuristic is most
effective is highly target-dependent.  In my experience, changing
heuristics can swing performance 20% or more in some cases.

Today's processors and memory systems are so complex that 2nd- and
even 3rd-order effects become important.  It is very, very wrong on
today's machines to use # of spills as a metric to determine the
"goodness" of an allocation.  Determining *what* to spill is much more
important than the raw number of spills.  Many times I have seen code
generated with more spills perform much better than code generated
with fewer spills.

Almost all of the papers around the time of Chaitin-Briggs used # of
spills as the metric.  That may have been appropriate at the time, but
take those results with giant grains of salt today.  Of course they
are still very good papers for understanding the algorithms and
heuristics.

The best way I know to determine what's best for your target is to run
a whole bunch of *real* codes on it, trying different allocation
algorithms and heuristics.  It is a lot of work, still worthy of a
Ph.D. even today.  Register allocation is not a "solved" problem and
probably never will be, as architectures continue to change and become
ever more diverse.
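To make the point about choosing *what* to spill concrete, here is a
minimal, hypothetical sketch of a Chaitin/Briggs-style spill-cost
heuristic.  The names (LiveRange, spillCost, pickSpillCandidate) are
invented for illustration and are not GCC's or LLVM's actual code; the
weighting (references times 10^loop-depth, divided by interference
degree) is the classic textbook form, not any particular target's
tuned heuristic.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Hypothetical per-candidate summary; illustrative only.
    struct LiveRange {
      const char *name;
      unsigned defsAndUses;   // static count of defs + uses
      unsigned maxLoopDepth;  // deepest loop nesting among those references
      unsigned degree;        // neighbors in the interference graph
    };

    // Chaitin/Briggs-style spill cost: references weighted by 10^depth
    // as a crude stand-in for execution frequency.  A range referenced
    // inside a hot loop is far more expensive to spill than one only
    // touched in cold straight-line code, even if the latter has more
    // references.
    double spillCost(const LiveRange &lr) {
      return lr.defsAndUses * std::pow(10.0, lr.maxLoopDepth);
    }

    // Prefer the candidate with the lowest cost/degree ratio: cheap to
    // spill and likely to make many neighbors colorable.  Assumes a
    // non-empty candidate list.
    std::size_t pickSpillCandidate(const std::vector<LiveRange> &cands) {
      std::size_t best = 0;
      for (std::size_t i = 1; i < cands.size(); ++i) {
        double a = spillCost(cands[i]) / std::max(1u, cands[i].degree);
        double b = spillCost(cands[best]) / std::max(1u, cands[best].degree);
        if (a < b) best = i;
      }
      return best;
    }

    int main() {
      std::vector<LiveRange> cands = {
        {"hot",  4, /*loop depth*/ 2, /*degree*/ 6},  // cost 400
        {"cold", 9, /*loop depth*/ 0, /*degree*/ 5},  // cost 9
      };
      // Spilling "cold" emits more spill instructions statically, but
      // it is the better choice: "hot" would reload on every iteration.
      std::printf("spill %s\n", cands[pickSpillCandidate(cands)].name);
      return 0;
    }

Swapping in profile-based frequencies instead of 10^depth, or changing
how degree enters the ratio, is exactly the kind of heuristic knob that
turns out to be highly target-dependent.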
> * Folding spill code into instructions, handling register
>   coalescing, splitting live ranges, doing rematerialization, doing
>   shrink wrapping are harder than RegAlloc

Again, as Vladimir said, they are all part of register allocation.
Sometimes they are implemented as separate passes, but logically they
all contribute work to the task of assigning registers as efficiently
as possible.  And all of them use heuristics.  Choosing when and when
not to coalesce, for example, can be important.  Splitting live ranges
is logically the opposite of coalescing.  Theoretically one can "undo"
a bad coalescing decision by re-splitting the live range, but it's
usually not that simple, as other decisions made in the interim can
make that tricky if not impossible.  It is a delicate balancing act.
(A rough sketch of one common conservative-coalescing test appears
after the signature.)

> * The papers by Briggs and Chaitin contradict[2] themselves when you
>   examine the text of the paper vs. the pseudocode provided?

As others have said, I don't recall any inconsistencies, but that
doesn't mean there aren't bugs and/or incomplete descriptions open to
interpretation.  One of my biggest issues with our computer-science
academic culture is that we do not value repeatability.  It is
virtually impossible to take a paper, implement the algorithm as
described (or as best as possible given the ambiguity) and obtain the
same results.  Heck, half my Ph.D. dissertation was dissecting a
published paper, examining the sensitivity of the described algorithm
to various parts of the heuristic that were ambiguous.  By
interpreting the heuristic description in different ways, I observed
very different results.

I read papers for the algorithms, heuristics and ideas in them.  I pay
no attention to the results, because in the real world we have to
implement the ideas and test them ourselves to see whether they will
help us.

Peter is right to point you to Preston.  He is very accessible,
friendly and helpful.  I had the good fortune to work with him for a
few years and he taught me a lot.  He has much real-world experience
with codes actually used in production.  That experience is gold.

Good luck to you!  You're entering a world that some computing
professionals think is a waste of time because "we already know how to
do that."  Those of us in the trenches know better.  :)

-David
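The conservative-coalescing sketch referenced above, under the usual
Briggs formulation: merge two copy-related live ranges only if the
combined node would have fewer than k neighbors of significant degree,
so the merge can never make the graph harder to color.  The Node type
and canCoalesceBriggs name are hypothetical, invented for illustration.

    #include <cstddef>
    #include <set>
    #include <vector>

    // Hypothetical interference-graph node; illustrative only.
    struct Node {
      std::set<int> neighbors;  // ids of interfering live ranges
    };

    // Briggs-style conservative test for k physical registers:
    // coalescing a and b is allowed only if the merged node would have
    // fewer than k neighbors of "significant" degree (>= k).  Such a
    // node is guaranteed to simplify later, so the merge cannot
    // introduce a new spill.
    bool canCoalesceBriggs(const std::vector<Node> &graph,
                           int a, int b, unsigned k) {
      std::set<int> merged = graph[a].neighbors;
      merged.insert(graph[b].neighbors.begin(), graph[b].neighbors.end());
      merged.erase(a);
      merged.erase(b);

      unsigned significant = 0;
      for (int n : merged) {
        std::size_t deg = graph[n].neighbors.size();
        // A neighbor of both a and b loses one edge when the two merge.
        if (graph[n].neighbors.count(a) && graph[n].neighbors.count(b))
          --deg;
        if (deg >= k)
          ++significant;
      }
      return significant < k;
    }

A test like this deliberately refuses some profitable merges; more
aggressive schemes coalesce first and rely on later live-range
splitting to recover, which is the balancing act described above.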