Date: Tue, 21 Sep 2004 07:21:00 -0000
To: Steven Bosscher
Cc: gcc@gcc.gnu.org
Subject: Re: Interesting paper from Perdue
In-Reply-To: <7347164.1095676587061.SLOX.WebMail.wwwrun@extimap.suse.de>

On Mon, 20 Sep 2004, Steven Bosscher wrote:

> I don't know if anyone has ever seen/read/mentioned this paper
> before, I might have missed it.  Otherwise, interesting reading:
> https://engineering.purdue.edu/ECE/Research/TR/2004pdfs/TR-ECE-04-01.pdf
>
> Gr.
> Steven

I'll digress and rant a bit; apologies in advance.

This is just the tip of the iceberg, really. There are many other
instances where various optimizations are improved in isolation and
degrade performance because they don't consider the effects on the
other optimization passes.

For example, some of the recent work on alias analysis really worries
me because I believe this will result in a medium-term net performance
decrease on many targets. Consider:

1. Improved alias analysis allows better disambiguation of memory
   references.

2. The current scheduler is overly aggressive about hoisting loads,
   and is only restrained by the inadequacy of the current alias
   analysis.
   When alias analysis is improved, the first scheduling pass will
   greatly increase register pressure.

3. The register allocator inserts spill code suboptimally (in
   particular, restores are too early) and lacks basic features such
   as live-range splitting and rematerialization. Therefore, it
   exhibits increasingly bad behavior as register pressure increases.

I think the following will occur:

1. Targets with the first instruction scheduling pass enabled will
   exhibit a net decrease in performance due to increased register
   pressure. This will be exacerbated if the target has fewer
   registers (e.g. slightly worse on IA64, much worse on PPC). The SH
   is unlikely to be affected due to scheduler modifications already
   implemented.

2. Targets without the first scheduling pass enabled will exhibit a
   net decrease in performance only if the register set is very small
   (fewer than 16 registers). This includes the x86 and most embedded
   processors such as the H8/300, M68HC11, 8051, etc.

As I see it, the register allocator and the instruction scheduler are
really the foundation of GCC optimization. We keep adding improvements
which:

1. Allow more intermediate values to be kept in registers, which
   increases register pressure
2. Allow memory to be retained in registers longer, which increases
   register pressure
3. Create larger basic blocks, which increases register pressure
4. Allow more loop unrolling, which increases register pressure
5. etc.

...and the register allocator doesn't handle the increased register
pressure well, so the net result is very little improvement.

We really should spend some time improving the foundation of GCC
instead of piling more and more optimizations on top of it.

Toshi
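
A toy illustration of the load-hoisting concern raised above. This is
only a sketch under simplifying assumptions, not GCC's scheduler or
register allocator: instructions are hypothetical (dest, sources)
tuples, a value is live from its definition to its last use, and
"pressure" is just the number of simultaneously live values. The names
max_pressure, interleaved and hoisted are made up for this example.

```python
def max_pressure(schedule):
    """Return the peak number of simultaneously live values."""
    # Record the index of the last instruction that reads each value.
    last_use = {}
    for i, (_dest, srcs) in enumerate(schedule):
        for s in srcs:
            last_use[s] = i

    live = set()
    peak = 0
    for i, (dest, srcs) in enumerate(schedule):
        # A source dies at its last use, freeing its register.
        for s in srcs:
            if last_use[s] == i:
                live.discard(s)
        # A result occupies a register only if something reads it later.
        if dest is not None and dest in last_use:
            live.add(dest)
        peak = max(peak, len(live))
    return peak


def interleaved(n):
    """load, add, store for each element in turn (loads not hoisted)."""
    sched = []
    for i in range(n):
        sched.append((f"t{i}", []))          # t_i = load p[i]
        sched.append((f"s{i}", [f"t{i}"]))   # s_i = t_i + 1
        sched.append((None, [f"s{i}"]))      # store s_i
    return sched


def hoisted(n):
    """All loads moved above the stores, as if proven not to alias."""
    sched = [(f"t{i}", []) for i in range(n)]
    for i in range(n):
        sched.append((f"s{i}", [f"t{i}"]))
        sched.append((None, [f"s{i}"]))
    return sched


if __name__ == "__main__":
    n = 8
    print("interleaved peak:", max_pressure(interleaved(n)))
    print("hoisted peak:    ", max_pressure(hoisted(n)))
```

In this model the interleaved order never needs more than one live
value, while the hoisted order needs one per load, so the peak grows
with the number of loads the scheduler is allowed to move. On a target
with fewer registers than that peak, the difference would have to be
made up with spills.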