From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 26815 invoked by alias); 9 Dec 2011 19:11:02 -0000
Received: (qmail 26777 invoked by uid 22791); 9 Dec 2011 19:11:01 -0000
X-SWARE-Spam-Status: No, hits=-2.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 09 Dec 2011 19:10:48 +0000
From: "vmakarov at redhat dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/21617] [4.4/4.5/4.6/4.7 Regression] CRC64 algorithm optimization problem on Intel 32-bit
Date: Fri, 09 Dec 2011 19:16:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: vmakarov at redhat dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.4.7
X-Bugzilla-Changed-Fields: CC
Message-ID:
In-Reply-To:
References:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2011-12/txt/msg01079.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21617

Vladimir Makarov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com

--- Comment #6 from Vladimir Makarov 2011-12-09 19:09:52 UTC ---
There is a small difference in the code which results in such degradation.

-O1 generates an insn in the major loop

(insn 43 42 44 5 /home/cygnus/vmakarov/build1/trunk/crctest64.c:241 (parallel [
            (set (reg/v:SI 77 [ __tab_index ])
                (xor:SI (reg:SI 108)
                    (reg:SI 120)))
            (clobber (reg:CC 17 flags))
        ]) 395 {*xorsi_1} (expr_list:REG_DEAD (reg:SI 108)
        (expr_list:REG_DEAD (reg:SI 120)
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))

-O2 generates an analogous insn

(insn 39 38 40 5 /home/cygnus/vmakarov/build1/trunk/crctest64.c:241 (parallel [
            (set (reg/v:SI 83 [ __tab_index ])
                (xor:SI (reg/v:SI 83 [ __tab_index ])
                    (reg:SI 143)))
            (clobber (reg:CC 17 flags))
        ]) 395 {*xorsi_1} (expr_list:REG_DEAD (reg:SI 143)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

The difference is caused by the regmove optimization.  The RTL insn in the
second variant looks even better, but it makes pseudo 83 the most frequently
used one, so it is assigned first: it is pushed last onto the coloring stack,
among a bunch of trivially colorable pseudos.  The set of trivially colorable
pseudos contains two double-word pseudos, each of which needs two adjacent
hard registers.  Assigning pseudo 83 first (the case is further complicated
because some pseudos cross calls) leaves only one pair of adjacent hard
registers.  Two free hard registers still remain for the second double-word
pseudo, but they are not adjacent.  This results in spilling one double-word
pseudo and in code performance degradation.  For -O1, the analog of pseudo 83
(p77) is assigned last, after the two double-word pseudos, and no spilling
occurs.

To solve the problem we should increase the probability of keeping free hard
registers adjacent.  This can be done by modifying the function
bucket_allocno_compare_func so that multi-word pseudos are pushed last onto
the coloring stack and, as a consequence, assigned first.  I did this and the
problem was solved.  Unfortunately, it results in a 2% performance degradation
of SPEC2000 perlbmk, although there is a small code size improvement on
SPEC2000 with this heuristic.
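
To make the adjacency effect concrete, here is a minimal toy sketch in C.  It is not IRA code: the five-register file, the struct pseudo type, the preferred field (standing in for whatever constraints, such as crossing calls, steer a pseudo to a particular hard register), the pseudo ids 100 and 101, and the function names are all assumptions made up for illustration.  It only shows how the assignment order can decide whether a double-word pseudo finds an adjacent pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_HARD_REGS 5   /* toy register file, not the real i386 set */

/* Hypothetical toy pseudo: nregs is 1 (single word) or 2 (double word).
   preferred is the hard register tried first, or -1 for no preference. */
struct pseudo {
    int id;
    int nregs;
    int preferred;
};

/* Return true if hard registers start .. start+nregs-1 exist and are free. */
bool range_is_free(const bool used[NUM_HARD_REGS], int start, int nregs)
{
    if (start < 0 || start + nregs > NUM_HARD_REGS)
        return false;
    for (int i = 0; i < nregs; i++)
        if (used[start + i])
            return false;
    return true;
}

/* Assign pseudos in array order (first element is assigned first).
   Returns the number of pseudos that had to be spilled. */
int assign_in_order(const struct pseudo *p, size_t n)
{
    bool used[NUM_HARD_REGS] = { false };
    int spills = 0;

    for (size_t k = 0; k < n; k++) {
        assert(p[k].nregs == 1 || p[k].nregs == 2);

        /* Try the preferred register first, then the lowest free range. */
        int start = -1;
        if (range_is_free(used, p[k].preferred, p[k].nregs))
            start = p[k].preferred;
        for (int r = 0; start < 0 && r < NUM_HARD_REGS; r++)
            if (range_is_free(used, r, p[k].nregs))
                start = r;

        if (start < 0) {
            spills++;           /* no suitable (adjacent) registers left */
            continue;
        }
        for (int i = 0; i < p[k].nregs; i++)
            used[start + i] = true;
    }
    return spills;
}

/* qsort comparator for the toy heuristic: multi-word pseudos are
   assigned first; pseudos of equal size are ordered by id. */
int multi_word_first(const void *a, const void *b)
{
    const struct pseudo *pa = a, *pb = b;

    if (pa->nregs != pb->nregs)
        return pb->nregs - pa->nregs;
    return pa->id - pb->id;
}
```

With the order { {83, 1, 1}, {100, 2, -1}, {101, 2, -1} }, the single-word pseudo takes register 1, one double-word pseudo takes registers 2 and 3, and the other is spilled even though registers 0 and 4 are free.  Sorting the same array with qsort and multi_word_first before calling assign_in_order gives the double-word pseudos the pairs 0-1 and 2-3, the single-word pseudo falls back to register 4, and nothing is spilled.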

On a general note, register allocation is all about heuristics.  So it is
always possible to find a test where one heuristic works worse than others.
The most important thing is that RA works well overall (on a big, credible
set of tests).  From this point of view, IRA is much better than the previous
register allocator.  But because the crc code is important, I'll continue to
work on tuning which does not degrade SPEC2000 and which does solve the
problem.