From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
Date: Tue, 13 Nov 2012 13:05:00 -0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.7.3

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

--- Comment #8 from Jakub Jelinek 2012-11-13 13:04:28 UTC ---
Created attachment 28674
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28674
gcc48-pr54073.patch

On x86_64-linux on a SandyBridge CPU with -O3 -march=corei7-avx I've tracked
it down to the
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=171341
change, in particular the emit_conditional_move part of the changes.
Before the change, emit_conditional_move would completely ignore the predicate
on the comparison operand (operands[1]); starting with r171341 it honors it.
And the movsicc's ordered_comparison_operator would give up on the UNLT
comparison in the MonteCarlo testcase, while ix86_expand_int_movcc expands it
just fine, and at least on that loop it is beneficial to use
        vucomisd        %xmm0, %xmm1
        cmovae  %eax, %ebp
instead of:
.L4:
        addl    $1, %ebx
...
        vucomisd        %xmm0, %xmm2
        jb      .L4

The attached proof of concept patch attempts to just restore the 4.6 and
earlier behavior by allowing all comparisons.  It may well need better tuning
than that; I meant it just as a start for discussion.

vanilla trunk:
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:         1886.79
FFT             Mflops:  1726.97    (N=1024)
SOR             Mflops:  1239.20    (100 x 100)
MonteCarlo:     Mflops:   374.13
Sparse matmult  Mflops:  1956.30    (N=1000, nz=5000)
LU              Mflops:  4137.37    (M=100, N=100)

patched trunk:
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:         1910.08
FFT             Mflops:  1726.97    (N=1024)
SOR             Mflops:  1239.20    (100 x 100)
MonteCarlo:     Mflops:   528.94
Sparse matmult  Mflops:  1949.03    (N=1000, nz=5000)
LU              Mflops:  4106.27    (M=100, N=100)