From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 119336 invoked by alias); 20 Mar 2015 22:15:59 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 119282 invoked by uid 48); 20 Mar 2015 22:15:54 -0000
From: "linux at carewolf dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/65492] Bad optimization in -O3 due to if-conversion and/or unrolling
Date: Sat, 21 Mar 2015 02:09:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: linux at carewolf dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-03/txt/msg02186.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492

--- Comment #9 from Allan Jensen ---
Looking at the assembler, it does indeed appear that the only difference is loop unrolling and if-conversion.

I tested on another machine (an old Phenom II, as opposed to the Sandy Bridge) and can report that disabling tree-loop-if-convert, either directly or indirectly via tree-loop-vectorize, lets -O3 regain all of the speed difference to -O2 on the Phenom II.

My guess is that the small loop unrolling conflicts with the op-cache that Intel introduced in Sandy Bridge and newer architectures, which speeds up small tight loops. On architectures without an op-cache the loop unrolling is probably still slightly faster.

Unfortunately, using -mtune=sandybridge does not improve the situation, so maybe there should be some architecture-specific tuning of even trivial loop unrolling, and possibly a discussion on making it part of the generic x86-64 tuning.
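
For anyone who wants to repeat the comparison, here is a minimal sketch of the kind of small, branchy loop that is a candidate for if-conversion and unrolling at -O3. It is a hypothetical stand-in, not the test case attached to this bug; the function name and the file name kernel.c are assumptions made for illustration. The flags in the comment are the ones discussed in this comment, and whether they change the generated code for this particular loop depends on the GCC version and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical kernel (not the reporter's test case): a short loop whose
 * body contains a branch, so GCC may if-convert and vectorize it at -O3.
 *
 * Builds to compare, assuming the file is saved as kernel.c:
 *   gcc -O2 -S kernel.c
 *   gcc -O3 -S kernel.c
 *   gcc -O3 -fno-tree-loop-if-convert -S kernel.c
 *   gcc -O3 -fno-tree-loop-vectorize -S kernel.c
 *   gcc -O3 -mtune=sandybridge -S kernel.c
 */
void saturating_add(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Widen before adding so the sum cannot wrap around. */
        unsigned sum = (unsigned)dst[i] + src[i];

        /* The branch that if-conversion may turn into a select. */
        if (sum > 255)
            sum = 255;

        dst[i] = (uint8_t)sum;
    }
}
```

Diffing the assembler output of these builds shows whether the loop body was if-converted, vectorized or unrolled; timing the same builds on both machines would be the actual measurement.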