From: "solar-gcc at openwall dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure
Date: Wed, 18 Feb 2015 00:03:00 -0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.6.2
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

--- Comment #17 from Alexander Peslyak ---
(In reply to Richard Biener from comment #16)
> I'm completely confused now as to what the original regression was reported
> against.

I'm sorry, I should have re-read my original description of the regression before I wrote comment 13. Together, these are indeed confusing.

> I thought it was the default options in the Makefile, -O2
> -fomit-frame-pointer, which showed the regression and you found -Os would
> mitigate it somewhat (and I more specifically told you it is -fno-tree-pre
> that makes the actual difference).

That's one of the regressions I mentioned in the original description.
Yes, you identified -fno-tree-pre as the component of -Os that makes the difference, thank you! However, I also mentioned in the original description that a bigger regression with 4.6+ vs. 4.5 and 4.4 remained despite -Os, and I had no similar workaround for it at the time (but enabling -fopenmp made it go away, perhaps due to changes to declarations in the source code in #ifdef _OPENMP blocks). I think we can now say that this bigger 4.6+ regression was primarily caused by the unaligned load instructions.

So two regressions are figured out, and the remaining slowdown (not investigated yet) vs. 4.1 to 4.3 (which worked best) is only 6% to 10% in recent versions (9% in 4.9.2).

> So - what options give good results with old compilers but bad results with
> new compilers?

On CPUs where movups/movdqu are slower than their aligned counterparts (for addresses that happen to be aligned), any sane optimization options of 4.6+ give bad results compared to pre-4.6 with the same options. As you say, this can be fixed in the source code (and I most likely will fix it there), but I think many other programs may experience similar slowdowns, so maybe GCC should do something about this too.

Other than that, either -Os or -fno-tree-pre works around the second-worst slowdown seen in 4.6+.

To avoid confusion, maybe this bug should focus on one of the three regressions? Should we keep it for PRE only?

Should we create a new bug for the unnecessary and non-optional use of unaligned load instructions for source code like this, or is this considered the new intended behavior despite the major slowdown on such CPUs? (Presumably not only for JtR. I'd expect this to affect many programs.)

Should we also create a bug for investigating the remaining slowdown of 9% in 4.9.2 (vs. 4.1 to 4.3), or is it considered too minor to bother?

Thank you!
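
For illustration only, here is a minimal sketch of the kind of source-level distinction discussed above: telling the compiler whether 16-byte data is known to be aligned. This is not the actual JtR code. The names vec128, xor_aligned, xor_unaligned and xor_variants_agree are hypothetical, and the sketch assumes the GCC/Clang vector_size extension. Which load instruction the compiler actually picks for either form (aligned vs. movups/movdqu) depends on the GCC version, target and options, so it is worth checking the -S output rather than assuming.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16-byte vector of four 32-bit lanes (GCC/Clang vector extension).
 * Objects of this type are 16-byte aligned. */
typedef uint32_t vec128 __attribute__((vector_size(16)));

/* Hypothetical helper: the vec128 pointer type itself promises 16-byte
 * alignment, so the compiler is free to use aligned loads and stores. */
static void xor_aligned(vec128 *dst, const vec128 *src, size_t n)
{
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
}

/* Hypothetical helper: byte pointers promise nothing about alignment,
 * so the data is moved with memcpy and the compiler has to assume the
 * addresses may be unaligned. */
static void xor_unaligned(unsigned char *dst, const unsigned char *src,
                          size_t n)
{
    for (size_t i = 0; i < n; i++) {
        vec128 a, b;
        memcpy(&a, dst + 16 * i, sizeof a);
        memcpy(&b, src + 16 * i, sizeof b);
        a ^= b;
        memcpy(dst + 16 * i, &a, sizeof a);
    }
}

/* Self-check: both variants must compute the same bytes.
 * Returns 1 on agreement, 0 otherwise. */
static int xor_variants_agree(void)
{
    static vec128 x[4], y[4], key[4];
    unsigned char *xb = (unsigned char *)x;
    unsigned char *yb = (unsigned char *)y;
    unsigned char *kb = (unsigned char *)key;

    for (size_t i = 0; i < sizeof x; i++) {
        xb[i] = yb[i] = (unsigned char)(i * 7 + 1);
        kb[i] = (unsigned char)(i * 13 + 5);
    }
    xor_aligned(x, key, 4);
    xor_unaligned(yb, kb, 4);
    return memcmp(x, y, sizeof x) == 0;
}
```

Compiling both helpers with something like gcc -O2 -S and comparing the loads emitted for each would show whether a given compiler version distinguishes the two cases on a given target.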