From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-477514-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 7062 invoked by alias); 17 Feb 2015 03:11:17 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 7010 invoked by uid 48); 17 Feb 2015 03:11:13 -0000
From: "solar-gcc at openwall dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/51017] GCC 4.6 performance regression (vs. 4.4/4.5), PRE increases register pressure
Date: Tue, 17 Feb 2015 03:11:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.6.2
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: solar-gcc at openwall dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-51017-4-nBtd8okxAV@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-51017-4@http.gcc.gnu.org/bugzilla/>
References: <bug-51017-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-02/txt/msg01847.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017
--- Comment #14 from Alexander Peslyak <solar-gcc at openwall dot com> ---
For completeness, here are the results for 4.7.x, 4.8.x, and 4.9.0:

4.7.0o - 2142K c/s, 29692 bytes, 1267 movaps, 465 movups
4.7.0h - 2823K c/s, 29692 bytes, 1732 movaps, 0 movups
4.7.4o - 2144K c/s, 29692 bytes, 1267 movaps, 465 movups
4.7.4h - 2827K c/s, 29692 bytes, 1732 movaps, 0 movups
4.8.0o - 1825K c/s, 27813 bytes, 1341 movaps, 721 movups
4.8.0h - 2792K c/s, 27813 bytes, 2062 movaps, 0 movups
4.8.4o - 1827K c/s, 27807 bytes, 1341 movaps, 721 movups
4.8.4h - 2786K c/s, 27807 bytes, 2062 movaps, 0 movups
4.9.0o - 1852K c/s, 28262 bytes, 1319 movaps, 721 movups
4.9.0h - 2685K c/s, 28262 bytes, 2040 movaps, 0 movups
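
As a cross-check of the table above, here is a minimal Python sketch that
pairs each "o" row with its "h" row and compares them. It assumes the "h"
rows are the builds with the aligned loads hack and the "o" rows the builds
without it; the message does not spell the suffixes out. The names ROWS and
summarize are illustrative only and not part of any benchmark tooling.

```python
# Rows copied from the table above: (label, K c/s, bytes, movaps, movups).
ROWS = [
    ("4.7.0o", 2142, 29692, 1267, 465),
    ("4.7.0h", 2823, 29692, 1732, 0),
    ("4.7.4o", 2144, 29692, 1267, 465),
    ("4.7.4h", 2827, 29692, 1732, 0),
    ("4.8.0o", 1825, 27813, 1341, 721),
    ("4.8.0h", 2792, 27813, 2062, 0),
    ("4.8.4o", 1827, 27807, 1341, 721),
    ("4.8.4h", 2786, 27807, 2062, 0),
    ("4.9.0o", 1852, 28262, 1319, 721),
    ("4.9.0h", 2685, 28262, 2040, 0),
]


def summarize(rows):
    """Compare each 'o' row with the 'h' row of the same GCC version.

    Assumption: 'o' = without the aligned loads hack, 'h' = with it.
    """
    by_label = {row[0]: row for row in rows}
    summary = {}
    for label, speed, size, movaps, movups in rows:
        if not label.endswith("o"):
            continue
        version = label[:-1]
        _, h_speed, h_size, h_movaps, h_movups = by_label[version + "h"]
        summary[version] = {
            # Ratio of c/s with the hack to c/s without it.
            "speedup": h_speed / speed,
            # Whether the code size in bytes is identical in both builds.
            "same_size": h_size == size,
            # Whether movaps + movups adds up to the same total in both builds.
            "moves_conserved": h_movaps + h_movups == movaps + movups,
        }
    return summary


if __name__ == "__main__":
    for version, s in summarize(ROWS).items():
        print(f"{version}: {s['speedup']:.2f}x, "
              f"same size: {s['same_size']}, "
              f"moves conserved: {s['moves_conserved']}")
```

For all five versions the code size is unchanged between the two rows and the
"h" movaps count equals the "o" movaps plus movups count, which is consistent
with every movups having become a movaps. The speed ratio ranges from about
1.32x (4.7.x) to about 1.53x (4.8.0).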

4.8 produces the smallest code so far, but even with the aligned loads hack
it is still 6% slower than 4.3.

All of these were built with "-O2 -fomit-frame-pointer -Os -funroll-loops
-finline-functions", like the similar results I posted before.  The machine
is a Xeon E5420, x86_64.