From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 25860 invoked by alias); 22 Oct 2013 14:56:58 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 25770 invoked by uid 48); 22 Oct 2013 14:56:55 -0000
From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/47477] [4.7/4.8/4.9 regression] Sub-optimal mov at end of method
Date: Tue, 22 Oct 2013 14:56:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 4.6.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jakub at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.8.3
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-10/txt/msg01614.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47477

--- Comment #18 from Jakub Jelinek ---
(In reply to Kai Tietz from comment #17)
> What optimization you expect here? I see by the new type-demotion pass some
> changes in optimized tree-output:

This one is for vectorization, try it with -O3 -mavx2 and look what vectorized
loop we get.
With type demotion and promotion for the vectorized loops (perhaps only for
that and not for the scalar loops), you could get similar vectorization to say:

short a[1024], b[1024];
void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      unsigned short c = ((short)(a[i] << 8) >> 8) + 5U;
      unsigned short d = b[i] + 12U;
      a[i] = c + d;
    }
}

though even in this case I still couldn't achieve the sign extension to be
actually performed as 16-bit left + right (signed) shift, while I guess that
would lead to even better code.

Or look at how we vectorize:

short a[1024], b[1024];
void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      unsigned char e = a[i];
      short c = e + 5;
      long long d = (long long) b[i] + 12;
      a[i] = c + d;
    }
}

(note, here forwprop pass already performs type promotion, instead of
converting a[i] to unsigned char and back to short, it computes a[i] & 255 in
short mode) and how we could instead with type demotions:

short a[1024], b[1024];
void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    {
      unsigned short c = (a[i] & 0xff) + 5U;
      unsigned short d = b[i] + 12U;
      a[i] = c + d;
    }
}

These are all admittedly artificial testcases, but I've seen tons of loops
where multiple types were vectorized and I think in some portion of those loops
we could either use just a single type size, or at least decrease the number of
conversions and different type sizes in the vectorized loops.
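
The demoted form of the last pair of testcases is only a valid replacement because the result is stored back into a 16-bit element, so everything above bit 15 is discarded either way. The following is a minimal sketch of a check for that equivalence, not part of the bug report or of any GCC pass. The names wide_step, demoted_step, count_mismatches and check_equivalence are made up for illustration, and the code assumes 8-bit char, 16-bit short and GCC's documented modulo behaviour when an out-of-range value is converted to a signed type.

```c
#include <assert.h>
#include <limits.h>

/* Per-element loop body as written in the multi-type testcase
   (unsigned char, short and long long all appear).  */
short
wide_step (short a, short b)
{
  unsigned char e = a;
  short c = e + 5;
  long long d = (long long) b + 12;
  return c + d;   /* narrowed to 16 bits, as the store to a[i] would do */
}

/* Hypothetical demoted form: the same value computed in 16-bit types only.  */
short
demoted_step (short a, short b)
{
  unsigned short c = (a & 0xff) + 5U;
  unsigned short d = b + 12U;
  return c + d;   /* narrowed to 16 bits */
}

/* Compare both forms for every value of a and a strided sample of b;
   returns the number of input pairs on which they disagree.  */
long
count_mismatches (void)
{
  long bad = 0;
  for (int a = SHRT_MIN; a <= SHRT_MAX; a++)
    for (int b = SHRT_MIN; b <= SHRT_MAX; b += 251)
      if (wide_step ((short) a, (short) b)
          != demoted_step ((short) a, (short) b))
        bad++;
  return bad;
}

/* Aborts (unless NDEBUG is defined) if the two forms ever disagree.  */
void
check_equivalence (void)
{
  assert (count_mismatches () == 0);
}
```

Calling check_equivalence () from a small driver is enough to exercise it. The stride of 251 over b only keeps the run time short; covering every b would need the full 65536 x 65536 input space.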