From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 50869 invoked by alias); 17 Mar 2015 12:22:45 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 50807 invoked by uid 48); 17 Mar 2015 12:22:41 -0000
From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/65078] [5 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
Date: Tue, 17 Mar 2015 12:22:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jakub at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 5.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-03/txt/msg01681.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

--- Comment #10 from Jakub Jelinek ---
During expansion, we don't try vec_extract, because we are trying to
extract the low DImode (64 bits) out of a V16QImode pseudo, which is not
really a vector element extraction, and the middle end doesn't know that
on this target it is beneficial to just subreg the V16QImode pseudo to an
identically sized vector mode with differently sized elements (V2DImode
in this case).
So, in order to handle this at the expansion level, we would probably
need to add some new optab like vec_extract that would depend not just on
the source mode, but also on the target mode (a conversion optab?), or
some target hook or macro that would instruct the middle end to also try
to subreg the vector mode to another identically sized vector mode before
trying vec_extract.  Immediately after the vec_extract check, we already
convert the V16QImode to TImode and force_reg it, so that is the last
spot that can do something about it during expansion.

To fix this up before reload, we have the option of either a
!reload_completed splitter or some combiner pattern(s).

Short testcase that shows hopefully optimal (or close to optimal) output
for f5-f8 and really bad code for f1-f4, both with -O2 -m64 and with
-O2 -msse2 -m32:

typedef unsigned char V __attribute__((vector_size (16)));
typedef unsigned long long W __attribute__((vector_size (16)));
typedef unsigned int T __attribute__((vector_size (16)));

void
f1 (unsigned long long *x, V y)
{
  *x = ((W)y)[0];
}

unsigned long long
f2 (V y)
{
  return ((W)y)[0];
}

void
f3 (unsigned int *x, V y)
{
  *x = ((T)y)[0];
}

unsigned int
f4 (V y)
{
  return ((T)y)[0];
}

void
f5 (unsigned long long *x, W y)
{
  *x = ((W)y)[0];
}

unsigned long long
f6 (W y)
{
  return ((W)y)[0];
}

void
f7 (unsigned int *x, T y)
{
  *x = ((T)y)[0];
}

unsigned int
f8 (T y)
{
  return ((T)y)[0];
}
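
For anyone comparing the two groups of functions, here is a minimal
sketch of a runtime check that they are semantically identical, so that
only the generated code differs.  It relies on the documented behaviour
of GCC vector casts between same-sized vector types (a reinterpretation
of the bits, with element 0 at the lowest address).  The helper
low_lane_results_agree and the test data are hypothetical and not part
of the report; noinline is added only so that each function is still
compiled on its own.

```c
/* Needs GCC or Clang: uses the vector_size extension from the testcase.  */
#include <string.h>

typedef unsigned char V __attribute__((vector_size (16)));
typedef unsigned long long W __attribute__((vector_size (16)));
typedef unsigned int T __attribute__((vector_size (16)));

/* Same bodies as f2, f4, f6 and f8 in the testcase; noinline is an
   addition so the calls below are not folded away.  */
__attribute__((noinline)) unsigned long long
f2 (V y)
{
  return ((W)y)[0];
}

__attribute__((noinline)) unsigned int
f4 (V y)
{
  return ((T)y)[0];
}

__attribute__((noinline)) unsigned long long
f6 (W y)
{
  return ((W)y)[0];
}

__attribute__((noinline)) unsigned int
f8 (T y)
{
  return ((T)y)[0];
}

/* Hypothetical helper, not from the report: returns 1 if the V-argument
   functions and the W/T-argument functions all return the low lane.  */
int
low_lane_results_agree (void)
{
  V y = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
  unsigned long long lo64;
  unsigned int lo32;

  /* Reference values: the first 8 and the first 4 bytes of the vector
     as stored in memory, i.e. element 0 of the W and T views.  */
  memcpy (&lo64, &y, sizeof lo64);
  memcpy (&lo32, &y, sizeof lo32);

  return f2 (y) == lo64 && f6 ((W)y) == lo64
	 && f4 (y) == lo32 && f8 ((T)y) == lo32;
}
```

Calling low_lane_results_agree () from a small main and comparing the
-O2 -S output of f2 against f6 (and f4 against f8) should isolate the
code generation difference discussed above.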