From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (qmail 24335 invoked by alias); 22 Jan 2015 16:17:57 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: 
List-Archive: 
List-Post: 
List-Help: 
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 24235 invoked by uid 48); 22 Jan 2015 16:17:48 -0000
From: "rguenth at gcc dot gnu.org" 
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/64731] vector lowering should split loads and stores
Date: Thu, 22 Jan 2015 16:17:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_status cf_reconfirmed_on cc short_desc everconfirmed
Message-ID: 
In-Reply-To: 
References: 
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-01/txt/msg02410.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64731

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-01-22
                 CC|                            |rguenth at gcc dot gnu.org
            Summary|poor code when using        |vector lowering should
                   |vector_size((32)) for sse2  |split loads and stores
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
Ok, the issue is "simple" - veclower doesn't split the loads/stores itself,
only the register operations:

  :
  # ivtmp.11_24 = PHI 
  _8 = MEM[base: a_6(D), index: ivtmp.11_24, offset: 0B];
  _11 = MEM[base: b_9(D), index: ivtmp.11_24, offset: 0B];
  _17 = BIT_FIELD_REF <_8, 128, 0>;
  _4 = BIT_FIELD_REF <_11, 128, 0>;
  _5 = _4 + _17;
  _29 = BIT_FIELD_REF <_8, 128, 128>;
  _28 = BIT_FIELD_REF <_11, 128, 128>;
  _14 = _28 + _29;
  _12 = {_5, _14};
  MEM[base: a_6(D), index: ivtmp.11_24, offset: 0B] = _12;
  ivtmp.11_23 = ivtmp.11_24 + 32;
  if (ivtmp.11_23 != 8192)
    goto ;
  else
    goto ;

In this case it would also have a moderately hard time splitting the
loads/stores, as it is already faced with TARGET_MEM_REFs.

Nothing combines this back into a sane form. I've recently added code that
handles exactly the same situation, but only for complex arithmetic (in
tree-ssa-forwprop.c, for PR64568).

I wonder why, with only -msse2, IVOPTs produces TARGET_MEM_REFs for the
loads. For sure x86_64 cannot load a V4DF in one instruction...