From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (qmail 84643 invoked by alias); 27 Aug 2015 11:17:02 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: 
List-Archive: 
List-Post: 
List-Help: 
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 84432 invoked by uid 48); 27 Aug 2015 11:16:54 -0000
From: "ramana at gcc dot gnu.org" 
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/67366] Poor assembly generation for unaligned memory accesses on ARM v6 & v7 cpus
Date: Thu, 27 Aug 2015 11:17:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 6.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: ramana at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: 
In-Reply-To: 
References: 
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-08/txt/msg01866.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366

--- Comment #8 from Ramana Radhakrishnan ---
(In reply to rguenther@suse.de from comment #7)
> On Thu, 27 Aug 2015, ramana at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
> > 
> > --- Comment #6 from Ramana Radhakrishnan ---
> > (In reply to rguenther@suse.de from comment #3)
> > > On Thu, 27 Aug 2015, rearnsha at gcc dot gnu.org wrote:
> > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
> > > > 
> > > > --- Comment #2 from Richard Earnshaw ---
> > > > (In reply to Richard Biener from comment #1)
> > > > > I think this boils down to the fact that memcpy expansion is done
> > > > > too late and that (with more recent GCC) the "inlining" done on the
> > > > > GIMPLE level is restricted to !SLOW_UNALIGNED_ACCESS but arm defines
> > > > > STRICT_ALIGNMENT to 1 unconditionally.
> > > > 
> > > > Yep, we have to define STRICT_ALIGNMENT to 1 because not all load
> > > > instructions work with misaligned addresses (ldm, for example).  The
> > > > only way to handle misaligned copies is through the movmisalign API.
> > > 
> > > Are the movmisalign handled ones reasonably efficient?  That is, more
> > > efficient than memcpy/memmove?  Then we should experiment with
> > 
> > minor nit - missing include of optabs.h - fixing that and adding a
> > movmisalignsi pattern in the backend that just generates either an
> > unaligned loadsi / storesi insn generates the following for me for the
> > above mentioned testcase.
> > 
> > read32:
> >         @ args = 0, pretend = 0, frame = 0
> >         @ frame_needed = 0, uses_anonymous_args = 0
> >         @ link register save eliminated.
> >         ldr     r0, [r0]        @ unaligned
> >         bx      lr
> > 
> > 
> > I'm on holiday from this evening so don't really want to push something
> > today ...
> 
> Sure ;)  When adding the GIMPLE folding I was just careful here as I
> don't really have a STRICT_ALIGNMENT machine with movmisalign handling
> available.  Thus full testing is appreciated (might turn up some
> testcases that need adjustment).  There are more STRICT_ALIGN
> guarded cases below in the function, eventually they can be modified
> as well (at which point splitting out the alignment check to a separate
> function makes sense).
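The exact testcase from the PR isn't quoted above; a minimal function along
these lines (my reconstruction, not necessarily the PR's testcase verbatim)
is the usual way to express a misaligned 32-bit load and matches the read32
assembly shown in comment #6:

#include <string.h>

unsigned int
read32 (const void *p)
{
  unsigned int v;
  /* memcpy from a pointer with no alignment guarantee; on a
     STRICT_ALIGNMENT target this is the canonical way to request a
     potentially misaligned 4-byte load without invoking undefined
     behaviour.  */
  memcpy (&v, p, sizeof v);
  return v;
}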
This was the backend hack I was playing with.  It still needs extending to
HImode values, cleaning up, regression testing, and a small amount of
benchmarking to check we're doing the right thing.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 288bbb9..eaff494 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11423,6 +11423,27 @@
 }"
 )
 
+(define_expand "movmisalignsi"
+  [(match_operand:SI 0 "general_operand")
+   (match_operand:SI 1 "general_operand")]
+  "unaligned_access"
+{
+  /* This pattern is not permitted to fail during expansion: if both arguments
+     are non-registers (e.g. memory := constant, which can be created by the
+     auto-vectorizer), force operand 1 into a register.  */
+  if (!s_register_operand (operands[0], SImode)
+      && !s_register_operand (operands[1], SImode))
+    operands[1] = force_reg (SImode, operands[1]);
+
+  if (MEM_P (operands[1]))
+    emit_insn (gen_unaligned_loadsi (operands[0], operands[1]));
+  else
+    emit_insn (gen_unaligned_storesi (operands[0], operands[1]));
+
+  DONE;
+})
+
+
 ;; Vector bits common to IWMMXT and Neon
 (include "vec-common.md")
 ;; Load the Intel Wireless Multimedia Extension patterns
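For anyone who wants to try this out, a suggested recipe (mine, not from the
thread; the file name read32.c is just the testcase above saved to a file):
apply the patch to a cross compiler and build with unaligned access enabled,
e.g.

  arm-none-eabi-gcc -O2 -march=armv7-a -S read32.c

read32.s should then contain the single "ldr r0, [r0] @ unaligned" from
comment #6 instead of a byte-by-byte ldrb/orr expansion.  Compiling with
-mno-unaligned-access makes the expander's "unaligned_access" condition
false, so the byte-wise fallback should reappear there.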