From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (qmail 84643 invoked by alias); 27 Aug 2015 11:17:02 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: 
List-Archive: 
List-Post: 
List-Help: 
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 84432 invoked by uid 48); 27 Aug 2015 11:16:54 -0000
From: "ramana at gcc dot gnu.org" 
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/67366] Poor assembly generation for unaligned memory accesses on ARM v6 & v7 cpus
Date: Thu, 27 Aug 2015 11:17:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 6.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: ramana at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: 
In-Reply-To: 
References: 
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-08/txt/msg01866.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366

--- Comment #8 from Ramana Radhakrishnan ---
(In reply to rguenther@suse.de from comment #7)
> On Thu, 27 Aug 2015, ramana at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
> > 
> > --- Comment #6 from Ramana Radhakrishnan ---
> > (In reply to rguenther@suse.de from comment #3)
> > > On Thu, 27 Aug 2015, rearnsha at gcc dot gnu.org wrote:
> > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366
> > > > 
> > > > --- Comment #2 from Richard Earnshaw ---
> > > > (In reply to Richard Biener from comment #1)
> > > > > I think this boils down to the fact that memcpy expansion is done
> > > > > too late and that (with more recent GCC) the "inlining" done on the
> > > > > GIMPLE level is restricted to !SLOW_UNALIGNED_ACCESS but arm defines
> > > > > STRICT_ALIGNMENT to 1 unconditionally.
> > > > 
> > > > Yep, we have to define STRICT_ALIGNMENT to 1 because not all load
> > > > instructions work with misaligned addresses (ldm, for example).  The
> > > > only way to handle misaligned copies is through the movmisalign API.
> > > 
> > > Are the movmisalign handled ones reasonably efficient?  That is, more
> > > efficient than memcpy/memmove?  Then we should experiment with
> > 
> > minor nit - missing include of optabs.h - fixing that and adding a
> > movmisalignsi pattern in the backend that just generates either an
> > unaligned loadsi / storesi insn generates the following for me for the
> > above mentioned testcase.
> > 
> > read32:
> >         @ args = 0, pretend = 0, frame = 0
> >         @ frame_needed = 0, uses_anonymous_args = 0
> >         @ link register save eliminated.
> >         ldr     r0, [r0]        @ unaligned
> >         bx      lr
> > 
> > 
> > I'm on holiday from this evening so don't really want to push something
> > today ...
> 
> Sure ;)  When adding the GIMPLE folding I was just careful here as I
> don't really have a STRICT_ALIGNMENT machine with movmisalign handling
> available.  Thus full testing is appreciated (might turn up some
> testcases that need adjustment).  There are more STRICT_ALIGN
> guarded cases below in the function, eventually they can be modified
> as well (at which point splitting out the alignment check to a separate
> function makes sense).
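The exact testcase from the PR isn't quoted above; a minimal function along
these lines (my reconstruction, not necessarily the PR's testcase verbatim)
is the usual way to express a misaligned 32-bit load and matches the read32
assembly shown in comment #6:

#include <string.h>

unsigned int
read32 (const void *p)
{
  unsigned int v;
  /* memcpy from a pointer with no alignment guarantee; on a
     STRICT_ALIGNMENT target this is the canonical way to request a
     potentially misaligned 4-byte load without invoking undefined
     behaviour.  */
  memcpy (&v, p, sizeof v);
  return v;
}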
This was the backend hack I was playing with.  It still needs extending to
HImode values, cleaning up, regression testing, and a small amount of
benchmarking to check we're doing the right thing.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 288bbb9..eaff494 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11423,6 +11423,27 @@
 }"
 )
 
+(define_expand "movmisalignsi"
+  [(match_operand:SI 0 "general_operand")
+   (match_operand:SI 1 "general_operand")]
+  "unaligned_access"
+{
+  /* This pattern is not permitted to fail during expansion: if both arguments
+     are non-registers (e.g. memory := constant, which can be created by the
+     auto-vectorizer), force operand 1 into a register.  */
+  if (!s_register_operand (operands[0], SImode)
+      && !s_register_operand (operands[1], SImode))
+    operands[1] = force_reg (SImode, operands[1]);
+
+  if (MEM_P (operands[1]))
+    emit_insn (gen_unaligned_loadsi (operands[0], operands[1]));
+  else
+    emit_insn (gen_unaligned_storesi (operands[0], operands[1]));
+
+  DONE;
+})
+
+
 ;; Vector bits common to IWMMXT and Neon
 (include "vec-common.md")
 ;; Load the Intel Wireless Multimedia Extension patterns
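For anyone who wants to try this out, a suggested recipe (mine, not from the
thread; the file name read32.c is just the testcase above saved to a file):
apply the patch to a cross compiler and build with unaligned access enabled,
e.g.

  arm-none-eabi-gcc -O2 -march=armv7-a -S read32.c

read32.s should then contain the single "ldr r0, [r0] @ unaligned" from
comment #6 instead of a byte-by-byte ldrb/orr expansion.  Compiling with
-mno-unaligned-access makes the expander's "unaligned_access" condition
false, so the byte-wise fallback should reappear there.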