From: Mike Stump
To: Kyrill Tkachov
Cc: gcc-patches
Subject: Re: arm memcpy of aligned data
Date: Sun, 16 Aug 2015 19:24:00 -0000
Message-Id: <177C53C9-4F9A-4C11-A9CC-B8965FFCFA1B@comcast.net>
In-Reply-To: <557EE17C.5000008@arm.com>
References: <9FF08D9A-529E-4FDE-9193-4BD81EDD89E3@comcast.net> <55682C81.9040709@arm.com> <55683C58.20701@arm.com> <557EE17C.5000008@arm.com>

On Jun 15, 2015, at 7:30 AM, Kyrill Tkachov wrote:
>
> On 29/05/15 11:15, Kyrill Tkachov wrote:
>> On 29/05/15 10:08, Kyrill Tkachov wrote:
>>> Hi Mike,
>>>
>>> On 28/05/15 22:15, Mike Stump wrote:
>>>> So, the arm memcpy code of aligned data isn’t as good as it can be.
>>>>
>>>> void *memcpy(void *dest, const void *src, unsigned int n);
>>>>
>>>> void foo(char *dst, int i) {
>>>>   memcpy (dst, &i, sizeof (i));
>>>> }
>>>>
>>>> generates horrible code, but, if we are willing to notice the src or the destination are aligned, we can do much better:
>>>>
>>>> $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
>>>> $ cat t.s
>>>> [ … ]
>>>> foo:
>>>>         @ args = 0, pretend = 0, frame = 4
>>>>         @ frame_needed = 0, uses_anonymous_args = 0
>>>>         @ link register save eliminated.
>>>>         sub sp, sp, #4
>>>>         str r1, [r0] @ unaligned
>>>>         add sp, sp, #4
>>> I think there's something to do with cpu tuning here as well.
>> That being said, I do think this is a good idea.
>> I'll give it a test.
>
> The patch passes bootstrap and testing ok and I've seen it
> improve codegen in a few places in SPEC.
> I've added a testcase all marked up.
>
> Mike, I'll commit the attached patch in 24 hours unless somebody objects.

Was this ever applied?