Message-ID: <5163D9B8.7030008@arm.com>
Date: Tue, 09 Apr 2013 09:05:00 -0000
From: Richard Earnshaw
To: "Joseph S. Myers"
CC: "Shih-Yuan Lee (FourDollars)", "patches@eglibc.org", "libc-ports@sourceware.org", "rex.tsai@canonical.com", "jesse.sung@canonical.com", "yc.cheng@canonical.com", Shih-Yuan Lee
Subject: Re: [PATCH] ARM: NEON detected memcpy.

On 03/04/13 16:08, Joseph S. Myers wrote:
> I was previously told by people at ARM that NEON memcpy wasn't a good idea
> in practice because of raised power consumption, context switch costs etc.
> from using NEON in processes that otherwise didn't use it, even if it
> appeared superficially beneficial in benchmarks.

What really matters is the increase in system power versus the performance
gain, and what you might be able to save if you finish sooner. If a 10%
improvement to memcpy performance comes at a 12% increase in CPU power,
then that might seem like a net loss. But if the CPU is only 50% of the
system power, then the increase in system power is just half of that
(i.e. 6%), while the performance improvement will still be 10%. Note
that 50% is just an example to make the figures easier here; I've no
idea what the real numbers are, and they will be highly dependent on
the other components in the system: a back-lit display, in particular,
will use a significant amount of power.

It's also necessary to think about how the Neon unit in the processor is
managed. Is it power gated or simply clock gated? Power-gated regions
are likely to have long power-up times (relative to normal CPU
operations), but clock-gated regions are typically instantaneously
available.

Finally, you need to consider whether the unit is likely to be already
in use. With the increasing trend to using the hard-float ABI, VFP (and
Neon) are generally much more widely used in code now than they were, so
the other potential cost of using Neon (lazy context switching) is also
much more likely to be a non-issue than it would be if the unit were
almost never touched.

R.
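
The system-power arithmetic in the first paragraph of the reply can be sketched roughly as follows. The function names are hypothetical, the figures are the illustrative ones from the message rather than measurements, and the energy step rests on assumptions not stated in the message: that energy is power times time, that a "10% improvement" means 1.10x throughput, and that the workload is entirely memcpy-bound.

```python
def system_power_increase(cpu_power_increase, cpu_share_of_system_power):
    """Fractional rise in whole-system power when only the CPU draws more."""
    return cpu_power_increase * cpu_share_of_system_power


def relative_energy(system_power_increase_frac, speedup_frac):
    """Energy for a fixed amount of work, relative to a baseline of 1.0.

    Assumption: energy = power * time, and a speedup of s shortens the
    run time to 1 / (1 + s) of the original.
    """
    return (1.0 + system_power_increase_frac) / (1.0 + speedup_frac)


# Illustrative figures from the message, not real measurements.
power_up = system_power_increase(0.12, 0.50)  # 12% more CPU power, CPU is 50% of system power
energy = relative_energy(power_up, 0.10)      # memcpy finishes 10% faster

print(f"system power increase: {power_up:.1%}")
print(f"relative energy for the same work: {energy:.3f}")
```

Under those assumptions the system draws about 6% more power while the copy runs, but uses slightly less energy overall because it finishes sooner. With a larger CPU share of system power, or a smaller speedup, the balance can tip the other way.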